unitxt.fusion module
- class unitxt.fusion.BaseFusion(__tags__: Dict[str, str] = {}, data_classification_policy: List[str] = None, caching: bool = None, origins: List[SourceOperator] | Dict[str, SourceOperator], include_splits: List[str] | None = None)
Bases: SourceOperator
BaseFusion operator that combines multiple MultiStreams into one.
- Parameters:
origins – a dict of named SourceOperator objects, or a list of SourceOperator objects. Each origin is specified together with its input, so that it can generate a MultiStream.
include_splits – List of splits to include from each input MultiStream. If None, all splits are included.
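As a rough illustration of what combining multistreams under include_splits means, the sketch below models each origin's MultiStream as a plain dict from split name to a list of instances. That model, the helper fuse_multistreams, and the sample data are assumptions for illustration only; unitxt's own MultiStream objects are produced lazily by SourceOperators and are not plain dicts.

```python
from typing import Dict, List, Optional, Union

# Hypothetical stand-in for a MultiStream (not unitxt's class):
# split name -> list of instances.
MultiStreamLike = Dict[str, List[dict]]


def fuse_multistreams(
    origins: Union[Dict[str, MultiStreamLike], List[MultiStreamLike]],
    include_splits: Optional[List[str]] = None,
) -> MultiStreamLike:
    """Hypothetical helper: concatenate several multistreams split by split."""
    # Accept named origins (dict) or a plain list, which is keyed by position here.
    named = origins if isinstance(origins, dict) else dict(enumerate(origins))
    fused: MultiStreamLike = {}
    for stream in named.values():
        for split, instances in stream.items():
            # Skip splits that were not requested; None keeps every split.
            if include_splits is not None and split not in include_splits:
                continue
            fused.setdefault(split, []).extend(instances)
    return fused


squad = {"train": [{"q": "a"}, {"q": "b"}], "test": [{"q": "c"}]}
boolq = {"train": [{"q": "d"}], "validation": [{"q": "e"}]}
print(fuse_multistreams({"squad": squad, "boolq": boolq}, include_splits=["train"]))
# {'train': [{'q': 'a'}, {'q': 'b'}, {'q': 'd'}]}
```

Keying list origins by position is just one way to treat the two accepted forms of origins uniformly.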
- class unitxt.fusion.FixedFusion(__tags__: Dict[str, str] = {}, data_classification_policy: List[str] = None, caching: bool = None, origins: List[SourceOperator] | Dict[str, SourceOperator], include_splits: List[str] | None = None, max_instances_per_origin_split: int | None = None)
Bases: BaseFusion
FixedFusion operator that combines multiple MultiStreams into one, limiting the number of instances taken from each split of each input MultiStream.
- Parameters:
origins – Dict of named SourceOperator objects (each yielding a MultiStream), or a list thereof
include_splits – List of splits (stream names) to include, across all input MultiStreams. If None, all splits are included.
max_instances_per_origin_split – Maximum number of instances to take from each split of each input MultiStream. If None, all instances of each included split are taken.
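The per-split cap can be sketched under the same assumption that a multistream is a dict of split name to instance list. The helper fixed_fusion below is hypothetical, not unitxt's implementation; it takes origins in their given order and relies on itertools.islice, whose stop argument of None means no limit, matching the None default of max_instances_per_origin_split.

```python
from itertools import islice
from typing import Dict, List, Optional

# Hypothetical stand-in for a MultiStream: split name -> list of instances.
MultiStreamLike = Dict[str, List[dict]]


def fixed_fusion(
    origins: Dict[str, MultiStreamLike],
    include_splits: Optional[List[str]] = None,
    max_instances_per_origin_split: Optional[int] = None,
) -> MultiStreamLike:
    """Hypothetical helper: fuse multistreams, capping each (origin, split) pair."""
    fused: MultiStreamLike = {}
    for stream in origins.values():
        for split, instances in stream.items():
            if include_splits is not None and split not in include_splits:
                continue
            # islice(..., None) takes everything, mirroring the None default.
            taken = islice(instances, max_instances_per_origin_split)
            fused.setdefault(split, []).extend(taken)
    return fused


a = {"train": [{"id": i} for i in range(5)], "test": [{"id": 10}]}
b = {"train": [{"id": 20}, {"id": 21}]}
capped = fixed_fusion({"a": a, "b": b}, max_instances_per_origin_split=3)
print({split: len(rows) for split, rows in capped.items()})
# {'train': 5, 'test': 1}
```

Origin a contributes only 3 of its 5 train instances, while b's 2 train instances fall under the cap and are all kept.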
- class unitxt.fusion.WeightedFusion(__tags__: Dict[str, str] = {}, data_classification_policy: List[str] = None, caching: bool = None, origins: Dict[str, SourceOperator] | List[SourceOperator] = None, include_splits: List[str] | None = None, weights: Dict[str, float | int] | List[int | float] = None, max_total_examples: int = None, ignore_origin_groups: List[str] = ['unitxt'])
Bases: BaseFusion
WeightedFusion operator that combines multiple MultiStreams, drawing instances from the origins according to their weights.
- Parameters:
origins – Dict of named SourceOperator objects (each yielding a MultiStream), or a list thereof
weights – Dict of weights keyed by origin name, or a list of weights in the same order as origins
max_total_examples – Total number of instances to return per split. If None, all instances are returned.
- ignore_origin_groups: List[str] = ['unitxt']
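One plausible reading of weighted fusion for a single split is weighted random sampling: repeatedly pick an origin with probability proportional to its weight, take its next instance, drop origins that run out, and stop at max_total_examples. The sketch below is a hypothetical plain-Python model of that scheme; weighted_fusion, its seed parameter, and the sample data are illustrative assumptions, not unitxt's API, and only the dict form of origins and weights is handled.

```python
import random
from typing import Dict, Iterator, List, Optional


def weighted_fusion(
    origins: Dict[str, List[dict]],
    weights: Dict[str, float],
    max_total_examples: Optional[int] = None,
    seed: int = 0,
) -> List[dict]:
    """Hypothetical model: interleave one split of several origins by weight."""
    rng = random.Random(seed)  # fixed seed keeps a given run reproducible
    iterators: Dict[str, Iterator[dict]] = {n: iter(rows) for n, rows in origins.items()}
    fused: List[dict] = []
    while iterators and (max_total_examples is None or len(fused) < max_total_examples):
        names = list(iterators)
        # Pick a remaining origin with probability proportional to its
        # (positive) weight; random.choices rejects an all-zero weight list.
        name = rng.choices(names, weights=[weights[n] for n in names])[0]
        try:
            fused.append(next(iterators[name]))
        except StopIteration:
            # Origin exhausted: drop it, so later picks renormalize over the rest.
            del iterators[name]
    return fused


train_a = [{"src": "a", "i": i} for i in range(100)]
train_b = [{"src": "b", "i": i} for i in range(100)]
mixed = weighted_fusion({"a": train_a, "b": train_b}, {"a": 3, "b": 1}, max_total_examples=40)
print(len(mixed))  # 40
```

Because each pick is random, the 3:1 weights yield roughly, not exactly, three instances from a for each one from b; the order of instances within each origin is preserved.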