unitxt.fusion module
- class unitxt.fusion.BaseFusion(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, subsets: List[unitxt.operator.SourceOperator] | Dict[str, unitxt.operator.SourceOperator] = __required__, include_splits: List[str] | NoneType = None)
Bases:
SourceOperator
BaseFusion operator that combines multiple multistreams into one.
- Parameters:
subsets – A dict of named SourceOperator objects, or a list of them. Each operator is specified together with its input, so it can generate a MultiStream on its own.
include_splits – List of splits to include from each input MultiStream. If None, all splits are included.
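The two parameters above can be illustrated without unitxt itself. The following is a minimal sketch, not the library's implementation: it models a MultiStream as a plain dict mapping split names to lists of instances, and the helper names `named_subsets` and `splits_to_fuse` are hypothetical. Naming list-form subsets by their position is an assumption made for the illustration.

```python
from typing import Dict, List, Optional, Union

# Hypothetical stand-in: a "multistream" is modeled as {split_name: [instances]}.
MultiStreamLike = Dict[str, List[dict]]
Subsets = Union[List[MultiStreamLike], Dict[str, MultiStreamLike]]


def named_subsets(subsets: Subsets) -> Dict[str, MultiStreamLike]:
    """Return subsets as a dict so list and dict forms share one code path.

    Assumption: list-form subsets are named by their position ("0", "1", ...).
    """
    if isinstance(subsets, dict):
        return dict(subsets)
    return {str(i): ms for i, ms in enumerate(subsets)}


def splits_to_fuse(
    subsets: Subsets, include_splits: Optional[List[str]] = None
) -> List[str]:
    """Return the split names to emit, in first-seen order.

    With include_splits=None every split found in any subset is kept;
    otherwise only the listed splits are kept.
    """
    seen: List[str] = []
    for multistream in named_subsets(subsets).values():
        for split in multistream:
            if split not in seen:
                seen.append(split)
    if include_splits is None:
        return seen
    return [split for split in seen if split in include_splits]
```

For example, two subsets that expose `train`/`test` and `train`/`validation` respectively yield all three splits by default, and only `train` when `include_splits=["train"]` is passed.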
- class unitxt.fusion.FixedFusion(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, subsets: List[unitxt.operator.SourceOperator] | Dict[str, unitxt.operator.SourceOperator] = __required__, include_splits: List[str] | NoneType = None, max_instances_per_subset: int | NoneType = None)
Bases:
BaseFusion
FixedFusion operator that combines multiple multistreams into one, limiting the number of instances taken from each split of each input multistream.
- Parameters:
subsets – Dict of named SourceOperator objects (each to yield a MultiStream), or a list thereof.
include_splits – List of splits (stream names) to include, over all input multistreams. If None, all splits are included.
max_instances_per_subset – Maximum number of instances to take from each split of each input multistream. If None, all instances of each split (that is specified in include_splits) are included in the result.
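The per-subset limit described above can be sketched in plain Python. This is an illustration under stated assumptions, not unitxt's code: each multistream is modeled as a dict of split name to an iterable of instances, the function name `fixed_fusion_sketch` is hypothetical, and subsets are assumed to be concatenated one after another in their given order.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, Optional


def fixed_fusion_sketch(
    subsets: Dict[str, Dict[str, Iterable[dict]]],
    split: str,
    max_instances_per_subset: Optional[int] = None,
) -> Iterator[dict]:
    """Yield the fused stream for one split.

    Takes at most max_instances_per_subset instances from each subset's
    split; None means no limit. Subsets lacking the split are skipped.
    """
    for multistream in subsets.values():
        if split not in multistream:
            continue
        # islice with a stop of None iterates the whole stream.
        yield from islice(multistream[split], max_instances_per_subset)
```

With a limit of 2, a subset holding five `train` instances contributes only its first two, while a subset holding two contributes both.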
- class unitxt.fusion.WeightedFusion(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, subsets: Dict[str, unitxt.operator.SourceOperator] | List[unitxt.operator.SourceOperator] = None, include_splits: List[str] | NoneType = None, weights: Dict[str, float | int] | List[int | float] = None, max_total_samples: int = None)
Bases:
BaseFusion
Fusion operator that combines multiple MultiStreams.
- Parameters:
subsets – Dict of named SourceOperator objects (each to yield a MultiStream), or a list thereof.
weights – Dict of named weights for each origin, or a list thereof.
max_total_samples – Total number of instances to return per returned split. If None, all instances are returned.
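One plausible reading of weighted fusion is sketched below for a single split. It is an assumption, not a statement of how unitxt implements it: at each step a subset is picked at random with probability proportional to its weight, its next instance is emitted, and exhausted subsets are dropped. The function name `weighted_fusion_sketch` and the `seed` parameter are hypothetical, the cap follows the `max_total_samples` name used in the class signature, and all weights are assumed to be positive.

```python
import random
from typing import Any, Dict, Iterable, Iterator, Optional, Union


def weighted_fusion_sketch(
    subsets: Dict[str, Iterable[Any]],
    weights: Dict[str, Union[int, float]],
    max_total_samples: Optional[int] = None,
    seed: int = 0,
) -> Iterator[Any]:
    """Interleave instances from several streams of one split.

    Assumption: a subset is drawn with probability proportional to its
    (positive) weight; a drawn subset that is exhausted is removed.
    """
    rng = random.Random(seed)  # local generator keeps the sketch reproducible
    iterators = {name: iter(stream) for name, stream in subsets.items()}
    emitted = 0
    while iterators and (max_total_samples is None or emitted < max_total_samples):
        names = list(iterators)
        name = rng.choices(names, weights=[weights[n] for n in names])[0]
        try:
            instance = next(iterators[name])
        except StopIteration:
            # This subset has nothing left; stop sampling from it.
            del iterators[name]
            continue
        yield instance
        emitted += 1
```

Under this reading, leaving the cap at None returns every instance of every subset, only in a weight-biased order, while a cap stops the fused split once that many instances have been emitted.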