unitxt.fusion module

class unitxt.fusion.BaseFusion(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, subsets: List[unitxt.operator.SourceOperator] | Dict[str, unitxt.operator.SourceOperator] = __required__, include_splits: List[str] | NoneType = None)

Bases: SourceOperator

BaseFusion operator that combines multiple multistreams into one.

Parameters:
  • subsets – Dict of named SourceOperator objects, or a list thereof. Each SourceOperator is specified together with its own input, so it can generate a MultiStream by itself.

  • include_splits – List of splits to include from each input MultiStream. If None, all splits are included.
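The roles of subsets and include_splits can be illustrated with a minimal pure-Python sketch. It is not the unitxt implementation: a MultiStream is modeled as a plain dict mapping split names to lists of instances, each subset is a zero-argument callable that produces one, and the function name fuse and the concatenation order are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a MultiStream: split name -> list of instances.
MultiStreamLike = Dict[str, List[dict]]


def fuse(
    subsets: Dict[str, Callable[[], MultiStreamLike]],
    include_splits: Optional[List[str]] = None,
) -> MultiStreamLike:
    """Combine the selected splits of every subset into one dict of splits."""
    fused: MultiStreamLike = {}
    for name, make_multistream in subsets.items():
        # Each subset generates its own multistream when called.
        for split, instances in make_multistream().items():
            # include_splits=None means "keep every split".
            if include_splits is not None and split not in include_splits:
                continue
            fused.setdefault(split, []).extend(instances)
    return fused


subsets = {
    "a": lambda: {"train": [{"x": 1}], "test": [{"x": 2}]},
    "b": lambda: {"train": [{"x": 3}]},
}
all_splits = fuse(subsets)
train_only = fuse(subsets, include_splits=["train"])
```

With include_splits=["train"], the "test" split of subset "a" is dropped, and the fused "train" split holds the instances of both subsets.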

class unitxt.fusion.FixedFusion(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, subsets: List[unitxt.operator.SourceOperator] | Dict[str, unitxt.operator.SourceOperator] = __required__, include_splits: List[str] | NoneType = None, max_instances_per_subset: int | NoneType = None)

Bases: BaseFusion

FixedFusion operator that combines multiple multistreams into one, limiting the number of instances taken from each split of each input multistream.

Parameters:
  • subsets – Dict of named SourceOperator objects (each to yield a MultiStream), or a list thereof

  • include_splits – List of splits (stream names) to include, over all input multistreams. If None, all splits are included.

  • max_instances_per_subset – Maximum number of instances to take from each included split of each input multistream. If None, all instances of each included split are kept in the result.
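The per-split cap described above can be sketched in plain Python with itertools.islice. This is an illustration under stated assumptions, not the library's code: multistreams are modeled as dicts of split name to iterable, and fixed_fuse is a hypothetical helper name.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, List, Optional


def fixed_fuse(
    subsets: Dict[str, Callable[[], Dict[str, Iterable[dict]]]],
    include_splits: Optional[List[str]] = None,
    max_instances_per_subset: Optional[int] = None,
) -> Dict[str, List[dict]]:
    """Take at most max_instances_per_subset instances from each split of each subset."""
    fused: Dict[str, List[dict]] = {}
    for name, make_multistream in subsets.items():
        for split, stream in make_multistream().items():
            if include_splits is not None and split not in include_splits:
                continue
            # islice(stream, None) consumes the whole stream, so None means "no cap".
            fused.setdefault(split, []).extend(islice(stream, max_instances_per_subset))
    return fused


subsets = {
    "a": lambda: {"train": ({"src": "a", "i": i} for i in range(5))},
    "b": lambda: {"train": ({"src": "b", "i": i} for i in range(3))},
}
capped = fixed_fuse(subsets, max_instances_per_subset=2)
uncapped = fixed_fuse(subsets)
```

Note that the cap applies per split of each subset, so the fused "train" split above holds two instances from "a" and two from "b", not two in total.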

class unitxt.fusion.WeightedFusion(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, subsets: Dict[str, unitxt.operator.SourceOperator] | List[unitxt.operator.SourceOperator] = None, include_splits: List[str] | NoneType = None, weights: Dict[str, float | int] | List[int | float] = None, max_total_samples: int = None)

Bases: BaseFusion

WeightedFusion operator that combines multiple MultiStreams into one, using a weight for each subset.

Parameters:
  • subsets – Dict of named SourceOperator objects (each to yield a MultiStream), or a list thereof

  • weights – Dict mapping each subset name to its weight, or a list of weights when subsets is given as a list

  • max_total_samples – Total number of instances to return per returned split. If None, all instances are returned.
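One plausible reading of these parameters is weighted random interleaving: at each step a subset is drawn with probability proportional to its weight, and one instance is taken from it. The sketch below shows that idea for a single split in plain Python. The sampling strategy, the handling of exhausted subsets, the seed argument, and the name weighted_fuse_split are all assumptions made for illustration, not a statement of how unitxt implements it.

```python
import random
from typing import Dict, Iterable, List, Optional


def weighted_fuse_split(
    streams: Dict[str, Iterable[dict]],
    weights: Dict[str, float],
    max_total_samples: Optional[int] = None,
    seed: int = 0,
) -> List[dict]:
    """Interleave one split of several subsets, sampling subsets by weight."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    iterators = {name: iter(stream) for name, stream in streams.items()}
    fused: List[dict] = []
    while iterators and (max_total_samples is None or len(fused) < max_total_samples):
        names = list(iterators)
        # Draw one subset name with probability proportional to its weight.
        chosen = rng.choices(names, weights=[weights[n] for n in names])[0]
        try:
            fused.append(next(iterators[chosen]))
        except StopIteration:
            # Assumption: an exhausted subset is dropped and sampling continues.
            del iterators[chosen]
    return fused


streams = {
    "a": [{"src": "a"} for _ in range(10)],
    "b": [{"src": "b"} for _ in range(10)],
}
weights = {"a": 3, "b": 1}
limited = weighted_fuse_split(streams, weights, max_total_samples=8)
everything = weighted_fuse_split(streams, weights)
```

With max_total_samples=None the loop runs until every subset is exhausted, so all instances are returned and the weights only affect their order.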