unitxt.splitters module

class unitxt.splitters.DiverseLabelsSampler(*argv, **kwargs)

Bases: Sampler

Selects a balanced sample of instances based on an output field.

(Used for selecting demonstrations for in-context learning.)

The field must contain a list of values, e.g. ['dog'], ['cat'], ['dog', 'cat', 'cow']. The balancing is done so that each value or combination of values appears as equally as possible in the samples.

The choices param is required and determines which values should be considered.

Example

If choices is ['dog', 'cat'], then the following combinations will be considered: [''], ['cat'], ['dog'], ['dog', 'cat'].

If the instance contains a value not in the 'choices' param, that value is ignored. For example, if choices is ['dog', 'cat'] and the instance field is ['dog', 'cat', 'cow'], then 'cow' is ignored and the instance is considered as ['dog', 'cat'].

Parameters:
  • sample_size – number of samples to extract

  • choices – name of input field that contains the list of values to balance on

  • labels – name of output field with labels that must be balanced
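The balancing described for DiverseLabelsSampler can be illustrated with a small plain-Python sketch. This is not the unitxt implementation: the helper names `label_key` and `balanced_sample` are hypothetical, the instances are assumed to be plain dicts, and `choices` is passed directly as a list rather than read from an input field. It only shows the idea of dropping values outside `choices`, grouping instances by the remaining combination, and drawing from the groups in turn.

```python
import random
from collections import defaultdict


def label_key(labels, choices):
    """Reduce a label list to the combination of values found in choices."""
    # Values not listed in choices (e.g. 'cow') are dropped.
    return tuple(c for c in choices if c in labels)


def balanced_sample(instances, labels_field, choices, sample_size, seed=0):
    """Hypothetical helper: draw instances round-robin across label combinations."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for inst in instances:
        groups[label_key(inst[labels_field], choices)].append(inst)
    for members in groups.values():
        rng.shuffle(members)

    keys = sorted(groups)
    result = []
    # Take one instance per combination in turn until enough are collected
    # or every group is exhausted.
    while len(result) < sample_size and any(groups[k] for k in keys):
        for k in keys:
            if groups[k] and len(result) < sample_size:
                result.append(groups[k].pop())
    return result


instances = (
    [{"labels": ["dog"]} for _ in range(4)]
    + [{"labels": ["cat"]}]
    + [{"labels": ["dog", "cat", "cow"]}]
)
sample = balanced_sample(instances, "labels", ["dog", "cat"], sample_size=3)
sample_keys = sorted(label_key(s["labels"], ["dog", "cat"]) for s in sample)
```

With four 'dog' instances, one 'cat' instance, and one ['dog', 'cat', 'cow'] instance, a sample of size 3 takes one instance from each of the three combinations instead of being dominated by 'dog'.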

class unitxt.splitters.RandomSampler(*argv, **kwargs)

Bases: Sampler

class unitxt.splitters.RenameSplits(*argv, **kwargs)

Bases: Splitter

class unitxt.splitters.Sampler(*argv, **kwargs)

Bases: Artifact

random_generator: Random = <random.Random object>

class unitxt.splitters.SeparateSplit(*argv, **kwargs)

Bases: Splitter

Separates a split (e.g. train) into several splits (e.g. train1, train2).

sizes must indicate the size of every split except the last. If no size is given for the last split, it includes all the examples not allocated to any other split.
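The sizing rule above can be sketched in plain Python. This is an illustration only, not the unitxt operator: the function `separate` and its parameter names are hypothetical, and the input is assumed to be an in-memory list rather than a stream.

```python
def separate(items, names, sizes):
    """Hypothetical helper: cut items into consecutive named splits.

    sizes may omit the last split, which then receives every remaining item.
    """
    if len(sizes) not in (len(names) - 1, len(names)):
        raise ValueError("sizes must cover every split, or every split but the last")

    result = {}
    start = 0
    for i, name in enumerate(names):
        # A split without an explicit size runs to the end of the data.
        end = start + sizes[i] if i < len(sizes) else len(items)
        result[name] = items[start:end]
        start = end
    return result


splits = separate(list(range(10)), ["train1", "train2"], [3])
```

Here `train1` receives the first three items and `train2`, which has no explicit size, receives the remaining seven.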

class unitxt.splitters.SliceSplit(*argv, **kwargs)

Bases: Splitter

class unitxt.splitters.SplitRandomMix(*argv, **kwargs)

Bases: Splitter

class unitxt.splitters.Splitter(*argv, **kwargs)

Bases: MultiStreamOperator

class unitxt.splitters.SpreadSplit(*argv, **kwargs)

Bases: InstanceOperatorWithMultiStreamAccess