unitxt.splitters module
- class unitxt.splitters.DiverseLabelsSampler(*argv, **kwargs)
Bases:
Sampler
Selects a balanced sample of instances based on an output field.
(Used for selecting demonstrations for in-context learning.)
The field must contain a list of values, e.g. ['dog'], ['cat'], ['dog', 'cat', 'cow']. The balancing is done so that each value or combination of values appears as equally as possible in the sample.
The choices param is required and determines which values should be considered.
Example
If choices is ['dog', 'cat'], then the following combinations will be considered: [''], ['cat'], ['dog'], ['dog', 'cat'].
If the instance contains a value that is not in the choices param, that value is ignored. For example, if choices is ['dog', 'cat'] and the instance field is ['dog', 'cat', 'cow'], then 'cow' is ignored and the instance is considered as ['dog', 'cat'].
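The two rules above (enumerating the combinations, and ignoring values outside choices) can be illustrated with a minimal standalone sketch. It does not use unitxt itself; the helper names `label_combinations` and `restrict_to_choices` are hypothetical, the empty combination is represented here as an empty list, and the ordering of the output is a choice made for this example.

```python
from itertools import combinations


def label_combinations(choices):
    """Return every subset of `choices`, from the empty subset to the full set."""
    ordered = sorted(choices)
    return [
        list(combo)
        for size in range(len(ordered) + 1)
        for combo in combinations(ordered, size)
    ]


def restrict_to_choices(labels, choices):
    """Drop every label that is not in `choices`, keeping the original order."""
    allowed = set(choices)
    return [label for label in labels if label in allowed]


print(label_combinations(["dog", "cat"]))
# [[], ['cat'], ['dog'], ['cat', 'dog']]
print(restrict_to_choices(["dog", "cat", "cow"], ["dog", "cat"]))
# ['dog', 'cat']
```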
- Parameters:
sample_size - number of samples to extract
choices - name of input field that contains the list of values to balance on
labels - name of output field with labels that must be balanced
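One way to picture the balancing is a round-robin draw over groups of instances that share the same label combination. The sketch below is an assumption about how such balancing could work, not the actual DiverseLabelsSampler implementation. The function `balanced_sample` and its `seed` argument are hypothetical, and for simplicity it takes the list of choices directly rather than the name of a field that holds them.

```python
import random
from collections import defaultdict


def balanced_sample(instances, labels_field, choices, sample_size, seed=0):
    """Pick up to `sample_size` instances, cycling over label combinations."""
    rng = random.Random(seed)
    allowed = set(choices)

    # Group instances by their labels after discarding values outside `choices`.
    groups = defaultdict(list)
    for instance in instances:
        key = tuple(sorted(set(instance[labels_field]) & allowed))
        groups[key].append(instance)

    # Shuffle within each group so the picks are random but reproducible.
    for members in groups.values():
        rng.shuffle(members)

    # Take one instance per combination in turn until the sample is full
    # or every group is exhausted.
    keys = sorted(groups)
    sample = []
    while len(sample) < sample_size and any(groups[key] for key in keys):
        for key in keys:
            if groups[key] and len(sample) < sample_size:
                sample.append(groups[key].pop())
    return sample


pool = [{"labels": ["dog"]} for _ in range(10)]
pool += [{"labels": ["cat", "cow"]} for _ in range(2)]
picked = balanced_sample(pool, "labels", ["dog", "cat"], sample_size=4)
print([item["labels"] for item in picked])
```

With ten 'dog' instances and two 'cat' instances in the pool, a sample of four contains two of each, whereas uniform random sampling would mostly return 'dog'.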
- class unitxt.splitters.Sampler(*argv, **kwargs)
Bases:
Artifact
- random_generator: Random = <random.Random object>
- class unitxt.splitters.SeparateSplit(*argv, **kwargs)
Bases:
Splitter
Separates a split (e.g. train) into several splits (e.g. train1, train2).
- sizes must indicate the size of every split except the last. If no size is given for the last split,
it includes all the examples not allocated to any other split.
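The sizing rule can be sketched on a plain Python list. This is an illustration of the rule as described, not the SeparateSplit operator itself; the function `separate_split` and its `names` argument are hypothetical.

```python
def separate_split(examples, names, sizes):
    """Cut `examples` into consecutive named splits.

    `sizes` gives the size of each split in order. The size of the last split
    may be omitted, in which case it receives all remaining examples.
    """
    if len(sizes) not in (len(names) - 1, len(names)):
        raise ValueError("sizes must cover every split, or every split except the last")

    splits = {}
    start = 0
    # zip stops at the shorter sequence, so an unsized last split is skipped here.
    for name, size in zip(names, sizes):
        splits[name] = examples[start:start + size]
        start += size

    # No size given for the last split: it takes whatever is left.
    if len(sizes) == len(names) - 1:
        splits[names[-1]] = examples[start:]
    return splits


result = separate_split(list(range(10)), ["train1", "train2"], [3])
print(result)
# {'train1': [0, 1, 2], 'train2': [3, 4, 5, 6, 7, 8, 9]}
```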
- class unitxt.splitters.Splitter(*argv, **kwargs)
Bases:
MultiStreamOperator
- class unitxt.splitters.SpreadSplit(*argv, **kwargs)