unitxt.standard module

class unitxt.standard.BaseRecipe(data_classification_policy: List[str] = None, max_steps: int | NoneType = None, steps: List[unitxt.operator.StreamingOperator] = [], _requirements_list: List[str] | Dict[str, str] = [], caching: bool = None, card: unitxt.card.TaskCard = None, task: unitxt.task.Task = None, template: unitxt.templates.Template | List[unitxt.templates.Template] | unitxt.templates.TemplatesList = None, system_prompt: unitxt.system_prompts.SystemPrompt = None, format: unitxt.formats.Format = None, serializer: unitxt.serializers.SingleTypeSerializer | List[unitxt.serializers.SingleTypeSerializer] = None, template_card_index: int = None, metrics: List[str] = None, postprocessors: List[str] = None, group_by: List[str | List[str]] = [], loader_limit: int = None, max_train_instances: int = None, max_validation_instances: int = None, max_test_instances: int = None, train_refiner: unitxt.operators.StreamRefiner = None, validation_refiner: unitxt.operators.StreamRefiner = None, test_refiner: unitxt.operators.StreamRefiner = None, demos_pool_size: int = None, num_demos: int | List[int] | NoneType = 0, demos_removed_from_data: bool = True, demos_pool_name: str = 'demos_pool', demos_taken_from: str = 'train', demos_field: str = 'demos', sampler: unitxt.splitters.Sampler = None, augmentor: unitxt.augmentors.Augmentor | List[unitxt.augmentors.Augmentor] = None)[source]

Bases: Recipe, SourceSequentialOperator

group_by: List[str | List[str]] = []
property has_custom_demos_pool
property max_demos_size
property use_demos
class unitxt.standard.CreateDemosPool(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], caching: bool = None, from_split: str = __required__, to_split_names: List[str] = __required__, to_split_sizes: List[int] = __required__, remove_targets_from_source_split: bool = True)[source]

Bases: SeparateSplit

class unitxt.standard.StandardRecipe(data_classification_policy: List[str] = None, max_steps: int | NoneType = None, steps: List[unitxt.operator.StreamingOperator] = [], _requirements_list: List[str] | Dict[str, str] = [], caching: bool = None, card: unitxt.card.TaskCard = None, task: unitxt.task.Task = None, template: unitxt.templates.Template | List[unitxt.templates.Template] | unitxt.templates.TemplatesList = None, system_prompt: unitxt.system_prompts.SystemPrompt = None, format: unitxt.formats.Format = None, serializer: unitxt.serializers.SingleTypeSerializer | List[unitxt.serializers.SingleTypeSerializer] = None, template_card_index: int = None, metrics: List[str] = None, postprocessors: List[str] = None, group_by: List[str | List[str]] = [], loader_limit: int = None, max_train_instances: int = None, max_validation_instances: int = None, max_test_instances: int = None, train_refiner: unitxt.operators.StreamRefiner = None, validation_refiner: unitxt.operators.StreamRefiner = None, test_refiner: unitxt.operators.StreamRefiner = None, demos_pool_size: int = None, num_demos: int | List[int] | NoneType = 0, demos_removed_from_data: bool = True, demos_pool_name: str = 'demos_pool', demos_taken_from: str = 'train', demos_field: str = 'demos', sampler: unitxt.splitters.Sampler = None, augmentor: unitxt.augmentors.Augmentor | List[unitxt.augmentors.Augmentor] = None)[source]

Bases: StandardRecipeWithIndexes

This class represents a standard recipe for data processing and preparation.

This class can be used to prepare a recipe with all necessary steps, refiners, and renderers included. Its parameters and steps are applied sequentially when the recipe is prepared.

card

TaskCard object associated with the recipe.

Type:

TaskCard

template

Template object to be used for the recipe.

Type:

Template, optional

system_prompt

SystemPrompt object to be used for the recipe.

Type:

SystemPrompt, optional

loader_limit

Maximum number of instances per stream to be returned from the loader. Used to reduce loading time for large datasets.

Type:

int, optional
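The effect of a per-stream limit can be pictured with a small plain-Python sketch. This is not unitxt's loader code: `limit_streams` and the sample streams are hypothetical names used only for illustration, and the sketch assumes the limit is applied independently to each stream.

```python
from itertools import islice
from typing import Dict, Iterable, List, Optional


def limit_streams(
    streams: Dict[str, Iterable[dict]], loader_limit: Optional[int]
) -> Dict[str, List[dict]]:
    """Hypothetical helper: keep at most `loader_limit` instances per stream."""
    if loader_limit is None:
        # No limit set: consume every stream in full.
        return {name: list(stream) for name, stream in streams.items()}
    # islice stops reading each stream once the limit is reached.
    return {name: list(islice(stream, loader_limit)) for name, stream in streams.items()}


# Illustrative streams (assumed data, not from a real dataset).
streams = {
    "train": ({"id": i} for i in range(1000)),
    "test": ({"id": i} for i in range(5)),
}
limited = limit_streams(streams, loader_limit=10)
```

Streams shorter than the limit are returned whole, so only the large stream is truncated here.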

format

SystemFormat object to be used for the recipe.

Type:

SystemFormat, optional

metrics

List of catalog metrics to use with this recipe.

Type:

List[str]

postprocessors

List of catalog processors to apply at post-processing. Setting them here is not recommended.

Type:

List[str]

group_by

List of task_data or metadata keys to group global scores by.

Type:

List[Union[str, List[str]]]
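As a rough illustration of grouping global scores by keys, the sketch below averages a per-instance score within each group. It is plain Python, not unitxt's scoring code: `grouped_mean_scores`, the `score` field, and the sample instances are hypothetical, and it assumes that a nested list of keys means grouping by the combination of those keys.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def grouped_mean_scores(instances: List[dict], keys: List[str]) -> Dict[Tuple, float]:
    """Hypothetical helper: mean score per combination of task_data values."""
    groups = defaultdict(list)
    for instance in instances:
        # The group is identified by the values of all requested keys.
        group_key = tuple(instance["task_data"][key] for key in keys)
        groups[group_key].append(instance["score"])
    return {group_key: mean(scores) for group_key, scores in groups.items()}


# Illustrative instances (assumed data).
instances = [
    {"task_data": {"lang": "en", "domain": "news"}, "score": 1.0},
    {"task_data": {"lang": "en", "domain": "news"}, "score": 0.0},
    {"task_data": {"lang": "fr", "domain": "news"}, "score": 1.0},
]
by_lang = grouped_mean_scores(instances, ["lang"])
by_lang_domain = grouped_mean_scores(instances, ["lang", "domain"])
```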

train_refiner

Train refiner to be used in the recipe.

Type:

StreamRefiner, optional

max_train_instances

Maximum training instances for the refiner.

Type:

int, optional

validation_refiner

Validation refiner to be used in the recipe.

Type:

StreamRefiner, optional

max_validation_instances

Maximum validation instances for the refiner.

Type:

int, optional

test_refiner

Test refiner to be used in the recipe.

Type:

StreamRefiner, optional

max_test_instances

Maximum test instances for the refiner.

Type:

int, optional

demos_pool_size

Size of the demos pool.

Type:

int, optional

num_demos

Number of demos to be used.

Type:

int or List[int], optional

demos_pool_name

Name of the demos pool. Default is “demos_pool”.

Type:

str, optional

demos_taken_from

Specifies from where the demos are taken. Default is “train”.

Type:

str, optional

demos_field

Field name for demos. Default is “demos”.

Type:

str, optional

demos_removed_from_data

Whether to remove the demos from the source data. Default is True.

Type:

bool, optional

sampler

The Sampler used to select the demonstrations when num_demos > 0.

Type:

Sampler, optional
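How the demo-related parameters fit together can be sketched in plain Python. This is an illustration under stated assumptions, not unitxt's implementation: `split_demos_pool` and `attach_demos` are hypothetical helpers, the pool is assumed to be taken from the head of the source split, and uniform random sampling stands in for whatever Sampler is configured.

```python
import random
from typing import List, Tuple


def split_demos_pool(
    source: List[dict], demos_pool_size: int, demos_removed_from_data: bool = True
) -> Tuple[List[dict], List[dict]]:
    """Hypothetical helper: carve a demos pool out of the source split."""
    pool = source[:demos_pool_size]
    # Optionally drop the pooled instances from the data that remains.
    remaining = source[demos_pool_size:] if demos_removed_from_data else list(source)
    return pool, remaining


def attach_demos(
    instances: List[dict],
    pool: List[dict],
    num_demos: int,
    demos_field: str = "demos",
    seed: int = 0,
) -> List[dict]:
    """Hypothetical helper: add `num_demos` sampled demos to each instance."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [{**inst, demos_field: rng.sample(pool, num_demos)} for inst in instances]


# Illustrative "train" split (assumed data).
train = [{"id": i} for i in range(10)]
pool, remaining = split_demos_pool(train, demos_pool_size=4)
with_demos = attach_demos(remaining, pool, num_demos=2)
```

With `demos_removed_from_data=True`, no instance in `remaining` can appear among its own demos, which is why removal is the safer default when demos come from the same split as the data.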

steps

List of StreamingOperator objects to be used in the recipe.

Type:

List[StreamingOperator], optional

augmentor

Augmentor to be used to pseudo-randomly augment the source text.

Type:

Augmentor

instruction_card_index

Index of instruction card to be used for preparing the recipe.

Type:

int, optional

template_card_index

Index of template card to be used for preparing the recipe.

Type:

int, optional

prepare()[source]

This overridden method is used for preparing the recipe by arranging all the steps, refiners, and renderers in a sequential manner.

Raises:

AssertionError – If both template and template_card_index are specified at the same time.
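The mutual-exclusion rule behind this error can be sketched as a standalone check. The function name `check_template_args` and the value `"templates.example"` are hypothetical; this shows only the shape of the condition, not the code inside `prepare()`.

```python
from typing import Optional


def check_template_args(
    template: Optional[object], template_card_index: Optional[int]
) -> None:
    """Hypothetical check: at most one of the two arguments may be set."""
    assert template is None or template_card_index is None, (
        "Specify either template or template_card_index, not both."
    )


# Only one of the two is set: the check passes silently.
check_template_args(template="templates.example", template_card_index=None)

# Both are set: the check raises AssertionError.
try:
    check_template_args(template="templates.example", template_card_index=0)
    raised = False
except AssertionError:
    raised = True
```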

class unitxt.standard.StandardRecipeWithIndexes(data_classification_policy: List[str] = None, max_steps: int | NoneType = None, steps: List[unitxt.operator.StreamingOperator] = [], _requirements_list: List[str] | Dict[str, str] = [], caching: bool = None, card: unitxt.card.TaskCard = None, task: unitxt.task.Task = None, template: unitxt.templates.Template | List[unitxt.templates.Template] | unitxt.templates.TemplatesList = None, system_prompt: unitxt.system_prompts.SystemPrompt = None, format: unitxt.formats.Format = None, serializer: unitxt.serializers.SingleTypeSerializer | List[unitxt.serializers.SingleTypeSerializer] = None, template_card_index: int = None, metrics: List[str] = None, postprocessors: List[str] = None, group_by: List[str | List[str]] = [], loader_limit: int = None, max_train_instances: int = None, max_validation_instances: int = None, max_test_instances: int = None, train_refiner: unitxt.operators.StreamRefiner = None, validation_refiner: unitxt.operators.StreamRefiner = None, test_refiner: unitxt.operators.StreamRefiner = None, demos_pool_size: int = None, num_demos: int | List[int] | NoneType = 0, demos_removed_from_data: bool = True, demos_pool_name: str = 'demos_pool', demos_taken_from: str = 'train', demos_field: str = 'demos', sampler: unitxt.splitters.Sampler = None, augmentor: unitxt.augmentors.Augmentor | List[unitxt.augmentors.Augmentor] = None)[source]

Bases: BaseRecipe