unitxt.standard module

class unitxt.standard.AddDemosPool(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, demos_pool: List[Dict[str, Any]] = __required__, demos_pool_field_name: str = '_demos_pool_')

Bases: MultiStreamOperator
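No description accompanies this class, but its parameters suggest that it attaches a fixed, user-supplied demos_pool to instances under demos_pool_field_name. The sketch below illustrates that assumed behavior in plain Python. It does not use the unitxt API: a multi-stream is modeled as a dict from stream names to lists of instance dicts, and the function add_demos_pool is hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical model: a multi-stream maps stream names ("train", "test", ...)
# to lists of instance dicts. This is not unitxt's MultiStream class.
MultiStream = Dict[str, List[Dict[str, Any]]]


def add_demos_pool(
    multi_stream: MultiStream,
    demos_pool: List[Dict[str, Any]],
    demos_pool_field_name: str = "_demos_pool_",
) -> MultiStream:
    """Attach the same fixed demos pool to every instance of every stream."""
    result: MultiStream = {}
    for stream_name, instances in multi_stream.items():
        # Build new dicts so the caller's instances are left untouched.
        result[stream_name] = [
            {**instance, demos_pool_field_name: demos_pool} for instance in instances
        ]
    return result


streams = {"train": [{"x": 1}], "test": [{"x": 2}, {"x": 3}]}
with_pool = add_demos_pool(streams, demos_pool=[{"x": 10}, {"x": 11}])
```

Every instance in with_pool carries the two pool entries under "_demos_pool_", where a later step can sample demonstrations from them.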

class unitxt.standard.CreateDemosPool(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, from_stream: str = None, demos_pool_size: int = None, demos_removed_from_data: bool = None, to_field: str = '_demos_pool_')

Bases: MultiStreamOperator
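Judging from its parameters, CreateDemosPool builds the pool from an existing stream rather than from a user-supplied list: it takes demos_pool_size instances from from_stream, optionally removes them from that stream, and stores the pool in to_field. The following plain-Python sketch illustrates that reading. It is an assumption, not unitxt's implementation: the function name is hypothetical, it simply takes the first demos_pool_size instances, and it treats -1 as "the whole stream", mirroring the DatasetRecipe description of demos_pool_size.

```python
from typing import Any, Dict, List

# Hypothetical model: stream name -> list of instance dicts.
MultiStream = Dict[str, List[Dict[str, Any]]]


def create_demos_pool(
    multi_stream: MultiStream,
    from_stream: str,
    demos_pool_size: int,
    demos_removed_from_data: bool = True,
    to_field: str = "_demos_pool_",
) -> MultiStream:
    """Build a demos pool from one stream and attach it to every instance."""
    source = multi_stream[from_stream]
    # -1 means "take the whole source stream".
    size = len(source) if demos_pool_size == -1 else demos_pool_size
    if size < 0 or size > len(source):
        raise ValueError(
            f"Cannot take {demos_pool_size} demos from {len(source)} instances"
        )
    # This sketch takes the first `size` instances; the real operator may select differently.
    pool = source[:size]
    result: MultiStream = {}
    for name, instances in multi_stream.items():
        if name == from_stream and demos_removed_from_data:
            # Drop pooled instances so they are not also used as regular data.
            instances = instances[size:]
        result[name] = [{**instance, to_field: pool} for instance in instances]
    return result
```

Removing the pooled instances keeps a demonstration from also appearing as an ordinary training example, which is presumably why demos_removed_from_data exists.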

class unitxt.standard.DatasetRecipe(data_classification_policy: List[str] = None, max_steps: int | NoneType = None, steps: List[unitxt.operator.StreamingOperator] = [], _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, card: unitxt.card.TaskCard = None, task: unitxt.task.Task = None, template: unitxt.templates.Template | List[unitxt.templates.Template] | unitxt.templates.TemplatesList = None, system_prompt: unitxt.system_prompts.SystemPrompt = None, format: unitxt.formats.Format = None, serializer: unitxt.serializers.SingleTypeSerializer | List[unitxt.serializers.SingleTypeSerializer] = None, template_card_index: int = None, metrics: List[str] = None, postprocessors: List[str] = None, group_by: List[str | List[str]] = [], loader_limit: int = None, max_train_instances: int = None, max_validation_instances: int = None, max_test_instances: int = None, train_refiner: unitxt.operators.StreamRefiner = None, validation_refiner: unitxt.operators.StreamRefiner = None, test_refiner: unitxt.operators.StreamRefiner = None, demos_pool_size: int = None, demos_pool: List[Dict[str, Any]] = None, num_demos: int | List[int] | NoneType = 0, demos_removed_from_data: bool = True, demos_pool_field_name: str = '_demos_pool_', demos_taken_from: str = 'train', demos_field: str = 'demos', sampler: unitxt.splitters.Sampler = None, skip_demoed_instances: bool = False, augmentor: unitxt.augmentors.Augmentor | List[unitxt.augmentors.Augmentor] = None)

Bases: SourceSequentialOperator

This class represents a standard recipe for data processing and preparation.

Use this class to prepare a recipe with all the necessary steps, refiners, and renderers included. It lets you set the various parameters and steps that are then arranged sequentially to form the recipe.
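DatasetRecipe is a SourceSequentialOperator: a source step produces the data, and each subsequent step transforms the output of the previous one. The plain-Python sketch below illustrates only that sequential pattern. The functions run_recipe, load, limit, and upper_case are hypothetical stand-ins, not unitxt operators; limit merely echoes the idea behind loader_limit.

```python
from typing import Any, Callable, Dict, List

# Hypothetical model: stream name -> list of instance dicts.
MultiStream = Dict[str, List[Dict[str, Any]]]
Step = Callable[[MultiStream], MultiStream]


def run_recipe(source: Callable[[], MultiStream], steps: List[Step]) -> MultiStream:
    """Produce data with `source`, then apply each step in order."""
    multi_stream = source()
    for step in steps:
        multi_stream = step(multi_stream)
    return multi_stream


def load() -> MultiStream:
    # Stand-in for a loader that yields a train and a test stream.
    return {"train": [{"text": "a"}, {"text": "b"}, {"text": "c"}], "test": [{"text": "d"}]}


def limit(n: int) -> Step:
    # Keep at most n instances per stream (similar in spirit to loader_limit).
    return lambda ms: {name: insts[:n] for name, insts in ms.items()}


def upper_case(ms: MultiStream) -> MultiStream:
    # A toy transformation step applied to every instance.
    return {name: [{**i, "text": i["text"].upper()} for i in insts] for name, insts in ms.items()}


result = run_recipe(load, [limit(2), upper_case])
```

Because each step sees only the output of the one before it, the order of steps matters: swapping limit and upper_case here gives the same result, but a step that samples demos must run after the step that creates the demos pool.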

Parameters:
  • card (TaskCard) – TaskCard object associated with the recipe.

  • template (Template, List[Template], or TemplatesList, optional) – Template, or templates, to be used for the recipe.

  • system_prompt (SystemPrompt, optional) – SystemPrompt object to be used for the recipe.

  • loader_limit (int, optional) – Maximum number of instances per stream to be returned from the loader (used to reduce loading time for large datasets).

  • format (Format, optional) – Format object (for example, a SystemFormat) to be used for the recipe.

  • metrics (List[str]) – List of catalog metrics to use with this recipe.

  • postprocessors (List[str]) – List of catalog postprocessors to apply during post-processing. Setting them here is not recommended.

  • group_by (List[Union[str, List[str]]]) – List of task_data or metadata keys by which to group global scores.

  • train_refiner (StreamRefiner, optional) – Train refiner to be used in the recipe.

  • max_train_instances (int, optional) – Maximum training instances for the refiner.

  • validation_refiner (StreamRefiner, optional) – Validation refiner to be used in the recipe.

  • max_validation_instances (int, optional) – Maximum validation instances for the refiner.

  • test_refiner (StreamRefiner, optional) – Test refiner to be used in the recipe.

  • max_test_instances (int, optional) – Maximum test instances for the refiner.

  • demos_pool_size (int, optional) – Size of the demos pool. Use -1 to take the whole of the stream named by demos_taken_from.

  • demos_pool (List[Dict[str, Any]], optional) – A list of instances that make up the demos pool.

  • num_demos (int or List[int], optional) – Number of demos to add to each instance; the demos become part of the source generated for that instance.

  • demos_taken_from (str, optional) – The stream from which the demos are taken. Default is “train”.

  • demos_field (str, optional) – Field name for demos. Default is “demos”. The num_demos demos selected for an instance are stored in this field of that instance.

  • demos_pool_field_name (str, optional) – Name of the field that holds the demos pool until demos are sampled from it. Defaults to constants.demos_pool_field ('_demos_pool_').

  • demos_removed_from_data (bool, optional) – Whether to remove the instances taken into the demos pool from the source data. Default is True.

  • sampler (Sampler, optional) – The Sampler used to select the demonstrations when num_demos > 0.

  • skip_demoed_instances (bool, optional) – Whether to skip adding demos to an instance whose demos_field is already populated. Defaults to False.

  • steps (List[StreamingOperator], optional) – List of StreamingOperator objects to be used in the recipe.

  • augmentor (Augmentor or List[Augmentor], optional) – Augmentor, or augmentors, used to pseudo-randomly augment the source text.

  • template_card_index (int, optional) – Index of the template, among the card's templates, to be used for preparing the recipe.
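The demo-related parameters above work together: instances carry a pool in demos_pool_field_name, num_demos demos are drawn from it, and the result is stored in demos_field. The plain-Python sketch below illustrates that flow under stated assumptions. The function add_demos is hypothetical, random.Random.sample stands in for whatever Sampler the recipe uses, and this sketch drops the pool field once demos are drawn and never uses an instance as its own demo; the real operators may behave differently.

```python
import random
from typing import Any, Dict, List


def add_demos(
    instances: List[Dict[str, Any]],
    num_demos: int,
    demos_pool_field_name: str = "_demos_pool_",
    demos_field: str = "demos",
    skip_demoed_instances: bool = False,
    seed: int = 42,
) -> List[Dict[str, Any]]:
    """Sample num_demos demos for each instance from the pool it carries."""
    rng = random.Random(seed)  # seeded so the sampling is reproducible
    result = []
    for instance in instances:
        new_instance = dict(instance)  # shallow copy; the input is not mutated
        pool = new_instance.pop(demos_pool_field_name)
        if skip_demoed_instances and new_instance.get(demos_field):
            # Demos were already provided for this instance; keep them.
            result.append(new_instance)
            continue
        # Exclude the instance itself from its own candidate demos.
        candidates = [demo for demo in pool if demo != new_instance]
        if num_demos > len(candidates):
            raise ValueError(f"Need {num_demos} demos, pool has {len(candidates)}")
        new_instance[demos_field] = rng.sample(candidates, num_demos)
        result.append(new_instance)
    return result
```

Seeding the random generator keeps the selected demos stable across runs, which matters when evaluation results need to be reproducible.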

prepare()

Overrides the base method to prepare the recipe by arranging all the steps, refiners, and renderers in sequence.

Raises:

AssertionError – If both template and template_card_index are specified at the same time.
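The documented AssertionError reflects that template and template_card_index are two mutually exclusive ways of choosing a template. The sketch below illustrates that check. The function resolve_template is hypothetical: templates are modeled as plain strings, and raising ValueError when neither argument is given is this sketch's own choice, not documented behavior.

```python
from typing import List, Optional


def resolve_template(
    card_templates: List[str],
    template: Optional[str] = None,
    template_card_index: Optional[int] = None,
) -> str:
    """Pick a template either directly or by index into the card's templates."""
    if template is not None and template_card_index is not None:
        # Mirrors the documented AssertionError for specifying both.
        raise AssertionError("Specify either template or template_card_index, not both.")
    if template is not None:
        return template
    if template_card_index is not None:
        return card_templates[template_card_index]
    # Sketch-only fallback; the real recipe's behavior here is not documented above.
    raise ValueError("No template specified.")


chosen = resolve_template(["Q: {question}\nA:", "{question}"], template_card_index=1)
```

Here chosen is "{question}", the second of the card's templates.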

group_by: List[str | List[str]] = []
property has_card_templates
property has_custom_demos_pool
property has_no_templates
property max_demos_size
produce(task_instances)

Uses the recipe in production to produce model-ready queries from standard task instances.
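Conceptually, producing a query means rendering each task instance's fields through the chosen template and format. The sketch below shows only that rendering idea with str.format. The function produce_queries and its instruction argument are hypothetical, and it returns plain strings, whereas the return type of the real produce method is not documented here.

```python
from typing import Any, Dict, List


def produce_queries(
    task_instances: List[Dict[str, Any]],
    input_template: str,
    instruction: str = "",
) -> List[str]:
    """Render each task instance into a model-ready query string."""
    queries = []
    for instance in task_instances:
        # Fill the template placeholders with the instance's task fields.
        body = input_template.format(**instance)
        # Prepend an optional instruction line, loosely like a system prompt.
        queries.append(f"{instruction}\n{body}" if instruction else body)
    return queries


queries = produce_queries([{"question": "What is 2+2?"}], "Question: {question}\nAnswer:")
```

queries holds the single string "Question: What is 2+2?\nAnswer:", ready to be sent to a model.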

property use_demos
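The properties max_demos_size, use_demos, and has_custom_demos_pool are listed without descriptions. The dataclass below offers one plausible reading based on their names and on num_demos accepting an int, a list of ints, or None. DemosConfig is hypothetical, and these definitions are assumptions rather than unitxt's actual code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union


@dataclass
class DemosConfig:
    """Hypothetical subset of DatasetRecipe fields for illustrating the properties."""

    num_demos: Union[int, List[int], None] = 0
    demos_pool: Optional[List[Dict[str, Any]]] = None

    @property
    def max_demos_size(self) -> Optional[int]:
        # With a list of candidate sizes, the largest one bounds the demos needed.
        if isinstance(self.num_demos, list):
            return max(self.num_demos)
        return self.num_demos

    @property
    def use_demos(self) -> bool:
        # Demos are used only if at least one demo may be requested.
        size = self.max_demos_size
        return size is not None and size > 0

    @property
    def has_custom_demos_pool(self) -> bool:
        # A user-supplied, non-empty demos_pool counts as a custom pool.
        return self.demos_pool is not None and len(self.demos_pool) > 0
```

Under this reading, DemosConfig(num_demos=[0, 3, 5]) would report a max_demos_size of 5 and use_demos as True, while the default num_demos=0 disables demos.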