unitxt.templates module

class unitxt.templates.ApplyRandomTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, demos_field: str | NoneType = None, templates: List[unitxt.templates.Template] = __required__)[source]¶: Bases: ApplyTemplate

class unitxt.templates.ApplySingleTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, demos_field: str | NoneType = None, template: unitxt.templates.Template = __required__)[source]¶: Bases: ApplyTemplate

class unitxt.templates.ApplyTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, demos_field: str | NoneType = None)[source]¶: Bases: InstanceOperator

class unitxt.templates.DialogFieldsData(data_classification_policy: List[str] = None, user_role_label: str = __required__, assistant_role_label: str = __required__, system_role_label: str = __required__, dialog_field: str = __required__)[source]¶: Bases: Artifact

class unitxt.templates.DialogPairwiseChoiceTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_string_stripped'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None, output_format: str = None, input_format: str = __required__, choice_a_field: str = __required__, choice_b_field: str = __required__, answer_field: str = __required__, choice_a_label: str = __required__, choice_b_label: str = __required__, choice_tie_label: str = __required__, shuffle: bool = __required__, dialog_fields: List[unitxt.templates.DialogFieldsData] = __required__, turns_separator: str = '\n\n', label_separator: str = ' ')[source]¶: Bases: DialogTemplate, PairwiseChoiceTemplate

class unitxt.templates.DialogTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_string_stripped'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None, output_format: str = None, input_format: str = __required__, dialog_fields: List[unitxt.templates.DialogFieldsData] = __required__, turns_separator: str = '\n\n', label_separator: str = ' ')[source]¶: Bases: InputOutputTemplate

class unitxt.templates.InputFormatTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_string_stripped'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None, input_format: str = __required__)[source]¶: Bases: Template

class unitxt.templates.InputOutputTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_string_stripped'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None, output_format: str = None, input_format: str = __required__)[source]¶

Bases: InputFormatTemplate, OutputFormatTemplate

Generate field ‘source’ from fields designated as input, and fields ‘target’ and ‘references’ from fields designated as output, of the processed instance.

Args specify the formatting strings with which to glue together the input and reference fields of the processed instance into one string (‘source’ and ‘target’), and into a list of strings (‘references’).
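The gluing described above can be illustrated with plain Python format strings. This is a minimal sketch and not the library's implementation: the helper name render_input_output and the sample field names are hypothetical, and it assumes a single gold answer so that the references list holds only the target.

```python
def render_input_output(input_fields, reference_fields, input_format, output_format):
    """Hypothetical helper: glue instance fields into source, target, references."""
    # Input fields fill the placeholders of input_format to form the prompt.
    source = input_format.format(**input_fields)
    # Output (reference) fields fill output_format to form the expected answer.
    target = output_format.format(**reference_fields)
    # Assumption: one gold answer, so references is a one-element list.
    return {"source": source, "target": target, "references": [target]}


rendered = render_input_output(
    input_fields={"question": "What is 2 + 2?"},
    reference_fields={"answer": "4"},
    input_format="Question: {question}",
    output_format="{answer}",
)
print(rendered)
```

In this sketch the placeholder names in each format string must match the keys of the corresponding field dictionary; a missing key raises KeyError.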

class unitxt.templates.InputOutputTemplateWithCustomTarget(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_string_stripped'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None, output_format: str = None, input_format: str = __required__, reference: str = __required__)[source]¶: Bases: InputOutputTemplate

class unitxt.templates.JsonOutputFormatTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_string_stripped'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None, output_fields: Dict[str, str] = __required__, wrap_with_list_fields: List[str] = __required__)[source]¶: Bases: Template

class unitxt.templates.JsonOutputTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_string_stripped'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None, output_fields: Dict[str, str] = __required__, wrap_with_list_fields: List[str] = __required__, input_format: str = __required__)[source]¶

Bases: InputFormatTemplate, JsonOutputFormatTemplate

Generate field ‘source’ from fields designated as input, and fields ‘target’ and ‘references’ from fields designated as output, of the processed instance.

Args specify the formatting strings with which to glue together the input and reference fields of the processed instance into one string (‘source’ and ‘target’), and into a list of strings (‘references’).

class unitxt.templates.KeyValTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_string_stripped'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None, pairs_separator: str = ', ', key_val_separator: str = ': ', use_keys_for_inputs: bool = True, outputs_key_val_separator: str = ': ', use_keys_for_outputs: bool = False)[source]¶

Bases: Template

Generate field ‘source’ from fields designated as input, and fields ‘target’ and ‘references’ from fields designated as output, of the processed instance.

Args specify with what separators to glue together the input and output designated fields of the processed instance into one string (‘source’ and ‘target’), and into a list of strings (‘references’).
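The separator-based gluing can be sketched in a few lines of plain Python. The helper join_key_vals below is hypothetical and only mirrors the defaults shown in the signature (keys included for inputs, omitted for outputs); the library's actual rendering may differ in details such as serialization of non-string values.

```python
def join_key_vals(fields, pairs_separator=", ", key_val_separator=": ", use_keys=True):
    """Hypothetical helper: join a dict of fields into a single string."""
    if use_keys:
        # Render each field as "<key><key_val_separator><value>".
        parts = [f"{key}{key_val_separator}{value}" for key, value in fields.items()]
    else:
        # Render the values only, dropping the keys.
        parts = [str(value) for value in fields.values()]
    return pairs_separator.join(parts)


# Inputs keep their keys (use_keys_for_inputs defaults to True).
source = join_key_vals({"text": "great movie", "genre": "drama"})
# Outputs drop their keys (use_keys_for_outputs defaults to False).
target = join_key_vals({"label": "positive"}, use_keys=False)
print(source)
print(target)
```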

class unitxt.templates.MultiLabelTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_list_by_comma'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None, output_format: str = '{labels}', input_format: str = __required__, labels_field: str = 'labels', labels_separator: str = ', ', empty_label: str = 'None')[source]¶

Bases: InputOutputTemplate

postprocessors: List[str] = ['processors.to_list_by_comma']

class unitxt.templates.MultiReferenceTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_string_stripped'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None, output_format: str = None, input_format: str = __required__, references_field: str = 'references', random_reference: bool = False)[source]¶: Bases: InputOutputTemplate

class unitxt.templates.MultiTurnTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_string_stripped'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None, output_format: str = None, input_format: str = '', references_field: str = 'references', random_reference: bool = False, turns_field: str = __required__)[source]¶: Bases: MultiReferenceTemplate

class unitxt.templates.MultipleChoiceTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_string_stripped'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None, input_format: str = __required__, choices_field: str = 'choices', target_field: str = 'label', choices_separator: str = ', ', source_choice_format: str = '{choice_numeral}. {choice_text}', target_choice_format: str = '{choice_numeral}', enumerator: str = 'capitals', shuffle_choices: bool = False, shuffle_choices_seed: int = None, sort_choices_by_length: bool = False, sort_choices_alphabetically: bool = False, reverse_choices: bool = False, place_correct_choice_position: int = None)[source]¶

Bases: InputFormatTemplate

Formats the input that specifies a multiple-choice question, with a list of possible answers to choose from, and identifies the correct answer.

Parameters:

target_prefix (str) – Optional prefix that can be added before the target label in generated prompts or outputs.
choices_field (str) – The key under which the multiple choices are stored in the input and reference dictionaries.
target_field (str) – The key under which the correct choice is stored in the reference dictionary (can be integer index or textual label).
choices_separator (str) – A string used to join formatted choices (e.g. “, “).
source_choice_format (str) – A Python format string used for displaying each choice in the input fields (e.g. “{choice_numeral}. {choice_text}”).
target_choice_format (str) – A Python format string used for displaying each choice in the target or final output (e.g. “{choice_numeral}”).
enumerator (str) – Determines how choice numerals are enumerated. Possible values include “capitals”, “lowercase”, “numbers”, or “roman”.
shuffle_choices (bool) – If True, shuffle the choices. The shuffling seed can be set with shuffle_choices_seed.
shuffle_choices_seed (int, optional) – If provided, the choices are shuffled with this fixed integer seed for reproducibility.
sort_choices_by_length (bool) – If True, sorts choices by their length (ascending).
sort_choices_alphabetically (bool) – If True, sorts choices in alphabetical order.
reverse_choices (bool) – If True, reverses the order of the choices after any sorting has been applied. Defaults to False to preserve backward compatibility.
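How the two choice formats and the separator interact can be sketched as follows. The helper render_choices is hypothetical, handles only the “capitals” enumerator, and ignores shuffling and sorting; it is an illustration of the default parameter values above, not the library's code.

```python
import string


def render_choices(
    choices,
    correct_index,
    source_choice_format="{choice_numeral}. {choice_text}",
    target_choice_format="{choice_numeral}",
    choices_separator=", ",
):
    """Hypothetical helper: format the choices for the prompt and the target."""
    # Stand-in for enumerator="capitals": A, B, C, ...
    numerals = string.ascii_uppercase
    formatted = [
        source_choice_format.format(choice_numeral=numerals[i], choice_text=text)
        for i, text in enumerate(choices)
    ]
    # The target is the correct choice rendered with target_choice_format.
    target = target_choice_format.format(
        choice_numeral=numerals[correct_index],
        choice_text=choices[correct_index],
    )
    return choices_separator.join(formatted), target


choices_text, target = render_choices(["Paris", "Rome", "Oslo"], correct_index=0)
print(choices_text)
print(target)
```

Because the target is derived from the position of the correct choice, any reordering of the choices has to be applied before the numerals are assigned.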

class unitxt.templates.NullTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = [], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None)[source]¶

Bases: Template

Template that returns an empty prompt and no references.

postprocessors: List[str] = []

class unitxt.templates.OutputFormatTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_string_stripped'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None, output_format: str = None)[source]¶: Bases: Template

class unitxt.templates.OutputQuantizingTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_string_stripped'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.MultiTypeSerializer = None, output_format: str = None, input_format: str = __required__, quantum: float | int = 0.1)[source]¶: Bases: InputOutputTemplate

class unitxt.templates.PairwiseChoiceTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_string_stripped'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None, output_format: str = None, input_format: str = __required__, choice_a_field: str = __required__, choice_b_field: str = __required__, answer_field: str = __required__, choice_a_label: str = __required__, choice_b_label: str = __required__, choice_tie_label: str = __required__, shuffle: bool = __required__)[source]¶

Bases: InputOutputTemplate

PairwiseChoiceTemplate.

Requirements: The answer field value should be of type Literal[“choice_a”, “choice_b”, “tie”].

Parameters:

choice_a_field (str) – The field which contains choice_a value
choice_b_field (str) – The field which contains choice_b value
answer_field (str) – The field which contains the answer value. Should be of type Literal[“choice_a”, “choice_b”, “tie”]
choice_a_label (str) – The label of choice A answer as it is verbalized in the template.
choice_b_label (str) – The label of choice B answer as it is verbalized in the template.
choice_tie_label (str) – The label of a tie answer as it should be verbalized in the template.
shuffle (bool) – Whether to shuffle the choices. Shuffling is done to account for position bias.

shuffle: 50% of the time:

1) The values of choice_a_field and choice_b_field will be swapped.
2) If the value of answer_field is choice_a_label, it is set to choice_b_label. If it is choice_b_label, it is set to choice_a_label. If it is choice_tie_label, it is left unchanged.
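The swap described above can be sketched in plain Python. The helpers swap_pair and maybe_swap_pair, and the sample field names, are hypothetical; the sketch assumes the answer already holds one of the three labels at the time of shuffling, as the description states.

```python
import random


def swap_pair(inputs, refs, a_field, b_field, answer_field, a_label, b_label):
    """Hypothetical helper: swap the two choices and mirror the answer label."""
    inputs, refs = dict(inputs), dict(refs)
    inputs[a_field], inputs[b_field] = inputs[b_field], inputs[a_field]
    # A maps to B and B maps to A; any other label (the tie) is left unchanged.
    mirrored = {a_label: b_label, b_label: a_label}
    refs[answer_field] = mirrored.get(refs[answer_field], refs[answer_field])
    return inputs, refs


def maybe_swap_pair(inputs, refs, rng, **fields):
    """Apply swap_pair about half of the time, using the given random generator."""
    if rng.random() < 0.5:
        return swap_pair(inputs, refs, **fields)
    return dict(inputs), dict(refs)


fields = dict(a_field="resp_a", b_field="resp_b", answer_field="winner",
              a_label="A", b_label="B")
swapped_inputs, swapped_refs = swap_pair(
    {"resp_a": "first answer", "resp_b": "second answer"}, {"winner": "A"}, **fields
)
print(swapped_inputs, swapped_refs)

# A seeded generator makes the random decision reproducible.
print(maybe_swap_pair({"resp_a": "x", "resp_b": "y"}, {"winner": "Tie"},
                      random.Random(0), **fields))
```

Mirroring the label together with the swap keeps the answer pointing at the same underlying response, which is what makes the shuffle safe for measuring position bias.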

class unitxt.templates.PairwiseComparativeRatingTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_string_stripped'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None, output_format: str = None, input_format: str = __required__, choice_a_field: str = __required__, choice_b_field: str = __required__, choice_a_id_field: str = __required__, choice_b_id_field: str = __required__, answer_field: str = __required__, shuffle: bool = __required__)[source]¶

Bases: InputOutputTemplate

PairwiseComparativeRatingTemplate.

Parameters:

choice_a_field (str) – The field which contains choice_a value
choice_b_field (str) – The field which contains choice_b value
answer_field (str) – The field which contains the answer value. The value should be an int: positive for preferring choice_a, and negative for preferring choice_b.
shuffle (bool) – Whether to shuffle the choices. Shuffling is done to account for position bias.

shuffle: 50% of the time:

1) The values of choice_a_field and choice_b_field will be swapped.
2) The value of answer_field is replaced with its mapped value according to the reverse_preference_map Dict.
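A minimal sketch of this swap follows. The contents of reverse_preference_map are not given here, so the map below is an assumption: it simply negates each rating in a hypothetical range of -3 to 3, which is consistent with positive values preferring choice_a and negative values preferring choice_b. The helper swap_rated_pair and the sample field names are also hypothetical.

```python
def swap_rated_pair(inputs, refs, a_field, b_field, answer_field, reverse_preference_map):
    """Hypothetical helper: swap the two choices and remap the integer rating."""
    inputs, refs = dict(inputs), dict(refs)
    inputs[a_field], inputs[b_field] = inputs[b_field], inputs[a_field]
    # After the swap, the preference must point the other way.
    refs[answer_field] = reverse_preference_map[refs[answer_field]]
    return inputs, refs


# Assumed map (not taken from the library): negate each rating in -3..3.
reverse_preference_map = {rating: -rating for rating in range(-3, 4)}

rated_inputs, rated_refs = swap_rated_pair(
    {"resp_a": "first answer", "resp_b": "second answer"},
    {"rating": 2},
    "resp_a",
    "resp_b",
    "rating",
    reverse_preference_map,
)
print(rated_inputs, rated_refs)
```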

class unitxt.templates.SpanLabelingBaseTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_list_by_comma'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None, output_format: str = '{labels}', input_format: str = __required__, labels_field: str = 'labels', labels_separator: str = ', ', empty_label: str = 'None', spans_starts_field: str = 'spans_starts', spans_ends_field: str = 'spans_ends', text_field: str = 'text', labels_support: list = None)[source]¶: Bases: MultiLabelTemplate

class unitxt.templates.SpanLabelingJsonTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.load_json', 'processors.dict_of_lists_to_value_key_pairs'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None, output_format: str = '{labels}', input_format: str = __required__, labels_field: str = 'labels', labels_separator: str = ', ', empty_label: str = 'None', spans_starts_field: str = 'spans_starts', spans_ends_field: str = 'spans_ends', text_field: str = 'text', labels_support: list = None)[source]¶

Bases: SpanLabelingBaseTemplate

postprocessors: List[str] = ['processors.load_json', 'processors.dict_of_lists_to_value_key_pairs']

class unitxt.templates.SpanLabelingTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_span_label_pairs'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None, output_format: str = '{labels}', input_format: str = __required__, labels_field: str = 'labels', labels_separator: str = ', ', empty_label: str = 'None', spans_starts_field: str = 'spans_starts', spans_ends_field: str = 'spans_ends', text_field: str = 'text', labels_support: list = None, span_label_format: str = '{span}: {label}', escape_characters: List[str] = [':', ','])[source]¶

Bases: SpanLabelingBaseTemplate

escape_characters: List[str] = [':', ',']

postprocessors: List[str] = ['processors.to_span_label_pairs']

class unitxt.templates.Template(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_string_stripped'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None)[source]¶

Bases: InstanceOperator

The role of a template is to take the fields of every instance and verbalize them.

That is, the template takes the instance and generates source, target, and references.

Parameters:

skip_rendered_instance (bool) – if “source”, “target”, and “references” are already defined fields in the instance, skip its processing
postprocessors – a list of artifact names of text processors, to be applied to the model output
instruction – a formatting string that yields an instruction with potential participation of values from the “input_fields” part of the instance
target_prefix – a string to be used to format the prompt. Not a formatting string.
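The effect of skip_rendered_instance can be sketched with plain dictionaries. The names apply_template and render are hypothetical and stand in for the template's processing step; the sketch only shows the skip rule, not the library's implementation.

```python
RENDERED_KEYS = ("source", "target", "references")


def apply_template(instance, render, skip_rendered_instance=True):
    """Hypothetical helper: verbalize an instance unless it is already rendered."""
    if skip_rendered_instance and all(key in instance for key in RENDERED_KEYS):
        # All three fields are present, so the instance is passed through as is.
        return instance
    return {**instance, **render(instance)}


def render(instance):
    """Hypothetical verbalizer for a summarization-style instance."""
    summary = instance["summary"]
    return {
        "source": f"Summarize: {instance['text']}",
        "target": summary,
        "references": [summary],
    }


raw = {"text": "A long story.", "summary": "Short."}
done = {"source": "kept", "target": "kept", "references": ["kept"]}

rendered_raw = apply_template(raw, render)
rendered_done = apply_template(done, render)
print(rendered_raw["source"])
print(rendered_done["source"])
```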

exception unitxt.templates.TemplateFormatKeyError(template, data, data_type, format_str, format_name)[source]¶: Bases: UnitxtError

class unitxt.templates.TemplatesDict(data_classification_policy: List[str] = None, items: Dict[str, unitxt.artifact.Artifact] = {})[source]¶: Bases: DictCollection

class unitxt.templates.TemplatesList(data_classification_policy: List[str] = None, items: List[unitxt.artifact.Artifact] = [])[source]¶: Bases: ListCollection

class unitxt.templates.YesNoTemplate(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, skip_rendered_instance: bool = True, postprocessors: List[str] = ['processors.to_string_stripped'], instruction: str = '', target_prefix: str = '', title_fields: List[str] = [], serializer: unitxt.serializers.Serializer = None, input_format: str = None, class_field: str = None, label_field: str = None, yes_answer: str = 'Yes', no_answer: str = 'No')[source]¶

Bases: InputFormatTemplate

A template for generating binary Yes/No questions asking whether an input text is of a specific class.

Parameters:

input_format – Defines the format of the question.
class_field – Defines the field that contains the name of the class that the question asks about.
label_field – Defines the field which contains the true label of the input text. If a gold label is equal to the class name (the value of the field named by class_field), then the correct output is yes_answer (by default, “Yes”). Otherwise the correct output is no_answer (by default, “No”).
yes_answer – The output value for when a gold label equals the class name. Defaults to “Yes”.
no_answer – The output value for when no gold label equals the class name. Defaults to “No”.
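The decision rule described by these parameters can be sketched as a small function. The helper yes_no_target is hypothetical, and the sketch assumes the label field holds a list of gold labels, which the wording “a gold label” suggests but does not state outright.

```python
def yes_no_target(gold_labels, class_name, yes_answer="Yes", no_answer="No"):
    """Hypothetical helper: pick the expected answer for a yes/no class question."""
    # Assumption: gold_labels is a list of the true labels of the input text.
    return yes_answer if class_name in gold_labels else no_answer


print(yes_no_target(["sports", "politics"], "sports"))
print(yes_no_target(["sports", "politics"], "finance"))
print(yes_no_target(["toxic"], "toxic", yes_answer="True", no_answer="False"))
```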

unitxt.templates.escape_chars(s, chars_to_escape)[source]¶

unitxt.templates.random() → x in the interval [0, 1).