unitxt.augmentors module

class unitxt.augmentors.AugmentPrefixSuffix(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <function NewType.<locals>.new_type at 0x7854ecddc790>, prefixes: Union[List[str], Dict[str, int], NoneType] = {' ': 20, '\\t': 10, '\\n': 40, '': 30}, prefix_len: Union[int, NoneType] = 3, suffixes: Union[List[str], Dict[str, int], NoneType] = {' ': 20, '\\t': 10, '\\n': 40, '': 30}, suffix_len: Union[int, NoneType] = 3, remove_existing_whitespaces: Union[bool, NoneType] = False)[source]

Bases: TextAugmentor

Augments the input by prepending and appending randomly selected patterns (typically, whitespace).

Parameters:
  • prefixes (list or dict or None) – the candidate patterns (typically, whitespace) from which the prefix is selected. The dictionary form allows relative weights to be specified for the different patterns. Set to None if no prefix is needed (i.e., only suffixes are needed).

  • suffixes (list or dict or None) – the candidate patterns (typically, whitespace) from which the suffix is selected. The dictionary form allows relative weights to be specified for the different patterns. Set to None if no suffix is needed (i.e., only prefixes are needed).

  • prefix_len (positive int) – the length of the prefix to be added.

  • suffix_len (positive int) – the length of the suffix to be added.

  • remove_existing_whitespaces (bool) – remove any existing leading and trailing whitespace from the input. The selected pattern(s) are then prepended and/or appended to the potentially trimmed input.

Examples

To prepend a prefix made of four characters, each either \n or \t, use AugmentPrefixSuffix(prefixes=['\n','\t'], prefix_len=4, suffixes=None).

To append a suffix made of three characters, with \n chosen twice as often as \t, use AugmentPrefixSuffix(suffixes={'\n':2,'\t':1}, suffix_len=3, prefixes=None).
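
As a rough illustration of how the parameters above interact, the following pure-Python sketch mimics the documented behavior without importing unitxt. The helpers pick_affix and augment are hypothetical names, not part of the library, and the sketch assumes that a prefix or suffix is built from prefix_len or suffix_len independent weighted draws; the actual implementation may differ.

```python
import random
from typing import Dict, List, Optional, Union

Patterns = Optional[Union[List[str], Dict[str, int]]]


def pick_affix(patterns: Patterns, length: int, rng: random.Random) -> str:
    """Hypothetical helper: join `length` weighted draws from `patterns`."""
    if patterns is None:
        return ""  # None means "no prefix/suffix needed"
    if isinstance(patterns, dict):
        # Dictionary form: keys are patterns, values are relative weights.
        choices, weights = list(patterns.keys()), list(patterns.values())
    else:
        # List form: every pattern is equally likely.
        choices, weights = list(patterns), None
    return "".join(rng.choices(choices, weights=weights, k=length))


def augment(
    text: str,
    prefixes: Patterns = None,
    prefix_len: int = 3,
    suffixes: Patterns = None,
    suffix_len: int = 3,
    remove_existing_whitespaces: bool = False,
    seed: int = 0,
) -> str:
    """Hypothetical stand-in for the documented prefix/suffix augmentation."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    if remove_existing_whitespaces:
        text = text.strip()  # trim before adding the new patterns
    prefix = pick_affix(prefixes, prefix_len, rng)
    suffix = pick_affix(suffixes, suffix_len, rng)
    return prefix + text + suffix


# Mirrors the first example: a 4-character prefix of \n-s or \t-s, no suffix.
out = augment(
    "  What is 2+2? ",
    prefixes=["\n", "\t"],
    prefix_len=4,
    suffixes=None,
    remove_existing_whitespaces=True,
)
print(repr(out))
```

Passing a dictionary such as {'\n': 2, '\t': 1} to the same helper makes \n roughly twice as likely as \t on each draw, which corresponds to the 2:1 ratio in the second example.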

prefixes: List[str] | Dict[str, int] | None = {'': 30, ' ': 20, '\\n': 40, '\\t': 10}
suffixes: List[str] | Dict[str, int] | None = {'': 30, ' ': 20, '\\n': 40, '\\t': 10}

class unitxt.augmentors.AugmentWhitespace(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <function NewType.<locals>.new_type at 0x7854ecddc790>)[source]

Bases: TextAugmentor

Augments the input by replacing existing whitespace with other whitespace characters.

Currently, each whitespace is replaced by a random choice of 1-3 whitespace characters (space, tab, newline).
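
The replacement rule described above can be sketched in plain Python as follows. This is an illustration only, not unitxt's code: augment_whitespace is a hypothetical name, and the sketch assumes that each run of consecutive whitespace is treated as one unit and that each replacement character is drawn independently.

```python
import random
import re

WHITESPACE_CHARS = [" ", "\t", "\n"]


def augment_whitespace(text: str, seed: int = 0) -> str:
    """Hypothetical stand-in: swap each whitespace run for 1-3 random whitespace chars."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    def replacement(match: re.Match) -> str:
        count = rng.randint(1, 3)  # inclusive on both ends
        return "".join(rng.choices(WHITESPACE_CHARS, k=count))

    # Assumption: a run of whitespace counts as a single unit to replace.
    return re.sub(r"\s+", replacement, text)


original = "a b  c\td"
augmented = augment_whitespace(original)
print(repr(augmented))
```

The non-whitespace tokens are left untouched; only the separators between them change.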

class unitxt.augmentors.Augmentor(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]

Bases: FieldOperator

A stream operator that augments either the values of the task input fields (before the template is rendered) or the input passed to the model (after the template is rendered).

class unitxt.augmentors.NullAugmentor(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]

Bases: TaskInputsAugmentor

Does not change the input string.

class unitxt.augmentors.TaskInputsAugmentor(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]

Bases: Augmentor

class unitxt.augmentors.TextAugmentor(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <function NewType.<locals>.new_type at 0x7854ecddc790>)[source]

Bases: TypeDependentAugmentor

augmented_type()[source]

class unitxt.augmentors.TypeDependentAugmentor(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = __required__)[source]

Bases: TaskInputsAugmentor