unitxt.augmentors module

class unitxt.augmentors.AugmentPrefixSuffix(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <function NewType.<locals>.new_type at 0x7f635af2a9d0>, prefixes: Union[List[str], Dict[str, int], NoneType] = {' ': 20, '\\t': 10, '\\n': 40, '': 30}, prefix_len: Union[int, NoneType] = 3, suffixes: Union[List[str], Dict[str, int], NoneType] = {' ': 20, '\\t': 10, '\\n': 40, '': 30}, suffix_len: Union[int, NoneType] = 3, remove_existing_whitespaces: Union[bool, NoneType] = False)[source]

Bases: TextAugmentor

Augments the input by prepending and appending randomly selected (typically, whitespace) patterns.

Parameters:
  • prefixes (list or dict) – the candidate (typically whitespace) patterns from which the prefix is selected. The dictionary form allows specifying relative weights for the different patterns.

  • suffixes (list or dict) – the candidate (typically whitespace) patterns from which the suffix is selected. The dictionary form allows specifying relative weights for the different patterns.

  • prefix_len (positive int) – the length of the added prefix.

  • suffix_len (positive int) – the length of the added suffix.

  • remove_existing_whitespaces (bool) – if True, strip any existing leading and trailing whitespace from the input. The strings made of repetitions of the selected pattern(s) are then prepended and/or appended to the (possibly trimmed) input.

If only prefixes or only suffixes are needed, set the other to None.

Examples

To prepend a prefix made of 4 '\n'-s or '\t'-s to the input, use:

    AugmentPrefixSuffix(augment_model_input=True, prefixes=['\n', '\t'], prefix_len=4, suffixes=None)

To append a suffix made of 3 '\n'-s or '\t'-s to the input, with '\n' preferred over '\t' at a 2:1 ratio, use:

    AugmentPrefixSuffix(augment_model_input=True, suffixes={'\n': 2, '\t': 1}, suffix_len=3, prefixes=None)

This appends '\n'-s twice as often as '\t'-s.
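The behaviour described by the parameters above can be illustrated with a small standalone sketch. It does not use unitxt and is not the library's implementation: the helper names build_affix and augment_prefix_suffix and the seed argument are hypothetical, and it assumes that each of the prefix_len (or suffix_len) positions is drawn independently according to the given weights.

```python
import random
from typing import Dict, List, Optional, Union

Patterns = Optional[Union[List[str], Dict[str, int]]]


def build_affix(patterns: Patterns, length: int, rng: random.Random) -> str:
    """Join `length` patterns drawn by weight (assumed semantics, not unitxt code)."""
    if not patterns:
        # None (or an empty collection) means no prefix/suffix is added.
        return ""
    if isinstance(patterns, dict):
        # Dictionary form: keys are patterns, values are relative weights.
        choices, weights = list(patterns.keys()), list(patterns.values())
    else:
        # List form: every pattern is equally likely.
        choices, weights = list(patterns), None
    return "".join(rng.choices(choices, weights=weights, k=length))


def augment_prefix_suffix(
    text: str,
    prefixes: Patterns,
    prefix_len: int,
    suffixes: Patterns,
    suffix_len: int,
    remove_existing_whitespaces: bool = False,
    seed: int = 0,  # hypothetical argument, used only to make the sketch reproducible
) -> str:
    rng = random.Random(seed)
    if remove_existing_whitespaces:
        # Trim leading and trailing whitespace before adding the new affixes.
        text = text.strip()
    prefix = build_affix(prefixes, prefix_len, rng)
    suffix = build_affix(suffixes, suffix_len, rng)
    return prefix + text + suffix


# Prefix of 4 newlines/tabs, no suffix.
only_prefix = augment_prefix_suffix(
    "hello", prefixes=["\n", "\t"], prefix_len=4, suffixes=None, suffix_len=3
)

# Suffix of 3 newlines/tabs with newline weighted 2:1 over tab, no prefix.
only_suffix = augment_prefix_suffix(
    "  hi  ",
    prefixes=None,
    prefix_len=3,
    suffixes={"\n": 2, "\t": 1},
    suffix_len=3,
    remove_existing_whitespaces=True,
)
```

In this sketch, passing None for one side leaves that side of the input untouched, mirroring the note above about needing only prefixes or only suffixes.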

prefixes: List[str] | Dict[str, int] | None = {'': 30, ' ': 20, '\\n': 40, '\\t': 10}
suffixes: List[str] | Dict[str, int] | None = {'': 30, ' ': 20, '\\n': 40, '\\t': 10}
class unitxt.augmentors.AugmentWhitespace(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <function NewType.<locals>.new_type at 0x7f635af2a9d0>)[source]

Bases: TextAugmentor

Augments the inputs by replacing existing whitespaces with other whitespaces.

Currently, each whitespace is replaced by a random choice of 1-3 whitespace characters (space, tab, newline).
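A minimal standalone sketch of this kind of replacement follows. It does not use unitxt and is not the library's implementation: the function name augment_whitespace and the seed argument are hypothetical, and it assumes that each run of whitespace is replaced by one randomly chosen whitespace character repeated 1 to 3 times.

```python
import random
import re


def augment_whitespace(text: str, seed: int = 0) -> str:
    """Replace every whitespace run with 1-3 copies of a random whitespace character."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible

    def replace(_match):
        # Pick one of space, tab, newline and repeat it 1 to 3 times.
        return rng.choice([" ", "\t", "\n"]) * rng.randint(1, 3)

    return re.sub(r"\s+", replace, text)


augmented = augment_whitespace("a b  c")
```

The non-whitespace tokens are left unchanged, so splitting the augmented text on whitespace yields the same words as the original.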

class unitxt.augmentors.Augmentor(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]

Bases: FieldOperator

A stream operator that augments the values of either the task input fields before rendering with the template, or the input passed to the model after rendering of the template.

class unitxt.augmentors.NullAugmentor(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]

Bases: Augmentor

Does not change the input string.

class unitxt.augmentors.TaskInputsAugmentor(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]

Bases: Augmentor

class unitxt.augmentors.TextAugmentor(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <function NewType.<locals>.new_type at 0x7f635af2a9d0>)[source]

Bases: TypeDependentAugmentor

augmented_type()[source]
class unitxt.augmentors.TypeDependentAugmentor(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = __required__)[source]

Bases: TaskInputsAugmentor