unitxt.augmentors module
- class unitxt.augmentors.AugmentPrefixSuffix(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <function NewType.<locals>.new_type at 0x7854ecddc790>, prefixes: Union[List[str], Dict[str, int], NoneType] = {' ': 20, '\\t': 10, '\\n': 40, '': 30}, prefix_len: Union[int, NoneType] = 3, suffixes: Union[List[str], Dict[str, int], NoneType] = {' ': 20, '\\t': 10, '\\n': 40, '': 30}, suffix_len: Union[int, NoneType] = 3, remove_existing_whitespaces: Union[bool, NoneType] = False)[source]¶
Bases: TextAugmentor
Augments the input by prepending and appending randomly selected patterns (typically, whitespace).
- Parameters:
prefixes (list or dict or None) – the candidate patterns (typically, whitespace) from which the prefix is selected. The dictionary form specifies relative weights for the different patterns. Set to None if no prefix is needed (i.e., only suffixes are needed).
suffixes (list or dict or None) – the candidate patterns (typically, whitespace) from which the suffix is selected. The dictionary form specifies relative weights for the different patterns. Set to None if no suffix is needed (i.e., only prefixes are needed).
prefix_len (positive int) – the length of the prefix to be added, counted in selected patterns.
suffix_len (positive int) – the length of the suffix to be added, counted in selected patterns.
remove_existing_whitespaces (bool) – whether to strip any existing leading and trailing whitespace. The selected pattern(s) are then prepended and/or appended to the potentially trimmed input.
Examples
To prepend the input with a prefix made of 4 \n-s or \t-s, employ AugmentPrefixSuffix(augment_model_input=True, prefixes=['\n','\t'], prefix_len=4, suffixes=None).
To append the input with a suffix made of 3 \n-s or \t-s, with \n being preferred over \t at a 2:1 ratio, employ AugmentPrefixSuffix(augment_model_input=True, suffixes={'\n':2,'\t':1}, suffix_len=3, prefixes=None), which will append \n-s twice as often as \t-s.
- prefixes: List[str] | Dict[str, int] | None = {'': 30, ' ': 20, '\\n': 40, '\\t': 10}¶
- suffixes: List[str] | Dict[str, int] | None = {'': 30, ' ': 20, '\\n': 40, '\\t': 10}¶
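The behaviour described for AugmentPrefixSuffix can be illustrated with a small standalone sketch. This is not the unitxt implementation: the helper names pick_patterns and augment_prefix_suffix, the seed argument, and the assumption that each of the prefix_len (or suffix_len) patterns is drawn independently according to its weight are all illustrative choices made here.

```python
import random
from typing import Dict, List, Optional, Union

Patterns = Optional[Union[List[str], Dict[str, int]]]


def pick_patterns(patterns: Patterns, length: int, rng: random.Random) -> str:
    """Concatenate `length` patterns drawn independently from `patterns`.

    A dict maps each pattern to a relative weight; a list is sampled uniformly.
    None (or an empty collection) yields an empty string.
    """
    if not patterns or length <= 0:
        return ""
    if isinstance(patterns, dict):
        choices, weights = list(patterns.keys()), list(patterns.values())
    else:
        choices, weights = list(patterns), None  # None means uniform weights
    return "".join(rng.choices(choices, weights=weights, k=length))


def augment_prefix_suffix(
    text: str,
    prefixes: Patterns = None,
    suffixes: Patterns = None,
    prefix_len: int = 3,
    suffix_len: int = 3,
    remove_existing_whitespaces: bool = False,
    seed: int = 0,  # hypothetical parameter, used only to make the sketch reproducible
) -> str:
    """Hypothetical stand-in for the augmentation described above."""
    rng = random.Random(seed)
    if remove_existing_whitespaces:
        text = text.strip()  # trim before the new patterns are attached
    prefix = pick_patterns(prefixes, prefix_len, rng)
    suffix = pick_patterns(suffixes, suffix_len, rng)
    return prefix + text + suffix


# Mirrors the two documented examples: a 4-pattern prefix drawn uniformly,
# and a 3-pattern suffix in which "\n" carries twice the weight of "\t".
print(repr(augment_prefix_suffix("hello", prefixes=["\n", "\t"], prefix_len=4)))
print(repr(augment_prefix_suffix("hello", suffixes={"\n": 2, "\t": 1}, suffix_len=3)))
```

Passing a dictionary only changes the sampling weights, so the list form behaves like a dictionary in which every pattern has the same weight.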
- class unitxt.augmentors.AugmentWhitespace(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <function NewType.<locals>.new_type at 0x7854ecddc790>)[source]¶
Bases: TextAugmentor
Augments the inputs by replacing existing whitespaces with other whitespaces.
Currently, each whitespace is replaced by a random choice of 1-3 whitespace characters (space, tab, newline).
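The replacement rule above can be sketched in a few lines of standalone Python. This is an illustration rather than the library's code: the function name augment_whitespace and the seed argument are hypothetical, and the sketch assumes that each run of consecutive whitespace is treated as one unit and replaced by a single randomly chosen character repeated 1 to 3 times.

```python
import random
import re

WHITESPACE_CHARS = [" ", "\t", "\n"]


def augment_whitespace(text: str, seed: int = 0) -> str:
    """Replace every whitespace run with 1-3 copies of a random whitespace character."""
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility of the sketch

    def replacement(match: re.Match) -> str:
        # One character is chosen per run, then repeated between 1 and 3 times.
        return rng.choice(WHITESPACE_CHARS) * rng.randint(1, 3)

    return re.sub(r"\s+", replacement, text)


augmented = augment_whitespace("The quick  brown\tfox")
print(repr(augmented))
# The non-whitespace tokens are unchanged; only the separators differ.
print(augmented.split() == "The quick  brown\tfox".split())
```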
- class unitxt.augmentors.Augmentor(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶
Bases: FieldOperator
A stream operator that augments the values of either the task input fields before rendering with the template, or the input passed to the model after rendering of the template.
- class unitxt.augmentors.NullAugmentor(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶
Bases: TaskInputsAugmentor
Does not change the input string.
- class unitxt.augmentors.TaskInputsAugmentor(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶
Bases: Augmentor
- class unitxt.augmentors.TextAugmentor(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <function NewType.<locals>.new_type at 0x7854ecddc790>)[source]¶
Bases: TypeDependentAugmentor
- class unitxt.augmentors.TypeDependentAugmentor(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = __required__)[source]¶
Bases: TaskInputsAugmentor