unitxt.augmentors module
- class unitxt.augmentors.AugmentPrefixSuffix(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <function NewType.<locals>.new_type at 0x7f635af2a9d0>, prefixes: Union[List[str], Dict[str, int], NoneType] = {' ': 20, '\\t': 10, '\\n': 40, '': 30}, prefix_len: Union[int, NoneType] = 3, suffixes: Union[List[str], Dict[str, int], NoneType] = {' ': 20, '\\t': 10, '\\n': 40, '': 30}, suffix_len: Union[int, NoneType] = 3, remove_existing_whitespaces: Union[bool, NoneType] = False)[source]¶
Bases: TextAugmentor
Augments the input by prepending and appending randomly selected (typically, whitespace) patterns.
- Parameters:
prefixes (list or dict) – the candidate (typically whitespace) patterns to prepend. The dictionary form allows specifying relative weights for the different patterns.
suffixes (list or dict) – the candidate (typically whitespace) patterns to append. The dictionary form allows specifying relative weights for the different patterns.
prefix_len (positive int) – the length of the added prefix; for example, prefix_len=4 yields a prefix made of 4 patterns.
suffix_len (positive int) – the length of the added suffix; for example, suffix_len=3 yields a suffix made of 3 patterns.
remove_existing_whitespaces – whether to strip any existing leading and trailing whitespace from the input. The strings made of repetitions of the selected pattern(s) are then prepended and/or appended to the potentially trimmed input.
If only prefixes or only suffixes are needed, set the other to None.
Examples
To prepend the input with a prefix made of 4 '\n'-s or '\t'-s, employ
AugmentPrefixSuffix(augment_model_input=True, prefixes=['\n', '\t'], prefix_len=4, suffixes=None)
To append the input with a suffix made of 3 '\n'-s or '\t'-s, with '\n' preferred over '\t' at a 2:1 ratio, employ
AugmentPrefixSuffix(augment_model_input=True, suffixes={'\n': 2, '\t': 1}, suffix_len=3, prefixes=None)
which will append '\n'-s twice as often as '\t'-s.
- prefixes: List[str] | Dict[str, int] | None = {'': 30, ' ': 20, '\\n': 40, '\\t': 10}
- suffixes: List[str] | Dict[str, int] | None = {'': 30, ' ': 20, '\\n': 40, '\\t': 10}
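The weighted selection described above can be illustrated with a minimal standalone sketch. It does not call unitxt; the function names `build_affix` and `augment_prefix_suffix` and the `seed` argument are hypothetical, and it assumes that each of the `prefix_len` / `suffix_len` positions is drawn independently according to the given weights, which may differ from the library's actual implementation.

```python
import random
from typing import Dict, List, Optional, Union

Patterns = Optional[Union[List[str], Dict[str, int]]]


def build_affix(patterns: Patterns, length: int, rng: random.Random) -> str:
    """Concatenate `length` patterns drawn at random (hypothetical helper)."""
    if not patterns:
        # None (or an empty collection) means: add nothing on this side.
        return ""
    if isinstance(patterns, dict):
        # Dict form: keys are patterns, values are relative weights.
        choices, weights = list(patterns.keys()), list(patterns.values())
    else:
        # List form: all patterns are equally likely.
        choices, weights = list(patterns), None
    return "".join(rng.choices(choices, weights=weights, k=length))


def augment_prefix_suffix(
    text: str,
    prefixes: Patterns,
    prefix_len: int,
    suffixes: Patterns,
    suffix_len: int,
    remove_existing_whitespaces: bool = False,
    seed: int = 0,
) -> str:
    """Sketch of prefix/suffix augmentation under the assumptions stated above."""
    rng = random.Random(seed)
    if remove_existing_whitespaces:
        text = text.strip()
    prefix = build_affix(prefixes, prefix_len, rng)
    suffix = build_affix(suffixes, suffix_len, rng)
    return prefix + text + suffix


# Prefix of 4 newlines/tabs, no suffix.
print(repr(augment_prefix_suffix("hello", ["\n", "\t"], 4, None, 3)))
# Suffix of 3 characters, newline weighted 2:1 over tab, no prefix.
print(repr(augment_prefix_suffix("hello", None, 3, {"\n": 2, "\t": 1}, 3)))
```

Note that when the empty string is among the patterns, as in the default dictionaries, the resulting prefix or suffix in this sketch can be shorter than the requested length.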
- class unitxt.augmentors.AugmentWhitespace(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <function NewType.<locals>.new_type at 0x7f635af2a9d0>)[source]¶
Bases: TextAugmentor
Augments the inputs by replacing existing whitespaces with other whitespaces.
Currently, each whitespace is replaced by a random choice of 1-3 whitespace characters (space, tab, newline).
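The replacement behaviour described above can be sketched in plain Python. This is an illustration only, not the library's code: the function name `augment_whitespace` and the `seed` argument are hypothetical, and the sketch assumes that each run of whitespace is replaced by a single randomly chosen character (space, tab, or newline) repeated 1 to 3 times.

```python
import random
import re


def augment_whitespace(text: str, seed: int = 0) -> str:
    """Replace every whitespace run with 1-3 copies of a random whitespace character."""
    rng = random.Random(seed)

    def replace(match: "re.Match[str]") -> str:
        # Assumption: one character is chosen per run and then repeated.
        return rng.choice([" ", "\t", "\n"]) * rng.randint(1, 3)

    return re.sub(r"\s+", replace, text)


print(repr(augment_whitespace("The quick brown fox")))
```

The non-whitespace tokens are left untouched, so splitting the augmented text on whitespace yields the same words as the original.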
- class unitxt.augmentors.Augmentor(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶
Bases: FieldOperator
A stream operator that augments the values of either the task input fields before rendering with the template, or the input passed to the model after rendering of the template.
- class unitxt.augmentors.NullAugmentor(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶
Bases: Augmentor
Does not change the input string.
- class unitxt.augmentors.TaskInputsAugmentor(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶
Bases: Augmentor
- class unitxt.augmentors.TextAugmentor(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <function NewType.<locals>.new_type at 0x7f635af2a9d0>)[source]¶
Bases: TypeDependentAugmentor
- class unitxt.augmentors.TypeDependentAugmentor(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = __required__)[source]¶
Bases: TaskInputsAugmentor