unitxt.augmentors module
- class unitxt.augmentors.AugmentPrefixSuffix(__tags__: Dict[str, str] = {}, data_classification_policy: List[str] = None, caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | None = None, to_field: str | None = None, field_to_field: List[List[str]] | Dict[str, str] | None = None, use_query: bool | None = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, prefixes: List[str] | Dict[str, int] | None = {'': 30, ' ': 20, '\\n': 40, '\\t': 10}, prefix_len: int | None = 3, suffixes: List[str] | Dict[str, int] | None = {'': 30, ' ': 20, '\\n': 40, '\\t': 10}, suffix_len: int | None = 3, remove_existing_whitespaces: bool | None = False)
Bases:
FieldOperator
Augments the input by prepending and appending randomly selected (typically whitespace) patterns.
- Parameters:
prefixes (list or dict or None) – the candidate (typically whitespace) patterns from which the prefix is built. With a list, all patterns are equally likely; the dictionary version maps each pattern to its relative weight.
suffixes (list or dict or None) – the candidate (typically whitespace) patterns from which the suffix is built, specified the same way as prefixes.
prefix_len (positive int) – the number of pattern repetitions that make up the added prefix.
suffix_len (positive int) – the number of pattern repetitions that make up the added suffix.
remove_existing_whitespaces (bool) – if True, strip any existing leading and trailing whitespace first. The strings made of repetitions of the selected pattern(s) are then prepended and/or appended to the (possibly trimmed) input.
If only prefixes or only suffixes are needed, set the other to None.
Examples
To prepend the input with a prefix made of 4 '\n'-s or '\t'-s, employ:
AugmentPrefixSuffix(augment_model_input=True, prefixes=['\n', '\t'], prefix_len=4, suffixes=None)
To append the input with a suffix made of 3 '\n'-s or '\t'-s, with '\n' preferred over '\t' at a 2:1 ratio, employ:
AugmentPrefixSuffix(augment_model_input=True, suffixes={'\n': 2, '\t': 1}, suffix_len=3, prefixes=None)
which appends '\n'-s twice as often as '\t'-s.
- prefixes: List[str] | Dict[str, int] | None = {'': 30, ' ': 20, '\\n': 40, '\\t': 10}
- suffixes: List[str] | Dict[str, int] | None = {'': 30, ' ': 20, '\\n': 40, '\\t': 10}
- class unitxt.augmentors.AugmentWhitespace(__tags__: Dict[str, str] = {}, data_classification_policy: List[str] = None, caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | None = None, to_field: str | None = None, field_to_field: List[List[str]] | Dict[str, str] | None = None, use_query: bool | None = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)
Bases:
FieldOperator
Augments the input by replacing existing whitespace with other whitespace characters.
Currently, each whitespace is replaced by a random choice of 1-3 whitespace characters (space, tab, newline).
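A rough sketch of this whitespace replacement follows. It assumes that "each whitespace" means each maximal run of whitespace and that the 1-3 replacement characters are drawn independently; the unitxt implementation may group or draw them differently, and `augment_whitespace` is a hypothetical name.

```python
import random
import re


def augment_whitespace(text: str, rng: random.Random) -> str:
    """Replace every maximal whitespace run in `text` with 1-3 random whitespace
    characters (space, tab or newline). Non-whitespace text is kept as-is."""
    pieces = re.split(r"(\s+)", text)  # the capturing group keeps the whitespace runs
    out = []
    for piece in pieces:
        if piece.isspace():
            length = rng.randint(1, 3)  # inclusive bounds: 1, 2 or 3 characters
            out.append("".join(rng.choice([" ", "\t", "\n"]) for _ in range(length)))
        else:
            out.append(piece)  # words (and empty edge pieces) pass through unchanged
    return "".join(out)


print(repr(augment_whitespace("The quick brown fox", random.Random(42))))
```

Seeding a dedicated `random.Random` instance keeps the augmentation reproducible without touching the global random state.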
- class unitxt.augmentors.Augmentor(__tags__: Dict[str, str] = {}, data_classification_policy: List[str] = None, caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | None = None, to_field: str | None = None, field_to_field: List[List[str]] | Dict[str, str] | None = None, use_query: bool | None = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, operator: FieldOperator)
Bases:
FieldOperator
A stream operator that augments either the values of the task input fields, before they are rendered with the template, or the input passed to the model, after the template is rendered.
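To illustrate the wrapper idea, here is a hypothetical `FieldAugmentorSketch` (not a unitxt class) that delegates to a wrapped `operator` callable and applies it to one field of an instance dictionary. In this sketch, whether a task input or the model input is augmented depends only on which field is targeted: `source` is the model-input field that `ModelInputAugmentor` uses by default, while `question` is an assumed task-input field name.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class FieldAugmentorSketch:
    """Hypothetical stand-in for an augmentor: applies `operator` to one field."""

    field: str
    operator: Callable[[Any], Any]

    def process(self, instance: Dict[str, Any]) -> Dict[str, Any]:
        updated = dict(instance)  # shallow copy, so the caller's dict is left untouched
        updated[self.field] = self.operator(instance[self.field])
        return updated


instance = {"question": "What is 2+2?", "source": "Q: What is 2+2?\nA:"}

# Task-input augmentation: change a field before a template renders it.
task_aug = FieldAugmentorSketch(field="question", operator=str.upper)
# Model-input augmentation: change the already rendered text in "source".
model_aug = FieldAugmentorSketch(field="source", operator=lambda s: s + "\n")
# An identity operator leaves the value unchanged, in the spirit of NullAugmentor.
null_aug = FieldAugmentorSketch(field="source", operator=lambda s: s)

print(task_aug.process(instance)["question"])  # WHAT IS 2+2?
print(repr(model_aug.process(instance)["source"]))
print(null_aug.process(instance) == instance)  # True
```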
- class unitxt.augmentors.FinalStateInputsAugmentor(__tags__: Dict[str, str] = {}, data_classification_policy: List[str] = None, caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | None = None, to_field: str | None = None, field_to_field: List[List[str]] | Dict[str, str] | None = None, use_query: bool | None = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, operator: FieldOperator)
Bases:
Augmentor
- class unitxt.augmentors.Identity(__tags__: Dict[str, str] = {}, data_classification_policy: List[str] = None, caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | None = None, to_field: str | None = None, field_to_field: List[List[str]] | Dict[str, str] | None = None, use_query: bool | None = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)
Bases:
FieldOperator
- class unitxt.augmentors.ImagesAugmentor(__tags__: Dict[str, str] = {}, data_classification_policy: List[str] = None, caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | None = 'media/images', to_field: str | None = None, field_to_field: List[List[str]] | Dict[str, str] | None = None, use_query: bool | None = None, process_every_value: bool = True, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, operator: FieldOperator)
Bases:
FinalStateInputsAugmentor
- class unitxt.augmentors.ModelInputAugmentor(__tags__: Dict[str, str] = {}, data_classification_policy: List[str] = None, caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | None = 'source', to_field: str | None = None, field_to_field: List[List[str]] | Dict[str, str] | None = None, use_query: bool | None = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, operator: FieldOperator)
Bases:
FinalStateInputsAugmentor
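The `field` and `process_every_value` defaults in the signatures above distinguish `ModelInputAugmentor` (one value at `source`) from `ImagesAugmentor` (each element of the list at `media/images`). The helper below is a hypothetical sketch of those semantics, assuming that '/' separates nested dictionary keys in `field`; it is not the unitxt `FieldOperator` code, and plain strings stand in for image objects.

```python
from typing import Any, Callable, Dict


def apply_to_field(
    instance: Dict[str, Any],
    field: str,
    operator: Callable[[Any], Any],
    process_every_value: bool = False,
) -> Dict[str, Any]:
    """Apply `operator` to the value at `field` ('/' separates nested keys).

    With process_every_value=True the target value is treated as a list and the
    operator is applied to each element. Modifies `instance` in place and returns it.
    """
    *parents, leaf = field.split("/")
    node = instance
    for key in parents:  # walk down to the dict that holds the target value
        node = node[key]
    value = node[leaf]
    node[leaf] = [operator(v) for v in value] if process_every_value else operator(value)
    return instance


# Strings stand in for image objects in this sketch.
instance = {"source": "Describe the image.", "media": {"images": ["img_a", "img_b"]}}
# Like ModelInputAugmentor's defaults: field="source", process_every_value=False.
apply_to_field(instance, "source", str.upper)
# Like ImagesAugmentor's defaults: field="media/images", process_every_value=True.
apply_to_field(instance, "media/images", lambda img: img + "_flipped", process_every_value=True)
print(instance)
# {'source': 'DESCRIBE THE IMAGE.', 'media': {'images': ['img_a_flipped', 'img_b_flipped']}}
```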
- class unitxt.augmentors.NullAugmentor(__tags__: Dict[str, str] = {}, data_classification_policy: List[str] = None, caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | None = None, to_field: str | None = None, field_to_field: List[List[str]] | Dict[str, str] | None = None, use_query: bool | None = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, operator: FieldOperator = Identity(__type__='identity', __description__=None, __tags__={}, __id__=None, data_classification_policy=None, _requirements_list=[], caching=None, apply_to_streams=None, dont_apply_to_streams=None, field=None, to_field=None, field_to_field=None, use_query=None, process_every_value=False, get_default=None, not_exist_ok=False, not_exist_do_nothing=False))
Bases:
Augmentor
Does not change the input string.
- operator: FieldOperator = Identity(__type__='identity', __description__=None, __tags__={}, __id__=None, data_classification_policy=None, _requirements_list=[], caching=None, apply_to_streams=None, dont_apply_to_streams=None, field=None, to_field=None, field_to_field=None, use_query=None, process_every_value=False, get_default=None, not_exist_ok=False, not_exist_do_nothing=False)
- class unitxt.augmentors.TaskInputsAugmentor(__tags__: Dict[str, str] = {}, data_classification_policy: List[str] = None, caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | None = None, to_field: str | None = None, field_to_field: List[List[str]] | Dict[str, str] | None = None, use_query: bool | None = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, operator: FieldOperator)
Bases:
Augmentor