📄 Normalized Sacrebleu

metrics.normalized_sacrebleu

MetricPipeline(
    main_score="sacrebleu",
    prediction_type="str",
    preprocess_steps=[
        Copy(
            field="task_data/target_language",
            to_field="task_data/tokenize",
            not_exist_ok=True,
            get_default="en",
        ),
        Lower(
            field="task_data/tokenize",
        ),
        MapInstanceValues(
            mappers={
                "task_data/tokenize": {
                    "german": None,
                    "deutch": None,
                    "de": None,
                    "french": None,
                    "fr": None,
                    "romanian": None,
                    "ro": None,
                    "english": None,
                    "en": None,
                    "spanish": None,
                    "es": None,
                    "portuguese": None,
                    "pt": None,
                    "arabic": "intl",
                    "ar": "intl",
                    "korean": "ko-mecab",
                    "ko": "ko-mecab",
                    "japanese": "ja-mecab",
                    "ja": "ja-mecab",
                },
            },
            strict=True,
        ),
    ],
    metric=NormalizedSacrebleu(),
)

from unitxt.metrics import MetricPipeline, NormalizedSacrebleu
from unitxt.operators import Copy, MapInstanceValues
from unitxt.processors import Lower
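The preprocessing steps above pick a sacrebleu tokenizer for each instance from its target language. As a rough illustration, the plain-Python sketch below mimics them on a dictionary instance. It copies `task_data/target_language` to `task_data/tokenize`, defaulting to "en" when the field is missing, then lowercases the value and looks it up in the same table with strict behavior. It does not use unitxt. The helper `select_tokenize` is hypothetical, and the sketch assumes that `task_data/...` refers to keys of a nested `task_data` dictionary. A value of `None` presumably leaves sacrebleu on its default tokenizer; that is an assumption, not something this page states.

```python
from typing import Any, Dict, Optional

# Same language-to-tokenizer table as the MapInstanceValues step above.
# None means "no explicit tokenizer" (assumed: sacrebleu's default applies).
TOKENIZE_MAP: Dict[str, Optional[str]] = {
    "german": None, "deutch": None, "de": None,
    "french": None, "fr": None,
    "romanian": None, "ro": None,
    "english": None, "en": None,
    "spanish": None, "es": None,
    "portuguese": None, "pt": None,
    "arabic": "intl", "ar": "intl",
    "korean": "ko-mecab", "ko": "ko-mecab",
    "japanese": "ja-mecab", "ja": "ja-mecab",
}


def select_tokenize(instance: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper mimicking the three preprocessing steps."""
    task_data = instance.setdefault("task_data", {})
    # Copy: target_language -> tokenize, defaulting to "en" if absent.
    value = task_data.get("target_language", "en")
    # Lower: normalize case so "German" and "german" map the same way.
    value = value.lower()
    # MapInstanceValues(strict=True): unknown languages raise KeyError.
    # Lookup uses the value's string representation.
    task_data["tokenize"] = TOKENIZE_MAP[str(value)]
    return instance


print(select_tokenize({"task_data": {"target_language": "Japanese"}}))
# {'task_data': {'target_language': 'Japanese', 'tokenize': 'ja-mecab'}}
print(select_tokenize({"task_data": {}}))
# {'task_data': {'tokenize': None}}
```

Because the map is strict, a language missing from the table (for example "klingon") fails loudly rather than silently falling back to a default tokenizer.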

Explanation about Copy

Copies values from specified source fields to specified destination fields.

Args (of parent class):

field_to_field (Union[List[List], Dict[str, str]]): A list of lists, where each sublist contains the source field and the destination field, or a dictionary mapping source fields to destination fields.

Examples:

An input instance {"a": 2, "b": 3}, when processed by Copy(field_to_field={"a": "b"}), would yield {"a": 2, "b": 2}; when processed by Copy(field_to_field={"a": "c"}), it would yield {"a": 2, "b": 3, "c": 2}.

With field names containing "/", we can also copy inside a field: Copy(field="a/0", to_field="a") would process instance {"a": [1, 3]} into {"a": 1}.
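The Copy examples above can be mimicked with a small plain-Python sketch. The helpers `get_path`, `set_path`, and `copy_fields` are hypothetical and cover only the behavior described on this page: `field_to_field` given as a dictionary or as a list of [source, destination] pairs, and "/"-separated paths in which a numeric segment indexes into a list. unitxt's own implementation handles more cases, so treat this as an illustration of the semantics, not as its code.

```python
import copy
from typing import Any, Dict, List, Union


def get_path(instance: Dict[str, Any], path: str) -> Any:
    """Follow a '/'-separated path; numeric segments index into lists."""
    node: Any = instance
    for part in path.split("/"):
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def set_path(instance: Dict[str, Any], path: str, value: Any) -> None:
    """Assign value at a '/'-separated path (parents must already exist)."""
    *parents, last = path.split("/")
    node: Any = instance
    for part in parents:
        node = node[int(part)] if isinstance(node, list) else node[part]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value


def copy_fields(
    instance: Dict[str, Any],
    field_to_field: Union[List[List[str]], Dict[str, str]],
) -> Dict[str, Any]:
    """Hypothetical stand-in for Copy: returns a new, updated instance."""
    if isinstance(field_to_field, dict):
        pairs = list(field_to_field.items())
    else:
        pairs = field_to_field
    result = copy.deepcopy(instance)
    for source, target in pairs:
        # This sketch reads every source from the unmodified input.
        set_path(result, target, copy.deepcopy(get_path(instance, source)))
    return result


print(copy_fields({"a": 2, "b": 3}, {"a": "b"}))   # {'a': 2, 'b': 2}
print(copy_fields({"a": 2, "b": 3}, {"a": "c"}))   # {'a': 2, 'b': 3, 'c': 2}
print(copy_fields({"a": [1, 3]}, [["a/0", "a"]]))  # {'a': 1}
```

The sketch returns a fresh dictionary and leaves the input untouched, which keeps the three calls independent of one another.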

Explanation about MapInstanceValues

A class used to map instance values to other values.

This class is an InstanceOperator: it maps the values of instances in a stream using predefined mappers.

Args:
mappers (Dict[str, Dict[str, Any]]):

The mappers to use for mapping instance values. Keys are the names of the fields to be mapped, and values are dictionaries that define the mapping from old values to new values. Note that values are looked up by their string representation: each value is converted to a string before it is looked up in the mappers.

strict (bool):

If True, the mapping is applied strictly: a value that does not exist in the mapper raises a KeyError. If False, values that are not present in the mapper are kept as they are.

process_every_value (bool):

If True, every field to be mapped must be a list, and the mapping is applied to its individual elements. If False, the mapping is applied to the field's value as a whole.

Examples:

MapInstanceValues(mappers={"a": {"1": "hi", "2": "bye"}}) replaces "1" with "hi" and "2" with "bye" in field "a" in all instances of all streams: instance {"a": 1, "b": 2} becomes {"a": "hi", "b": 2}. Note that the value of "b" remains intact, since field "b" does not appear in the mappers, and that 1 is cast to "1" before being looked up in the mapper of "a".

MapInstanceValues(mappers={"a": {"1": "hi", "2": "bye"}}, process_every_value=True): assuming field "a" is a list of values, potentially including "1"s and "2"s, this replaces each "1" with "hi" and each "2" with "bye" in all instances of all streams: instance {"a": ["1", "2"], "b": 2} becomes {"a": ["hi", "bye"], "b": 2}.

MapInstanceValues(mappers={"a": {"1": "hi", "2": "bye"}}, strict=True): to ensure that all values of field "a" are mapped in every instance, use strict=True. With this call, the input instance {"a": "3", "b": 2} raises an exception, because "3" is not a key in the mapper of "a".

MapInstanceValues(mappers={"a": {str([1,2,3,4]): "All", str([]): "None"}}, strict=True) replaces the list [1, 2, 3, 4] with the string "All" and an empty list with the string "None".
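The behavior in these examples can be approximated with the plain-Python sketch below. The function `map_instance_values` is a hypothetical stand-in, not unitxt's implementation. It looks values up by `str(value)`, raises `KeyError` for unmapped values when `strict=True`, keeps them unchanged otherwise, and maps list elements individually when `process_every_value=True`. The default values of its keyword arguments are this sketch's own choice and may differ from unitxt's.

```python
from typing import Any, Dict


def map_instance_values(
    instance: Dict[str, Any],
    mappers: Dict[str, Dict[str, Any]],
    *,
    strict: bool = False,
    process_every_value: bool = False,
) -> Dict[str, Any]:
    """Hypothetical stand-in for MapInstanceValues on a single instance."""
    result = dict(instance)  # shallow copy; fields not in mappers are kept

    def map_one(value: Any, mapper: Dict[str, Any]) -> Any:
        key = str(value)  # lookup uses the string representation
        if key in mapper:
            return mapper[key]
        if strict:
            raise KeyError(f"{key!r} not found in mapper {mapper}")
        return value  # non-strict: unmapped values pass through

    for field, mapper in mappers.items():
        value = result[field]
        if process_every_value:
            if not isinstance(value, list):
                raise ValueError(f"field {field!r} must be a list")
            result[field] = [map_one(v, mapper) for v in value]
        else:
            result[field] = map_one(value, mapper)
    return result


m = {"a": {"1": "hi", "2": "bye"}}
print(map_instance_values({"a": 1, "b": 2}, m))
# {'a': 'hi', 'b': 2}
print(map_instance_values({"a": ["1", "2"], "b": 2}, m, process_every_value=True))
# {'a': ['hi', 'bye'], 'b': 2}
lists = {"a": {str([1, 2, 3, 4]): "All", str([]): "None"}}
print(map_instance_values({"a": []}, lists, strict=True))
# {'a': 'None'}

try:
    map_instance_values({"a": "3", "b": 2}, m, strict=True)
except KeyError as err:
    print("KeyError:", err)
```

Looking keys up by `str(value)` is what lets a list such as [1, 2, 3, 4] act as a key: its string form "[1, 2, 3, 4]" matches `str([1,2,3,4])` in the mapper.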
