📄 Jaccard Index Words

A JaccardIndex metric that operates on predictions and references that are strings.

It first splits each string into words, using whitespace as the separator (the splitter's regular expression is \s+).

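For each prediction, it calculates the ratio |Intersect(prediction_words, reference_words)| / |Union(prediction_words, reference_words)|, that is, the number of words shared by both divided by the number of distinct words appearing in either. If multiple references exist, it takes the best ratio achieved by any one of the references.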
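The calculation described above can be sketched in plain Python. This is an illustrative stand-in rather than the unitxt implementation: the function name `jaccard_index_words` is hypothetical, and three details are assumptions (words are compared as sets, empty tokens from leading or trailing whitespace are dropped, and two empty strings score 1.0).

```python
import re
from typing import List


def jaccard_index_words(prediction: str, references: List[str]) -> float:
    """Best word-level Jaccard ratio of a prediction against any reference.

    Hypothetical helper for illustration; not part of unitxt.
    """
    # Split on runs of whitespace (the same pattern as the catalog entry)
    # and drop empty tokens produced by leading/trailing whitespace.
    pred_words = {w for w in re.split(r"\s+", prediction) if w}

    best = 0.0
    for reference in references:
        ref_words = {w for w in re.split(r"\s+", reference) if w}
        union = pred_words | ref_words
        # Assumption: when both sides are empty, treat them as identical.
        score = len(pred_words & ref_words) / len(union) if union else 1.0
        # Keep the best ratio achieved by any single reference.
        best = max(best, score)
    return best


score = jaccard_index_words("the cat sat", ["the cat ran", "a dog"])
print(score)
```

Here "the cat sat" shares two words with "the cat ran" out of four distinct words, and none with "a dog", so the first reference determines the result.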

metrics.jaccard_index_words

JaccardIndexString(
    splitter=RegexSplit(
        by="\s+",
    ),
)

from unitxt.string_operators import RegexSplit

Explanation about JaccardIndexString

Calculates JaccardIndex on strings.

Requires setting the 'splitter' to a FieldOperator (such as Split or RegexSplit) that tokenizes the predictions and references into lists of string tokens.

These tokens are passed to the JaccardIndex as lists.
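To illustrate why the splitter is configurable, the following minimal sketch separates tokenization from scoring. It does not use unitxt: `jaccard_index`, `jaccard_index_string`, `whitespace_split`, and `comma_split` are hypothetical names, ordinary callables stand in for a FieldOperator, and comparing the token lists as sets is an assumption.

```python
import re
from typing import Callable, List


def jaccard_index(prediction: List[str], reference: List[str]) -> float:
    """Jaccard ratio of two token lists, compared as sets (assumption)."""
    pred, ref = set(prediction), set(reference)
    union = pred | ref
    # Assumption: two empty token lists count as a perfect match.
    return len(pred & ref) / len(union) if union else 1.0


def jaccard_index_string(
    prediction: str,
    reference: str,
    splitter: Callable[[str], List[str]],
) -> float:
    """Tokenize both strings with the same splitter, then score the lists."""
    return jaccard_index(splitter(prediction), splitter(reference))


def whitespace_split(text: str) -> List[str]:
    # Stand-in for a regex splitter configured with "\s+".
    return re.split(r"\s+", text.strip())


def comma_split(text: str) -> List[str]:
    # Stand-in for a splitter that separates on commas.
    return [part.strip() for part in text.split(",")]


by_comma = jaccard_index_string("red, green", "green, blue", comma_split)
by_space = jaccard_index_string("red, green", "green, blue", whitespace_split)
print(by_comma, by_space)
```

The same pair of strings scores differently under the two splitters: splitting on commas yields one shared token out of three, while splitting on whitespace leaves the commas attached to the tokens, so nothing matches.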

Read more about catalog usage here.