📄 Ndcg

metrics.ndcg

MetricPipeline(
    main_score="nDCG",
    single_reference_per_prediction=True,
    preprocess_steps=[
        CastFields(
            fields={
                "prediction": "float",
                "references/0": "float",
            },
            failure_defaults={
                "prediction": None,
            },
        ),
    ],
    metric=NDCG(),
)

from unitxt.metrics import NDCG
from unitxt.operators import CastFields

Explanation about CastFields

Casts specified fields to specified types.

Args:
fields (Dict[str, str]):

A dictionary mapping field names to the names of the types to cast the fields to. Use basic type names, e.g. "int", "str", "float", "bool".

failure_defaults (Dict[str, object]):

A dictionary mapping field names to default values for cases of casting failure.

process_every_value (bool):

If true, every field involved must contain a list, and each value in the list is cast. Defaults to False.

Example:
CastFields(
    fields={"a/d": "float", "b": "int"},
    failure_defaults={"a/d": 0.0, "b": 0},
    process_every_value=True,
)

would process the input instance: {"a": {"d": ["half", "0.6", 1, 12]}, "b": ["2"]} into {"a": {"d": [0.0, 0.6, 1.0, 12.0]}, "b": [2]}.
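The casting behaviour described above can be approximated in plain Python. This is a minimal sketch, not the unitxt implementation: `cast_fields_sketch`, `cast_value`, and the `TYPES` table are hypothetical names introduced for illustration, and re-raising the error when a field has no failure default is an assumption.

```python
import copy
from typing import Any, Dict

# Hypothetical table from type names to Python constructors (an assumption,
# not unitxt's internal lookup). "bool" is left out on purpose: in plain
# Python, bool("False") is True, so it needs special handling.
TYPES = {"int": int, "float": float, "str": str}


def cast_value(value: Any, type_name: str, path: str,
               failure_defaults: Dict[str, Any]) -> Any:
    """Cast one value; on failure use the default registered for `path`."""
    try:
        return TYPES[type_name](value)
    except (ValueError, TypeError):
        if path not in failure_defaults:
            raise  # assumption: no default means the failure propagates
        return failure_defaults[path]


def _key(container: Any, key: str) -> Any:
    """Treat a path segment as an index when the container is a list."""
    return int(key) if isinstance(container, list) else key


def cast_fields_sketch(instance: Dict[str, Any], fields: Dict[str, str],
                       failure_defaults: Dict[str, Any],
                       process_every_value: bool = False) -> Dict[str, Any]:
    """Return a copy of `instance` with each "/"-separated path in `fields` cast."""
    result = copy.deepcopy(instance)
    for path, type_name in fields.items():
        *parents, leaf = path.split("/")
        container = result
        for segment in parents:
            container = container[_key(container, segment)]
        leaf_key = _key(container, leaf)
        if process_every_value:
            # The field must hold a list; cast every element separately.
            container[leaf_key] = [
                cast_value(v, type_name, path, failure_defaults)
                for v in container[leaf_key]
            ]
        else:
            container[leaf_key] = cast_value(
                container[leaf_key], type_name, path, failure_defaults
            )
    return result


example = cast_fields_sketch(
    {"a": {"d": ["half", "0.6", 1, 12]}, "b": ["2"]},
    fields={"a/d": "float", "b": "int"},
    failure_defaults={"a/d": 0.0, "b": 0},
    process_every_value=True,
)
```

In the `metrics.ndcg` pipeline, the same idea applies to single values: a prediction that cannot be parsed as a float is replaced by its failure default of `None`, while `references/0` is cast with no fallback.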

Explanation about NDCG

Normalized Discounted Cumulative Gain: measures the quality of ranking with respect to ground truth ranking scores.

Because nDCG measures ranking, it is a global metric that can only be calculated over groups of instances. In the common use case, instances are grouped by query: the task is to provide a relevance score for a search result with respect to a query. An nDCG score is then calculated for each query (specified in the "query" input field of an instance), and the final score is the average across all queries.

Note that the expected scores are relevance scores (i.e., higher is better), not rank indices. The absolute values matter only for the reference scores; for the predictions, only the ordering of the scores affects the outcome. For example, predicted scores of [80, 1, 2] and [0.8, 0.5, 0.6] receive the same nDCG score with respect to a given set of reference scores.
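The per-query computation can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the unitxt `NDCG` class: it uses the common linear gain with a log2 position discount, breaks ties between equal predictions by input order, and the function names and the "reference" and "prediction" keys are hypothetical (only the "query" field comes from the description above).

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain: relevance at 0-based rank r is divided by log2(r + 2)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(reference_scores: Sequence[float], predicted_scores: Sequence[float]) -> float:
    """nDCG of the ranking induced by the predictions, judged by the references."""
    # Only the ordering of the predicted scores is used, never their magnitude.
    order = sorted(
        range(len(predicted_scores)),
        key=lambda i: predicted_scores[i],
        reverse=True,
    )
    ideal_dcg = dcg(sorted(reference_scores, reverse=True))
    if ideal_dcg == 0:
        return 0.0  # assumption: a query with no relevant items scores 0
    return dcg([reference_scores[i] for i in order]) / ideal_dcg


def mean_ndcg_per_query(instances: List[Dict]) -> float:
    """Group instances by their "query" field, score each group, then average."""
    groups = defaultdict(lambda: ([], []))
    for instance in instances:
        refs, preds = groups[instance["query"]]
        refs.append(instance["reference"])
        preds.append(instance["prediction"])
    scores = [ndcg(refs, preds) for refs, preds in groups.values()]
    return sum(scores) / len(scores)


references = [1, 3, 2]
score_a = ndcg(references, [80, 1, 2])        # predicted order: item 0, 2, 1
score_b = ndcg(references, [0.8, 0.5, 0.6])   # same predicted order
perfect = ndcg(references, [0.1, 0.9, 0.5])   # predicted order matches the ideal
```

Here `score_a` and `score_b` are equal because both prediction lists rank the items in the same order, while `perfect` reaches the maximum because its ordering matches the reference ordering.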

See also https://en.wikipedia.org/wiki/Discounted_cumulative_gain

Read more about catalog usage here.