unitxt.metric_utils module

class unitxt.metric_utils.FromPredictionsAndOriginalData(__tags__: Dict[str, str] = {}, data_classification_policy: List[str] = None, caching: bool = None)

Bases: StreamInitializerOperator

class unitxt.metric_utils.InstanceInput(references: List[Any], additional_inputs: Dict | None = None)

Bases: Dataclass

A single input instance passed to a metrics service.

class unitxt.metric_utils.MetricRecipe(__tags__: Dict[str, str] = {}, data_classification_policy: List[str] = None, max_steps: int | None = None, steps: List[StreamingOperator], caching: bool = None, calc_confidence_intervals: bool = True, number_of_fusion_generations: int = 2)

Bases: SequentialOperatorInitializer

class unitxt.metric_utils.MetricRequest

Bases: Dataclass

A request to a metrics service; it includes a list of input instances.

class unitxt.metric_utils.MetricResponse(global_score: Dict[str, Any])

Bases: Dataclass

A response produced by a metrics service; it includes the computed scores.

class unitxt.metric_utils.MultiStreamScoreMean(__tags__: Dict[str, str] = {}, data_classification_policy: List[str] = None, caching: bool = None)

Bases: MultiStreamOperator

Given a multi-stream where each stream is already scored globally, generate a nested global score for the whole multi-stream.

The global score of the whole multi-stream (the whole-ms-global-score) is a nested structure that also contains the global scores of the individual streams in the input multi-stream. The instances of each stream are assumed to carry a “group” field that indicates the stream they belong to. These streams may have been produced by a SplitByNestedGroup operator that used only the first g components of the “group” value rather than its full length, where g is given by the ‘number_of_fusion_generations’ argument of SplitByNestedGroup. In either case, SplitByNestedGroup records a distinguishing prefix of the “group” value in the stream_name.

These distinguishing prefixes induce the nested structure of the whole-ms-global-score: the global score of each stream is placed in the nested dictionary at the leaf reached by following the path given by the prefix in its stream_name. The global score of a stream thus becomes a leaf of the whole-ms-global-score, although it is itself a dict.

Each ancestor node of these leaves in the whole-ms-global-score contains, in addition to the dicts leading down to the leaves, a field named “score”, set to the mean of the “score” values of its immediate child nodes, and a field named “score_name”, set to “group_mean”.

When the input multi-stream consists of a single stream, it is returned as is, mainly for backward compatibility.
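
The nesting and averaging described above can be illustrated with a minimal standalone sketch. It does not use unitxt itself and is not the library's implementation: the helper names nest_stream_scores and add_group_means, the "/" separator between components of a stream name, and the sample stream names are all assumptions made for illustration.

```python
from typing import Any, Dict


def nest_stream_scores(
    stream_scores: Dict[str, Dict[str, Any]], sep: str = "/"
) -> Dict[str, Any]:
    """Place each stream's global score at the leaf addressed by its name.

    `sep` is an assumed separator between the components of the
    distinguishing prefix recorded in the stream name.
    """
    root: Dict[str, Any] = {}
    for stream_name, global_score in stream_scores.items():
        *ancestors, leaf = stream_name.split(sep)
        node = root
        for part in ancestors:
            node = node.setdefault(part, {})
        node[leaf] = dict(global_score)  # copy, so the input is not mutated
    return root


def add_group_means(node: Dict[str, Any]) -> float:
    """Fill in "score" and "score_name" on ancestor nodes, bottom-up.

    A node that already has a "score" field is treated as a leaf, that is,
    the global score of one stream. Returns the score of `node`.
    """
    if "score" in node:
        return node["score"]
    child_scores = [
        add_group_means(child) for child in node.values() if isinstance(child, dict)
    ]
    node["score"] = sum(child_scores) / len(child_scores)
    node["score_name"] = "group_mean"
    return node["score"]


if __name__ == "__main__":
    # Hypothetical per-stream global scores, keyed by stream name.
    scores = {
        "a/x": {"score": 0.5, "score_name": "accuracy"},
        "a/y": {"score": 0.75, "score_name": "accuracy"},
        "b": {"score": 0.875, "score_name": "f1"},
    }
    nested = nest_stream_scores(scores)
    add_group_means(nested)
    print(nested)
```

In this sketch the node "a" receives the mean of its two leaves, and the root receives the mean of "a" and "b"; the leaves keep their original “score_name”.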

class unitxt.metric_utils.PostProcessRecipe(__tags__: Dict[str, str] = {}, data_classification_policy: List[str] = None, max_steps: int | None = None, steps: List[StreamingOperator], caching: bool = None)

Bases: SequentialOperatorInitializer

unitxt.metric_utils.get_remote_metrics_endpoint() → str

Load the remote metrics endpoint from an environment variable.

Returns:

str - The remote endpoint on which the remote metrics are available.

unitxt.metric_utils.get_remote_metrics_names() → List[str]

Load the remote metrics names from an environment variable.

Returns:

List[str] - names of metrics to be executed remotely.
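
The general pattern behind the two functions above, reading a setting from an environment variable and validating it, can be sketched as follows. This is not unitxt's code: the variable names EXAMPLE_REMOTE_METRICS_ENDPOINT and EXAMPLE_REMOTE_METRICS, the JSON-list encoding of the metric names, and the helper names are hypothetical, since this page does not state which variables or format the library uses.

```python
import json
import os
from typing import List, Mapping

# Hypothetical variable names, chosen for this sketch only.
ENDPOINT_VAR = "EXAMPLE_REMOTE_METRICS_ENDPOINT"
NAMES_VAR = "EXAMPLE_REMOTE_METRICS"


def load_endpoint(env: Mapping[str, str] = os.environ) -> str:
    """Return the remote metrics endpoint, or raise if it is not set."""
    endpoint = env.get(ENDPOINT_VAR)
    if not endpoint:
        raise RuntimeError(f"Environment variable {ENDPOINT_VAR} is not set.")
    return endpoint


def load_metric_names(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the metric names to run remotely (assumed to be a JSON list)."""
    names = json.loads(env.get(NAMES_VAR, "[]"))
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise RuntimeError(f"{NAMES_VAR} must hold a JSON list of strings.")
    return names
```

Passing the environment as a parameter keeps the helpers easy to test with a plain dict in place of os.environ.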