unitxt.metric_utils module¶

class unitxt.metric_utils.DeleteTargetPrefix(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None)[source]¶

Bases: InstanceOperator, ArtifactFetcherMixin

class unitxt.metric_utils.EmptyPrediction[source]¶

Bases: object

class unitxt.metric_utils.EvaluationResults(*args, metadata=None, **kwargs)[source]¶

Bases: list

property global_scores¶

property groups_scores¶

property instance_scores: InstanceScores¶

property subsets_scores¶

class unitxt.metric_utils.FromPredictionsAndOriginalData(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None)[source]¶

Bases: StreamInitializerOperator

class unitxt.metric_utils.GlobalScores[source]¶

Bases: dict

GlobalScores is a dictionary-based class designed to handle and transform metric results into a structured format.

Parameters:

score (float) – The main score value.
score_name (str) – The name of the main score.

property score¶

property score_name¶

property summary¶

to_df()[source]¶

Transforms a dictionary of results into a pandas DataFrame with score_name as the index and columns for score, ci_low, and ci_high. Handles cases where confidence intervals are missing.

Returns:

A DataFrame with the extracted information, indexed by score_name.

Return type:

pd.DataFrame
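The transformation described for to_df() can be illustrated with a minimal sketch. It is not the library's implementation: the helper name global_scores_to_df, the example score names, and the assumed key layout (confidence intervals stored under "<name>_ci_low" and "<name>_ci_high") are all assumptions made for illustration.

```python
import pandas as pd


def global_scores_to_df(results: dict) -> pd.DataFrame:
    """Hypothetical stand-in for the transformation described above."""
    rows = []
    for name, value in results.items():
        # Skip the bookkeeping keys and the confidence-interval entries themselves.
        if name in ("score", "score_name") or name.endswith(("_ci_low", "_ci_high")):
            continue
        if not isinstance(value, (int, float)):
            continue
        rows.append(
            {
                "score_name": name,
                "score": value,
                # dict.get returns None when no interval was stored for this score.
                "ci_low": results.get(f"{name}_ci_low"),
                "ci_high": results.get(f"{name}_ci_high"),
            }
        )
    columns = ["score_name", "score", "ci_low", "ci_high"]
    return pd.DataFrame(rows, columns=columns).set_index("score_name")


# Assumed input layout: "f1" has no confidence interval.
example = {
    "accuracy": 0.8,
    "accuracy_ci_low": 0.7,
    "accuracy_ci_high": 0.9,
    "f1": 0.5,
    "score": 0.8,
    "score_name": "accuracy",
}
df = global_scores_to_df(example)
```

In this sketch a score without a stored interval simply gets missing values in ci_low and ci_high, which is one way to handle the missing-interval case mentioned above.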

class unitxt.metric_utils.GroupsScores[source]¶

Bases: dict

A dictionary subclass to store and manage group scores.

This class provides a property to summarize the scores and a custom string representation for pretty-printing.

property summary¶

A property to get a summary of the group scores.

class unitxt.metric_utils.InstanceInput(prediction: Any = __required__, references: List[Any] = __required__, additional_inputs: Dict | NoneType = None)[source]¶

Bases: Dataclass

A single instance passed as input to a metrics service.

class unitxt.metric_utils.InstanceScores(instances)[source]¶

Bases: list

property summary¶

to_df(flatten=True, columns=None)[source]¶

Transforms the stored results into a pandas DataFrame.

Parameters:

flatten (bool, optional) – Determines whether to use the flattened list of results (self) or the original instances (self.original_instances). Defaults to True.
columns (list, optional) – A list of column names to select from the resulting DataFrame. If None, all columns are included. Defaults to None.

Returns:

A DataFrame containing the transformed results. If columns is specified, only the specified columns are included.

Return type:

pandas.DataFrame

Raises:

KeyError – If any specified column in columns does not exist in the DataFrame.
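The flatten and columns behavior documented above can be sketched with plain pandas. This is an illustration under assumptions, not the library's code: the helper name instances_to_df, the example records, and the dotted key format used for the flattened records are hypothetical.

```python
import pandas as pd


def instances_to_df(flat_instances, original_instances, flatten=True, columns=None):
    """Hypothetical stand-in: pick a record list, build a DataFrame, select columns."""
    records = flat_instances if flatten else original_instances
    df = pd.DataFrame(records)
    if columns is not None:
        # Selecting with a list of labels raises KeyError if any label is missing.
        df = df[columns]
    return df


# Assumed shapes: the flattened records use dotted keys, the originals are nested.
flat = [
    {"prediction": "a", "score.accuracy": 1.0},
    {"prediction": "b", "score.accuracy": 0.0},
]
original = [
    {"prediction": "a", "score": {"accuracy": 1.0}},
    {"prediction": "b", "score": {"accuracy": 0.0}},
]

selected = instances_to_df(flat, original, columns=["prediction"])
nested = instances_to_df(flat, original, flatten=False)
```

Requesting a column that does not exist, for example columns=["missing"], raises KeyError in this sketch, which matches the documented behavior.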

to_markdown(flatten=True, columns=None, max_col_width=30, **kwargs)[source]¶

class unitxt.metric_utils.JoinSubsetsAndGroups(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None)[source]¶

Bases: MultiStreamOperator

class unitxt.metric_utils.MetricRecipe(data_classification_policy: List[str] = None, max_steps: int | NoneType = None, steps: List[unitxt.operator.StreamingOperator] = [], _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, calc_confidence_intervals: bool = True, subset_depth: int = 2)[source]¶

Bases: SequentialOperatorInitializer

class unitxt.metric_utils.MetricRequest(instance_inputs: List[unitxt.metric_utils.InstanceInput] = __required__)[source]¶

Bases: Dataclass

A request to a metrics service, containing a list of input instances.

class unitxt.metric_utils.MetricResponse(instances_scores: List[Dict[str, Any]] = __required__, global_score: Dict[str, Any] = __required__)[source]¶

Bases: Dataclass

A response produced by a metrics service, containing the computed scores.
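The shape of a request and its response can be mirrored with standard-library dataclasses. This is a sketch only: the Sketch* classes below are hypothetical stand-ins that copy the field names from the signatures above; they are not the unitxt classes, which derive from the library's own Dataclass base. The "accuracy" score name and the JSON serialization step are likewise illustrative assumptions.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SketchInstanceInput:
    prediction: Any
    references: List[Any]
    additional_inputs: Optional[Dict] = None


@dataclass
class SketchMetricRequest:
    instance_inputs: List[SketchInstanceInput]


@dataclass
class SketchMetricResponse:
    instances_scores: List[Dict[str, Any]]
    global_score: Dict[str, Any]


# One input instance: a prediction together with its references.
request = SketchMetricRequest(
    instance_inputs=[SketchInstanceInput(prediction="Paris", references=["Paris"])]
)
# asdict recurses into nested dataclasses, so the result is JSON-serializable.
payload = json.dumps(asdict(request))

# A response carries one score dict per instance plus a global score dict.
response = SketchMetricResponse(
    instances_scores=[{"accuracy": 1.0}],
    global_score={"accuracy": 1.0},
)
```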

class unitxt.metric_utils.PostProcessRecipe(data_classification_policy: List[str] = None, max_steps: int | NoneType = None, steps: List[unitxt.operator.StreamingOperator] = [], _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None)[source]¶

Bases: SequentialOperatorInitializer

class unitxt.metric_utils.SplitSubsetsAndGroups(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, subsets_field: str = 'subset', groups_field: str = 'groups', subset_depth: int | NoneType = None)[source]¶

Bases: MultiStreamOperator

Splits a MultiStream by the value of the field ‘group’. Intended for metrics, where the MultiStream is small enough for the whole stream to sit in memory.

Parameters:

number_of_fusion_generations (int)

The value in the field ‘group’ has the form “sourcen/sourcenminus1/…”, describing the sources in which the instance sat when they were fused, potentially over several phases of fusion. The name of the most recent source comes first in this value (see BaseFusion and its extensions). subset_depth specifies the depth of the prefix by which to split the stream.
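Splitting by a prefix of a given depth can be sketched in plain Python. The sketch assumes each instance is a dict whose field holds a "/"-separated string with the most recent source first, as described above; the helper name split_by_prefix and the example source names are hypothetical, and the real operator works on a MultiStream rather than a list.

```python
from collections import defaultdict


def split_by_prefix(instances, field="group", depth=None):
    """Hypothetical helper: bucket instances by the first `depth` path components."""
    buckets = defaultdict(list)
    for instance in instances:
        parts = instance[field].split("/")
        # parts[:None] keeps every component, so depth=None means no truncation.
        key = "/".join(parts[:depth])
        buckets[key].append(instance)
    return dict(buckets)


# Illustrative instances; the most recent source comes first in each value.
instances = [
    {"id": 0, "group": "c/b/a"},
    {"id": 1, "group": "c/b/x"},
    {"id": 2, "group": "d/b/a"},
]

by_depth_1 = split_by_prefix(instances, depth=1)
by_depth_2 = split_by_prefix(instances, depth=2)
by_full_path = split_by_prefix(instances)
```

With depth=1 the first two instances share the bucket "c"; with no depth given, every distinct full path gets its own bucket.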

class unitxt.metric_utils.SubsetsScores[source]¶

Bases: dict

property summary¶

unitxt.metric_utils.empty_predictions_generator()[source]¶

unitxt.metric_utils.get_remote_metrics_endpoint() → str[source]¶

Load the remote metrics endpoint from an environment variable.

Returns:

The remote endpoint on which the remote metrics are available.

Return type:

str

unitxt.metric_utils.get_remote_metrics_names() → List[str][source]¶

Load the remote metrics names from an environment variable.

Returns:

Names of the metrics to be executed remotely.

Return type:

List[str]
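Reading such settings from environment variables might look like the sketch below. The variable names, the helper names, and the assumption that the metric names are stored as a JSON-encoded list are all hypothetical; this page does not state which variables the library reads or how their values are encoded.

```python
import json
import os
from typing import List, Mapping

# Hypothetical variable names, chosen for this illustration only.
ENDPOINT_VAR = "EXAMPLE_REMOTE_METRICS_ENDPOINT"
NAMES_VAR = "EXAMPLE_REMOTE_METRICS"


def read_endpoint(env: Mapping[str, str] = os.environ) -> str:
    """Return the endpoint, failing loudly if the variable is unset or empty."""
    value = env.get(ENDPOINT_VAR)
    if not value:
        raise RuntimeError(f"{ENDPOINT_VAR} is not set")
    return value


def read_metric_names(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the metric names, assuming a JSON-encoded list of strings."""
    names = json.loads(env.get(NAMES_VAR, "[]"))
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise RuntimeError(f"{NAMES_VAR} must hold a JSON list of strings")
    return names


# A plain dict stands in for the process environment in this example.
fake_env = {
    ENDPOINT_VAR: "http://localhost:8000/compute",
    NAMES_VAR: '["metrics.accuracy", "metrics.f1"]',
}
endpoint = read_endpoint(fake_env)
names = read_metric_names(fake_env)
```

Passing the environment as a parameter keeps the helpers easy to test without modifying the real process environment.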

unitxt.metric_utils.group_str(json_str)¶

unitxt.metric_utils.group_str_to_key_value(group_str)¶

unitxt.metric_utils.nan_mean(scores)[source]¶

unitxt.metric_utils.stream_name_to_origin_subset_group(stream_name)¶