unitxt.metric_utils module¶
- class unitxt.metric_utils.DeleteTargetPrefix(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None)[source]¶
Bases:
InstanceOperator, ArtifactFetcherMixin
- class unitxt.metric_utils.EvaluationResults(*args, metadata=None, **kwargs)[source]¶
Bases:
list
- property global_scores¶
- property groups_scores¶
- property instance_scores: InstanceScores¶
- property subsets_scores¶
- class unitxt.metric_utils.FromPredictionsAndOriginalData(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None)[source]¶
Bases:
StreamInitializerOperator
- class unitxt.metric_utils.GlobalScores[source]¶
Bases:
dict
GlobalScores is a dictionary-based class designed to handle and transform metric results into a structured format.
- Parameters:
score (float) – The main score value.
score_name (str) – The name of the main score.
- property score¶
- property score_name¶
- property summary¶
- to_df()[source]¶
Transforms a dictionary of results into a pandas DataFrame with score_name as the index and columns for score, ci_low, and ci_high. Handles cases where confidence intervals are missing.
- Returns:
A dataframe with the extracted information, indexed by score_name.
- Return type:
pd.DataFrame
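As a rough illustration of the transformation described for to_df, the sketch below uses plain Python rather than pandas. It assumes a flat results dictionary in which confidence bounds are stored under `<name>_ci_low` and `<name>_ci_high` keys. That key layout and the helper name `scores_to_rows` are assumptions made for illustration, not the library's implementation.

```python
from typing import Any, Dict, Optional


def scores_to_rows(results: Dict[str, Any]) -> Dict[str, Dict[str, Optional[float]]]:
    """Build one row per score name with score, ci_low and ci_high.

    Assumed layout (hypothetical): numeric scores sit under their own name,
    and optional bounds sit under "<name>_ci_low" / "<name>_ci_high".
    """
    rows = {}
    for key, value in results.items():
        # Skip the bound entries themselves; they are attached to their score.
        if key.endswith(("_ci_low", "_ci_high")):
            continue
        # Skip non-numeric metadata such as a score name string.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        rows[key] = {
            "score": value,
            # Missing confidence intervals become None instead of failing.
            "ci_low": results.get(f"{key}_ci_low"),
            "ci_high": results.get(f"{key}_ci_high"),
        }
    return rows


example = {
    "accuracy": 0.8,
    "accuracy_ci_low": 0.7,
    "accuracy_ci_high": 0.9,
    "f1": 0.75,  # no confidence interval available
    "score_name": "accuracy",  # non-numeric entry, skipped
}
rows = scores_to_rows(example)
```

Passing `rows` to `pandas.DataFrame.from_dict(rows, orient="index")` would then give a frame indexed by score name, with empty cells where no interval was available.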
- class unitxt.metric_utils.GroupsScores[source]¶
Bases:
dict
A dictionary subclass to store and manage group scores.
This class provides a property to summarize the scores and a custom string representation for pretty-printing.
- property summary¶
A property to get a summary of the group scores.
- class unitxt.metric_utils.InstanceInput(prediction: Any = __required__, references: List[Any] = __required__, additional_inputs: Dict | NoneType = None)[source]¶
Bases:
Dataclass
A single instance passed as input to a metric service.
- class unitxt.metric_utils.InstanceScores(instances)[source]¶
Bases:
list
- property summary¶
- to_df(flatten=True, columns=None)[source]¶
Transforms the stored results into a pandas DataFrame.
- Parameters:
flatten (bool, optional) – Determines whether to use the flattened list of results (self) or the original instances (self.original_instances). Defaults to True.
columns (list, optional) – A list of column names to select from the resulting DataFrame. If None, all columns are included. Defaults to None.
- Returns:
A DataFrame containing the transformed results. If columns is specified, only the specified columns are included.
- Return type:
pandas.DataFrame
- Raises:
KeyError – If any specified column in columns does not exist in the DataFrame.
- class unitxt.metric_utils.JoinSubsetsAndGroups(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None)[source]¶
Bases:
MultiStreamOperator
- class unitxt.metric_utils.MetricRecipe(data_classification_policy: List[str] = None, max_steps: int | NoneType = None, steps: List[unitxt.operator.StreamingOperator] = [], _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, calc_confidence_intervals: bool = True, subset_depth: int = 2)[source]¶
- class unitxt.metric_utils.MetricRequest(instance_inputs: List[unitxt.metric_utils.InstanceInput] = __required__)[source]¶
Bases:
Dataclass
A request to a metrics service, includes a list of input instances.
- class unitxt.metric_utils.MetricResponse(instances_scores: List[Dict[str, Any]] = __required__, global_score: Dict[str, Any] = __required__)[source]¶
Bases:
Dataclass
A response produced by a metrics service, includes the computed scores.
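The request and response shapes documented above can be sketched with standard-library dataclasses. The field names below mirror the documented signatures of InstanceInput, MetricRequest, and MetricResponse, but the `*Sketch` classes are stand-ins written for illustration (unitxt uses its own Dataclass base), and the exact-match scorer is a hypothetical placeholder for a real metric.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class InstanceInputSketch:
    prediction: Any
    references: List[Any]
    additional_inputs: Optional[Dict] = None


@dataclass
class MetricRequestSketch:
    instance_inputs: List[InstanceInputSketch]


@dataclass
class MetricResponseSketch:
    instances_scores: List[Dict[str, Any]]
    global_score: Dict[str, Any]


request = MetricRequestSketch(
    instance_inputs=[
        InstanceInputSketch(prediction="cat", references=["cat", "kitten"]),
        InstanceInputSketch(prediction="dog", references=["cat"]),
    ]
)

# Stand-in scorer: 1.0 if the prediction equals any reference, else 0.0.
instance_scores = [
    {"exact_match": float(item.prediction in item.references)}
    for item in request.instance_inputs
]

response = MetricResponseSketch(
    instances_scores=instance_scores,
    global_score={
        "exact_match": sum(s["exact_match"] for s in instance_scores)
        / len(instance_scores)
    },
)

# Plain nested dicts, e.g. for JSON serialization when calling a service.
payload = asdict(request)
```

The response carries one score dictionary per input instance plus a single aggregated dictionary, which matches the two documented fields instances_scores and global_score.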
- class unitxt.metric_utils.PostProcessRecipe(data_classification_policy: List[str] = None, max_steps: int | NoneType = None, steps: List[unitxt.operator.StreamingOperator] = [], _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None)[source]¶
- class unitxt.metric_utils.SplitSubsetsAndGroups(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, subsets_field: str = 'subset', groups_field: str = 'groups', subset_depth: int | NoneType = None)[source]¶
Bases:
MultiStreamOperator
Splits a MultiStream by the value of the field ‘group’. The MultiStream is assumed to be small (it is used for metrics), so the whole stream can sit in memory.
- Parameters:
number_of_fusion_generations – int
The value in the field group has the form “source_n/source_n-1/…”, describing the sources in which the instance sat when these were fused, potentially over several phases of fusion. The name of the most recent source comes first in this value. (See BaseFusion and its extensions.) subset_depth specifies the depth of the prefix by which to split the stream.
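The prefix-based split can be sketched in plain Python as follows. This is a minimal illustration, assuming each instance stores its fusion path as a slash-separated string with the most recent source first. The helper name `split_by_subset_prefix` and the dictionary-of-lists return type are hypothetical and do not reflect the operator's actual implementation.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


def split_by_subset_prefix(
    instances: List[Dict[str, Any]],
    subsets_field: str = "subset",
    subset_depth: Optional[int] = None,
) -> Dict[str, List[Dict[str, Any]]]:
    """Group instances by the first `subset_depth` components of their path.

    With subset_depth=None the full path is used as the key.
    """
    streams = defaultdict(list)
    for instance in instances:
        parts = instance[subsets_field].split("/")
        prefix = parts if subset_depth is None else parts[:subset_depth]
        streams["/".join(prefix)].append(instance)
    return dict(streams)


instances = [
    {"subset": "outer_a/inner_x", "score": 1.0},
    {"subset": "outer_a/inner_y", "score": 0.0},
    {"subset": "outer_b/inner_x", "score": 1.0},
]

by_outer = split_by_subset_prefix(instances, subset_depth=1)
by_full_path = split_by_subset_prefix(instances)
```

With a depth of 1 the two `outer_a` instances land in the same stream, while the full path keeps all three apart. Holding every stream in a dictionary is only reasonable because the streams are small enough to fit in memory.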
- unitxt.metric_utils.get_remote_metrics_endpoint() → str[source]¶
Load the remote metrics endpoint from an environment variable.
- Returns:
str - The remote endpoint on which the remote metrics are available.
- unitxt.metric_utils.get_remote_metrics_names() → List[str][source]¶
Load the remote metrics names from an environment variable.
- Returns:
List[str] - names of metrics to be executed remotely.
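Loading a list of names from an environment variable might look like the sketch below. The variable name `EXAMPLE_REMOTE_METRICS`, the JSON encoding of the list, and the empty-list fallback are all assumptions made for illustration. The page above does not state which variable or format the library actually uses.

```python
import json
import os
from typing import List

ENV_VAR = "EXAMPLE_REMOTE_METRICS"  # hypothetical variable name


def load_remote_metric_names(env_var: str = ENV_VAR) -> List[str]:
    """Read a JSON-encoded list of metric names from an environment variable."""
    raw = os.environ.get(env_var)
    if raw is None:
        # Assumed fallback: no variable means no metrics run remotely.
        return []
    names = json.loads(raw)
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError(f"{env_var} must hold a JSON list of strings, got: {raw!r}")
    return names


os.environ[ENV_VAR] = '["metrics.rouge", "metrics.bleu"]'
names = load_remote_metric_names()
```

Validating the parsed value up front keeps a malformed setting from surfacing later as a confusing failure inside the metric pipeline.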
- unitxt.metric_utils.group_str(json_str)¶
- unitxt.metric_utils.group_str_to_key_value(group_str)¶
- unitxt.metric_utils.stream_name_to_origin_subset_group(stream_name)¶