unitxt.metrics module
- class unitxt.metrics.Accuracy(*argv, **kwargs)
Bases:
InstanceMetric
- reduction_map = {'mean': ['accuracy']}
- class unitxt.metrics.BertScore(*argv, **kwargs)
Bases:
HuggingfaceBulkMetric
- ci_scores: List[str] = ['f1', 'precision', 'recall']
- hf_metric_fields: List[str] = ['f1', 'precision', 'recall']
- reduction_map: Dict[str, List[str]] = {'mean': ['f1', 'precision', 'recall']}
- class unitxt.metrics.BulkInstanceMetric(*argv, **kwargs)
- class unitxt.metrics.CharEditDistanceAccuracy(*argv, **kwargs)
Bases:
InstanceMetric
- reduction_map = {'mean': ['char_edit_dist_accuracy']}
- class unitxt.metrics.CustomF1(*argv, **kwargs)
Bases:
GlobalMetric
- class unitxt.metrics.F1(*argv, **kwargs)
Bases:
GlobalMetric
- class unitxt.metrics.F1MacroMultiLabel(*argv, **kwargs)
Bases:
F1MultiLabel
- class unitxt.metrics.F1MicroMultiLabel(*argv, **kwargs)
Bases:
F1MultiLabel
- class unitxt.metrics.F1MultiLabel(*argv, **kwargs)
Bases:
GlobalMetric
- classes_to_ignore = ['none']
- class unitxt.metrics.GlobalMetric(*argv, **kwargs)
Bases:
SingleStreamOperator, MetricWithConfidenceInterval
A class for computing metrics that require joint calculations over all instances and are not just an aggregation of the scores of individual instances.
For example, macro_F1 requires calculating recall and precision per class, so all instances of a class need to be considered together. Accuracy, on the other hand, is just the average of the per-instance accuracy scores.
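The distinction can be illustrated with a minimal, library-independent sketch. It does not use the unitxt classes; the function names `accuracy` and `macro_f1` and the sample data are assumptions made for illustration only.

```python
from collections import Counter


def accuracy(predictions, references):
    # Instance-level: every instance gets its own 0/1 score,
    # and the final score is simply the mean of those scores.
    scores = [float(p == r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)


def macro_f1(predictions, references):
    # Global: per-class counts have to be collected over all
    # instances before any F1 value can be computed.
    tp, fp, fn = Counter(), Counter(), Counter()
    for p, r in zip(predictions, references):
        if p == r:
            tp[r] += 1
        else:
            fp[p] += 1
            fn[r] += 1
    classes = set(predictions) | set(references)
    f1_scores = []
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        total = precision + recall
        f1_scores.append(2 * precision * recall / total if total else 0.0)
    # Macro averaging: unweighted mean of the per-class F1 scores.
    return sum(f1_scores) / len(f1_scores)


predictions = ["cat", "cat", "dog", "cat"]
references = ["cat", "dog", "dog", "cat"]

acc = accuracy(predictions, references)
f1 = macro_f1(predictions, references)
print(acc, f1)
```

Accuracy can be computed one instance at a time and averaged afterwards, while the macro F1 value has no meaningful per-instance counterpart.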
- class unitxt.metrics.HuggingfaceBulkMetric(*argv, **kwargs)
Bases:
BulkInstanceMetric
- hf_compute_args: dict = {}
- class unitxt.metrics.HuggingfaceMetric(*argv, **kwargs)
Bases:
GlobalMetric
- class unitxt.metrics.InstanceMetric(*argv, **kwargs)
Bases:
SingleStreamOperator, MetricWithConfidenceInterval
- abstract property reduction_map: dict
- class unitxt.metrics.MAP(*argv, **kwargs)
Bases:
RetrievalMetric
- reduction_map = {'mean': ['map']}
- class unitxt.metrics.MRR(*argv, **kwargs)
Bases:
RetrievalMetric
- reduction_map = {'mean': ['mrr']}
- class unitxt.metrics.MatthewsCorrelation(*argv, **kwargs)
Bases:
HuggingfaceMetric
- class unitxt.metrics.MetricPipeline(*argv, **kwargs)
Bases:
MultiStreamOperator, Metric
- class unitxt.metrics.NDCG(*argv, **kwargs)
Bases:
GlobalMetric
Normalized Discounted Cumulative Gain: measures the quality of a ranking with respect to ground-truth ranking scores.
As this measures ranking, it is a global metric that can only be calculated over groups of instances. In the common use case where the instances are grouped by query, i.e., where the task is to provide a relevance score for a search result w.r.t. a query, an nDCG score is calculated for each query (specified in the “query” input field of an instance) and the final score is the average across all queries. Note that the expected scores are relevance scores (i.e., higher is better) and not rank indices. The absolute value of the scores is only meaningful for the reference scores; for the predictions, only the ordering of the scores affects the outcome. For example, predicted scores of [80, 1, 2] and [0.8, 0.5, 0.6] will receive the same nDCG score w.r.t. a given set of reference scores.
See also https://en.wikipedia.org/wiki/Discounted_cumulative_gain
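The behaviour described above can be sketched in plain Python. This is not the unitxt implementation: the helper names (`dcg`, `ndcg`, `mean_ndcg`), the `(query, reference, prediction)` row layout, and the choice of the linear-gain DCG formula (relevance divided by log2 of rank + 1) are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def dcg(relevances):
    # Linear-gain DCG (assumed variant): rel_i / log2(i + 1), ranks start at 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(reference_scores, predicted_scores):
    # Rank items by predicted score (highest first); only the ordering matters.
    order = sorted(
        range(len(predicted_scores)),
        key=lambda i: predicted_scores[i],
        reverse=True,
    )
    ranked_relevances = [reference_scores[i] for i in order]
    ideal_dcg = dcg(sorted(reference_scores, reverse=True))
    return dcg(ranked_relevances) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg(rows):
    # rows: hypothetical (query, reference_score, predicted_score) tuples.
    by_query = defaultdict(lambda: ([], []))
    for query, reference, prediction in rows:
        by_query[query][0].append(reference)
        by_query[query][1].append(prediction)
    per_query = [ndcg(refs, preds) for refs, preds in by_query.values()]
    return sum(per_query) / len(per_query)


references = [1.0, 3.0, 2.0]
score_a = ndcg(references, [80, 1, 2])
score_b = ndcg(references, [0.8, 0.5, 0.6])

rows = [
    ("q1", 1.0, 80), ("q1", 3.0, 1), ("q1", 2.0, 2),
    ("q2", 2.0, 0.9), ("q2", 0.0, 0.1),
]
overall = mean_ndcg(rows)
print(score_a, score_b, overall)
```

Both prediction lists induce the same ordering of the three items, so `score_a` and `score_b` are identical even though the predicted values differ in scale.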
- class unitxt.metrics.Perplexity(*argv, **kwargs)
Bases:
BulkInstanceMetric
Computes the likelihood of generating text Y after text X - P(Y|X).
- reduction_map: Dict[str, List[str]] = {'mean': ['perplexity']}
- class unitxt.metrics.PrecisionMacroMultiLabel(*argv, **kwargs)
Bases:
F1MultiLabel
- class unitxt.metrics.PrecisionMicroMultiLabel(*argv, **kwargs)
Bases:
F1MultiLabel
- class unitxt.metrics.RecallMacroMultiLabel(*argv, **kwargs)
Bases:
F1MultiLabel
- class unitxt.metrics.RecallMicroMultiLabel(*argv, **kwargs)
Bases:
F1MultiLabel
- class unitxt.metrics.RetrievalAtK(*argv, **kwargs)
Bases:
RetrievalMetric
- class unitxt.metrics.RetrievalMetric(*argv, **kwargs)
Bases:
InstanceMetric
- class unitxt.metrics.Reward(*argv, **kwargs)
Bases:
BulkInstanceMetric
- reduction_map: Dict[str, List[str]] = {'mean': ['score']}
- class unitxt.metrics.Rouge(*argv, **kwargs)
Bases:
HuggingfaceMetric
- rouge_types: List[str] = ['rouge1', 'rouge2', 'rougeL', 'rougeLsum']
- class unitxt.metrics.SentenceBert(*argv, **kwargs)
Bases:
BulkInstanceMetric
- reduction_map: Dict[str, List[str]] = {'mean': ['score']}
- class unitxt.metrics.Squad(*argv, **kwargs)
Bases:
GlobalMetric
- class unitxt.metrics.StringContainment(*argv, **kwargs)
Bases:
InstanceMetric
- reduction_map = {'mean': ['string_containment']}
- class unitxt.metrics.TokenOverlap(*argv, **kwargs)
Bases:
InstanceMetric
- ci_scores: List[str] = ['f1', 'precision', 'recall']
- reduction_map = {'mean': ['f1', 'precision', 'recall']}
- class unitxt.metrics.UpdateStream(*argv, **kwargs)
Bases:
StreamInstanceOperator
- class unitxt.metrics.Wer(*argv, **kwargs)
Bases:
HuggingfaceMetric
- unitxt.metrics.abstract_factory()
- unitxt.metrics.abstract_field()
- unitxt.metrics.normalize_answer(s)
Lower text and remove punctuation, articles and extra whitespace.
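A normalization of this kind could look like the following sketch. It is modeled on the widely used SQuAD-style answer normalization and is not confirmed to match the unitxt implementation; the function name `normalize_answer_sketch`, the article list (a, an, the), and the order of the steps are assumptions.

```python
import re
import string


def normalize_answer_sketch(s: str) -> str:
    # Hypothetical stand-in for normalize_answer, for illustration only.
    s = s.lower()                                      # lower-case the text
    punctuation = set(string.punctuation)
    s = "".join(ch for ch in s if ch not in punctuation)  # drop punctuation
    s = re.sub(r"\b(a|an|the)\b", " ", s)              # drop English articles
    return " ".join(s.split())                         # collapse extra whitespace


normalized = normalize_answer_sketch("The  Quick, brown fox!")
print(normalized)
```

Normalizing both prediction and reference this way makes string comparisons insensitive to casing, punctuation, and articles.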