unitxt.llm_as_judge module
- class unitxt.llm_as_judge.LLMAsJudge(__tags__: typing.Dict[str, str] = {}, main_score: str = 'llm_as_judge', prediction_type: str = None, single_reference_per_prediction: bool = False, n_resamples: int = 1000, confidence_level: float = 0.95, ci_scores: typing.List[str] = None, caching: bool = None, apply_to_streams: typing.List[str] = None, dont_apply_to_streams: typing.List[str] = None, reduction_map: typing.Dict[str, typing.List[str]] = None, implemented_reductions: typing.List[str], batch_size: int = 32, recipe: str, inference_model: unitxt.inference.InferenceEngine)
Bases:
BulkInstanceMetric
LLM-as-judge metric class for evaluating correctness.
- main_score
The main score used for evaluation.
- Type:
str
- reduction_map
A dictionary mapping each reduction method to the list of scores it is applied to.
- Type:
dict
- batch_size
The number of instances processed in each batch.
- Type:
int
- recipe
The unitxt recipe that will be used to create the judge dataset.
- Type:
str
- inference_model
The inference engine used to run the judge model.
- Type:
InferenceEngine
- prepare(self)
Initialization method for the metric.
- compute(self, references, predictions, additional_inputs)
Method to compute the metric.
- Usage:
metric = LLMAsJudge(recipe=recipe, inference_model=inference_model)
scores = metric.compute(references, predictions, additional_inputs)
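The interplay between batch_size and a mean reduction over main_score can be illustrated with a minimal, library-free sketch. It is not unitxt's implementation: score_in_batches and exact_match_judge are hypothetical names introduced here, the judge is a trivial stand-in for a real inference engine, and averaging the per-instance scores is an assumed reduction.

```python
from statistics import mean
from typing import Callable, Dict, List


def score_in_batches(
    predictions: List[str],
    references: List[List[str]],
    judge_batch: Callable[[List[str], List[List[str]]], List[float]],
    batch_size: int = 32,
    main_score: str = "llm_as_judge",
) -> Dict[str, object]:
    """Hypothetical helper: score instances batch by batch, then average."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")

    instance_scores: List[float] = []
    # Walk over the inputs in chunks of at most batch_size instances.
    for start in range(0, len(predictions), batch_size):
        end = start + batch_size
        instance_scores.extend(
            judge_batch(predictions[start:end], references[start:end])
        )

    # Assumed reduction: the global score is the mean of the instance scores.
    global_score = mean(instance_scores) if instance_scores else None
    return {"instance_scores": instance_scores, main_score: global_score}


def exact_match_judge(preds: List[str], refs: List[List[str]]) -> List[float]:
    # Stand-in for a judge model: 1.0 if the prediction matches any reference.
    return [1.0 if p in r else 0.0 for p, r in zip(preds, refs)]


result = score_in_batches(
    ["a", "b", "c"], [["a"], ["x"], ["c"]], exact_match_judge, batch_size=2
)
print(result)
```

With batch_size=2 the three instances are judged in two calls (two instances, then one), and the per-instance scores are combined into a single value stored under the main_score key.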