unitxt.llm_as_judge module¶
- class unitxt.llm_as_judge.LLMAsJudge(__tags__: ~typing.Dict[str, str] = {}, data_classification_policy: ~typing.List[str] = None, main_score: str = 'llm_as_judge', prediction_type: ~typing.Any | str = typing.Any, single_reference_per_prediction: bool = False, score_prefix: str = '', n_resamples: int = 1000, confidence_level: float = 0.95, ci_scores: ~typing.List[str] = None, caching: bool = None, apply_to_streams: ~typing.List[str] = None, dont_apply_to_streams: ~typing.List[str] = None, reduction_map: ~typing.Dict[str, ~typing.List[str]] | None = None, implemented_reductions: ~typing.List[str], task: ~typing.Literal['rating.single_turn', 'rating.single_turn_with_reference', 'pairwise_comparative_rating.single_turn'], template: ~unitxt.templates.Template, format: ~unitxt.formats.Format = None, system_prompt: ~unitxt.system_prompts.SystemPrompt = None, strip_system_prompt_and_format_from_inputs: bool = True, inference_model: ~unitxt.inference.InferenceEngine, batch_size: int = 32)¶
Bases:
BulkInstanceMetric
LLM-as-judge based metric class for evaluating correctness.
- main_score¶
The main score label used for evaluation.
- Type:
str
- task¶
The type of task the llm-as-judge runs. This defines the input and output format of the judge model.
- Type:
Literal[“rating.single_turn”, “rating.single_turn_with_reference”, “pairwise_comparative_rating.single_turn”]
- system_prompt¶
The system prompt used when generating inputs for the judge llm.
- Type:
SystemPrompt
- strip_system_prompt_and_format_from_inputs¶
Whether to strip the system prompt and formatting from the inputs that the model being judged received, before they are inserted into the llm-as-judge prompt.
- Type:
bool
- inference_model¶
The inference engine used to run the judge llm.
- Type:
InferenceEngine
- reduction_map¶
A dictionary specifying the reduction method for the metric.
- Type:
dict
- batch_size¶
The number of instances sent to the judge model in each inference batch.
- Type:
int
- prediction_type: Type | str = typing.Any¶
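As a minimal sketch of the configuration values the class signature above declares, the snippet below mirrors the `task` parameter's three Literal options and shows a plausible `reduction_map` (the dict shape `{"mean": [...]}` and the use of the `llm_as_judge` main score are assumptions for illustration, not taken from this page):

```python
from typing import Literal, get_args

# The three judge tasks accepted by the `task` parameter,
# copied from the Literal type in the class signature.
JudgeTask = Literal[
    "rating.single_turn",
    "rating.single_turn_with_reference",
    "pairwise_comparative_rating.single_turn",
]

# Hypothetical reduction_map: average the main score
# ("llm_as_judge", the default main_score) across instances.
reduction_map = {"mean": ["llm_as_judge"]}

# Inspect the allowed task names at runtime.
supported_tasks = get_args(JudgeTask)
```

`get_args` lets calling code validate a user-supplied task string against the Literal before constructing the metric.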