unitxt.llm_as_judge module

class unitxt.llm_as_judge.LLMAsJudge(__tags__: ~typing.Dict[str, str] = {}, data_classification_policy: ~typing.List[str] = None, main_score: str = 'llm_as_judge', prediction_type: ~typing.Any | str = typing.Any, single_reference_per_prediction: bool = False, score_prefix: str = '', n_resamples: int = 1000, confidence_level: float = 0.95, ci_scores: ~typing.List[str] = None, caching: bool = None, apply_to_streams: ~typing.List[str] = None, dont_apply_to_streams: ~typing.List[str] = None, reduction_map: ~typing.Dict[str, ~typing.List[str]] | None = None, implemented_reductions: ~typing.List[str], task: ~typing.Literal['rating.single_turn', 'rating.single_turn_with_reference', 'pairwise_comparative_rating.single_turn'], template: ~unitxt.templates.Template, format: ~unitxt.formats.Format = None, system_prompt: ~unitxt.system_prompts.SystemPrompt = None, strip_system_prompt_and_format_from_inputs: bool = True, inference_model: ~unitxt.inference.InferenceEngine, batch_size: int = 32)

Bases: BulkInstanceMetric

LLM-as-judge metric class for evaluating correctness.

main_score

The main score label used for evaluation.

Type:

str

task

The type of task the llm-as-judge runs. This defines the input and output format of the judge model.

Type:

Literal["rating.single_turn", "rating.single_turn_with_reference", "pairwise_comparative_rating.single_turn"]
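The class signature lists three accepted task names, so a task string can be checked before a judge is configured. The following is a minimal sketch in plain Python, not part of unitxt: `JudgeTask` and `check_task` are hypothetical names introduced for illustration, and only the three string values are taken from the signature.

```python
from typing import Literal, get_args

# The three task names listed in the LLMAsJudge signature.
# JudgeTask is a hypothetical alias, not a unitxt type.
JudgeTask = Literal[
    "rating.single_turn",
    "rating.single_turn_with_reference",
    "pairwise_comparative_rating.single_turn",
]


def check_task(task: str) -> str:
    """Return the task unchanged if it is a supported name, else raise ValueError."""
    supported = get_args(JudgeTask)
    if task not in supported:
        raise ValueError(f"Unsupported task {task!r}; expected one of {supported}")
    return task
```

Calling `check_task("rating.single_turn")` returns the string unchanged, while an unlisted name raises `ValueError`.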

template

The template used when generating inputs for the judge llm.

Type:

Template

format

The format used when generating inputs for the judge llm.

Type:

Format

system_prompt

The system prompt used when generating inputs for the judge llm.

Type:

SystemPrompt

strip_system_prompt_and_format_from_inputs

Whether to strip the system prompt and formatting from the inputs received by the model being judged when those inputs are inserted into the llm-as-judge prompt.

Type:

bool

inference_model

The inference engine that runs inference for the judge llm.

Type:

InferenceEngine

reduction_map

A dictionary specifying the reduction method for the metric.

Type:

dict
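To illustrate what a reduction map of type `Dict[str, List[str]]` can express, the sketch below maps a reduction name to the score fields it aggregates across per-instance results. This is a plain-Python approximation under stated assumptions, not unitxt's implementation: `apply_reductions` is a hypothetical helper, only a `"mean"` reduction is supported here, and the sample scores in the usage note are made up.

```python
from statistics import mean
from typing import Dict, List


def apply_reductions(
    instance_scores: List[Dict[str, float]],
    reduction_map: Dict[str, List[str]],
) -> Dict[str, float]:
    """Aggregate per-instance scores into global scores (hypothetical helper)."""
    # Only "mean" is sketched here; other reductions are out of scope.
    reducers = {"mean": mean}
    global_scores: Dict[str, float] = {}
    for reduction_name, fields in reduction_map.items():
        if reduction_name not in reducers:
            raise ValueError(f"Unsupported reduction: {reduction_name!r}")
        for field in fields:
            values = [scores[field] for scores in instance_scores]
            global_scores[field] = reducers[reduction_name](values)
    return global_scores
```

For example, `apply_reductions([{"llm_as_judge": 0.5}, {"llm_as_judge": 1.0}], {"mean": ["llm_as_judge"]})` averages the two illustrative instance scores under the `llm_as_judge` key.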

batch_size

The size of the bulk, that is, the number of instances processed together in one batch.

Type:

int
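As a rough picture of how a batch size splits the work, the sketch below chunks a list of instances into consecutive groups of at most `batch_size` items. It is an assumption about the general technique, not a copy of how unitxt batches internally; `iter_batches` is a hypothetical helper, and the default of 32 mirrors the value in the class signature.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 32) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        # The last chunk may be shorter than batch_size.
        yield list(items[start:start + batch_size])
```

With 70 instances and the default batch size, this yields two full batches of 32 followed by one batch of 6.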

prediction_type: Type | str = typing.Any