📄 Llama3 V1 Ibmgenai Judges

Note

ID: metrics.llm_as_judge.conversation_answer_idk.llama3_v1_ibmgenai_judges | Type: LLMAsJudge

{
    "__type__": "llm_as_judge",
    "inference_model": {
        "__type__": "ibm_gen_ai_inference_engine",
        "model_name": "meta-llama/llama-3-70b-instruct",
        "parameters": {
            "__type__": "ibm_gen_ai_inference_engine_params",
            "max_new_tokens": 256
        }
    },
    "main_score": "metrics.llm_as_judge.rating.llama370binstruct_template_v1",
    "prediction_type": "str",
    "task": "rating.single_turn",
    "template": "templates.response_assessment.judges.idk.v1"
}
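Because the entry above is plain JSON, it can be inspected with the Python standard library alone. The following is a minimal sketch, not Unitxt's own loading mechanism: the helper name `collect_types` is hypothetical, and the JSON text is copied from the entry above. It walks the nested objects and lists every `__type__` together with the path where it appears.

```python
import json

# Catalog entry copied verbatim from the listing above.
ENTRY = """
{
    "__type__": "llm_as_judge",
    "inference_model": {
        "__type__": "ibm_gen_ai_inference_engine",
        "model_name": "meta-llama/llama-3-70b-instruct",
        "parameters": {
            "__type__": "ibm_gen_ai_inference_engine_params",
            "max_new_tokens": 256
        }
    },
    "main_score": "metrics.llm_as_judge.rating.llama370binstruct_template_v1",
    "prediction_type": "str",
    "task": "rating.single_turn",
    "template": "templates.response_assessment.judges.idk.v1"
}
"""


def collect_types(node, path="root"):
    """Return (path, __type__) pairs for every nested JSON object.

    Hypothetical helper for illustration only; it is not part of Unitxt.
    """
    found = []
    if isinstance(node, dict):
        if "__type__" in node:
            found.append((path, node["__type__"]))
        for key, value in node.items():
            found.extend(collect_types(value, f"{path}.{key}"))
    return found


config = json.loads(ENTRY)
types = collect_types(config)
for path, type_name in types:
    print(f"{path}: {type_name}")
```

The listing makes the nesting explicit: the judge metric wraps an inference engine, which in turn wraps its own parameter object, while `task` and `template` are references to other catalog entries by name.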

Explanation about LLMAsJudge

LLM-as-judge-based metric class for evaluating correctness.

Attributes:
    main_score (str): The main score label used for evaluation.
    task (Literal["rating.single_turn"]): The type of task the LLM-as-judge runs. This defines the input and output format of the judge model.
    template (Template): The template used when generating inputs for the judge LLM.
    format (Format): The format used when generating inputs for the judge LLM.
    system_prompt (SystemPrompt): The system prompt used when generating inputs for the judge LLM.
    strip_system_prompt_and_format_from_inputs (bool): Whether to strip the system prompt and formatting from the inputs that the model being judged received before they are inserted into the LLM-as-judge prompt.
    inference_model (InferenceEngine): The inference engine that runs the judge LLM.
    reduction_map (dict): A dictionary specifying the reduction method for the metric.
    batch_size (int): The size of each batch (bulk) of instances.
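To illustrate what a reduction method does, the sketch below assumes a `reduction_map` of the form `{"mean": [score_name]}`, read as "average the listed per-instance scores into one global score". This shape is an assumption made for the example; the function `reduce_scores`, the score name `judge_rating`, and the sample ratings are all hypothetical and are not taken from Unitxt's implementation.

```python
from statistics import mean


def reduce_scores(instance_scores, reduction_map):
    """Aggregate per-instance scores into global scores.

    Hypothetical helper: only the "mean" reduction is sketched here.
    """
    global_scores = {}
    for reduction, score_names in reduction_map.items():
        if reduction != "mean":
            raise ValueError(f"Unsupported reduction in this sketch: {reduction}")
        for name in score_names:
            # Average the named score over all judged instances.
            global_scores[name] = mean(scores[name] for scores in instance_scores)
    return global_scores


# Made-up per-instance ratings, as a judge might assign them.
instance_scores = [
    {"judge_rating": 1.0},
    {"judge_rating": 0.0},
    {"judge_rating": 0.5},
]
result = reduce_scores(instance_scores, {"mean": ["judge_rating"]})
print(result)
```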

References: templates.response_assessment.judges.idk.v1