Llama3 V1 Ibmgenai Judges
Note
ID: metrics.llm_as_judge.conversation_answer_idk.llama3_v1_ibmgenai_judges | Type: LLMAsJudge
{
    "__type__": "llm_as_judge",
    "inference_model": {
        "__type__": "ibm_gen_ai_inference_engine",
        "model_name": "meta-llama/llama-3-70b-instruct",
        "parameters": {
            "__type__": "ibm_gen_ai_inference_engine_params",
            "max_new_tokens": 256
        }
    },
    "main_score": "metrics.llm_as_judge.rating.llama370binstruct_template_v1",
    "prediction_type": "str",
    "task": "rating.single_turn",
    "template": "templates.response_assessment.judges.idk.v1"
}
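As a minimal sketch of how this entry can be inspected, the snippet below parses the JSON above with Python's standard library and reads out the judge model, task, and template. It does not use the unitxt API; the `CATALOG_ENTRY` constant and the `summarize_judge` helper are illustrative assumptions, not part of the catalog.

```python
import json

# The catalog entry, copied from the listing above.
CATALOG_ENTRY = """
{
    "__type__": "llm_as_judge",
    "inference_model": {
        "__type__": "ibm_gen_ai_inference_engine",
        "model_name": "meta-llama/llama-3-70b-instruct",
        "parameters": {
            "__type__": "ibm_gen_ai_inference_engine_params",
            "max_new_tokens": 256
        }
    },
    "main_score": "metrics.llm_as_judge.rating.llama370binstruct_template_v1",
    "prediction_type": "str",
    "task": "rating.single_turn",
    "template": "templates.response_assessment.judges.idk.v1"
}
"""


def summarize_judge(entry_json: str) -> dict:
    """Hypothetical helper: pull the key fields out of a judge catalog entry."""
    entry = json.loads(entry_json)
    engine = entry["inference_model"]
    return {
        "metric_type": entry["__type__"],
        "judge_model": engine["model_name"],
        "max_new_tokens": engine["parameters"]["max_new_tokens"],
        "task": entry["task"],
        "template": entry["template"],
        "main_score": entry["main_score"],
    }


summary = summarize_judge(CATALOG_ENTRY)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Reading the entry this way makes it easy to check which judge model and template a metric ID resolves to before running an evaluation.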
Explanation about LLMAsJudge
LLM-as-judge-based metric class for evaluating correctness.
- Attributes:
  main_score (str): The main score label used for evaluation.
  task (Literal["rating.single_turn"]): The type of task the LLM-as-judge runs. This defines the input and output format of the judge model.
  template (Template): The template used when generating inputs for the judge LLM.
  format (Format): The format used when generating inputs for the judge LLM.
  system_prompt (SystemPrompt): The system prompt used when generating inputs for the judge LLM.
  strip_system_prompt_and_format_from_inputs (bool): Whether to strip the system prompt and formatting from the inputs that the model being judged received, when they are inserted into the LLM-as-judge prompt.
  inference_model (InferenceEngine): The module that runs inference for the judge LLM.
  reduction_map (dict): A dictionary specifying the reduction method for the metric.
  batch_size (int): The batch size used when running the metric.
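The attribute list above can be pictured as a plain data structure. The sketch below is a hypothetical stand-in written with Python's `dataclasses`; it is not the unitxt `LLMAsJudge` class. The class name `JudgeConfigSketch`, the simplified built-in types, and every default value are assumptions made for illustration. Only the attribute names and the single allowed `task` value come from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


# Hypothetical stand-in mirroring the documented attribute names.
# NOT the unitxt LLMAsJudge class; types are simplified to built-ins
# and all defaults are placeholder assumptions.
@dataclass
class JudgeConfigSketch:
    main_score: str
    task: str
    template: str
    inference_model: Dict[str, Any]
    format: Optional[str] = None                              # placeholder default
    system_prompt: Optional[str] = None                       # placeholder default
    strip_system_prompt_and_format_from_inputs: bool = True   # placeholder default
    reduction_map: Dict[str, Any] = field(default_factory=dict)
    batch_size: int = 8                                       # placeholder default

    def __post_init__(self) -> None:
        # The documented type of `task` is Literal["rating.single_turn"].
        if self.task != "rating.single_turn":
            raise ValueError(f"unsupported task: {self.task!r}")
        if self.batch_size < 1:
            raise ValueError("batch_size must be a positive integer")


config = JudgeConfigSketch(
    main_score="metrics.llm_as_judge.rating.llama370binstruct_template_v1",
    task="rating.single_turn",
    template="templates.response_assessment.judges.idk.v1",
    inference_model={"model_name": "meta-llama/llama-3-70b-instruct"},
)
print(config.task, config.batch_size)
```

Validating `task` in `__post_init__` reflects the documented constraint that the judge supports a single task type, so a mistyped task name fails early rather than at inference time.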
References: templates.response_assessment.judges.idk.v1
Read more about catalog usage here.