📄 llm_as_judge_using_mt_bench_template

Note

ID: templates.rag.model_response_assessment.llm_as_judge_using_mt_bench_template | Type: InputOutputTemplate

{
    "input_format": "[Question]\n{question}\n\n[The Start of Assistant's Answer]\n{model_output}\n[The End of Assistant's Answer]",
    "instruction": "Please act as an impartial judge and evaluate the quality of the response provided by an AI assistant to the user question displayed below. Your evaluation should consider factors such as the helpfulness, relevance, accuracy, depth, creativity,  and level of detail of the response. Begin your evaluation by providing a short explanation. Be as objective as possible. After providing your explanation, you must rate the response on a scale of 1 to 10 by strictly following this format: \"[[rating]]\", for example: \"[[5]]\".\n\n",
    "output_format": "{rating_label}",
    "postprocessors": [
        "processors.extract_mt_bench_judgment"
    ],
    "type": "input_output_template"
}
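The instruction asks the judge to end with a rating written as "[[rating]]", and the template lists a postprocessor to pull that rating out. A minimal sketch of such an extraction follows. It is not the implementation of processors.extract_mt_bench_judgment: the function name `extract_rating`, the regular expression, and the choice to return `None` when no rating is found are all assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed pattern: a number wrapped in double square brackets, e.g. "[[5]]".
RATING_PATTERN = re.compile(r"\[\[(\d+(?:\.\d+)?)\]\]")


def extract_rating(judge_output: str) -> Optional[float]:
    """Return the first "[[rating]]" value in the judge's text, or None.

    Hypothetical helper; the real postprocessor may behave differently
    (for example, it may rescale or validate the value).
    """
    match = RATING_PATTERN.search(judge_output)
    if match is None:
        return None
    return float(match.group(1))


if __name__ == "__main__":
    print(extract_rating("The answer is accurate but brief. Rating: [[7]]"))
    print(extract_rating("The judge forgot to give a rating."))
```

Returning `None` rather than raising keeps malformed judge outputs visible to the caller, which can then decide whether to skip or penalize them.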

Explanation about InputOutputTemplate

Generate field ‘source’ from fields designated as input, and fields ‘target’ and ‘references’ from fields designated as output, of the processed instance.

Args specify the formatting strings with which to glue together the input and output designated fields of the processed instance into one string (‘source’ and ‘target’), and into a list of strings (‘references’).
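The gluing described above can be sketched with plain Python string formatting. This is an illustration only, not the library's code: the function `render_instance`, the dictionary layout, and the sample field values are hypothetical, and the single-item ‘references’ list is an assumption. The two format strings are copied from the template shown on this page.

```python
from typing import Dict, List, Union

# Format strings copied from the template definition on this page.
INPUT_FORMAT = (
    "[Question]\n{question}\n\n"
    "[The Start of Assistant's Answer]\n{model_output}\n"
    "[The End of Assistant's Answer]"
)
OUTPUT_FORMAT = "{rating_label}"


def render_instance(
    input_fields: Dict[str, str], output_fields: Dict[str, str]
) -> Dict[str, Union[str, List[str]]]:
    """Build 'source', 'target' and 'references' from an instance's fields.

    Hypothetical helper: 'references' is assumed to hold just the target.
    """
    source = INPUT_FORMAT.format(**input_fields)
    target = OUTPUT_FORMAT.format(**output_fields)
    return {"source": source, "target": target, "references": [target]}


if __name__ == "__main__":
    rendered = render_instance(
        {"question": "What is 2 + 2?", "model_output": "4"},  # made-up sample
        {"rating_label": "10"},  # made-up sample
    )
    print(rendered["source"])
    print(rendered["target"], rendered["references"])
```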

References: processors.extract_mt_bench_judgment

Read more about catalog usage here.