llm_as_judge_using_mt_bench_template
Note
ID: templates.rag.model_response_assessment.llm_as_judge_using_mt_bench_template | Type: InputOutputTemplate
{
"input_format": "[Question]\n{question}\n\n[The Start of Assistant's Answer]\n{model_output}\n[The End of Assistant's Answer]",
"instruction": "Please act as an impartial judge and evaluate the quality of the response provided by an AI assistant to the user question displayed below. Your evaluation should consider factors such as the helpfulness, relevance, accuracy, depth, creativity, and level of detail of the response. Begin your evaluation by providing a short explanation. Be as objective as possible. After providing your explanation, you must rate the response on a scale of 1 to 10 by strictly following this format: \"[[rating]]\", for example: \"[[5]]\".\n\n",
"output_format": "{rating_label}",
"postprocessors": [
"processors.extract_mt_bench_judgment"
],
"type": "input_output_template"
}
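The sketch below shows, as an illustration only, what the judge model receives once the "input_format" placeholders are filled with Python's str.format. The question and answer strings are made-up sample values. The sketch covers only the input_format and output_format strings copied from the template above. It does not show how or where the pipeline combines the instruction text with the rendered input.

```python
# Formatting strings copied verbatim from the template above.
INPUT_FORMAT = (
    "[Question]\n{question}\n\n"
    "[The Start of Assistant's Answer]\n{model_output}\n"
    "[The End of Assistant's Answer]"
)
OUTPUT_FORMAT = "{rating_label}"


def render_judge_input(question: str, model_output: str) -> str:
    """Fill the input_format placeholders with one instance's field values."""
    return INPUT_FORMAT.format(question=question, model_output=model_output)


if __name__ == "__main__":
    # Hypothetical sample instance, for illustration only.
    print(render_judge_input("What is 2 + 2?", "2 + 2 equals 4."))
```

Running it prints the question and the answer between the "[The Start of Assistant's Answer]" and "[The End of Assistant's Answer]" markers. The judge is expected to end its reply with a rating such as "[[5]]".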
Explanation about InputOutputTemplate
Generates the ‘source’ field of the processed instance from its fields designated as input, and the ‘target’ and ‘references’ fields from its fields designated as output.
The args specify the formatting strings used to glue the designated input and output fields of the processed instance into single strings (‘source’ and ‘target’, respectively) and into a list of strings (‘references’).
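The description above can be illustrated with a minimal sketch. This is a hypothetical stand-in written for this page, not Unitxt's InputOutputTemplate code. The function name apply_input_output_template and its argument layout (separate dicts of input and output fields) are assumptions. The sketch also assumes a single reference equal to the target.

```python
from typing import Any, Dict, List


def apply_input_output_template(
    input_fields: Dict[str, Any],
    output_fields: Dict[str, Any],
    input_format: str,
    output_format: str,
) -> Dict[str, Any]:
    """Hypothetical sketch: glue designated fields into source/target/references."""
    # Input-designated fields are glued into one "source" string.
    source = input_format.format(**input_fields)
    # Output-designated fields are glued into one "target" string ...
    target = output_format.format(**output_fields)
    # ... and into a list of strings for "references" (assumed: one reference).
    references: List[str] = [target]
    return {"source": source, "target": target, "references": references}


if __name__ == "__main__":
    result = apply_input_output_template(
        {"question": "What is 2 + 2?", "model_output": "4"},
        {"rating_label": "[[8]]"},  # hypothetical gold rating
        input_format=(
            "[Question]\n{question}\n\n"
            "[The Start of Assistant's Answer]\n{model_output}\n"
            "[The End of Assistant's Answer]"
        ),
        output_format="{rating_label}",
    )
    print(result["target"], result["references"])
```

With this template's "{rating_label}" output format, the target is simply the rating label string, here "[[8]]".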
References: processors.extract_mt_bench_judgment
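The postprocessor's job is to turn the judge's free-text reply into a score by locating the "[[rating]]" pattern that the instruction asks for. The sketch below is a hypothetical regex-based version of that step. The name extract_rating is invented for illustration. It returns the raw number, or None when no rating is found. The actual processors.extract_mt_bench_judgment may rescale the value or handle a missing rating differently.

```python
import re
from typing import Optional

# Matches the "[[rating]]" pattern requested by the instruction, e.g. "[[5]]" or "[[7.5]]".
RATING_PATTERN = re.compile(r"\[\[(\d+(?:\.\d+)?)\]\]")


def extract_rating(judgment: str) -> Optional[float]:
    """Hypothetical sketch: return the first [[rating]] in the judge's text, or None."""
    match = RATING_PATTERN.search(judgment)
    if match is None:
        return None
    return float(match.group(1))


if __name__ == "__main__":
    print(extract_rating("The answer is correct and concise. Rating: [[8]]"))
```

Taking the first match keeps the sketch simple. A stricter variant could also reject values outside the 1 to 10 scale that the instruction specifies.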
Read more about catalog usage here.