πŸ“„ Mt Bench Single TurnΒΆ

templates.response_assessment.rating.mt_bench_single_turn

type: InputOutputTemplate
instruction: "Please act as an impartial judge and evaluate the quality of the response provided by an AI assistant to the user question displayed below. Your evaluation should consider factors such as the helpfulness, relevance, accuracy, depth, creativity, and level of detail of the response. Begin your evaluation by providing a short explanation. Be as objective as possible. After providing your explanation, you must rate the response on a scale of 1 to 10 by strictly following this format: \"[[rating]]\", for example: \"Rating: [[5]]\".\n\n"
input_format: "[Question]\n{question}\n\n[The Start of Assistant's Answer]\n{answer}\n[The End of Assistant's Answer]"
output_format: [[{rating}]]
postprocessors: 
  - processors.extract_mt_bench_rating_judgment
[source]

Explanation about InputOutputTemplateΒΆ

Generate field β€˜source’ from fields designated as input, and fields β€˜target’ and β€˜references’ from fields designated as output, of the processed instance.

Args specify the formatting strings with which to glue together the input and reference fields of the processed instance into one string (β€˜source’ and β€˜target’), and into a list of strings (β€˜references’).

References: processors.extract_mt_bench_rating_judgment

Read more about catalog usage here.