📄 Mt Bench Single Turn With Reference

templates.response_assessment.pairwise_comparison.mt_bench_single_turn_with_reference

PairwiseChoiceTemplate(
    choice_a_field="answer_a",
    choice_b_field="answer_b",
    answer_field="winner",
    choice_a_label="A",
    choice_b_label="B",
    choice_tie_label="C",
    shuffle=False,
    instruction="Please act as an impartial judge and evaluate the quality of the responses provided by two AI assistants to the user question displayed below. Your evaluation should consider correctness and helpfulness. You will be given a reference answer, assistant A's answer, and assistant B's answer. Your job is to evaluate which assistant's answer is better. Begin your evaluation by comparing both assistants' answers with the reference answer. Identify and correct any mistakes. Avoid any position biases and ensure that the order in which the responses were presented does not influence your decision. Do not allow the length of the responses to influence your evaluation. Do not favor certain names of the assistants. Be as objective as possible. After providing your explanation, output your final verdict by strictly following this format: "[[A]]" if assistant A is better, "[[B]]" if assistant B is better, and "[[C]]" for a tie.

",
    input_format="[User Question]
{question}

[The Start of Reference Answer]
{reference_answer}
[The End of Reference Answer]

[The Start of Assistant A's Answer]
{answer_a}
[The End of Assistant A's Answer]

[The Start of Assistant B's Answer]
{answer_b}
[The End of Assistant B's Answer]",
    output_format="[[{winner}]]",
    postprocessors=[
        "processors.extract_mt_bench_label_judgment",
    ],
)
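
As a rough illustration of how the fields above fit together, the sketch below fills the input_format placeholders with plain str.format and pulls the verdict out of a judge reply with a regular expression. The helper names render_prompt and extract_verdict, the choice to take the last verdict marker, and the sample values are assumptions made for this example; this is not the actual implementation of processors.extract_mt_bench_label_judgment.

```python
import re

# Same layout as the input_format shown above, written as a valid Python string.
INPUT_FORMAT = (
    "[User Question]\n{question}\n\n"
    "[The Start of Reference Answer]\n{reference_answer}\n"
    "[The End of Reference Answer]\n\n"
    "[The Start of Assistant A's Answer]\n{answer_a}\n"
    "[The End of Assistant A's Answer]\n\n"
    "[The Start of Assistant B's Answer]\n{answer_b}\n"
    "[The End of Assistant B's Answer]"
)


def render_prompt(question, reference_answer, answer_a, answer_b):
    """Fill the four placeholders of the input format (hypothetical helper)."""
    return INPUT_FORMAT.format(
        question=question,
        reference_answer=reference_answer,
        answer_a=answer_a,
        answer_b=answer_b,
    )


def extract_verdict(judge_output):
    """Return 'A', 'B', or 'C' from the last [[X]] marker, or None if absent.

    Assumption: the final marker is the verdict, since the instruction asks
    for the explanation first and the verdict afterwards.
    """
    matches = re.findall(r"\[\[([ABC])\]\]", judge_output)
    return matches[-1] if matches else None
```

For example, extract_verdict applied to a reply ending in "[[A]]" yields the label "A", which corresponds to choice_a_label in the template.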

Explanation about PairwiseChoiceTemplate

PairwiseChoiceTemplate.

Requirements:
The answer field value should be of type Literal["choice_a", "choice_b", "tie"].
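
To make the requirement concrete, here is a minimal sketch of checking the raw answer value and turning it into the label that appears in the rendered target. The function names verbalize_answer and render_target are hypothetical, and the default labels and output format are taken from the mt_bench_single_turn_with_reference template above; the real class may perform these steps differently.

```python
from typing import Literal, get_args

# The three raw values the answer field is required to hold.
Answer = Literal["choice_a", "choice_b", "tie"]


def verbalize_answer(answer, choice_a_label="A", choice_b_label="B",
                     choice_tie_label="C"):
    """Validate the raw answer and map it to its label (hypothetical helper)."""
    if answer not in get_args(Answer):
        raise ValueError(
            f"answer must be one of {get_args(Answer)}, got {answer!r}"
        )
    labels = {
        "choice_a": choice_a_label,
        "choice_b": choice_b_label,
        "tie": choice_tie_label,
    }
    return labels[answer]


def render_target(answer, output_format="[[{winner}]]"):
    """Render the expected judge output for a raw answer value."""
    return output_format.format(winner=verbalize_answer(answer))
```

Under these assumptions, a raw value of "tie" is rendered as "[[C]]", while a value outside the three allowed literals raises a ValueError.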

Args:

choice_a_field (str):
The field that contains the choice_a value.

choice_b_field (str):
The field that contains the choice_b value.

answer_field (str):
The field that contains the answer value. Should be of type Literal["choice_a", "choice_b", "tie"].

choice_a_label (str):
The label of choice A answer as it is verbalized in the template.

choice_b_label (str):
The label of choice B answer as it is verbalized in the template.

choice_tie_label (str):
The label of a tie answer as it should be verbalized in the template.

shuffle (bool):
Whether to shuffle the choices. Shuffling is done to account for position bias.

When shuffle is enabled, the following happens 50% of the time:

The values of choice_a_field and choice_b_field are swapped.

If the value of answer_field is choice_a_label, it is set to choice_b_label. Else if the value of answer_field is choice_b_label, it is set to choice_a_label. Else if the value of answer_field is choice_tie_label, nothing is changed.
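
The shuffle behaviour described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the library's code: the function names swap_choices and maybe_shuffle, the dict-based instance, and the optional rng parameter are all hypothetical, and the default field names and labels are borrowed from the template at the top of this page.

```python
import random


def swap_choices(instance, choice_a_field="answer_a", choice_b_field="answer_b",
                 answer_field="winner", choice_a_label="A", choice_b_label="B"):
    """Return a copy of the instance with both choices and the answer swapped."""
    out = dict(instance)
    # Swap the two candidate answers.
    out[choice_a_field] = instance[choice_b_field]
    out[choice_b_field] = instance[choice_a_field]
    # Flip the answer label so it still points at the same response.
    if instance[answer_field] == choice_a_label:
        out[answer_field] = choice_b_label
    elif instance[answer_field] == choice_b_label:
        out[answer_field] = choice_a_label
    # A tie label is left untouched.
    return out


def maybe_shuffle(instance, shuffle=True, rng=None, **field_kwargs):
    """Swap the choices with probability 0.5 when shuffle is enabled."""
    rng = rng if rng is not None else random
    if shuffle and rng.random() < 0.5:
        return swap_choices(instance, **field_kwargs)
    return dict(instance)
```

Because the answer label is flipped together with the choices, the reference verdict keeps pointing at the same response regardless of the position in which it is shown to the judge.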

References: processors.extract_mt_bench_label_judgment