Mt Bench Single Turn
templates.response_assessment.pairwise_comparison.mt_bench_single_turn
PairwiseChoiceTemplate
(
choice_a_field="answer_a",
choice_b_field="answer_b",
answer_field="winner",
choice_a_label="A",
choice_b_label="B",
choice_tie_label="C",
shuffle=False,
instruction="Please act as an impartial judge and evaluate the quality of the responses provided by two AI assistants to the user question displayed below. You should choose the assistant that follows the user's instructions and answers the user's question better. Your evaluation should consider factors such as the helpfulness, relevance, accuracy, depth, creativity, and level of detail of their responses. Begin your evaluation by comparing the two responses and provide a short explanation. Avoid any position biases and ensure that the order in which the responses were presented does not influence your decision. Do not allow the length of the responses to influence your evaluation. Do not favor certain names of the assistants. Be as objective as possible. After providing your explanation, output your final verdict by strictly following this format: "[[A]]" if assistant A is better, "[[B]]" if assistant B is better, and "[[C]]" for a tie.
",
input_format="[User Question]
{question}
[The Start of Assistant A's Answer]
{answer_a}
[The End of Assistant A's Answer]
[The Start of Assistant B's Answer]
{answer_b}
[The End of Assistant B's Answer]",
output_format="[[{winner}]]",
postprocessors=[
"processors.extract_mt_bench_label_judgment",
],
)
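As a rough illustration of how the fields above fit together, the sketch below renders the prompt and the target with plain `str.format`. It does not call unitxt itself: the helper names `render_input` and `verbalize_answer` are hypothetical, the sample question and answers are made up, and the real template may apply further processing.

```python
# Format strings copied from the catalog entry above.
INPUT_FORMAT = (
    "[User Question]\n{question}\n"
    "[The Start of Assistant A's Answer]\n{answer_a}\n"
    "[The End of Assistant A's Answer]\n"
    "[The Start of Assistant B's Answer]\n{answer_b}\n"
    "[The End of Assistant B's Answer]"
)
OUTPUT_FORMAT = "[[{winner}]]"

# choice_a_label, choice_b_label and choice_tie_label from the entry above.
LABELS = {"choice_a": "A", "choice_b": "B", "tie": "C"}


def render_input(question: str, answer_a: str, answer_b: str) -> str:
    """Hypothetical helper: fill the input format with one instance."""
    return INPUT_FORMAT.format(
        question=question, answer_a=answer_a, answer_b=answer_b
    )


def verbalize_answer(winner: str) -> str:
    """Hypothetical helper: map the answer value to its label, then format it."""
    return OUTPUT_FORMAT.format(winner=LABELS[winner])


if __name__ == "__main__":
    print(render_input("What is 2 + 2?", "4", "Four, I think."))
    print(verbalize_answer("choice_a"))
```

Keeping the label mapping separate from the format string mirrors the split between `choice_*_label` and `output_format` in the entry.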
Explanation about PairwiseChoiceTemplate
PairwiseChoiceTemplate.
- Requirements:
The answer field value should be of type Literal["choice_a", "choice_b", "tie"]
- Args:
- choice_a_field (str):
The field which contains choice_a value
- choice_b_field (str):
The field which contains choice_b value
- answer_field (str):
The field which contains the answer value. Should be of type Literal["choice_a", "choice_b", "tie"]
- choice_a_label (str):
The label of choice A answer as it is verbalized in the template.
- choice_b_label (str):
The label of choice B answer as it is verbalized in the template.
- choice_tie_label (str):
The label of a tie answer as it should be verbalized in the template.
- shuffle (bool):
Whether to shuffle the choices. Shuffling is used to counteract position bias.
- shuffle: when enabled, 50% of the time:
The values of choice_a_field and choice_b_field are swapped.
If the value of answer_field is choice_a_label, it is set to choice_b_label. If it is choice_b_label, it is set to choice_a_label. If it is choice_tie_label, it is left unchanged.
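The shuffle rule described above can be sketched in a few lines of plain Python. This is a minimal sketch and not the library's code: `swap_choices` and `maybe_shuffle` are hypothetical names, the defaults are taken from the catalog entry, and the answer is assumed to already hold one of the labels.

```python
import random


def swap_choices(instance, choice_a_field="answer_a", choice_b_field="answer_b",
                 answer_field="winner", choice_a_label="A", choice_b_label="B"):
    """Hypothetical helper: return a copy with the two choices swapped."""
    out = dict(instance)
    out[choice_a_field] = instance[choice_b_field]
    out[choice_b_field] = instance[choice_a_field]
    # Flip the answer label so it still points at the same response.
    if instance[answer_field] == choice_a_label:
        out[answer_field] = choice_b_label
    elif instance[answer_field] == choice_b_label:
        out[answer_field] = choice_a_label
    # Any other value (the tie label) is left unchanged.
    return out


def maybe_shuffle(instance, shuffle=True, rng=None):
    """Hypothetical helper: swap the choices about half of the time."""
    rng = rng or random
    if shuffle and rng.random() < 0.5:
        return swap_choices(instance)
    return dict(instance)


if __name__ == "__main__":
    example = {"answer_a": "4", "answer_b": "5", "winner": "A"}
    print(swap_choices(example))
```

Separating the deterministic swap from the random decision makes the swap easy to check on its own.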
References: processors.extract_mt_bench_label_judgment
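The behaviour of the referenced processor is not described on this page. Assuming it pulls the label out of the double-bracket verdict requested in the instruction, the extraction step might look like the sketch below. The function name `extract_label`, the regular expression, and the choice to return `None` when no verdict is found are all assumptions made for illustration.

```python
import re
from typing import Optional

# Matches a verdict such as "[[A]]" and captures the text inside the brackets.
VERDICT_PATTERN = re.compile(r"\[\[([^\]]+)\]\]")


def extract_label(judge_output: str) -> Optional[str]:
    """Hypothetical helper: return the first bracketed label, or None."""
    match = VERDICT_PATTERN.search(judge_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_label("Assistant B gives more detail. [[B]]"))
```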