Tool Calling Correctness
metrics.llm_as_judge.direct.criteria.tool_calling_correctness
CriteriaWithOptions(
    name="tool_calling_correctness",
    description="The response correctly uses tool calls as expected, including the right tool names and parameters, in line with the reference or user query and instructions.",
    prediction_field=None,
    context_fields=None,
    options=[
        CriteriaOption(
            name="Excellent",
            description="All tool calls are correct, including names and parameters, matching the reference or user expectations precisely.",
        ),
        CriteriaOption(
            name="Good",
            description="Tool calls are mostly correct with minor errors that do not affect the functionality or intent.",
        ),
        CriteriaOption(
            name="Mediocre",
            description="The response attempts tool calls with partial correctness, but has notable issues in tool names, structure, or parameters.",
        ),
        CriteriaOption(
            name="Bad",
            description="The tool calling logic is largely incorrect, with significant mistakes in tool usage or missing key calls.",
        ),
        CriteriaOption(
            name="Very Bad",
            description="The tool calls are completely incorrect, irrelevant, or missing when clearly required.",
        ),
    ],
    option_map={
        "Excellent": 1.0,
        "Good": 0.75,
        "Mediocre": 0.5,
        "Bad": 0.25,
        "Very Bad": 0.0,
    },
)
from unitxt.llm_as_judge_constants import CriteriaOption
Explanation about CriteriaWithOptions
Criteria used by DirectLLMJudge to run evaluations.
Explanation about CriteriaOption
A criteria option.
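As a rough illustration of how the option_map in this entry can turn a judge's chosen option into a numeric score, here is a minimal plain-Python sketch. It does not use unitxt itself: OPTION_MAP copies the values from the definition above, while option_to_score and mean_score are hypothetical helper names introduced only for this example and are not part of the library.

```python
# Plain-Python stand-in for the option_map shown in this catalog entry.
# No unitxt import is needed; the values are copied from the definition above.
OPTION_MAP = {
    "Excellent": 1.0,
    "Good": 0.75,
    "Mediocre": 0.5,
    "Bad": 0.25,
    "Very Bad": 0.0,
}


def option_to_score(selected_option: str) -> float:
    """Hypothetical helper: return the score mapped to the selected option."""
    if selected_option not in OPTION_MAP:
        raise ValueError(f"Unknown option: {selected_option!r}")
    return OPTION_MAP[selected_option]


def mean_score(selected_options: list[str]) -> float:
    """Hypothetical helper: average the mapped scores over several judgments."""
    if not selected_options:
        raise ValueError("At least one selected option is required")
    total = sum(option_to_score(option) for option in selected_options)
    return total / len(selected_options)
```

For instance, under these assumptions a response judged "Good" maps to 0.75, and averaging one "Excellent" and one "Bad" judgment gives 0.625. Unknown option names are rejected rather than silently scored, since an option outside the five listed here has no defined value.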
Read more about catalog usage here.