📄 Accuracy

A metric that evaluates tool call predictions against reference calls. It first generates unique key-value pairs for the tool name and for every parameter (including nested parameters). It then reports the average accuracy for each key, as well as the micro and macro averages across all keys.

Supports only a single reference call per prediction.
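
The scoring described above can be illustrated with a small sketch. It is not the library's implementation: the function name `key_value_accuracy` is hypothetical, and the sketch assumes that keys are taken from the reference, that a key missing from the prediction counts as incorrect, that the micro average pools all key occurrences, and that the macro average is the mean of the per-key accuracies. The library's exact handling of these cases may differ.

```python
from collections import defaultdict


def key_value_accuracy(predictions, references):
    """Score flat key-value dicts, one reference per prediction (illustrative only)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, ref in zip(predictions, references):
        for key, ref_value in ref.items():
            total[key] += 1
            # Assumption: a key absent from the prediction is scored as a miss.
            if key in pred and pred[key] == ref_value:
                correct[key] += 1
    if not total:
        return {}, 0.0, 0.0
    per_key = {key: correct[key] / total[key] for key in total}
    micro = sum(correct.values()) / sum(total.values())  # pooled over all key occurrences
    macro = sum(per_key.values()) / len(per_key)  # mean of the per-key accuracies
    return per_key, micro, macro


references = [
    {"name": "get_weather", "argument.city": "Paris", "argument.unit": "C"},
    {"name": "get_weather", "argument.city": "Milan"},
]
predictions = [
    {"name": "get_weather", "argument.city": "Paris", "argument.unit": "F"},
    {"name": "get_weather", "argument.city": "Rome"},
]

per_key, micro, macro = key_value_accuracy(predictions, references)
print(per_key)  # name: 1.0, argument.city: 0.5, argument.unit: 0.0
print(micro)    # 3 of 5 key occurrences match: 0.6
print(macro)    # (1.0 + 0.5 + 0.0) / 3 = 0.5
```

The two averages differ here because `argument.unit` appears in only one reference: it carries less weight in the micro average than in the macro average.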

metrics.tool_calling.key_value.accuracy

ToolCallKeyValueExtraction(
    metric="metrics.accuracy",
)

Explanation about ToolCallKeyValueExtraction

Metrics that formulate ToolCall evaluation as a Key Value Extraction task.

Each argument and each nested value is first flattened into a key-value pair. For example:

{ arguments : {“name” : “John”, “address” : { “street” : “Main St”, “City” : “Smallville” } } }

becomes

argument.name = “John”
argument.address.street = “Main St”
argument.address.City = “Smallville”
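
The flattening step above can be sketched as a short recursive function. This is a minimal illustration, not the library's implementation: the function name `flatten_arguments` and its `prefix` parameter are assumptions made for this example, and keys keep the casing they have in the input.

```python
def flatten_arguments(value, prefix="argument"):
    """Flatten nested dictionaries into dotted key paths (illustrative only)."""
    flat = {}
    for key, item in value.items():
        path = f"{prefix}.{key}"
        if isinstance(item, dict):
            # Recurse into nested dictionaries, extending the dotted path.
            flat.update(flatten_arguments(item, path))
        else:
            flat[path] = item
    return flat


call = {
    "arguments": {
        "name": "John",
        "address": {"street": "Main St", "City": "Smallville"},
    }
}

flat = flatten_arguments(call["arguments"])
for key, value in flat.items():
    print(f"{key} = {value!r}")
```

Once every call is reduced to a flat dictionary like this, a prediction and its reference can be compared key by key.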

Note that by default, if a parameter is a list of dictionaries, its elements are flattened using their list indexes:

{ arguments : { “addresses” : [ { “street” : “Main St”, “City” : “Smallville” },
{ “street” : “Log St”, “City” : “BigCity” } ] } }

argument.addresses.0.street = “Main St”
argument.addresses.0.City = “Smallville”
argument.addresses.1.street = “Log St”
argument.addresses.1.City = “BigCity”

However, if every dictionary in the list has exactly one key, and those keys are unique across the list, that key is used in place of the index:

{ arguments : { “addresses” : [ { “home” : { “street” : “Main St”, “City” : “Smallville” } },
{ “work” : { “street” : “Log St”, “City” : “BigCity” } } ] } }

argument.addresses.home.street = “Main St”
argument.addresses.home.City = “Smallville”
argument.addresses.work.street = “Log St”
argument.addresses.work.City = “BigCity”
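
Both list rules can be sketched in one function. This is an illustration under stated assumptions rather than the library's code: the name `flatten_with_lists` is hypothetical, only lists whose elements are all dictionaries get special treatment, and any other value (including other kinds of lists) is stored as-is.

```python
def flatten_with_lists(value, prefix="argument"):
    """Flatten nested dicts and lists of dicts into dotted key paths (illustrative only)."""
    flat = {}
    for key, item in value.items():
        path = f"{prefix}.{key}"
        if isinstance(item, dict):
            flat.update(flatten_with_lists(item, path))
        elif isinstance(item, list) and item and all(isinstance(e, dict) for e in item):
            names = [next(iter(e)) for e in item if len(e) == 1]
            if len(names) == len(item) and len(set(names)) == len(names):
                # Every element has a single key and the keys are unique:
                # the key itself identifies the element, so no index is added.
                for element in item:
                    flat.update(flatten_with_lists(element, path))
            else:
                # Otherwise fall back to the position of the element in the list.
                for index, element in enumerate(item):
                    flat.update(flatten_with_lists(element, f"{path}.{index}"))
        else:
            flat[path] = item
    return flat


indexed = flatten_with_lists({"addresses": [
    {"street": "Main St", "City": "Smallville"},
    {"street": "Log St", "City": "BigCity"},
]})
keyed = flatten_with_lists({"addresses": [
    {"home": {"street": "Main St", "City": "Smallville"}},
    {"work": {"street": "Log St", "City": "BigCity"}},
]})
print(sorted(indexed))  # paths contain the list positions 0 and 1
print(sorted(keyed))    # paths contain the keys home and work
```

Using the unique key instead of the index makes the comparison insensitive to the order of the list elements, since `home` and `work` produce the same paths wherever they appear in the list.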

References: metrics.accuracy