📄 Token Overlap

Metric that evaluates tool call predictions against reference calls. It first generates unique key-value pairs for the tool name and for all parameters (including nested parameters). Supports only a single reference call per prediction.

Reports average token_overlap for each key, as well as micro and macro averages across all keys.

metrics.tool_calling.key_value.token_overlap

ToolCallKeyValueExtraction(
    metric="metrics.token_overlap",
    score_prefix="token_overlap_",
)
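To make the reported scores concrete, the sketch below computes a token-level F1 between predicted and reference values for each flattened key, then averages per key, across keys (macro), and across all scored key instances (micro). It is a minimal illustration and not the library's implementation: the whitespace tokenization, the use of F1 as the overlap score, the exact micro and macro definitions, and the output key names are all assumptions, and `token_f1` and `aggregate` are hypothetical helper names.

```python
from collections import Counter, defaultdict


def token_f1(prediction, reference):
    """Token-level F1 over lowercased, whitespace-split tokens (assumed scoring)."""
    pred_tokens = str(prediction).lower().split()
    ref_tokens = str(reference).lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    common = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def aggregate(predictions, references):
    """Average scores per key, then add macro and micro averages (assumed definitions)."""
    per_key = defaultdict(list)
    for pred, ref in zip(predictions, references):
        # Score every key present in the reference; a missing prediction scores 0.
        for key, ref_value in ref.items():
            score = token_f1(pred[key], ref_value) if key in pred else 0.0
            per_key[key].append(score)
    if not per_key:
        return {}
    results = {key: sum(s) / len(s) for key, s in per_key.items()}
    all_scores = [s for scores in per_key.values() for s in scores]
    # Macro: mean of the per-key means. Micro: mean over all scored key instances.
    macro = sum(results.values()) / len(results)
    micro = sum(all_scores) / len(all_scores)
    results["macro_avg"] = macro
    results["micro_avg"] = micro
    return results


# Hypothetical flattened predictions and references (one reference per prediction).
predictions = [
    {"name": "get_weather", "argument.city": "New York"},
    {"name": "get_weather", "argument.city": "Paris", "argument.unit": "fahrenheit"},
]
references = [
    {"name": "get_weather", "argument.city": "New York City"},
    {"name": "get_weather", "argument.city": "Paris", "argument.unit": "celsius"},
]
scores = aggregate(predictions, references)
for key, value in scores.items():
    print(f"{key}: {value:.3f}")
```

In this sketch a key that appears in only one instance (here `argument.unit`) weighs as much as any other key in the macro average, but contributes only one score to the micro average, which is why the two numbers can differ.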

Explanation about ToolCallKeyValueExtraction

Metrics that formulate ToolCall evaluation as a Key Value Extraction task.

Each argument and each nested value is first flattened into a key-value pair.

{ arguments : { "name" : "John", "address" : { "street" : "Main St", "city" : "Smallville" } } }

becomes

argument.name = "John"
argument.address.street = "Main St"
argument.address.city = "Smallville"
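The flattening of nested dictionaries described above can be sketched as a short recursive function. This is an illustrative sketch rather than the library's code: the function name `flatten`, the `argument` prefix, and the `.` separator are assumptions taken from the example output.

```python
def flatten(value, prefix="argument", sep="."):
    """Recursively flatten nested dictionaries into dotted key-value pairs."""
    if not isinstance(value, dict):
        # Leaf value: the accumulated path becomes the key.
        return {prefix: value}
    flat = {}
    for key, child in value.items():
        flat.update(flatten(child, f"{prefix}{sep}{key}", sep))
    return flat


tool_call = {
    "arguments": {
        "name": "John",
        "address": {"street": "Main St", "city": "Smallville"},
    }
}
flat = flatten(tool_call["arguments"])
for key, value in flat.items():
    print(f"{key} = {value!r}")
```

Because every leaf ends up under a unique dotted path, predictions and references can then be compared key by key.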

Note that by default, if a parameter is a list of dictionaries, its elements are flattened using their list indexes:

{ arguments : { "addresses" : [ { "street" : "Main St", "city" : "Smallville" },
{ "street" : "Log St", "city" : "BigCity" } ] } }

becomes

argument.addresses.0.street = "Main St"
argument.addresses.0.city = "Smallville"
argument.addresses.1.street = "Log St"
argument.addresses.1.city = "BigCity"

But if each dictionary in the list has a single unique key, that key is used instead of the index:

{ arguments : { "addresses" : [ { "home" : { "street" : "Main St", "city" : "Smallville" } },
{ "work" : { "street" : "Log St", "city" : "BigCity" } } ] } }

becomes

argument.addresses.home.street = "Main St"
argument.addresses.home.city = "Smallville"
argument.addresses.work.street = "Log St"
argument.addresses.work.city = "BigCity"
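The two list rules can be sketched by extending a recursive flattener. This is a sketch under stated assumptions, not the library's implementation: `flatten_with_lists` is a hypothetical name, and "a single unique key" is interpreted here as every list element being a dictionary with exactly one key, with no key repeated across elements.

```python
def flatten_with_lists(value, prefix="argument", sep="."):
    """Flatten nested dicts and lists into dotted key-value pairs."""
    if isinstance(value, dict):
        items = list(value.items())
    elif isinstance(value, list):
        one_key_dicts = bool(value) and all(
            isinstance(v, dict) and len(v) == 1 for v in value
        )
        names = [next(iter(v)) for v in value] if one_key_dicts else []
        if one_key_dicts and len(set(names)) == len(names):
            # Assumed rule: each element's only key replaces its list index.
            items = [(name, v[name]) for name, v in zip(names, value)]
        else:
            # Default rule: elements are addressed by position.
            items = list(enumerate(value))
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        flat.update(flatten_with_lists(child, f"{prefix}{sep}{key}", sep))
    return flat


by_index = flatten_with_lists(
    {"addresses": [
        {"street": "Main St", "city": "Smallville"},
        {"street": "Log St", "city": "BigCity"},
    ]}
)
by_key = flatten_with_lists(
    {"addresses": [
        {"home": {"street": "Main St", "city": "Smallville"}},
        {"work": {"street": "Log St", "city": "BigCity"}},
    ]}
)
print(by_index)
print(by_key)
```

Using the element's own key rather than its position keeps the flattened keys stable when a prediction lists the same entries in a different order.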

References: metrics.token_overlap