Token Overlap

Metric that evaluates tool call predictions against reference calls. It first generates unique key-value pairs for the tool name and for all parameters, including nested parameters. Only a single reference call per prediction is supported.

Reports the average token_overlap score for each key, as well as micro and macro averages across all keys.
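The per-key reporting and the two averages can be sketched as follows. This is a minimal sketch, not unitxt's implementation, and it rests on several assumptions:

* The per-key score is a SQuAD-style bag-of-words F1 over whitespace tokens. The real `metrics.token_overlap` may tokenize or normalize differently.
* A key missing from the prediction scores 0.
* Macro averaging weights every key equally.
* Micro averaging weights every individual (instance, key) score equally.

`token_overlap_f1` and `aggregate` are hypothetical names.

```python
from collections import Counter
from statistics import mean
from typing import Any


def token_overlap_f1(prediction: Any, reference: Any) -> float:
    """Assumed per-key score: bag-of-words F1 over whitespace tokens."""
    pred_tokens = str(prediction).split()
    ref_tokens = str(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def aggregate(pairs: list[tuple[dict, dict]]) -> dict[str, Any]:
    """Score (flattened prediction, flattened reference) pairs key by key."""
    per_key_scores: dict[str, list[float]] = {}
    for prediction, reference in pairs:
        for key, ref_value in reference.items():
            # Assumption: a key absent from the prediction scores 0.
            score = token_overlap_f1(prediction.get(key, ""), ref_value)
            per_key_scores.setdefault(key, []).append(score)
    per_key = {key: mean(scores) for key, scores in per_key_scores.items()}
    return {
        "per_key": per_key,
        # Macro: every key counts equally, however often it occurs.
        "macro": mean(per_key.values()),
        # Micro: every individual (instance, key) score counts equally.
        "micro": mean(s for scores in per_key_scores.values() for s in scores),
    }


# Hypothetical, already-flattened tool calls: (prediction, reference).
pairs = [
    ({"name": "get_weather", "arguments.city": "New York"},
     {"name": "get_weather", "arguments.city": "New York City"}),
    ({"name": "get_time", "arguments.city": "Paris"},
     {"name": "get_weather", "arguments.city": "Paris", "arguments.unit": "celsius"}),
]
print(aggregate(pairs))
```

Under these assumptions, the key `arguments.unit` appears only once and is missed. Because macro averaging gives that key a full share, the macro average (about 0.467) falls below the micro average (0.56).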

metrics.tool_calling.key_value.token_overlap

from unitxt.metrics import ToolCallKeyValueExtraction

ToolCallKeyValueExtraction(
    metric="metrics.token_overlap",
    score_prefix="token_overlap_",
)

Explanation about ToolCallKeyValueExtraction

Metrics that formulate ToolCall evaluation as a key-value extraction task.

Each argument, including every nested value, is first flattened into a key-value pair. For example,

{ "arguments": { "name": "John", "address": { "street": "Main St", "city": "Smallville" } } }

becomes

arguments.name = "John"
arguments.address.street = "Main St"
arguments.address.city = "Smallville"
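The flattening of nested dictionaries described above can be sketched as a short recursive function. This is a minimal sketch, not unitxt's code. It assumes keys are joined with `.` and that a tool call is a dict with `name` and `arguments` entries. `flatten_dict` and the tool name `update_contact` are hypothetical. Lists are deliberately left untouched here.

```python
from typing import Any


def flatten_dict(nested: dict, parent_key: str = "", sep: str = ".") -> dict[str, Any]:
    """Flatten nested dictionaries into dotted keys; other values are kept as leaves."""
    flat: dict[str, Any] = {}
    for key, value in nested.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse so that each nested parameter gets its full dotted path.
            flat.update(flatten_dict(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


call = {"name": "update_contact",
        "arguments": {"name": "John",
                      "address": {"street": "Main St", "city": "Smallville"}}}
for key, value in flatten_dict(call).items():
    print(f"{key} = {value!r}")
```

Because keys carry their full path, the tool name (`name`) and the argument called `name` (`arguments.name`) stay distinct.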

Note that by default, if a parameter is a list of dictionaries, the dictionaries are flattened using their list indexes:

{ "arguments": { "addresses": [ { "street": "Main St", "city": "Smallville" },
                                { "street": "Log St", "city": "BigCity" } ] } }

becomes

arguments.addresses.0.street = "Main St"
arguments.addresses.0.city = "Smallville"
arguments.addresses.1.street = "Log St"
arguments.addresses.1.city = "BigCity"

However, if each dictionary in the list has a single, unique key, that key is used instead of the index:

{ "arguments": { "addresses": [ { "home": { "street": "Main St", "city": "Smallville" } },
                                { "work": { "street": "Log St", "city": "BigCity" } } ] } }

becomes

arguments.addresses.home.street = "Main St"
arguments.addresses.home.city = "Smallville"
arguments.addresses.work.street = "Log St"
arguments.addresses.work.city = "BigCity"
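The two list rules above can be added to the recursive flattening as sketched below. This sketch follows the rules as stated in this section and is not unitxt's implementation. `flatten` is a hypothetical helper.

Some behavior goes beyond what the text states and is an assumption:

* If the single keys repeat, the sketch falls back to indexes.
* Lists that do not contain only dictionaries (for example, lists of strings) are kept as a single leaf value.

```python
from typing import Any


def flatten(value: Any, parent_key: str = "", sep: str = ".") -> dict[str, Any]:
    """Flatten dicts and lists of dicts into dotted keys (hypothetical helper)."""

    def child_key(key: Any) -> str:
        return f"{parent_key}{sep}{key}" if parent_key else str(key)

    flat: dict[str, Any] = {}
    if isinstance(value, dict):
        for key, item in value.items():
            flat.update(flatten(item, child_key(key), sep))
    elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
        single_keys = [next(iter(v)) for v in value if len(v) == 1]
        if len(single_keys) == len(value) and len(set(single_keys)) == len(value):
            # Every dict has exactly one key and the keys are unique:
            # use those keys as labels instead of list positions.
            for item in value:
                flat.update(flatten(item, parent_key, sep))
        else:
            # Default: label each dictionary by its index in the list.
            for index, item in enumerate(value):
                flat.update(flatten(item, child_key(index), sep))
    else:
        # Scalars, and lists that are not lists of dicts, are kept as leaves.
        flat[parent_key] = value
    return flat


call = {"arguments": {"addresses": [
    {"home": {"street": "Main St", "city": "Smallville"}},
    {"work": {"street": "Log St", "city": "BigCity"}},
]}}
for key, value in flatten(call).items():
    print(f"{key} = {value!r}")
```

Using the single keys as labels means the keys stay stable if the prediction lists the same dictionaries in a different order. With indexes, reordering the list would misalign every key.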

References: metrics.token_overlap
