Token Overlap
Metric that evaluates tool call predictions against reference calls. It first generates unique key-value pairs for the tool name and all the parameters (including nested parameters), then scores each pair. Supports only a single reference call per prediction.
Reports average token_overlap for each key, as well as micro and macro averages across all keys.
metrics.tool_calling.key_value.token_overlap
ToolCallKeyValueExtraction(
metric="metrics.token_overlap",
score_prefix="token_overlap_",
)
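As a rough illustration of the reported scores, the sketch below assumes that token overlap is an F1 score over whitespace tokens, that the macro average is the mean of the per-key averages, and that the micro average pools every individual key score. These definitions, the helper names (`token_f1`, `aggregate`), and the `macro_avg`/`micro_avg` score names are assumptions made for illustration, not the library's implementation; only the `token_overlap_` prefix comes from the configuration above.

```python
from collections import Counter, defaultdict
from statistics import mean


def token_f1(prediction, reference):
    """Token-level F1 over whitespace tokens (assumed definition of overlap)."""
    pred_tokens = str(prediction).split()
    ref_tokens = str(reference).split()
    if not pred_tokens or not ref_tokens:
        # Two empty values count as a match; one empty value counts as a miss.
        return float(pred_tokens == ref_tokens)
    common = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def aggregate(instances):
    """Score a list of (predicted_pairs, reference_pairs) flat dictionaries."""
    per_key = defaultdict(list)
    for predicted, reference in instances:
        for key, ref_value in reference.items():
            # Assumption: a key missing from the prediction scores 0.
            score = token_f1(predicted[key], ref_value) if key in predicted else 0.0
            per_key[key].append(score)
    key_means = {key: mean(scores) for key, scores in per_key.items()}
    all_scores = [score for scores in per_key.values() for score in scores]
    scores = {f"token_overlap_{key}": value for key, value in key_means.items()}
    scores["token_overlap_macro_avg"] = mean(list(key_means.values()))
    scores["token_overlap_micro_avg"] = mean(all_scores)
    return scores


instances = [
    (
        {"name": "get_weather", "argument.city": "New York"},
        {"name": "get_weather", "argument.city": "New York City"},
    ),
    ({"name": "get_time"}, {"name": "get_weather"}),
]
for score_name, value in aggregate(instances).items():
    print(f"{score_name}: {value:.2f}")
```

Under these assumptions the two averages differ whenever keys appear in different numbers of instances, which is why both are worth reporting.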
Explanation about ToolCallKeyValueExtraction
Metrics that formulate tool call evaluation as a key-value extraction task.
Each argument and each nested value is first flattened to a key-value pair.
{ arguments : { "name" : "John", "address" : { "street" : "Main St", "city" : "Smallville" } } }
becomes
argument.name = "John"
argument.address.street = "Main St"
argument.address.city = "Smallville"
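The flattening of nested dictionaries described above can be sketched as a short recursive function. The function name `flatten`, its parameters, and the choice to pass `"argument"` as the key prefix are assumptions for illustration and may not match the library's internals.

```python
def flatten(nested, parent_key="", sep="."):
    """Flatten nested dictionaries into dotted key-value pairs."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dictionaries, extending the dotted path.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


arguments = {"name": "John", "address": {"street": "Main St", "city": "Smallville"}}
pairs = flatten(arguments, parent_key="argument")
for key, value in pairs.items():
    print(f'{key} = "{value}"')
```

Each leaf value ends up under a unique dotted key, so a prediction and a reference can be compared key by key.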
Note that by default, if a parameter is a list of dictionaries, the dictionaries are flattened with their list indexes:
{ arguments : { "addresses" : [ { "street" : "Main St", "city" : "Smallville" },
{ "street" : "Log St", "city" : "BigCity" } ] } }
becomes
argument.addresses.0.street = "Main St"
argument.addresses.0.city = "Smallville"
argument.addresses.1.street = "Log St"
argument.addresses.1.city = "BigCity"
But if each dictionary in the list has a single, unique key, that key is used instead of the index:
{ arguments : { "addresses" : [ { "home" : { "street" : "Main St", "city" : "Smallville" } },
{ "work" : { "street" : "Log St", "city" : "BigCity" } } ] } }
becomes
argument.addresses.home.street = "Main St"
argument.addresses.home.city = "Smallville"
argument.addresses.work.street = "Log St"
argument.addresses.work.city = "BigCity"
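The two list rules can be combined in one recursive sketch. This is a minimal illustration under stated assumptions, not the library's code: the name `flatten_with_lists` is hypothetical, the `"argument"` prefix is passed in explicitly, and lists that contain anything other than dictionaries are kept as a single value.

```python
def flatten_with_lists(value, parent_key="", sep="."):
    """Flatten nested dicts and lists of dicts into dotted key-value pairs."""
    flat = {}
    if isinstance(value, dict):
        for key, item in value.items():
            full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            flat.update(flatten_with_lists(item, full_key, sep))
    elif isinstance(value, list) and value and all(isinstance(e, dict) for e in value):
        # Collect the key of every single-key dictionary in the list.
        single_keys = [next(iter(e)) for e in value if len(e) == 1]
        use_keys = len(single_keys) == len(value) and len(set(single_keys)) == len(value)
        for index, element in enumerate(value):
            # With unique single keys, the element's own key names the entry;
            # otherwise the list index is added to the path.
            child_key = parent_key if use_keys else f"{parent_key}{sep}{index}"
            flat.update(flatten_with_lists(element, child_key, sep))
    else:
        # Scalars (and, by assumption, lists of non-dictionaries) are leaves.
        flat[parent_key] = value
    return flat


indexed = flatten_with_lists(
    {"addresses": [{"street": "Main St", "city": "Smallville"},
                   {"street": "Log St", "city": "BigCity"}]},
    parent_key="argument",
)
keyed = flatten_with_lists(
    {"addresses": [{"home": {"street": "Main St", "city": "Smallville"}},
                   {"work": {"street": "Log St", "city": "BigCity"}}]},
    parent_key="argument",
)
print(sorted(indexed))
print(sorted(keyed))
```

Using the dictionaries' own keys when they are unique makes the flattened keys independent of list order, so a prediction that lists "work" before "home" still lines up with the reference.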
References: metrics.token_overlap
Read more about catalog usage here.