📄 Accuracy

Metric that evaluates tool call predictions against reference calls. It first generates unique key-value pairs for the tool name and for all the parameters (including nested parameters). It then reports the average accuracy for each key, as well as micro and macro averages across all keys.

Supports only a single reference call per prediction.
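The per-key, micro, and macro scores can be illustrated with a minimal sketch. It is not the library's implementation: the function name `key_value_accuracy`, the result keys `micro_avg` and `macro_avg`, exact-equality matching, and the choice to score only keys present in the reference are all assumptions made for illustration. Micro averaging pools every key comparison, while macro averaging takes the mean of the per-key accuracies.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def key_value_accuracy(
    pairs: List[Tuple[Dict[str, Any], Dict[str, Any]]]
) -> Dict[str, float]:
    """Score (prediction, reference) pairs of already-flattened calls.

    Hypothetical helper: names and matching rules are assumptions.
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for prediction, reference in pairs:
        for key, expected in reference.items():
            totals[key] += 1
            # A missing key counts as a miss; values are compared by equality.
            hits[key] += int(key in prediction and prediction[key] == expected)
    if not totals:
        return {}
    scores = {key: hits[key] / totals[key] for key in totals}
    per_key = list(scores.values())
    # Macro: mean of per-key accuracies. Micro: pooled over all comparisons.
    scores["macro_avg"] = sum(per_key) / len(per_key)
    scores["micro_avg"] = sum(hits.values()) / sum(totals.values())
    return scores


pairs = [
    (
        {"name": "get_weather", "arguments.city": "Paris"},
        {"name": "get_weather", "arguments.city": "Paris"},
    ),
    (
        {"name": "get_weather", "arguments.city": "Rome"},
        {"name": "get_weather", "arguments.city": "Oslo", "arguments.unit": "C"},
    ),
]
scores = key_value_accuracy(pairs)
print(scores)
```

In this example the tool name matches in both calls, the city matches in one of two, and the unit is never predicted, so rare keys pull the macro average down more than the micro average.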

metrics.tool_calling.key_value.accuracy

Explanation about ToolCallKeyValueExtraction

Metrics that formulate ToolCall evaluation as a Key Value Extraction task.

Each argument and each nested value is first flattened into a key-value pair.

{ "arguments" : { "name" : "John", "address" : { "street" : "Main St", "city" : "Smallville" } } }

becomes

arguments.name = "John"
arguments.address.street = "Main St"
arguments.address.city = "Smallville"
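The flattening of nested dictionaries can be sketched as a short recursive function. This is an illustrative helper only: the name `flatten` and the dot separator are assumptions drawn from the example above, not the library's actual code.

```python
from typing import Any, Dict


def flatten(nested: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dictionaries into dot-separated keys (hypothetical helper)."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the path built so far as the prefix.
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


call = {
    "arguments": {
        "name": "John",
        "address": {"street": "Main St", "city": "Smallville"},
    }
}
flat_call = flatten(call)
for key, value in flat_call.items():
    print(f"{key} = {value!r}")
```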

Note that, by default, if a parameter is a list of dictionaries, its elements are flattened with their list indexes:

{ "arguments" : { "addresses" : [ { "street" : "Main St", "city" : "Smallville" }, { "street" : "Log St", "city" : "BigCity" } ] } }

becomes

arguments.addresses.0.street = "Main St"
arguments.addresses.0.city = "Smallville"
arguments.addresses.1.street = "Log St"
arguments.addresses.1.city = "BigCity"
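A minimal sketch of index-based flattening follows. The helper `flatten_indexed` is hypothetical, and it indexes every list it meets, including lists of plain values, which is an assumption since the text above only describes lists of dictionaries.

```python
from typing import Any, Dict


def flatten_indexed(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten dicts and lists; list elements are keyed by their position."""
    if isinstance(value, dict):
        items = list(value.items())
    elif isinstance(value, list):
        # Use the position of each element as its key segment.
        items = [(str(index), element) for index, element in enumerate(value)]
    else:
        return {prefix: value}
    flat: Dict[str, Any] = {}
    for key, child in items:
        full_key = f"{prefix}.{key}" if prefix else key
        flat.update(flatten_indexed(child, full_key))
    return flat


call = {
    "arguments": {
        "addresses": [
            {"street": "Main St", "city": "Smallville"},
            {"street": "Log St", "city": "BigCity"},
        ]
    }
}
flat_call = flatten_indexed(call)
for key, value in flat_call.items():
    print(f"{key} = {value!r}")
```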

However, if each dictionary in the list has a single key, and these keys are unique across the list, the key is used in place of the index.

{ "arguments" : { "addresses" : [ { "home" : { "street" : "Main St", "city" : "Smallville" } }, { "work" : { "street" : "Log St", "city" : "BigCity" } } ] } }

becomes

arguments.addresses.home.street = "Main St"
arguments.addresses.home.city = "Smallville"
arguments.addresses.work.street = "Log St"
arguments.addresses.work.city = "BigCity"
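The single-unique-key rule can be sketched as a check applied before falling back to indexes. Both `uses_named_entries` and `flatten_named` are hypothetical names, and the fallback to positional indexes when keys repeat is an assumption based on the default behaviour described earlier.

```python
from typing import Any, Dict, List


def uses_named_entries(items: List[Any]) -> bool:
    """True when every element is a one-key dict and those keys are distinct."""
    if not items or not all(isinstance(d, dict) and len(d) == 1 for d in items):
        return False
    keys = [next(iter(d)) for d in items]
    return len(keys) == len(set(keys))


def flatten_named(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten dicts and lists, preferring unique single keys over indexes."""
    if isinstance(value, dict):
        items = list(value.items())
    elif isinstance(value, list):
        if uses_named_entries(value):
            # Merge the one-key dicts so their keys replace the indexes.
            items = [pair for d in value for pair in d.items()]
        else:
            items = [(str(index), element) for index, element in enumerate(value)]
    else:
        return {prefix: value}
    flat: Dict[str, Any] = {}
    for key, child in items:
        full_key = f"{prefix}.{key}" if prefix else key
        flat.update(flatten_named(child, full_key))
    return flat


call = {
    "arguments": {
        "addresses": [
            {"home": {"street": "Main St", "city": "Smallville"}},
            {"work": {"street": "Log St", "city": "BigCity"}},
        ]
    }
}
flat_call = flatten_named(call)
for key, value in flat_call.items():
    print(f"{key} = {value!r}")
```

Using the key instead of the index keeps the comparison independent of list order, so a prediction that lists "work" before "home" still lines up with the reference.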

References: metrics.accuracy

Read more about catalog usage here.