📄 Tool Calling

benchmarks.tool_calling

Benchmark(
    subsets={
        "bfcl.simple": DatasetRecipe(
            card="cards.bfcl.multi_turn.simple_v3",
            format="formats.chat_api",
            metrics=[
                "metrics.tool_calling.multi_turn.validity",
                "metrics.tool_calling.multi_turn.correctness.llama_3_3_70b_instruct_judge",
            ],
        ),
        "bfcl.multiple": DatasetRecipe(
            card="cards.bfcl.multi_turn.multiple_v3",
            format="formats.chat_api",
            metrics=[
                "metrics.tool_calling.multi_turn.validity",
                "metrics.tool_calling.multi_turn.correctness.llama_3_3_70b_instruct_judge",
            ],
        ),
        "bfcl.live_multiple": DatasetRecipe(
            card="cards.bfcl.multi_turn.live_multiple_v3",
            format="formats.chat_api",
            metrics=[
                "metrics.tool_calling.multi_turn.validity",
                "metrics.tool_calling.multi_turn.correctness.llama_3_3_70b_instruct_judge",
            ],
        ),
        "bfcl.live_simple": DatasetRecipe(
            card="cards.bfcl.multi_turn.live_simple_v3",
            format="formats.chat_api",
            metrics=[
                "metrics.tool_calling.multi_turn.validity",
                "metrics.tool_calling.multi_turn.correctness.llama_3_3_70b_instruct_judge",
            ],
        ),
        "bfcl.java": DatasetRecipe(
            card="cards.bfcl.multi_turn.java_v3",
            format="formats.chat_api",
            metrics=[
                "metrics.tool_calling.multi_turn.validity",
                "metrics.tool_calling.multi_turn.correctness.llama_3_3_70b_instruct_judge",
            ],
        ),
        "bfcl.javascript": DatasetRecipe(
            card="cards.bfcl.multi_turn.javascript_v3",
            format="formats.chat_api",
            metrics=[
                "metrics.tool_calling.multi_turn.validity",
                "metrics.tool_calling.multi_turn.correctness.llama_3_3_70b_instruct_judge",
            ],
        ),
        "bfcl.parallel": DatasetRecipe(
            card="cards.bfcl.multi_turn.parallel_v3",
            format="formats.chat_api",
            metrics=[
                "metrics.tool_calling.multi_turn.validity",
                "metrics.tool_calling.multi_turn.correctness.llama_3_3_70b_instruct_judge",
            ],
        ),
        "bfcl.parallel_multiple": DatasetRecipe(
            card="cards.bfcl.multi_turn.parallel_multiple_v3",
            format="formats.chat_api",
            metrics=[
                "metrics.tool_calling.multi_turn.validity",
                "metrics.tool_calling.multi_turn.correctness.llama_3_3_70b_instruct_judge",
            ],
        ),
        "bfcl.live_parallel": DatasetRecipe(
            card="cards.bfcl.multi_turn.live_parallel_v3",
            format="formats.chat_api",
            metrics=[
                "metrics.tool_calling.multi_turn.validity",
                "metrics.tool_calling.multi_turn.correctness.llama_3_3_70b_instruct_judge",
            ],
        ),
        "bfcl.live_parallel_multiple": DatasetRecipe(
            card="cards.bfcl.multi_turn.live_parallel_multiple_v3",
            format="formats.chat_api",
            metrics=[
                "metrics.tool_calling.multi_turn.validity",
                "metrics.tool_calling.multi_turn.correctness.llama_3_3_70b_instruct_judge",
            ],
        ),
        "xlam": DatasetRecipe(
            card="cards.xlam_function_calling_60k",
            format="formats.chat_api",
            metrics=[
                "metrics.tool_calling.multi_turn.validity",
                "metrics.tool_calling.multi_turn.correctness.llama_3_3_70b_instruct_judge",
            ],
        ),
    },
)
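Every subset above uses the same format and metrics, and the ten BFCL cards follow one naming pattern. The sketch below rebuilds the same mapping programmatically. It is illustrative only: plain dictionaries stand in for DatasetRecipe so the snippet runs without unitxt installed, and the names BFCL_SUBSETS, SHARED, and recipe are hypothetical helpers, not part of the catalog.

```python
# Plain dicts stand in for unitxt's DatasetRecipe (an assumption made so that
# this sketch runs without unitxt installed).
BFCL_SUBSETS = [
    "simple", "multiple", "live_multiple", "live_simple", "java",
    "javascript", "parallel", "parallel_multiple", "live_parallel",
    "live_parallel_multiple",
]

# Settings shared by every subset in the benchmark definition above.
SHARED = {
    "format": "formats.chat_api",
    "metrics": [
        "metrics.tool_calling.multi_turn.validity",
        "metrics.tool_calling.multi_turn.correctness.llama_3_3_70b_instruct_judge",
    ],
}


def recipe(card):
    """Return one subset entry: the given card plus the shared settings."""
    return {
        "card": card,
        "format": SHARED["format"],
        "metrics": list(SHARED["metrics"]),  # copy, so entries stay independent
    }


# BFCL subsets follow the pattern cards.bfcl.multi_turn.<name>_v3.
subsets = {
    f"bfcl.{name}": recipe(f"cards.bfcl.multi_turn.{name}_v3")
    for name in BFCL_SUBSETS
}
# The xlam subset uses a differently named card.
subsets["xlam"] = recipe("cards.xlam_function_calling_60k")
```

With unitxt available, the same loop could presumably construct DatasetRecipe objects instead of dictionaries, passing the same card, format, and metrics values.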

from unitxt.standard import DatasetRecipe

Explanation about DatasetRecipe

This class represents a standard recipe for data processing and preparation.

This class can be used to prepare a recipe with all necessary steps, refiners, and renderers included. It allows various parameters and steps to be set sequentially when preparing the recipe.

Args:
card (TaskCard):

TaskCard object associated with the recipe.

template (Template, optional):

Template object to be used for the recipe.

system_prompt (SystemPrompt, optional):

SystemPrompt object to be used for the recipe.

loader_limit (int, optional):

Specifies the maximum number of instances per stream to be returned from the loader (used to reduce loading time for large datasets).

format (SystemFormat, optional):

SystemFormat object to be used for the recipe.

metrics (List[str]):

List of catalog metrics to use with this recipe.

postprocessors (List[str]):

List of catalog processors to apply at post-processing. (Setting them here is not recommended.)

group_by (List[Union[str, List[str]]]):

List of task_data or metadata keys to group global scores by.

train_refiner (StreamRefiner, optional):

Train refiner to be used in the recipe.

max_train_instances (int, optional):

Maximum training instances for the refiner.

validation_refiner (StreamRefiner, optional):

Validation refiner to be used in the recipe.

max_validation_instances (int, optional):

Maximum validation instances for the refiner.

test_refiner (StreamRefiner, optional):

Test refiner to be used in the recipe.

max_test_instances (int, optional):

Maximum test instances for the refiner.

demos_pool_size (int, optional):

Size of the demos pool. Use -1 to take the whole of the stream named by ‘demos_taken_from’.

demos_pool (List[Dict[str, Any]], optional):

A list of instances to use as the demos_pool.

num_demos (int, optional):

Number of demos to add to each instance, to become part of the source to be generated for this instance.

demos_taken_from (str, optional):

Specifies the stream from where the demos are taken. Default is “train”.

demos_field (str, optional):

Field name for demos. Default is “demos”. The num_demos demos selected for an instance are stored in this field of that instance.

demos_pool_field_name (str, optional):

Field name used to hold the demos_pool until it is sampled from to make the demos. Defaults to constants.demos_pool_field.

demos_removed_from_data (bool, optional):

Whether to remove the instances taken into demos_pool from the source data. Default is True.

sampler (Sampler, optional):

The Sampler used to select the demonstrations when num_demos > 0.

skip_demoed_instances (bool, optional):

Whether to skip pushing demos to an instance whose demos_field is already populated. Defaults to False.

steps (List[StreamingOperator], optional):

List of StreamingOperator objects to be used in the recipe.

augmentor (Augmentor):

Augmentor to be used to pseudo-randomly augment the source text.

instruction_card_index (int, optional):

Index of instruction card to be used for preparing the recipe.

template_card_index (int, optional):

Index of template card to be used for preparing the recipe.
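The demo-related arguments above (demos_pool_size, num_demos, demos_removed_from_data, demos_field) interact in a way that a toy example may make clearer. The sketch below is not unitxt's implementation: build_demos is a hypothetical helper, and it assumes the pool is simply the first demos_pool_size instances of the stream and that demos are drawn by uniform random sampling, either of which may differ from what the library and its Sampler actually do.

```python
import random


def build_demos(train, pool_size, num_demos, remove_from_data=True, seed=0):
    """Toy illustration of the demo parameters (hypothetical helper).

    Assumes the pool is the first `pool_size` instances of `train`.
    """
    # A pool size of -1 means the whole stream becomes the pool.
    pool = list(train) if pool_size == -1 else list(train[:pool_size])

    # Mirrors demos_removed_from_data: pooled instances leave the data.
    remaining = list(train[len(pool):]) if remove_from_data else list(train)

    rng = random.Random(seed)  # seeded so the sampling is reproducible
    with_demos = []
    for instance in remaining:
        # Never hand an instance to itself as a demonstration.
        candidates = [d for d in pool if d is not instance]
        demos = rng.sample(candidates, min(num_demos, len(candidates)))
        # Mirrors demos_field: selected demos are stored on the instance.
        with_demos.append({**instance, "demos": demos})
    return pool, with_demos


train = [{"id": i} for i in range(10)]
pool, data = build_demos(train, pool_size=4, num_demos=2)
print(len(pool), len(data))  # prints "4 6"
```

Under these assumptions, leaving remove_from_data at True shrinks the usable data by the pool size, which is why a pool size of -1 leaves no instances to attach demos to in this toy version.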

Methods:
prepare():

This overridden method is used for preparing the recipe by arranging all the steps, refiners, and renderers in a sequential manner.

Raises:
AssertionError:

If both template and template_card_index are specified at the same time.
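The rule above can be pictured as a simple mutual-exclusion check. This is a minimal sketch of that rule, not the library's code: check_template_args is a hypothetical function, and the template name in the usage lines is a made-up placeholder. It compares against None rather than testing truthiness, because an index of 0 is a meaningful value.

```python
def check_template_args(template=None, template_card_index=None):
    """Raise AssertionError when both arguments are given (hypothetical helper)."""
    # Compare with None explicitly: template_card_index=0 must count as "set".
    assert template is None or template_card_index is None, (
        "Specify either template or template_card_index, not both."
    )


# Each argument on its own passes the check.
check_template_args(template="templates.example")  # placeholder name
check_template_args(template_card_index=0)

# Supplying both trips the assertion.
try:
    check_template_args(template="templates.example", template_card_index=0)
except AssertionError as err:
    print(f"rejected: {err}")
```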

References: metrics.tool_calling.multi_turn.correctness.llama_3_3_70b_instruct_judge, cards.bfcl.multi_turn.live_parallel_multiple_v3, cards.bfcl.multi_turn.parallel_multiple_v3, metrics.tool_calling.multi_turn.validity, cards.bfcl.multi_turn.live_parallel_v3, cards.bfcl.multi_turn.live_multiple_v3, cards.bfcl.multi_turn.live_simple_v3, cards.bfcl.multi_turn.javascript_v3, cards.bfcl.multi_turn.multiple_v3, cards.bfcl.multi_turn.parallel_v3, cards.bfcl.multi_turn.simple_v3, cards.xlam_function_calling_60k, cards.bfcl.multi_turn.java_v3, formats.chat_api

Read more about catalog usage here.