unitxt.api module
- unitxt.api.create_dataset(task: str | Task, test_set: List[Dict[Any, Any]], train_set: List[Dict[Any, Any]] | None = None, validation_set: List[Dict[Any, Any]] | None = None, split: str | None = None, **kwargs) -> DatasetDict | IterableDatasetDict | Dataset | IterableDataset
Creates a dataset from input data based on a specific task.
- Parameters:
task – The task, given either as the name of a task in the Unitxt Catalog (https://www.unitxt.ai/en/latest/catalog/catalog.tasks.__dir__.html) or as a Task object
test_set – Required list of test instances
train_set – Optional list of training instances
validation_set – Optional list of validation instances
split – Optional name of a single split to return
**kwargs – Arguments used to load the dataset from the provided datasets (see load_dataset())
- Returns:
DatasetDict
Example
    template = Template(...)
    test_set = [...]  # required list of instances
    dataset = create_dataset(task="tasks.qa.open", test_set=test_set, template=template, format="formats.chatapi")
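The example above leaves the instances elided. A minimal sketch of preparing them follows. It assumes that tasks.qa.open expects a "question" field and an "answers" list (check the task's catalog entry) and that the split built from test_set is named "test". The helper names build_qa_instances and make_qa_dataset are hypothetical and not part of unitxt.

```python
from typing import Any, Dict, List, Tuple


def build_qa_instances(pairs: List[Tuple[str, str]]) -> List[Dict[str, Any]]:
    """Turn (question, answer) pairs into instance dictionaries.

    Assumption: "question" and "answers" are the field names that
    tasks.qa.open expects; verify against the task's catalog entry.
    """
    return [{"question": q, "answers": [a]} for q, a in pairs]


def make_qa_dataset(test_set: List[Dict[str, Any]]):
    """Call create_dataset using only parameters listed in its signature."""
    # Imported lazily so the data preparation above runs without unitxt installed.
    from unitxt.api import create_dataset

    # Assumption: the split created from test_set is named "test".
    return create_dataset(task="tasks.qa.open", test_set=test_set, split="test")


test_set = build_qa_instances(
    [
        ("What is the capital of Texas?", "Austin"),
        ("What is the color of the sky?", "Blue"),
    ]
)
```

Calling make_qa_dataset(test_set) requires unitxt to be installed. Whether a template or format must also be passed through **kwargs depends on the task and is not specified here.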
- unitxt.api.evaluate(predictions, dataset: Dataset | IterableDataset | None = None, data=None) -> EvaluationResults
- unitxt.api.infer(instance_or_instances, engine: InferenceEngine, dataset_query: str | None = None, return_data: bool = False, return_log_probs: bool = False, return_meta_data: bool = False, previous_messages: List[Dict[str, str]] | None = None, **kwargs)
- unitxt.api.load(source: SourceOperator | str)
- unitxt.api.load_dataset(dataset_query: str | None = None, split: str | None = None, streaming: bool = False, disable_cache: bool | None = None, **kwargs) -> DatasetDict | IterableDatasetDict | Dataset | IterableDataset
Loads a dataset.
If the ‘dataset_query’ argument is provided, the dataset is loaded from a card in the local catalog based on the parameters specified in the query.
Otherwise, the dataset is loaded from a provided card based on explicitly given parameters.
- Parameters:
dataset_query (str, optional) – A query string that specifies a dataset to load from the local catalog, or the name of a specific recipe or benchmark in the catalog. For example, "card=cards.wnli,template=templates.classification.multi_class.relation.default".
streaming (bool, False) – When True, yields the data as a Unitxt streams dictionary
split (str, optional) – The split of the data to load
disable_cache (bool, optional) – Disables caching of the data
**kwargs – Arguments used to load the dataset from a provided card that is not present in the local catalog.
- Returns:
DatasetDict
- Example:
    dataset = load_dataset(
        dataset_query="card=cards.stsb,template=templates.regression.two_texts.simple,max_train_instances=5"
    )  # card and template must be present in local catalog

    # or built programmatically
    card = TaskCard(...)
    template = Template(...)
    loader_limit = 10
    dataset = load_dataset(card=card, template=template, loader_limit=loader_limit)
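The query strings shown above are comma-separated key=value pairs. One way to assemble such a string from keyword arguments is sketched below. The helpers build_dataset_query and load_from_query are hypothetical and not part of unitxt, and the sketch does not handle values that themselves contain commas or nested structures.

```python
from typing import Any, Optional


def build_dataset_query(**params: Any) -> str:
    """Join keyword arguments into a 'key=value,key=value' query string.

    Hypothetical helper: it only covers flat values, as in the
    examples shown in this reference.
    """
    return ",".join(f"{key}={value}" for key, value in params.items())


def load_from_query(query: str, split: Optional[str] = None):
    """Pass a prepared query string to load_dataset."""
    # Imported lazily so the string-building code runs without unitxt installed.
    from unitxt.api import load_dataset

    return load_dataset(dataset_query=query, split=split)


query = build_dataset_query(
    card="cards.stsb",
    template="templates.regression.two_texts.simple",
    max_train_instances=5,
)
print(query)
# card=cards.stsb,template=templates.regression.two_texts.simple,max_train_instances=5
```

Calling load_from_query(query) requires unitxt to be installed and the card and template to be present in the local catalog.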
- unitxt.api.load_recipe(dataset_query: str | None = None, **kwargs) -> DatasetRecipe
- unitxt.api.produce(instance_or_instances, dataset_query: str | None = None, **kwargs) -> Dataset | Dict[str, Any]
- unitxt.api.select(instance_or_instances, engine: OptionSelectingByLogProbsInferenceEngine, dataset_query: str | None = None, return_data: bool = False, previous_messages: List[Dict[str, str]] | None = None, **kwargs)