Glossary¶
Artifact¶
An artifact is a class that can be saved in a human-readable format in the Unitxt catalog. Almost all Unitxt classes inherit from the Artifact class.
Catalog¶
All Unitxt artifacts – recipes, data-task cards, templates, pre-processing operators, formats and metrics – can be stored in the Unitxt Catalog.
In addition to the open-source catalog, which can be found in the documentation, users can choose to define a private catalog. This enables teams and organizations to harness the open Unitxt Catalog while upholding organizational requirements for additional proprietary artifacts.

Data Preparation Pipeline¶
The data preparation pipeline begins with standardizing the raw data into the task interface, as defined in the data-task card. The examples are then verbalized by the template, and the format operator applies system prompts, special tokens and in-context learning examples. To maintain compatibility, the output of this pipeline is a HuggingFace dataset, which can be saved or pushed to the hub.
The data preparation pipeline can be seen as the top flow in the following figure:

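In code, this pipeline is typically driven through the load_dataset API. The sketch below is only illustrative: the card and template names and the recipe parameters are example catalog entries, to be replaced with artifacts from your own catalog.

```python
# Minimal sketch of running the data preparation pipeline.
# The card/template names and parameters are illustrative examples.
from unitxt import load_dataset

dataset = load_dataset(
    "card=cards.wnli,"
    "template=templates.classification.multi_class.relation.default,"
    "num_demos=2,demos_pool_size=10"
)

# The result is a regular HuggingFace dataset that can be saved or
# pushed to the hub.
print(dataset["train"][0]["source"])
```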
Data-Task Card¶
Defines how raw data is loaded from the dataset source and how it is standardized for a certain task. Typically, this includes data-wrangling actions, e.g. renaming fields, filtering data instances, modifying values, train/test/validation splitting, etc.
The catalog contains predefined data-task cards for various datasets here.
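A rough sketch of what a data-task card looks like in code is shown below; the loader, operators and field names are illustrative assumptions rather than an actual catalog entry.

```python
# Illustrative data-task card: load raw data, standardize its fields to
# the task interface, and point to a task and templates.
from unitxt.card import TaskCard
from unitxt.loaders import LoadHF
from unitxt.operators import Rename, Set

card = TaskCard(
    loader=LoadHF(path="glue", name="cola"),
    preprocess_steps=[
        Rename(field_to_field={"sentence": "text"}),
        Set(fields={"classes": ["unacceptable", "acceptable"]}),
    ],
    task="tasks.classification.multi_class",
    templates="templates.classification.multi_class.all",
)
```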
Evaluation Pipeline¶
The evaluation pipeline is responsible for producing a list of evaluation scores that reflect model performance on a given dataset. It includes a de-verbalization of the model outputs (as defined in the template), and a computation of performance by the metrics defined in the task.
The standardization of the task interface, namely, having fixed names and types for its input and output fields, allows the use of any metric that accepts such fields as input. In addition to the computed evaluation scores, Unitxt metrics support a built-in mechanism for confidence interval reporting, using statistical bootstrapping.
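A rough sketch of the evaluation side, assuming a dataset prepared as above and a list of raw model output strings (the inference step itself is elided):

```python
# Minimal evaluation sketch: `predictions` is a list of raw model output
# strings, one per instance of the prepared test split.
from unitxt import evaluate

predictions = ["entailment", "not entailment"]  # illustrative outputs
results = evaluate(predictions=predictions, data=dataset["test"])

# Scores are attached per instance, together with global aggregates and
# bootstrap confidence intervals.
print(results[0]["score"]["global"])
```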
Extensions¶
Unitxt supports Extensions such as “input-augmentation” (for example, adding random whitespace, introducing spelling mistakes, or replacing words with their synonyms) or “label-noising” (randomly replacing the labels in the demonstrations with other options from a list). Such extensions can be added anywhere in the data-preparation pipeline between any two operators, depending on the desired logic (see the unitxt flow diagram).
Unitxt supports the addition of custom extensions to the Unitxt Catalog. Each extension is an independent unit, reusable across different datasets and tasks, templates and formats.
Format¶
A Unitxt Format defines a set of additional formatting requirements, unrelated to the underlying data or task, including those pertaining to system prompts, special tokens or user/agent prefixes, and in-context demonstrations.
Following the example in the figure, the Unitxt format receives the text produced by the template, “classify the sentence: ‘I like toast’”, and adds the system prompt “<SYS>You are a helpful agent</SYS>”, the Instruction-User-Agent schema cues, and the two presented demonstrations.
The catalog contains predefined formats here.
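The sketch below illustrates a custom format, assuming the SystemFormat class and its placeholder convention; the exact tokens and layout are example choices, not a recommended format.

```python
# Illustrative custom format: wraps the system prompt, instruction,
# in-context demonstrations and the templated source text.
from unitxt.formats import SystemFormat

my_format = SystemFormat(
    demo_format="{source}\n{target_prefix}{target}\n\n",
    model_input_format=(
        "<SYS>{system_prompt}</SYS>\n"
        "{instruction}\n"
        "{demos}"
        "{source}\n"
        "{target_prefix}"
    ),
)
```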
Operator¶
An operator is a class that takes multiple streams as input and produces multiple streams as output. Every modification of the data in the stream is done by an operator. Every operator should perform a single task and its name should reflect its operation.

Examples: AddDictToEveryInstance, RenameField, etc.
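As a sketch, a custom operator can be written by subclassing a base operator class; the example below assumes the InstanceOperator base class and an illustrative "text" field.

```python
# Sketch of a custom operator (assuming the InstanceOperator base class):
# it adds a lowercased copy of the "text" field to every instance.
from typing import Any, Dict, Optional

from unitxt.operator import InstanceOperator


class AddLowercasedText(InstanceOperator):
    def process(
        self, instance: Dict[str, Any], stream_name: Optional[str] = None
    ) -> Dict[str, Any]:
        instance["text_lowercased"] = instance["text"].lower()
        return instance
```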
Post processors¶
Post processors are a set of operators. Each template defines the set of post processors that are appropriate for it. For example, post processors in a binary classification template could remove trailing whitespace, take the first word, convert “Yes” to 1, and all other values to 0.
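The de-verbalization in that example corresponds roughly to the plain-Python sketch below; actual post processors are catalog operators referenced by the template, not an ad-hoc function.

```python
# Plain-Python sketch of the binary-classification post-processing
# described above (illustrative, not a Unitxt operator).
def postprocess(prediction: str) -> float:
    words = prediction.strip().split()
    first_word = words[0].rstrip(".,!?") if words else ""
    return 1.0 if first_word == "Yes" else 0.0


assert postprocess("Yes, it is.") == 1.0
assert postprocess("No") == 0.0
```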
Recipe¶
A Recipe holds a complete specification of a Unitxt pipeline.
This includes the Data-Task Card, Template, Format and parameters for different Extensions.
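A recipe can also be spelled out as keyword parameters when loading a dataset; every artifact name below is an illustrative placeholder for an entry in your catalog, and the parameter values are examples only.

```python
# Sketch of a recipe given as keyword arguments; all artifact names are
# illustrative catalog entries.
from unitxt import load_dataset

dataset = load_dataset(
    card="cards.wnli",
    template="templates.classification.multi_class.relation.default",
    format="formats.chat_api",             # optional model-specific format
    system_prompt="system_prompts.empty",  # optional system prompt
    num_demos=2,                           # in-context demonstrations
    demos_pool_size=10,
)
```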
References¶
References are the “correct answers” for the task for a given instance. They are stored as a list of strings in the references field of the generated Unitxt dataset. For example, a reference for a binary classification task could be “Yes” or “No”.
It is expected that the model will get a perfect score from the metrics if its prediction is equal to one of the references.
The textual references are processed by the Template’s post processors before being passed to the Metrics. The post processors de-verbalize the textual representation of the references and convert it to the types required by the metric. For example, “Yes” and “No” values could be converted to 1.0 and 0.0, respectively.
Target¶
The target is one of the references. It is used as the expected model output in in-context learning demonstrations.
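In the generated dataset these concepts appear as plain fields; the illustrative instance below shows their rough shape (values are invented for the example):

```python
# Illustrative shape of a generated instance (values are made up):
instance = {
    "source": "classify the sentence: 'I like toast'",  # verbalized model input
    "references": ["positive"],  # all acceptable answers, as a list of strings
    "target": "positive",        # the single reference used in demonstrations
}
```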
Stream¶
A stream is a sequence of data. It can be finite or infinite. It can be synchronous or asynchronous. Every instance in the stream is a simple Python dictionary.
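Conceptually, a stream behaves like an iterator over instance dictionaries, which is why it can be lazy and effectively infinite; the toy generator below is only an analogy, not the Unitxt Stream class.

```python
# Toy analogy for a stream: a (possibly infinite) iterator of instance dicts.
from itertools import count
from typing import Dict, Iterator


def toy_stream() -> Iterator[Dict[str, object]]:
    for i in count():
        yield {"id": i, "text": f"instance number {i}"}


first_instance = next(iter(toy_stream()))  # each instance is a plain dict
```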


System Prompt¶
The system prompt is the fixed text that is added to the model input by the Format during the verbalization process. It is specified by the system_prompt parameter of the recipe.
Task¶
A Unitxt task follows the formal definition of an NLP task, such as multi-label classification, named entity extraction, abstractive summarization or translation. A task is defined by its standard interface – namely, input and output fields – and by its evaluation metrics. Given a dataset, its contents are standardized into the fields defined by an appropriate task by a Data-Task Card.
The catalog contains predefined tasks here.
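A sketch of a task definition is shown below, assuming the Task class with input_fields, reference_fields, prediction_type and metrics; the field names and metric choices are illustrative, and some Unitxt versions use inputs/outputs instead.

```python
# Illustrative task definition; the field names and metrics are
# example assumptions.
from typing import List

from unitxt.task import Task

sentiment_classification = Task(
    input_fields={"text": str, "classes": List[str]},
    reference_fields={"label": str},
    prediction_type=str,
    metrics=["metrics.f1_micro", "metrics.accuracy"],
)
```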
Template¶
A Unitxt Template defines the verbalizations to be applied to the inputs and targets, as well as the de-verbalization operations over the model predictions. For example, applying the template to “I like toast” verbalizes it into “classify the sentence: ‘I like toast’”.
In the other direction, template de-verbalization involves two steps. First, a general standardization of the output texts: taking only the first non-empty line of a model’s predictions, lowercasing, stripping whitespaces, etc. The second step standardizes the output to the specific task at hand. For example, in Sentence Similarity, a prediction may be a quantized float number output as a string (e.g. “2.43”), or a verbally expressed numeric expression (e.g. “two and a half”). This depends on the verbalization defined by the template and the in-context demonstrations it constructs. Both types of outputs should be standardized before evaluation begins – e.g. to a float for sentence similarity. Having the de-verbalization steps defined within the template enables template reuse across different models and datasets.
The templates, datasets and tasks in Unitxt are not exclusively tied. Each task can harness multiple templates and a template can be used for different datasets.
The catalog contains predefined templates here.
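A sketch of a template is shown below, assuming the InputOutputTemplate class; the format strings and post-processor names are illustrative choices.

```python
# Illustrative template: verbalizes the input fields into the prompt,
# verbalizes the reference field, and declares the post processors used
# to de-verbalize model predictions.
from unitxt.templates import InputOutputTemplate

classification_template = InputOutputTemplate(
    input_format="classify the sentence: {text}",
    output_format="{label}",
    postprocessors=[
        "processors.take_first_non_empty_line",
        "processors.lower_case",
    ],
)
```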
Verbalization¶
Verbalization is the process of taking the task fields and converting them into a textual representation, which is provided as input to the model.
The verbalization process involves multiple components: the Template verbalizes the task-specific prompt, while the Format and System Prompt verbalize any model-specific requirements (e.g. system prompt, dialog prefixes) as well as in-context examples.
The verbalization covers the task’s input fields, which produce the model input, and the task’s output fields, which produce the references.
