unitxt.dialog_operators module

Dialog Serializers.

Dialog serializers convert structured dialog data into text that can be fed to a model.

The format of the dialog is:

    dialog = [
        {"user": "hello", "system": "hi"},
        {"user": "kkk", "system": ""},
        {"user": "kkk", "system": ""},
    ]
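As a minimal sketch of the format above, the following check accepts a list of turns where each turn is a dict with string values under "user" and "system". The helper name `is_valid_dialog` is hypothetical and not part of unitxt, and requiring exactly these two keys is an assumption made for illustration.

```python
from typing import Any


def is_valid_dialog(dialog: Any) -> bool:
    """Return True if `dialog` is a list of {"user": str, "system": str} turns.

    Hypothetical helper for illustration; unitxt may accept other shapes.
    """
    if not isinstance(dialog, list):
        return False
    for turn in dialog:
        if not isinstance(turn, dict):
            return False
        # Assumption: a turn carries exactly the two keys shown in the format.
        if set(turn) != {"user", "system"}:
            return False
        if not all(isinstance(value, str) for value in turn.values()):
            return False
    return True


dialog = [
    {"user": "hello", "system": "hi"},
    {"user": "kkk", "system": ""},
    {"user": "kkk", "system": ""},
]
```

An empty "system" string, as in the last two turns, is treated here as a turn that has no response yet.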

class unitxt.dialog_operators.SerializeDialog(__tags__: ~typing.Dict[str, str] = {}, data_classification_policy: ~typing.List[str] = None, caching: bool = None, apply_to_streams: ~typing.List[str] = None, dont_apply_to_streams: ~typing.List[str] = None, field: str | None = None, to_field: str | None = None, field_to_field: ~typing.List[~typing.List[str]] | ~typing.Dict[str, str] | None = None, use_query: bool, process_every_value: bool = False, get_default: ~typing.Any = None, not_exist_ok: bool = False, format: ~unitxt.formats.SystemFormat | None = None, last_response_to_field: str | None = None, context_field: str | None = None, context_separator: str = ' ')

Bases: InstanceFieldOperator

Serializes dialog data for feeding into a model.

This class takes structured dialog data and converts it into a text format according to a specified template. It allows for the inclusion or exclusion of system responses and can operate on a per-turn basis or aggregate the entire dialog.
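The idea of per-turn templating can be sketched in plain Python, without unitxt. The function `serialize_turns`, its default template, and the `include_last_system` flag are all hypothetical names chosen for this illustration; this is not the class's actual implementation, which takes its template from a format object.

```python
from typing import Dict, List


def serialize_turns(
    dialog: List[Dict[str, str]],
    turn_format: str = "User: {user}\nAgent: {system}\n",
    include_last_system: bool = False,
) -> str:
    """Render each turn with `turn_format` and concatenate the results.

    When `include_last_system` is False, the last turn is cut just before
    the {system} placeholder so that a model can generate the response.
    """
    parts = []
    for index, turn in enumerate(dialog):
        is_last = index == len(dialog) - 1
        if is_last and not include_last_system:
            # Keep only the part of the template that precedes {system}.
            cut = turn_format.index("{system}")
            parts.append(turn_format[:cut].format(user=turn["user"]))
        else:
            parts.append(turn_format.format(**turn))
    return "".join(parts)


example = [
    {"user": "hello", "system": "hi"},
    {"user": "how are you?", "system": ""},
]
prompt = serialize_turns(example)
```

With the default template, `prompt` ends with an open "Agent: " slot for the final response, which is one way to read "exclusion of system responses" in the description above.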

field

The field in the input data that contains the dialog.

Type:

str

to_field

The field in the output data where the serialized dialog will be stored.

Type:

Optional[str]

last_user_turn_to_field

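Field to store the last user turn. Neither this attribute nor last_system_turn_to_field appears in the constructor signature above, which lists last_response_to_field instead.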

Type:

Optional[str]

last_system_turn_to_field

Field to store the last system turn.

Type:

Optional[str]

context_field

Field that contains additional context to be prepended to the dialog.

Type:

Optional[str]
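How the field-related parameters might interact on a single instance can be sketched as follows. This is a standalone illustration under stated assumptions, not unitxt code: `serialize_field` and `simple_serialize` are hypothetical helpers, writing back to `field` when `to_field` is None is an assumption, and storing the last system response follows the `last_response_to_field` parameter from the signature.

```python
from typing import Any, Dict, List, Optional


def simple_serialize(dialog: List[Dict[str, str]]) -> str:
    """Hypothetical stand-in serializer: one "user: ... system: ..." chunk per turn."""
    return " ".join(f"user: {t['user']} system: {t['system']}" for t in dialog)


def serialize_field(
    instance: Dict[str, Any],
    field: str,
    to_field: Optional[str] = None,
    context_field: Optional[str] = None,
    context_separator: str = " ",
    last_response_to_field: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a copy of `instance` with the dialog in `field` serialized."""
    dialog = instance[field]
    # Prepend the context, followed by the separator, when a context field is given.
    prefix = ""
    if context_field is not None:
        prefix = instance[context_field] + context_separator
    result = dict(instance)
    # Assumption: with no to_field, the serialized text overwrites the source field.
    result[to_field or field] = prefix + simple_serialize(dialog)
    if last_response_to_field is not None and dialog:
        result[last_response_to_field] = dialog[-1]["system"]
    return result


instance = {
    "dialog": [{"user": "hello", "system": "hi"}],
    "context": "Be polite.",
}
out = serialize_field(
    instance,
    field="dialog",
    to_field="text",
    context_field="context",
    last_response_to_field="last_response",
)
```

The original instance is left unchanged; the serialized text, with the context in front, lands in `out["text"]`.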