unitxt.struct_data_operators module¶

This section describes unitxt operators for structured data.

These operators specialize in handling structured data such as tables, triples, and key-value pairs. For tables, the expected input format is:

{
    "header": ["col1", "col2"],
    "rows": [["row11", "row12"], ["row21", "row22"], ["row31", "row32"]]
}

For triples, expected input format is:

[[ "subject1", "relation1", "object1" ], [ "subject1", "relation2", "object2"]]

For key-value pairs, expected input format is:

{"key1": "value1", "key2": "value2", "key3": "value3"}

class unitxt.struct_data_operators.ConstructTableFromRowsCols(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, fields: List[str] = __required__, to_field: str = __required__)[source]¶

Bases: InstanceOperator

Maps separate header and rows fields into a single table field containing both the header and the rows.

fields[0] = header string as List
fields[1] = rows string as List[List]
fields[2] = table caption string (optional)
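The mapping above can be sketched in plain Python. This is an illustrative sketch, not the operator's implementation: the helper name `construct_table`, the use of `ast.literal_eval` to parse the stringified lists, and the `"caption"` output key are all assumptions.

```python
import ast
from typing import Any, Dict, Optional


def construct_table(
    header_str: str, rows_str: str, caption: Optional[str] = None
) -> Dict[str, Any]:
    """Build a {"header", "rows"} table from two stringified Python lists."""
    header = ast.literal_eval(header_str)  # '["name", "age"]' -> ["name", "age"]
    rows = ast.literal_eval(rows_str)  # '[["Alex", 21]]' -> [["Alex", 21]]
    table: Dict[str, Any] = {"header": header, "rows": rows}
    if caption is not None:
        table["caption"] = caption  # key name is an assumption
    return table


table = construct_table(
    '["name", "age"]', '[["Alex", 21], ["Donald", 34]]', caption="People"
)
```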

class unitxt.struct_data_operators.ConvertTableColNamesToSequential(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶

Bases: FieldOperator

Replaces actual table column names with static sequential names like col_0, col_1,…

Sample input:
{
    "header": ["name", "age"],
    "rows": [["Alex", 21], ["Donald", 34]]
}

Sample output:
{
    "header": ["col_0", "col_1"],
    "rows": [["Alex", 21], ["Donald", 34]]
}
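A minimal sketch of the renaming shown in the sample, written as a standalone function rather than as the operator itself (`to_sequential_col_names` is a hypothetical name):

```python
from typing import Any, Dict


def to_sequential_col_names(table: Dict[str, Any]) -> Dict[str, Any]:
    """Replace each header entry with col_<index>; rows are left untouched."""
    header = [f"col_{i}" for i in range(len(table["header"]))]
    return {"header": header, "rows": table["rows"]}


renamed = to_sequential_col_names(
    {"header": ["name", "age"], "rows": [["Alex", 21], ["Donald", 34]]}
)
```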

class unitxt.struct_data_operators.DumpJson(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶: Bases: FieldOperator

class unitxt.struct_data_operators.DuplicateTableColumns(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <class 'unitxt.types.Table'>, column_indices: List[int] = [], times: int = 1)[source]¶

Bases: TypeDependentAugmentor

Duplicates specific columns of a table a given number of times.

Parameters:

column_indices (List[int]) – indices of the columns to be duplicated
times (int) – number of times each selected column appears in the output
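One possible reading of these parameters, sketched as a standalone function. It assumes that `times` is the total number of appearances of each selected column (so `times=1` leaves the table unchanged); the helper name `duplicate_columns` is hypothetical.

```python
from typing import Any, Dict, List


def duplicate_columns(
    table: Dict[str, Any], column_indices: List[int], times: int = 1
) -> Dict[str, Any]:
    """Repeat the selected columns so each appears `times` times in total."""
    selected = set(column_indices)

    def expand(cells: List[Any]) -> List[Any]:
        out: List[Any] = []
        for i, cell in enumerate(cells):
            # Selected columns are repeated; all others are kept once.
            out.extend([cell] * (times if i in selected else 1))
        return out

    return {
        "header": expand(table["header"]),
        "rows": [expand(row) for row in table["rows"]],
    }


result = duplicate_columns(
    {"header": ["name", "age"], "rows": [["Alex", 26], ["Raj", 34]]},
    column_indices=[1],
    times=2,
)
```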

augmented_type[source]¶: alias of Table

column_indices: List[int] = []¶

class unitxt.struct_data_operators.DuplicateTableRows(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <class 'unitxt.types.Table'>, row_indices: List[int] = [], times: int = 1)[source]¶

Bases: TypeDependentAugmentor

Duplicates specific rows of a table a given number of times.

Parameters:

row_indices (List[int]) – indices of the rows to be duplicated
times (int) – number of times each selected row appears in the output

augmented_type[source]¶: alias of Table

row_indices: List[int] = []¶

class unitxt.struct_data_operators.GetNumOfTableCells(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶

Bases: FieldOperator

Get the number of cells in the given table.

class unitxt.struct_data_operators.InsertEmptyTableRows(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <class 'unitxt.types.Table'>, times: int = 0)[source]¶

Bases: TypeDependentAugmentor

Inserts empty rows at random positions in a table, a given number of times.

Parameters:

times (int) – how many times an empty row is inserted

augmented_type[source]¶: alias of Table

class unitxt.struct_data_operators.JsonStrToDict(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶

Bases: FieldOperator

Converts a JSON string representing key-value pairs into a dictionary.

Ensure keys and values are strings, and there are no None values.
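A rough sketch of this conversion using the standard `json` module. It assumes that entries with null values are dropped and that remaining values are converted with `str`; the actual operator may handle these cases differently. `json_str_to_dict` is a hypothetical name.

```python
import json
from typing import Dict


def json_str_to_dict(text: str) -> Dict[str, str]:
    """Parse a JSON object, drop null values, and stringify keys and values."""
    parsed = json.loads(text)
    return {str(key): str(value) for key, value in parsed.items() if value is not None}


converted = json_str_to_dict('{"name": "Alex", "age": 31, "nickname": null}')
```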

class unitxt.struct_data_operators.ListToKeyValPairs(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, fields: List[str] = __required__, to_field: str = __required__)[source]¶

Bases: InstanceOperator

Maps list of keys and values into key:value pairs.

Sample input in expected format: {"keys": ["name", "age", "sex"], "values": ["Alex", 31, "M"]}

Sample output: {"name": "Alex", "age": 31, "sex": "M"}
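The pairing in the sample can be sketched with `zip`. This is a standalone illustration with a hypothetical helper name, not the operator's code:

```python
from typing import Any, Dict, List


def list_to_key_val_pairs(keys: List[str], values: List[Any]) -> Dict[str, Any]:
    """Pair each key with the value at the same position."""
    return dict(zip(keys, values))


pairs = list_to_key_val_pairs(["name", "age", "sex"], ["Alex", 31, "M"])
```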

class unitxt.struct_data_operators.LoadJson(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, failure_value: Any = None, allow_failure: bool = False)[source]¶: Bases: FieldOperator

class unitxt.struct_data_operators.MapHTMLTableToJSON(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = ['bs4'], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶

Bases: FieldOperator

Converts a table in HTML format to the standard JSON table format.

JSON format:

{
    "header": ["col1", "col2"],
    "rows": [["row11", "row12"], ["row21", "row22"], ["row31", "row32"]]
}

class unitxt.struct_data_operators.MapTableListsToStdTableJSON(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶

Bases: FieldOperator

Converts a table given as a list of lists to the standard JSON table format.

JSON format:

{
    "header": ["col1", "col2"],
    "rows": [["row11", "row12"], ["row21", "row22"], ["row31", "row32"]]
}
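A minimal sketch of such a conversion. It assumes that the first inner list holds the column names and the remaining lists are the rows; this page does not state that layout, so treat it as an assumption. `lists_to_std_table` is a hypothetical name.

```python
from typing import Any, Dict, List


def lists_to_std_table(table_lists: List[List[Any]]) -> Dict[str, Any]:
    """Treat the first list as the header and the rest as rows (assumed layout)."""
    if not table_lists:
        return {"header": [], "rows": []}
    return {"header": table_lists[0], "rows": table_lists[1:]}


std_table = lists_to_std_table([["col1", "col2"], ["row11", "row12"], ["row21", "row22"]])
```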

class unitxt.struct_data_operators.MaskColumnsNames(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <class 'unitxt.types.Table'>)[source]¶

Bases: TypeDependentAugmentor

Masks the names of table columns with dummy names such as "Col1", "Col2", etc.

augmented_type[source]¶: alias of Table

class unitxt.struct_data_operators.MultipleToolCallPostProcessor(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, failure_value: Any = None, allow_failure: bool = False)[source]¶: Bases: FieldOperator

class unitxt.struct_data_operators.ParseCSV(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, separator: str = ',', has_header: bool = True, skip_header: bool = False, dtype: Literal['str'] | NoneType = None, strip_cells: bool = False)[source]¶

Bases: FieldOperator

Parse CSV/TSV text content into table format.

This operator converts CSV or TSV text content into the standard table format used by Unitxt with header and rows fields.

Parameters:

separator (str) – Field separator character. Defaults to ‘,’.
has_header (bool) – Whether the first row contains column headers. Defaults to True.
skip_header (bool) – Whether to skip the first row entirely. Defaults to False.

Example

Parsing CSV content

ParseCSV(field="csv_content", to_field="table", separator=",")

Parsing TSV content

ParseCSV(field="tsv_content", to_field="table", separator="\t")
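To illustrate the resulting table structure, here is a sketch built on the standard `csv` module. It covers only the case where the first row is the header, and it is not the operator's implementation; `parse_csv` is a hypothetical helper, and all cells are left as strings.

```python
import csv
import io
from typing import Any, Dict


def parse_csv(text: str, separator: str = ",") -> Dict[str, Any]:
    """Parse delimited text into {"header": [...], "rows": [[...], ...]}."""
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    records = [row for row in reader if row]  # skip blank lines
    if not records:
        return {"header": [], "rows": []}
    return {"header": records[0], "rows": records[1:]}


csv_table = parse_csv("name,age\nAlex,26\nDiana,34\n")
tsv_table = parse_csv("name\tage\nAlex\t26\n", separator="\t")
```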

class unitxt.struct_data_operators.PythonCallProcessor(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶: Bases: FieldOperator

class unitxt.struct_data_operators.SerializeKeyValPairs(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶

Bases: FieldOperator

Serializes key, value pairs into a flat sequence.

Sample input in expected format: {"name": "Alex", "age": 31, "sex": "M"}

Sample output: name is Alex, age is 31, sex is M
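The sample output can be reproduced with a short join. This is an illustrative sketch with a hypothetical function name:

```python
from typing import Any, Dict


def serialize_key_val_pairs(pairs: Dict[str, Any]) -> str:
    """Render each pair as "<key> is <value>", separated by commas."""
    return ", ".join(f"{key} is {value}" for key, value in pairs.items())


text = serialize_key_val_pairs({"name": "Alex", "age": 31, "sex": "M"})
```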

class unitxt.struct_data_operators.SerializeTable(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, serialized_type: object = <class 'unitxt.types.Table'>, seed: int = 0, shuffle_rows: bool = False, shuffle_columns: bool = False)[source]¶

Bases: ABC, TableSerializer

TableSerializer converts a given table into a flat sequence with special symbols.

Output format varies depending on the chosen serializer. This abstract class defines structure of a typical table serializer that any concrete implementation should follow.

class unitxt.struct_data_operators.SerializeTableAsConcatenation(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, serialized_type: object = <class 'unitxt.types.Table'>, seed: int = 0, shuffle_rows: bool = False, shuffle_columns: bool = False)[source]¶

Bases: SerializeTable

Concat Serializer.

Concatenates all table content into one string of header and rows. Format (sample): name age Alex 26 Diana 34
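A sketch that reproduces the sample format, assuming cells are joined by single spaces (the exact separator used by the serializer is an assumption; `serialize_as_concatenation` is a hypothetical name):

```python
from typing import Any, Dict, List


def serialize_as_concatenation(table: Dict[str, Any]) -> str:
    """Join the header cells and then every row cell with single spaces."""
    cells: List[Any] = list(table["header"])
    for row in table["rows"]:
        cells.extend(row)
    return " ".join(str(cell) for cell in cells)


flat = serialize_as_concatenation(
    {"header": ["name", "age"], "rows": [["Alex", 26], ["Diana", 34]]}
)
```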

class unitxt.struct_data_operators.SerializeTableAsDFLoader(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, serialized_type: object = <class 'unitxt.types.Table'>, seed: int = 0, shuffle_rows: bool = False, shuffle_columns: bool = False)[source]¶

Bases: SerializeTable

DFLoader Table Serializer.

Serializes the table as a pandas DataFrame construction snippet. Format (sample):

pd.DataFrame({
    "name" : ["Alex", "Diana", "Donald"],
    "age" : [26, 34, 39]
},
index=[0,1,2])

class unitxt.struct_data_operators.SerializeTableAsHTML(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, serialized_type: object = <class 'unitxt.types.Table'>, seed: int = 0, shuffle_rows: bool = False, shuffle_columns: bool = False)[source]¶

Bases: SerializeTable

HTML Table Serializer.

HTML table format used for rendering tables in web pages. Format(Sample):

<table>
    <thead>
        <tr><th>name</th><th>age</th><th>sex</th></tr>
    </thead>
    <tbody>
        <tr><td>Alice</td><td>26</td><td>F</td></tr>
        <tr><td>Raj</td><td>34</td><td>M</td></tr>
    </tbody>
</table>

class unitxt.struct_data_operators.SerializeTableAsImage(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = ['matplotlib', 'pillow'], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, serialized_type: object = <class 'unitxt.types.Table'>, seed: int = 0, shuffle_rows: bool = False, shuffle_columns: bool = False)[source]¶: Bases: SerializeTable

class unitxt.struct_data_operators.SerializeTableAsIndexedRowMajor(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, serialized_type: object = <class 'unitxt.types.Table'>, seed: int = 0, shuffle_rows: bool = False, shuffle_columns: bool = False)[source]¶

Bases: SerializeTable

Indexed Row Major Table Serializer.

Commonly used row-major serialization format. Format:

col : col1 | col2 | col3 row 1 : val1 | val2 | val3 row 2 : val1 | …
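The format can be sketched as follows. The exact spacing around `:` and `|`, and the use of a single space between the header part and each row part, are read off the format line and should be treated as assumptions; `serialize_indexed_row_major` is a hypothetical name.

```python
from typing import Any, Dict


def serialize_indexed_row_major(table: Dict[str, Any]) -> str:
    """Emit "col : ..." followed by 1-indexed "row N : ..." segments."""
    parts = ["col : " + " | ".join(str(name) for name in table["header"])]
    for index, row in enumerate(table["rows"], start=1):
        parts.append(f"row {index} : " + " | ".join(str(cell) for cell in row))
    return " ".join(parts)


serialized = serialize_indexed_row_major(
    {"header": ["name", "age"], "rows": [["Alex", 26], ["Diana", 34]]}
)
```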

class unitxt.struct_data_operators.SerializeTableAsJson(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, serialized_type: object = <class 'unitxt.types.Table'>, seed: int = 0, shuffle_rows: bool = False, shuffle_columns: bool = False)[source]¶

Bases: SerializeTable

JSON Table Serializer.

JSON-based serializer. Format (sample):

{
    "0":{"name":"Alex","age":26},
    "1":{"name":"Diana","age":34},
    "2":{"name":"Donald","age":39}
}
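A sketch of how the sample could be produced with the standard `json` module: each row becomes an object keyed by column name, and rows are keyed by their index as a string. The function name is hypothetical, and whitespace in the real serializer's output may differ.

```python
import json
from typing import Any, Dict


def serialize_table_as_json(table: Dict[str, Any]) -> str:
    """Map each row index to a {column name: cell} object and dump as JSON."""
    records = {
        str(index): dict(zip(table["header"], row))
        for index, row in enumerate(table["rows"])
    }
    return json.dumps(records)


json_text = serialize_table_as_json(
    {"header": ["name", "age"], "rows": [["Alex", 26], ["Diana", 34], ["Donald", 39]]}
)
```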

class unitxt.struct_data_operators.SerializeTableAsMarkdown(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, serialized_type: object = <class 'unitxt.types.Table'>, seed: int = 0, shuffle_rows: bool = False, shuffle_columns: bool = False)[source]¶

Bases: SerializeTable

Markdown Table Serializer.

The Markdown table format is used primarily on GitHub. Format:

|col1|col2|col3|
|---|---|---|
|A|4|1|
|I|2|1|
...
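A minimal sketch that emits the layout shown above, with no padding inside cells. `serialize_table_as_markdown` is a hypothetical helper, not the class's method.

```python
from typing import Any, Dict, List


def serialize_table_as_markdown(table: Dict[str, Any]) -> str:
    """Render header, separator line, and rows as pipe-delimited lines."""

    def line(cells: List[Any]) -> str:
        return "|" + "|".join(str(cell) for cell in cells) + "|"

    lines = [line(table["header"]), line(["---"] * len(table["header"]))]
    lines.extend(line(row) for row in table["rows"])
    return "\n".join(lines)


markdown = serialize_table_as_markdown(
    {"header": ["col1", "col2", "col3"], "rows": [["A", 4, 1], ["I", 2, 1]]}
)
```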

class unitxt.struct_data_operators.SerializeTableRowAsList(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, fields: str = __required__, to_field: str = __required__, max_cell_length: int | NoneType = None)[source]¶

Bases: InstanceOperator

Serializes a table row as list.

Parameters:

fields (str) –
to_field (str) –
max_cell_length (int) –

class unitxt.struct_data_operators.SerializeTableRowAsText(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, fields: str = __required__, to_field: str = __required__, max_cell_length: int | NoneType = None)[source]¶

Bases: InstanceOperator

Serializes a table row as text.

Parameters:

fields (str) –
to_field (str) –
max_cell_length (int) –

class unitxt.struct_data_operators.SerializeTriples(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶

Bases: FieldOperator

Serializes triples into a flat sequence.

Sample input in expected format: [[ “First Clearing”, “LOCATION”, “On NYS 52 1 Mi. Youngsville” ], [ “On NYS 52 1 Mi. Youngsville”, “CITY_OR_TOWN”, “Callicoon, New York”]]

Sample output: First Clearing : LOCATION : On NYS 52 1 Mi. Youngsville | On NYS 52 1 Mi. Youngsville : CITY_OR_TOWN : Callicoon, New York
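The sample output follows a simple pattern: the parts of each triple are joined with " : ", and triples are joined with " | ". A sketch under that reading (hypothetical function name):

```python
from typing import List


def serialize_triples(triples: List[List[str]]) -> str:
    """Join triple parts with " : " and triples with " | "."""
    return " | ".join(" : ".join(str(part) for part in triple) for triple in triples)


flat_triples = serialize_triples(
    [
        ["First Clearing", "LOCATION", "On NYS 52 1 Mi. Youngsville"],
        ["On NYS 52 1 Mi. Youngsville", "CITY_OR_TOWN", "Callicoon, New York"],
    ]
)
```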

class unitxt.struct_data_operators.ShuffleColumnsNames(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <class 'unitxt.types.Table'>)[source]¶

Bases: TypeDependentAugmentor

Shuffles the table's column names so that they are displayed in random order.

augmented_type[source]¶: alias of Table

class unitxt.struct_data_operators.ShuffleTableColumns(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <class 'unitxt.types.Table'>)[source]¶

Bases: TypeDependentAugmentor

Shuffles the table columns randomly.

Sample Input:
    {
        "header": ["name", "age"],
        "rows": [["Alex", 26], ["Raj", 34], ["Donald", 39]],
    }

Sample Output:
    {
        "header": ["age", "name"],
        "rows": [[26, "Alex"], [34, "Raj"], [39, "Donald"]],
    }
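A sketch of a column shuffle that keeps every cell under its own column name. The seeded `random.Random` and the helper name `shuffle_table_columns` are assumptions for illustration; the resulting order depends on the seed.

```python
import random
from typing import Any, Dict


def shuffle_table_columns(table: Dict[str, Any], seed: int = 0) -> Dict[str, Any]:
    """Apply one random column permutation to the header and to every row."""
    order = list(range(len(table["header"])))
    random.Random(seed).shuffle(order)
    return {
        "header": [table["header"][i] for i in order],
        "rows": [[row[i] for i in order] for row in table["rows"]],
    }


table = {"header": ["name", "age"], "rows": [["Alex", 26], ["Raj", 34], ["Donald", 39]]}
shuffled = shuffle_table_columns(table, seed=0)
```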

augmented_type[source]¶: alias of Table

class unitxt.struct_data_operators.ShuffleTableRows(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <class 'unitxt.types.Table'>)[source]¶

Bases: TypeDependentAugmentor

Shuffles the input table rows randomly.

Sample Input:
{
    "header": ["name", "age"],
    "rows": [["Alex", 26], ["Raj", 34], ["Donald", 39]],
}

Sample Output:
{
    "header": ["name", "age"],
    "rows": [["Donald", 39], ["Raj", 34], ["Alex", 26]],
}

augmented_type[source]¶: alias of Table

class unitxt.struct_data_operators.ToolCallPostProcessor(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, failure_value: Any = None, allow_failure: bool = False)[source]¶: Bases: FieldOperator

class unitxt.struct_data_operators.TransposeTable(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <class 'unitxt.types.Table'>)[source]¶

Bases: TypeDependentAugmentor

Transpose a table.

Sample Input:
    {
        "header": ["name", "age", "sex"],
        "rows": [["Alice", 26, "F"], ["Raj", 34, "M"], ["Donald", 39, "M"]],
    }

Sample Output:
    {
        "header": [" ", "0", "1", "2"],
        "rows": [["name", "Alice", "Raj", "Donald"], ["age", 26, 34, 39], ["sex", "F", "M", "M"]],
    }
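The transposition in the sample can be sketched as below: each original column becomes a row whose first cell is the column name, and the new header is a blank cell followed by the original row indices as strings. `transpose_table` is a hypothetical standalone helper.

```python
from typing import Any, Dict


def transpose_table(table: Dict[str, Any]) -> Dict[str, Any]:
    """Turn columns into rows, labelling each new row with its column name."""
    header, rows = table["header"], table["rows"]
    new_header = [" "] + [str(index) for index in range(len(rows))]
    new_rows = [
        [name] + [row[col] for row in rows] for col, name in enumerate(header)
    ]
    return {"header": new_header, "rows": new_rows}


transposed = transpose_table(
    {
        "header": ["name", "age", "sex"],
        "rows": [["Alice", 26, "F"], ["Raj", 34, "M"], ["Donald", 39, "M"]],
    }
)
```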

augmented_type[source]¶: alias of Table

class unitxt.struct_data_operators.TruncateTableCells(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, max_length: int = 15, table: str = None, text_output: str | NoneType = None)[source]¶

Bases: InstanceOperator

Limit the maximum length of cell values in a table to reduce the overall length.

Parameters:

max_length (int) – maximum allowed length of cell values.

For tasks that produce a cell value as answer, truncating a cell value should be replicated with truncating the corresponding answer as well. This has been addressed in the implementation.
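A simplified sketch of cell truncation. It assumes that only string cells longer than `max_length` are cut and that other values are left unchanged; it does not model the answer handling mentioned above. `truncate_table_cells` is a hypothetical name.

```python
from typing import Any, Dict


def truncate_table_cells(table: Dict[str, Any], max_length: int = 15) -> Dict[str, Any]:
    """Cut string cells to at most `max_length` characters."""

    def clip(cell: Any) -> Any:
        if isinstance(cell, str) and len(cell) > max_length:
            return cell[:max_length]
        return cell  # numbers and short strings pass through unchanged

    return {
        "header": table["header"],
        "rows": [[clip(cell) for cell in row] for row in table["rows"]],
    }


clipped = truncate_table_cells(
    {"header": ["name", "bio"], "rows": [["Alex", "A very long description of Alex"], ["Raj", 34]]},
    max_length=15,
)
```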

class unitxt.struct_data_operators.TruncateTableRows(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, set_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, rows_to_keep: int = 10)[source]¶

Bases: FieldOperator

Limits the table to a specified number of rows by randomly selecting which excess rows to remove.

Parameters:

rows_to_keep (int) – number of rows to keep.
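A sketch of random row truncation. Preserving the original order of the surviving rows and the `seed` parameter are assumptions made for this illustration, as is the name `truncate_table_rows`.

```python
import random
from typing import Any, Dict


def truncate_table_rows(
    table: Dict[str, Any], rows_to_keep: int = 10, seed: int = 0
) -> Dict[str, Any]:
    """Keep a random subset of `rows_to_keep` rows, in their original order."""
    rows = table["rows"]
    if len(rows) <= rows_to_keep:
        return {"header": table["header"], "rows": list(rows)}
    kept = sorted(random.Random(seed).sample(range(len(rows)), rows_to_keep))
    return {"header": table["header"], "rows": [rows[i] for i in kept]}


big_table = {"header": ["id"], "rows": [[i] for i in range(25)]}
small_table = truncate_table_rows(big_table, rows_to_keep=10)
```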

unitxt.struct_data_operators.extract_possible_json_str(text)[source]¶: Extract potential JSON string from text by finding outermost braces/brackets.
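The idea of locating the outermost braces or brackets can be sketched like this. It is an independent illustration, not the library function: the name `extract_json_candidate` and the fallback of returning the input unchanged when no delimiters are found are assumptions.

```python
def extract_json_candidate(text: str) -> str:
    """Return the span from the first '{' or '[' to the last '}' or ']'."""
    starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
    ends = [i for i in (text.rfind("}"), text.rfind("]")) if i != -1]
    if not starts or not ends:
        return text  # nothing that looks like JSON; fallback is an assumption
    start, end = min(starts), max(ends)
    if end < start:
        return text
    return text[start : end + 1]


candidate = extract_json_candidate(
    'Sure! {"name": "get_weather", "args": {"city": "Paris"}} Done.'
)
```

The extracted string is only a candidate; it still has to be passed to a JSON parser, which may reject it.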

unitxt.struct_data_operators.shuffle_columns(table: Table, seed=0) → Table[source]¶

unitxt.struct_data_operators.shuffle_rows(table: Table, seed=0) → Table[source]¶

unitxt.struct_data_operators.truncate_cell(cell_value, max_len)[source]¶