unitxt.struct_data_operators module
This section describes unitxt operators for structured data.
These operators specialize in handling structured data such as tables, triples, and key-value pairs. For tables, the expected input format is:
{
"header": ["col1", "col2"],
"rows": [["row11", "row12"], ["row21", "row22"], ["row31", "row32"]]
}
For triples, the expected input format is:
[[ "subject1", "relation1", "object1" ], [ "subject1", "relation2", "object2"]]
For key-value pairs, the expected input format is:
{"key1": "value1", "key2": "value2", "key3": "value3"}
- class unitxt.struct_data_operators.ConstructTableFromRowsCols(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, fields: List[str] = __required__, to_field: str = __required__)[source]¶
Bases:
InstanceOperator
Maps the column and row fields into a single table field encompassing both header and rows.
fields[0] = header, as List
fields[1] = rows, as List[List]
fields[2] = table caption, as string (optional)
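The mapping above can be sketched in plain Python. This is only an illustration, not the unitxt implementation: the helper name `construct_table` and the `caption` output key are assumptions made for this example.

```python
from typing import Any, Dict, List, Optional


def construct_table(
    header: List[str],
    rows: List[List[Any]],
    caption: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: merge header and rows into one table dict."""
    table: Dict[str, Any] = {"header": header, "rows": rows}
    if caption is not None:
        # Assumption of this sketch: the optional caption is stored under "caption".
        table["caption"] = caption
    return table


table = construct_table(
    ["col1", "col2"],
    [["row11", "row12"], ["row21", "row22"]],
)
print(table)
```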
- class unitxt.struct_data_operators.ConvertTableColNamesToSequential(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶
Bases:
FieldOperator
Replaces actual table column names with static sequential names such as col_0, col_1, …
Sample input:
{ "header": ["name", "age"], "rows": [["Alex", 21], ["Donald", 34]] }
Sample output:
{ "header": ["col_0", "col_1"], "rows": [["Alex", 21], ["Donald", 34]] }
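A minimal sketch of this renaming in plain Python, reproducing the sample above. The helper name `to_sequential_col_names` is hypothetical and the code does not use the unitxt operator itself.

```python
from typing import Any, Dict


def to_sequential_col_names(table: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: rename header cells to col_0, col_1, ... and keep rows as-is."""
    header = [f"col_{i}" for i in range(len(table["header"]))]
    return {"header": header, "rows": table["rows"]}


sample = {"header": ["name", "age"], "rows": [["Alex", 21], ["Donald", 34]]}
renamed = to_sequential_col_names(sample)
print(renamed)
```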
- class unitxt.struct_data_operators.DumpJson(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶
Bases:
FieldOperator
- class unitxt.struct_data_operators.DuplicateTableColumns(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <class 'unitxt.types.Table'>, column_indices: List[int] = [], times: int = 1)[source]¶
Bases:
TypeDependentAugmentor
Duplicates specific columns of a table a given number of times.
- Parameters:
column_indices (List[int]) – indices of the columns to be duplicated
times (int) – number of times each duplicated column is to appear
- column_indices: List[int] = []¶
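One possible reading of these parameters, sketched in plain Python. It assumes that a selected column ends up appearing `times` times in total; that interpretation and the helper name `duplicate_columns` are assumptions of this example, not confirmed library behavior.

```python
from typing import Any, Dict, List


def duplicate_columns(
    table: Dict[str, Any], column_indices: List[int], times: int
) -> Dict[str, Any]:
    """Hypothetical helper: repeat the selected columns in header and rows."""

    def expand(cells: List[Any]) -> List[Any]:
        out: List[Any] = []
        for i, cell in enumerate(cells):
            # Assumption: a selected column appears `times` times in total.
            out.extend([cell] * (times if i in column_indices else 1))
        return out

    return {
        "header": expand(table["header"]),
        "rows": [expand(row) for row in table["rows"]],
    }


dup = duplicate_columns(
    {"header": ["name", "age"], "rows": [["Alex", 26]]},
    column_indices=[0],
    times=2,
)
print(dup)
```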
- class unitxt.struct_data_operators.DuplicateTableRows(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <class 'unitxt.types.Table'>, row_indices: List[int] = [], times: int = 1)[source]¶
Bases:
TypeDependentAugmentor
Duplicates specific rows of a table a given number of times.
- Parameters:
row_indices (List[int]) – indices of the rows to be duplicated
times (int) – number of times each duplicated row is to appear
- row_indices: List[int] = []¶
- class unitxt.struct_data_operators.GetNumOfTableCells(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶
Bases:
FieldOperator
Gets the number of cells in the given table.
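As a rough sketch, the cell count could be computed as rows times columns. Whether the header row is counted is not stated here, so this example assumes it is not. The helper name `count_cells` is hypothetical.

```python
from typing import Any, Dict


def count_cells(table: Dict[str, Any]) -> int:
    """Hypothetical helper: number of data cells in a table."""
    # Assumption: cells = number of rows x number of header columns
    # (the header row itself is not counted).
    return len(table["rows"]) * len(table["header"])


num_cells = count_cells(
    {
        "header": ["col1", "col2"],
        "rows": [["row11", "row12"], ["row21", "row22"], ["row31", "row32"]],
    }
)
print(num_cells)
```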
- class unitxt.struct_data_operators.InsertEmptyTableRows(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <class 'unitxt.types.Table'>, times: int = 0)[source]¶
Bases:
TypeDependentAugmentor
Inserts empty rows at random positions in a table, a given number of times.
- Parameters:
times (int) – number of times an empty row is inserted
- class unitxt.struct_data_operators.JsonStrToDict(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶
Bases:
FieldOperator
Converts a JSON string representing key-value pairs into a dictionary.
Ensures that keys and values are strings and that there are no None values.
- class unitxt.struct_data_operators.ListToKeyValPairs(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, fields: List[str] = __required__, to_field: str = __required__)[source]¶
Bases:
InstanceOperator
Maps lists of keys and values into key:value pairs.
Sample input in expected format:
{"keys": ["name", "age", "sex"], "values": ["Alex", 31, "M"]}
Sample output:
{"name": "Alex", "age": 31, "sex": "M"}
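The pairing shown in the sample can be sketched with `zip` in plain Python. The helper name `list_to_key_val_pairs` is hypothetical and the snippet does not call the unitxt operator.

```python
from typing import Any, Dict, List


def list_to_key_val_pairs(keys: List[str], values: List[Any]) -> Dict[str, Any]:
    """Hypothetical helper: pair keys with values positionally."""
    return dict(zip(keys, values))


pairs = list_to_key_val_pairs(["name", "age", "sex"], ["Alex", 31, "M"])
print(pairs)
```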
- class unitxt.struct_data_operators.LoadJson(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, failure_value: Any = None, allow_failure: bool = False)[source]¶
Bases:
FieldOperator
- class unitxt.struct_data_operators.MapHTMLTableToJSON(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = ['bs4'], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶
Bases:
FieldOperator
Converts an HTML table to the basic JSON table format.
JSON format:
{ "header": ["col1", "col2"], "rows": [["row11", "row12"], ["row21", "row22"], ["row31", "row32"]] }
- class unitxt.struct_data_operators.MapTableListsToStdTableJSON(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶
Bases:
FieldOperator
Converts a table in lists format to the basic JSON table format.
JSON format:
{ "header": ["col1", "col2"], "rows": [["row11", "row12"], ["row21", "row22"], ["row31", "row32"]] }
- class unitxt.struct_data_operators.MaskColumnsNames(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <class 'unitxt.types.Table'>)[source]¶
Bases:
TypeDependentAugmentor
Masks the names of the table's columns with dummy names such as "Col1", "Col2", etc.
- class unitxt.struct_data_operators.SerializeKeyValPairs(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶
Bases:
FieldOperator
Serializes key-value pairs into a flat sequence.
Sample input in expected format:
{"name": "Alex", "age": 31, "sex": "M"}
Sample output:
name is Alex, age is 31, sex is M
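A minimal plain-Python sketch that reproduces the sample output above. The helper name `serialize_key_val_pairs` is hypothetical; the "is" and ", " separators are taken from the sample.

```python
from typing import Any, Dict


def serialize_key_val_pairs(pairs: Dict[str, Any]) -> str:
    """Hypothetical helper: render each pair as 'key is value', comma-separated."""
    return ", ".join(f"{key} is {value}" for key, value in pairs.items())


text = serialize_key_val_pairs({"name": "Alex", "age": 31, "sex": "M"})
print(text)
```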
- class unitxt.struct_data_operators.SerializeTable(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, serialized_type: object = <class 'unitxt.types.Table'>, seed: int = 0, shuffle_rows: bool = False, shuffle_columns: bool = False)[source]¶
Bases:
ABC, TableSerializer
TableSerializer converts a given table into a flat sequence with special symbols.
The output format varies depending on the chosen serializer. This abstract class defines the structure that any concrete table serializer should follow.
- class unitxt.struct_data_operators.SerializeTableAsConcatenation(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, serialized_type: object = <class 'unitxt.types.Table'>, seed: int = 0, shuffle_rows: bool = False, shuffle_columns: bool = False)[source]¶
Bases:
SerializeTable
Concat Serializer.
Concatenates all table content into one string of header and rows. Format (sample):
name age Alex 26 Diana 34
- class unitxt.struct_data_operators.SerializeTableAsDFLoader(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, serialized_type: object = <class 'unitxt.types.Table'>, seed: int = 0, shuffle_rows: bool = False, shuffle_columns: bool = False)[source]¶
Bases:
SerializeTable
DFLoader Table Serializer.
Serializes the table as a pandas DataFrame code snippet. Format (sample):
pd.DataFrame({ "name" : ["Alex", "Diana", "Donald"], "age" : [26, 34, 39] }, index=[0,1,2])
- class unitxt.struct_data_operators.SerializeTableAsHTML(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, serialized_type: object = <class 'unitxt.types.Table'>, seed: int = 0, shuffle_rows: bool = False, shuffle_columns: bool = False)[source]¶
Bases:
SerializeTable
HTML Table Serializer.
HTML table format used for rendering tables in web pages. Format(Sample):
<table>
<thead>
<tr><th>name</th><th>age</th><th>sex</th></tr>
</thead>
<tbody>
<tr><td>Alice</td><td>26</td><td>F</td></tr>
<tr><td>Raj</td><td>34</td><td>M</td></tr>
</tbody>
</table>
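The HTML layout in the sample can be produced with a few lines of plain Python. This is a sketch only: the helper name `table_to_html` is hypothetical, the whitespace between tags is a choice made here, and cell values are not HTML-escaped.

```python
from typing import Any, Dict, List


def table_to_html(table: Dict[str, Any]) -> str:
    """Hypothetical helper: render a header/rows table as an HTML table string."""

    def tr(cells: List[Any], tag: str) -> str:
        return "<tr>" + "".join(f"<{tag}>{cell}</{tag}>" for cell in cells) + "</tr>"

    parts = ["<table>", "<thead>", tr(table["header"], "th"), "</thead>", "<tbody>"]
    parts += [tr(row, "td") for row in table["rows"]]
    parts += ["</tbody>", "</table>"]
    # Assumption: parts are separated by single spaces; cells are not escaped.
    return " ".join(parts)


html_text = table_to_html(
    {
        "header": ["name", "age", "sex"],
        "rows": [["Alice", 26, "F"], ["Raj", 34, "M"]],
    }
)
print(html_text)
```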
- class unitxt.struct_data_operators.SerializeTableAsImage(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = ['matplotlib', 'pillow'], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, serialized_type: object = <class 'unitxt.types.Table'>, seed: int = 0, shuffle_rows: bool = False, shuffle_columns: bool = False)[source]¶
Bases:
SerializeTable
- class unitxt.struct_data_operators.SerializeTableAsIndexedRowMajor(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, serialized_type: object = <class 'unitxt.types.Table'>, seed: int = 0, shuffle_rows: bool = False, shuffle_columns: bool = False)[source]¶
Bases:
SerializeTable
Indexed Row Major Table Serializer.
Commonly used row-major serialization format. Format:
col : col1 | col2 | col3
row 1 : val1 | val2 | val3
row 2 : val1 | …
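The indexed row-major layout can be sketched as follows in plain Python. The helper name `serialize_indexed_row_major` is hypothetical, and joining the header line and row lines with a single space is an assumption of this example.

```python
from typing import Any, Dict


def serialize_indexed_row_major(table: Dict[str, Any]) -> str:
    """Hypothetical helper: 'col : ...' followed by 1-indexed 'row i : ...' parts."""
    parts = ["col : " + " | ".join(str(h) for h in table["header"])]
    for i, row in enumerate(table["rows"], start=1):
        parts.append(f"row {i} : " + " | ".join(str(cell) for cell in row))
    # Assumption: header and row parts are separated by a single space.
    return " ".join(parts)


row_major = serialize_indexed_row_major(
    {"header": ["name", "age"], "rows": [["Alex", 26], ["Diana", 34]]}
)
print(row_major)
```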
- class unitxt.struct_data_operators.SerializeTableAsJson(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, serialized_type: object = <class 'unitxt.types.Table'>, seed: int = 0, shuffle_rows: bool = False, shuffle_columns: bool = False)[source]¶
Bases:
SerializeTable
JSON Table Serializer.
JSON-based serializer. Format (sample):
{ "0":{"name":"Alex","age":26}, "1":{"name":"Diana","age":34}, "2":{"name":"Donald","age":39} }
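The sample suggests one record per row, keyed by the row index as a string. A plain-Python sketch of that shape, using the standard `json` module, could look like this. The helper name `table_to_json` is hypothetical and the exact whitespace of the real output is not claimed.

```python
import json
from typing import Any, Dict


def table_to_json(table: Dict[str, Any]) -> str:
    """Hypothetical helper: one JSON object per row, keyed by the row index."""
    records = {
        str(i): dict(zip(table["header"], row))
        for i, row in enumerate(table["rows"])
    }
    return json.dumps(records)


json_text = table_to_json(
    {
        "header": ["name", "age"],
        "rows": [["Alex", 26], ["Diana", 34], ["Donald", 39]],
    }
)
print(json_text)
```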
- class unitxt.struct_data_operators.SerializeTableAsMarkdown(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, serialized_type: object = <class 'unitxt.types.Table'>, seed: int = 0, shuffle_rows: bool = False, shuffle_columns: bool = False)[source]¶
Bases:
SerializeTable
Markdown Table Serializer.
The Markdown table format is used primarily on GitHub. Format:
|col1|col2|col3|
|---|---|---|
|A|4|1|
|I|2|1|
...
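A minimal sketch that builds the Markdown layout shown above in plain Python. The helper name `table_to_markdown` is hypothetical, and separating the lines with newlines is an assumption of this example.

```python
from typing import Any, Dict, List


def table_to_markdown(table: Dict[str, Any]) -> str:
    """Hypothetical helper: header line, separator line, then one line per row."""

    def line(cells: List[Any]) -> str:
        return "|" + "|".join(str(cell) for cell in cells) + "|"

    header = table["header"]
    lines = [line(header), line(["---"] * len(header))]
    lines += [line(row) for row in table["rows"]]
    return "\n".join(lines)


markdown_text = table_to_markdown(
    {"header": ["col1", "col2", "col3"], "rows": [["A", 4, 1], ["I", 2, 1]]}
)
print(markdown_text)
```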
- class unitxt.struct_data_operators.SerializeTableRowAsList(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, fields: str = __required__, to_field: str = __required__, max_cell_length: int | NoneType = None)[source]¶
Bases:
InstanceOperator
Serializes a table row as list.
- Parameters:
fields (str) –
to_field (str) –
max_cell_length (int) –
- class unitxt.struct_data_operators.SerializeTableRowAsText(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, fields: str = __required__, to_field: str = __required__, max_cell_length: int | NoneType = None)[source]¶
Bases:
InstanceOperator
Serializes a table row as text.
- Parameters:
fields (str) –
to_field (str) –
max_cell_length (int) –
- class unitxt.struct_data_operators.SerializeTriples(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False)[source]¶
Bases:
FieldOperator
Serializes triples into a flat sequence.
Sample input in expected format:
[[ "First Clearing", "LOCATION", "On NYS 52 1 Mi. Youngsville" ], [ "On NYS 52 1 Mi. Youngsville", "CITY_OR_TOWN", "Callicoon, New York"]]
Sample output:
First Clearing : LOCATION : On NYS 52 1 Mi. Youngsville | On NYS 52 1 Mi. Youngsville : CITY_OR_TOWN : Callicoon, New York
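The sample output can be reproduced in plain Python by joining the parts of each triple with " : " and the triples with " | ". The helper name `serialize_triples` is hypothetical; the separators are read off the sample.

```python
from typing import List


def serialize_triples(triples: List[List[str]]) -> str:
    """Hypothetical helper: 'subject : relation : object' joined by ' | '."""
    return " | ".join(" : ".join(str(part) for part in triple) for triple in triples)


triples_text = serialize_triples(
    [
        ["First Clearing", "LOCATION", "On NYS 52 1 Mi. Youngsville"],
        ["On NYS 52 1 Mi. Youngsville", "CITY_OR_TOWN", "Callicoon, New York"],
    ]
)
print(triples_text)
```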
- class unitxt.struct_data_operators.ShuffleColumnsNames(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <class 'unitxt.types.Table'>)[source]¶
Bases:
TypeDependentAugmentor
Shuffles the table's column names so that they are displayed in random order.
- class unitxt.struct_data_operators.ShuffleTableColumns(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <class 'unitxt.types.Table'>)[source]¶
Bases:
TypeDependentAugmentor
Shuffles the table columns randomly.
Sample Input:
{ "header": ["name", "age"], "rows": [["Alex", 26], ["Raj", 34], ["Donald", 39]] }
Sample Output:
{ "header": ["age", "name"], "rows": [[26, "Alex"], [34, "Raj"], [39, "Donald"]] }
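A sketch of a column shuffle in plain Python: one random permutation of the column indices is applied to the header and to every row, so each value stays under its own column name. The helper name `shuffle_columns` and its `seed` parameter are assumptions of this example.

```python
import random
from typing import Any, Dict


def shuffle_columns(table: Dict[str, Any], seed: int = 0) -> Dict[str, Any]:
    """Hypothetical helper: apply the same random column order to header and rows."""
    order = list(range(len(table["header"])))
    random.Random(seed).shuffle(order)
    return {
        "header": [table["header"][i] for i in order],
        "rows": [[row[i] for i in order] for row in table["rows"]],
    }


original = {"header": ["name", "age"], "rows": [["Alex", 26], ["Raj", 34], ["Donald", 39]]}
shuffled_cols = shuffle_columns(original)
print(shuffled_cols)
```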
- class unitxt.struct_data_operators.ShuffleTableRows(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <class 'unitxt.types.Table'>)[source]¶
Bases:
TypeDependentAugmentor
Shuffles the input table rows randomly.
Sample Input:
{ "header": ["name", "age"], "rows": [["Alex", 26], ["Raj", 34], ["Donald", 39]] }
Sample Output:
{ "header": ["name", "age"], "rows": [["Donald", 39], ["Raj", 34], ["Alex", 26]] }
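A row shuffle can be sketched in plain Python as follows. The header is left unchanged and only the order of the rows varies. The helper name `shuffle_rows` and its `seed` parameter are assumptions of this example.

```python
import random
from typing import Any, Dict


def shuffle_rows(table: Dict[str, Any], seed: int = 0) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of the table with its rows in random order."""
    rows = list(table["rows"])  # copy so the input table is not modified
    random.Random(seed).shuffle(rows)
    return {"header": table["header"], "rows": rows}


source_table = {"header": ["name", "age"], "rows": [["Alex", 26], ["Raj", 34], ["Donald", 39]]}
shuffled_rows = shuffle_rows(source_table)
print(shuffled_rows)
```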
- class unitxt.struct_data_operators.TransposeTable(data_classification_policy: List[str] = None, _requirements_list: Union[List[str], Dict[str, str]] = [], requirements: Union[List[str], Dict[str, str]] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: Union[str, NoneType] = None, to_field: Union[str, NoneType] = None, field_to_field: Union[List[List[str]], Dict[str, str], NoneType] = None, use_query: Union[bool, NoneType] = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, augmented_type: object = <class 'unitxt.types.Table'>)[source]¶
Bases:
TypeDependentAugmentor
Transposes a table.
Sample Input:
{ "header": ["name", "age", "sex"], "rows": [["Alice", 26, "F"], ["Raj", 34, "M"], ["Donald", 39, "M"]] }
Sample Output:
{ "header": [" ", "0", "1", "2"], "rows": [["name", "Alice", "Raj", "Donald"], ["age", 26, 34, 39], ["sex", "F", "M", "M"]] }
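The transposition in the sample can be sketched in plain Python: each original column becomes a row that starts with its column name, and the new header is a blank cell followed by the original row indices as strings. The helper name `transpose_table` is hypothetical.

```python
from typing import Any, Dict


def transpose_table(table: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: turn columns into rows, as in the sample above."""
    rows = table["rows"]
    header = [" "] + [str(i) for i in range(len(rows))]
    new_rows = [
        [name] + [row[j] for row in rows]
        for j, name in enumerate(table["header"])
    ]
    return {"header": header, "rows": new_rows}


transposed = transpose_table(
    {
        "header": ["name", "age", "sex"],
        "rows": [["Alice", 26, "F"], ["Raj", 34, "M"], ["Donald", 39, "M"]],
    }
)
print(transposed)
```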
- class unitxt.struct_data_operators.TruncateTableCells(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, max_length: int = 15, table: str = None, text_output: str | NoneType = None)[source]¶
Bases:
InstanceOperator
Limit the maximum length of cell values in a table to reduce the overall length.
- Parameters:
max_length (int) – maximum allowed length of cell values.
For tasks that produce a cell value as the answer, truncating a cell value should be mirrored by truncating the corresponding answer as well. This has been addressed in the implementation.
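The idea can be sketched in plain Python: cells are clipped to `max_length` characters, and an answer that refers to a cell value is clipped the same way so that it still matches. The helper name `truncate_cells`, the `answer` argument, and the choice to truncate only string cells are assumptions of this example, not the library's implementation.

```python
from typing import Any, Dict, Optional, Tuple


def truncate_cells(
    table: Dict[str, Any], max_length: int = 15, answer: Optional[str] = None
) -> Tuple[Dict[str, Any], Optional[str]]:
    """Hypothetical helper: clip long string cells and the matching answer."""

    def clip(value: Any) -> Any:
        # Assumption: only string values are truncated; other types pass through.
        return value[:max_length] if isinstance(value, str) else value

    clipped = {
        "header": table["header"],
        "rows": [[clip(cell) for cell in row] for row in table["rows"]],
    }
    return clipped, clip(answer)


short_table, short_answer = truncate_cells(
    {"header": ["name", "age"], "rows": [["Alexander the Great", 32]]},
    max_length=5,
    answer="Alexander the Great",
)
print(short_table, short_answer)
```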
- class unitxt.struct_data_operators.TruncateTableRows(data_classification_policy: List[str] = None, _requirements_list: List[str] | Dict[str, str] = [], requirements: List[str] | Dict[str, str] = [], caching: bool = None, apply_to_streams: List[str] = None, dont_apply_to_streams: List[str] = None, field: str | NoneType = None, to_field: str | NoneType = None, field_to_field: List[List[str]] | Dict[str, str] | NoneType = None, use_query: bool | NoneType = None, process_every_value: bool = False, get_default: Any = None, not_exist_ok: bool = False, not_exist_do_nothing: bool = False, rows_to_keep: int = 10)[source]¶
Bases:
FieldOperator
Limits the table to a specified number of rows by removing excess rows via random selection.
- Parameters:
rows_to_keep (int) – number of rows to keep.
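A rough plain-Python sketch of random row truncation. The helper name `truncate_rows`, the `seed` parameter, and the choice to preserve the original order of the surviving rows are assumptions of this example.

```python
import random
from typing import Any, Dict


def truncate_rows(
    table: Dict[str, Any], rows_to_keep: int = 10, seed: int = 0
) -> Dict[str, Any]:
    """Hypothetical helper: keep at most `rows_to_keep` randomly chosen rows."""
    rows = table["rows"]
    if len(rows) <= rows_to_keep:
        return {"header": table["header"], "rows": list(rows)}
    # Assumption: surviving rows keep their original relative order.
    kept = sorted(random.Random(seed).sample(range(len(rows)), rows_to_keep))
    return {"header": table["header"], "rows": [rows[i] for i in kept]}


big_table = {"header": ["id"], "rows": [[i] for i in range(20)]}
small_table = truncate_rows(big_table, rows_to_keep=10)
print(len(small_table["rows"]))
```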