unitxt.text_utils module

unitxt.text_utils.camel_to_snake_case(s)

Converts a string from camelCase to snake_case.

Parameters: s (str) – The string to be converted.
Returns: The string converted to snake_case.
Return type: str
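
As a rough illustration of the conversion described above, the following standalone sketch inserts an underscore at each lowercase-to-uppercase boundary and then lowercases the result. The name `camel_to_snake_sketch` is hypothetical, and this is not the library's implementation; `unitxt.text_utils.camel_to_snake_case` may treat edge cases such as runs of capitals differently.

```python
import re


def camel_to_snake_sketch(s: str) -> str:
    """Illustrative camelCase -> snake_case conversion (not the unitxt code)."""
    # Put an underscore before every capital that follows a lowercase letter or digit.
    with_underscores = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", s)
    return with_underscores.lower()


print(camel_to_snake_sketch("indentDelta"))  # indent_delta
```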

unitxt.text_utils.construct_dict_as_python_lines(d, indent_delta=4) → List[str]

Constructs the lines of a dictionary formatted as a piece of python code.

Parameters:

d – The element to be formatted.
indent_delta (int, optional) – The number of spaces to add for each level of indentation. Defaults to 4.

unitxt.text_utils.construct_dict_as_yaml_lines(d, indent_delta=2) → List[str]

Constructs the lines of a dictionary formatted as yaml.

Parameters:

d – The element to be formatted.
indent_delta (int, optional) – The number of spaces to add for each level of indentation. Defaults to 2.
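
To make the role of `indent_delta` concrete, here is a minimal standalone sketch that turns a nested structure into a list of YAML-style lines, adding `indent_delta` spaces per nesting level. The function `yaml_lines_sketch` is hypothetical and only approximates the idea; the lines produced by `construct_dict_as_yaml_lines` (quoting, list layout, handling of special values) may well differ.

```python
from typing import Any, List


def yaml_lines_sketch(d: Any, indent_delta: int = 2) -> List[str]:
    """Illustrative YAML-style line builder (not the unitxt code)."""
    pad = " " * indent_delta
    if isinstance(d, dict) and d:
        lines: List[str] = []
        for key, value in d.items():
            if isinstance(value, (dict, list)) and value:
                # Nested container: emit the key, then the indented child lines.
                lines.append(f"{key}:")
                lines.extend(pad + line for line in yaml_lines_sketch(value, indent_delta))
            else:
                lines.append(f"{key}: {value}")
        return lines
    if isinstance(d, list) and d:
        lines = []
        for item in d:
            sub = yaml_lines_sketch(item, indent_delta)
            # The first line of each item carries the dash; the rest align under it.
            lines.append("- " + sub[0])
            lines.extend("  " + line for line in sub[1:])
        return lines
    # Scalars and empty containers are rendered with str() in this sketch.
    return [str(d)]


example_lines = yaml_lines_sketch({"a": 1, "b": {"c": [1, 2]}}, indent_delta=2)
print("\n".join(example_lines))
```

Returning a list of lines rather than one string mirrors the `List[str]` return type in the signatures above and leaves joining to the caller.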

unitxt.text_utils.is_camel_case(s)

Checks if a string is in camelCase.

Parameters: s (str) – The string to be checked.
Returns: True if the string is in camelCase, False otherwise.
Return type: bool
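
A check of this kind can be sketched with a single regular expression. The version below assumes lowerCamelCase (a lowercase first letter, no separators); that convention is an assumption, since this page does not state exactly which strings `is_camel_case` accepts. The name `looks_like_camel_case` is hypothetical.

```python
import re

# Assumed convention: starts lowercase, then optional capitalised words, no separators.
_CAMEL_RE = re.compile(r"[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*")


def looks_like_camel_case(s: str) -> bool:
    """Illustrative camelCase check (not the unitxt code)."""
    return _CAMEL_RE.fullmatch(s) is not None


print(looks_like_camel_case("indentDelta"), looks_like_camel_case("indent_delta"))
```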

unitxt.text_utils.is_made_of_sub_strings(string, sub_strings)

unitxt.text_utils.is_snake_case(s)

Checks if a string is in snake_case.

Parameters: s (str) – The string to be checked.
Returns: True if the string is in snake_case, False otherwise.
Return type: bool
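
The snake_case check can be illustrated the same way: lowercase letters or digits, grouped into words joined by single underscores. The helper `looks_like_snake_case` is hypothetical and its pattern is an assumption about what counts as snake_case, not a copy of the library's rule.

```python
import re

# Assumed convention: lowercase/digit words separated by exactly one underscore.
_SNAKE_RE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def looks_like_snake_case(s: str) -> bool:
    """Illustrative snake_case check (not the unitxt code)."""
    return _SNAKE_RE.fullmatch(s) is not None


print(looks_like_snake_case("indent_delta"), looks_like_snake_case("indentDelta"))
```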

unitxt.text_utils.lines_defining_obj_in_card(all_lines: List[str], obj_name: str, start_search_at_line: int = 0) → Tuple[int, int]

unitxt.text_utils.nested_tuple_to_string(nested_tuple: tuple) → str

Converts a nested tuple to a string, with elements separated by underscores.

Parameters: nested_tuple (tuple) – The nested tuple to be converted.
Returns: The string representation of the nested tuple.
Return type: str
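
The flattening described above can be sketched recursively: inner tuples are converted first, and all pieces are then joined with underscores. The function name `nested_tuple_to_string_sketch` is hypothetical, and the sketch assumes non-tuple elements are rendered with `str()`.

```python
def nested_tuple_to_string_sketch(nested_tuple: tuple) -> str:
    """Illustrative flattening of a nested tuple into an underscore-joined string."""
    parts = []
    for item in nested_tuple:
        if isinstance(item, tuple):
            # Recurse into inner tuples so their elements join at the same level.
            parts.append(nested_tuple_to_string_sketch(item))
        else:
            parts.append(str(item))
    return "_".join(parts)


print(nested_tuple_to_string_sketch((1, (2, 3), "a")))  # 1_2_3_a
```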

unitxt.text_utils.print_dict(d, indent=0, indent_delta=4, max_chars=None, keys_to_print=None, log_level='info')

unitxt.text_utils.print_dict_as_python(d: dict, indent_delta=4) → str

unitxt.text_utils.print_dict_as_yaml(d: dict, indent_delta=2) → str

unitxt.text_utils.split_words(s)

Splits a string into words based on PascalCase, camelCase, snake_case, kebab-case, and numbers attached to strings.

Parameters: s (str) – The string to be split.
Returns: The list of words obtained after splitting the string.
Return type: list
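
One way to picture this splitting is a single `re.findall` pass that picks out capitalised words, lowercase words, acronyms, and digit runs while skipping `_` and `-`. This is a standalone sketch under those assumptions; `split_words_sketch` is a hypothetical name, and the library's own rules for acronyms or mixed input may differ.

```python
import re
from typing import List

# Alternatives, in order: an acronym not followed by lowercase,
# an optionally capitalised lowercase word, or a run of digits.
_WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_words_sketch(s: str) -> List[str]:
    """Illustrative word splitter for camelCase, PascalCase, snake_case and kebab-case."""
    return _WORD_RE.findall(s)


for text in ("camelCase", "PascalCase", "snake_case", "kebab-case", "item42"):
    print(text, split_words_sketch(text))
```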

unitxt.text_utils.to_pretty_string(value, indent=0, indent_delta=4, max_chars=None, keys=None, item_label=None, float_format=None)

Constructs a formatted string representation of various data structures (dicts, lists, tuples, and DataFrames).

Parameters:

value – The Python data structure to be formatted.
indent (int, optional) – The current level of indentation. Defaults to 0.
indent_delta (int, optional) – Amount of spaces to add per indentation level. Defaults to 4.
max_chars (int, optional) – Max characters per line before wrapping. Defaults to terminal width - 10.
keys (List[str], optional) – For dicts, optionally specify keys and order.
item_label (str, optional) – Internal parameter for labeling items.
float_format (str, optional) – Format string for float values (e.g., “.2f”). Defaults to None.
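
Two of the parameters above map onto standard Python facilities, which the short sketch below illustrates. It assumes the terminal width is read with `shutil.get_terminal_size()` (the library may obtain it differently) and that `float_format` is applied as an ordinary format specification; neither detail is confirmed by this page.

```python
import shutil

# Assumption: "terminal width" is taken from shutil; falls back to 80 columns without a terminal.
terminal_width = shutil.get_terminal_size().columns
default_max_chars = terminal_width - 10

# A float_format such as ".2f" is a standard format specification for floats.
formatted = format(3.14159, ".2f")

print(default_max_chars, formatted)
```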