unitxt.text_utils module

unitxt.text_utils.camel_to_snake_case(s)

Converts a string from camelCase to snake_case.

Parameters:

s (str) – The string to be converted.

Returns:

The string converted to snake_case.

Return type:

str
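The conversion described above can be illustrated with a short regular-expression sketch. This is a stand-in written for illustration, not the library's implementation: the name camel_to_snake_sketch and both patterns are assumptions, and unitxt may treat edge cases (acronyms, digits, existing underscores) differently.

```python
import re


def camel_to_snake_sketch(s: str) -> str:
    """Hypothetical helper: approximate a camelCase to snake_case conversion."""
    # Put an underscore between a lowercase letter or digit and the capital after it.
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
    # Separate a run of capitals from a following capitalized word (HTTPServer -> HTTP_Server).
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", s)
    return s.lower()


print(camel_to_snake_sketch("camelCase"))        # camel_case
print(camel_to_snake_sketch("HTTPServerError"))  # http_server_error
```

The second substitution is what keeps an acronym such as HTTP together rather than splitting it letter by letter.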

unitxt.text_utils.construct_dict_as_python_lines(d, indent_delta=4) → List[str]

Constructs the lines of a dictionary formatted as a piece of python code.

Parameters:
  • d – The element to be formatted.

  • indent_delta (int, optional) – The number of spaces to add for each level of indentation. Defaults to 4.

unitxt.text_utils.construct_dict_as_yaml_lines(d, indent_delta=2) → List[str]

Constructs the lines of a dictionary formatted as yaml.

Parameters:
  • d – The element to be formatted.

  • indent_delta (int, optional) – The number of spaces to add for each level of indentation. Defaults to 2.
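To show how indent_delta shapes a line-based rendering, here is a simplified sketch that handles only nested dicts and scalar values. The function name dict_to_yaml_lines_sketch is hypothetical, and the exact lines unitxt emits (quoting, list handling, ordering) may differ from this approximation.

```python
from typing import Any, Dict, List


def dict_to_yaml_lines_sketch(d: Dict[str, Any], indent_delta: int = 2) -> List[str]:
    """Hypothetical helper: render a nested dict as indented YAML-like lines."""
    lines: List[str] = []
    for key, value in d.items():
        if isinstance(value, dict):
            lines.append(f"{key}:")
            # Render the child first, then shift every child line one level right.
            child_lines = dict_to_yaml_lines_sketch(value, indent_delta)
            lines.extend(" " * indent_delta + line for line in child_lines)
        else:
            lines.append(f"{key}: {value}")
    return lines


lines = dict_to_yaml_lines_sketch({"a": 1, "b": {"c": 2, "d": {"e": 3}}})
print("\n".join(lines))
# a: 1
# b:
#   c: 2
#   d:
#     e: 3
```

Returning a list of lines rather than one string lets the caller join, log, or further indent the result as needed.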

unitxt.text_utils.is_camel_case(s)

Checks if a string is in camelCase.

Parameters:

s (str) – The string to be checked.

Returns:

True if the string is in camelCase, False otherwise.

Return type:

bool

unitxt.text_utils.is_made_of_sub_strings(string, sub_strings)

unitxt.text_utils.is_snake_case(s)

Checks if a string is in snake_case.

Parameters:

s (str) – The string to be checked.

Returns:

True if the string is in snake_case, False otherwise.

Return type:

bool
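A check of this kind is commonly written as a single full-string regular-expression match. The sketch below assumes snake_case means lowercase letters and digits in groups joined by single underscores; that definition and the name looks_like_snake_case are assumptions, and the library's own rule may be stricter or looser.

```python
import re

# Assumed definition: one or more lowercase/digit groups joined by single underscores.
_SNAKE_CASE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def looks_like_snake_case(s: str) -> bool:
    """Hypothetical helper: True if the whole string matches the assumed pattern."""
    return _SNAKE_CASE.fullmatch(s) is not None


print(looks_like_snake_case("snake_case"))   # True
print(looks_like_snake_case("snakeCase"))    # False
print(looks_like_snake_case("double__gap"))  # False
```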

unitxt.text_utils.lines_defining_obj_in_card(all_lines: List[str], obj_name: str, start_search_at_line: int = 0) → Tuple[int, int]

unitxt.text_utils.nested_tuple_to_string(nested_tuple: tuple) → str

Converts a nested tuple to a string, with elements separated by underscores.

Parameters:

nested_tuple (tuple) – The nested tuple to be converted.

Returns:

The string representation of the nested tuple.

Return type:

str
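The flattening described above can be sketched as a small recursive function. This is an illustration under the assumption that non-tuple elements are converted with str() and that nesting is flattened depth-first; the name flatten_tuple_to_string and the sep parameter are hypothetical and not part of unitxt.

```python
def flatten_tuple_to_string(nested: tuple, sep: str = "_") -> str:
    """Hypothetical helper: join the leaves of a nested tuple with a separator."""
    parts = []
    for item in nested:
        if isinstance(item, tuple):
            # Recurse into inner tuples so their leaves join the same flat sequence.
            parts.append(flatten_tuple_to_string(item, sep))
        else:
            parts.append(str(item))
    return sep.join(parts)


print(flatten_tuple_to_string(("a", ("b", 1), ("c", ("d",)))))  # a_b_1_c_d
```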

unitxt.text_utils.print_dict(d, indent=0, indent_delta=4, max_chars=None, keys_to_print=None, log_level='info')

unitxt.text_utils.print_dict_as_python(d: dict, indent_delta=4) → str

unitxt.text_utils.print_dict_as_yaml(d: dict, indent_delta=2) → str

unitxt.text_utils.split_words(s)

Splits a string into words based on PascalCase, camelCase, snake_case, kebab-case, and numbers attached to strings.

Parameters:

s (str) – The string to be split.

Returns:

The list of words obtained after splitting the string.

Return type:

list
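One way to split on all of the boundaries listed above is to insert spaces at each boundary and then split on whitespace. The sketch below is an approximation written for illustration: the name split_words_sketch and the specific patterns are assumptions, so results for unusual inputs may not match unitxt's.

```python
import re
from typing import List


def split_words_sketch(s: str) -> List[str]:
    """Hypothetical helper: split on case, separator, and letter/digit boundaries."""
    # snake_case and kebab-case separators become spaces.
    s = re.sub(r"[_\-]+", " ", s)
    # camelCase / PascalCase: lowercase letter followed by a capital.
    s = re.sub(r"([a-z])([A-Z])", r"\1 \2", s)
    # Acronym followed by a capitalized word (HTTPServer -> HTTP Server).
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", s)
    # Numbers attached to letters, in either order.
    s = re.sub(r"([A-Za-z])(\d)", r"\1 \2", s)
    s = re.sub(r"(\d)([A-Za-z])", r"\1 \2", s)
    return s.split()


print(split_words_sketch("camelCase"))       # ['camel', 'Case']
print(split_words_sketch("kebab-case"))      # ['kebab', 'case']
print(split_words_sketch("version2Update"))  # ['version', '2', 'Update']
```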

unitxt.text_utils.to_pretty_string(value, indent=0, indent_delta=4, max_chars=None, keys=None, item_label=None, float_format=None)

Constructs a formatted string representation of various data structures (dicts, lists, tuples, and DataFrames).

Parameters:
  • value – The Python data structure to be formatted.

  • indent (int, optional) – The current level of indentation. Defaults to 0.

  • indent_delta (int, optional) – Number of spaces to add per indentation level. Defaults to 4.

  • max_chars (int, optional) – Max characters per line before wrapping. Defaults to terminal width - 10.

  • keys (List[str], optional) – For dicts, optionally specify keys and order.

  • item_label (str, optional) – Internal parameter for labeling items.

  • float_format (str, optional) – Format string for float values (e.g., ".2f"). Defaults to None.
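How indent, indent_delta, and float_format interact can be sketched with a much-reduced recursive formatter. This stand-in covers only dicts, lists, tuples, and scalars, and ignores max_chars, keys, item_label, and DataFrames; the names pretty_sketch and _format_scalar are hypothetical, and the layout unitxt actually produces may differ.

```python
from typing import Any, Optional


def _format_scalar(value: Any, float_format: Optional[str]) -> str:
    # Apply the format spec only to floats; everything else uses str().
    if isinstance(value, float) and float_format is not None:
        return format(value, float_format)
    return str(value)


def pretty_sketch(value: Any, indent: int = 0, indent_delta: int = 4,
                  float_format: Optional[str] = None) -> str:
    """Hypothetical helper: indent nested containers by indent_delta per level."""
    pad = " " * indent
    if isinstance(value, dict):
        items = [(f"{key}:", item) for key, item in value.items()]
    elif isinstance(value, (list, tuple)):
        items = [("-", item) for item in value]
    else:
        return pad + _format_scalar(value, float_format)
    lines = []
    for label, item in items:
        if isinstance(item, (dict, list, tuple)):
            # Containers go on their own lines, one indentation level deeper.
            lines.append(pad + label)
            lines.append(pretty_sketch(item, indent + indent_delta, indent_delta, float_format))
        else:
            lines.append(f"{pad}{label} {_format_scalar(item, float_format)}")
    return "\n".join(lines)


report = {"score": 0.123456, "tags": ["a", "b"]}
print(pretty_sketch(report, float_format=".2f"))
# score: 0.12
# tags:
#     - a
#     - b
```

Passing the current indent down the recursion, instead of re-indenting child output afterwards, mirrors the indent parameter in the signature above.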