unitxt.sql_utils module

class unitxt.sql_utils.Cache

Bases: object

A class that provides disk-based caching functionality for a given function.

async async_get_or_set(key, compute_fn, no_cache=False, refresh=False)
async_memoize(key_func=generate_cache_key, no_cache=False, refresh=False)
get_or_set(key, compute_fn, no_cache=False, refresh=False)
memoize(key_func=generate_cache_key, no_cache=False, refresh=False)
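The method names above suggest a get-or-compute pattern: return the stored value for key if one exists, otherwise call compute_fn and store its result. Presumably no_cache bypasses the cache and refresh forces recomputation. The sketch below illustrates that pattern under stated assumptions. DiskCache, simple_key and the one-pickle-file-per-key layout are hypothetical and are not unitxt's implementation. Only the synchronous memoize decorator is shown.

```python
import asyncio
import functools
import hashlib
import os
import pickle
import tempfile
from typing import Any


def simple_key(*args: Any, **kwargs: Any) -> str:
    # Hypothetical stand-in for generate_cache_key: repr of the arguments.
    return repr((args, sorted(kwargs.items())))


class DiskCache:
    """Hypothetical disk cache that stores one pickle file per key."""

    def __init__(self, cache_dir: str):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, key: str) -> str:
        # Hash the key so arbitrary strings become safe file names.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest + ".pkl")

    def _read(self, path: str) -> Any:
        with open(path, "rb") as f:
            return pickle.load(f)

    def _write(self, path: str, value: Any) -> None:
        with open(path, "wb") as f:
            pickle.dump(value, f)

    def get_or_set(self, key, compute_fn, no_cache=False, refresh=False):
        if no_cache:
            return compute_fn()  # bypass the cache: neither read nor write
        path = self._path(key)
        if not refresh and os.path.exists(path):
            return self._read(path)  # cache hit
        value = compute_fn()  # cache miss, or refresh requested
        self._write(path, value)
        return value

    async def async_get_or_set(self, key, compute_fn, no_cache=False, refresh=False):
        # Same logic, but compute_fn is an async function whose result is awaited.
        if no_cache:
            return await compute_fn()
        path = self._path(key)
        if not refresh and os.path.exists(path):
            return self._read(path)
        value = await compute_fn()
        self._write(path, value)
        return value

    def memoize(self, key_func=simple_key, no_cache=False, refresh=False):
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                # Prefix with the function name so different functions called
                # with the same arguments get different keys.
                key = fn.__qualname__ + ":" + key_func(*args, **kwargs)
                return self.get_or_set(key, lambda: fn(*args, **kwargs), no_cache, refresh)
            return wrapper
        return decorator


# Usage: a temporary directory stands in for a real cache location.
cache = DiskCache(tempfile.mkdtemp())
call_log = []


@cache.memoize()
def slow_square(x):
    call_log.append(x)  # records only real (uncached) calls
    return x * x


async def fetch_answer():
    return 42


first = slow_square(4)
second = slow_square(4)  # served from disk; the function body does not run again
answer = asyncio.run(cache.async_get_or_set("answer", fetch_answer))
```

Writing to disk rather than keeping a dict in memory is what lets cached results survive across processes, which matters when each computation (for example, a remote query) is slow.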

class unitxt.sql_utils.DatabaseConnector(db_config: SQLDatabase)

Bases: ABC

Abstract base class for database connectors.

abstract execute_query(query: str) → Any

Abstract method to execute a query against the database.

abstract get_table_schema() → str

Abstract method to get database schema.
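Because both methods are abstract, a subclass must implement execute_query and get_table_schema before it can be instantiated. The following sketch reproduces that contract with a hypothetical Connector base class and a toy EchoConnector. The plain dict used as db_config stands in for SQLDatabase, whose structure is not documented on this page.

```python
from abc import ABC, abstractmethod
from typing import Any


class Connector(ABC):
    """Hypothetical stand-in mirroring the two abstract methods documented above."""

    def __init__(self, db_config: Any):
        self.db_config = db_config

    @abstractmethod
    def execute_query(self, query: str) -> Any:
        """Run a query and return its result."""

    @abstractmethod
    def get_table_schema(self) -> str:
        """Return the database schema as text."""


class EchoConnector(Connector):
    """Toy subclass: echoes queries back and reads the schema from db_config."""

    def execute_query(self, query: str) -> Any:
        return [("echo", query)]

    def get_table_schema(self) -> str:
        return self.db_config["schema"]


try:
    Connector({})  # fails: the abstract methods are not implemented
    base_instantiable = True
except TypeError:
    base_instantiable = False

echo = EchoConnector({"schema": "CREATE TABLE t (id INTEGER);"})
```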

class unitxt.sql_utils.InMemoryDatabaseConnector(db_config: SQLDatabase)

Bases: DatabaseConnector

Database connector for mocking databases with in-memory data structures.

execute_query(query: str) → Any

Simulates executing a query against the mock database.

get_table_schema(select_tables: List[str] | None = None) → str

Generates a mock schema from the tables structure.
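This page does not document the tables structure that InMemoryDatabaseConnector reads. As a hedged illustration, the sketch below assumes a hypothetical {"table": {"columns": [...], "rows": [...]}} layout. It loads that layout into an in-memory SQLite database with Python's sqlite3 module so that real queries can run. It then builds a schema string, optionally restricted to select_tables. InMemoryConnectorSketch is an invented name.

```python
import sqlite3
from typing import Any, Dict, List, Optional

# Hypothetical table layout; the real db_config format may differ.
TABLES = {
    "users": {"columns": ["id", "name"], "rows": [[1, "Ada"], [2, "Grace"]]},
    "orders": {"columns": ["id", "user_id", "total"], "rows": [[10, 1, 9.5]]},
}


class InMemoryConnectorSketch:
    def __init__(self, tables: Dict[str, Dict[str, list]]):
        self.tables = tables
        self.conn = sqlite3.connect(":memory:")
        for name, spec in tables.items():
            # Table and column names come from trusted config, so they are
            # interpolated directly; row values go through placeholders.
            cols = ", ".join(spec["columns"])
            self.conn.execute(f"CREATE TABLE {name} ({cols})")
            placeholders = ", ".join("?" for _ in spec["columns"])
            self.conn.executemany(
                f"INSERT INTO {name} VALUES ({placeholders})", spec["rows"]
            )

    def execute_query(self, query: str) -> Any:
        return self.conn.execute(query).fetchall()

    def get_table_schema(self, select_tables: Optional[List[str]] = None) -> str:
        # Default to every table when no subset is requested.
        names = select_tables if select_tables is not None else list(self.tables)
        return "\n".join(
            f"CREATE TABLE {n} ({', '.join(self.tables[n]['columns'])});" for n in names
        )


mock = InMemoryConnectorSketch(TABLES)
```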

class unitxt.sql_utils.LocalSQLiteConnector(db_config: SQLDatabase)

Bases: DatabaseConnector

Database connector for SQLite databases.

download_database(db_id)

Downloads the database from Hugging Face if needed.

execute_query(query: str) → Any

Executes a query against the SQLite database.

get_db_file_path(db_id)

Gets the local path of a downloaded database file.

get_table_schema() → str

Extracts the schema from an SQLite database.
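One common way to extract a schema from SQLite is to read the CREATE statements that SQLite stores in its sqlite_master table. The helper below shows that approach. get_sqlite_schema is a hypothetical name, and this page does not say whether LocalSQLiteConnector.get_table_schema formats its output the same way.

```python
import sqlite3


def get_sqlite_schema(conn: sqlite3.Connection) -> str:
    """Return the stored CREATE TABLE statements, one per line, ordered by table name."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return "\n".join(row[0] + ";" for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE concert (id INTEGER, singer_id INTEGER)")
schema = get_sqlite_schema(conn)
```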

class unitxt.sql_utils.RemoteDatabaseConnector(db_config: SQLDatabase)

Bases: DatabaseConnector

Database connector for remote databases accessed via HTTP.

execute_query(query: str) → Any

Executes a query against the remote database, with retries for certain exceptions.

get_table_schema() → str

Retrieves the schema of a database.

unitxt.sql_utils.collect_clause(statement, clause_keyword)

Parse an SQL statement and collect clauses.

unitxt.sql_utils.execute_query_local(db_path: str, query: str) → Any

Executes a query against the SQLite database.

unitxt.sql_utils.execute_query_remote(api_url: str, database_id: str, api_key: str, query: str, retryable_exceptions: tuple = (requests.exceptions.ConnectionError, requests.exceptions.ReadTimeout), max_retries: int = 3, retry_delay: int = 5, timeout: int = 30) → (dict | None, str)

Executes a query against the remote database, with retries for certain exceptions.
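The signature above exposes the retry settings (retryable_exceptions, max_retries, retry_delay). The sketch below shows a fixed-delay retry loop that uses those settings. To keep it runnable offline, it takes a hypothetical send callable instead of making an HTTP request. It also substitutes Python's built-in ConnectionError and TimeoutError for the requests exceptions and omits timeout. Treating max_retries as the total number of attempts is an assumption, as is call_with_retries itself.

```python
import time
from typing import Callable, Optional, Tuple, Type


def call_with_retries(
    send: Callable[[], dict],
    retryable_exceptions: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_retries: int = 3,
    retry_delay: float = 5,
) -> Tuple[Optional[dict], str]:
    """Return (result, "") on success or (None, error_message) on failure."""
    last_error = ""
    for attempt in range(1, max_retries + 1):
        try:
            return send(), ""
        except retryable_exceptions as exc:
            last_error = f"attempt {attempt} failed: {exc}"
            if attempt < max_retries:
                time.sleep(retry_delay)  # fixed delay between attempts
        except Exception as exc:
            # Non-retryable errors are reported immediately, without retrying.
            return None, str(exc)
    return None, last_error


attempts = {"n": 0}


def flaky_send() -> dict:
    # Fails twice with a transient error, then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("server unavailable")
    return {"rows": [[1]]}


def always_times_out() -> dict:
    raise TimeoutError("read timed out")


def bad_request() -> dict:
    raise ValueError("syntax error")


result, error = call_with_retries(flaky_send, retry_delay=0)
gave_up = call_with_retries(always_times_out, retry_delay=0)
rejected = call_with_retries(bad_request, retry_delay=0)
```

Only network-level failures are worth retrying. An error such as a malformed query will fail the same way every time, so the sketch returns it immediately.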

unitxt.sql_utils.extract_select_columns(statement)

Parse SQL using sqlparse and extract the SELECT columns.

unitxt.sql_utils.extract_select_info(sql: str)

Parse SQL using sqlparse and return a dict of extracted columns and clauses.
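The sqlparse-based helpers above are documented only by one-line summaries. To give a rough idea of what extracting "columns and clauses" from a query means, here is a deliberately naive regular-expression stand-in. It does not use sqlparse and is not unitxt's code. The {"columns": ..., "clauses": ...} dict shape and the name naive_select_info are assumptions. Nested subqueries, string literals, and commas inside function calls are not handled.

```python
import re
from typing import Dict

CLAUSE_KEYWORDS = ["FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY", "LIMIT"]


def naive_select_info(sql: str) -> Dict[str, object]:
    """Split a flat SELECT statement into its column list and top-level clauses."""
    # One capturing group, so re.split keeps the matched keywords in its output.
    pattern = r"\b(" + "|".join(k.replace(" ", r"\s+") for k in CLAUSE_KEYWORDS) + r")\b"
    parts = re.split(pattern, sql.strip().rstrip(";"), flags=re.IGNORECASE)
    head = parts[0]  # e.g. "SELECT name, age "
    columns = [c.strip() for c in re.sub(r"(?i)^\s*select\s+", "", head).split(",")]
    clauses: Dict[str, str] = {}
    # After the head, parts alternate: keyword, clause body, keyword, clause body...
    for keyword, body in zip(parts[1::2], parts[2::2]):
        clauses[" ".join(keyword.upper().split())] = body.strip()
    return {"columns": columns, "clauses": clauses}


info = naive_select_info("SELECT name, age FROM people WHERE age > 30 ORDER BY age DESC;")
```

A real parser such as sqlparse builds a token tree, which is what allows nested queries and parenthesized expressions to be handled correctly. This is the main reason to prefer it over regular expressions.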

unitxt.sql_utils.generate_cache_key(*args, **kwargs)

Generate a stable, hashable cache key for various input types.

Parameters:
  • args – Positional arguments of the function.

  • kwargs – Keyword arguments of the function.

Returns:

A hashed key as a string.
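A stable key must not depend on dict insertion order or on the per-process salt that Python applies to str hashes. A common approach is therefore to serialize the arguments deterministically and then hash the result. The sketch below does this with json and SHA-256. make_cache_key and _to_stable are hypothetical names, and the repr() fallback for other objects is only stable when those objects have a deterministic repr.

```python
import hashlib
import json
from typing import Any


def _to_stable(obj: Any) -> Any:
    """Convert obj into JSON-serializable data with a deterministic ordering."""
    if isinstance(obj, dict):
        return {str(k): _to_stable(v) for k, v in sorted(obj.items(), key=lambda kv: str(kv[0]))}
    if isinstance(obj, (list, tuple)):
        return [_to_stable(v) for v in obj]
    if isinstance(obj, (set, frozenset)):
        # Sets have no inherent order, so sort the converted items.
        return sorted((_to_stable(v) for v in obj), key=repr)
    if isinstance(obj, (str, int, float, bool)) or obj is None:
        return obj
    return repr(obj)  # fallback; stable only if the object's repr is


def make_cache_key(*args: Any, **kwargs: Any) -> str:
    payload = json.dumps({"args": _to_stable(args), "kwargs": _to_stable(kwargs)}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


k1 = make_cache_key("SELECT 1", db={"id": "a", "type": "local"})
k2 = make_cache_key("SELECT 1", db={"type": "local", "id": "a"})
```

In this sketch, tuples and lists with the same items produce the same key. This is usually acceptable for caching query results, but it is a deliberate simplification.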

unitxt.sql_utils.get_cache()

Returns a singleton cache instance, initializing it if necessary.

unitxt.sql_utils.get_db_connector(db_type: str)

Creates and returns the appropriate DatabaseConnector instance based on db_type.

unitxt.sql_utils.is_sqlglot_parsable(sql: str, db_type='sqlite') → bool

Returns True if sqlglot parses sql without raising an error, False otherwise.

unitxt.sql_utils.is_sqlparse_parsable(sql: str) → bool

Returns True if sqlparse parses sql without raising an error, False otherwise.

unitxt.sql_utils.sql_exact_match(sql1: str, sql2: str) → bool

Return True if two SQL strings match after very basic normalization.
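This page does not spell out the exact normalization that sql_exact_match applies. The sketch below assumes a typical "very basic" set of steps: trimming whitespace and trailing semicolons, collapsing internal whitespace, and lowercasing. normalize_sql and naive_exact_match are hypothetical names. Lowercasing also folds string literals, so 'Ada' and 'ada' would compare equal under this sketch.

```python
import re


def normalize_sql(sql: str) -> str:
    """Trim, drop trailing semicolons, collapse whitespace, and lowercase."""
    sql = sql.strip().rstrip(";").strip()
    sql = re.sub(r"\s+", " ", sql)
    return sql.lower()


def naive_exact_match(sql1: str, sql2: str) -> bool:
    return normalize_sql(sql1) == normalize_sql(sql2)


same = naive_exact_match("SELECT  name\nFROM users;", "select name from users")
different = naive_exact_match("SELECT name FROM users", "SELECT id FROM users")
```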

unitxt.sql_utils.sqlglot_optimized_equivalence(expected: str, generated: str) → int

Checks whether the expected and generated SQL queries are equivalent by comparing their SQLGlot-parsed forms, without executing either query.

unitxt.sql_utils.sqlglot_parsed_queries_equivalent(sql1: str, sql2: str, dialect: str = '') → bool

unitxt.sql_utils.sqlparse_queries_equivalent(sql1: str, sql2: str) → bool

Return True if both SQL queries are naively considered equivalent.

unitxt.sql_utils.strip_alias(col: str) → str

Remove any AS alias from a column.
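A minimal regex-based sketch of removing a trailing AS alias. strip_alias_sketch is a hypothetical name, and the set of accepted alias characters (word characters, double quotes, backticks, square brackets) is an assumption. The match is anchored at the end of the string, so an AS inside an expression such as CAST(x AS INT) is left alone.

```python
import re

# " AS alias" at the very end of the column expression, case-insensitive.
_ALIAS_RE = re.compile(r'\s+AS\s+[\w"`\[\]]+\s*$', re.IGNORECASE)


def strip_alias_sketch(col: str) -> str:
    return _ALIAS_RE.sub("", col).strip()


examples = {
    "COUNT(*) AS total": strip_alias_sketch("COUNT(*) AS total"),
    "u.name as user_name": strip_alias_sketch("u.name as user_name"),
    "CAST(x AS INT) AS y": strip_alias_sketch("CAST(x AS INT) AS y"),
}
```

Implicit aliases written without AS (for example, COUNT(*) total) are not handled by this sketch, which matches the narrower scope stated in the description above.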