General Settings
Introduction
Unitxt lets you control library-level behavior, such as logging, caching, and data loading, so that you can adapt the library to your needs.
Settings in Unitxt can be modified using Python:
import unitxt
unitxt.settings.default_verbosity = "info"
Or by setting environment variables:
export UNITXT_DEFAULT_VERBOSITY="debug"
Each environment variable is named by converting the setting name to uppercase and prefixing it with UNITXT_. For example, default_verbosity becomes UNITXT_DEFAULT_VERBOSITY.
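The naming rule can be sketched in a few lines of plain Python. The helper `env_var_name` is hypothetical and is not part of Unitxt; it only illustrates how the setting names on this page map to their environment variables.

```python
def env_var_name(setting_name: str) -> str:
    """Hypothetical helper: map a setting name to its environment variable name."""
    # Uppercase the setting name and add the UNITXT_ prefix.
    return "UNITXT_" + setting_name.upper()


# Setting names taken from this page.
for name in ["default_verbosity", "global_loader_limit", "seed"]:
    print(f"unitxt.settings.{name} -> {env_var_name(name)}")
```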
Important Settings
Logging Level
unitxt.settings.default_verbosity: Defines the verbosity level of the Python logging logger. - Type: str | Default: "info" | Env Var: UNITXT_DEFAULT_VERBOSITY
Data Loaders
unitxt.settings.global_loader_limit: Limits the number of instances loaded by any data loader. - Type: int | Default: None | Env Var: UNITXT_GLOBAL_LOADER_LIMIT
unitxt.settings.loaders_max_retries: Maximum number of retries for loading data. - Type: int | Default: 10 | Env Var: UNITXT_LOADERS_MAX_RETRIES
unitxt.settings.loader_cache_size: Defines the in-memory cache size for loaders. - Type: int | Default: 25 | Env Var: UNITXT_LOADER_CACHE_SIZE
unitxt.settings.dataset_cache_default: Determines the default behavior for dataset caching. If the use_cache parameter of the load_dataset function is set explicitly, that value takes precedence. Otherwise, this setting decides whether caching is enabled. - Type: bool | Default: False | Env Var: UNITXT_DATASET_CACHE_DEFAULT
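The precedence between use_cache and dataset_cache_default can be sketched as follows. This is an illustration of the rule described above, not Unitxt's actual code: the function `resolve_use_cache` is hypothetical, and representing "not explicitly set" as `None` is an assumption.

```python
from typing import Optional


def resolve_use_cache(use_cache: Optional[bool], dataset_cache_default: bool) -> bool:
    """Hypothetical sketch of the precedence rule for dataset caching."""
    # Assumption: None means the caller did not pass use_cache explicitly.
    if use_cache is not None:
        return use_cache  # an explicit argument always wins
    return dataset_cache_default  # otherwise fall back to the library setting


# Explicit False overrides a setting of True.
print(resolve_use_cache(False, dataset_cache_default=True))
# No explicit value: the setting decides.
print(resolve_use_cache(None, dataset_cache_default=True))
```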
External Code
unitxt.settings.allow_unverified_code: Allows execution of unverified external code. - Type: bool | Default: False | Env Var: UNITXT_ALLOW_UNVERIFIED_CODE
Hugging Face Integration
unitxt.settings.disable_hf_datasets_cache: Disables caching for Hugging Face datasets. - Type: bool | Default: False | Env Var: UNITXT_DISABLE_HF_DATASETS_CACHE
unitxt.settings.stream_hf_datasets_by_default: Enables streaming mode for Hugging Face datasets by default. - Type: bool | Default: False | Env Var: UNITXT_STREAM_HF_DATASETS_BY_DEFAULT
unitxt.settings.hf_offline_models_path: Specifies the path to the directory containing offline pre-downloaded Hugging Face models. You will need to manually download the assets to the directory. - Type: str | Default: None | Env Var: UNITXT_HF_OFFLINE_MODELS_PATH
unitxt.settings.hf_offline_metrics_path: Specifies the path to the directory containing offline pre-downloaded Hugging Face metrics. You will need to manually download the assets to the directory. - Type: str | Default: None | Env Var: UNITXT_HF_OFFLINE_METRICS_PATH
unitxt.settings.hf_offline_datasets_path: Specifies the path to the directory containing offline pre-downloaded Hugging Face datasets. You will need to manually download the assets to the directory. - Type: str | Default: None | Env Var: UNITXT_HF_OFFLINE_DATASETS_PATH
Randomness
unitxt.settings.seed: Seed value for ensuring reproducibility. - Type: int | Default: 42 | Env Var: UNITXT_SEED
LLM Inference
unitxt.settings.mock_inference_mode: Enables or disables mock inference mode. - Type: bool | Default: False | Env Var: UNITXT_MOCK_INFERENCE_MODE
unitxt.settings.default_provider: Specifies the default inference provider for CrossProviderInferenceEngine instances. - Type: str | Default: "watsonx" | Env Var: UNITXT_DEFAULT_PROVIDER
Catalogs
unitxt.settings.catalogs: Defines a list of directories containing local catalogs. - Type: list | Default: None | Env Var: UNITXT_CATALOGS
unitxt.settings.use_only_local_catalogs: Restricts the system to using only local catalogs. - Type: bool | Default: False | Env Var: UNITXT_USE_ONLY_LOCAL_CATALOGS
Evaluation
unitxt.settings.num_resamples_for_instance_metrics: Number of bootstrap confidence interval resamples for instance-level metrics. - Type: int | Default: 1000 | Env Var: UNITXT_NUM_RESAMPLES_FOR_INSTANCE_METRICS
unitxt.settings.num_resamples_for_global_metrics: Number of bootstrap confidence interval resamples for global metrics. - Type: int | Default: 100 | Env Var: UNITXT_NUM_RESAMPLES_FOR_GLOBAL_METRICS
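To show what the number of resamples controls, here is a minimal percentile-bootstrap sketch using only the standard library. It is a generic illustration and may differ from the procedure Unitxt actually uses. The function `bootstrap_ci`, the 95% confidence level, and the sample scores are all assumptions made for this example.

```python
import random
import statistics


def bootstrap_ci(scores, num_resamples, confidence=0.95, seed=42):
    """Generic percentile bootstrap for the mean (not Unitxt's implementation)."""
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    n = len(scores)
    # Resample with replacement num_resamples times and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(num_resamples)
    )
    alpha = (1.0 - confidence) / 2.0
    low_idx = int(alpha * num_resamples)
    high_idx = min(num_resamples - 1, int((1.0 - alpha) * num_resamples))
    return means[low_idx], means[high_idx]


# Made-up per-instance scores, for illustration only.
scores = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0]
low, high = bootstrap_ci(scores, num_resamples=1000)
print(f"mean={statistics.fmean(scores):.2f}, interval=({low:.2f}, {high:.2f})")
```

More resamples generally give a more stable interval at a higher compute cost, which is one plausible reason for a smaller default on global metrics.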
Debugging
unitxt.settings.use_eager_execution: Enables eager execution mode for debugging. This mode disables streaming and ensures that each operation processes the entire stream before proceeding to the next step. This can be helpful for debugging by making the execution order more predictable. - Type: bool | Default: False | Env Var: UNITXT_USE_EAGER_EXECUTION
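The difference in execution order between streaming and eager processing can be illustrated with plain Python generators. This sketch does not use Unitxt operators; `double`, `increment`, and `run` are hypothetical names chosen for the example.

```python
def double(stream, log):
    for x in stream:
        log.append(f"double({x})")
        yield x * 2


def increment(stream, log):
    for x in stream:
        log.append(f"increment({x})")
        yield x + 1


def run(data, eager):
    """Chain two steps; in eager mode, finish step one before starting step two."""
    log = []
    stage1 = double(data, log)
    if eager:
        stage1 = list(stage1)  # materialize the whole stream before the next step
    result = list(increment(stage1, log))
    return result, log


lazy_result, lazy_log = run([1, 2], eager=False)
eager_result, eager_log = run([1, 2], eager=True)
print(lazy_log)   # steps interleave per instance
print(eager_log)  # step one finishes before step two starts
```

Both modes produce the same result; only the order in which the steps run differs, which is what makes eager mode easier to follow in a debugger.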
Privacy
unitxt.settings.data_classification_policy: Defines the data classification policy. - Type: str | Default: None | Env Var: UNITXT_DATA_CLASSIFICATION_POLICY