General Settings¶
Introduction¶
Unitxt allows you to control various features at the library level, such as logging, caching, and more. These settings enable you to customize the behavior of the library to better suit your needs.
Settings in Unitxt can be modified using Python:
import unitxt
unitxt.settings.default_verbosity = "info"
Or by setting environment variables:
export UNITXT_DEFAULT_VERBOSITY="debug"
The naming convention for environment variables follows the pattern of prefixing the setting name with UNITXT_
in uppercase.
Important Settings¶
Logging Level¶
unitxt.settings.default_verbosity
: Defines the verbosity level of the Python logging logger. - Type: str | Default: “info” | Env Var:UNITXT_DEFAULT_VERBOSITY
Data Loaders¶
unitxt.settings.global_loader_limit
: Limits instances loaded by any data loader. - Type: int | Default: None | Env Var:UNITXT_GLOBAL_LOADER_LIMIT
unitxt.settings.loaders_max_retries
: Maximum number of retries for loading data. - Type: int | Default: 10 | Env Var:UNITXT_LOADERS_MAX_RETRIES
unitxt.settings.loader_cache_size
: Defines the in-memory cache size for loaders. - Type: int | Default: 25 | Env Var:UNITXT_LOADER_CACHE_SIZE
External Code¶
unitxt.settings.allow_unverified_code
: Allows execution of unverified external code. - Type: bool | Default: False | Env Var:UNITXT_ALLOW_UNVERIFIED_CODE
Hugging Face Integration¶
unitxt.settings.disable_hf_datasets_cache
: Disables caching for Hugging Face datasets. - Type: bool | Default: False | Env Var:UNITXT_DISABLE_HF_DATASETS_CACHE
unitxt.settings.stream_hf_datasets_by_default
: Enables streaming mode for Hugging Face datasets by default. - Type: bool | Default: False | Env Var:UNITXT_STREAM_HF_DATASETS_BY_DEFAULT
unitxt.settings.hf_offline_models_path
: Specifies the path to the directory containing offline pre-downloaded Hugging Face models. You will need to manually download the assets to the directory. - Type: str | Default: None | Env Var:UNITXT_HF_OFFLINE_MODELS_PATH
unitxt.settings.hf_offline_metrics_path
: Specifies the path to the directory containing offline pre-downloaded Hugging Face metrics. You will need to manually download the assets to the directory. - Type: str | Default: None | Env Var:UNITXT_HF_OFFLINE_METRICS_PATH
unitxt.settings.hf_offline_datasets_path
: Specifies the path to the directory containing offline pre-downloaded Hugging Face datasets. You will need to manually download the assets to the directory. - Type: str | Default: None | Env Var:UNITXT_HF_OFFLINE_DATASETS_PATH
Randomness¶
unitxt.settings.seed
: Seed value for ensuring reproducibility. - Type: int | Default: 42 | Env Var:UNITXT_SEED
LLM Inference¶
unitxt.settings.mock_inference_mode
: Enables or disables mock inference mode. - Type: bool | Default: False | Env Var:UNITXT_MOCK_INFERENCE_MODE
unitxt.settings.default_provider
: Specifies the default inference provider forCrossProviderInferenceEngine``s. - Type: str | Default: "watsonx" | Env Var: ``UNITXT_DEFAULT_PROVIDER
Catalogs¶
unitxt.settings.catalogs
: Defines list of directories with local catalogs. - Type: list | Default: None | Env Var:UNITXT_CATALOGS
unitxt.settings.use_only_local_catalogs
: Restricts the system to using only local catalogs. - Type: bool | Default: False | Env Var:UNITXT_USE_ONLY_LOCAL_CATALOGS
Evaluation¶
unitxt.settings.num_resamples_for_instance_metrics
: Number of bootstrap confidence interval resamples for instance-level metrics. - Type: int | Default: 1000 | Env Var:UNITXT_NUM_RESAMPLES_FOR_INSTANCE_METRICS
unitxt.settings.num_resamples_for_global_metrics
: Number of bootstrap confidence interval resamples for global metrics. - Type: int | Default: 100 | Env Var:UNITXT_NUM_RESAMPLES_FOR_GLOBAL_METRICS
Debugging¶
unitxt.settings.use_eager_execution
: Enables eager execution mode for debugging. This mode disables streaming and ensures that each operation processes the entire stream before proceeding to the next step. This can be helpful for debugging by making the execution order more predictable. - Type: bool | Default: False | Env Var:UNITXT_USE_EAGER_EXECUTION
Privacy¶
unitxt.settings.data_classification_policy
: Defines the data classification policy. - Type: str | Default: None | Env Var:UNITXT_DATA_CLASSIFICATION_POLICY