unitxt.evaluate_cli module

unitxt.evaluate_cli.cli_load_dataset(args: Namespace) → Dataset

Loads the dataset based on command line arguments.

Parameters:

args (argparse.Namespace) – Parsed command-line arguments.

Returns:

The loaded dataset.

Return type:

HFDataset

Raises:
  • UnitxtArtifactNotFoundError – If the specified card or template artifact is not found.

  • FileNotFoundError – If a specified file (e.g., in a local card path) is not found.

  • AttributeError – If there’s an issue accessing attributes during loading.

  • ValueError – If there’s a value-related error during loading (e.g., parsing).

unitxt.evaluate_cli.configure_unitxt_settings(args: Namespace)

Configures unitxt settings and returns a context manager.

Parameters:

args (argparse.Namespace) – Parsed command-line arguments.

Returns:

A context manager for applying unitxt settings.

Return type:

ContextManager
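
The following is a minimal sketch of the general pattern of a settings context manager: overrides are applied on entry and the previous values are restored on exit. The `settings` object, the `example_flag` setting, and `settings_context` are hypothetical names used only for illustration; they are not unitxt's actual settings object or API.

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Hypothetical stand-in for a global settings object (not unitxt's real one).
settings = SimpleNamespace(example_flag=False)


@contextmanager
def settings_context(**overrides):
    """Temporarily apply overrides, restoring the previous values on exit."""
    # Remember the current value of every setting we are about to change.
    previous = {name: getattr(settings, name) for name in overrides}
    for name, value in overrides.items():
        setattr(settings, name, value)
    try:
        yield settings
    finally:
        # Restore the old values even if the body raised an exception.
        for name, value in previous.items():
            setattr(settings, name, value)
```

Used in a `with` block, the override is visible only inside the block, which is why a function like this can return the context manager and let the caller decide how long the settings stay active.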

unitxt.evaluate_cli.extract_overwrite_args(args)

unitxt.evaluate_cli.extract_scores(directory)

unitxt.evaluate_cli.initialize_inference_engine(args: Namespace, model_args_dict: Dict[str, Any], chat_kwargs_dict: Dict[str, Any]) → InferenceEngine

Initializes the appropriate inference engine based on arguments.

Parameters:
  • args (argparse.Namespace) – Parsed command-line arguments.

  • model_args_dict (Dict[str, Any]) – Processed model arguments.

  • chat_kwargs_dict (Dict[str, Any]) – Processed chat arguments.

Returns:

The initialized inference engine instance.

Return type:

InferenceEngine

Raises:
  • SystemExit – If required dependencies are missing for the selected model type.

  • ValueError – If required keys are missing in model_args for the selected model type.
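
The ValueError case above can be pictured as a simple required-key check on the model arguments dictionary. This is only a sketch: `require_keys` is a hypothetical helper, and the key and model type names in the usage note are placeholders, not the keys unitxt actually requires.

```python
from typing import Any, Dict, Iterable


def require_keys(model_args: Dict[str, Any], required: Iterable[str], model_type: str) -> None:
    """Raise ValueError if any required key is absent from model_args."""
    missing = [key for key in required if key not in model_args]
    if missing:
        raise ValueError(
            f"model_args for model type '{model_type}' is missing "
            f"required key(s): {', '.join(missing)}"
        )
```

For example, `require_keys({"some_key": "value"}, ["some_key"], "example_type")` returns without error, while omitting `some_key` from the dictionary raises ValueError naming the missing key.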

unitxt.evaluate_cli.main()

Main function to parse arguments and run evaluation.

unitxt.evaluate_cli.prepare_kwargs(kwargs: dict) → Dict[str, Any]

Prepares the model arguments dictionary.

Parameters:

kwargs (dict) – Parsed command-line arguments.

Returns:

The processed model arguments dictionary.

Return type:

Dict[str, Any]

unitxt.evaluate_cli.prepare_output_paths(output_path: str, prefix: str) → Tuple[str, str]

Creates output directory and defines file paths.

Parameters:
  • output_path (str) – The directory where output files will be saved.

  • prefix (str) – The prefix for the output file names.

Returns:

A tuple containing the path for the results summary file and the path for the detailed samples file.

Return type:

Tuple[str, str]
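
As a rough illustration of the behavior described above, the sketch below creates the output directory and derives two file paths from the prefix. The function name `prepare_output_paths_sketch` and the exact file names (`<prefix>.json` and `<prefix>_samples.json`) are assumptions made for the example; the real function may name its files differently.

```python
import os
from typing import Tuple


def prepare_output_paths_sketch(output_path: str, prefix: str) -> Tuple[str, str]:
    """Create output_path if needed and return (results_path, samples_path)."""
    # exist_ok=True makes the call safe when the directory already exists.
    os.makedirs(output_path, exist_ok=True)
    # Assumed naming scheme, for illustration only.
    results_path = os.path.join(output_path, f"{prefix}.json")
    samples_path = os.path.join(output_path, f"{prefix}_samples.json")
    return results_path, samples_path
```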

unitxt.evaluate_cli.prepend_timestamp_to_path(original_path, timestamp)

Takes a path string and a timestamp string, prepends the timestamp to the filename part of the path, and returns the new path string.
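
A minimal sketch of this transformation, assuming the timestamp and the filename are joined with an underscore (the separator is not stated in this reference, and `prepend_timestamp_sketch` is a hypothetical name):

```python
import os


def prepend_timestamp_sketch(original_path: str, timestamp: str) -> str:
    """Return original_path with timestamp prefixed to its filename part."""
    # Split into the directory part and the final filename component.
    directory, filename = os.path.split(original_path)
    # Assumed separator: an underscore between timestamp and filename.
    return os.path.join(directory, f"{timestamp}_{filename}")
```

Only the last path component changes, so the directory part of the path is preserved and a bare filename stays a bare filename.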

unitxt.evaluate_cli.process_and_save_results(args: Namespace, evaluation_results: EvaluationResults, results_path: str, samples_path: str) → None

Processes, prints, and saves the evaluation results.

Parameters:
  • args (argparse.Namespace) – Parsed command-line arguments.

  • evaluation_results (EvaluationResults) – The list of evaluated instances.

  • results_path (str) – Path to save the summary results JSON file.

  • samples_path (str) – Path to save the detailed samples JSON file.

Raises:

Exception – If an error occurs during result processing or saving (re-raised).

unitxt.evaluate_cli.run_evaluation(predictions: List[Any], dataset: Dataset) → EvaluationResults

Runs evaluation on the predictions.

Parameters:
  • predictions (List[Any]) – The list of predictions from the model.

  • dataset (HFDataset) – The dataset containing references and other data.

Returns:

The evaluated dataset (list of instances with scores).

Return type:

EvaluationResults

Raises:
  • RuntimeError – If evaluation returns no results or an unexpected type.

  • Exception – If any other error occurs during evaluation.

unitxt.evaluate_cli.run_inference(engine: InferenceEngine, dataset: Dataset) → List[Any]

Runs inference using the initialized engine.

Parameters:
  • engine (InferenceEngine) – The inference engine instance.

  • dataset (HFDataset) – The dataset to run inference on.

Returns:

A list of predictions.

Return type:

List[Any]

Raises:

Exception – If an error occurs during inference.

unitxt.evaluate_cli.setup_logging(verbosity: str) → None

Configures logging based on verbosity level.
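
One common way to configure logging from a verbosity string is to map the string to a level constant in the standard `logging` module. The sketch below assumes verbosity is a level name such as "DEBUG" or "INFO"; the accepted values and log format of the real function are not documented here, and `setup_logging_sketch` is a hypothetical name.

```python
import logging


def setup_logging_sketch(verbosity: str) -> int:
    """Configure the root logger from a level name and return the numeric level."""
    # Look up e.g. "debug" -> logging.DEBUG; reject names that are not levels.
    level = getattr(logging, verbosity.upper(), None)
    if not isinstance(level, int):
        raise ValueError(f"Unknown verbosity level: {verbosity!r}")
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )
    return level
```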

unitxt.evaluate_cli.setup_parser() → ArgumentParser

Sets up the argument parser.

unitxt.evaluate_cli.summarize_cli()

unitxt.evaluate_cli.try_parse_json(value: str) → str | dict | None

Attempts to parse a string as JSON or key=value pairs.

Returns the original string if parsing fails and the string doesn’t look like JSON/kv pairs. Raises ArgumentTypeError if it looks like JSON but is invalid.
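
The three outcomes described above (parsed dictionary, original string, or ArgumentTypeError) can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the library's code: "looks like JSON" is taken to mean the text starts with `{`, key=value pairs are assumed to be comma-separated, and values are kept as strings. The name `try_parse_json_sketch` is hypothetical.

```python
import argparse
import json
from typing import Optional, Union


def try_parse_json_sketch(value: Optional[str]) -> Union[str, dict, None]:
    """Parse value as a JSON object or key=value pairs, else return it unchanged."""
    if value is None:
        return None
    text = value.strip()
    # Assumption: text that starts with "{" is intended to be JSON.
    if text.startswith("{"):
        try:
            return json.loads(text)
        except json.JSONDecodeError as exc:
            raise argparse.ArgumentTypeError(f"Invalid JSON: {value!r} ({exc})") from exc
    # Assumption: key=value pairs are separated by commas.
    if "=" in text:
        pairs = {}
        for item in text.split(","):
            key, sep, val = item.partition("=")
            if not sep or not key.strip():
                return value  # not well-formed pairs: hand back the original
            pairs[key.strip()] = val.strip()
        return pairs
    return value
```

Raising `argparse.ArgumentTypeError` lets a function like this be used as the `type=` callable of an argparse argument, so malformed JSON is reported as a normal command-line usage error.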