unitxt.artifact module

class unitxt.artifact.AbstractCatalog(data_classification_policy: List[str] = None, is_local: bool = <class 'unitxt.dataclass.Undefined'>)[source]

Bases: Artifact

class unitxt.artifact.Artifact(data_classification_policy: List[str] = None)[source]

Bases: Dataclass

classmethod from_dict(d, overwrite_args=None)[source]
classmethod get_artifact_type()[source]
get_pretty_print_name()[source]
classmethod is_artifact_dict(obj)[source]
classmethod is_artifact_file(path)[source]
classmethod is_possible_identifier(obj)[source]
classmethod is_registered_class(clz: object)[source]
classmethod is_registered_class_name(class_name: str)[source]
classmethod is_registered_type(type: str)[source]
classmethod load(path, artifact_identifier=None, overwrite_args=None)[source]
prepare()[source]
prepare_args()[source]
classmethod process_data_after_load(data)[source]
process_data_before_dump(data)[source]
classmethod register_class(artifact_class)[source]
save(path)[source]
serialize()[source]
to_json()[source]
verify()[source]
classmethod verify_artifact_dict(d)[source]
final verify_data_classification_policy()[source]
verify_instance(instance: Dict[str, Any], name: str | None = None) Dict[str, Any][source]

Checks if data classifications of an artifact and instance are compatible.

Raises an error if an artifact’s data classification policy does not include that of the processed data. The purpose is to ensure that sensitive data is handled properly (for example, when it is sent to external services).

Parameters:
  • instance (Dict[str, Any]) – data which should contain its allowed data classification policies under key ‘data_classification_policy’.

  • name (Optional[str]) – name of the artifact, used to retrieve its data classification from the environment. If not specified, __id__ or __class__.__name__ is used instead, in that order of preference.

Returns:

unchanged instance.

Return type:

Dict[str, Any]

Examples:

instance = {"x": "some_text", "data_classification_policy": ["pii"]}

# Will raise an error because "pii" is not included in the metric's policy
metric = Accuracy(data_classification_policy=["public"])
metric.verify_instance(instance)

# Will not raise an error
template = SpanLabelingTemplate(data_classification_policy=["pii", "propriety"])
template.verify_instance(instance)

# Will not raise an error since the policy is specified in an environment variable
# (requires "import json" and "import os"):
os.environ["UNITXT_DATA_CLASSIFICATION_POLICY"] = json.dumps({"metrics.accuracy": ["pii"]})
metric, _ = fetch_artifact("metrics.accuracy")  # fetch_artifact returns (artifact, catalog)
metric.verify_instance(instance)
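
The compatibility check described above can be sketched without unitxt installed. This is a minimal illustration, not the library's implementation: check_policy is a hypothetical helper, and it assumes that an artifact with no policy accepts everything, that an instance with no classification is let through, and that a single shared classification is enough to be compatible. The real check may differ on each of these points.

```python
from typing import Any, Dict, List, Optional


def check_policy(
    artifact_policy: Optional[List[str]],
    instance: Dict[str, Any],
    name: str = "artifact",
) -> Dict[str, Any]:
    """Hypothetical stand-in for the compatibility check; not unitxt code."""
    # Assumption: an artifact without a policy accepts any instance.
    if not artifact_policy:
        return instance

    instance_policy = instance.get("data_classification_policy")
    # Assumption: an instance without a classification is let through.
    if not instance_policy:
        return instance

    # Assumption: one shared classification is enough to be compatible.
    if not any(label in artifact_policy for label in instance_policy):
        raise ValueError(
            f"{name} allows {artifact_policy}, "
            f"but the instance is classified as {instance_policy}"
        )
    return instance  # returned unchanged, as documented above


instance = {"x": "some_text", "data_classification_policy": ["pii"]}
checked = check_policy(["pii", "propriety"], instance)  # passes
```

Calling check_policy(["public"], instance) raises ValueError in this sketch, mirroring the first example above.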

Bases: Artifact

class unitxt.artifact.ArtifactList(data_classification_policy: List[str] = None)[source]

Bases: list, Artifact

class unitxt.artifact.Catalogs[source]

Bases: object

instance = <unitxt.artifact.Catalogs object>
register(catalog)[source]
reset()[source]
unregister(catalog)[source]
exception unitxt.artifact.MissingArtifactTypeError(dic)[source]

Bases: ValueError

exception unitxt.artifact.UnitxtArtifactNotFoundError(name, catalogs)[source]

Bases: UnitxtError

exception unitxt.artifact.UnrecognizedArtifactTypeError(type)[source]

Bases: ValueError

unitxt.artifact.dict_diff_string(dict1, dict2, max_diff=200)[source]
unitxt.artifact.fetch_artifact(artifact_rep) Tuple[Artifact, AbstractCatalog | None][source]

Loads an artifact from one of several possible representations.

  1. If artifact representation is already an Artifact object, return it.

  2. If artifact representation is a string location of a local file, load the Artifact from the local file.

  3. If artifact representation is a string name in the catalog, load the Artifact from the catalog.

  4. If artifact representation is a JSON string, create a dictionary representation from the string and build an Artifact object from it.

  5. Otherwise, check that the artifact representation is a dictionary and build an Artifact object from it.
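
The five steps above amount to a dispatch on the type and content of the representation. The sketch below mirrors that order using only the standard library. FakeArtifact, resolve, and the dictionary-based catalog are hypothetical stand-ins introduced for illustration, and unlike fetch_artifact this sketch returns only the artifact rather than an (artifact, catalog) tuple.

```python
import json
import os
from typing import Any, Dict


class FakeArtifact:
    """Hypothetical stand-in for an Artifact; not the unitxt class."""

    def __init__(self, **fields: Any) -> None:
        self.fields = fields


def resolve(rep: Any, catalog: Dict[str, FakeArtifact]) -> FakeArtifact:
    # 1. Already an artifact object: return it as-is.
    if isinstance(rep, FakeArtifact):
        return rep
    if isinstance(rep, str):
        # 2. A path to a local file: load the artifact from that file.
        if os.path.isfile(rep):
            with open(rep, encoding="utf-8") as f:
                return FakeArtifact(**json.load(f))
        # 3. A name known to the catalog: load it from the catalog.
        if rep in catalog:
            return catalog[rep]
        # 4. A JSON string: parse it into a dictionary first.
        rep = json.loads(rep)
    # 5. Otherwise the representation must be a dictionary.
    if not isinstance(rep, dict):
        raise TypeError(f"cannot build an artifact from {type(rep).__name__}")
    return FakeArtifact(**rep)


catalog = {"metrics.accuracy": FakeArtifact(kind="accuracy")}
by_name = resolve("metrics.accuracy", catalog)
by_json = resolve('{"kind": "accuracy"}', catalog)
by_dict = resolve({"kind": "accuracy"}, catalog)
```

Checking for a file before the catalog, and for a catalog name before attempting to parse JSON, follows the order of the list; a string that matches none of these fails at the JSON parsing step.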

unitxt.artifact.get_artifacts_data_classification(artifact: str) List[str] | None[source]

Loads given artifact’s data classification policy from an environment variable.

Parameters:

artifact (str) – Name of the artifact which the data classification policy should be retrieved for. For example “metrics.accuracy”.

Returns:

Optional[List[str]] - Data classification policies for the specified artifact if they were found, or None otherwise.
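
As a rough illustration of the lookup, the sketch below reads the UNITXT_DATA_CLASSIFICATION_POLICY variable used in the verify_instance examples and assumes it holds a JSON object mapping artifact names to lists of classifications. lookup_policy is a hypothetical helper written for this example; how unitxt itself reads the variable is not shown here.

```python
import json
import os
from typing import List, Optional

ENV_VAR = "UNITXT_DATA_CLASSIFICATION_POLICY"


def lookup_policy(artifact: str) -> Optional[List[str]]:
    """Hypothetical re-implementation for illustration; not unitxt code."""
    raw = os.environ.get(ENV_VAR)
    if raw is None:
        return None  # variable not set: nothing to look up
    policies = json.loads(raw)  # assumed to be a JSON object
    return policies.get(artifact)  # None when the artifact is not listed


os.environ[ENV_VAR] = json.dumps({"metrics.accuracy": ["pii"]})
found = lookup_policy("metrics.accuracy")
missing = lookup_policy("metrics.f1")
```

Here found holds the list registered for "metrics.accuracy", while missing is None because "metrics.f1" has no entry.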

unitxt.artifact.get_catalog_name_and_args(name: str, catalogs: List[AbstractCatalog] | None = None)[source]
unitxt.artifact.get_closest_artifact_type(type)[source]
unitxt.artifact.get_raw(obj)[source]
unitxt.artifact.maybe_recover_artifact(obj)[source]
unitxt.artifact.maybe_recover_artifacts_structure(obj)[source]
unitxt.artifact.register_all_artifacts(path)[source]
unitxt.artifact.reset_artifacts_json_cache()[source]
unitxt.artifact.verbosed_fetch_artifact(identifier)[source]