unitxt.artifact module
- class unitxt.artifact.Artifact(__tags__: Dict[str, str] = {}, data_classification_policy: List[str] = None)
Bases:
Dataclass
- classmethod from_dict(d, overwrite_args=None)
- classmethod get_artifact_type()
- get_pretty_print_name()
- classmethod is_artifact_dict(d)
- classmethod is_artifact_file(path)
- classmethod is_registered_class(clz: object)
- classmethod is_registered_class_name(class_name: str)
- classmethod is_registered_type(type: str)
- classmethod load(path, artifact_identifier=None, overwrite_args=None)
- prepare()
- classmethod process_data_after_load(data)
- process_data_before_dump(data)
- classmethod register_class(artifact_class)
- save(path)
- serialize()
- to_json()
- verify()
- classmethod verify_artifact_dict(d)
- final verify_data_classification_policy()
- verify_instance(instance: Dict[str, Any], name: str | None = None) -> Dict[str, Any]
Checks if data classifications of an artifact and instance are compatible.
Raises an error if an artifact’s data classification policy does not include that of processed data. The purpose is to ensure that any sensitive data is handled in a proper way (for example when sending it to some external services).
- Parameters:
instance (Dict[str, Any]) – data which should contain its allowed data classification policies under key ‘data_classification_policy’.
name (Optional[str]) – name of the artifact used to retrieve the data classification policy from the environment. If not specified, __id__ is used, or __class__.__name__ if __id__ is not available.
- Returns:
unchanged instance.
- Return type:
Dict[str, Any]
Examples
instance = {"x": "some_text", "data_classification_policy": ["pii"]}
# Will raise an error as "pii" is not an included policy
metric = Accuracy(data_classification_policy=["public"])
metric.verify_instance(instance)
# Will not raise an error
template = SpanLabelingTemplate(data_classification_policy=["pii", "propriety"])
template.verify_instance(instance)
# Will not raise an error since the policy was specified in an environment variable:
UNITXT_DATA_CLASSIFICATION_POLICY = json.dumps({"metrics.accuracy": ["pii"]})
metric, _ = fetch_artifact("metrics.accuracy")
metric.verify_instance(instance)
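The compatibility check described above can be pictured with a small standalone sketch. It does not import unitxt: `check_instance_policy` is a hypothetical helper. It assumes that "compatible" means the instance shares at least one classification label with the artifact's policy, and that nothing is enforced when either side declares no policy. The real `verify_instance` may apply different rules.

```python
from typing import Any, Dict, List, Optional


def check_instance_policy(
    artifact_policy: Optional[List[str]], instance: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical stand-in for the check performed by verify_instance."""
    instance_policy = instance.get("data_classification_policy")
    # Assumption: nothing to enforce when either side declares no policy.
    if not artifact_policy or not instance_policy:
        return instance
    # Assumption: one shared label is enough for the two to be compatible.
    if not any(label in artifact_policy for label in instance_policy):
        raise ValueError(
            f"Artifact allows {artifact_policy}, "
            f"but the instance is classified as {instance_policy}."
        )
    return instance


instance = {"x": "some_text", "data_classification_policy": ["pii"]}

# Allowed: "pii" appears in the artifact's policy, so the instance is returned unchanged.
assert check_instance_policy(["pii", "propriety"], instance) is instance

# Rejected: the artifact only accepts "public" data.
try:
    check_instance_policy(["public"], instance)
except ValueError as err:
    print(f"rejected: {err}")
```

Returning the instance unchanged mirrors the documented return value, which lets such a check sit inline in a processing chain.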
- class unitxt.artifact.ArtifactList(__tags__: Dict[str, str] = {}, data_classification_policy: List[str] = None)
Bases:
list, Artifact
- class unitxt.artifact.Artifactories
Bases:
object
- instance = <unitxt.artifact.Artifactories object>
- register(artifactory)
- reset()
- unregister(artifactory)
- class unitxt.artifact.Artifactory(__tags__: Dict[str, str] = {}, data_classification_policy: List[str] = None, is_local: bool)
Bases:
Artifact
- exception unitxt.artifact.MissingArtifactTypeError(dic)
Bases:
ValueError
- exception unitxt.artifact.UnitxtArtifactNotFoundError(name, artifactories)
Bases:
Exception
- exception unitxt.artifact.UnrecognizedArtifactTypeError(type)
Bases:
ValueError
- unitxt.artifact.fetch_artifact(artifact_rep) -> Tuple[Artifact, Artifactory | None]
Loads an artifact from one of several possible representations.
If artifact representation is already an Artifact object, return it.
If artifact representation is a string location of a local file, load the Artifact from local file.
If artifact representation is a string name in the catalog, load the Artifact from the catalog.
If artifact representation is a JSON string, create a dictionary representation from the string and build an Artifact object from it.
Otherwise, check that the artifact representation is a dictionary and build an Artifact object from it.
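The order of the cases above can be sketched as a plain dispatch function. This is an illustration only, not unitxt's implementation: `PlaceholderArtifact`, `classify_representation`, and the `catalog` set are hypothetical names, and the sketch only reports which case applies rather than loading anything.

```python
import json
import os
import tempfile
from typing import Any, Set


class PlaceholderArtifact:
    """Hypothetical stand-in for unitxt.artifact.Artifact."""


def classify_representation(artifact_rep: Any, catalog_names: Set[str]) -> str:
    """Hypothetical helper: report which of the five cases applies."""
    if isinstance(artifact_rep, PlaceholderArtifact):
        return "artifact object"
    if isinstance(artifact_rep, str):
        if os.path.isfile(artifact_rep):
            return "local file"
        if artifact_rep in catalog_names:
            return "catalog name"
        # Raises json.JSONDecodeError (a ValueError) if the string is not JSON.
        json.loads(artifact_rep)
        return "json string"
    if isinstance(artifact_rep, dict):
        return "dictionary"
    raise TypeError(f"Unsupported representation: {type(artifact_rep).__name__}")


catalog = {"metrics.accuracy"}  # hypothetical catalog contents

# A temporary file plays the role of an artifact saved on disk.
with tempfile.NamedTemporaryFile(suffix=".json") as tmp:
    print(classify_representation(tmp.name, catalog))  # local file

print(classify_representation(PlaceholderArtifact(), catalog))  # artifact object
print(classify_representation("metrics.accuracy", catalog))  # catalog name
print(classify_representation('{"example_key": 1}', catalog))  # json string
print(classify_representation({"example_key": 1}, catalog))  # dictionary
```

Checking for a local file before the catalog follows the order in which the cases are listed; whether the library resolves a name that matches both in the same way is not stated here.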
- unitxt.artifact.get_artifactory_name_and_args(name: str, artifactories: List[Artifactory] | None = None)
- unitxt.artifact.get_artifacts_data_classification(artifact: str) -> List[str] | None
Loads given artifact’s data classification policy from an environment variable.
- Parameters:
artifact (str) – Name of the artifact for which the data classification policy should be retrieved, for example "metrics.accuracy".
- Returns:
Data classification policies for the specified artifact if they were found, or None otherwise.
- Return type:
Optional[List[str]]
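A lookup of this kind might look like the sketch below. It assumes, based on the `verify_instance` example, that the environment variable `UNITXT_DATA_CLASSIFICATION_POLICY` holds a JSON object mapping artifact names to lists of policies. `read_policy_from_env` is a hypothetical name, and unitxt's own parsing and error handling may differ.

```python
import json
import os
from typing import List, Optional

ENV_VAR = "UNITXT_DATA_CLASSIFICATION_POLICY"


def read_policy_from_env(artifact: str) -> Optional[List[str]]:
    """Hypothetical stand-in for get_artifacts_data_classification."""
    raw = os.environ.get(ENV_VAR)
    if raw is None:
        # The variable is not set, so no policy is configured for any artifact.
        return None
    # Assumption: the value is a JSON object of the form {artifact_name: [policies]}.
    policies = json.loads(raw)
    return policies.get(artifact)


os.environ[ENV_VAR] = json.dumps({"metrics.accuracy": ["pii"]})

print(read_policy_from_env("metrics.accuracy"))  # ['pii']
print(read_policy_from_env("metrics.f1"))  # None
```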
- unitxt.artifact.get_closest_artifact_type(type)
- unitxt.artifact.get_raw(obj)
- unitxt.artifact.is_name_legal_for_catalog(name)
- unitxt.artifact.map_values_in_place(object, mapper)
- unitxt.artifact.maybe_recover_artifact(artifact)
- unitxt.artifact.register_all_artifacts(path)
- unitxt.artifact.reset_artifacts_json_cache()
- unitxt.artifact.verbosed_fetch_artifact(identifier)
- unitxt.artifact.verify_legal_catalog_name(name)