unitxt.artifact module
- class unitxt.artifact.AbstractCatalog(data_classification_policy: List[str] = None, is_local: bool = <class 'unitxt.dataclass.Undefined'>)
Bases:
Artifact
- class unitxt.artifact.Artifact(data_classification_policy: List[str] = None)
Bases:
Dataclass
- verify_instance(instance: Dict[str, Any], name: str | None = None) -> Dict[str, Any]
Checks if data classifications of an artifact and instance are compatible.
Raises an error if an artifact’s data classification policy does not include that of processed data. The purpose is to ensure that any sensitive data is handled in a proper way (for example when sending it to some external services).
- Parameters:
instance (Dict[str, Any]) – data which should contain its allowed data classification policies under key ‘data_classification_policy’.
name (Optional[str]) – name of the artifact, used to retrieve its data classification from the environment. If not specified, __id__ or __class__.__name__ is used instead, in that order.
- Returns:
unchanged instance.
- Return type:
Dict[str, Any]
- Examples:
instance = {"x": "some_text", "data_classification_policy": ["pii"]}

# Will raise an error as "pii" is not an included policy
metric = Accuracy(data_classification_policy=["public"])
metric.verify_instance(instance)

# Will not raise an error
template = SpanLabelingTemplate(data_classification_policy=["pii", "propriety"])
template.verify_instance(instance)

# Will not raise an error since the policy was specified in environment variable:
UNITXT_DATA_CLASSIFICATION_POLICY = json.dumps({"metrics.accuracy": ["pii"]})
# fetch_artifact returns an (artifact, catalog) tuple
metric, _ = fetch_artifact("metrics.accuracy")
metric.verify_instance(instance)
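The compatibility check described above can be approximated in a few lines. This is a minimal, standalone sketch rather than the library's implementation: the function name `check_instance` is hypothetical, and it assumes that "compatible" means the instance and the artifact share at least one classification, that an artifact without a policy accepts any instance, and that a `ValueError` is an acceptable stand-in for the library's own error type.

```python
from typing import Any, Dict, List, Optional


def check_instance(
    instance: Dict[str, Any], artifact_policy: Optional[List[str]]
) -> Dict[str, Any]:
    """Hypothetical stand-in for the check performed by verify_instance."""
    # Assumption: an artifact with no policy places no restriction on data.
    if artifact_policy is None:
        return instance

    # The instance lists its allowed classifications under this key.
    instance_policy = instance.get("data_classification_policy", [])

    # Assumption: one shared classification is enough to be compatible.
    if not any(c in artifact_policy for c in instance_policy):
        raise ValueError(
            f"Instance policy {instance_policy} is not covered by "
            f"artifact policy {artifact_policy}"
        )

    # The instance is returned unchanged, as documented.
    return instance
```

With the instance from the example above, `check_instance(instance, ["pii", "propriety"])` returns the instance, while `check_instance(instance, ["public"])` raises.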
- class unitxt.artifact.ArtifactLink(data_classification_policy: List[str] = None, to: unitxt.artifact.Artifact = __required__)
Bases:
Artifact
- class unitxt.artifact.ArtifactList(data_classification_policy: List[str] = None)
Bases:
list, Artifact
- exception unitxt.artifact.UnitxtArtifactNotFoundError(name, catalogs)
Bases:
UnitxtError
- unitxt.artifact.fetch_artifact(artifact_rep) -> Tuple[Artifact, AbstractCatalog | None]
Loads an artifact from one of several possible representations.
If artifact representation is already an Artifact object, return it.
If artifact representation is a string location of a local file, load the Artifact from the local file.
If artifact representation is a string name in the catalog, load the Artifact from the catalog.
If artifact representation is a json string, create a dictionary representation from the string and build an Artifact object from it.
Otherwise, check that the artifact representation is a dictionary and build an Artifact object from it.
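The resolution order above can be illustrated with a small standalone sketch. It does not load anything and does not use the library: the `Artifact` class here is an empty stand-in, `classify_representation` and the `catalog_names` set are hypothetical, and the error type is an assumption. It only shows which branch each kind of representation would fall into.

```python
import json
import os
from typing import Any, Set


class Artifact:
    """Empty stand-in for unitxt.artifact.Artifact (illustration only)."""


def classify_representation(rep: Any, catalog_names: Set[str]) -> str:
    """Return the name of the branch a representation would take."""
    # 1. Already an Artifact object: returned as is.
    if isinstance(rep, Artifact):
        return "artifact"

    if isinstance(rep, str):
        # 2. A string that points to a local file.
        if os.path.isfile(rep):
            return "local_file"
        # 3. A string that names an entry in a catalog.
        if rep in catalog_names:
            return "catalog"
        # 4. A JSON string that decodes to a dictionary.
        try:
            parsed = json.loads(rep)
        except json.JSONDecodeError as err:
            raise ValueError(f"Unrecognized artifact string: {rep!r}") from err
        if isinstance(parsed, dict):
            return "json_string"
        raise ValueError(f"JSON does not describe a dictionary: {rep!r}")

    # 5. Otherwise the representation must be a dictionary.
    if isinstance(rep, dict):
        return "dict"
    raise ValueError(f"Unsupported representation type: {type(rep).__name__}")
```

Checking the file path before the catalog name mirrors the order in which the cases are listed above; whether the library resolves ambiguous strings in exactly this way is not confirmed by this page.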
- unitxt.artifact.get_artifacts_data_classification(artifact: str) -> List[str] | None
Loads given artifact’s data classification policy from an environment variable.
- Parameters:
artifact (str) – Name of the artifact which the data classification policy should be retrieved for. For example “metrics.accuracy”.
- Returns:
Optional[List[str]] – data classification policies for the specified artifact if they were found, or None otherwise.
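As a rough illustration of this lookup, the sketch below reads a JSON mapping from an environment variable and returns the entry for one artifact. The variable name `UNITXT_DATA_CLASSIFICATION_POLICY` and the mapping format are taken from the `verify_instance` example on this page; the function name `lookup_policy` is hypothetical, and any validation the library performs on the decoded value is not reproduced here.

```python
import json
import os
from typing import List, Optional

ENV_VAR = "UNITXT_DATA_CLASSIFICATION_POLICY"


def lookup_policy(artifact: str) -> Optional[List[str]]:
    """Hypothetical stand-in for get_artifacts_data_classification."""
    raw = os.environ.get(ENV_VAR)
    # No variable set (or an empty value): nothing to look up.
    if not raw:
        return None

    # Assumption: the variable holds a JSON object that maps
    # artifact names to lists of classification labels.
    policies = json.loads(raw)

    # dict.get returns None when the artifact has no entry.
    return policies.get(artifact)


# Usage: register a policy for one artifact, then query it.
os.environ[ENV_VAR] = json.dumps({"metrics.accuracy": ["pii"]})
accuracy_policy = lookup_policy("metrics.accuracy")
```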
- unitxt.artifact.get_catalog_name_and_args(name: str, catalogs: List[AbstractCatalog] | None = None)