📄 Extraction

This is an entity extraction task in which multiple entity types are extracted. The inputs are the ‘text’ and the ‘entity_types’ to extract (e.g. [“Organization”, “Location”, “Person”]).

By default, the classical F1 metric is used, which expects a list of <entity, entity_type> pairs. Multiple F1 scores are reported, including f1_micro, f1_macro, and F1 per entity_type. The template’s post-processors must convert the model’s textual predictions into the expected list format.
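To illustrate the prediction format and the reported scores, here is a minimal sketch that parses a model's text output into (entity, entity_type) pairs and computes micro, macro, and per-type F1. The "entity: type" output format and the helper names `parse_pairs` and `f1_scores` are assumptions made for this example. It is not the `metrics.ner` implementation, which may differ in details such as duplicate handling.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def parse_pairs(prediction: str) -> List[Pair]:
    """Hypothetical post-processor: 'IBM: Organization, Paris: Location' -> pairs."""
    pairs = []
    for item in prediction.split(","):
        if ":" not in item:
            continue  # skip fragments that do not look like 'entity: type'
        entity, entity_type = item.rsplit(":", 1)
        entity, entity_type = entity.strip(), entity_type.strip()
        if entity and entity_type:
            pairs.append((entity, entity_type))
    return pairs


def _f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_scores(predicted: List[Pair], reference: List[Pair]) -> Dict[str, float]:
    """Exact-match F1 over (entity, entity_type) pairs; duplicates are ignored."""
    pred_set, ref_set = set(predicted), set(reference)
    counts = defaultdict(lambda: [0, 0, 0])  # entity_type -> [tp, fp, fn]
    for pair in pred_set:
        counts[pair[1]][0 if pair in ref_set else 1] += 1
    for pair in ref_set - pred_set:
        counts[pair[1]][2] += 1

    per_type = {etype: _f1(*c) for etype, c in counts.items()}
    totals = [sum(c[i] for c in counts.values()) for i in range(3)]

    scores = {f"f1_{etype}": value for etype, value in per_type.items()}
    scores["f1_macro"] = sum(per_type.values()) / len(per_type) if per_type else 0.0
    scores["f1_micro"] = _f1(*totals)
    return scores


predicted = parse_pairs("IBM: Organization, Paris: Location, John: Location")
reference = [("IBM", "Organization"), ("Paris", "Location"), ("John", "Person")]
scores = f1_scores(predicted, reference)
```

In this example the micro score pools all true positives, false positives, and false negatives, while the macro score averages the per-type scores, so a type with few instances weighs as much as a frequent one.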

tasks.span_labeling.extraction

Task(
    input_fields={
        "text": "str",
        "text_type": "str",
        "entity_types": "List[str]",
    },
    reference_fields={
        "text": "str",
        "spans_starts": "List[int]",
        "spans_ends": "List[int]",
        "labels": "List[str]",
    },
    prediction_type="List[Tuple[str, str]]",
    metrics=[
        "metrics.ner",
    ],
    augmentable_inputs=[
        "text",
    ],
    defaults={
        "text_type": "text",
    },
    default_template="templates.span_labeling.extraction.detailed",
)
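The reference fields describe each entity by its span and label, while the prediction type is a list of (entity, entity_type) pairs. The sketch below shows one way the two can be related. It assumes that `spans_starts` and `spans_ends` are character offsets into `text` with an exclusive end, which the task definition does not state. The helper name `reference_pairs` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def reference_pairs(reference_fields: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Slice each span out of the text and pair it with its label.

    Assumption: offsets are character positions, end-exclusive.
    """
    text = reference_fields["text"]
    return [
        (text[start:end], label)
        for start, end, label in zip(
            reference_fields["spans_starts"],
            reference_fields["spans_ends"],
            reference_fields["labels"],
        )
    ]


example_reference = {
    "text": "John works at IBM in Paris.",
    "spans_starts": [0, 14, 21],
    "spans_ends": [4, 17, 26],
    "labels": ["Person", "Organization", "Location"],
}

pairs = reference_pairs(example_reference)
```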

Explanation about Task

Task packs the different instance fields into dictionaries by their roles in the task.

Args:
input_fields (Union[Dict[str, str], List[str]]):

Dictionary with string names of instance input fields and types of respective values. In case a list is passed, each type will be assumed to be Any.

reference_fields (Union[Dict[str, str], List[str]]):

Dictionary with string names of instance output fields and types of respective values. In case a list is passed, each type will be assumed to be Any.

metrics (List[str]):

List of names of metrics to be used in the task.

prediction_type (Optional[str]):

Needs to be consistent with all used metrics. Defaults to None, which means that it will be set to Any.

defaults (Optional[Dict[str, Any]]):

An optional dictionary with default values for chosen input/reference keys. Needs to be consistent with the names and types provided in the ‘input_fields’ and/or ‘reference_fields’ arguments. Will not overwrite values already provided in a given instance.

The output instance contains three fields:
  1. “input_fields”, whose value is a sub-dictionary of the input instance consisting of all the fields listed in Arg ‘input_fields’.

  2. “reference_fields”, whose value is the corresponding sub-dictionary for the fields listed in Arg ‘reference_fields’.

  3. “metrics”, which contains the value of Arg ‘metrics’.
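The packing described above can be sketched in plain Python. This is only an illustration of the described behavior, not the library's code: the function name `pack_instance` is hypothetical, and type checking of field values is omitted.

```python
from typing import Any, Dict, List, Optional


def pack_instance(
    instance: Dict[str, Any],
    input_fields: List[str],
    reference_fields: List[str],
    metrics: List[str],
    defaults: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Group instance fields by role; defaults never overwrite given values."""
    merged = {**(defaults or {}), **instance}  # instance values take precedence
    return {
        "input_fields": {name: merged[name] for name in input_fields},
        "reference_fields": {name: merged[name] for name in reference_fields},
        "metrics": list(metrics),
    }


instance = {
    "text": "John works at IBM.",
    "entity_types": ["Person", "Organization"],
    "spans_starts": [0, 14],
    "spans_ends": [4, 17],
    "labels": ["Person", "Organization"],
}

packed = pack_instance(
    instance,
    input_fields=["text", "text_type", "entity_types"],
    reference_fields=["text", "spans_starts", "spans_ends", "labels"],
    metrics=["metrics.ner"],
    defaults={"text_type": "text"},
)
```

Here `text_type` is absent from the instance, so it is filled in from `defaults`; had the instance supplied its own `text_type`, that value would have been kept.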

References: templates.span_labeling.extraction.detailed, metrics.ner

Read more about catalog usage here.