📄 Imbalanced

Task-oriented dialog systems need to know when a query falls outside their range of supported intents, but current text classification corpora only define label sets that cover every example. We introduce a new dataset that includes queries that are out-of-scope (OOS), i.e., queries that do not fall into any of the system's supported intents. This poses a new challenge because models cannot assume that every query at… See the full description on the dataset page: https://huggingface.co/datasets/clinc_oos.

Tags: annotations_creators:expert-generated, language:en, language_creators:crowdsourced, license:cc-by-3.0, multilinguality:monolingual, region:us, size_categories:10K<n<100K, source_datasets:original, task_categories:text-classification, task_ids:intent-classification

cards.clinc_oos.imbalanced

type: TaskCard
loader: 
  type: LoadHF
  path: clinc_oos
  name: imbalanced
preprocess_steps: 
  - type: Shuffle
    page_size: 9223372036854775807
  - type: Rename
    field_to_field: 
      intent: label
  - type: MapInstanceValues
    mappers: 
      label: 
        0: restaurant reviews
        1: nutrition info
        2: account blocked
        3: oil change how
        4: time
        5: weather
        6: redeem rewards
        7: interest rate
        8: gas type
        9: accept reservations
        10: smart home
        11: user name
        12: report lost card
        13: repeat
        14: whisper mode
        15: what are your hobbies
        16: order
        17: jump start
        18: schedule meeting
        19: meeting schedule
        20: freeze account
        21: what song
        22: meaning of life
        23: restaurant reservation
        24: traffic
        25: make call
        26: text
        27: bill balance
        28: improve credit score
        29: change language
        30: no
        31: measurement conversion
        32: timer
        33: flip coin
        34: do you have pets
        35: balance
        36: tell joke
        37: last maintenance
        38: exchange rate
        39: uber
        40: car rental
        41: credit limit
        42: oos
        43: shopping list
        44: expiration date
        45: routing
        46: meal suggestion
        47: tire change
        48: todo list
        49: card declined
        50: rewards balance
        51: change accent
        52: vaccines
        53: reminder update
        54: food last
        55: change ai name
        56: bill due
        57: who do you work for
        58: share location
        59: international visa
        60: calendar
        61: translate
        62: carry on
        63: book flight
        64: insurance change
        65: todo list update
        66: timezone
        67: cancel reservation
        68: transactions
        69: credit score
        70: report fraud
        71: spending history
        72: directions
        73: spelling
        74: insurance
        75: what is your name
        76: reminder
        77: where are you from
        78: distance
        79: payday
        80: flight status
        81: find phone
        82: greeting
        83: alarm
        84: order status
        85: confirm reservation
        86: cook time
        87: damaged card
        88: reset settings
        89: pin change
        90: replacement card duration
        91: new card
        92: roll dice
        93: income
        94: taxes
        95: date
        96: who made you
        97: pto request
        98: tire pressure
        99: how old are you
        100: rollover 401k
        101: pto request status
        102: how busy
        103: application status
        104: recipe
        105: calendar update
        106: play music
        107: yes
        108: direct deposit
        109: credit limit change
        110: gas
        111: pay bill
        112: ingredients list
        113: lost luggage
        114: goodbye
        115: what can i ask you
        116: book hotel
        117: are you a bot
        118: next song
        119: change speed
        120: plug type
        121: maybe
        122: w2
        123: oil change when
        124: thank you
        125: shopping list update
        126: pto balance
        127: order checks
        128: travel alert
        129: fun fact
        130: sync device
        131: schedule maintenance
        132: apr
        133: transfer
        134: ingredient substitution
        135: calories
        136: current location
        137: international fees
        138: calculator
        139: definition
        140: next holiday
        141: update playlist
        142: mpg
        143: min payment
        144: change user name
        145: restaurant suggestion
        146: travel notification
        147: cancel
        148: pto used
        149: travel suggestion
        150: change volume
  - type: Set
    fields: 
      classes: 
        - restaurant reviews
        - nutrition info
        - account blocked
        - oil change how
        - time
        - weather
        - redeem rewards
        - interest rate
        - gas type
        - accept reservations
        - smart home
        - user name
        - report lost card
        - repeat
        - whisper mode
        - what are your hobbies
        - order
        - jump start
        - schedule meeting
        - meeting schedule
        - freeze account
        - what song
        - meaning of life
        - restaurant reservation
        - traffic
        - make call
        - text
        - bill balance
        - improve credit score
        - change language
        - no
        - measurement conversion
        - timer
        - flip coin
        - do you have pets
        - balance
        - tell joke
        - last maintenance
        - exchange rate
        - uber
        - car rental
        - credit limit
        - oos
        - shopping list
        - expiration date
        - routing
        - meal suggestion
        - tire change
        - todo list
        - card declined
        - rewards balance
        - change accent
        - vaccines
        - reminder update
        - food last
        - change ai name
        - bill due
        - who do you work for
        - share location
        - international visa
        - calendar
        - translate
        - carry on
        - book flight
        - insurance change
        - todo list update
        - timezone
        - cancel reservation
        - transactions
        - credit score
        - report fraud
        - spending history
        - directions
        - spelling
        - insurance
        - what is your name
        - reminder
        - where are you from
        - distance
        - payday
        - flight status
        - find phone
        - greeting
        - alarm
        - order status
        - confirm reservation
        - cook time
        - damaged card
        - reset settings
        - pin change
        - replacement card duration
        - new card
        - roll dice
        - income
        - taxes
        - date
        - who made you
        - pto request
        - tire pressure
        - how old are you
        - rollover 401k
        - pto request status
        - how busy
        - application status
        - recipe
        - calendar update
        - play music
        - yes
        - direct deposit
        - credit limit change
        - gas
        - pay bill
        - ingredients list
        - lost luggage
        - goodbye
        - what can i ask you
        - book hotel
        - are you a bot
        - next song
        - change speed
        - plug type
        - maybe
        - w2
        - oil change when
        - thank you
        - shopping list update
        - pto balance
        - order checks
        - travel alert
        - fun fact
        - sync device
        - schedule maintenance
        - apr
        - transfer
        - ingredient substitution
        - calories
        - current location
        - international fees
        - calculator
        - definition
        - next holiday
        - update playlist
        - mpg
        - min payment
        - change user name
        - restaurant suggestion
        - travel notification
        - cancel
        - pto used
        - travel suggestion
        - change volume
      text_type: sentence
      type_of_class: intent
task: tasks.classification.multi_class
templates: templates.classification.multi_class.all

Explanation about TaskCard

TaskCard delineates the phases in transforming the source dataset into model input, and specifies the metrics for evaluation of model output.

Attributes:

loader: specifies the source address and the loading operator that can access that source and transform it into a unitxt multistream.

preprocess_steps: list of unitxt operators to process the data source into model input.

task: specifies the fields (of the already (pre)processed instance) making the inputs, the fields making the outputs, and the metrics to be used for evaluating the model output.

templates: format strings to be applied on the input fields (specified by the task) and the output fields. The template also carries the instructions and the list of postprocessing steps, to be applied to the model output.
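The effect of this card's preprocess steps on a single instance can be sketched in plain Python. This is only an illustration of the data flow, not unitxt code: the function name `preprocess`, the sample query, and the assumption that the raw instance carries `text` and `intent` fields are all hypothetical, and the mapper is a three-entry excerpt of the full table above.

```python
# Plain-Python imitation of the card's preprocess steps (not the unitxt implementation).
# Excerpt of the card's label mapper; the real card lists 151 entries.
LABEL_NAMES = {"0": "restaurant reviews", "5": "weather", "42": "oos"}


def preprocess(instance):
    """Apply Rename, MapInstanceValues and Set in the order the card lists them."""
    out = dict(instance)
    # Rename: move the value of 'intent' into 'label'.
    out["label"] = out.pop("intent")
    # MapInstanceValues: class id -> readable intent name, looked up by string form.
    out["label"] = LABEL_NAMES[str(out["label"])]
    # Set: attach the constant fields listed under the card's Set step.
    out["classes"] = list(LABEL_NAMES.values())
    out["text_type"] = "sentence"
    out["type_of_class"] = "intent"
    return out


# Hypothetical raw instance; the field names are assumed.
raw = {"text": "what is the forecast for tomorrow", "intent": 5}
processed = preprocess(raw)
print(processed["label"])  # weather
```

The Shuffle step is left out here because it reorders instances within a stream rather than changing any single instance.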

Explanation about MapInstanceValues

A class used to map instance values into other values.

This class is a type of InstanceOperator; it maps values of instances in a stream using predefined mappers.

Attributes:
mappers (Dict[str, Dict[str, Any]]): The mappers to use for mapping instance values. Keys are the names of the fields to undergo mapping, and values are dictionaries that define the mapping from old values to new values.

strict (bool): If True, the mapping is applied strictly: if a value does not exist in the mapper, a KeyError is raised. If False, values that are not present in the mapper are kept as they are.

process_every_value (bool): If True, all fields to be mapped should be lists, and the mapping is applied to their individual elements. If False, mapping is only applied to a field containing a single value.

Examples:

MapInstanceValues(mappers={"a": {"1": "hi", "2": "bye"}}) replaces '1' with 'hi' and '2' with 'bye' in field 'a' in all instances of all streams: instance {"a": "1", "b": 2} becomes {"a": "hi", "b": 2}.

MapInstanceValues(mappers={"a": {"1": "hi", "2": "bye"}}, process_every_value=True) Assuming field 'a' is a list of values, potentially including "1"-s and "2"-s, this replaces each such "1" with "hi" and each "2" with "bye" in all instances of all streams: instance {"a": ["1", "2"], "b": 2} becomes {"a": ["hi", "bye"], "b": 2}.

MapInstanceValues(mappers={"a": {"1": "hi", "2": "bye"}}, strict=True) To ensure that all values of field 'a' are mapped in every instance, use strict=True. Input instance {"a": "3", "b": 2} will raise an exception per the above call, because "3" is not a key in the mapper of "a".

MapInstanceValues(mappers={"a": {str([1,2,3,4]): 'All', str([]): 'None'}}, strict=True) replaces a list [1,2,3,4] with the string 'All' and an empty list by string 'None'. Note that mapped values are defined by their string representation, so mapped values must be converted to strings.
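The behaviour described by these examples can be imitated with a short standalone function. The sketch below is an assumption-laden illustration, not the library's implementation: `map_instance_values` is a hypothetical name, it returns a copy instead of working on a stream, and its default argument values are this sketch's own choice.

```python
def map_instance_values(instance, mappers, strict=True, process_every_value=False):
    """Return a copy of `instance` with the fields named in `mappers` remapped."""

    def map_one(value, mapper):
        key = str(value)  # values are looked up by their string representation
        if key in mapper:
            return mapper[key]
        if strict:
            raise KeyError(f"value {value!r} is not a key in the mapper")
        return value  # non-strict: unknown values are kept as they are

    out = dict(instance)
    for field, mapper in mappers.items():
        if process_every_value:
            # The field is expected to hold a list; map each element.
            out[field] = [map_one(v, mapper) for v in out[field]]
        else:
            out[field] = map_one(out[field], mapper)
    return out


mappers = {"a": {"1": "hi", "2": "bye"}}
print(map_instance_values({"a": "1", "b": 2}, mappers))
# {'a': 'hi', 'b': 2}
print(map_instance_values({"a": ["1", "2"], "b": 2}, mappers, process_every_value=True))
# {'a': ['hi', 'bye'], 'b': 2}
```

Because lookups go through `str(value)`, a whole list can act as a key: with the mapper `{str([1, 2, 3, 4]): "All"}`, the field value `[1, 2, 3, 4]` is replaced by `"All"`.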

Explanation about Shuffle

Shuffles the order of instances in each page of a stream.

Args (of superclass):

page_size (int): The size of each page in the stream. Defaults to 1000.
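Page-wise shuffling can be pictured as reading the stream in chunks of page_size instances and shuffling each chunk on its own. The generator below is a minimal sketch under that reading; `shuffle_pages` and its `seed` parameter are hypothetical names, and the real operator may differ in details.

```python
import random
from itertools import islice


def shuffle_pages(stream, page_size=1000, seed=0):
    """Yield the instances of `stream`, shuffled within consecutive pages."""
    rng = random.Random(seed)
    iterator = iter(stream)
    while True:
        # Read the next page; the final page may be shorter than page_size.
        page = list(islice(iterator, page_size))
        if not page:
            return
        rng.shuffle(page)
        yield from page


data = list(range(10))
paged = list(shuffle_pages(data, page_size=4))
# Instances never leave their page: the first four outputs are a permutation of 0..3.
print(sorted(paged[:4]))  # [0, 1, 2, 3]
```

Under this reading, the page_size of 9223372036854775807 used in the card above places the whole split in a single page, so the step amounts to a full shuffle of the data.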

Explanation about LoadHF

Loads datasets from the HuggingFace Hub.

It supports loading with or without streaming, and it can filter datasets upon loading.

Args:

path: The path or identifier of the dataset on the HuggingFace Hub.

name: An optional dataset name.

data_dir: Optional directory to store downloaded data.

split: Optional specification of which split to load.

data_files: Optional specification of particular data files to load.

revision: Optional. The revision of the dataset, often the commit id. Use it when you want to pin the dataset version.

streaming: Bool indicating if streaming should be used.

filtering_lambda: A lambda function for filtering the data after loading.

num_proc: Optional integer to specify the number of processes to use for parallel dataset loading.

Example:

Loading glue's mrpc dataset:

load_hf = LoadHF(path='glue', name='mrpc')

Explanation about Set

Adds specified fields to each instance in a given stream or in all streams (default). If the fields already exist, they are updated.

Args:
fields (Dict[str, object]): The fields to add to each instance. Use '/' to access inner fields.

use_deepcopy (bool): Deep copy the input value to avoid later modifications.

Examples:

# Add a 'classes' field with a value of a list "positive" and "negative" to all streams
Set(fields={"classes": ["positive", "negative"]})

# Add a 'start' field under the 'span' field with a value of 0 to all streams
Set(fields={"span/start": 0})

# Add a 'classes' field with a value of a list "positive" and "negative" to 'train' stream
Set(fields={"classes": ["positive", "negative"]}, apply_to_stream=["train"])

# Add a 'classes' field on a given list, prevent modification of original list
# from changing the instance.
Set(fields={"classes": alist}, use_deepcopy=True)
# if now alist is modified, still the instances remain intact.
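The '/' addressing and the effect of use_deepcopy can be illustrated without the library. The sketch below is a hypothetical stand-in (`set_fields` is not a unitxt function) that works on one instance dictionary rather than on streams.

```python
from copy import deepcopy


def set_fields(instance, fields, use_deepcopy=False):
    """Add or overwrite `fields` in `instance`; '/' in a name addresses a nested field."""
    for path, value in fields.items():
        *parents, leaf = path.split("/")
        target = instance
        for key in parents:
            # Walk down the branch, creating intermediate dicts when missing.
            target = target.setdefault(key, {})
        target[leaf] = deepcopy(value) if use_deepcopy else value
    return instance


instance = set_fields({"text": "hi"}, {"classes": ["positive", "negative"], "span/start": 0})
print(instance)
# {'text': 'hi', 'classes': ['positive', 'negative'], 'span': {'start': 0}}

alist = ["positive"]
shared = set_fields({}, {"classes": alist})
copied = set_fields({}, {"classes": alist}, use_deepcopy=True)
alist.append("negative")
print(shared["classes"])  # ['positive', 'negative'], follows the later change to alist
print(copied["classes"])  # ['positive'], unaffected by it
```

Without the deep copy, every instance would hold a reference to the same list object, which is why a later change to `alist` shows up in `shared`.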

Explanation about Rename

Renames fields.

Moves a value from one field to another. If a field name contains a '/', the value can move from one branch of the instance into another. The source field is then removed; if the source name contains a '/', only that part of the branch is removed.

Examples:

Rename(field_to_field={"b": "c"}) will change inputs [{"a": 1, "b": 2}, {"a": 2, "b": 3}] to [{"a": 1, "c": 2}, {"a": 2, "c": 3}]

Rename(field_to_field={"b": "c/d"}) will change inputs [{"a": 1, "b": 2}, {"a": 2, "b": 3}] to [{"a": 1, "c": {"d": 2}}, {"a": 2, "c": {"d": 3}}]

Rename(field_to_field={"b": "b/d"}) will change inputs [{"a": 1, "b": 2}, {"a": 2, "b": 3}] to [{"a": 1, "b": {"d": 2}}, {"a": 2, "b": {"d": 3}}]

Rename(field_to_field={"b/c/e": "b/d"}) will change inputs [{"a": 1, "b": {"c": {"e": 2, "f": 20}}}] to [{"a": 1, "b": {"c": {"f": 20}, "d": 2}}]
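The four examples above can be reproduced with a small detach-then-attach routine. This is a sketch of the idea only: `rename_fields` is a hypothetical helper that edits one dictionary in place, and unlike the operator it may leave behind a source branch that has become empty.

```python
def rename_fields(instance, field_to_field):
    """Move values between fields in place; '/' separates levels of nesting."""
    for src, dst in field_to_field.items():
        # Detach the value from its source location.
        *src_parents, src_leaf = src.split("/")
        holder = instance
        for key in src_parents:
            holder = holder[key]
        value = holder.pop(src_leaf)

        # Attach it at the destination, creating intermediate dicts as needed.
        *dst_parents, dst_leaf = dst.split("/")
        holder = instance
        for key in dst_parents:
            holder = holder.setdefault(key, {})
        holder[dst_leaf] = value
    return instance


print(rename_fields({"a": 1, "b": 2}, {"b": "c/d"}))
# {'a': 1, 'c': {'d': 2}}
print(rename_fields({"a": 1, "b": {"c": {"e": 2, "f": 20}}}, {"b/c/e": "b/d"}))
# {'a': 1, 'b': {'c': {'f': 20}, 'd': 2}}
```

Detaching before attaching is what makes the "b" to "b/d" case work: the old scalar under "b" is removed first, so a fresh dictionary can take its place.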

References: templates.classification.multi_class.all, tasks.classification.multi_class

Read more about catalog usage here.