Plus
Task-oriented dialog systems need to know when a query falls outside their range of supported intents, but current text classification corpora only define label sets that cover every example. We introduce a new dataset that includes queries that are out-of-scope (OOS), i.e., queries that do not fall into any of the system's supported intents. This poses a new challenge because models cannot assume that every query at... See the full description on the dataset page: https://huggingface.co/datasets/clinc_oos.
Tags: annotations_creators:expert-generated, language:en, language_creators:crowdsourced, license:cc-by-3.0, multilinguality:monolingual, region:us, size_categories:10K<n<100K, source_datasets:original, task_categories:text-classification, task_ids:intent-classification
cards.clinc_oos.plus
type: TaskCard
loader:
type: LoadHF
path: clinc_oos
name: plus
preprocess_steps:
- type: Shuffle
page_size: 9223372036854775807
- type: Rename
field_to_field:
intent: label
- type: MapInstanceValues
mappers:
label:
0: restaurant reviews
1: nutrition info
2: account blocked
3: oil change how
4: time
5: weather
6: redeem rewards
7: interest rate
8: gas type
9: accept reservations
10: smart home
11: user name
12: report lost card
13: repeat
14: whisper mode
15: what are your hobbies
16: order
17: jump start
18: schedule meeting
19: meeting schedule
20: freeze account
21: what song
22: meaning of life
23: restaurant reservation
24: traffic
25: make call
26: text
27: bill balance
28: improve credit score
29: change language
30: no
31: measurement conversion
32: timer
33: flip coin
34: do you have pets
35: balance
36: tell joke
37: last maintenance
38: exchange rate
39: uber
40: car rental
41: credit limit
42: oos
43: shopping list
44: expiration date
45: routing
46: meal suggestion
47: tire change
48: todo list
49: card declined
50: rewards balance
51: change accent
52: vaccines
53: reminder update
54: food last
55: change ai name
56: bill due
57: who do you work for
58: share location
59: international visa
60: calendar
61: translate
62: carry on
63: book flight
64: insurance change
65: todo list update
66: timezone
67: cancel reservation
68: transactions
69: credit score
70: report fraud
71: spending history
72: directions
73: spelling
74: insurance
75: what is your name
76: reminder
77: where are you from
78: distance
79: payday
80: flight status
81: find phone
82: greeting
83: alarm
84: order status
85: confirm reservation
86: cook time
87: damaged card
88: reset settings
89: pin change
90: replacement card duration
91: new card
92: roll dice
93: income
94: taxes
95: date
96: who made you
97: pto request
98: tire pressure
99: how old are you
100: rollover 401k
101: pto request status
102: how busy
103: application status
104: recipe
105: calendar update
106: play music
107: yes
108: direct deposit
109: credit limit change
110: gas
111: pay bill
112: ingredients list
113: lost luggage
114: goodbye
115: what can i ask you
116: book hotel
117: are you a bot
118: next song
119: change speed
120: plug type
121: maybe
122: w2
123: oil change when
124: thank you
125: shopping list update
126: pto balance
127: order checks
128: travel alert
129: fun fact
130: sync device
131: schedule maintenance
132: apr
133: transfer
134: ingredient substitution
135: calories
136: current location
137: international fees
138: calculator
139: definition
140: next holiday
141: update playlist
142: mpg
143: min payment
144: change user name
145: restaurant suggestion
146: travel notification
147: cancel
148: pto used
149: travel suggestion
150: change volume
- type: Set
fields:
classes:
- restaurant reviews
- nutrition info
- account blocked
- oil change how
- time
- weather
- redeem rewards
- interest rate
- gas type
- accept reservations
- smart home
- user name
- report lost card
- repeat
- whisper mode
- what are your hobbies
- order
- jump start
- schedule meeting
- meeting schedule
- freeze account
- what song
- meaning of life
- restaurant reservation
- traffic
- make call
- text
- bill balance
- improve credit score
- change language
- no
- measurement conversion
- timer
- flip coin
- do you have pets
- balance
- tell joke
- last maintenance
- exchange rate
- uber
- car rental
- credit limit
- oos
- shopping list
- expiration date
- routing
- meal suggestion
- tire change
- todo list
- card declined
- rewards balance
- change accent
- vaccines
- reminder update
- food last
- change ai name
- bill due
- who do you work for
- share location
- international visa
- calendar
- translate
- carry on
- book flight
- insurance change
- todo list update
- timezone
- cancel reservation
- transactions
- credit score
- report fraud
- spending history
- directions
- spelling
- insurance
- what is your name
- reminder
- where are you from
- distance
- payday
- flight status
- find phone
- greeting
- alarm
- order status
- confirm reservation
- cook time
- damaged card
- reset settings
- pin change
- replacement card duration
- new card
- roll dice
- income
- taxes
- date
- who made you
- pto request
- tire pressure
- how old are you
- rollover 401k
- pto request status
- how busy
- application status
- recipe
- calendar update
- play music
- yes
- direct deposit
- credit limit change
- gas
- pay bill
- ingredients list
- lost luggage
- goodbye
- what can i ask you
- book hotel
- are you a bot
- next song
- change speed
- plug type
- maybe
- w2
- oil change when
- thank you
- shopping list update
- pto balance
- order checks
- travel alert
- fun fact
- sync device
- schedule maintenance
- apr
- transfer
- ingredient substitution
- calories
- current location
- international fees
- calculator
- definition
- next holiday
- update playlist
- mpg
- min payment
- change user name
- restaurant suggestion
- travel notification
- cancel
- pto used
- travel suggestion
- change volume
text_type: sentence
type_of_class: intent
task: tasks.classification.multi_class
templates: templates.classification.multi_class.all
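The three preprocessing steps that follow the shuffle (Rename, MapInstanceValues, Set) can be illustrated on a single record. The block below is a minimal plain-Python sketch, not unitxt code: the function `preprocess`, the sample query text, and the three-label excerpt of the mapper are all assumptions made for illustration.

```python
# Plain-Python imitation of this card's Rename -> MapInstanceValues -> Set steps.
# Only the first three of the card's 151 labels are reproduced here.
LABEL_NAMES = {
    "0": "restaurant reviews",
    "1": "nutrition info",
    "2": "account blocked",
}


def preprocess(instance, label_names):
    """Return a copy of `instance` shaped like the card's processed output."""
    out = dict(instance)
    # Rename: move the value of "intent" into "label".
    out["label"] = out.pop("intent")
    # MapInstanceValues: replace the integer id with its readable name.
    out["label"] = label_names[str(out["label"])]
    # Set: attach the constant fields listed in the card.
    out["classes"] = list(label_names.values())
    out["text_type"] = "sentence"
    out["type_of_class"] = "intent"
    return out


# Hypothetical raw record; the query text is invented for this sketch.
example = preprocess({"text": "is my account locked", "intent": 2}, LABEL_NAMES)
print(example["label"])
```

After these steps every instance carries a readable `label` together with the full `classes` list, which is presumably what the multi-class classification task and templates consume.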
Explanation about TaskCard
TaskCard delineates the phases in transforming the source dataset into model input, and specifies the metrics for evaluation of model output.
- Attributes:
loader: specifies the source address and the loading operator that can access that source and transform it into a unitxt multistream.
preprocess_steps: list of unitxt operators to process the data source into model input.
task: specifies the fields (of the already (pre)processed instance) making the inputs, the fields making the outputs, and the metrics to be used for evaluating the model output.
templates: format strings to be applied on the input fields (specified by the task) and the output fields. The template also carries the instructions and the list of postprocessing steps, to be applied to the model output.
Explanation about MapInstanceValues
A class used to map instance values into other values.
This class is a type of InstanceOperator; it maps values of instances in a stream using predefined mappers.
- Attributes:
- mappers (Dict[str, Dict[str, Any]]): The mappers to use for mapping instance values. Keys are the names of the fields to undergo mapping, and values are dictionaries that define the mapping from old values to new values.
- strict (bool): If True, the mapping is applied strictly: a value that does not exist in the mapper raises a KeyError. If False, values that are not present in the mapper are kept as they are.
- process_every_value (bool): If True, all fields to be mapped should be lists, and the mapping is applied to their individual elements. If False, mapping is only applied to a field containing a single value.
- Examples:
MapInstanceValues(mappers={"a": {"1": "hi", "2": "bye"}}) replaces "1" with "hi" and "2" with "bye" in field "a" in all instances of all streams: instance {"a": "1", "b": 2} becomes {"a": "hi", "b": 2}.
MapInstanceValues(mappers={"a": {"1": "hi", "2": "bye"}}, process_every_value=True) assumes that field "a" is a list of values, potentially including "1"s and "2"s, and replaces each such "1" with "hi" and each "2" with "bye" in all instances of all streams: instance {"a": ["1", "2"], "b": 2} becomes {"a": ["hi", "bye"], "b": 2}.
MapInstanceValues(mappers={"a": {"1": "hi", "2": "bye"}}, strict=True) ensures that all values of field "a" are mapped in every instance. With this call, input instance {"a": "3", "b": 2} raises an exception, because "3" is not a key in the mapper of "a".
MapInstanceValues(mappers={"a": {str([1,2,3,4]): "All", str([]): "None"}}, strict=True) replaces a list [1,2,3,4] with the string "All" and an empty list with the string "None". Note that values to be mapped are looked up by their string representation, so the keys of a mapper must be strings.
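The semantics described by these examples can be approximated in a few lines of plain Python. This is a sketch written for this page, not the unitxt implementation: the function name `map_instance_values`, its default argument values, and the choice to skip fields that are missing from the instance are assumptions.

```python
def map_instance_values(instance, mappers, strict=False, process_every_value=False):
    """Return a copy of `instance` with the listed fields mapped.

    Lookups use str(value), mirroring the note that values are matched
    by their string representation.
    """

    def map_one(value, mapper):
        key = str(value)
        if key in mapper:
            return mapper[key]
        if strict:
            raise KeyError(f"value {value!r} is not a key in the mapper")
        return value  # non-strict: keep unmapped values as they are

    result = dict(instance)
    for field, mapper in mappers.items():
        if field not in result:
            continue  # sketch-only choice: ignore fields absent from the instance
        value = result[field]
        if process_every_value:
            # The field is expected to hold a list; map each element.
            result[field] = [map_one(v, mapper) for v in value]
        else:
            result[field] = map_one(value, mapper)
    return result


mappers = {"a": {"1": "hi", "2": "bye"}}
single = map_instance_values({"a": "1", "b": 2}, mappers)
every = map_instance_values({"a": ["1", "2"], "b": 2}, mappers, process_every_value=True)
whole_list = map_instance_values(
    {"a": [1, 2, 3, 4]}, {"a": {str([1, 2, 3, 4]): "All", str([]): "None"}}, strict=True
)
print(single, every, whole_list)
```

Because the lookup key is `str(value)`, a whole list can be mapped in one step, as in the last call, as long as the mapper key is written exactly as Python prints that list.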
Explanation about Shuffle
Shuffles the order of instances in each page of a stream.
- Args (of superclass):
page_size (int): The size of each page in the stream. Defaults to 1000.
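Page-wise shuffling can be sketched in plain Python as follows. This illustrates the idea only and is not the unitxt operator: the generator name `shuffle_in_pages` and its `seed` parameter are hypothetical. The card above sets page_size to 9223372036854775807, which equals 2**63 - 1, presumably so that an entire split fits in one page and is shuffled as a whole.

```python
import random


def shuffle_in_pages(instances, page_size=1000, seed=0):
    """Yield instances page by page, shuffling the order inside each page."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    page = []
    for instance in instances:
        page.append(instance)
        if len(page) == page_size:
            rng.shuffle(page)
            yield from page
            page = []
    # Shuffle and emit the last, possibly shorter, page.
    rng.shuffle(page)
    yield from page


shuffled = list(shuffle_in_pages(range(10), page_size=4))
print(shuffled)
```

With a small page size an instance never leaves its page, so the stream stays roughly in its original order; a page at least as large as the stream gives a full shuffle.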
Explanation about LoadHF
Loads datasets from the HuggingFace Hub.
It supports loading with or without streaming, and it can filter datasets upon loading.
- Args:
path: The path or identifier of the dataset on the HuggingFace Hub.
name: An optional dataset name.
data_dir: Optional directory to store downloaded data.
split: Optional specification of which split to load.
data_files: Optional specification of particular data files to load.
revision: Optional. The revision of the dataset, often the commit id. Use it to pin the dataset version.
streaming: Bool indicating if streaming should be used.
filtering_lambda: A lambda function for filtering the data after loading.
num_proc: Optional integer to specify the number of processes to use for parallel dataset loading.
- Example:
Loading glue's mrpc dataset
load_hf = LoadHF(path='glue', name='mrpc')
Explanation about Set
Adds specified fields to each instance in a given stream or in all streams (default). If the fields already exist, they are updated.
- Args:
- fields (Dict[str, object]): The fields to add to each instance. Use '/' to access inner fields.
- use_deepcopy (bool): Deep copy the input value to avoid later modifications.
- Examples:
# Add a "classes" field with a value of a list "positive" and "negative" to all streams
Set(fields={"classes": ["positive", "negative"]})
# Add a "start" field under the "span" field with a value of 0 to all streams
Set(fields={"span/start": 0})
# Add a "classes" field with a value of a list "positive" and "negative" to the "train" stream
Set(fields={"classes": ["positive", "negative"]}, apply_to_stream=["train"])
# Add a "classes" field from a given list, and prevent later modification of the original list from changing the instance
Set(fields={"classes": alist}, use_deepcopy=True)
# If alist is modified afterwards, the instances remain intact.
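The '/' notation and the effect of use_deepcopy can be mimicked with a small plain-Python helper. This is a sketch under stated assumptions, not the unitxt implementation: the function `set_fields` is hypothetical, and it creates missing intermediate dictionaries on the way to an inner field.

```python
import copy


def set_fields(instance, fields, use_deepcopy=False):
    """Add or update `fields` in `instance`; '/' in a name addresses inner fields."""
    for path, value in fields.items():
        if use_deepcopy:
            # Detach the stored value from the caller's object.
            value = copy.deepcopy(value)
        *parents, leaf = path.split("/")
        node = instance
        for key in parents:
            # Walk down, creating intermediate dicts where needed (sketch choice).
            node = node.setdefault(key, {})
        node[leaf] = value
    return instance


nested = set_fields({}, {"span/start": 0})

alist = ["positive", "negative"]
shared = set_fields({}, {"classes": alist})
copied = set_fields({}, {"classes": alist}, use_deepcopy=True)
alist.append("neutral")  # later change to the original list
print(nested, shared, copied)
```

Without the deep copy the instance holds a reference to `alist`, so the later `append` shows up inside the instance; with use_deepcopy=True the instance keeps the list as it was when the field was set.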
Explanation about Rename
Renames fields.
Moves a value from one field to another. If a field name contains a /, the value can move from one branch of a nested instance into another. The source field is removed; when from_field contains a /, only the addressed part of the nested structure is removed.
- Examples:
Rename(field_to_field={"b": "c"}) will change inputs [{"a": 1, "b": 2}, {"a": 2, "b": 3}] to [{"a": 1, "c": 2}, {"a": 2, "c": 3}]
Rename(field_to_field={"b": "c/d"}) will change inputs [{"a": 1, "b": 2}, {"a": 2, "b": 3}] to [{"a": 1, "c": {"d": 2}}, {"a": 2, "c": {"d": 3}}]
Rename(field_to_field={"b": "b/d"}) will change inputs [{"a": 1, "b": 2}, {"a": 2, "b": 3}] to [{"a": 1, "b": {"d": 2}}, {"a": 2, "b": {"d": 3}}]
Rename(field_to_field={"b/c/e": "b/d"}) will change inputs [{"a": 1, "b": {"c": {"e": 2, "f": 20}}}] to [{"a": 1, "b": {"c": {"f": 20}, "d": 2}}]
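The four examples can be reproduced with a short plain-Python helper that first removes the source value and then writes it at the target path. This is a sketch, not the unitxt implementation: the function `rename_fields` is hypothetical, it modifies the instance in place, and it does not prune parent dictionaries that become empty, a case the examples above do not cover.

```python
def rename_fields(instance, field_to_field):
    """Move values between (possibly nested) fields; '/' separates path parts."""
    for old_path, new_path in field_to_field.items():
        # Locate and remove the source value.
        *old_parents, old_leaf = old_path.split("/")
        node = instance
        for key in old_parents:
            node = node[key]
        value = node.pop(old_leaf)

        # Write the value at the target, creating intermediate dicts as needed.
        *new_parents, new_leaf = new_path.split("/")
        node = instance
        for key in new_parents:
            node = node.setdefault(key, {})
        node[new_leaf] = value
    return instance


flat = rename_fields({"a": 1, "b": 2}, {"b": "c"})
into_branch = rename_fields({"a": 1, "b": 2}, {"b": "c/d"})
same_name = rename_fields({"a": 1, "b": 2}, {"b": "b/d"})
across = rename_fields({"a": 1, "b": {"c": {"e": 2, "f": 20}}}, {"b/c/e": "b/d"})
print(flat, into_branch, same_name, across)
```

Removing the source before writing the target is what makes the {"b": "b/d"} case work: the old scalar under "b" is gone by the time the new nested "b" is created.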
References: templates.classification.multi_class.all, tasks.classification.multi_class