
Intent Induction from Conversations for Task-Oriented Dialogue

This repository contains data, relevant scripts and baseline code for the DSTC11 summer track on Intent Induction from Conversations for Task-Oriented Dialogue.

This track aims to evaluate methods for the automatic induction of customer intents in the realistic setting of customer service interactions between human agents and customers. As complete conversations will be provided, participants can make use of information in both agent and customer turns. The track includes two tasks: (1) intent clustering, which requires participants to assign labels to turns in the dialogues where customers express intents, and (2) open intent induction, in which participants must induce a set of intents from dialogues, with each intent defined by a list of sample utterances to be used as training data for an intent classifier.
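
To make this concrete (a rough sketch only: the Task 2 format below matches the sample format shown in the issues later on this page, while the Task 1 field names and labels are invented for illustration), a Task 1 prediction assigns a cluster label to each intent-bearing turn:

{"utterance_id": "banking_0042", "utterance": "I'd like to open a new checking account", "predicted_label": "cluster_3"}

and a Task 2 induced intent is an intent ID paired with sample utterances:

{"intent_id": "intent_1", "utterances": ["I want to book a flight", "I need a flight"]}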

Organizers: James Gung, Raphael Shu, Jason Krone, Salvatore Romeo, Arshit Gupta, Yassine Benajiba, Saab Mansour and Yi Zhang

Contact: dstc11-intent-induction (AT) amazon (DOT) com

Timeline

  • Development data release: June 13th, 2022
  • Test data release: September 26th, 2022
  • Entry submission deadline: October 3rd, 2022
  • Final result announcement: October 24th, 2022
  • Paper submission: December 2nd, 2022
  • Paper acceptance notification: January 20th, 2023
  • Camera-ready submission deadline: January 27th, 2023
  • DSTC11 Workshop: SIGDIAL x INLG 2023, September 11-15th, 2023

DSTC11 Track 2 Tasks

  • Task 1 (Intent Clustering): assign labels to turns in the dialogues where customers express intents.
  • Task 2 (Open Intent Induction): induce a set of intents from the dialogues, each defined by a list of sample utterances.

Running Baselines

Python 3 (>=3.7) is required. Using a conda/virtual environment is recommended.
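
For example, a fresh environment can be created with the standard venv module (conda works equally well):

# create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate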

# install dependencies
pip3 install -r requirements.txt

# run intent clustering (Task 1) baselines and evaluation
python3 -m sitod.run_experiment \
--data_root_dir dstc11 \
--experiment_root_dir results \
--config configs/run-intent-clustering-baselines.jsonnet

# run open intent induction (Task 2) baselines and evaluation
python3 -m sitod.run_experiment \
--data_root_dir dstc11 \
--experiment_root_dir results \
--config configs/run-open-intent-induction-baselines.jsonnet
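
Conceptually, an intent clustering baseline embeds the candidate utterances with a sentence encoder and clusters the embeddings; the actual baselines and their settings are defined by the jsonnet configs passed via --config. Below is a minimal self-contained sketch of that idea, not the repository's implementation; the encoder model and cluster count are assumptions:

# Illustrative embedding + k-means clustering pipeline (not the repo's code).
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

utterances = [
    "I want to book a flight",
    "I need a flight to Boston",
    "I'd like to reserve a hotel room",
    "Can you book me a room for tonight?",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
embeddings = encoder.encode(utterances)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for label, utterance in zip(labels, utterances):
    print(label, utterance)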

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Please cite the following papers if using the tasks, code, or data from this track in your work:

@misc{gung2023natcs,
      title={NatCS: Eliciting Natural Customer Support Dialogues}, 
      author={James Gung and Emily Moeng and Wesley Rose and Arshit Gupta and Yi Zhang and Saab Mansour},
      year={2023},
      eprint={2305.03007},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@misc{gung2023intent,
      title={Intent Induction from Conversations for Task-Oriented Dialogue Track at DSTC 11}, 
      author={James Gung and Raphael Shu and Emily Moeng and Wesley Rose and Salvatore Romeo and Yassine Benajiba and Arshit Gupta and Saab Mansour and Yi Zhang},
      year={2023},
      eprint={2304.12982},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}


Issues

Development dataset: empty utterances

Some utterances in the development dataset contain no words. Examples of the utterance content: '', '.', '...'.

The proposal describes the datasets as "spoken conversations with manual transcriptions and annotations".

The questions are:

  1. What is the reason for including empty transcriptions?
  2. Can they affect turns with intents in the evaluation datasets?

Additional info:
The development dataset contains no turns that have an Intent or InformIntent label together with an empty utterance.
The total number of empty utterances is 142.
The full list of turns with empty utterances: empty_utterances.txt

Development dataset: Agent turns with intent labels

Some of the Agent's turns contain intent labels.

The main task of the track is to identify the Customer's intents. However, the development dataset contains intent labels for Agent turns as well.

The questions are:

  1. Should we use only the Customer's turns for clustering? Can Agent turns with intent labels affect the results on the evaluation datasets?
  2. Was the classifier for the InformIntent label trained on all turns with intent labels, or only on Customer intents?
  3. Was the classifier for the InformIntent label trained on data from all domains (development and evaluation), or only on the particular domain?

Additional info:
The total number of Agent turns with intent labels is 78, including 1 that also carries an InformIntent label.
The full list of Agent turns with intent labels: agent_intent.txt

How will the submissions be evaluated?

Hi. The proposal clarifies that ACC and NMI will be used to evaluate models for Task 1, and that F1 and coverage will be used to evaluate models for Task 2. However, in the released raw results, only ACC is used to rank the submissions. I want to confirm which metrics will ultimately be used for these two tasks.

Thanks!
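
As background on the metrics in question: NMI is available directly in scikit-learn, and clustering accuracy (ACC) is conventionally computed by aligning predicted clusters to reference labels with the Hungarian algorithm. A small sketch of that convention, not the track's official scorer:

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    # Count co-occurrences of predicted cluster p and gold label t.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = max(y_true.max(), y_pred.max()) + 1
    counts = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        counts[p, t] += 1
    # Hungarian algorithm finds the best one-to-one cluster-to-label mapping.
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / len(y_true)

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]  # a pure relabeling of y_true
print(normalized_mutual_info_score(y_true, y_pred))  # 1.0
print(clustering_accuracy(y_true, y_pred))           # 1.0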

About the clustering run condition in experiment.py

https://github.com/amazon-research/dstc11-track2-intent-induction/blob/0312cbcd666dc3efb20a32a0658ca5e723081b78/sitod/experiment.py#L133

I think the intention is to skip running clustering when prediction results already exist and you don't want to overwrite them. The conditions would be:

skip condition: path.exists() and not self._overwrite
run condition: not (path.exists() and not self._overwrite), i.e. not path.exists() or self._overwrite

So it seems the "and not" should be changed to "or".
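
For clarity, the intended logic as a small self-contained sketch (the function and argument names here are illustrative, not the repository's):

from pathlib import Path

def should_run_clustering(output_path: Path, overwrite: bool) -> bool:
    # Skip when predictions already exist and overwriting is disabled:
    #   skip = output_path.exists() and not overwrite
    # By De Morgan's law, the run condition is the negation of that:
    return not output_path.exists() or overwrite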

A list of intents

You said you would provide a list of intents. Where is it? I hope you can answer. Thank you!

Why do the test-utterances.jsonl files have only two sentences?

Hi, I'm participating in DSTC11 Track 2.

I ran some experiments and got strange scores like:

| RunID | NMI | ARI | ACC | Precision | Recall | F1 | Example Coverage | Reference K | K | # Intents | # Utterances | # Utterances per Intent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| test_model | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 1.0 | 1.0 | 42.0 | 3684.0 | 87.7 |

So I checked the test-banking and test-finance datasets, but they contain only two sentences each:

{"utterance": "Dummy test utterance 1", "utterance_id": "finance_0000", "intent": "DummyLabel"}
{"utterance": "Dummy test utterance 2", "utterance_id": "finance_0001", "intent": "DummyLabel"}

Did I get the wrong datasets?

About ground-truth dialogue act labels

In the dev set, only predicted dialogue act labels are provided, so where can I find the ground-truth dialogue act labels? I think the utterances with intent labels include all of the intent-informing samples in the dev set; is that correct?

Thanks!

When will the test set be released?

Hello, excuse me: when will the test set be released, how will it be distributed, and how do I know if I've signed up successfully?

Thanks!

About submission

Is there a requirement for the submitted file name? There is also a field for an email address. Do we have to use a Google Mail address? (I don't mean the email address bound to the form.)

Is it right to use the word 'open' when explaining Task 1?

Hello, thank you for organizing this track! :)
I'm participating in Task 1 and have some questions.
In Task 1, there is no in-domain training data for the test set.
For this reason, I think Task 1 is also an 'open' intent clustering task.
Is it right to use the word 'open' to describe Task 1 in the workshop paper or other reports?

About the test set samples

I notice that in test-banking/dialogues.jsonl and test-finance/dialogues.jsonl, there are cases where "dialogue_acts" is ["InformIntent"] but the intents are [], and also cases where "dialogue_acts" is ["InformIntent"] and the intents are ["DummyLabel"]. Does this mean we only need to cluster the sentences with the ["DummyLabel"] intents?
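
For reference, the kind of filtering being asked about can be sketched as follows, assuming the dialogues.jsonl schema quoted above (the file path and the "turns" field name are assumptions):

import json

candidate_utterances = []
with open("test-banking/dialogues.jsonl") as f:  # path assumed
    for line in f:
        dialogue = json.loads(line)
        for turn in dialogue.get("turns", []):
            # Keep turns flagged with the InformIntent dialogue act.
            if "InformIntent" in turn.get("dialogue_acts", []):
                candidate_utterances.append(turn["utterance"])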

Results of the workshop paper

Hello, we saw "Camera-ready submission deadline: January 27th, 2023", but we have not received any news about the workshop. May I ask when you will announce it?

No Google account

Hello. It seems the submission is done via a Google Form, but I do not have a Google account and am having difficulty creating one. Are there other ways of submitting the files?

Submission code issue

Should we submit our complete code (how we trained the model, etc.) by this deadline ("Entry submission deadline: October 3rd, 2022"), or just submit one schema JSON file?
Thanks!

About dialogue acts in Task 2

In dstc11/task2-open-intent-induction.md, it is written that "Participants will be able to use the provided automatic dialog act classifier predictions as inputs to their system."
But there seems to be no related code.
What do we have to do to use that automatic dialogue act classifier?

Question about the test dataset in DSTC11

The test dataset contains only utterances labeled with InformIntent.
We wonder whether the input at evaluation time has the same format as the dev dataset on GitHub, e.g.:

{"intent_id": "intent_1", "utterances": ["I want to book a flight", "I need a flight", ...]}
{"intent_id": "intent_2", "utterances": ["I want to reserve a room", "I need a hotel room", ...]}
...

Thank you.
