DSTC 8: End-to-End Multi-Domain Track - Task 2 - Fast Adaptation


Fast Adaptation of Predicted User Responses in Goal-Oriented Dialogue

Welcome to the DSTC 8 competition! In this task, you will develop a system that can predict or generate a user's response in a dialogue from any domain.

Reminder: submissions are due Sunday October 13, 2019 at 11:59 pm Pacific Daylight Time (PDT)

News

  • [2019-11-05] Human evaluation results available; see the official DSTC 8 spreadsheet. Full rankings per test set, dialogue, and metric are available here.
  • [2019-10-21] Automated evaluation results are available, see below. Full results available here.
  • [2019-10-07] Submission deadline has been extended to Sunday October 13, 2019 at 11:59 pm Pacific Daylight Time (PDT)
  • [2019-09-24] Submission format details posted on the evaluation page.
  • [2019-09-23] Evaluation data is posted. See the evaluation and data pages for details.
  • [2019-07-15] Codalab competition back online. Due to a major outage in the Codalab platform, participants who registered before July 12, 2019 must re-register.
  • [2019-06-17] Task description and data are released.
  • [2019-06-10] Registration is Open! Registrants will be approved starting June 17, and will have access to the data then.

Results

Automated Results

Test Set                              | Metric             | Baseline 1 | Baseline 2  | Team A      | Team B      | Team C      | Team D
MetaLWOz (heldout) - pure task        | BLEU 1             | 0.0956738  | 0.071289217 | 0.247069566 | 0.127296662 | 0.139526286 | 0.092481424
MetaLWOz (heldout) - pure task        | BLEU 4             | 0.0235307  | 0.015362318 | 0.110891162 | 0.034489617 | 0.036630688 | 0.016711193
MetaLWOz (heldout) - cross task       | BLEU 1             | 0.0593872  | 0.050962451 | 0.172918246 | 0.103930693 | 0.122790246 | 0.093176351
MetaLWOz (heldout) - cross task       | BLEU 4             | 0.00412379 | 0.003077277 | 0.035006828 | 0.016665385 | 0.022570272 | 0.017658968
MultiWOz (single domain per dialogue) | BLEU 1             | 0.178067   | 0.143607441 | 0.392799366 | 0.114798727 | 0.100198825 | 0.2477592
MultiWOz (single domain per dialogue) | BLEU 4             | 0.0257248  | 0.014038822 | 0.157241504 | 0.021712508 | 0.019080919 | 0.068340561
MultiWOz (single domain per dialogue) | Intent F1          | 0.515258   | 0.466089053 | 0.78690419  | 0.644938428 | 0.613976032 | 0.549802744
MultiWOz (single domain per dialogue) | Intent + Slots F1  | 0.265817   | 0.195955486 | 0.599330053 | 0.483334653 | 0.418702922 | 0.423359061

Submission Evaluation

For the evaluation dataset, please check the data sources page.

Evaluation for this task uses both automatic and human metrics.

During development, participants can track their progress using word-overlap metrics, e.g. with nlg-eval. Depending on the parameters passed to scripts/make_test_set, you can measure within-task or cross-task generalization within a MetaLWOz domain.
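For example, assuming your predicted responses and the reference responses are written one per line to hyp.txt and ref.txt (illustrative file names), nlg-eval can be invoked from the command line:

nlg-eval --hypothesis=hyp.txt --references=ref.txt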

Towards the end of the evaluation phase, we will provide a zip file with dialogues in a novel domain, along with a file specifying the dialogues and turns that participants should predict. The file format is the same as the one produced by scripts/make_test_set: each line is a valid JSON object with the following schema:

{ "support_dlgs": ["SUPPORT_DLG_ID_1", "SUPPORT_DLG_ID_2", ...], "target_dlg": "TARGET_DLG_ID", "predict_turn": "ZERO-BASED-TURN-INDEX" }
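For instance, a single line of the test spec might look like the following (the dialogue IDs and turn index are purely illustrative):

{ "support_dlgs": ["dlg-00112233", "dlg-44556677"], "target_dlg": "dlg-8899aabb", "predict_turn": 4 }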

Dialogue IDs uniquely identify a dialogue in the provided MetaLWOz zip file.

To generate predictions, condition your (pre-trained) model on the support dialogues, and use the target dialogue history as context to predict the indicated user turn.

Make sure that (1) your model has never seen the test domain before predicting, and (2) your model is reset before it is adapted to the support set of each target dialogue.
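The following is a minimal Python sketch of that prediction loop. The Model interface (clone, adapt, predict_turn) and the dialogues lookup are hypothetical stand-ins for your own code, not part of the released scripts:

import json

def generate_predictions(spec_path, dialogues, pretrained_model, out_path):
    """Predict one user turn per test-spec entry, resetting the model each time."""
    with open(spec_path) as spec, open(out_path, "w") as out:
        for line in spec:
            entry = json.loads(line)
            model = pretrained_model.clone()   # reset: no state leaks between dialogues
            support = [dialogues[d] for d in entry["support_dlgs"]]
            model.adapt(support)               # fast adaptation on the support set
            target = dialogues[entry["target_dlg"]]
            turn = int(entry["predict_turn"])
            context = target["turns"][:turn]   # history up to the turn to predict
            response = model.predict_turn(context)
            out.write(json.dumps({"dlg_id": entry["target_dlg"],
                                  "predict_turn": turn,
                                  "response": response}) + "\n")

Note that the output lines of this sketch already follow the submission format described below.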

On the responses submitted by the participants, we will

  1. Run a fixed NLU module to determine whether response intents and slots are consistent with the ground truth.
  2. Ask crowd workers to evaluate informativeness and appropriateness of the responses.

Submission Format

Submissions should have one response per line, in JSON format, with this schema:

{ "dlg_id": "DIALOGUE ID FROM ZIP FILE", "predict_turn": "ZERO-BASED PREDICT TURN INDEX", "response": "PREDICTED RESPONSE" }

where dlg_id and predict_turn correspond to the target_dlg id and predict_turn of the test specification file above, respectively.
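For example (with an illustrative dialogue ID and response):

{ "dlg_id": "dlg-8899aabb", "predict_turn": 4, "response": "Yes, please book a table for two at 7 pm." }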

Additionally, we ask that submissions be clearly marked or annotated according to which test spec from the evaluation dataset they correspond to. This could mean using a subdirectory for each test spec, or corresponding prefixes in the filenames within the final zip archive.
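One possible archive layout, reusing the output directory names from the baseline commands below (the directory names are only a suggestion):

submission.zip
├── predictions-metalwoz-heldout-pure/
├── predictions-metalwoz-heldout-cross/
└── predictions-multiwoz2.0/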

A sample submission is available, based on responses generated with our retrieval baseline published on GitHub:

./scripts/retrieval-baseline predict your-model eval_data/dstc8-metalwoz-heldout.zip \
--test-spec eval_data/test-spec-metalwoz-held-out-pure-task.jsonl \
--nlg-eval-out-dir submission/predictions-metalwoz-heldout-pure

./scripts/retrieval-baseline predict your-model eval_data/dstc8-metalwoz-heldout.zip \
--test-spec eval_data/test-spec-metalwoz-held-out-cross-task.jsonl \
--nlg-eval-out-dir submission/predictions-metalwoz-heldout-cross

./scripts/retrieval-baseline predict your-model eval_data/dstc8-multiwoz2.0.zip \
--test-spec eval_data/test-spec-multiwoz2.0.jsonl \
--nlg-eval-out-dir submission/predictions-multiwoz2.0
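
The resulting prediction directories can then be packaged into a single archive for upload, e.g.:

cd submission && zip -r ../submission.zip .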

When submitting your results, please fill in the fields below in the submission form, and provide an email address at which the organizers can contact you if it differs from the one registered with Codalab.

  • Team Name
  • Method Name (brief)
  • Method Description (any other model or submission details you would like to provide)
  • Organization/Affiliation

By default, we will take your last submission to the platform as your final submission. We will evaluate additional submissions if our time and budget allow.

Terms and Conditions

By registering for this competition, the participant agrees to the following terms.

Publication

Participants will not publish their results, code, or models prior to the DSTC 8 workshop.

Disqualification

The organizers reserve the right to

  • reject submissions that violate the competition rules, are illegible, or are late.
  • reject submissions from multiple accounts deemed to belong to the same individual or team.
  • investigate suspicious submissions.
  • inspect the source code used to generate the submissions upon request.

Submission

Final submissions will be accepted up to the extended deadline of October 13, 2019 at 11:59 pm Pacific Daylight Time (PDT); see the News section above.

Results must be submitted from one Codalab account per team.

Each team may only have a single Codalab account.

The submission must be annotated with an affiliation to be considered for evaluation.

All results must be submitted through the Codalab platform.

Participants agree to not share code privately or outside of the Codalab platform until the DSTC 8 workshop.

Data Use

Participants agree to use the validation data provided in the evaluation phase for model validation and evaluation only.

Should participants use external data for model training or evaluation, they agree to either

  • use free, publicly available data, or
  • release the data,

and in either case to describe the external data in their submission.

Data for the competition is provided via external links and is subject to the licenses included therein.

Fair Use

Participants will not abuse the Codalab infrastructure to gain a competitive advantage in the competition.

Participants will conduct themselves in a respectful manner on the Codalab website or face disqualification.

Schedule

  • June 3, 2019: Registration opens.
  • June 17, 2019: Competition opens, Reddit and MetaLWOz training data are released, and development begins.
  • September 23, 2019: MultiWOz test data is released for participant evaluation.
  • October 13, 2019: Entry submission deadline.
  • October 21, 2019: Objective evaluation results are released.
  • TBA: Human evaluation results are released.
  • TBA: DSTC8 paper submission deadline.
  • TBA: DSTC8 workshop.

Task Description

Baseline code and other task details can be found here.

In goal-oriented dialogue, data is scarce. This is a problem for dialogue system designers, who cannot rely on large pre-trained models. The aim of our challenge is to develop natural language generation (NLG) models which can be quickly adapted to a new domain given a few goal-oriented dialogues from that domain.

The suggested approach roughly follows the idea of meta-learning (e.g. MAML: Finn, Abbeel & Levine, 2017; Antoniou et al., 2018; Ravi & Larochelle, 2017): during the training phase, train a model on the training domains such that it can be adapted quickly to a new domain.

During the evaluation phase, the model should predict the final user turn of an incomplete dialogue, given a limited number (on the order of hundreds) of example dialogues from the same, previously unseen domain.
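As a concrete illustration, here is a minimal first-order MAML-style meta-training step in Python/PyTorch. This is a sketch under assumptions, not the released baseline: model.loss(batch) is a hypothetical method returning a scalar loss on a batch of dialogue turns, and each element of domains is assumed to hold "support" and "query" dialogue batches for one training domain.

import copy
import torch

def meta_train_step(model, meta_optimizer, domains, inner_lr=1e-2, inner_steps=3):
    """One first-order MAML step: adapt a copy per domain, then update the shared init."""
    meta_optimizer.zero_grad()
    for domain in domains:                      # a meta-batch of training domains
        learner = copy.deepcopy(model)          # fresh copy, like resetting at test time
        inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        for _ in range(inner_steps):            # fast adaptation on the support set
            inner_opt.zero_grad()
            learner.loss(domain["support"]).backward()
            inner_opt.step()
        # First-order approximation: gradients of the query loss w.r.t. the
        # adapted weights are applied directly to the shared initialization.
        query_loss = learner.loss(domain["query"])
        grads = torch.autograd.grad(query_loss, learner.parameters())
        for p, g in zip(model.parameters(), grads):
            p.grad = g if p.grad is None else p.grad + g
    meta_optimizer.step()

At evaluation time, the same inner-loop adaptation would be run once on the support dialogues of the new domain before predicting the target turn.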

Organizers

You can contact all the contest organizers at dstc8-task2@microsoft.com.

The organizers, affiliated with MSR Montréal, are:

  • Hannes Schulz
  • Adam Atkinson
  • Shikhar Sharma
  • Mahmoud Adada
  • Kaheer Suleman

Test Submissions

Start: June 17, 2019, midnight

Description: Test the format of your submissions and troubleshoot errors here. Note that the leaderboard is not used.

Final Submissions

Start: June 17, 2019, midnight

Description: Final model predictions submitted to the competition.

Competition Ends

Oct. 14, 2019, 6:59 a.m.
