Official Ranking: https://docs.google.com/document/d/172Hp24wKTbaaY1veVOLDJg2zWtYCaHiNhLc6RO1Ubjw/edit?usp=sharing
We have sent access invites for the folders containing the files relevant to cross-team analysis to the teams (team creators) and participants who submitted predictions to the evaluation leaderboards. If you made a submission to the eval leaderboard but did not receive the invite, please email us at sopank@andrew.cmu.edu.
Participants who did not submit an official evaluation prediction but would like to take part in the cross-team analysis should also contact us at sopank@andrew.cmu.edu.
************
We request all authors of system papers to include one table per system/track/setting in their report with the following information:
- Track (coref/deixis/bridging)
- Setting (predicted or gold mentions?)
- Baseline(s) used/modified
- Learning framework (i.e., modifications to the baseline(s), including modifications to both training and decoding)
- Markable identification model
- Data used for training
- Data used for development
****************
**Fully annotated test data available now (Participate -> Get data)!**
****************
Submissions of System Descriptions and Analysis Papers
One unique aspect of this shared task is that we are inviting two types of submissions: first, system descriptions (due August 3), and then analysis papers (due September 6).
All shared task participants who have participated in at least one track are invited to submit a system description of up to 5 pages plus references. Participants who have participated in multiple tracks may add 2 extra pages per additional track (i.e., 2 extra pages for 2 tracks, or 4 extra pages for 3 tracks). Submissions may also add up to 2 pages per track for error analysis; this is optional but highly encouraged. Submissions should conform to the EMNLP 2021 format. Submissions of system descriptions are due on August 3 to the shared task softconf site: https://www.softconf.com/emnlp2021/CODICRAC2021/. Note that submitted drafts will be made publicly available to author teams participating in the Analysis paper call.
We are also making an open call for what we are calling Analysis Papers. The purpose of these papers is to present a vision statement for the field, based either on a cross-cutting comparison across the submitted systems or on a critique of the shared task itself, pointing to gaps in the field not addressed by this shared task. Analysis papers may focus on a single track or multiple tracks. By August 10, we will make available the submitted system descriptions and a performance table that reports each participating system's output on each instance. Analysis papers should be between 4 and 6 pages and conform to the EMNLP 2021 format. Submissions of analysis papers are due on September 6 to the shared task softconf site: https://www.softconf.com/emnlp2021/CODICRAC2021/.
Link to the style files: https://2021.emnlp.org/call-for-papers/style-and-formatting
Please feel free to reach out if you have questions or concerns: sharedtask-codicrac-emnlp2021@googlegroups.com
****************
Sopan Khosla (Carnegie Mellon University), Ramesh Manuvinakurike (Intel Labs), Vincent Ng (University of Texas at Dallas), Massimo Poesio (Queen Mary University of London), Michael Strube (Heidelberg Institute for Theoretical Studies), Carolyn Rosé (Carnegie Mellon University)
Contact Email: sharedtask-codicrac-emnlp2021@googlegroups.com
Welcome to the shared task on Anaphora Resolution in Dialogues. This shared task provides:
Three Tracks
Resolution of anaphoric identity
Resolution of bridging references
Resolution of discourse deixis/abstract anaphora
New paradigm: two-stage shared task to facilitate community-wide visioning
New emphasis on less-studied forms of anaphora: Abstract and Bridging
New Genre: Conversation
New computational techniques: transfer of learned representations across genres
New opportunities for interaction between communities: Discourse and Dialogue
New data set
This shared task is jointly run through CRAC 2021 and CODI 2021 at EMNLP 2021.
The first release of the data is coming on March 26, 2021!
The Future of Anaphora: Birds of a Feather Session for CODI-CRAC Shared Task at NAACL
We will host a Birds of a Feather session at NAACL at 2pm EST on June 8 (Zoom details: Participate -> Get data)! At this session, we will provide tips for using the provided baselines and scorer code, and answer any questions participants might have. All are welcome to participate in the dawning of a new era for anaphora research!
Coreference and anaphora resolution are long-studied problems in computational linguistics and NLP. Although multiple benchmark datasets have been developed in recent years, progress in this area has been hindered because most of these corpora do not emphasize potentially difficult cases. For example, datasets like OntoNotes [1], GAP [2], and LitBank [3] focus only on identity coreference and neglect relations like discourse deixis [9] or bridging anaphora [10], both of which introduce interesting research challenges.
Several works have shown the importance of syntax for anaphora resolution [4,5,6,7]. However, these features might not generalize well to conversations, where the language is often ungrammatical and full of disfluencies. In addition, anaphora resolution in dialogue requires systems to ground pronouns to speakers and to track long-distance conversation structure, complexities that are largely absent from the news and Wikipedia articles that make up a large share of current state-of-the-art coreference resolution datasets.
This shared task goes beyond the simple cases of coreference resolution that arguably lead to overestimates of the performance of current SOTA models. Its goal is to bring together researchers from disciplines like discourse analysis, dialogue systems, machine learning, and linguistics in order to pave the way for advances in coreference and anaphora resolution.
In this shared task, you will contribute approaches/models for addressing three types of anaphoric relations. The shared task is therefore structured into three sub-tasks. You have the option to participate in one or more of these sub-tasks. The three sub-tasks include:
Resolution of anaphoric identity
Resolution of bridging references
Resolution of discourse deixis/abstract anaphora
The data for the shared task includes conversations from five different domains:
ARRAU (Trains_91): Dev set available now!
Switchboard: Dev set available now!
AMI: Dev set available now!
Persuasion: Dev set available now!
Light: Dev set available now!
Since the main aim of this shared task is to create generalizable models, we will release only the dev/test sets for each of the five domains mentioned above. However, participants are free to use ARRAU_PEAR, ARRAU_RST, ARRAU_Trains93, ARRAU_GNOME, and other external data to train their models.
The datasets for the shared task will be released in the Universal Anaphora format. We encourage participants to refer to the most up-to-date documentation of the annotation format here.
The annotation of the data was co-sponsored by the Heidelberg Institute for Theoretical Studies gGmbH (HITS) and DALI.
Predictions for each dataset (during each phase) should be placed in separate directories named after the dataset. These directories should then be placed inside a single main directory, which is zipped (recursively) and submitted to Codalab.
Expected Directory Structure:
./Solution
|__ ARRAU
|__ prediction_file
|__ Switchboard
|__ prediction_file
|__ AMI
|__ prediction_file
|__ Persuasion
|__ prediction_file
|__ Light
|__ prediction_file
Participants are free to submit predictions for one or more datasets. However, the leaderboard will be updated using the latest submission and will not carry over scores from previous submissions.
E.g., if a participant submits their prediction for ARRAU in their first submission, i.e., ./solution/ARRAU/ARRAU_preds, with the other dataset folders empty, the leaderboard will show the ARRAU score and 0.0 for the other datasets. If they then want to submit Switchboard results to the leaderboard, the correct way is to submit both ./solution/ARRAU/ARRAU_preds (their best ARRAU predictions) and ./solution/Switchboard/Swbd_preds (the new Switchboard predictions) so that the leaderboard displays non-zero scores for both datasets.
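For convenience, the Python sketch below assembles this directory layout and zips it recursively. The prediction file names and local paths are hypothetical placeholders; only the five dataset directory names come from the instructions above.

# Package a submission: copy each (hypothetical) prediction file into its dataset
# directory under ./Solution, then zip the main directory recursively.
import shutil
from pathlib import Path

predictions = {
    "ARRAU": "outputs/arrau_predictions",          # hypothetical local paths
    "Switchboard": "outputs/switchboard_predictions",
    "AMI": "outputs/ami_predictions",
    "Persuasion": "outputs/persuasion_predictions",
    "Light": "outputs/light_predictions",
}

root = Path("Solution")
for dataset, pred_file in predictions.items():
    target_dir = root / dataset
    target_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(pred_file, target_dir / Path(pred_file).name)

# Creates Solution.zip containing Solution/<dataset>/<prediction_file>.
shutil.make_archive("Solution", "zip", root_dir=".", base_dir=str(root))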
We will evaluate the submissions for anaphoric identity and discourse deixis using CoNLL Avg. F1 score [1]. For bridging, we will report Entity F1 scores.
The shared-task scorer is derived from the universal-anaphora-scorer, and we encourage participants to use this repository to evaluate their models locally.
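As a minimal local-evaluation sketch in Python, the snippet below scores one dataset. It assumes you have cloned the universal-anaphora-scorer repository and that its ua-scorer.py script takes a key (gold) file followed by a system (prediction) file; please check the repository's README for the exact arguments, as this invocation is an assumption. All file paths are hypothetical.

# Score one dataset locally with the universal-anaphora-scorer (assumed invocation).
import subprocess

key_file = "data/AMI_dev.CONLLUA"             # gold annotations (hypothetical path)
sys_file = "outputs/ami_predictions.CONLLUA"  # model predictions (hypothetical path)

subprocess.run(
    ["python", "ua-scorer.py", key_file, sys_file],
    cwd="universal-anaphora-scorer",  # path to your local clone of the scorer repo
    check=True,
)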
Mar 26, 2021 - Training and development data released.
May 28 - Baselines and helper scripts released.
June 8 - Birds of a Feather session at NAACL 2021.
June 21 - Test data for Eval - AR, Eval - Br (Pred), and Eval - DD (Pred) released.
July 10 - Submission deadline Eval - AR, Eval - Br (Pred), and Eval - DD (Pred).
July 11 - Test data for Eval - Br (Gold) and Eval - DD (Gold) released.
July 21 - Submission deadline Eval - Br (Gold) and Eval - DD (Gold).
Aug 3 - System descriptions due (Stage 1).
Aug 4 - Error-analysis and cross-team discussions start (Stage 2).
Sep 6 - Analysis reports due (Stage 2).
Sep 20 - Accept/reject notifications.
Oct 1 - Camera-ready version due.
Nov 7-11 - EMNLP 2021.
Participants should not share this data outside the shared task!
Start: March 26, 2021, midnight
Description: Anaphora Resolution - Train your model on the official training set. Feel free to use external datasets or knowledge sources. Submit results on the validation data.
Start: March 26, 2021, midnight
Description: Bridging - Train your model on the official training set. Feel free to use external datasets or knowledge sources. Submit results on the validation data.
Start: March 26, 2021, midnight
Description: Discourse Deixis - Train your model on the official training set. Feel free to use external datasets or knowledge sources. Submit results on the validation data.
Start: June 21, 2021, midnight
Description: Anaphora Resolution - Evaluate your model on the official test set. You may use the validation set to aid training.
Start: July 11, 2021, midnight
Description: Bridging - Evaluate your model on the official test set using gold mentions. You may use the validation set to aid training.
Start: June 21, 2021, midnight
Description: Bridging - Evaluate your model on the official test set using system mentions. You may use the validation set to aid training.
Start: July 11, 2021, midnight
Description: Discourse Deixis - Evaluate your model on the official test set using gold mentions. You may use the validation set to aid training.
Start: June 21, 2021, midnight
Description: Discourse Deixis - Evaluate your model on the official test set using system mentions. You may use the validation set to aid training.
End: July 21, 2021, midnight