Second NADI Shared Task (Subtask 2.1)

Organized by chiyu94


Welcome to Subtask 2.1 of the Second NADI shared task!

Arabic has a wide variety of dialects, many of which remain under-studied, primarily due to a lack of data. The goal of the Nuanced Arabic Dialect Identification (NADI) shared task is to alleviate this bottleneck by providing the community with diverse data from 21 Arab countries. The data can be used for modeling dialects, and NADI focuses on dialect identification: the task of automatically detecting the source variety of a given text or speech segment. Previous work on Arabic dialect identification has focused on coarse-grained regional varieties such as Gulf or Levantine (e.g., Zaidan and Callison-Burch, 2013; Elfardy and Diab, 2013; Elaraby and Abdul-Mageed, 2018) or country-level varieties (e.g., Bouamor et al., 2018; Zhang and Abdul-Mageed, 2019), such as the MADAR shared task at WANLP 2019 (Bouamor, Hassan, and Habash, 2019). The MADAR shared task also involved city-level classification on human-translated data, and Abdul-Mageed, Zhang, Elmadany, and Ungar (2020) developed models for detecting city-level variation. NADI aims to maintain this theme of modeling fine-grained variation.

Shared Task:

NADI targets province-level dialects and, as such, is the first shared task to focus on naturally occurring fine-grained dialect at the sub-country level. The NADI 2020 shared task was held with WANLP 2020 (Abdul-Mageed, Zhang, Bouamor, and Habash, 2020). The NADI 2021 shared task will be held with WANLP@EACL2021 and will continue to focus on fine-grained dialects, with new datasets and efforts to distinguish both modern standard Arabic (MSA) and dialectal Arabic (DA) according to their geographical origin. The data covers a total of 100 provinces from all 21 Arab countries and comes from the Twitter domain. Evaluation and task setup follow the NADI 2020 shared task.

(To receive access to the data, teams intending to participate are invited to fill in the form on the official website of the NADI shared task.)

The subtasks involved include:

  • Subtask 1.1: Country-level MSA identification: A total of 21,000 tweets, covering 21 Arab countries.
  • Subtask 1.2: Country-level DA identification: A total of 21,000 tweets, covering 21 Arab countries.
  • Subtask 2.1: Province-level MSA identification: A total of 21,000 tweets, covering 100 provinces.
  • Subtask 2.2: Province-level DA identification: A total of 21,000 tweets, covering 100 provinces.

Unlabeled data: 

Participants will also be provided with an additional 10M unlabeled tweets that can be used in developing their systems for any of the subtasks.

Metrics:

The evaluation metrics will include precision/recall/f-score/accuracy. Macro Averaged F-score will be the official metric.
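Because macro-averaged F1 weights every class equally regardless of its frequency, a system cannot score well by favoring only the most common provinces. The following is a minimal pure-Python sketch of the metric; the province labels are hypothetical examples, and the organizers' own scoring script may differ in detail:

```python
def macro_f1(gold, pred):
    """Macro-averaged F1: per-class F1 scores averaged with equal weight,
    so rare classes count as much as frequent ones."""
    labels = set(gold) | set(pred)
    f1s = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
    return sum(f1s) / len(f1s)

# Hypothetical province labels, for illustration only.
gold = ["egy-cairo", "egy-cairo", "sau-riyadh", "mar-rabat"]
pred = ["egy-cairo", "sau-riyadh", "sau-riyadh", "mar-rabat"]
print(round(macro_f1(gold, pred), 4))  # → 0.7778
```

Note that the misclassified "egy-cairo" tweet lowers both that class's recall and "sau-riyadh"'s precision, which is why the macro F1 (0.7778) differs from the raw accuracy (0.75).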

Participating teams will be provided with a common training set and a common development set. No external manually labelled datasets are allowed. A blind test set will be used to evaluate the output of the participating teams. All teams are required to report results on both the development and test sets in their write-ups.

The shared task will be hosted on CodaLab. Teams will be provided with a CodaLab link for each subtask.

  • CodaLab link for NADI Shared Task Subtask 1.1: https://competitions.codalab.org/competitions/27768
  • CodaLab link for NADI Shared Task Subtask 1.2: https://competitions.codalab.org/competitions/27769
  • CodaLab link for NADI Shared Task Subtask 2.1: https://competitions.codalab.org/competitions/27770
  • CodaLab link for NADI Shared Task Subtask 2.2: https://competitions.codalab.org/competitions/27771

Important dates:

  • December 15, 2020: Release of training data and scoring script
  • December 27, 2020: Registration deadline
  • December 28, 2020: Test set made available
  • January 18, 2021: Codalab system submission deadline
  • January 27, 2021: Shared task system paper submissions due
  • February 8, 2021: Notification of acceptance
  • February 15, 2021: Camera-ready version of shared task system papers due (strict!)
  • April 19-20, 2021: Workshop dates

Contact:

Please visit the official website of the NADI shared task for more information.

For any questions related to this task, please contact the organizers directly using the following email address: ubc.nadi2020@gmail.com 

 


Terms and Conditions

To receive access to the data, teams intending to participate are invited to fill in the form on the official website of the NADI shared task.

Copyright (c) 2021 The University of British Columbia, Canada; Carnegie Mellon University Qatar; New York University Abu Dhabi. All rights reserved.

Development

Start: Dec. 11, 2020, noon

Description: Development phase: develop your models and submit prediction labels on the DEV set of Subtask 2.1. Note: your submission should be named 'teamname_subtask21_dev_numberOFsubmission.zip' and contain a text file of your predictions (e.g., the submission 'UBC_subtask21_dev_1.zip' is the zip file of the first prediction file, 'UBC_subtask21_dev_1.txt').
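As a rough illustration, an archive in this format can be produced with Python's standard zipfile module. The team name, submission number, and prediction labels below are placeholders, not prescribed values:

```python
import zipfile

# Placeholders: substitute your own team name, phase, and run number.
team, phase, run = "UBC", "dev", 1
# Hypothetical predicted labels, one per tweet, in the test-file order.
predictions = ["egy-cairo", "sau-riyadh"]

txt_name = f"{team}_subtask21_{phase}_{run}.txt"
zip_name = f"{team}_subtask21_{phase}_{run}.zip"

# Write one label per line, then wrap the text file in the zip archive.
with open(txt_name, "w", encoding="utf-8") as f:
    f.write("\n".join(predictions) + "\n")

with zipfile.ZipFile(zip_name, "w") as zf:
    zf.write(txt_name)
```

The same pattern applies to the Test and Post-Evaluation phases by changing the phase string in the file names.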

Test

Start: Dec. 27, 2020, noon

Description: Test phase: submit your prediction labels on the TEST set of Subtask 2.1. Each team is allowed a maximum of 3 submissions. Note: your submission should be named 'teamname_subtask21_test_numberOFsubmission.zip' and contain a text file of your predictions (e.g., the submission 'UBC_subtask21_test_1.zip' is the zip file of the prediction file 'UBC_subtask21_test_1.txt').

Post-Evaluation

Start: Jan. 29, 2021, 11:59 a.m.

Description: Post-Evaluation: submit your predictions on the TEST set of Subtask 2.1 after the competition deadline. Your submission should be named 'teamname_subtask21_test_numberOFsubmission.zip' and contain a text file of your predictions (e.g., the submission 'UBC_subtask21_test_1.zip' is the zip file of the prediction file 'UBC_subtask21_test_1.txt').

Competition Ends

Never
