MADAR Shared Task Subtask-2: Twitter User Dialect Identification

Organized by sabithassan


Development Phase
April 9, 2019, midnight UTC


Test Phase
May 6, 2019, midnight UTC


Competition Ends
May 14, 2019, midnight UTC


Welcome to Subtask 2 of the MADAR Shared Task on Arabic Fine-Grained Dialect Identification, organized at the Fourth Arabic Natural Language Processing Workshop (WANLP 2019). The goal of Subtask 2 is to predict the country of each Twitter user, out of 21 Arab countries, using information about the tweets posted by that user.


  • December 10, 2018: first announcement of the shared task
  • January 7, 2019: set up of shared task website
  • January 28, 2019: registration begins and release of initial training sets and scoring script
  • March 18, 2019: final training data release
  • April 29, 2019: registration deadline
  • May 6, 2019: test set available
  • May 13, 2019: systems' outputs collected
  • May 20, 2019: system results due to participants
  • May 27, 2019: shared task system papers due
  • June 10, 2019: reviews due
  • June 17, 2019: notification of acceptance
  • June 24, 2019: camera-ready version of shared task system papers due
  • August 1, 2019: ACL 2019 Workshop in Florence

Task Organisers

  • Houda Bouamor (Fortia Financial Solutions, France)
  • Sabit Hassan (Carnegie Mellon University Qatar, Qatar)
  • Nizar Habash (New York University Abu Dhabi, UAE)

For any questions related to this subtask, please post to this Google group, or contact the organizers directly using the following email address: 


  • Bouamor, H., Habash, N., Salameh, M., Zaghouani, W., Rambow, O., et al. (2018). The MADAR Arabic Dialect Corpus and Lexicon. In Proceedings of the 11th International Conference on Language Resources and Evaluation. (PDF)
  • Salameh, M., Bouamor, H. and Habash, N. (2018). Fine-Grained Arabic Dialect Identification. In Proceedings of the 27th International Conference on Computational Linguistics. (PDF)

Evaluation Criteria

Systems will be evaluated using Macro Averaged F1-score.
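
For orientation, macro-averaged F1 is the unweighted mean of the per-country F1 scores, so a country with few users counts as much as one with many. The sketch below is only an illustration and is not the official scoring script released with the task. The function name macro_f1, the choice to average over every label seen in either the gold or the predicted list, and the toy labels are all assumptions made for this example.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over all labels seen in gold or predicted.

    Illustrative helper only; the official scoring script is authoritative.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was the true label but was missed

    labels = set(gold) | set(predicted)
    if not labels:
        return 0.0

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with hypothetical user-level labels (not real task data).
gold = ["Egypt", "Egypt", "Qatar", "Iraq"]
pred = ["Egypt", "Qatar", "Qatar", "Iraq"]
print(round(macro_f1(gold, pred), 4))  # 0.7778
```

In the toy example, Egypt and Qatar each score 2/3 and Iraq scores 1, giving (2/3 + 2/3 + 1) / 3 = 7/9. Reported results should still come from the scoring script provided by the organizers.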

Submission format information is available from the 'Participate' tab above.

Shared Task Metrics and Restrictions:

The performance of submitted systems will be evaluated on
predictions of country labels for Twitter users in

IMPORTANT: Participants are NOT allowed to use
MADAR-Twitter-Subtask-2.DEV.user-label.tsv and
for training purposes. Participants must report the performance
of their best system on MADAR-Twitter-Subtask-2.DEV.user-label.tsv
in their Shared Task system description paper.

IMPORTANT: Participants can only use the ***text*** of the tweets
obtained through ( and the specific
information about the tweets provided in
Participants are NOT allowed to use additional tweets, nor
are they allowed to use outside information about the Twitter User.
Specifically, participants should not use metadata
from Twitter about the users or the tweets, e.g.,
geo-location data.

The training data from MADAR-Shared-Task-Subtask-1 is allowed.
External manually labelled data sets are *NOT* allowed.
However, the use of publicly available unlabelled data is allowed.


Copyright 2019 Carnegie Mellon University and New York University Abu
Dhabi. All Rights Reserved.

This work is licensed under the Creative Commons Attribution-NonCommercial-
NoDerivatives 4.0 International License.

If you use this resource, cite:

Bouamor, Houda, Sabit Hassan, Nizar Habash and Kemal Oflazer.
The MADAR Shared Task on Arabic Fine-Grained Dialect Identification.
In Proceedings of the Workshop for Arabic Natural Language Processing.
Florence, Italy, 2019.
