MADAR Shared Task Subtask-2: Twitter User Dialect Identification

Organized by sabithassan

Phases

  • Development Phase: starts April 9, 2019, midnight UTC
  • Test Phase: starts May 5, 2019, midnight UTC
  • Competition Ends: May 18, 2019, noon UTC

Task

Welcome to Subtask 2 of the MADAR Shared Task on Arabic Fine-Grained Dialect Identification, organized at the Fourth Arabic Natural Language Processing Workshop (WANLP 2019). The goal of Subtask 2 is to predict the country of each Twitter user, out of 21 Arab countries, using information about the tweets that user posted.

Required Task Description

  • All teams are required to provide a 1-page description of their submitted systems. A submission without this description is incomplete and will not be considered.
  • The description should be sent by email to madar.shared.task@gmail.com
  • The description should minimally include:
    • Team name: <team>
    • Number of runs submitted: <1,2,3>
    • Participants: <person1> <email> <affiliation> <country>
                    <person2> <email> <affiliation> <country>
                    ...
    • Resources used: <dictionary X>, <corpus Y>, ...
    • Tools used: <POS tagger X>, <IR system Y>, <word alignment system Z>, <machine learning library T>, ...
    • Techniques used: Any special techniques or insights

Optional Task Description (Published Paper)

  • Teams are strongly encouraged to submit 4-page system descriptions to be included in the proceedings of the Arabic Natural Language Processing Workshop.
  • Submission instructions are available at http://wanlp2019.arabic-nlp.net

 

Dates

  • April 29, 2019: Registration deadline
  • May 6, 2019: Test set available
  • May 17, 2019: CodaLab shared task submission deadline
  • May 17, 2019: Required task description submission deadline
  • May 27, 2019: Shared task system papers due
  • May 30, 2019: Notification of acceptance
  • June 5, 2019: Camera-ready version of shared task system papers due
  • August 1, 2019: ACL 2019 Workshop in Florence

Task Organisers

  • Houda Bouamor (Fortia Financial Solutions, France)
  • Sabit Hassan (Carnegie Mellon University Qatar, Qatar)
  • Nizar Habash (New York University Abu Dhabi, UAE)

For any questions related to this subtask, please post to the shared task Google group, or contact the organizers directly at madar.shared.task@gmail.com

References

  • Bouamor, H., Habash, N., Salameh, M., Zaghouani, W., Rambow, O., et al. (2018). The MADAR Arabic Dialect Corpus and Lexicon. In Proceedings of the 11th International Conference on Language Resources and Evaluation.
  • Salameh, M., Bouamor, H. and Habash, N. (2018). Fine-Grained Arabic Dialect Identification. In Proceedings of the 27th International Conference on Computational Linguistics.

Evaluation Criteria

Systems will be evaluated using the macro-averaged F1 score.
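As an illustration of the metric, the following is a minimal sketch of macro-averaged F1 over user-level country labels. It is not the official scorer: the function name `macro_f1` and the choice to average over every label that appears in either the gold or the predicted list are assumptions made here, and the official evaluation may define the label set differently.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 scores (hypothetical helper).

    gold, pred: equal-length sequences of country labels, one per user.
    Labels are taken from the union of gold and predicted labels
    (an assumption; the official scorer may use a fixed label list).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was missed

    labels = set(gold) | set(pred)
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)

    # Every label counts equally, regardless of how many users carry it.
    return sum(scores) / len(scores) if scores else 0.0
```

Because each country contributes equally to the average, a system that does well only on the most frequent countries scores lower under this metric than it would under plain accuracy.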

Submission format information is available from the 'Participate' tab above.

Shared Task Metrics and Restrictions:

The performance of submitted systems will be evaluated on
predictions of country labels for Twitter users in
MADAR-Twitter-Subtask-2.TEST.user-label.tsv.

IMPORTANT: Participants are NOT allowed to use
MADAR-Twitter-Subtask-2.DEV.user-label.tsv and
MADAR-Twitter-Subtask-2.DEV.user-tweet-features.tsv
for training purposes. Participants must report the performance
of their best system on MADAR-Twitter-Subtask-2.DEV.user-label.tsv
in their Shared Task system description paper.

IMPORTANT: Participants can only use the ***text*** of the tweets
obtained through MADAR-Obtain-Tweets.py and the specific
information about the tweets provided in
MADAR-Twitter-Subtask-2.TRAIN.user-tweet-features.tsv.
Participants are NOT allowed to use additional tweets, nor
are they allowed to use outside information about the Twitter user.
Specifically, participants should not use metadata from Twitter
about the users or the tweets, e.g., geo-location data.

The training data from MADAR-Shared-Task-Subtask-1 is allowed.
External manually labelled data sets are *NOT* allowed.
However, the use of publicly available unlabelled data is allowed.

License:

Copyright 2019 Carnegie Mellon University and New York University Abu
Dhabi. All Rights Reserved.

This work is licensed under the Creative Commons Attribution-NonCommercial-
NoDerivatives 4.0 International License.
(https://creativecommons.org/licenses/by-nc-nd/4.0/)

If you use this resource, cite:

Bouamor, Houda, Sabit Hassan, Nizar Habash and Kemal Oflazer.
The MADAR Shared Task on Arabic Fine-Grained Dialect Identification.
In Proceedings of the Workshop for Arabic Natural Language Processing.
Florence, Italy, 2019.

