MADAR Shared Task Subtask-1: Travel Domain Dialect Identification

Organized by sabithassan

Development Phase: starts April 9, 2019, midnight UTC
Test Phase: starts May 5, 2019, midnight UTC
Competition Ends: May 18, 2019, noon UTC
Task

Welcome to Subtask 1 of the MADAR Shared Task on Arabic Fine-Grained Dialect Identification, organized at the Fourth Arabic Natural Language Processing Workshop (WANLP 2019). In Subtask 1, participants are provided with a large-scale collection of parallel sentences in the travel domain covering the dialects of 25 cities from the Arab World plus Modern Standard Arabic (MSA). The task is to build systems that predict, for each given sentence, one of 26 dialect labels (the 25 city dialects plus MSA).

Required Task Description

  • All teams are required to provide a 1-page description of their submitted systems. Not providing this description will make the submission incomplete and it will not be considered.
  • The description should be sent by email to madar.shared.task@gmail.com
  • The description should minimally include: 
    • Team name: <team>
    • Number of runs submitted: <1,2,3>
    • Participants: <person1> <email> <affiliation> <country>
                    <person2> <email> <affiliation> <country>
                    ...
    • Resources used: <dictionary X>, <corpus Y>, ...
    • Tools used: <POS tagger X>, <IR system Y>, <word alignment system Z>, <machine learning library T>, ...
    • Techniques used: any special techniques or insights

Optional Task Description (Published Paper)

  • Teams are strongly encouraged to submit 4-page system descriptions to be included in the proceedings of the Arabic Natural Language Processing Workshop.
  • Submission instructions are here: http://wanlp2019.arabic-nlp.net

 

Dates

  • April 29, 2019: Registration deadline
  • May 6, 2019: Test set available
  • May 17, 2019: CodaLab shared task submission deadline
  • May 17, 2019: Required task description submission deadline
  • May 27, 2019: Shared task system papers due
  • May 30, 2019: Notification of acceptance
  • June 5, 2019: Camera-ready version of shared task system papers due
  • August 1, 2019: ACL 2019 Workshop in Florence

Task Organisers

  • Houda Bouamor (Fortia Financial Solutions, France)
  • Sabit Hassan (Carnegie Mellon University Qatar, Qatar)
  • Nizar Habash (New York University Abu Dhabi, UAE)

For any questions related to this subtask, please post to this Google group, or contact the organizers directly at the following email address: madar.shared.task@gmail.com

References

  • Bouamor, H., Habash, N., Salameh, M., Zaghouani, W., Rambow, O., et al. (2018). The MADAR Arabic Dialect Corpus and Lexicon. In Proceedings of the 11th International Conference on Language Resources and Evaluation.
  • Salameh, M., Bouamor, H. and Habash, N. (2018). Fine-Grained Arabic Dialect Identification. In Proceedings of the 27th International Conference on Computational Linguistics.

Evaluation Criteria

Systems will be evaluated using Macro Averaged F1-score.
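
As an illustration of the metric, macro-averaged F1 is the unweighted mean of the per-class F1 scores, so each dialect counts equally regardless of how many sentences it has. The sketch below is a minimal pure-Python version and is not the official scorer. The function name macro_f1 and the example labels are hypothetical, and it averages over the labels that appear in the gold or predicted lists, which may differ from how the official scorer fixes the label set.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over labels seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for class g
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[g] += 1      # g was the true class but was missed

    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    gold = ["A", "A", "B", "C"]
    pred = ["A", "B", "B", "C"]
    print(round(macro_f1(gold, pred), 4))
```

In this toy case classes A and B each score 2/3 and class C scores 1, so the macro average is 7/9.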

Submission format information is available from the 'Participate' tab above.

Shared Task Metrics and Restrictions:

The performance of submitted systems will be evaluated on
MADAR-Corpus26-test.tsv, which will be made available during the
evaluation phase. MADAR-Corpus6-train.tsv and
MADAR-Corpus6-dev.tsv are provided to aid in building the models.
Participants are welcome to use both of these files for training
purposes.

The training data from MADAR-Shared-Task-Subtask-2 is allowed.
External manually labelled data sets are *NOT* allowed.
However, the use of publicly available unlabelled data is allowed. 

IMPORTANT: Participants are NOT allowed to use
MADAR-Corpus26-dev.tsv for training purposes. Participants must
report the performance of their best system on
MADAR-Corpus26-dev.tsv in their Shared Task system description
paper.
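
One way to guard against accidentally training on the held-out development file is a small check in the data-loading code. The sketch below is only an illustration: the helper names read_tsv and check_training_files are hypothetical, and the two-column layout (sentence, then label, separated by a tab) is an assumption about the .tsv files that should be verified against the released data.

```python
import csv
import io
import os

# File that the rules above forbid for training.
FORBIDDEN_FOR_TRAINING = {"MADAR-Corpus26-dev.tsv"}


def read_tsv(handle):
    """Yield (sentence, label) pairs from an open text handle.

    Assumes two tab-separated columns per line (sentence, label);
    lines with fewer than two columns are skipped.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) >= 2:
            yield row[0], row[1]


def check_training_files(paths):
    """Return the paths unchanged, or raise if a forbidden file is included."""
    names = {os.path.basename(p) for p in paths}
    bad = names & FORBIDDEN_FOR_TRAINING
    if bad:
        raise ValueError("not allowed for training: " + ", ".join(sorted(bad)))
    return list(paths)


if __name__ == "__main__":
    # Placeholder content standing in for a real file.
    sample = io.StringIO("sentence one\tLABEL1\nsentence two\tLABEL2\n")
    print(list(read_tsv(sample)))
    print(check_training_files(["MADAR-Corpus6-train.tsv", "MADAR-Corpus6-dev.tsv"]))
```

MADAR-Corpus26-dev.tsv can still be read with the same loader for reporting development-set performance, as long as it is kept out of the training list.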

License:

Copyright 2018 Carnegie Mellon University and New York University Abu
Dhabi. All Rights Reserved.

A license to use and copy this dataset and its documentation solely
for your internal research and evaluation purposes, without fee and
without a signed licensing agreement, is hereby granted upon your
download of the dataset, through which you agree to the following: 1)
the above copyright notice, this paragraph and the following three
paragraphs will prominently appear in all internal copies and
modifications; 2) no rights to sublicense or further distribute this
software are granted; 3) no rights to modify this dataset are granted;
and 4) no rights to assign this license are granted. Please contact
the Carnegie Mellon University “CMU” Center for Technology Transfer
and Enterprise Creation, 4615 Forbes Avenue, Suite 302, Pittsburgh, PA
15213 - phone 412.268.7393, for commercial licensing opportunities, or
for further distribution, modification or license rights.

Created by Houda Bouamor, Nizar Habash, Mohammad Salameh, Wajdi
Zaghouani, Owen Rambow, Dana Abdulrahim, Ossama Obeid, Salam Khalifa,
Fadhl Eryani, Alexander Erdmann and Kemal Oflazer.

IN NO EVENT SHALL CMU OR NYU, OR THEIR EMPLOYEES, OFFICERS, AGENTS OR
TRUSTEES (COLLECTIVELY "CMU/NYU PARTIES") BE LIABLE TO ANY PARTY FOR
DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY
KIND, INCLUDING LOST PROFITS, ARISING OUT OF ANY CLAIM RESULTING FROM
YOUR USE OF THIS DATASET AND ITS DOCUMENTATION, EVEN IF ANY OF CMU/NYU
PARTIES HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH CLAIM OR DAMAGE.

CMU/NYU SPECIFICALLY DISCLAIMS ANY WARRANTIES OF ANY KIND REGARDING
THE DATASET, INCLUDING, BUT NOT LIMITED TO, NON-INFRINGEMENT, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE, OR THE ACCURACY OR USEFULNESS, OR COMPLETENESS OF THE
SOFTWARE. THE SOFTWARE AND ACCOMPANYING DOCUMENTATION, IF ANY,
PROVIDED HEREUNDER IS PROVIDED COMPLETELY "AS IS". REGENTS HAS NO
OBLIGATION TO PROVIDE FURTHER DOCUMENTATION, MAINTENANCE, SUPPORT,
UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

If you use this resource, cite:

Bouamor, Houda, Nizar Habash, Mohammad Salameh, Wajdi Zaghouani, Owen
Rambow, Dana Abdulrahim, Ossama Obeid, Salam Khalifa, Fadhl Eryani,
Alexander Erdmann and Kemal Oflazer. The MADAR Arabic Dialect Corpus
and Lexicon. In Proceedings of the International Conference on
Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 2018.

