Goals
To promote Asian language translation research
To promote the use of large, noisy parallel web corpora for MT
To promote research on processing of parallel texts
To promote research on exploiting sub-corpus provenance information
Important Dates
Feb 11, 2020: Release of the baseline system
Mar 23, 2020: Release of test data
Apr 20, 2020: Official submissions due by web upload
Apr 24, 2020: System description paper due
May 11, 2020: Review feedback
[All deadlines are 11:59 pm UTC-12 (“anywhere on Earth”)]
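Because "anywhere on Earth" deadlines are easy to misread, the following minimal sketch converts one of the dates above to UTC. The helper name `aoe_deadline_to_utc` is hypothetical and used only for illustration; the sketch assumes the deadline falls at 11:59 pm at a fixed UTC-12 offset, as stated above.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" (AoE) is a fixed offset of UTC-12.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_to_utc(year, month, day):
    """Return the UTC instant of 11:59 pm AoE on the given calendar date."""
    local = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local.astimezone(timezone.utc)


# Official submission deadline from the list above (Apr 20, 2020).
submission_deadline = aoe_deadline_to_utc(2020, 4, 20)
print(submission_deadline.isoformat())  # 2020-04-21T11:59:00+00:00
```

In other words, a deadline listed as Apr 20 closes shortly before noon UTC on Apr 21.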
Task Description
We offer two tasks:
Japanese-to-Chinese MT (link --> https://competitions.codalab.org/competitions/21430)
Chinese-to-Japanese MT
We provide a large, noisy set of Japanese-Chinese segment pairs built from web data.
We evaluate system translations on a (secret) mixed-genre test set, curated for high-quality segment pairs.
After receiving test data, participants have one week to submit translations.
After all submissions are received, we will post a populated leaderboard that will continue to receive post-evaluation submissions.
The evaluation metric for the shared task is 4-gram character Bleu.
We encourage participants to use only the provided parallel training data. Use of other, non-parallel data is allowed if thoroughly documented and, in principle, publicly available.
Participants must be willing to write a system description paper (in English), to promote knowledge sharing and rapid advancement of the field.
Acknowledgments: Thanks to Didi Chuxing for providing data and research time to support this shared task.
The script to be used for Bleu computation is here (almost identical to that in Moses with a few minor differences). Instructions for running the script are in the baseline code that we released for the shared task. (link)
[NEW] You can download the evaluation scripts via the "Learn the details" tab --> get_starting_kit. The kit includes examples of how to run the evaluation script on the development dataset, for reference.
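To give a rough idea of what 4-gram character Bleu measures, here is a minimal sketch of a corpus-level score over character n-grams. It is not the official evaluation script and may differ from it in details; the function names `char_ngrams` and `char_bleu` are hypothetical. The sketch assumes one reference per segment, ignores whitespace, and treats every remaining character as one token. Use the script from the starting kit for any reported results.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, ignoring whitespace (an assumption of this sketch)."""
    chars = [c for c in text if not c.isspace()]
    return Counter(tuple(chars[i:i + n]) for i in range(len(chars) - n + 1))


def char_bleu(hypotheses, references, max_n=4):
    """Corpus-level Bleu over characters, one reference per segment."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = 0
    ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += sum(1 for c in hyp if not c.isspace())
        ref_len += sum(1 for c in ref if not c.isspace())
        for n in range(1, max_n + 1):
            hyp_counts = char_ngrams(hyp, n)
            ref_counts = char_ngrams(ref, n)
            # Clipped counts: an n-gram is credited at most as often
            # as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the 1- to max_n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

Calling `char_bleu(system_outputs, reference_translations)` on two equal-length lists of strings returns a score between 0 and 100. Unlike typical implementations, this sketch applies no smoothing, so the score is 0 whenever any n-gram order has no match.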
We encourage participants to use only the provided parallel training data. Use of other data is allowed, if thoroughly documented (and in principle, publicly available). Participants must be willing to write a system description paper (in English), to promote knowledge sharing and rapid advancement of the field.
Task participants may use the released corpora for research purposes. Other uses should be cleared with data owners.
Download | Size (MB) | Phase |
---|---|---|
Starting Kit | 0.684 | #1 Post-evaluation phase |
We are maintaining a post-evaluation online leaderboard in Codalab. If you wish to submit your system, the rules and instructions for submission are as follows:
Rules of submission:
Instructions for submitting your system
Start: Jan. 17, 2020, midnight
April 21, 2020, 4:32 p.m.
# | Username | Score |
---|---|---|
1 | ajaynagesh | 26.3 |