Japanese-Chinese bidirectional machine translation (JA <--> ZH)

Organized by ajaynagesh

Current phase: Evaluation phase (started Jan. 17, 2020, midnight UTC)

Competition ends: April 21, 2020, 4:31 p.m. UTC

Welcome!

  • Goals

    • To promote Asian language translation research

    • To promote the use of large, noisy parallel web corpora for MT

    • To promote research on processing of parallel texts

    • To promote research on exploiting sub-corpus provenance information

 

  • Important Dates

    • Jan 17, 2020: Release of training and development resources

    • Feb 11, 2020: Release of the baseline system

    • Mar 23, 2020: Release of test data

    • Apr 20, 2020: Official submissions due by web upload

    • Apr 24, 2020: Initial, populated leaderboard release

    • Apr 24, 2020: System description paper due

    • May 11, 2020: Review feedback

    • May 18, 2020: Camera-ready paper due

 [All deadlines are 11:59 p.m. UTC-12 ("anywhere on Earth").]

  • Task Description

    • We offer two tasks:

      • Japanese-to-Chinese MT

      • Chinese-to-Japanese MT

    • We provide a large, noisy set of Japanese-Chinese segment pairs built from web data (a simple filtering sketch appears at the end of this section).

    • We evaluate system translations on a (secret) mixed-genre test set, curated for high-quality segment pairs.

    • After receiving test data, participants have one week to submit translations.

    • After all submissions are received, we will post a populated leaderboard that will continue to receive post-evaluation submissions.

    • The evaluation metric for the shared task is 4-gram character BLEU.

    • We encourage participants to use only the provided parallel training data. Use of other, non-parallel data is allowed if thoroughly documented and, in principle, publicly available.

    • Participants must be willing to write a system description paper (in English) to promote knowledge sharing and rapid advancement of the field.

  • Acknowledgments: Thanks to Didi Chuxing for providing data and research time to support this shared task.
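
Because the released training corpus is noisy web data, a common first step is simple corpus filtering before training. Below is a minimal sketch; the file name train.ja-zh.tsv, the tab-separated layout, and the thresholds are illustrative assumptions, not part of the official data release.

```python
def filter_pairs(lines, max_ratio=3.0, max_len=200):
    """Yield (ja, zh) segment pairs, dropping empty, duplicate, over-long,
    and length-ratio-outlier segments. Thresholds are illustrative only."""
    seen = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue
        ja, zh = cols[0].strip(), cols[1].strip()
        if not ja or not zh or (ja, zh) in seen:
            continue
        lj, lz = len(ja), len(zh)  # character counts suit ja/zh text
        if max(lj, lz) > max_len or max(lj, lz) / min(lj, lz) > max_ratio:
            continue
        seen.add((ja, zh))
        yield ja, zh

# Usage: stream the (assumed) tab-separated corpus file.
with open("train.ja-zh.tsv", encoding="utf-8") as f:
    for ja, zh in filter_pairs(f):
        print(ja, zh, sep="\t")
```

Character-based length ratios are a natural fit here because neither Japanese nor Chinese uses whitespace tokenization; stricter provenance-based filtering is also possible given the sub-corpus information the task highlights in its goals.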

Evaluation Criteria

The evaluation metric for the shared task is 4-gram character BLEU.

The script to be used for BLEU computation is here (almost identical to the one in Moses, with a few minor differences). Instructions for running the script are in the baseline code that we released for the shared task. (link)

[NEW] You can download the evaluation scripts via the "Learn the details" tab --> get_starting_kit. There you will find examples of how to run the evaluation script on the development dataset for reference.
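
As a quick sanity check before running the official script, the metric can be approximated in a few lines. The sketch below computes corpus-level character BLEU with up to 4-gram precisions and the standard brevity penalty; since the official script differs slightly from Moses BLEU, treat this as an approximation, not the official scorer.

```python
import math
from collections import Counter

def char_ngrams(text, n):
    """Count the character n-grams of a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def char_bleu(hyps, refs, max_n=4):
    """Corpus-level character BLEU (up to 4-grams) with brevity penalty.
    A sanity-check approximation; use the released script for official scores."""
    match, total = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
            total[n - 1] += sum(h.values())
            match[n - 1] += sum((h & r).values())  # clipped n-gram matches
    if min(match) == 0:
        return 0.0  # no smoothing, as in plain BLEU
    log_prec = sum(math.log(m / t) for m, t in zip(match, total))
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec / max_n)

# Example: score a one-sentence "corpus" (x100 for the conventional scale).
print(char_bleu(["今日は良い天気です"], ["今日はいい天気です"]) * 100)
```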

 

Terms and Conditions

We encourage participants to use only the provided parallel training data. Use of other data is allowed if thoroughly documented (and, in principle, publicly available). Participants must be willing to write a system description paper (in English) to promote knowledge sharing and rapid advancement of the field.

Task participants may use the released corpora for research purposes. Other uses should be cleared with the data owners.

Download       Size (MB)   Phase
Starting Kit   0.684       #1 Evaluation phase
