Japanese-Chinese bidirectional machine translation

Organized by ajaynagesh

First phase starts: Jan. 17, 2020, midnight UTC
Competition ends: never

Welcome!

  • Goals

    • To promote Asian language translation research

    • To promote the use of large, noisy parallel web corpora for MT

    • To promote research on processing of parallel texts

    • To promote research on exploiting sub-corpus provenance information


  • Important Dates

    • Jan 17, 2020: Release of training and development resources

    • Feb 11, 2020: Release of the baseline system

    • Mar 17, 2020: Release of test data

    • Mar 31, 2020: Official submissions due by web upload

    • Apr 6, 2020: Initial, populated leaderboard release

    • Apr 6, 2020: System description paper due

    • May 4, 2020: Review feedback

    • May 18, 2020: Camera-ready paper due

 [All deadlines are 11:59 p.m. UTC-12 (“anywhere on Earth”).]

  • Task Description

    • We offer two tasks:

      • Japanese-to-Chinese MT

      • Chinese-to-Japanese MT

    • We provide a large, noisy set of Japanese-Chinese segment pairs built from web data (a simple filtering sketch appears after this list).

    • We evaluate system translations on a (secret) mixed-genre test set, curated for high-quality segment pairs.

    • After receiving test data, participants have one week to submit translations.

    • After all submissions are received, we will post a populated leaderboard that will continue to receive post-evaluation submissions.

    • The evaluation metric for the shared task is 4-gram character BLEU.

    • We encourage participants to use only the provided parallel training data. Use of other, non-parallel data is allowed if it is thoroughly documented and, in principle, publicly available.

    • Participants must be willing to write a system description paper (in English) to promote knowledge sharing and rapid advancement of the field.

  • Acknowledgments: Thanks to Didi Chuxing for providing data and research time to support this shared task.
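Since the released training corpus is intentionally noisy web data, most participants will want to pre-filter it before training. The shared task does not prescribe any particular method; the sketch below is a hypothetical rule-based filter (the function keep_pair and all thresholds are illustrative assumptions, not part of the released tooling).

```python
# Hypothetical pre-filter for noisy Japanese-Chinese segment pairs.
# Nothing here is mandated by the shared task; all thresholds are illustrative.
import re

KANA = re.compile(r"[\u3040-\u30ff]")  # hiragana + katakana
CJK = re.compile(r"[\u4e00-\u9fff]")   # CJK unified ideographs

def keep_pair(ja, zh, min_len=2, max_len=200, max_ratio=2.5):
    """Return True if a (Japanese, Chinese) segment pair looks plausible."""
    ja, zh = ja.strip(), zh.strip()
    # Drop empty, very short, or very long segments.
    if not (min_len <= len(ja) <= max_len and min_len <= len(zh) <= max_len):
        return False
    # Drop pairs whose character lengths are wildly mismatched.
    if max(len(ja), len(zh)) / min(len(ja), len(zh)) > max_ratio:
        return False
    # Crude script check: the Japanese side should contain kana (this drops
    # all-kanji Japanese segments, a known limitation), and the Chinese side
    # should contain ideographs but no kana.
    if not KANA.search(ja) or KANA.search(zh):
        return False
    if not CJK.search(zh):
        return False
    return True

pairs = [("今日は良い天気です。", "今天天气很好。"),  # kept
         ("Click here!!!", "点击这里")]               # dropped: no kana, bad length ratio
print([keep_pair(ja, zh) for ja, zh in pairs])        # [True, False]
```

A real pipeline would usually layer deduplication, language identification, and model-based scoring on top of rule-based checks like these.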

NOTE: We are releasing the data to participants. Please register for the competition to get the dataset. We will update the leaderboard with the baseline model and release instructions for training a baseline model in the coming week. Stay tuned!

Evaluation Criteria

The evaluation metric for the shared task is 4-gram character BLEU.

The script to be used for BLEU computation is here (it is almost identical to the one in Moses, with a few minor differences). Instructions for running the script are included in the baseline code that we released for the shared task. (link)
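For intuition only, character-level BLEU can be reproduced by treating every character as a token and then applying the usual clipped-precision and brevity-penalty computation. The self-contained sketch below is our own illustration under that assumption, not the official script, so do not use it for reported numbers.

```python
# Minimal 4-gram character BLEU sketch (illustrative only; use the
# official shared-task script for any reported results).
import math
from collections import Counter

def char_ngrams(text, n):
    chars = list(text.replace(" ", ""))  # score at the character level
    return Counter(tuple(chars[i:i + n]) for i in range(len(chars) - n + 1))

def char_bleu(hypotheses, references, max_n=4):
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp.replace(" ", ""))
        ref_len += len(ref.replace(" ", ""))
        for n in range(1, max_n + 1):
            h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
            matches[n - 1] += sum(min(count, r[gram]) for gram, count in h.items())
            totals[n - 1] += sum(h.values())
    if min(matches) == 0 or min(totals) == 0:
        return 0.0  # no smoothing: any empty n-gram order zeroes the score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)

print(char_bleu(["今日は良い天気です"], ["今日はいい天気です"]))  # ≈ 0.60
```

Scoring at the character level sidesteps the word-segmentation ambiguity of Japanese and Chinese, which is presumably why the task uses character rather than word BLEU.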

Terms and Conditions

We encourage participants to use only the provided parallel training data. Use of other data is allowed if it is thoroughly documented and, in principle, publicly available. Participants must be willing to write a system description paper (in English) to promote knowledge sharing and rapid advancement of the field.

Task participants may use the released corpora for research purposes. Other uses should be cleared with data owners.
