Japanese-Chinese bidirectional machine translation (ZH --> JA)

Organized by ajaynagesh

Current phase

Post-evaluation phase (began Jan. 17, 2020, midnight UTC)

Competition Ends

April 21, 2020, 4:32 p.m. UTC

Welcome!

  • Goals

    • To promote Asian language translation research

    • To promote the use of large, noisy parallel web corpora for MT

    • To promote research on processing of parallel texts

    • To promote research on exploiting sub-corpus provenance information

 

  • Important Dates

    • Jan 17, 2020: Release of training and development resources 
    • Feb 11, 2020: Release of the baseline system

    • Mar 23, 2020: Release of test data

    • Apr 20, 2020: Official submissions due by web upload

    • Apr 24, 2020: Release of the official results
    • Apr 24, 2020: System description paper due

    • May 11, 2020: Review feedback returned

    • May 18, 2020: Camera-ready paper due

 [All deadlines are 11:59 pm UTC-12 (“anywhere on Earth”)]

  • Task Description

    • We offer two tasks:

      • Japanese-to-Chinese MT

      • Chinese-to-Japanese MT

    • We provide a large, noisy set of Japanese-Chinese segment pairs built from web data.

    • We evaluate system translations on a (secret) mixed-genre test set, curated for high quality segment pairs.

    • After receiving test data, participants have one week to submit translations.

    • After all submissions are received, we will post a populated leaderboard that will continue to receive post-evaluation submissions.

    • The evaluation metric for the shared task is 4-gram character BLEU.

    • We encourage participants to use only the provided parallel training data.  Use of other, non-parallel data is allowed if it is thoroughly documented and, in principle, publicly available.

    • Participants must be willing to write a system description paper (in English), to promote knowledge sharing and rapid advancement of the field.

  • Acknowledgments:  Thanks to Didi Chuxing for providing data and research time to support this shared task.

Evaluation Criteria

The evaluation metric for the shared task is 4-gram character BLEU.

The script to be used for BLEU computation is here (almost identical to the one in Moses, with a few minor differences). Instructions for running the script are included in the baseline code that we released for the shared task. (link)

[NEW] You can download the evaluation scripts via the "Learn the details" tab --> get_starting_kit. There you will find examples of how to run the evaluation script on the development dataset for reference.
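For intuition about the metric, here is a minimal sketch of 4-gram character-level BLEU: the standard geometric mean of 1- to 4-gram precisions with a brevity penalty, computed over characters rather than words (the usual choice for Chinese and Japanese). This is illustrative only; the official script linked above is authoritative and may differ in smoothing and multi-reference handling.

```python
import math
from collections import Counter

def char_ngrams(text, n):
    # Every character is a token, as is typical for Chinese/Japanese BLEU.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def char_bleu(hypothesis, reference, max_n=4):
    """Single-reference, character-level BLEU up to max_n-grams."""
    if not hypothesis:
        return 0.0
    precisions = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        total = max(sum(hyp.values()), 1)
        # Floor each precision so log(0) never collapses the geometric mean.
        precisions.append(max(overlap, 1e-9) / total)
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = 1.0 if len(hypothesis) >= len(reference) \
        else math.exp(1 - len(reference) / len(hypothesis))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match scores 1.0, and any mismatched character lowers every n-gram precision that covers it, which is why character BLEU is sensitive to even small edits in CJK text.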

Terms and Conditions

We encourage participants to use only the provided parallel training data. Use of other data is allowed, if thoroughly documented (and in principle, publicly available). Participants must be willing to write a system description paper (in English), to promote knowledge sharing and rapid advancement of the field.

Task participants may use the released corpora for research purposes.  Other uses should be cleared with data owners.

Download       Size (MB)   Phase
Starting Kit   0.684       #1 Post-evaluation phase

We are maintaining a post-evaluation online leaderboard on CodaLab. If you wish to submit your system, the rules and submission instructions are as follows:

Rules of submission: 

  • Your system should be trained ONLY on the training data resources provided by us through the competition (Participate -> Get Data and Submission Instructions).
  • You are NOT allowed to use external data or resources when training your MT system. This avoids overlap between the training data and the blind test data, which we sourced from parallel test sentences on the internet (authored between Jan 2020 and Mar 2020).
  • Use of pre-trained language models (trained on external resources) is allowed.

Instructions for submitting your system 

  1. Your system's output should contain the same number of lines as the input test data (make sure to verify this by running `wc -l`).
  2. Each segment should read like normal Japanese.
  3. Name your submission file 'answer.txt'.
  4. Create a 'zip' file using only your submission file 'answer.txt'.
  5. You may rename your zip file to an appropriate name.
  6. Submit the zip file via "Participate" -> "Submit/View Results". Enter the meta data fields. When you press the submit button, you can select the zip file created on your system for submission. 
  7. You can check the status of your submission (and errors, if any) by refreshing the status page.
  8. Press "Submit to leaderboard" button
  9. To view your submission on the leaderboard, go to the "Results" tab.
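Steps 1-4 above can be sketched as a small helper. This is an illustrative, hypothetical script (the function name and the example zip name `submission.zip` are our own); only the required file name 'answer.txt' comes from the instructions.

```python
import zipfile

def package_submission(output_lines, test_input_path, zip_name="submission.zip"):
    """Check the line count against the test input, write answer.txt, zip it."""
    # Step 1: the output must have exactly as many lines as the test input
    # (equivalent to comparing `wc -l` on both files).
    with open(test_input_path, encoding="utf-8") as f:
        expected = sum(1 for _ in f)
    if len(output_lines) != expected:
        raise ValueError(f"line count mismatch: {len(output_lines)} vs {expected}")
    # Step 3: the submission file must be named exactly 'answer.txt'.
    with open("answer.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(output_lines) + "\n")
    # Step 4: the zip must contain only 'answer.txt'.
    with zipfile.ZipFile(zip_name, "w", zipfile.ZIP_DEFLATED) as z:
        z.write("answer.txt")
    return zip_name
```

The remaining steps (metadata entry, upload, "Submit to leaderboard") happen in the CodaLab web interface.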


#   Username      Score
1   Ajay_backup   26.30