Triangular MT: Using English to improve Russian-to-Chinese machine translation

Organized by ajaynagesh

Welcome!

Task Description

Given a low-resource language pair (X/Y), the bulk of previous MT work has pursued one of two strategies.

  • Direct: Collect parallel X/Y data from the web, and train an X-to-Y translator, OR
  • Pivot: Collect parallel X/English and Y/English data (often much larger than X/Y data), train two translators (X-to-English + English-to-Y), and pipeline them to form an X-to-Y translator
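The pivot strategy above amounts to composing two translators. A minimal sketch follows, assuming each translator is simply a function from a list of segments to a list of segments; the names `make_pivot_translator` and `make_lookup_translator`, and the toy dictionaries, are hypothetical stand-ins for trained models and are not part of the released baseline.

```python
from typing import Callable, Dict, List

# Assumption: a "translator" is any function mapping source segments to target segments.
Translator = Callable[[List[str]], List[str]]


def make_pivot_translator(src_to_en: Translator, en_to_tgt: Translator) -> Translator:
    """Pipeline an X-to-English and an English-to-Y translator into an X-to-Y one."""
    def translate(segments: List[str]) -> List[str]:
        english = src_to_en(segments)   # step 1: X -> English
        return en_to_tgt(english)       # step 2: English -> Y
    return translate


def make_lookup_translator(table: Dict[str, str]) -> Translator:
    """Toy stand-in for a trained model: whole-segment dictionary lookup."""
    def translate(segments: List[str]) -> List[str]:
        # Unknown segments are passed through unchanged.
        return [table.get(s, s) for s in segments]
    return translate


# Hypothetical toy data, for illustration only.
ru_to_en = make_lookup_translator({"кошка": "cat"})
en_to_zh = make_lookup_translator({"cat": "猫"})
ru_to_zh = make_pivot_translator(ru_to_en, en_to_zh)
```

Errors made in the first step propagate to the second, which is one reason to look beyond a plain pipeline.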

However, there are many other possible strategies for combining such resources. These may involve, for example, ensemble methods, multi-source training methods, multi-target training methods, or novel data augmentation methods.

The goals of this shared task are to promote:

  • translation between non-English languages,
  • optimally mixing direct and indirect parallel resources, and
  • exploiting noisy, parallel web corpora

Task: Russian-to-Chinese machine translation

We provide three parallel corpora: 

  • Chinese/Russian: crawled from the web and aligned at the segment level, and combined with different public resources
  • Chinese/English: combining several public resources
  • Russian/English: combining several public resources

We evaluate system translations on a (secret) mixed-genre test set, drawn from the web and curated for high-quality segment pairs. After receiving the test data, participants have one week to submit translations. After all submissions are received, we will post a populated leaderboard that will continue to receive post-evaluation submissions.

Important Dates

  • Apr 5, 2021: Release of training and development resources
  • Apr 5, 2021: Release of the baseline system
  • Jul 12, 2021: Release of test data
  • Jul 22, 2021: Official submissions due by web upload
  • Jul 26, 2021: Release of the official results
  • Aug 5, 2021: System description paper due
  • Sep 5, 2021: Review feedback
  • Sep 15, 2021: Camera-ready papers due
  • Nov 10-11, 2021: Workshop

Contacts

Chair: Ajay Nagesh (DiDi Labs, USA)
Email: ajaynagesh@didiglobal.com   

Organizers

  • Arkady Arkhangorodsky (DiDi Labs, USA)
  • Ajay Nagesh, Chair (DiDi Labs, USA)
  • Kevin Knight (DiDi Labs, USA)

Acknowledgments: 

Thanks to Didi Chuxing for providing data and research time to support this shared task.

Evaluation Criteria

The task is (one-way) Russian-to-Chinese translation. The evaluation metric for the shared task is 4-gram character BLEU.

The script used for BLEU computation is almost identical to the one in Moses, with a few minor differences. Instructions for running the script are in the baseline code that we released for the shared task.
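To make the metric concrete, here is a minimal sketch of corpus-level character BLEU with n-grams up to length 4. It is not the official script: it assumes one reference per segment, ignores whitespace, applies no smoothing, and uses the standard brevity penalty, so its scores may differ from the official ones. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace (an assumption of this sketch)."""
    chars = [c for c in text if not c.isspace()]
    return Counter(tuple(chars[i:i + n]) for i in range(len(chars) - n + 1))


def char_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus-level character BLEU on a 0-100 scale, one reference per segment."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += sum(1 for c in hyp if not c.isspace())
        ref_len += sum(1 for c in ref if not c.isspace())
        for n in range(1, max_n + 1):
            hyp_counts = char_ngrams(hyp, n)
            ref_counts = char_ngrams(ref, n)
            # Clipped counts: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity * math.exp(log_precision)
```

Scoring at the character level avoids the need for a Chinese word segmenter, whose choices would otherwise affect the n-gram counts.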

Terms and Conditions

  • Participants must use only the provided parallel training data. Use of other data is not allowed.
  • Participants must be willing to write a system description paper (in English), to promote knowledge sharing and rapid advancement of the field.
  • Task participants may use the released corpora for research purposes. Other uses should be cleared with data owners.

Development phase

Start: April 5, 2021, midnight

Evaluation

Start: July 12, 2021, midnight

Post-Evaluation

Start: July 22, 2021, 11:59 p.m.

Competition Ends

July 22, 2021, 11:59 p.m.

#  Username         Score
1  CSS_TSC          27.6
2  JeonghyeokPark   26.8
3  kaiyuhuang       21.7