Hi Ajay,
Thanks for your prompt reply. I have downloaded and checked the details of this shared task. May I ask:
1. We understand the blind test set will be mixed-domain/mixed-genre (e.g. spoken, news, travel). Is the domain distribution of the test set similar to that of the dev set?
2. Can we use pre-trained models such as mBART and BERT to initialize the parameters of our NMT models?
3. The task page only mentions "4-gram character BLEU". Is this the same toolkit (sacrebleu --tokenize zh) as in the WMT21 News English-to-Chinese task? If not, could you please provide us with the evaluation script?
Thanks for your help. Have a nice day.
Cheers,
Longyue
Hi Longyue,
To answer your questions:
1. We cannot say much about the test dataset, as it is a blind test set.
2. Yes, you may use pre-trained models to initialize your NMT system (a sketch of one way to do this follows below).
3. The evaluation script and examples of its usage are available at: https://github.com/didi/wmt2021_triangular_mt/tree/master/eval (see also the scoring example below).
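
For question 2, here is a minimal sketch of warm-starting from mBART, assuming the Hugging Face transformers library and the facebook/mbart-large-cc25 checkpoint; the ru_RU/zh_CN language codes are illustrative, so adjust them to your actual language pair:

    # Sketch: initialize an NMT model from a pre-trained mBART checkpoint
    # (assumes: pip install transformers sentencepiece)
    from transformers import MBartForConditionalGeneration, MBartTokenizer

    # Load the pre-trained weights as the starting point for fine-tuning
    model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")
    tokenizer = MBartTokenizer.from_pretrained(
        "facebook/mbart-large-cc25",
        src_lang="ru_RU",  # illustrative source language code
        tgt_lang="zh_CN",  # illustrative target language code
    )

    # From here, fine-tune `model` on the task's parallel data as usual.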
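
And for question 3, this is an example of computing character-level BLEU for Chinese with sacrebleu's Python API, which corresponds to the --tokenize zh command-line flag; the toy sentences are placeholders, and please treat the official script linked above as authoritative:

    # Sketch: 4-gram character BLEU for Chinese via sacrebleu's zh tokenizer
    # (assumes: pip install sacrebleu)
    import sacrebleu

    hyps = ["这是一个测试。"]    # system outputs, one per segment
    refs = [["这是一个测试。"]]  # one inner list per reference set

    # tokenize="zh" segments Chinese into characters before computing BLEU
    bleu = sacrebleu.corpus_bleu(hyps, refs, tokenize="zh")
    print(f"BLEU = {bleu.score:.2f}")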
- Ajay
Posted by: ajaynagesh @ April 29, 2021, 6:01 p.m.