Hi Ajay,
Thanks for your prompt reply. I have downloaded and checked the details of this shared task. May I ask:
1. We understand the blind test set will be mixed-domain/mixed-genre (e.g. spoken, news, travel). Is the domain distribution of the test set similar to that of the dev set?
2. Can we use pre-trained models such as mBART and BERT to initialize the parameters of our NMT models?
3. The task page only mentions "4-gram character BLEU". Is this the same toolkit (sacrebleu --tokenize zh) as in the WMT21 News English-to-Chinese task? If not, could you please provide us with the evaluation script?
Thanks for your help. Have a nice day.
Cheers,
Longyue
Hi Longyue,
To answer your questions:
1. We cannot say much about the test dataset, as it is a blind test set.
2. Yes, you may use pre-trained models to initialize your NMT system (a sketch of one way to do this follows below).
3. The evaluation script and examples of its usage are available at: https://github.com/didi/wmt2021_triangular_mt/tree/master/eval (see also the scoring example below).
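
For question 2, here is a minimal sketch of warm-starting from mBART, assuming the Hugging Face transformers library and the facebook/mbart-large-cc25 checkpoint; the ru_RU/zh_CN language codes are illustrative, so adjust them to your actual language pair:

    # Sketch: initialize an NMT model from a pre-trained mBART checkpoint
    # (assumes: pip install transformers sentencepiece)
    from transformers import MBartForConditionalGeneration, MBartTokenizer

    # Load the pre-trained weights as the starting point for fine-tuning
    model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")
    tokenizer = MBartTokenizer.from_pretrained(
        "facebook/mbart-large-cc25",
        src_lang="ru_RU",  # illustrative source language code
        tgt_lang="zh_CN",  # illustrative target language code
    )

    # From here, fine-tune `model` on the task's parallel data as usual.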
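
And for question 3, this is an example of computing character-level BLEU for Chinese with sacrebleu's Python API, which corresponds to the --tokenize zh command-line flag; the toy sentences are placeholders, and please treat the official script linked above as authoritative:

    # Sketch: 4-gram character BLEU for Chinese via sacrebleu's zh tokenizer
    # (assumes: pip install sacrebleu)
    import sacrebleu

    hyps = ["这是一个测试。"]    # system outputs, one per segment
    refs = [["这是一个测试。"]]  # one inner list per reference set

    # tokenize="zh" segments Chinese into characters before computing BLEU
    bleu = sacrebleu.corpus_bleu(hyps, refs, tokenize="zh")
    print(f"BLEU = {bleu.score:.2f}")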
- Ajay
Posted by: ajaynagesh @ April 29, 2021, 6:01 p.m.