NLP Class Project

Organized by cerberusd

First phase starts: June 30, 2013, midnight UTC

Competition ends: Never

Welcome!

For this project, your goal is to improve on a transformer baseline for a simplified version of the WMT21 Machine Translation using Terminologies task.
You will explore methods that incorporate terminologies into either the training or the inference process, in order to improve both the accuracy and consistency of MT systems.

We consider the English-to-French translation task, and evaluation is performed on the TICO-19 dataset, which is part of the official evaluation for the WMT21 task.

To help you focus on developing interesting methods, we provide a baseline system. To get started, please visit the GitHub repo.

Evaluation and Submission

To simplify the submission process, we accept prediction output files in the format produced by the baseline code.

To submit a result, make a zip file containing the two files "predict_results.json" and "eval_results.json", and upload it from the Participate tab.
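
Packaging the submission can be scripted; the following is a minimal sketch using Python's standard zipfile module (the output name "submission.zip" is only an illustration, any name works):

# Minimal sketch: bundle the two result files produced by the baseline
# into a zip archive ready to upload from the Participate tab.
import zipfile

result_files = ["predict_results.json", "eval_results.json"]

with zipfile.ZipFile("submission.zip", "w") as zf:
    for name in result_files:
        # Store each file at the top level of the archive (no subdirectories).
        zf.write(name, arcname=name)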

 

"predict_results.json" should contain your test bleu, for example:

{
  "predict_bleu": 37.6491,
  "predict_gen_len": 39.9957,
  "predict_loss": 1.2598096132278442,
  "predict_runtime": 73.9915,
  "predict_samples": 2100,
  "predict_samples_per_second": 28.382
}

 

"eval_results.json" should contain your dev bleu, for example:

{
  "epoch": 3.0,
  "eval_bleu": 40.3334,
  "eval_gen_len": 36.5963,
  "eval_loss": 0.8021446466445923,
  "eval_runtime": 32.7068,
  "eval_samples": 971,
  "eval_samples_per_second": 29.688
}

 

  • The scoring code looks for the key "predict_bleu" in predict_results.json and "eval_bleu" in eval_results.json, so if your results are saved in some other format, make sure these keys are present in each file (see the sketch after this list).
  • Note that while we require both the test and dev files, the final evaluation for the project will be based only on the test BLEU score (predict_results.json).
  • If you are unable to produce an eval_results.json file, you can submit a copy of predict_results.json renamed to "eval_results.json", with the key "predict_bleu" changed to "eval_bleu".
  • Please note that CodaLab can sometimes have random delays! If you are experiencing problems with submitting results, please check the submission log to see if there are any problems, and if there is an unresolved issue, let us know!
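
As a quick sanity check before zipping, a small script along the following lines (not part of the baseline; file names as above) can verify that both required keys are present, and fall back to deriving eval_results.json from predict_results.json as described in the third bullet:

# Minimal sketch: check for the keys the scoring code looks for, and
# create eval_results.json from predict_results.json if it is missing.
import json
import os

with open("predict_results.json") as f:
    predict = json.load(f)
assert "predict_bleu" in predict, 'predict_results.json must contain "predict_bleu"'

if not os.path.exists("eval_results.json"):
    # Fallback described above: reuse the test result as the dev file,
    # renaming the key "predict_bleu" to "eval_bleu".
    with open("eval_results.json", "w") as f:
        json.dump({"eval_bleu": predict["predict_bleu"]}, f, indent=2)

with open("eval_results.json") as f:
    assert "eval_bleu" in json.load(f), 'eval_results.json must contain "eval_bleu"'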

Rules

Please submit only the actual output of your code. For reproducibility, we will require code submissions for all final solutions.

