The official shared task on Quality Estimation aims to further examine automatic methods for estimating the quality of neural machine translation output at run-time, without relying on reference translations. As in previous years, we cover estimation at various levels. Important elements introduced this year include: a new task where sentences are annotated with Direct Assessment (DA) scores instead of labels based on post-editing; a new multilingual sentence-level dataset mainly from Wikipedia articles, where the source articles can be retrieved for document-wide context; the availability of NMT models to explore system-internal information for the task.
In addition to generally advancing the state of the art at all prediction levels for modern neural MT, the shared task pursues a number of specific goals.
Official task webpage: QE Shared Task 2020
This submission platform covers Task 3: Document-level MQM score.
In the Task 3 MQM subtask, participating systems are required to predict document-level quality in the form of MQM scores. Submissions will be evaluated according to how well they score translations. We thus expect an absolute quality score for each translated document.
Submission Format
For the MQM subtask, your system should produce scores for the translations at the document level. Since documents are organized in different directories, you also need to identify which document a score is assigned to. Each output line should be formatted in the following way:
<METHOD NAME> <DOCUMENT ID> <MQM SCORE>
Where:
METHOD NAME is the name of your quality estimation method.
DOCUMENT ID is the identifier of the translation you are scoring; it is the name of the corresponding directory.
MQM SCORE is the predicted MQM score for the document.
Lines can be in any order. Each field should be delimited by a single tab character.
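As a concrete illustration, here is a minimal Python sketch that writes predictions in this tab-separated format. The method name MY_QE_SYSTEM, the document IDs, and the scores are placeholders, not values taken from the task data.

```python
# Minimal sketch: write predictions in the required tab-separated format.
# MY_QE_SYSTEM, the document IDs, and the scores below are placeholders.

predictions = {
    "doc_0001": -0.35,  # hypothetical document ID (directory name) -> MQM score
    "doc_0002": 0.12,
}

with open("predictions.txt", "w", encoding="utf-8") as out:
    for doc_id, score in predictions.items():
        # <METHOD NAME> \t <DOCUMENT ID> \t <MQM SCORE>
        out.write(f"MY_QE_SYSTEM\t{doc_id}\t{score}\n")
```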
To allow the automatic evaluation of your predictions, please submit them in a file named predictions.txt and package it in a zipped file (.zip).
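A minimal sketch of the packaging step, using Python's standard zipfile module; the archive name submission.zip is an assumption, since only the inner file name predictions.txt is prescribed above.

```python
# Minimal sketch: package predictions.txt into a zip archive for upload.
# The archive name "submission.zip" is a placeholder.
import zipfile

with zipfile.ZipFile("submission.zip", "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.write("predictions.txt")
```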
Submissions will be evaluated in terms of Pearson's correlation between the true and predicted MQM document-level scores.
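For a local sanity check, Pearson's correlation can be computed as sketched below with scipy.stats.pearsonr; the gold and predicted values are made-up examples, and the official scoring script may differ in its exact setup.

```python
# Minimal sketch: Pearson's correlation between gold and predicted
# document-level MQM scores, using scipy. Values below are made up.
from scipy.stats import pearsonr

gold = [0.10, -0.25, 0.40, 0.05]        # hypothetical gold MQM scores
predicted = [0.08, -0.30, 0.35, 0.10]   # hypothetical system predictions

r, p_value = pearsonr(gold, predicted)
print(f"Pearson r = {r:.3f}")
```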
The data is publicly available, but since it has been provided by our industry partners it is subject to specific terms and conditions. However, these have no practical implications for the use of this data for research purposes.
Participants are allowed to explore any additional data and resources deemed relevant.
Each participating team can submit at most 30 systems for each of the language pairs of each subtask (max 5 a day).
Start: April 19, 2020, midnight
End: Never