Eval4NLP 2021 - Explainable Quality Estimation

Organized by plj129

Explainable Quality Estimation

[Update: 10/08/2021] The submission instructions have been updated: an additional metadata.txt file must now be included in the submission zip file for the test phases. See the Submission Instructions tab for details.

This shared task, organized by the 2nd Eval4NLP workshop, consists of building a quality estimation system that (i) predicts a quality score for an input pair of source text and MT hypothesis and (ii) provides word-level evidence for its predictions as explanations. In other words, the explanations should highlight the specific errors in the MT output that lead to the predicted quality score (see the interface sketch below). We will evaluate how similar the generated explanations are to human explanations, using a test set with manually annotated rationales.
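Concretely, a system maps a (source, hypothesis) pair to a sentence score plus per-token scores. The sketch below illustrates that interface only; all names are hypothetical, and the placeholder logic stands in for a real QE model combined with an explainability method:

```python
# Minimal sketch of the expected system interface. All names are
# hypothetical; the task does not prescribe any implementation.
from dataclasses import dataclass
from typing import List

@dataclass
class Prediction:
    sentence_score: float             # (i) sentence-level quality score
    target_token_scores: List[float]  # (ii) one relevance score per MT token
    source_token_scores: List[float]  # optional: one score per source token

def predict(source: str, hypothesis: str) -> Prediction:
    src_tokens = source.split()
    tgt_tokens = hypothesis.split()
    # Placeholder: a real system would run a QE model here and attribute
    # its score to tokens (e.g. via attention, gradients, or rationales).
    return Prediction(
        sentence_score=0.0,
        target_token_scores=[0.0] * len(tgt_tokens),
        source_token_scores=[0.0] * len(src_tokens),
    )
```

In the evaluation described below, higher token scores are treated as stronger evidence that a token is part of a translation error.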


Phases

The competition consists of two main phases.

  • DEVELOPMENT PHASE: Submit your predictions and explanations on the dev set.
    (For each language pair, max submissions per day = 999; max submissions overall = 999)
    • Estonian-English (Et-En)
    • Romanian-English (Ro-En)
  • TEST PHASE: Submit your predictions and explanations on the test set.
    (For each language pair, max submissions per day = 5; max submissions overall = 30)
    • Estonian-English (Et-En)
    • Romanian-English (Ro-En)
    • German-Chinese (De-Zh)
    • Russian-German (Ru-De)

Evaluation Criteria

The aim of the evaluation is to assess the quality of the explanations, not the sentence-level predictions. The word-level explanations will therefore be scored with three metrics: (1) AUC (area under the ROC curve), (2) AP (average precision), and (3) Recall at top-K. The leaderboard will be sorted by the average rank of the three metrics on the target sentence explanations (i.e., Rank Target); a sketch of the metrics follows below.
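As an illustration, the per-sentence computation of the three metrics might look as follows, assuming binary gold rationales (1 = token is part of an error) and real-valued token scores. The organizers' exact definition of K in Recall at top-K is not restated here; the sketch assumes K equals the number of gold error tokens:

```python
# Word-level explanation metrics on a toy example.
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

def recall_at_top_k(gold: np.ndarray, scores: np.ndarray) -> float:
    """Fraction of gold error tokens among the K highest-scored tokens,
    with K assumed to be the number of gold error tokens."""
    k = int(gold.sum())
    if k == 0:
        return 0.0
    top_k = np.argsort(-scores)[:k]       # indices of the k highest scores
    return float(gold[top_k].sum()) / k   # fraction of gold errors retrieved

gold = np.array([0, 1, 1, 0, 0])              # toy gold rationale
scores = np.array([0.1, 0.9, 0.4, 0.3, 0.2])  # toy model token scores

print(roc_auc_score(gold, scores))            # AUC
print(average_precision_score(gold, scores))  # AP
print(recall_at_top_k(gold, scores))          # Recall at top-K
```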

Terms and Conditions

  1. The training and development data, derived from the MLQE-PE dataset, are subject to the same terms and conditions as that dataset; these have no practical implications for the use of the data in research. The data are publicly available under Creative Commons Zero v1.0 Universal.
  2. The test data collected for this shared task will be distributed under the MIT License.
  3. During the test phase, for each of the language pairs, each participating team can make at most 30 submissions (max 5 submissions a day).
  4. By submitting results to this competition, the participants consent to the public release of their scores at the Eval4NLP workshop and in the associated proceedings, at the task organizers' discretion. Participants further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgment that the submission was incomplete, erroneous or deceptive.

Submission Instructions

Each submission is a zip file consisting of three or four files.

  • metadata.txt must have exactly three non-empty lines.
    • The first line contains your team name. You may use your CodaLab username as your team name.
    • The second line must be either constrained or unconstrained, indicating the submission track: constrained means that you did not train your system on word-level labels, whereas unconstrained means that you did.
    • The third line contains a short description (2-3 sentences) of the system you used to generate the results. This description will not be shown to other participants.
  • sentence.submission with sentence-level scores, one score per line.
  • target.submission with target token-level scores. Each line must contain a sequence of scores separated by white space. The number of scores must correspond to the number of target tokens.
  • (Optional) source.submission with source token-level scores. Each line must contain a sequence of scores separated by white space. The number of scores must correspond to the number of source tokens.

Examples of the submission files for the two development phases can be found here.
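For concreteness, a submission zip could be assembled as in the sketch below, assuming the scores have already been computed. The file names follow the instructions above; the team name, track, and description are placeholders:

```python
# Write the three required files and package them into a submission zip.
import zipfile

sentence_scores = [0.71, 0.42]                 # one score per sentence
target_scores = [[0.1, 0.9, 0.3], [0.2, 0.8]]  # one score per target token

with open("metadata.txt", "w") as f:
    f.write("my_team\n")                       # line 1: team name
    f.write("constrained\n")                   # line 2: submission track
    f.write("A toy system description.\n")     # line 3: short description

with open("sentence.submission", "w") as f:
    f.writelines(f"{s}\n" for s in sentence_scores)

with open("target.submission", "w") as f:
    f.writelines(" ".join(map(str, row)) + "\n" for row in target_scores)

with zipfile.ZipFile("submission.zip", "w") as zf:
    for name in ["metadata.txt", "sentence.submission", "target.submission"]:
        zf.write(name)
```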

Schedule

In each phase, submit your results on the dev or test set of the corresponding language pair.

  • DEVELOPMENT: Et-En: starts June 19, 2021, midnight UTC
  • DEVELOPMENT: Ro-En: starts June 19, 2021, midnight UTC
  • TEST: Et-En: starts Aug. 20, 2021, midnight UTC
  • TEST: Ro-En: starts Aug. 20, 2021, midnight UTC
  • TEST: De-Zh: starts Aug. 20, 2021, midnight UTC
  • TEST: Ru-De: starts Aug. 20, 2021, midnight UTC

Competition Ends

Sept. 4, 2021, noon UTC

Leaderboard

  #  Username      Score
  1  Raphael_NICT  1.000
  2  mtreviso2     2.000
  3  Gringham      4.333