SemEval 2020 Task 2: Predicting Multilingual and Cross-Lingual (Graded) Lexical Entailment

Organized by gg42554


Shared Task Competition (Test)
Feb. 19, 2020, midnight UTC


After SemEval 2020 Competition
Feb. 20, 2020, midnight UTC


Competition Ends



For SemEval 2020 Task 2 participants: we have published the official evaluation results. The results can be obtained here. Instructions on preparing and submitting your system description papers will follow shortly.


Dear SemEval participants: the evaluation period for Task 2 ended on March 11. The submissions will be evaluated and the results published by Wednesday, March 18, at the latest. We thank all teams for participating. After publishing the results, we will send detailed instructions on preparing your SemEval system description papers.

TEST DATA RELEASED (19.02.2020)!

The test data is available at:

Detailed instructions about (1) the test data (content, formatting, languages, tasks, ...) and (2) submission procedure (how, in which format, how many runs, until when, ...) are given here:

For questions and clarifications, feel free to email us (please prefix your subject line with [SemEval Task 2]):

goran@informatik.uni-mannheim and/or



- Development sets released for 5 languages and 10 cross-lingual pairs, available here. Please carefully read the README file included in the archive. 

- The official evaluation period starts on February 19, 2020. This is when we will release the test sets (which will include one more surprise language). The evaluation period ends on March 11, which is the deadline for participants to submit their runs. We will release detailed instructions on how and where to submit your runs (at the latest with the release of the test data, on Feb 19).



This shared task is about predicting binary and graded Lexical Entailment (i.e., is-a or hyponym-hypernym relation) for several different languages (multilingual component) and across languages for several language pairs (cross-lingual component).

For Graded LE, the participants need to predict the degree (on a 0-6 scale) to which the LE relation holds between two given concepts (the two concepts in each pair come from the same language in the multilingual subtasks and from different languages in the cross-lingual subtasks). For Binary LE, the participants merely need to predict whether or not the LE relation holds between the two concepts.

The two main branches of subtasks are as follows: 

- Subtask 1: Monolingual in multiple languages (i.e., multilingual)
- Subtask 2: Cross-lingual

We cover the following languages along with their language ISO codes:

- English: EN
- German: DE
- Italian: IT
- Croatian: HR
- Turkish: TR (currently not covered in the trial data; data preparation and annotation are in progress, and the data will be available at the beginning of September)
- One surprise evaluation language
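
As a rough illustration of how the five named languages give rise to 5 monolingual datasets and 10 cross-lingual pairs, the sketch below enumerates candidate file names using the L1_L2_gr.txt / L1_L2_bin.txt pattern described in the evaluation section of this page. The lowercase codes follow the examples given there (en_en_gr.txt, de_de_bin.txt); the order of the two languages within a cross-lingual file name is an assumption, and `dataset_names` is a hypothetical helper, not part of any official tooling. The README shipped with the data remains the authoritative source.

```python
from itertools import combinations

# Lowercase ISO codes of the five named languages (the surprise language is unknown).
LANGUAGES = ["en", "de", "it", "hr", "tr"]


def dataset_names(languages, suffix):
    """Return (monolingual, cross-lingual) file names for a task suffix.

    suffix is "gr" for graded LE or "bin" for binary LE.
    ASSUMPTION: each unordered language pair appears once, in list order.
    """
    mono = [f"{lang}_{lang}_{suffix}.txt" for lang in languages]
    cross = [f"{l1}_{l2}_{suffix}.txt" for l1, l2 in combinations(languages, 2)]
    return mono, cross


mono, cross = dataset_names(LANGUAGES, "gr")
print(len(mono), "monolingual files, e.g.", mono[0])
print(len(cross), "cross-lingual files, e.g.", cross[0])
```

Five languages yield C(5, 2) = 10 unordered pairs, which matches the number of cross-lingual development sets announced above.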

We will evaluate systems in two different tracks: 

  • DIST track: Systems using only distributional information from large corpora
  • ANY track: All systems, using any kind of information, including lexico-semantic resources (e.g., WordNet, BabelNet)

The trial data has been released (for online and offline evaluation); see the "Data" tab. The development data will be released by Sep 5, 2019. The official evaluation (with the release of the test data) starts on January 10, 2020, and will be open until January 31, 2020.


Graded LE:

  • Evaluation will be performed in terms of Spearman correlation between the gold LE scores (0-6) and predicted LE scores
  • The trial and test data (for online evaluation via CodaLab) are released in the same format: a directory with a number of files in the format L1_L2_gr.txt containing pairs of concepts (one from L1 and the other from L2; for monolingual datasets, L1 = L2, e.g., en_en_gr.txt)
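
To make the Graded LE metric concrete, here is a minimal sketch of Spearman correlation computed as the Pearson correlation of average ranks (ties share the mean of their positions). It is not the official scorer: the function names are hypothetical, and the organizers' script may rely on a library routine such as `scipy.stats.spearmanr` instead. The scores in the usage lines are made up for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(gold, predicted):
    """Spearman correlation between gold LE scores and predicted LE scores."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    rg, rp = average_ranks(gold), average_ranks(predicted)
    n = len(rg)
    mean_g, mean_p = sum(rg) / n, sum(rp) / n
    cov = sum((a - mean_g) * (b - mean_p) for a, b in zip(rg, rp))
    var_g = sum((a - mean_g) ** 2 for a in rg)
    var_p = sum((b - mean_p) ** 2 for b in rp)
    if var_g == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_g * var_p)


# Illustrative (made-up) scores on the 0-6 scale.
gold_scores = [0.0, 1.5, 3.0, 6.0]
system_scores = [0.2, 0.9, 4.1, 5.5]
print(spearman(gold_scores, system_scores))
```

Because only the rank order matters, a system does not need to reproduce the 0-6 values exactly; any scores that order the pairs the same way as the gold annotations obtain the same correlation.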

Binary LE:

  • Evaluation will be performed in terms of F1 score (comparison of gold binary 0/1 labels and binary predictions)
  • The trial and test data (for online evaluation via CodaLab) are released in the same format: a directory with a number of files in the format L1_L2_bin.txt containing pairs of concepts (one from L1 and the other from L2; for monolingual datasets, L1 = L2, e.g., de_de_bin.txt)
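
The Binary LE metric can be sketched in the same spirit. The snippet below computes F1 for the positive class (label 1, meaning the LE relation holds). Treating 1 as the positive class and reporting a single-class F1 rather than an averaged variant are assumptions, since this page does not specify them; `binary_f1` is a hypothetical name, and a library function such as `sklearn.metrics.f1_score` could be used instead. The labels are made up for illustration.

```python
def binary_f1(gold, predicted):
    """F1 of the positive class (label 1) for gold and predicted 0/1 labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative (made-up) labels: 2 true positives, 1 false positive, 1 false negative.
gold_labels = [1, 1, 0, 0, 1]
system_labels = [1, 0, 0, 1, 1]
print(binary_f1(gold_labels, system_labels))
```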


There are two evaluation tracks:

  • DIST: models/systems using distributional information (large text corpora) only. Models participating in this track are not allowed to use external lexico-semantic resources (e.g., WordNet or BabelNet) or any resources other than unlabeled textual corpora; 
  • ANY: models/systems are allowed to use any kind of resources (including lexico-semantic resources).

Groups can submit more than one system only if the systems differ from one another in a meaningful way; if unsure, contact the organizers. In any case, the maximum number of runs per track (for the official SemEval evaluation) is 3 (i.e., max 3 runs for the DIST track and max 3 runs for the ANY track; in both cases the runs must have meaningful distinctions between them).

All data for this task is released under the Creative Commons License (Non-Commercial-Attribution-ShareAlike).

Organizers of the competition might choose to publicize, analyze and change in any way any content sent as a part of this task. Whenever appropriate, an academic citation for the submitting team will be added (e.g., in a task overview paper).

Participants should comply with any general rules of SemEval.

The organizers are free to penalize or disqualify participants for any violation of the above rules or for misuse, unethical behaviour or other behaviours they agree are not accepted in a scientific competition in general and in the specific one at hand.


Start: Aug. 13, 2019, midnight


