WiC-TSV: Word-in-Context Target Sense Verification

This competition is based on the WiC-TSV (Word-in-Context Target Sense Verification) evaluation benchmark. It extends the SuperGLUE WiC task but is self-contained. The task models additional phenomena related to Word Sense Disambiguation, i.e., identifying the correct meaning of a word in context. The main difference from WiC is that relevant sense information, such as hypernyms and definitions, is provided, which makes the task a direct proxy for downstream evaluation: in WiC-TSV a single word is presented with its context and this sense information, whereas the original WiC dataset presents two usages of the same word. This setting is arguably more realistic and resembles the use of automatic tagging in enterprise settings. For instance, an Indonesian company may want to retrieve all sentences that refer to the island of Java and none that refer to other, unrelated senses.

WiC-TSV is used for a shared task at the IJCAI-20 SemDeep workshop. For questions about WiC-TSV, you can contact one of the organisers (information below).

NOTE: The test set was updated on April 30. It contains both in-domain (WordNet/Wiktionary) and out-of-domain (cocktails, medical entities and computer science) subsets.

NEW! Results are now available here. Instructions for participants on writing system description papers are given below.

Task Details

Formally, WiC-TSV is framed as a binary classification task. Each instance in WiC-TSV consists of a target word w with a corresponding target sense s, represented by its definition (subtask 1), its hypernym/s (subtask 2) or both (subtask 3), and a context c containing the target word w. The task is to determine whether the meaning of the word w used in the context c matches the target sense s. The following table shows some examples from the dataset. The data is in English; training, development and test data are already available here.

Sentence | Definition | Hypernyms
--------------------------------------------------
Smoking is permitted. | The act of smoking tobacco or other substances | breathing, external respiration, respiration, ventilation
All that work went down the sewer | Someone who sews | needleworker

WiC-TSV has three subtasks; participants can submit results for any of them:

Subtask 1: Definitions

In Subtask 1, systems use definitions to decide whether the target word in context corresponds to the given definition.

Subtask 2: Hypernyms

In Subtask 2, systems use hypernymy information to decide whether the target word in context is a hyponym of the given hypernym.

Subtask 3: Definitions + Hypernyms

In Subtask 3, systems can use both sources of information, i.e., definitions and hypernyms.
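
The relation between an instance and the three subtasks can be sketched in Python as follows. This is only an illustration: the class name, field names and the helper `sense_description` are assumptions made for this example and do not reflect the official data format. The example instance is taken from the table above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    """One WiC-TSV example. Field names are illustrative, not the official schema."""
    target_word: str                 # w
    context: str                     # c, a sentence containing w
    definition: Optional[str] = None       # sense information for subtasks 1 and 3
    hypernyms: Optional[List[str]] = None  # sense information for subtasks 2 and 3
    label: Optional[bool] = None     # True if w in c matches the target sense s


def sense_description(inst: Instance, subtask: int) -> str:
    """Return the sense information a system may use in the given subtask."""
    if subtask not in (1, 2, 3):
        raise ValueError("subtask must be 1, 2 or 3")
    parts = []
    if subtask in (1, 3):
        if inst.definition is None:
            raise ValueError("this subtask needs a definition")
        parts.append(inst.definition)
    if subtask in (2, 3):
        if not inst.hypernyms:
            raise ValueError("this subtask needs at least one hypernym")
        parts.append(", ".join(inst.hypernyms))
    return " ; ".join(parts)


example = Instance(
    target_word="sewer",
    context="All that work went down the sewer",
    definition="Someone who sews",
    hypernyms=["needleworker"],
)

for subtask in (1, 2, 3):
    print(subtask, "->", sense_description(example, subtask))
```

A classifier for any subtask would then take the context, the target word and the returned description as input and output a binary decision.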


During the test phase, test data will be provided and participants can submit their results to the leaderboard. Participants can submit results for one, two or all three of the subtasks. A maximum of two outputs per subtask is allowed. Attendance at the SemDeep workshop to share the results is encouraged but not mandatory. All participants, whether or not they attend the workshop, will be asked to write a short system description paper (up to four pages).

System description papers

Participants should submit a system description paper of up to 4 pages (plus one extra page for the camera-ready version), excluding references. These papers should describe the methodology of the system and should be self-contained.

Please find below the reference paper for the WiC-TSV task with more details about the construction of the dataset and baselines, which you can cite in your description paper:

Anna Breit, Artem Revenko, Kiamehr Rezaee, Mohammad Taher Pilehvar and Jose Camacho-Collados (2020) 
WiC-TSV: An Evaluation Benchmark for Target Sense Verification of Words in Context,
arXiv preprint arXiv:2004.15016

Guidelines to write a good system description paper can be found here (SemEval). All papers should follow ACL 2020 templates, which can be downloaded from the ACL website.

Please submit your papers via EasyChair, following this link.

Contact Info

Anna Breit
Artem Revenko
Semantic Web Company

Jose Camacho-Collados
Cardiff University

Mohammad Taher Pilehvar
Iran University of Science and Technology

Contact emails:

anna.breit [at] semantic-web [dot] com
camachocolladosj [at] cardiff [dot] ac [dot] uk

Evaluation Criteria

Evaluation metrics: Accuracy and F-Measure (only accuracy in leaderboard).
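
As an illustration of the two metrics, the sketch below computes accuracy and F-measure over "T"/"F" labels. It is not the official scorer: the function names are made up for this example, and it assumes that the F-measure is the balanced F1 score with "T" as the positive class.

```python
from typing import Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of instances whose predicted label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("no instances to evaluate")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f_measure(gold: Sequence[str], pred: Sequence[str], positive: str = "T") -> float:
    """Balanced F1 for the positive class (assumed here to be "T")."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = ["T", "F", "T", "F"]
pred = ["T", "T", "F", "F"]
print(accuracy(gold, pred))   # 0.5
print(f_measure(gold, pred))  # 0.5
```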

To submit your development results, please create files named "definitions_output.txt", "hypernyms_output.txt" and/or "all_output.txt" (depending on the subtasks you are participating in) with your answers, one per line ("T" if true or "F" if false), and then compress them into a single .zip file. Each file should have the same number of lines as the test data. Then go to Participate (Submit / View results) -> Submit and upload your zipped system output file.
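
The packaging step can be sketched as below. The three file names come from the instructions above; everything else is an assumption made for this example, including the function name `write_submission`, the archive name `submission.zip`, the placement of the files at the top level of the archive, and the mapping of "all_output.txt" to the Definitions + Hypernyms subtask.

```python
import zipfile
from pathlib import Path
from typing import Dict, List

# File names required by the submission instructions, keyed by a
# hypothetical short subtask name used only in this sketch.
OUTPUT_FILES = {
    "definitions": "definitions_output.txt",
    "hypernyms": "hypernyms_output.txt",
    "all": "all_output.txt",  # assumed to be Definitions + Hypernyms
}


def write_submission(predictions: Dict[str, List[bool]],
                     out_dir: Path,
                     zip_name: str = "submission.zip") -> Path:
    """Write one "T"/"F" answer per line for each subtask and zip the files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    zip_path = out_dir / zip_name
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for subtask, labels in predictions.items():
            file_name = OUTPUT_FILES[subtask]  # KeyError on an unknown subtask
            text = "\n".join("T" if label else "F" for label in labels) + "\n"
            archive.writestr(file_name, text)
    return zip_path


if __name__ == "__main__":
    # Toy predictions; a real run needs one answer per line of the test data.
    path = write_submission({"definitions": [True, False, True]}, Path("out"))
    print("wrote", path)
```

Only the subtasks present in `predictions` are written, which matches the option of participating in a subset of the subtasks.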

Each team is allowed to submit a maximum of two systems/runs.

Important Dates:

10 March 2020: Training data release
10 April 2020: Test data release. Evaluation start
4 May 2020: Evaluation end
5 May 2020: Results available
1 June 2020 (extended): System description paper deadline
1 July 2020: Author notifications with reviews
15 August 2020: Camera-ready submission deadline
January 2021 (to be specified): SemDeep Workshop, co-located with IJCAI


