NEW! Final results of the WiC-TSV challenge are available here.
This is a competition based on the WiC-TSV (Word-in-Context Target Sense Verification) evaluation benchmark. The competition extends the SuperGLUE WiC task but is self-contained. The task models additional phenomena related to Word Sense Disambiguation, i.e., identifying the correct meaning of a word in context, in a retrieval setting. The main difference from WiC lies in the presence of relevant information such as hypernyms and definitions, which makes the task a direct proxy for downstream evaluation: in WiC-TSV a single word is presented with its context and relevant information, in contrast to the two usages of the same word included in the original WiC dataset. This setting is arguably more realistic and resembles automatic tagging and retrieval in enterprise settings. For instance, an Indonesian company may want to retrieve all sentences referring to the island of Java and not to other, unrelated senses of the word.
WiC-TSV is used for a shared task at the IJCAI-20 SemDeep workshop. For questions about WiC-TSV, you can contact one of the organisers (information below).
NOTE: The dataset was updated on April 30. The test set contains both in-domain and out-of-domain subsets.
Formally, WiC-TSV is framed as a binary classification task. Each instance in WiC-TSV consists of a target word w with a corresponding target sense s, represented by either its definition (Subtask 1) or its hypernym(s) (Subtask 2), and a context c containing the target word w. The task is to determine whether the meaning of the word w used in the context c matches the target sense s. The following table shows some examples from the dataset. The data is in English; training, development and test data are already available here.
| Context | Definition | Hypernyms | Label |
| --- | --- | --- | --- |
| Smoking is permitted. | The act of smoking tobacco or other substances | breathing, external respiration, respiration, ventilation | T |
| All that work went down the sewer | Someone who sews | needleworker | F |
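To make the input format concrete, here is a minimal sketch of how an instance could be represented in code, together with a naive lexical-overlap baseline. The class and field names are illustrative assumptions, not the official data schema, and the baseline is only a toy heuristic, not one of the reported baselines:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class WicTsvInstance:
    # One WiC-TSV example; field names are illustrative, not the official schema.
    target: str            # target word w
    context: str           # context c containing w
    definition: str        # sense definition (Subtask 1)
    hypernyms: List[str]   # sense hypernyms (Subtask 2)

STOP = {"the", "a", "an", "of", "or", "is", "to", "in", "and", "that", "all"}

def overlap_baseline(inst: WicTsvInstance) -> bool:
    """Toy heuristic: predict True when the context shares any
    non-stopword with the definition or the hypernyms."""
    tokens = lambda text: {w.strip(".,").lower() for w in text.split()} - STOP
    ctx = tokens(inst.context)
    sense = tokens(inst.definition) | tokens(" ".join(inst.hypernyms))
    return bool(ctx & sense)

ex1 = WicTsvInstance(
    target="smoking",
    context="Smoking is permitted.",
    definition="The act of smoking tobacco or other substances",
    hypernyms=["breathing", "external respiration", "respiration", "ventilation"],
)
ex2 = WicTsvInstance(
    target="sewer",
    context="All that work went down the sewer",
    definition="Someone who sews",
    hypernyms=["needleworker"],
)
print(overlap_baseline(ex1))  # True
print(overlap_baseline(ex2))  # False
```

On these two examples the heuristic happens to agree with the gold labels, but it is far below the supervised baselines reported in the reference paper.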
WiC-TSV has three subtasks - participants can submit results in any of the subtasks:
In Subtask 1 systems make use of definitions for deciding whether the target word in context corresponds to the given definition or not.
In Subtask 2 systems make use of hypernymy information for deciding whether the target word in context is a hyponym of the given hypernym or not.
In Subtask 3 systems can make use of both sources of information, i.e., definitions and hypernyms.
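One common way to feed such instances to a model is to pair the context with a textual description of the target sense, e.g. for a BERT-style sentence-pair classifier. The template below is an assumption for illustration, not a format prescribed by the task:

```python
def build_input(target, context, definition=None, hypernyms=None):
    """Pair the context with a sense description built from whichever
    information the subtask provides (the template is an assumption)."""
    parts = []
    if definition is not None:           # Subtask 1 (and 3)
        parts.append(definition)
    if hypernyms:                        # Subtask 2 (and 3)
        parts.append(target + " is a kind of " + ", ".join(hypernyms))
    if not parts:
        raise ValueError("need a definition and/or hypernyms")
    return (context, " ; ".join(parts))

# Subtask 3: both sources combined into one sense description.
pair = build_input("sewer", "All that work went down the sewer",
                   definition="Someone who sews", hypernyms=["needleworker"])
print(pair[1])
```

Dropping the `definition` or the `hypernyms` argument yields the Subtask 2 or Subtask 1 variant of the same input.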
During the test phase, test data will be provided and participants can submit their results to the leaderboard. Participants can submit results in one, two or all three of the subtasks. A maximum of two outputs per subtask is allowed. Attendance at the SemDeep workshop is encouraged for sharing the results, but not mandatory. All participants, regardless of whether they attend the workshop, will be asked to write a short system description paper (up to four pages).
Participants should submit a system description paper of up to four pages (one extra page for the camera-ready version), excluding references. The paper should describe the system's methodology and be self-contained.
Please find below the reference paper for the WiC-TSV task with more details about the construction of the dataset and baselines, which you can cite in your description paper:
Anna Breit, Artem Revenko, Kiamehr Rezaee, Mohammad Taher Pilehvar and Jose Camacho-Collados (2020)
WiC-TSV: An Evaluation Benchmark for Target Sense Verification of Words in Context,
arXiv preprint arXiv:2004.15016
Please submit your papers via EasyChair, following this link.
Organisers:
- Anna Breit, Semantic Web Company: anna.breit [at] semantic-web [dot] com
- Mohammad Taher Pilehvar, Iran University of Science and Technology
- Jose Camacho-Collados, Cardiff University: camachocolladosj [at] cardiff [dot] ac [dot] uk
Evaluation metrics: Accuracy and F-Measure (only accuracy in leaderboard).
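For reference, the two metrics can be computed as sketched below (a straightforward implementation, assuming "T" is the positive class; the official scorer may differ in details such as tie handling):

```python
def accuracy(gold, pred):
    """Fraction of instances where the predicted label matches the gold label."""
    assert len(gold) == len(pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def f_measure(gold, pred, positive="T"):
    """F1 score, treating `positive` ("T" by default) as the positive class."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

gold = ["T", "T", "F", "F"]
pred = ["T", "F", "F", "T"]
print(accuracy(gold, pred))   # 0.5
print(f_measure(gold, pred))  # 0.5
```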
In order to submit your results, please create files named "definitions_output.txt", "hypernyms_output.txt" and/or "all_output.txt" (depending on the subtasks you are participating in) with your answers, one per line ("T" if true, "F" if false), and then compress them into a single .zip file. These files must have the same number of lines as the test data. Then go to Participate (Submit / View results) -> Submit and upload your zipped system output file.
Each team is allowed to submit a maximum of two systems/runs.
NEW! Test data has been released and the evaluation phase has started.
Phase start dates:
- March 8, 2020, midnight
- Aug. 17, 2020, midnight
- Sept. 22, 2020, midnight