Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC)

Organized by Federico



Multilingual and Cross-lingual Word-in-Context Disambiguation


Over recent years, computational lexical semantics has seen a surge of interest in a wide range of approaches, from multi-prototype embeddings to sense-based and contextualized embeddings, all aimed at providing some form of representation and understanding of a word in context. However, evaluating such a variety of approaches in a single framework is not easy. For instance, traditional Word Sense Disambiguation (WSD) fails to test latent representations unless these are linked to explicit sense inventories such as WordNet and BabelNet. To address this limitation, we propose an innovative common evaluation benchmark which makes it possible to measure and compare the performance of the aforementioned context-based approaches. In this task, we follow and extend Pilehvar and Camacho-Collados (2018), who proposed a benchmark of semi-automatically-annotated English sentence pairs which requires systems to determine whether a word occurring in two different sentences is used with the same meaning or not, without relying on a pre-defined sense inventory.

Task overview

Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC) is the first SemEval task for Word-in-Context disambiguation which tackles the challenge of capturing the polysemous nature of words without relying on a fixed sense inventory. MCL-WiC provides a single high-quality framework for evaluating the performance of a wide range of approaches, measuring the capability of a system to deeply understand word meaning. Compared to its predecessors, MCL-WiC brings the following novelties:

  • it addresses multilinguality and cross-linguality,
  • it provides coverage of all parts of speech, and
  • it covers a high number of domains and genres.
Participating systems will be asked to perform a classification task in which they indicate whether the target word is used in the same meaning (tagged as T for true), in a related meaning (R for related) or in a completely different meaning (F for false) in the same language (multilingual dataset) or across different languages (cross-lingual dataset). Below you can find two examples of sentence pairs, the first one from the multilingual part and the second one from the cross-lingual part:
  • la souris mange le fromage ('the mouse eats the cheese') -- le chat court après la souris ('the cat runs after the mouse')
  • click the right mouse button -- le chat court après la souris ('the cat runs after the mouse')
In the first sentence pair, the target word souris will be tagged with T (True) since it is used in the same meaning in both sentences. Instead, in the second sentence pair, the target word mouse and its corresponding translation into French are used in two distinct meanings, therefore, in this case, the expected output will be F (False).
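As an illustration of the kind of system the task targets, a minimal sketch could threshold the cosine similarity between the target word's contextual embeddings in the two sentences (assuming such embeddings are already available, e.g., from a model like BERT). The threshold values below are arbitrary placeholders for illustration, not part of the task definition:

```python
import numpy as np

def classify_pair(emb1: np.ndarray, emb2: np.ndarray,
                  t_threshold: float = 0.8, r_threshold: float = 0.5) -> str:
    """Map the cosine similarity of two target-word embeddings to T/R/F.

    Thresholds are illustrative placeholders; a real system would tune
    them (or learn a classifier) on the training data.
    """
    sim = float(np.dot(emb1, emb2) /
                (np.linalg.norm(emb1) * np.linalg.norm(emb2)))
    if sim >= t_threshold:
        return "T"   # same meaning
    if sim >= r_threshold:
        return "R"   # related meaning
    return "F"       # different meaning

# Toy 2-d vectors standing in for contextual embeddings of the target word:
print(classify_pair(np.array([1.0, 0.0]), np.array([0.9, 0.1])))  # near-identical contexts
print(classify_pair(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # orthogonal contexts
```

With the toy vectors above, the near-identical pair falls in the T band and the orthogonal pair in the F band.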


The manual annotation was performed according to the following criteria. Given a target word w occurring in two sentences in the same language (multilingual task), or a target word w in a sentence in one language and its corresponding target word w' in a sentence in a second language (cross-lingual task), we used the tag:

  • T if the two words are used in the same exact meaning.
  • R if the two words are used in two meanings that are lexicographically related by phenomena such as systematic polysemy, i.e., sense distinctions predictable on the basis of a general pattern of sense alternation observed for words denoting entities of the same category. These include: metonymy (such as the content and the container in glass); synecdoche, i.e., the part for the whole (e.g. blade to denote a sword, or paper for an article); abstract vs. concrete (such as fruit and its tree); auto-hyponymy (such as drink in the meaning of drink alcohol vs. that of drink a beverage); and auto-antonymy (e.g. citation in the meaning of award and its opposite meaning of penalty).
  • F if the two words are used in two completely different meanings (such as race in the meaning of competition vs. that of breed).


Evaluation Criteria

Systems will be asked to perform a classification on each sentence pair in the dataset, for which they will have to output T, R or F depending on whether a given target word occurring in two sentences is used with the same meaning, with a related meaning or with a completely different meaning respectively. The goal is to determine to what degree systems can discriminate meanings within and across languages without necessarily relying on an explicit sense inventory.

As is customary in Natural Language Understanding, results will be computed using three measures, namely precision, accuracy and F1. A thorough analysis will be carried out for each language pair (cross-lingual dataset), for the different types of approach declared by participants (context-specific embeddings, WSD, etc.), and by domain and genre (e.g. formal/parliamentary vs. encyclopedic). Furthermore, we will distinguish between systems which exploit the training set provided for the given language(s) and those which do not, e.g., systems based on vector similarities or traditional WSD systems which output T/R/F based on sense assignment.
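The task page does not spell out how the class-level scores are aggregated; as a hedged sketch, the measures could be computed as follows, assuming per-class precision/recall and a macro-averaged F1 over the three labels (the averaging choice is an assumption, not stated by the organizers):

```python
def evaluate(gold, pred, labels=("T", "R", "F")):
    """Overall accuracy and macro-averaged F1 for T/R/F predictions.

    Macro-averaging over the three labels is an illustrative choice;
    the official scorer may aggregate differently.
    """
    assert len(gold) == len(pred), "gold and pred must be aligned"
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    f1_scores = []
    for lab in labels:
        tp = sum(g == p == lab for g, p in zip(gold, pred))
        fp = sum(p == lab and g != lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1_scores.append(f1)
    return {"accuracy": accuracy, "macro_f1": sum(f1_scores) / len(labels)}

# Toy example: 4 sentence pairs, one R instance misclassified as F.
scores = evaluate(["T", "F", "R", "T"], ["T", "F", "F", "T"])
print(scores)  # accuracy 0.75
```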


We will compare the performance of participating systems against a baseline classifier implemented as a feed-forward neural network. Our baseline system will be fed different types of embeddings:

  • sense embeddings, such as LMMS (Loureiro and Jorge, 2019) and SensEmBERT (Scarlini et al., 2020), which combine contextualized embeddings with the knowledge derived from resources such as WordNet and BabelNet;
  • context-specific word embeddings, such as Context2vec (Melamud et al., 2016), BERT (Devlin et al., 2019), etc.
Interestingly, this will provide an effective multilingual and cross-lingual benchmark for all types of embeddings and NLU systems.
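The organizers do not detail the baseline beyond "feed-forward neural network"; a minimal illustrative sketch, with arbitrary dimensions and untrained random weights, that consumes a pair of target-word embeddings and outputs one of the three labels might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

class FeedForwardBaseline:
    """Toy feed-forward classifier over a concatenated embedding pair.

    Architecture, dimensions, and (random, untrained) weights are purely
    illustrative; the official baseline's details are not published here.
    """

    def __init__(self, emb_dim: int = 8, hidden: int = 16, n_classes: int = 3):
        self.W1 = rng.normal(scale=0.1, size=(2 * emb_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.1, size=(hidden, n_classes))
        self.b2 = np.zeros(n_classes)
        self.labels = ["T", "R", "F"]

    def predict(self, emb1: np.ndarray, emb2: np.ndarray) -> str:
        x = np.concatenate([emb1, emb2])          # pair representation
        h = np.maximum(x @ self.W1 + self.b1, 0)  # ReLU hidden layer
        logits = h @ self.W2 + self.b2            # one logit per label
        return self.labels[int(np.argmax(logits))]

model = FeedForwardBaseline()
print(model.predict(np.zeros(8), np.ones(8)))  # one of "T", "R", "F"
```

In practice the input embeddings would come from one of the models listed above (LMMS, SensEmBERT, Context2vec, BERT), and the weights would be trained on the provided training set.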


Terms and Conditions

The data of the Multilingual and Cross-lingual Word-in-Context Disambiguation task are released under the CC-BY-NC 4.0 license. Attribution should be provided by citing the task and its authors.


Start: July 31, 2020, midnight UTC

