Word Sense Induction and Disambiguation for the Russian Language: bts-rnc dataset

Organized by lopuhin

Previous

Testing: bts-rnc
Dec. 1, 2017, midnight UTC

Current

Post-competition: evaluation on test data
Feb. 4, 2018, 11:59 p.m. UTC

End

Competition Ends
Never

We invite you to participate in the ACL SIGSLAV sponsored shared task on Word Sense Induction and Disambiguation for the Russian Language. The task in brief: you are given a word, e.g. “bank”, and a set of text fragments (“contexts”) in which this word occurs, e.g. “a bank is a financial institution that accepts deposits” and “a river bank is a slope beside a body of water”. You need to group these contexts into clusters that correspond to the different senses of the word; the number of clusters is not known in advance. In this example, you would want two groups: one with the contexts for the financial-institution sense of “bank” and one with the contexts for the riverside sense.

Please see the full description on our website.

Similarly to SemEval 2010 Task 14 WSI&D, we use a gold standard in which each ambiguous target word is provided with a set of instances, i.e., contexts containing the target word. Each instance is manually annotated with a single sense identifier according to a predefined sense inventory. Each participating system assigns sense labels to these occurrences of the ambiguous word, which can be viewed as a clustering of the instances by sense label. To evaluate a system, its labeling of the contexts is compared to the gold standard labeling. We use the Adjusted Rand Index (ARI) as the quantitative measure of clustering quality.
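To make the measure concrete, here is a minimal sketch of the standard ARI formula in plain Python. It is not the official scoring script: the function name and the toy labels are illustrative assumptions, and how scores are aggregated across target words is not covered here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(gold, predicted):
    """Adjusted Rand Index between two labelings of the same contexts.

    Label values themselves do not matter, only which contexts share a label.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    total_pairs = comb(len(gold), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two contexts: nothing to disagree on

    # Pairs of contexts placed together in both labelings, in gold, in predicted.
    together_both = sum(comb(c, 2) for c in Counter(zip(gold, predicted)).values())
    together_gold = sum(comb(c, 2) for c in Counter(gold).values())
    together_pred = sum(comb(c, 2) for c in Counter(predicted).values())

    expected = together_gold * together_pred / total_pairs
    maximum = (together_gold + together_pred) / 2
    if maximum == expected:
        # Happens only when both labelings are one big cluster or all singletons.
        return 1.0
    return (together_both - expected) / (maximum - expected)


if __name__ == "__main__":
    gold = ["finance", "finance", "river", "river"]
    print(adjusted_rand_index(gold, [0, 0, 1, 1]))  # same grouping, different names
    print(adjusted_rand_index(gold, [0, 1, 0, 1]))  # grouping cuts across the senses
```

Because ARI compares groupings rather than label names, an induced inventory does not need to use the same sense identifiers as the gold standard.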

The task will feature two tracks:

  • In the “knowledge-free” track participants need to induce a sense inventory from a text corpus of their own choice and then use it to assign each context a sense identifier from this induced inventory.
  • In the “knowledge-rich” track participants are free to use a sense inventory from an existing dictionary to disambiguate the target words (yet the use of the gold standard inventory is prohibited).

The advantage of our setting is that virtually any existing word sense disambiguation approach can be used within the framework of our shared task, from unsupervised sense embeddings to graph-based methods that rely on lexical knowledge bases such as WordNet.
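As a toy illustration of what a knowledge-free system has to produce, the sketch below groups contexts of one target word by shared content words. This is not a method proposed by the organizers and is far too crude to be competitive; the stopword list, the function names, and the English example contexts are all assumptions made for illustration.

```python
import re

# Tiny illustrative stopword list (an assumption, not part of the task data).
STOPWORDS = {"a", "an", "and", "at", "her", "in", "is", "of", "she", "the", "to", "we"}


def content_words(context, target):
    """Lowercased word set of a context, without stopwords and the target word."""
    tokens = re.findall(r"\w+", context.lower())
    return {t for t in tokens if t != target and t not in STOPWORDS}


def cluster_contexts(contexts, target, min_shared=1):
    """Greedy clustering: join the first cluster sharing enough words, else open a new one."""
    vocabularies = []  # one growing word set per induced sense
    labels = []
    for context in contexts:
        words = content_words(context, target)
        for cluster_id, vocabulary in enumerate(vocabularies):
            if len(words & vocabulary) >= min_shared:
                vocabulary |= words
                labels.append(cluster_id)
                break
        else:
            # No existing cluster is similar enough: induce a new sense.
            vocabularies.append(set(words))
            labels.append(len(vocabularies) - 1)
    return labels


if __name__ == "__main__":
    contexts = [
        "the bank accepts deposits and gives loans",
        "she put her deposits in the bank",
        "we walked along the river bank",
        "the boat drifted to the bank of the river",
    ]
    print(cluster_contexts(contexts, "bank"))
```

The number of clusters is not fixed in advance here: it emerges from the data, which matches the setting of the task. A real system would replace the word-overlap test with a stronger similarity signal.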

Testing: bts-rnc

Start: Dec. 1, 2017, midnight

Description: Submit test predictions by uploading a ***zip archive*** with a .csv or .tsv file.
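A minimal sketch of packaging predictions as a zip archive containing a .tsv file, using only the Python standard library. The column names, the file names, and the example rows are hypothetical; the required layout is defined in the full task description, not here.

```python
import csv
import io
import zipfile

# Hypothetical column names; check the task description for the required layout.
COLUMNS = ["context_id", "word", "predict_sense_id"]


def write_submission(rows, archive_path, inner_name="predictions.tsv"):
    """Write rows as a tab-separated file inside a new zip archive."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # writestr encodes str data as UTF-8, which keeps Cyrillic words intact.
        archive.writestr(inner_name, buffer.getvalue())
    return archive_path


if __name__ == "__main__":
    example_rows = [(1, "замок", 1), (2, "замок", 2)]
    write_submission(example_rows, "submission.zip")
```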

Post-competition: evaluation on test data

Start: Feb. 4, 2018, 11:59 p.m.

Description: Submit test predictions by uploading a ***zip archive*** with a .csv or .tsv file.
