SemEval-2018 Task 9: Hypernym Discovery

Organized by CamachoCollados


Welcome!

This is the CodaLab Competition for the SemEval-2018 Task 9: Hypernym Discovery.

Important Dates:

15 Jul 2017: Participants can register for the task
21 Aug 2017: Trial data release (already available on Data and Resources)
25 Sep 2017: Training data release
8 Jan 2018: Test data release. Evaluation start
29 Jan 2018: Evaluation end

Google Group: https://groups.google.com/d/forum/semeval2018-hypernymdiscovery

Introduction and Motivation

Hypernymy, i.e. the capability for generalization, lies at the core of human cognition. Unsurprisingly, identifying hypernymic relations has been pursued in NLP for approximately the last two decades, as successfully identifying this lexical relation contributes to improvements in Question Answering applications (Prager et al. 2008; Yahya et al. 2013) and Textual Entailment or Semantic Search systems (Hoffart et al. 2014; Roller and Erk 2016). In addition, hypernymic (is-a) relations are the backbone of almost any ontology, semantic network and taxonomy (Yu et al. 2015; Wang et al. 2017), the latter being useful resources for downstream tasks such as web retrieval, website navigation or records management (Bordea et al. 2015).

Hypernym Discovery: What is New?

Traditionally, the task of identifying hypernymic relations from text corpora has been evaluated within the broader task of Taxonomy Evaluation (e.g. SemEval-2015 Task 17, SemEval-2016 Task 13). Alternatively, many approaches have specialized in Hypernym Detection, i.e. the binary task of deciding, given a pair of words, whether a hypernymic relation holds between them. This experimental setting has already drawn criticism for its alleged oversimplification of the problem (Levy et al. 2015; Santus et al. 2016; Shwartz et al. 2017; Camacho-Collados 2017).

Inspired by recent work (Espinosa-Anke et al. 2016), we propose to reformulate the problem as Hypernym Discovery: given the search space of a domain's vocabulary and an input concept, discover the best (set of) candidate hypernyms for it. In addition to making the task more realistic with respect to actual downstream applications, this reformulation also opens up complementary evaluation procedures, for instance by enabling Information Retrieval evaluation metrics (click on the Participate/Evaluation tab for detailed information).
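Framed this way, a system simply ranks every term in the vocabulary by a scoring function against the input concept. The sketch below is only an illustration of that discovery-as-ranking setup, not any participant's system or a task baseline: it uses tiny hand-written vectors and plain cosine similarity as a stand-in for a learned hypernymy scorer.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def discover_hypernyms(query, vocabulary, vectors, top_k=3):
    """Rank every candidate term in the vocabulary against the query.

    A real system would use a hypernymy-specific scoring function;
    cosine similarity here only demonstrates the ranking setup.
    """
    scores = [(term, cosine(vectors[query], vectors[term]))
              for term in vocabulary if term != query]
    return [term for term, _ in sorted(scores, key=lambda x: -x[1])[:top_k]]

# Toy, hand-written vectors (illustrative only)
vectors = {
    "dog":    [0.9, 0.8, 0.1],
    "canine": [0.85, 0.75, 0.2],
    "animal": [0.7, 0.9, 0.3],
    "guitar": [0.1, 0.2, 0.9],
}
print(discover_hypernyms("dog", list(vectors), vectors))
# → ['canine', 'animal', 'guitar']
```

In practice a system would also cut off the ranked list with a threshold or a learned classifier, rather than always returning the top k candidates.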

In short:

  • General-Purpose Hypernym Discovery in three languages (English, Spanish, Italian)

  • Domain-Specific Hypernym Discovery in two domains (Medicine, Music)

Contact Info:

Jose Camacho-Collados
Claudio Delli Bovi
Tommaso Pasini
Roberto Navigli
Sapienza University of Rome

Vered Shwartz
Bar-Ilan University

Luis Espinosa-Anke
Sergio Oramas
Horacio Saggion
Universitat Pompeu Fabra

Enrico Santus
Singapore University of Technology and Design

Contact emails:

- collados [at] di [dot] uniroma1 [dot] it 
- luis.espinosa [at] upf [dot] edu


References

Georgeta Bordea, Paul Buitelaar, Stefano Faralli, and Roberto Navigli. 2015. SemEval-2015 Task 17: Taxonomy Extraction Evaluation (TExEval). In Proceedings of the SemEval workshop.

Jose Camacho-Collados. 2017. Why we have switched from building full-fledged taxonomies to simply detecting hypernymy relations. arXiv preprint arXiv:1703.04178.

Luis Espinosa-Anke, Jose Camacho-Collados, Claudio Delli Bovi, and Horacio Saggion. 2016. Supervised distributional hypernym discovery via domain adaptation. In Proceedings of EMNLP, pages 424–435.

Johannes Hoffart, Dragan Milchevski, and Gerhard Weikum. 2014. Stics: searching with strings, things, and cats. In Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval. ACM, pages 1247–1248.

Omer Levy, Steffen Remus, Chris Biemann, Ido Dagan, and Israel Ramat-Gan. 2015. Do supervised distributional methods really learn lexical inference relations? In Proceedings of NAACL, pages 970–976.

John Prager, Jennifer Chu-Carroll, Eric W Brown, and Krzysztof Czuba. 2008. Question answering by predictive annotation. In Advances in Open Domain Question Answering, Springer, pages 307–347.

Stephen Roller and Katrin Erk. 2016. Relations such as Hypernymy: Identifying and Exploiting Hearst Patterns in Distributional Vectors for Lexical Entailment. In Proceedings of EMNLP, pages 2163–2172.

Enrico Santus, Alessandro Lenci, Tin-Shing Chiu, Qin Lu and Chu-Ren Huang. 2016. Nine Features in a Random Forest to Learn Taxonomical Semantic Relations. In Proceedings of LREC, pages 4557–4564.

Vered Shwartz, Enrico Santus, and Dominik Schlechtweg. 2017. Hypernyms under siege: Linguistically-motivated artillery for hypernymy detection. In Proceedings of EACL, pages 65–75.

Chengyu Wang, Xiaofeng He, and Aoying Zhou. 2017. A Short Survey on Taxonomy Learning from Text Corpora: Issues, Resources and Recent Advances. In Proceedings of EMNLP, pages 1201–1214.

Mohamed Yahya, Klaus Berberich, Shady Elbassuoni, and Gerhard Weikum. 2013. Robust question answering over the web of linked data. In Proceedings of the 22nd ACM international conference on Conference on information & knowledge management. ACM, pages 1107–1116.

Zheng Yu, Haixun Wang, Xuemin Lin, and Min Wang. 2015. Learning Term Embeddings for Hypernymy Identification. In Proceedings of IJCAI, pages 1390-1397.

Task Details

The Hypernym Discovery task consists of, given an input term, finding its most appropriate hypernym(s) in a pre-defined corpus (see Data and Resources for detailed information). The task comprises five independent but related subtasks (participants may submit systems for any individual subtask), split into two larger groups: general-purpose hypernym discovery and domain-specific hypernym discovery.

Subtask 1: General-Purpose Hypernym Discovery

This subtask consists of discovering hypernyms in a general-purpose corpus, so systems need the flexibility to provide hypernyms for terms from a wide range of domains. We provide data in three languages: English (subtask 1A), Italian (subtask 1B) and Spanish (subtask 1C).

Subtask 2: Domain-Specific Hypernym Discovery

In contrast, this subtask deals with specific domains, namely the medical (subtask 2A) and music (subtask 2B) domains. Here participants test their systems (which may be general-purpose or specifically tailored to the target domain) in a much more focused and restricted setting.

Participation

The Hypernym Discovery task is especially targeted at evaluating hypernym extraction and detection systems, as well as taxonomy learning and entity typing systems; participants from all these areas are encouraged to take part. The task may additionally be viewed as a proxy for downstream applications that require specific knowledge from is-a relations, such as Information Extraction or Question Answering (e.g. What is the highest mountain in Africa?), and as a reliable evaluation benchmark for the first step of ontology learning systems, since is-a relations generally constitute their backbone.
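Hypernym extraction systems of the kind mentioned above classically rely on Hearst-style lexico-syntactic patterns (cf. Roller and Erk 2016). The regex sketch below is a generic, deliberately simplified illustration of that family of techniques, not part of any task baseline; the patterns only handle single lowercase words, and the example sentences are invented.

```python
import re

# A few classic Hearst-style patterns, simplified to single words.
PATTERNS = [
    re.compile(r"(\w+)s? such as (\w+)"),     # "animals such as dogs"
    re.compile(r"(\w+) and other (\w+)s?"),   # "dogs and other animals"
    re.compile(r"(\w+)s?, including (\w+)"),  # "metals, including copper"
]

def extract_is_a_pairs(text):
    """Return (hyponym, hypernym) pairs matched by the patterns above."""
    pairs = set()
    text = text.lower()
    for i, pat in enumerate(PATTERNS):
        for m in pat.finditer(text):
            if i == 1:  # "X and other Y": the hyponym comes first
                pairs.add((m.group(1), m.group(2)))
            else:       # "Y such as X" / "Y, including X": hypernym first
                pairs.add((m.group(2), m.group(1)))
    return pairs

print(extract_is_a_pairs("Mammals such as dogs bark; dogs and other animals play."))
```

Real pattern-based systems use part-of-speech tags and noun-phrase chunks rather than bare word regexes, and typically combine pattern evidence with distributional signals.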

 

Terms and Conditions

By submitting results to this competition, you consent to the public release of your scores at the SemEval-2018 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.

You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.

You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.

You agree not to redistribute the test data except in the manner prescribed by its licence.

Data and Resources

Downloads

Training data (Updated Oct 23, 2017): Training data for all subtasks, including gold hypernyms, vocabularies and an evaluation script. 

Information and direct links to download the corresponding corpus for each subtask can be found below. More information about the evaluation and how to participate can be found on the "Participate" tab.

For testing, systems are provided with input terms for which they have to produce a ranked list of extracted hypernyms. The gold standard consists of terms along with their corresponding hypernyms (up to trigrams). Training and testing data are split evenly (50% training, 50% testing). More information and direct links to the corpora are reported below.
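Because systems return ranked lists scored against gold hypernym sets, Information Retrieval metrics apply naturally. The official metrics are specified on the Participate/Evaluation tab; the snippet below only illustrates one common such metric, Mean Reciprocal Rank, on made-up predictions and gold data.

```python
def mean_reciprocal_rank(predictions, gold):
    """MRR over queries: mean reciprocal rank of the first correct hypernym.

    predictions: {term: ranked list of candidate hypernyms}
    gold: {term: set of gold hypernyms}
    """
    total = 0.0
    for term, ranked in predictions.items():
        rr = 0.0
        for rank, candidate in enumerate(ranked, start=1):
            if candidate in gold[term]:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(predictions)

# Made-up system output and gold data, for illustration only
gold = {"dog": {"canine", "mammal", "animal"}, "guacamole": {"salsa", "alimento"}}
predictions = {"dog": ["pet", "canine", "animal"], "guacamole": ["alimento", "fruta"]}
print(mean_reciprocal_rank(predictions, gold))  # (1/2 + 1/1) / 2 = 0.75
```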

 

Subtask 1: General-Purpose Hypernym Discovery

General-purpose corpora. For the first subtask we use the 3-billion-word UMBC corpus (Han et al. 2013), a very large corpus composed of paragraphs extracted from the web as part of the Stanford WebBase Project and covering a wide range of domains. For Italian we use the 2-billion-word itWaC corpus (Baroni et al. 2009), extracted from various sources on the web, and for Spanish a 1-billion-word corpus (Cardellino 2016), which also contains documents from different sources. Details about the corpora (including direct links for download) are summarized in the table below:

Subtask | Corpus Description | Links
1A: English | 3B-word UMBC corpus extracted from the web (Han et al. 2013). Temporary link to download the original PoS-tagged corpus. | Tokenized [6.2GB]; Original
1B: Italian | 1.3B-word itWaC corpus extracted from the web (.it domain) (Baroni et al. 2009) | Tokenized [2.6GB]; Original
1C: Spanish | 1.8B-word corpus extracted from various sources (Wikipedia, Europarl, AnCora, etc.) (Cardellino 2016) | Tokenized [3.2GB]; Original

Input terms. We provide a balanced set of terms, with different degrees of frequency and from different domains. For English, 3000 terms with their corresponding hypernyms are provided (around 10000 term-hypernym pairs); for Spanish and Italian, 2000 terms each.

Gold standard. The gold standard consists of input terms given to the systems (see above) and gold hypernyms extracted from multiple resources and manually validated (for both training and testing). See the table below for some examples.

Language (subtask) | Term | Hypernym(s) | Source
English (general) | dog | canine, mammal, animal | WordNet
Spanish (general) | guacamole | salsa para mojar, alimento, salsa | Wikidata
Italian (general) | Nina Simone | musicista, pianista, persona | MultiWiBi

Subtask 2: Domain-Specific Hypernym Discovery

Domain-specific corpora. For the medical domain we provide a combination of abstracts and research papers from the MEDLINE (Medical Literature Analysis and Retrieval System) repository, which contains academic documents such as scientific publications and paper abstracts. For the music domain, the provided corpus is a concatenation of several music-specific corpora: music biographies from Last.fm contained in ELMD 2.0 (Oramas et al. 2016), the music branch of Wikipedia, and a corpus of album customer reviews from Amazon (Oramas et al. 2017). Details about the corpora (including direct links for download) are summarized in the table below:

Subtask | Corpus Description | Links
2A: Medical | 130M-word subset extracted from the PubMed corpus of biomedical literature from MEDLINE, distributed by the National Library of Medicine (updated 10 Sept: some duplicated texts have been removed) | Tokenized [258MB]; Original
2B: Music | 100M-word corpus including Amazon reviews, music biographies and Wikipedia pages about theory and music genres (Oramas et al. 2016) | Tokenized [200MB]; Original

Input terms. As in the previous subtask, we provide a balanced set of terms, with different degrees of frequency and for different sub-domains. We provide around 1000 terms for each domain (clinical and music).

Gold standard. In this case we follow the same procedure described above for Subtask 1, restricted to the target domain, and additionally draw on domain-specific taxonomies. See the table below for some examples.

Language (subtask) | Term | Hypernym(s) | Source
English (clinical) | pulmonary embolism | disorder of pulmonary circulation, trunk arterial embolus, disorder, embolism | SNOMED CT
English (music) | Green Day | artist, rock band | MusicBrainz

Data Availability and Copyright

All task participants are provided with trial, training and test sets for each of the subtasks. These datasets will be released under the Creative Commons Attribution-ShareAlike 3.0 Unported License. The data are extracted semi-automatically, pre-processed and validated by experts, most of whom are members of the organizing team. We intend to use only data that are openly available, so no additional licenses or permissions need to be acquired for the resources used in this task.

References

Marco Baroni, Silvia Bernardini, Adriano Ferraresi and Eros Zanchetta. 2009. The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora. Language Resources and Evaluation, 43(3): 209-226.

Cristian Cardellino. 2016. Spanish Billion Words Corpus and Embeddings (March 2016), http://crscardellino.me/SBWCE.

Lushan Han, Abhay L. Kashyap, Tim Finin, James Mayfield and Jonathan Weese. 2013. UMBC EBIQUITY-CORE: Semantic Textual Similarity Systems. In Proceedings of *SEM.

Sergio Oramas, Luis Espinosa Anke, Mohamed Sordo, Horacio Saggion and Xavier Serra. 2016. ELMD: An Automatically Generated Entity Linking Gold Standard Dataset in the Music Domain. In Proceedings of LREC.

Sergio Oramas, Oriol Nieto, Francesco Barbieri, and Xavier Serra. 2017. Multi-label Music Genre Classification from Audio, Text, and Images Using Deep Features. In Proceedings of ISMIR.

Practice: starts Dec. 31, 2017, midnight UTC
Evaluation: starts Jan. 8, 2018, midnight UTC
Post-Evaluation: starts Jan. 30, 2018, midnight UTC
Competition ends: never
