CAPITEL-EVAL 2020 - NERC



General Overview

Information extraction tasks, formalized in the late 1980s, are designed to evaluate systems that capture pieces of information present in free text, with the goal of enabling better and faster access to information and content. Named entities (NEs) are one important type of such information: roughly speaking, they are textual elements corresponding to names of people, places, organizations and other entities. Three processes can be applied to NEs: recognition (or identification), categorization (assigning a type according to a predefined set of semantic categories), and linking (disambiguating the reference).

Since their appearance, NER tasks have enjoyed notable success, but despite the relative maturity of this subfield, research continues to evolve, and new techniques and models appear alongside challenging datasets in different languages, domains and textual genres. The aim of this sub-task is to challenge participants to apply their systems or solutions to the problem of identifying and classifying NEs in Spanish news articles. This two-stage process is referred to as NERC (Named Entity Recognition and Classification).

The following NE categories will be evaluated: 

  • Person (PER)
  • Location (LOC)
  • Organization (ORG) 
  • Other (OTH)

These categories are defined in the Annotation Guidelines that will be delivered to participants.

Linguistic Resources

A subset of the CAPITEL corpus will be provided as supporting data (an estimated maximum of 1 million revised words). The supporting data will be randomly sampled into three subsets: training, development and test. The training set will comprise 50% of the corpus, while the development and test sets will roughly amount to 25% each. Together with the test set, an additional collection of documents (the background set) will be released to ensure that participating teams are not able to perform manual corrections, and to encourage features such as scalability to larger data collections. All the data will be distributed tokenized, with named entities annotated in the IOBES format, as illustrated in the sketch below.
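
The IOBES scheme marks the Beginning, Inside and End of multi-token entities, Single-token entities, and tokens Outside any entity. The exact file layout is specified in the released data and Annotation Guidelines; the Python sketch below is only an illustration of how IOBES tags map to entity spans, using an invented example sentence and tags.

    # Minimal sketch (not official tooling) of how IOBES tags map to entity spans.
    # B- begins a multi-token entity, I- continues it, E- ends it,
    # S- marks a single-token entity, and O marks tokens outside any entity.
    def iobes_to_spans(tags):
        """Collect (start, end, label) entity spans from an IOBES tag sequence."""
        spans, start = [], None
        for i, tag in enumerate(tags):
            prefix, _, label = tag.partition("-")
            if prefix == "S":                           # single-token entity
                spans.append((i, i, label))
            elif prefix == "B":                         # entity begins
                start = i
            elif prefix == "E" and start is not None:   # entity ends
                spans.append((start, i, label))
                start = None
        return spans

    # Example for the invented sentence "El Real Madrid visita Barcelona ."
    tags = ["O", "B-ORG", "E-ORG", "O", "S-LOC", "O"]
    print(iobes_to_spans(tags))   # [(1, 2, 'ORG'), (4, 4, 'LOC')]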

Evaluation Metrics

The metrics used for evaluation will be the following: 

  • Precision: The percentage of named entities in the system's output that are correctly recognized and classified. 
  • Recall: The percentage of named entities in the test set that were correctly recognized and classified. 
  • F-measure: The harmonic mean of Precision and Recall (macro-averaged).

with the latter used as the official evaluation score and as the basis for the final ranking of the participating teams.
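
For illustration only (the evaluation script released on March 15 is the authoritative implementation), the sketch below shows one common way to compute these metrics over exact entity spans and then macro-average the per-category F1 scores; the matching criterion and averaging details of the official script may differ.

    from collections import defaultdict

    def macro_f1(gold_spans, pred_spans, labels=("PER", "LOC", "ORG", "OTH")):
        """Per-category precision/recall/F1 on exact (start, end, label) matches,
        followed by the unweighted (macro) average of the per-category F1 scores.
        Illustrative only; the official evaluation script is authoritative."""
        gold_by_label, pred_by_label = defaultdict(set), defaultdict(set)
        for span in gold_spans:
            gold_by_label[span[2]].add(span)
        for span in pred_spans:
            pred_by_label[span[2]].add(span)

        f1_scores = []
        for label in labels:
            gold, pred = gold_by_label[label], pred_by_label[label]
            tp = len(gold & pred)                       # exact-match true positives
            precision = tp / len(pred) if pred else 0.0
            recall = tp / len(gold) if gold else 0.0
            f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
            f1_scores.append(f1)
        return sum(f1_scores) / len(f1_scores)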

Schedule

  • March 15: Sample set, Evaluation script and Annotation Guidelines released.
  • March 17: Training set released.
  • April 1: Development set released.
  • April 29: Test set released (includes background set).
  • May 17: Systems output submissions.
  • May 28: Results posted and Test set with gold-standard (GS) annotations released.
  • May 31: Working notes paper submission.
  • June 15: Notification of acceptance (peer reviews).
  • June 30: Camera-ready paper submission.
  • September: IberLEF 2020 Workshop.

Evaluation Phase

Start: May 17, 2020, midnight UTC

Description: This is the only phase of the competition. Participants must submit a prediction file named "results.tsv", in the same format as the training and development data, packaged inside a ZIP file.
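
As a convenience, the snippet below is a minimal sketch of how a submission could be packaged with Python's standard zipfile module, assuming the prediction file has already been written in the required format; the archive name "submission.zip" is an arbitrary choice for this sketch.

    import zipfile

    # Package the prediction file as required: a file named "results.tsv"
    # inside a ZIP archive, which is then uploaded through the competition site.
    with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write("results.tsv", arcname="results.tsv")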

Competition Ends

May 24, 2020, 11 p.m. UTC
