Information extraction tasks, formalized in the late 1980s, are designed to evaluate systems that capture pieces of information present in free text, with the goal of enabling better and faster access to information and content. One important type of information is named entities (NEs), which, roughly speaking, are textual elements corresponding to names of people, places, organizations and others. Three processes can be applied to NEs: recognition (or identification), categorization (assigning a type according to a predefined set of semantic categories), and linking (disambiguating the reference).
Since their appearance, NER tasks have enjoyed notable success, and despite the relative maturity of this subfield, research continues to evolve: new techniques and models keep appearing, alongside challenging datasets in different languages, domains and textual genres. The aim of this sub-task is to challenge participants to apply their systems or solutions to the problem of identifying and classifying NEs in Spanish news articles. This two-stage process is referred to as NERC (Named Entity Recognition and Classification).
The NE categories to be evaluated are those defined in the Annotation Guidelines that will be delivered to participants.
A subset of the CAPITEL corpus will be provided, estimated at a maximum of 1 million manually revised words of supporting data. The supporting data will be randomly sampled into three subsets: training, development and test. The training set will comprise 50% of the corpus, whereas the development and test sets will each amount to roughly 25%. Together with the test set release, we will release an additional collection of documents (background set) to ensure that participating teams are not able to perform manual corrections, and also to encourage desirable system properties such as scalability to larger data collections. All the data will be distributed tokenized, with named entities annotated in IOBES format.
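As a point of reference, here is a minimal sketch of reading such data, assuming a CoNLL-style layout with one token per line, a tab-separated IOBES tag, and blank lines between sentences; the exact column layout is an assumption for illustration, not the official specification:

    # Minimal sketch (layout assumed, not the official spec):
    # one token per line as "token<TAB>tag", blank lines between sentences.

    def read_iobes(path):
        sentences, tokens, tags = [], [], []
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.rstrip("\n")
                if not line:                 # blank line closes a sentence
                    if tokens:
                        sentences.append((tokens, tags))
                        tokens, tags = [], []
                    continue
                token, tag = line.split("\t")
                tokens.append(token)
                tags.append(tag)             # e.g. B-PER, I-PER, E-PER, S-LOC, O
        if tokens:                           # flush the final sentence
            sentences.append((tokens, tags))
        return sentences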
The following metrics will be used for evaluation, with the last serving as the official evaluation score and determining the final ranking of the participating teams.
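For orientation, the sketch below shows how entity-level scores of this kind are typically computed, assuming the standard exact-match definition of precision, recall and F1 over entity spans; it is illustrative only and is not the official scorer:

    # Illustrative sketch (not the official scorer): entity-level
    # precision, recall and F1 under exact span-and-type match.

    def iobes_spans(tags):
        """Extract (start, end, type) entity spans from an IOBES tag sequence."""
        spans, start = set(), None
        for i, tag in enumerate(tags):
            prefix, _, etype = tag.partition("-")
            if prefix == "S":                    # single-token entity
                spans.add((i, i, etype))
            elif prefix == "B":                  # entity begins
                start = i
            elif prefix == "E" and start is not None:
                spans.add((start, i, etype))     # type taken from the closing tag
                start = None
        return spans

    def prf(gold_sents, pred_sents):
        tp = fp = fn = 0
        for gold, pred in zip(gold_sents, pred_sents):
            g, p = iobes_spans(gold), iobes_spans(pred)
            tp += len(g & p)
            fp += len(p - g)
            fn += len(g - p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1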
Start: May 17, 2020, midnight
Description: This is the only phase of the competition. Participants must submit a prediction file, in the same format as the training and development data, named "results.tsv" and packaged inside a ZIP file.
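A minimal sketch of packaging such a submission, assuming the predictions have already been written to results.tsv in the distributed format (the output file name submission.zip is an arbitrary choice for illustration):

    # Minimal sketch: wrap the predictions file into the expected ZIP.
    # Assumes results.tsv already exists in the distributed token/tag format.
    import zipfile

    with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write("results.tsv")  # must sit at the ZIP root, named exactly results.tsv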
End: May 24, 2020, 11 p.m.