CAPITEL-EVAL 2020 - UD

General Overview

Dependency-based syntactic parsing has become increasingly popular in NLP in recent years. One reason for this popularity is its transparent encoding of predicate-argument structure, which is useful in many downstream applications. Another is that dependency representations are better suited than phrase-structure grammars to languages with free or flexible word order.

Universal Dependencies (UD) is a framework for the consistent annotation of grammar (parts of speech, morphological features and syntactic dependencies) across different human languages. Moreover, the UD initiative is an open community effort with over 200 contributors that has produced more than 100 treebanks in over 70 languages.

The aim of this sub-task is to challenge participants to apply their systems to Universal Dependency parsing of Spanish news articles, as defined in the Annotation Guidelines for the CAPITEL corpus, which will be shared with the participants.

Linguistic Resources

A subset of the CAPITEL corpus will be provided (an estimated maximum of 250,000 revised words of supporting data). In addition to head and dependency relations, this subset will be tokenized and annotated with lemmas, UD part-of-speech tags and morphological features, all distributed in CoNLL-U format.
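For reference, a CoNLL-U file encodes one syntactic word per line using ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The short Spanish fragment below is only an illustrative sketch of the format, not an excerpt from CAPITEL, and is shown with spaces instead of tabs for readability:

```
# text = El gobierno aprobó la ley.
1  El        el       DET    _  Definite=Def|Gender=Masc|Number=Sing|PronType=Art   2  det    _  _
2  gobierno  gobierno NOUN   _  Gender=Masc|Number=Sing                             3  nsubj  _  _
3  aprobó    aprobar  VERB   _  Mood=Ind|Number=Sing|Person=3|Tense=Past             0  root   _  _
4  la        el       DET    _  Definite=Def|Gender=Fem|Number=Sing|PronType=Art    5  det    _  _
5  ley       ley      NOUN   _  Gender=Fem|Number=Sing                              3  obj    _  SpaceAfter=No
6  .         .        PUNCT  _  _                                                   3  punct  _  _
```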

The entire CAPITEL supporting data set will be randomly sampled into three subsets: training, development and test. The training set will comprise 50% of the corpus, whereas the development and test sets will each amount to roughly 25%. Together with the test set, we will release an additional collection of documents (the background set) to ensure that participating teams are not able to perform manual corrections, and also to encourage features such as scalability to larger data collections.

Evaluation Metrics

The metrics for the evaluation phase will be the following:

  • Unlabeled Attachment Score (UAS): The percentage of words that have the correct head. 
  • Labeled Attachment Score (LAS): The percentage of words that have the correct head and dependency label.

The latter will be used as the official evaluation score and will determine the final ranking of the participating teams.
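As a rough illustration (this is not the official evaluation script), the sketch below computes UAS and LAS by comparing the HEAD and DEPREL columns of a gold and a predicted CoNLL-U file. It assumes both files share the same tokenization and sentence order, and it skips comment lines, multiword-token ranges and empty nodes; the file names in the usage example are hypothetical.

```python
def read_deps(path):
    """Yield (head, deprel) for each syntactic word in a CoNLL-U file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and sentence-level comments
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue  # skip multiword-token ranges and empty nodes
            yield cols[6], cols[7]  # HEAD and DEPREL columns


def attachment_scores(gold_path, pred_path):
    """Return (UAS, LAS) for a prediction file against a gold file."""
    gold = list(read_deps(gold_path))
    pred = list(read_deps(pred_path))
    assert len(gold) == len(pred), "token counts differ between files"
    total = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / total
    las = sum(g == p for g, p in zip(gold, pred)) / total
    return uas, las


# Example usage with hypothetical file names:
# uas, las = attachment_scores("test_gold.conllu", "results.conllu")
# print(f"UAS: {uas:.2%}  LAS: {las:.2%}")
```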

Schedule

  • March 15: Sample set, evaluation script and Annotation Guidelines released.
  • March 17: Training set released.
  • April 1: Development set released.
  • April 29: Test set released (includes background set).
  • May 17: System output submissions due.
  • May 28: Results posted and test set with gold-standard annotations released.
  • May 31: Working notes paper submission.
  • June 15: Notification of acceptance (peer reviews).
  • June 30: Camera-ready paper submission.
  • September: IberLEF 2020 Workshop.

Evaluation phase

Start: May 17, 2020, midnight UTC

Description: This is the first and only phase of the competition. Participants must submit their predictions in the same format as the training and development data, as a file named "results.conllu" packaged inside a ZIP archive.
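As a minimal sketch of how a submission could be packaged (assuming the predictions have already been written to results.conllu; the archive name "submission.zip" is arbitrary):

```python
import zipfile

# Wrap the prediction file in a ZIP archive for upload.
# "results.conllu" must sit at the top level of the archive.
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("results.conllu", arcname="results.conllu")
```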

End: May 24, 2020, 11 p.m. UTC
