Dependency-based syntactic parsing has become popular in NLP in recent years. One reason is that dependency trees encode predicate-argument structures transparently, which is useful in many downstream applications. Another is that dependency grammar is better suited than phrase-structure grammar to languages with free or flexible word order.
Universal Dependencies (UD) is a framework for consistent annotation of grammar (parts of speech, morphological features and syntactic dependencies) across different human languages. Moreover, the UD initiative is an open community effort with over 200 contributors which has produced more than 100 treebanks in over 70 languages.
The aim of this sub-task is to challenge participants to apply their systems or solutions to Universal Dependency parsing of Spanish news articles, as defined in the Annotation Guidelines for the CAPITEL corpus. The guidelines will be shared with the participants.
A subset of the CAPITEL corpus will be provided as supporting data (estimated at a maximum of 250,000 revised words). The subset will be distributed in CoNLL-U format: in addition to heads and dependency relations, it will be tokenized and annotated with lemmas, UD tags and features.
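As an illustration of the data format, the sketch below reads CoNLL-U text into token records. The ten column names follow the public CoNLL-U specification. The sample sentence, the `read_conllu` helper and the choice to skip multiword-token ranges and empty nodes are assumptions made for this example, not part of the CAPITEL release.

```python
# The ten tab-separated columns defined by the CoNLL-U format.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text):
    """Parse CoNLL-U text into a list of sentences (lists of token dicts)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token range (1-2) or empty node (1.1)
        token = dict(zip(CONLLU_FIELDS, cols))
        token["id"] = int(token["id"])
        # HEAD may be "_" when the annotation is withheld.
        token["head"] = int(token["head"]) if token["head"].isdigit() else None
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample sentence, built with explicit tabs for readability.
_ROWS = [
    ["1", "El", "el", "DET", "_", "Definite=Def|Gender=Masc|Number=Sing|PronType=Art", "2", "det", "_", "_"],
    ["2", "perro", "perro", "NOUN", "_", "Gender=Masc|Number=Sing", "3", "nsubj", "_", "_"],
    ["3", "ladra", "ladrar", "VERB", "_", "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", "0", "root", "_", "_"],
    ["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"],
]
SAMPLE = "# text = El perro ladra.\n" + "\n".join("\t".join(r) for r in _ROWS) + "\n\n"

for tok in read_conllu(SAMPLE)[0]:
    print(tok["id"], tok["form"], tok["head"], tok["deprel"])
```

Each token's `head` is the `id` of its governor, with 0 marking the root of the sentence.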
The entire CAPITEL supporting data will be randomly sampled into three subsets: training, development and test. The training set will comprise 50% of the corpus, whereas the development and test sets will each amount to roughly 25%. Together with the test set release, we will release an additional collection of documents (background set) to ensure that participating teams are not able to perform manual corrections, and to encourage features such as scalability to larger data collections.
The metrics for the evaluation phase will be the following:
with the latter being used as the official evaluation score and for the final ranking of the participating teams.
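The metric names do not appear in the text above. Purely as an illustration, the sketch below assumes the two attachment scores commonly reported for dependency parsing: the unlabeled attachment score (UAS, share of tokens with the correct head) and the labeled attachment score (LAS, share of tokens with both the correct head and the correct relation). The function name and sample values are hypothetical, and the official scorer may differ in details such as punctuation handling.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for aligned lists of (head, deprel) pairs.

    Assumes gold and predicted tokens are aligned one-to-one, which holds
    when the tokenization is given, as in the data described above.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted token counts differ")
    if not gold:
        raise ValueError("no tokens to score")
    # UAS counts matching heads; LAS requires head and relation to match.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return uas_hits / len(gold), las_hits / len(gold)


# Hypothetical four-token sentence: one wrong label, one wrong head.
gold = [(2, "det"), (3, "nsubj"), (0, "root"), (3, "punct")]
pred = [(2, "det"), (3, "obj"), (0, "root"), (2, "punct")]
uas, las = attachment_scores(gold, pred)
print(uas, las)  # 0.75 0.5
```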
Start: May 17, 2020, midnight
Description: This is the first and only phase of the competition. Participants must submit their predictions in the same format as the training and development data, in a file named "results.conllu" inside a ZIP file.
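A minimal sketch of packaging a submission with Python's standard `zipfile` module follows. Only the inner file name "results.conllu" comes from the description above. The helper name, the archive name `submission.zip` and the input path are hypothetical.

```python
import zipfile
from pathlib import Path


def package_submission(predictions_path, zip_path="submission.zip"):
    """Store a predictions file as "results.conllu" at the root of a ZIP."""
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname fixes the name inside the archive, whatever the local name.
        zf.write(predictions_path, arcname="results.conllu")
    return zip_path
```

For example, `package_submission("my_predictions.conllu")` would write `submission.zip` containing a single entry called `results.conllu`.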
May 24, 2020, 11 p.m.