SemEval 2019 will take place at NAACL 2019 in Minneapolis, MN, June 6-7.
The evaluation phase has started!
[December 13, 2018 - Update] The Public Data contains the training and development data for English, French and German. Please use the data published here and not the data available on GitHub. The Input Data contains the (unannotated) test data, on which you should run your trained parsers and submit the result.
The Starting Kit contains the full text of Twenty Thousand Leagues Under the Sea in English and French. While the first five chapters were annotated for UCCA and are available as part of the task training/development/test data, the rest of the book is unannotated. We release the tokenized text here, with features calculated by spaCy 2.0.11 and UDPipe 1.2 (using UD v2.2 models), as optional extra data (e.g., for unsupervised pre-training). Note that in German, the whole book is annotated and is split into the training/development/test data.
We are glad to announce that Microsoft has generously provided some sponsorship for SemEval this year, to encourage student participation! One student system paper author, nominated by the task organizers, will have SemEval Workshop registration fees waived (note: this covers *only* workshop registration fees. Conference fees, travel, and lodging are not sponsored).
Semantic representation has received growing attention in NLP in the past few years, and many proposals for semantic schemes have recently been put forth. Examples include Abstract Meaning Representation, Broad-coverage Semantic Dependencies, Universal Decompositional Semantics, Parallel Meaning Bank, and Universal Conceptual Cognitive Annotation. These advances in semantic representation, along with corresponding advances in semantic parsing, hold promise to benefit essentially all text understanding tasks, and have already demonstrated applicability to summarization, paraphrase detection, and semantic evaluation (using UCCA; see below).
In addition to its potential applicative value, work on semantic parsing poses interesting algorithmic and modelling challenges, which are often different from those tackled in syntactic parsing, including reentrancy (e.g., for sharing arguments across predicates), and the modelling of the interface with lexical semantics. Semantic parsing into such schemes has been much advanced by recent SemEval workshops, including two tasks on Broad-coverage Semantic Dependency Parsing and two tasks on AMR parsing. We expect a SemEval task on UCCA parsing to have a similar effect. Moreover, given the conceptual similarity between the different semantic representations, it is likely that work on UCCA parsing will directly contribute to the development of other semantic parsing technology. Furthermore, conversion scripts are available between UCCA and the SDP, CoNLL-U and AMR formats. Teams that participated in past shared tasks on SDP, UD and AMR are encouraged to participate using similar systems and a conversion-based protocol.
UCCA is a cross-linguistically applicable semantic representation scheme, building on the established Basic Linguistic Theory typological framework. It has demonstrated applicability to multiple languages, including English, French and German (with pilot annotation projects on Czech, Russian and Hebrew), and stability under translation. It has proven useful for defining semantic evaluation measures for text-to-text generation tasks, including machine translation, text simplification and grammatical error correction.
UCCA supports rapid annotation by non-experts, assisted by an accessible annotation interface. The interface is powered by an open-source, flexible web-application for syntactic and semantic phrase-based annotation in general, and for UCCA annotation in particular.
The task consists in parsing text according to the UCCA semantic annotation. The task starts from pre-tokenized text.
UCCA represents the semantics of linguistic utterances as directed acyclic graphs (DAGs), where terminal (childless) nodes correspond to the text tokens, and non-terminal nodes to semantic units that participate in some super-ordinate relation. Edges are labelled, indicating the role of a child in the relation the parent represents. Nodes and edges belong to one of several layers, each corresponding to a “module” of semantic distinctions.
UCCA’s foundational layer covers the predicate-argument structure evoked by predicates of all grammatical categories (verbal, nominal, adjectival and others), the inter-relations between them, and other major linguistic phenomena such as semantic heads and multi-word expressions. It is the only layer for which annotated corpora exist at the moment, and will thus be the target of this shared task. The layer’s basic notion is the Scene, describing a state, action, movement or some other relation that evolves in time. Each Scene contains one main relation (marked as either a Process or a State), as well as one or more Participants. For example, the sentence “After graduation, John moved to Paris” (see figure) contains two Scenes, whose main relations are “graduation” and “moved”. “John” is a Participant in both Scenes, while “Paris” only in the latter. Further categories account for inter-Scene relations and the internal structure of complex arguments and relations (e.g. coordination, multi-word expressions and modification).
UCCA distinguishes primary edges, corresponding to explicit relations, from remote edges (shown dashed in the figure) that allow for a unit to participate in several super-ordinate relations. Primary edges form a tree in each layer, whereas remote edges enable reentrancy, forming a DAG.
UCCA graphs may contain implicit units with no correspondent in the text. The figure shows the annotation for the sentence “A similar technique is almost impossible to apply to other crops, such as cotton, soybeans and rice.” The sentence has been used to compare different semantic dependency schemes. It includes a single Scene, whose main relation is “apply”, a secondary relation “almost impossible”, as well as two complex arguments: “a similar technique” and the coordinated argument “such as cotton, soybeans and rice”. In addition, the Scene includes an implicit argument, which represents the agent of the “apply” relation.
While parsing technology is well-established for syntactic parsing, UCCA has several distinct properties that distinguish it from syntactic representations, mostly UCCA’s tendency to abstract away from syntactic detail that does not affect argument structure. For instance, consider the following examples where the concept of a Scene has a different rationale from the syntactic concept of a clause. First, non-verbal predicates in UCCA are represented like verbal ones, such as when they appear in copula clauses or noun phrases. Indeed, in the figure, “graduation” and “moved” are considered separate Scenes, despite appearing in the same clause. Second, in the same example, “John” is marked as a (remote) Participant in the graduation Scene, despite not being explicitly mentioned. Third, consider the possessive construction in “John’s trip home”. While in UCCA “trip” evokes a Scene in which “John” is a Participant, a syntactic scheme would analyze this phrase similarly to “John’s shoes”.
The differences between the challenges posed by syntactic parsing and those posed by UCCA parsing, and more generally semantic parsing, motivate the development of targeted parsing technology to tackle them.
The UCCA annotation guidelines can be found here.
Note that for the purpose of the shared task, the Time (T) category is merged with the Adverbial (D) category. That is, all the instances of T should be replaced by D.
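The merge of T into D can be sketched as a simple relabelling step. The snippet below is only an illustration: it assumes edges are held as hypothetical (parent, child, label) triples, which is not the format of the released data.

```python
def merge_time_into_adverbial(edges):
    """Return a copy of the edges with every Time (T) label replaced by Adverbial (D).

    `edges` is a list of (parent, child, label) triples; this representation
    is an assumption made for illustration only.
    """
    return [(parent, child, "D" if label == "T" else label)
            for parent, child, label in edges]


# Toy example with made-up node identifiers.
edges = [("1.1", "1.2", "T"), ("1.1", "1.3", "P"), ("1.1", "1.4", "A")]
merged = merge_time_into_adverbial(edges)
print(merged)
```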
The list of the UCCA categories relevant to the task can be found here.
Several baselines have been proposed, using different classifiers (sparse perceptron or feedforward neural network), and using conversion-based approaches that use existing parsers for other formalisms to parse UCCA by constructing a two-way conversion protocol between the formalisms.
TUPA has shown superior performance over all such approaches, and will thus serve as a strong baseline for system submissions to the shared task.
The code and documentation for TUPA can be found here.
More information, including the resources, can be found on the UCCA general resource page.
For further questions, please consult the other sections of the site. Questions that remain unanswered can be asked in the dedicated group.
Participant systems in the task will be evaluated in four settings:
English in-domain setting, using the Wiki corpus.
English out-of-domain setting, using the Wiki corpus as training and development data, and 20K Leagues as test data.
German in-domain setting, using the 20K Leagues corpus.
French setting with no training data (except trial data), using the 20K Leagues corpus as development and test data.
To allow both a level-ground comparison between systems and the use of hitherto untried resources, we will hold both an open and a closed track for submissions in the English and German settings. Closed track submissions will only be allowed to use the gold-standard UCCA annotation distributed for the task in the target language, and will be limited in their use of additional resources. Concretely, the only additional data they will be allowed to use is that used by TUPA, which consists of automatic named entity annotations provided by spaCy, and automatic POS tags and syntactic dependency relations provided by UDPipe. In addition, the closed track will allow the use of word embeddings provided by fastText for all languages.
Systems in the open track, on the other hand, will be allowed to use any additional resource, such as UCCA annotation in other languages, dictionaries or datasets for other tasks, provided that they make sure not to use any additional gold standard annotation over the same text used in the UCCA corpora.4 In both tracks, we require that submitted systems are not trained on the development data. Development data can be used for tuning. Due to the absence of an established pilot study for French, we will only hold an open track for this setting. Training for French is allowed on the trial data (15 sentences).
The four settings and two tracks result in a total of 7 competitions, where a team may participate in anywhere between 1 and 7 of them. We will encourage submissions in each track to use their systems to produce results in all settings. In addition, we will encourage closed-track submissions to also submit to the open track.
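The count of 7 follows from the description above: the two English settings and the German setting each have a closed and an open track, while French has an open track only. A minimal sketch of this enumeration (the dictionary and its keys are illustrative names, not identifiers used by the competition site):

```python
# Tracks available per setting, as described in the text above.
# The names used here are illustrative only.
TRACKS_BY_SETTING = {
    "English in-domain": ("closed", "open"),
    "English out-of-domain": ("closed", "open"),
    "German in-domain": ("closed", "open"),
    "French": ("open",),  # open track only
}

competitions = [
    (setting, track)
    for setting, tracks in TRACKS_BY_SETTING.items()
    for track in tracks
]

for setting, track in competitions:
    print(f"{setting} ({track})")
print(len(competitions))
```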
To convert manually:
pip install semstr
python -m semstr.convert [filenames] -f [format] -o [out_dir]
Note that while the NeGra export format preserves all the information in the UCCA graphs, conversion to the sdp, conllu, conll and amr formats is lossy, due to the bilexical dependency structure (and due to reentrancies in AMR not being separated to primary and remote). Below are the labeled scores of converting the English Wiki corpus to these formats and back to the standard format:
In order to evaluate how similar an output UCCA structure is to a gold UCCA graph, we use the DAG F1-score. Formally, consider two UCCA annotations G1 and G2 that share their set of leaves (tokens) W. For a node v in G1 or G2, define its yield, yield(v) ⊆ W, as its set of leaf descendants. Define a pair of edges (v1, u1) in G1 and (v2, u2) in G2 to be matching if yield(u1) = yield(u2) and they have the same label. Labeled Precision and Recall are defined by dividing the number of matching edges in G1 and G2 by |E1| and |E2| respectively, where E1 and E2 are the edge sets of G1 and G2. The DAG F1-score is their harmonic mean. We will report Precision, Recall and F1 scores both for primary and remote edges. Implicit units are disregarded in this task's evaluation. Also, the measures are indifferent to the position of the Function category.
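The definition above can be illustrated with a small sketch. This is not the official evaluation code: graphs are assumed to be given as hypothetical lists of (parent, child, label) triples, matching edges are counted as a multiset intersection of (label, yield) pairs, and the separation into primary and remote edges, implicit units and normalization are all ignored.

```python
from collections import Counter


def yields(edges):
    """Map every node to its yield: the frozenset of its leaf descendants."""
    children = {}
    for parent, child, _label in edges:
        children.setdefault(parent, []).append(child)

    cache = {}

    def leaf_set(node):
        if node not in cache:
            kids = children.get(node)
            if not kids:  # childless node, i.e. a terminal (token)
                cache[node] = frozenset([node])
            else:
                cache[node] = frozenset().union(*(leaf_set(k) for k in kids))
        return cache[node]

    for parent, child, _label in edges:
        leaf_set(parent)
        leaf_set(child)
    return cache


def labeled_scores(edges1, edges2):
    """Labeled precision, recall and F1 between two graphs over the same tokens."""
    y1, y2 = yields(edges1), yields(edges2)
    bag1 = Counter((label, y1[child]) for _p, child, label in edges1)
    bag2 = Counter((label, y2[child]) for _p, child, label in edges2)
    matching = sum((bag1 & bag2).values())  # same label and same child yield
    precision = matching / len(edges1) if edges1 else 0.0
    recall = matching / len(edges2) if edges2 else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy graphs for "John moved to Paris"; node names are made up.
gold = [("root", "John", "A"), ("root", "moved", "P"), ("root", "n1", "A"),
        ("n1", "to", "R"), ("n1", "Paris", "C")]
# Same structure, but "to" is labelled F instead of R.
predicted = [("root", "John", "A"), ("root", "moved", "P"), ("root", "n1", "A"),
             ("n1", "to", "F"), ("n1", "Paris", "C")]

precision, recall, f1 = labeled_scores(predicted, gold)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

In this toy case four of the five edges match in both directions, so precision, recall and F1 coincide.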
The Center (C) category is disregarded by the evaluation in the two following cases:
1. If the unique child v of a node u is annotated as C, then v is disregarded. So in this case, if v is a leaf, u will be considered as a leaf instead of v and if v is not a leaf, the child nodes of v will be considered as the child nodes of u.
2. If v is a unique center in a unit u (i.e. the other children of u are not annotated as centers), and w is a unique center in v, then v is disregarded. That is, the child nodes of v (including w) will be considered as the child nodes of u.
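The two cases above can be sketched on a toy tree of primary edges. This is a rough illustration under stated assumptions, not the official normalization script: a unit is assumed to be either a token string (a leaf) or a list of (label, child) pairs, and the function name is hypothetical.

```python
def normalize_centers(unit):
    """Apply the two Center (C) cases bottom-up to a toy unit.

    A unit is either a str (leaf token) or a list of (label, child_unit)
    pairs; this representation is an assumption made for illustration.
    """
    if isinstance(unit, str):
        return unit
    children = [(label, normalize_centers(child)) for label, child in unit]

    # Case 1: the unique child v of u is a Center, so v is disregarded.
    # u becomes a leaf if v is a leaf; otherwise u takes over v's children.
    if len(children) == 1 and children[0][0] == "C":
        return children[0][1]

    # Case 2: v is the unique Center in u and w is the unique Center in v,
    # so v is disregarded and its children (including w) attach to u.
    center_positions = [i for i, (label, _) in enumerate(children) if label == "C"]
    if len(center_positions) == 1:
        i = center_positions[0]
        v = children[i][1]
        if not isinstance(v, str) and sum(label == "C" for label, _ in v) == 1:
            children[i:i + 1] = v
    return children


# Case 1: the Participant unit has a single child, a Center.
case1 = normalize_centers([("A", [("C", "John")]), ("P", "moved")])
# Case 2: nested unique Centers are flattened into the parent unit.
case2 = normalize_centers([("E", "the"), ("C", [("E", "big"), ("C", "house")])])
print(case1)
print(case2)
```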
Normalization will be automatically run before the evaluation using this script.
For each of the seven competitions, we will report winning systems according to the Primary F1-score and according to the Remote F1-score.
For a more fine-grained evaluation, Precision, Recall and F1 scores for specific categories (edge labels) will also be reported. UCCA labels can be divided into categories that correspond to Scene elements (States, Processes, Participants, Adverbials), non-Scene elements (Elaborators, Connectors, Centers), and inter-Scene Linkage (Parallel Scenes, Linkage, Ground). We will report performance for each of these sets separately, leaving out Function and Relator units, which do not belong to any of these sets.
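The grouping described above can be written down as a small lookup. The sketch below uses the full category names from the text as labels, which is an assumption: the released data may use abbreviated labels instead, and the function names are illustrative.

```python
from collections import Counter

# Category sets as listed in the text; labels are full category names here
# (an assumption made for illustration).
CATEGORY_SETS = {
    "Scene elements": {"State", "Process", "Participant", "Adverbial"},
    "non-Scene elements": {"Elaborator", "Connector", "Center"},
    "inter-Scene linkage": {"Parallel Scene", "Linkage", "Ground"},
}


def category_set(label):
    """Return the set a label belongs to, or None (e.g. Function, Relator)."""
    for name, labels in CATEGORY_SETS.items():
        if label in labels:
            return name
    return None


def count_by_set(labels):
    """Count edge labels per category set, skipping labels outside all sets."""
    return Counter(s for s in map(category_set, labels) if s is not None)


example = ["Process", "Participant", "Participant", "Center",
           "Function", "Parallel Scene"]
counts = count_by_set(example)
print(dict(counts))
```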
To evaluate manually:
pip install semstr
python -m semstr.evaluate
4. We are not aware of any such annotation, but include this restriction for completeness.
Participants in the task will submit their results in the following format:
Please try the evaluation code as early as possible.
Submit your system outputs here.
The results of the practice phase can be found here.
The results of the evaluation phase will be available at the end of the phase.
The results of the post evaluation phase can be found here.
Baseline models are available here.
To run these models, first install tupa:
pip install tupa==1.3.8
python -m tupa <DATA> -m <MODEL> -o <OUTDIR>
python -m tupa dev/closed/UCCA_English-Wiki -m ucca-bilstm-20180917 -o out/closed/UCCA_English-Wiki
These models are the baseline models for the following competition tracks:
Competitors are not allowed to use the test set or the dev set for training, to use external data in competitions where this is prohibited, or to violate any other rule of the competition.
Groups should not submit more than one system unless the systems differ in a meaningful way from one another. If unsure, contact the organizers.
All data for this task is released under the Creative Commons License (the licenses can also be found with the data).
Organizers of the competition might choose to publicize, analyze and change in any way any content sent as a part of this task. Whenever appropriate, an academic citation of the submitting group will be added (e.g. in a paper summarizing the task).
Competitors should comply with the general rules of SemEval.
The organizers are free to penalize or disqualify participants for any violation of the above rules, or for misuse, unethical behaviour or other behaviour they agree is not acceptable in a scientific competition in general and in this one in particular.
Start: Aug. 20, 2018, midnight
Description: Develop and train your system, and try evaluating on development data.
Start: Dec. 13, 2018, midnight
Description: Run the trained system on test data and upload for evaluation.
Start: Feb. 1, 2019, midnight