Overview:
Abstract Meaning Representation (AMR) is a compact, readable, whole-sentence semantic annotation. Annotation components include entity identification and typing, PropBank semantic roles, individual entities playing multiple roles, entity grounding via wikification, as well as treatments of modality, negation, etc.
Here is an example AMR for the sentence “The London emergency services said that altogether 11 people had been sent to hospital for treatment due to minor wounds.”
(s / say-01
    :ARG0 (s2 / service
        :mod (e / emergency)
        :location (c / city :wiki "London"
            :name (n / name :op1 "London")))
    :ARG1 (s3 / send-01
        :ARG1 (p / person :quant 11)
        :ARG2 (h / hospital)
        :mod (a / altogether)
        :purpose (t / treat-03
            :ARG1 p
            :ARG2 (w / wound-01
                :ARG1 p
                :mod (m / minor)))))
Note the inclusion of PropBank semantic frames ('say-01', 'send-01', 'treat-03', 'wound-01'), grounding via wikification ('London'), and multiple roles played by a single entity (the 11 people, variable 'p', are the ARG1 of send-01, the ARG1 of treat-03, and the ARG1 of wound-01).
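To make the notation concrete, here is a minimal sketch of how an AMR string in PENMAN notation can be parsed into a tree, with reentrant variables (like 'p' above) left as bare strings. This is an illustrative toy, not an official tool; for real work, dedicated AMR libraries (e.g. the 'penman' Python package) are a better choice.

```python
import re

def tokenize(amr):
    # Split into parens, quoted strings, and bare tokens (variables, roles, concepts).
    return re.findall(r'"[^"]*"|\(|\)|[^\s()]+', amr)

def parse(tokens, pos=0):
    """Parse '(var / concept :role value ...)' into a nested dict.
    Returns (node, next_position). Reentrant variables stay as strings."""
    assert tokens[pos] == '(' and tokens[pos + 2] == '/'
    node = {'var': tokens[pos + 1], 'concept': tokens[pos + 3], 'roles': []}
    pos += 4
    while tokens[pos] != ')':
        role = tokens[pos]                       # e.g. ':ARG0', ':quant'
        if tokens[pos + 1] == '(':               # nested AMR node
            child, pos = parse(tokens, pos + 1)
            node['roles'].append((role, child))
        else:                                    # constant, string, or reentrancy
            node['roles'].append((role, tokens[pos + 1]))
            pos += 2
    return node, pos + 1

# Usage: a fragment of the example above; the reentrant 'p' under
# wound-01 comes back as the bare string 'p'.
amr = '(t / treat-03 :ARG1 (p / person :quant 11) :ARG2 (w / wound-01 :ARG1 p))'
tree, _ = parse(tokenize(amr))
```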
In 2016, SemEval held its first AMR parsing challenge and received strong submissions from 11 diverse teams. In 2017 we have extended the challenge to cover both parsing of biomedical data and generation; this subtask is concerned with the latter:
Subtask 2: AMR-to-English Generation
In this completely new subtask, participants will be provided with AMRs and must generate valid English sentences. Scoring will make use of human evaluation. The domain of this subtask will be general news and discussion forums, much like the 2016 parsing task.
For the AMR from above:
(s / say-01
    :ARG0 (s2 / service
        :mod (e / emergency)
        :location (c / city :wiki "London"
            :name (n / name :op1 "London")))
    :ARG1 (s3 / send-01
        :ARG1 (p / person :quant 11)
        :ARG2 (h / hospital)
        :mod (a / altogether)
        :purpose (t / treat-03
            :ARG1 p
            :ARG2 (w / wound-01
                :ARG1 p
                :mod (m / minor)))))
a correct answer would, of course, be "The London emergency services said that altogether 11 people had been sent to hospital for treatment due to minor wounds." However, another correct answer would be "London emergency services say that altogether eleven people were sent to the hospital for treating of their minor wounds." Sentences will be automatically scored by single-reference BLEU and possibly other automated metrics as well. However, they will also be scored by human preference judgments, using the methods (and interface) employed by WMT. Ultimately, the results judged best by human evaluators get the SemEval trophy.
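For intuition about the automatic scoring, the following is a simplified sketch of single-reference, sentence-level BLEU (modified n-gram precision up to 4-grams plus the standard brevity penalty, with no smoothing). It is illustrative only; official scoring would use a standard implementation such as multi-bleu or sacreBLEU.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Single-reference sentence BLEU (unsmoothed) with brevity penalty."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_ngrams, r_ngrams = ngrams(cand, n), ngrams(ref, n)
        # Clipped counts: each candidate n-gram matches at most as often
        # as it appears in the reference.
        clipped = sum(min(c, r_ngrams[g]) for g, c in c_ngrams.items())
        if clipped == 0:
            return 0.0  # unsmoothed: any zero n-gram precision gives BLEU = 0
        log_prec += math.log(clipped / sum(c_ngrams.values())) / max_n
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec)
```

Because BLEU rewards only surface n-gram overlap, the perfectly fluent second answer above would score well below the reference sentence itself, which is one reason human judgments are the official metric.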
Example general-domain data with AMRs can be found here
Existing AMR-related research: Kevin Knight has been keeping a list here. It is hard to keep up, though, so please send email to jonmay@isi.edu if yours is missing and you want a citation.
Participation is a two-phase process, and the procedure in each phase is more or less the same:
The primary trophy-determining metric for this subtask will be a human judgement obtained from the union of (possibly empty) sets of SemEval participants, other NLP researchers, other individuals known to the task organizer, and crowdsourced workers. The TrueSkill algorithm, as described in the WMT 2016 findings paper, will be used to produce a numerical metric.
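WMT's TrueSkill-based ranking is more involved (it handles ties and iterates over many pairwise judgements), but the core idea is the standard two-player, no-draw TrueSkill update: each system's skill is a Gaussian (mu, sigma), and each human preference judgement shifts the winner up and the loser down. The sketch below uses the usual TrueSkill default parameters (mu=25, sigma=25/3, beta=25/6), which are illustrative assumptions, not values specified by the task.

```python
import math

def _pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def _cdf(x):
    return (1 + math.erf(x / math.sqrt(2))) / 2

def trueskill_update(winner, loser, beta=25.0 / 6):
    """One two-player, no-draw TrueSkill update.
    winner and loser are (mu, sigma) skill estimates; returns updated pairs."""
    mu_w, s_w = winner
    mu_l, s_l = loser
    c = math.sqrt(2 * beta ** 2 + s_w ** 2 + s_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)   # additive correction to the means
    w = v * (v + t)         # multiplicative shrinkage of the variances
    new_w = (mu_w + (s_w ** 2 / c) * v,
             math.sqrt(s_w ** 2 * (1 - (s_w ** 2 / c ** 2) * w)))
    new_l = (mu_l - (s_l ** 2 / c) * v,
             math.sqrt(s_l ** 2 * (1 - (s_l ** 2 / c ** 2) * w)))
    return new_w, new_l

# Example: system A is preferred over system B in one judgement;
# A's mean rises, B's falls, and both uncertainties shrink.
a, b = (25.0, 25.0 / 3), (25.0, 25.0 / 3)
a, b = trueskill_update(a, b)
```

Repeating this update over all collected pairwise judgements yields the numerical ranking; WMT additionally clusters systems whose final skill intervals overlap.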
Automated metrics, which may include but are not limited to BLEU, will be used in the online submission system. These metrics are not official.
We welcome the proposal of human and automated metrics for this task, since it is not at all clear that the above proposed methods are in fact the best way to evaluate systems. That being said, unless otherwise indicated by the task organizer, the trophy-determining metric is that listed above.
By submitting to the 'Evaluation' phase of this track you agree to the public release of your submissions' scores at the SemEval 2017 workshop and in the associated publicly available proceedings, at the task organizer's discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and other metrics as the task organizer sees fit. You accept that the ultimate decision of metric choice and score value is that of the task organizer. You further agree that your system will be named according to the team name provided at the time of submission or to a suitable shorthand, as determined by the task organizer. You agree that the task organizer is under no obligation to release scores and that scores may be withheld if it is the task organizer's judgement that the submission was incomplete, deceptive, or violated the letter or spirit of the competition's rules. Inclusion or exclusion of a submission's scores is not an endorsement or unendorsement of a team or individual's submission, system, or science. You further acknowledge that all trophy-making decisions are made at the sole discretion of the task organizer and that the organizer may present zero or more trophies. The definition of what constitutes a trophy is up to the task organizer.
Start: Aug. 1, 2016, midnight
Description: Generation from News/Forum AMRs from LDC2016E25. See 'Evaluation' under 'Learn the Details' for information on how to submit.
Start: Jan. 9, 2017, midnight
Description: Generation from the SemEval 2017 Task 9 News/Forum AMR Evaluation corpus. This data will be released when the evaluation period begins. See 'Evaluation' under 'Learn the Details' for information on how to submit.
End: Jan. 21, 2017, midnight