SemEval-2018 Task 7


Welcome!

Please note that this task is being moved to another CodaLab page:

https://competitions.codalab.org/competitions/17422

If you wish to participate, please subscribe there. This page will be closed.

Contact e-mails: {buscaldi, gabor}@lipn.univ-paris13.fr 

There will be three evaluation scenarios:

  • Subtask 1.1 Relation Classification (clean data)

  • Subtask 1.2 Relation Classification (noisy data)

  • Subtask 2 Relation Extraction and Classification (clean data)

Participants taking part in all three subtasks should prepare three text files in the required format, one per subtask. It is possible to take part in only a subset of the scenarios. Moreover, for subtask 2, it is possible to take part in the extraction scenario only; in this case, relation labels have to be replaced by the label 'ANY'. For each scenario, participants are allowed to submit up to three runs. Participants may use additional external resources, as long as they declare this at submission time. However, participants may not manually annotate the test data.

 

Submission format

The submission format is the same for all subtasks and should have the form:

RELATION-TYPE(entity-id,entity-id[,REVERSE])

Entities in the relation should be given in the order in which they occur in the text (ascending order of the entity IDs). The REVERSE flag is optional for the extraction scenario. For the classification scenario, it encodes directionality when the argument order of the relation is the inverse of the order in which the arguments appear in the text.

For subtask 2, if the submission concerns only the relation extraction prediction, the label ANY should be used as the relation type:

ANY(entity-id,entity-id)

There is no need to specify the directionality of the relation, as this is not evaluated for the extraction task.
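As a rough illustration only (not the official validator), the Python sketch below parses one submission line of this form and checks its shape; the entity IDs are the ones used in the examples further down this page, and the function name is hypothetical.

import re

# Minimal sketch of a parser for one submission line, e.g.
#   USAGE(H01-1041.8,H01-1041.9) or MODEL-FEATURE(H01-1041.10,H01-1041.11,REVERSE)
# Hypothetical helper; the official evaluation scripts may be stricter or more lenient.
LINE_RE = re.compile(
    r"^(?P<label>[A-Z][A-Z_\-]*)\("      # relation label, or ANY
    r"\s*(?P<arg1>[^,()\s]+)\s*,"        # first entity ID (first in the text)
    r"\s*(?P<arg2>[^,()\s]+)\s*"         # second entity ID
    r"(?:,\s*(?P<rev>REVERSE)\s*)?\)$"   # optional REVERSE flag
)

def parse_line(line):
    """Return (label, arg1, arg2, reverse) or raise ValueError."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError("Malformed submission line: " + line)
    return m["label"], m["arg1"], m["arg2"], m["rev"] is not None

print(parse_line("USAGE(H01-1041.8,H01-1041.9)"))
print(parse_line("MODEL-FEATURE(H01-1041.10,H01-1041.11,REVERSE)"))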

 

Evaluation metrics

For subtasks 1.1 and 1.2, which are standard classification tasks, the following class-based evaluation metrics are used:

      • for every distinct class: precision, recall and F1-measure (β=1)

      • global evaluation, for the set of classes:

        • macro-average of the F1-measures of every distinct class

        • micro-average of the F1-measures of every distinct class

The official ranking of submissions is based on the macro-averaged F1 score.
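As a minimal sketch (not the official scorer), assuming gold and predicted labels are stored as dictionaries keyed by instance, the class-based scores above can be computed roughly as follows:

from collections import Counter

def classification_scores(gold, pred):
    # gold, pred: dicts mapping an instance key (e.g. an entity-ID pair) to a label.
    # Illustrative sketch only; the official scorer may differ in details such as
    # how unpredicted instances are counted.
    tp, fp, fn = Counter(), Counter(), Counter()
    for key, label in pred.items():
        if gold.get(key) == label:
            tp[label] += 1
        else:
            fp[label] += 1
    for key, label in gold.items():
        if pred.get(key) != label:
            fn[label] += 1
    per_class = {}
    for label in set(gold.values()) | set(pred.values()):
        p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        per_class[label] = (p, r, f1)
    macro_f1 = sum(f1 for _, _, f1 in per_class.values()) / len(per_class)
    tp_s, fp_s, fn_s = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p = tp_s / (tp_s + fp_s) if tp_s + fp_s else 0.0
    micro_r = tp_s / (tp_s + fn_s) if tp_s + fn_s else 0.0
    micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r) if micro_p + micro_r else 0.0
    return per_class, macro_f1, micro_f1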

For subtask 2, two evaluations will be performed on the same submission.

For the extraction task, the following measures will be calculated:

  • precision: percentage of extracted entity pairs that are correctly connected (directionality and relation labels are ignored)
  • recall: percentage of entity pairs connected in the gold standard that were found (directionality and relation labels are ignored)
  • F1-score (β=1): harmonic mean of precision and recall.

For the classification task, only pairs that are correctly connected, have the correct directionality (when relevant), and are assigned the correct relation label from the gold standard are counted as correct instances. Missed instances (present in the gold standard but not extracted) are counted as wrong. The class-based evaluation metrics are:

        • Precision: percentage of relation instances that were correctly connected, with a correct directionality and a correct label assignment. Erroneously extracted instances that are not in the gold standard, instances labeled with ANY, and instances with a wrong directionality are considered as wrong predictions.

        • Recall: percentage of relation instances belonging to a specific relation in the gold standard that were found, with a correct directionality and correct label assignment.

        • F-measure (β=1): harmonic mean of the above precision and recall

      • global metrics:

        • macro-average of the F1-measures of every distinct class
        • micro-average of the F1-measures of every distinct class

Participants will be evaluated and ranked on both extraction and classification by default. The official score for the extraction task is the F1 score; the official score for the classification task is the macro-averaged F1 score. If only relation extraction predictions are submitted, classification metrics will not be considered.
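To make the extraction-side scoring concrete, here is a small sketch (illustrative only, not the official evaluation script) that compares gold and predicted entity pairs as unordered sets, so that directionality and relation labels are ignored:

def extraction_scores(gold_pairs, pred_pairs):
    # Pairs are compared as unordered sets of entity IDs.
    # Illustrative sketch only, not the official evaluation script.
    gold = {frozenset(p) for p in gold_pairs}
    pred = {frozenset(p) for p in pred_pairs}
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [("H01-1041.8", "H01-1041.9"), ("H01-1041.10", "H01-1041.11")]
pred = [("H01-1041.9", "H01-1041.8"), ("H01-1041.14", "H01-1041.15")]
print(extraction_scores(gold, pred))  # (0.5, 0.5, 0.5)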

Terms and Conditions

By submitting results to this competition, you consent to the public release of your scores at the SemEval-2018 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.

You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.

You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.

You agree not to redistribute the test data except in the manner prescribed by its licence.

Semantic Relations

Relation instances are to be classified into one of the following relations: USAGE, RESULT, MODEL-FEATURE, PART_WHOLE, TOPIC, COMPARE.

1. USAGE is an asymmetrical relation. It holds between two entities X and Y, where, for example: 

X is used for Y

X is a method used to perform a task Y

X is a tool used to process data Y

X is a type of information/representation of information used by/in a system Y

 

2. RESULT is an asymmetrical relation. It holds between two entities X and Y, where, for example: 

X gives as a result Y (where Y is typically a measure of evaluation)

X yields Y (where Y is an improvement or decrease)

a feature of a system or a phenomenon X yields Y (where Y is an improvement or decrease)

 

3. MODEL-FEATURE is an asymmetrical relation. It holds between two entities X and Y, where, for example:

X is a feature/an observed characteristic of Y

X is a model of Y

X is a tag(set) used to represent Y

 

4. PART_WHOLE is an asymmetrical relation. It holds between two entities X and Y, where, for example:

X is a part, a component of Y

X is found in Y

Y is built from/composed of X  

 

5. TOPIC is an asymmetrical relation. It holds between two entities X and Y, where, for example:

X deals with topic Y

X (author, paper) puts forward Y (an idea, an approach)

 

6. COMPARE is a symmetrical relation. It holds between two entities X and Y, where:

X is compared to Y (e.g. two systems, two feature sets or two results)

  



Subtasks

For each subtask, training and test data include abstracts of papers from the ACL Anthology Corpus with pre-annotated entities that represent concepts. Two types of tasks are proposed:

1) identifying pairs of entities that are instances of any of the six semantic relations (extraction task),

2) classifying instances into one of the six semantic relation types (classification task).


 

  • Subtask 1: Relation classification

The subtask is decomposed into two scenarios according to the data used: classification on clean data and classification on noisy data. The task is identical for both scenarios: given a relation instance consisting of two entities in context, predict the semantic relation between the entities. A relation instance is identified by the unique ID of the two entities.

For subtask 1, instances with directionality are provided in both the training data and the test data; they are not to be modified or completed in the test data. The relation label is provided in the training data and has to be predicted for the test data.

  • 1.1 Relation classification on clean data - The classification task is performed on data where entities are manually annotated, following the ACL RD-TEC 2.0 guidelines. Entities represent domain concepts specific to NLP, while high-level scientific terms (e.g. "hypothesis", "experiment") are not annotated.

Example (annotated text):

<abstract>

The key features of the system include: (i) Robust efficient <entity id="H01-1041.8">parsing</entity> of <entity id="H01-1041.9">Korean</entity> (a <entity id="H01-1041.10">verb final language</entity> with <entity id="H01-1041.11">overt case markers</entity> , relatively <entity id="H01-1041.12">free word order</entity> , and frequent omissions of <entity id="H01-1041.13">arguments</entity> ). (ii) High quality <entity id="H01-1041.14">translation</entity> via <entity id="H01-1041.15">word sense disambiguation</entity> and accurate <entity id="H01-1041.16">word order generation</entity> of the <entity id="H01-1041.17">target language</entity> .(iii) <entity id="H01-1041.18">Rapid system development</entity> and porting to new <entity id="H01-1041.19">domains</entity> via <entity id="H01-1041.20">knowledge-based automated acquisition of grammars</entity> .

</abstract>

Relation instances in the annotated text (provided for test data):

 
(H01-1041.8,H01-1041.9)

(H01-1041.10,H01-1041.11,REVERSE)

(H01-1041.14,H01-1041.15,REVERSE)

 

Submission format with predictions:

USAGE(H01-1041.8, H01-1041.9)

MODEL-FEATURE(H01-1041.10, H01-1041.11,REVERSE)

USAGE(H01-1041.14, H01-1041.15,REVERSE)
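As a rough illustration of how the pre-annotated entities can be read from such an abstract, the sketch below extracts (entity ID, surface form) pairs from the XML-style markup with a regular expression; in practice the data files may be better handled with a proper XML parser.

import re

# Sketch: pull (entity_id, surface_form) pairs out of an annotated abstract.
# Illustrative only; a real pipeline would also keep track of entity positions.
ENTITY_RE = re.compile(r'<entity id="([^"]+)">(.*?)</entity>', re.DOTALL)

abstract = ('Robust efficient <entity id="H01-1041.8">parsing</entity> of '
            '<entity id="H01-1041.9">Korean</entity>')

print(ENTITY_RE.findall(abstract))
# [('H01-1041.8', 'parsing'), ('H01-1041.9', 'Korean')]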

 

 

  • 1.2 Relation classification on noisy data - The task is identical to 1.1, but the entities are annotated automatically and contain noise. The annotation comes from the ACL-RelAcS corpus and is based on a combination of automatic terminology extraction and external ontologies1. Entities are therefore terms specific to the given corpus, and include high-level terms (e.g. "algorithm", "paper", "method"). They are not always full NPs and they may include noise (verbs, irrelevant words). Relations were manually annotated between the automatically annotated entities in the training data and in the gold standard. Do not try to correct the entity annotation in any way in your submission.

 

Example (annotated text):

<abstract>

This <entity id="L08-1203.8">paper</entity> introduces a new <entity id="L08-1203.9">architecture</entity> that aims at combining molecular <entity id="L08-1203.10">biology</entity> <entity id="L08-1203.11">data</entity> with <entity id="L08-1203.12">information</entity> automatically <entity id="L08-1203.13">extracted</entity> from relevant <entity id="L08-1203.14">scientific literature</entity>

</abstract>

Relation instances in the annotated text (provided for test data):

(L08-1203.8,L08-1203.9)

(L08-1203.12,L08-1203.14)

 

Submission format for predictions:

TOPIC(L08-1203.8,L08-1203.9)

PART_WHOLE(L08-1203.12,L08-1203.14)

 

  • Subtask 2: Relation extraction and classification

This subtask combines the extraction task and the classification task. The training data for this scenario is the same as that used for subtask 1.1, i.e. manually annotated entities and semantic relations with relation types between these entities. The test data contains different abstracts than in 1.1, and only the entity annotation is provided.

For the extraction task, participants need to identify pairs of entities in the abstracts that correspond to an instance of any of the six relations. For the classification task, relation labels of the extracted relations need to be predicted similarly to subtask 1. 

A single submission file is needed for the two tasks, which will be evaluated separately on 1) the detection of relation instances and 2) the prediction of the semantic relation types. However, it is possible to submit results for the extraction task only: in this case, the label ANY has to be used instead of one of the six relation labels.

Relation directionality is not taken into account for the evaluation of the extraction task. Directionality is taken into account, when relevant, for the classification task (5 out of the 6 semantic relations are asymmetrical). For the classification evaluation, instances with wrongly paired arguments, wrong directionality, or no attempted prediction (label 'ANY') are considered as wrong. If only the label ANY is used in the submission, the classification task is considered not attempted and will not be evaluated.

For the classification task, arguments have to be sorted in the order in which they appear in the text, i.e. ascending order of their IDs. Directionality is encoded as follows: if the first argument of the semantic relation comes second in the text, the REVERSE attribute has to be added, e.g.:

USAGE(P03-1068.1,P03-1068.2,REVERSE)
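A small sketch of this convention (the helper and its arguments are hypothetical, only meant to illustrate when REVERSE is added):

def format_prediction(label, rel_first, rel_second, text_order):
    # rel_first, rel_second: the relation's arguments in relation order (X, Y).
    # text_order: the same two entity IDs in the order they appear in the text.
    e1, e2 = text_order                 # submission arguments follow text order
    suffix = ",REVERSE" if rel_first == e2 else ""
    return label + "(" + e1 + "," + e2 + suffix + ")"

print(format_prediction("USAGE", "P03-1068.2", "P03-1068.1",
                        ("P03-1068.1", "P03-1068.2")))
# USAGE(P03-1068.1,P03-1068.2,REVERSE)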

Example of annotated text (same as for subtask 1.1):

<abstract>

The key features of the system include: (i) Robust efficient <entity id="H01-1041.8">parsing</entity> of <entity id="H01-1041.9">Korean</entity> (a <entity id="H01-1041.10">verb final language</entity> with <entity id="H01-1041.11">overt case markers</entity> , relatively <entity id="H01-1041.12">free word order</entity> , and frequent omissions of <entity id="H01-1041.13">arguments</entity> ). (ii) High quality <entity id="H01-1041.14">translation</entity> via <entity id="H01-1041.15">word sense disambiguation</entity> and accurate <entity id="H01-1041.16">word order generation</entity> of the <entity id="H01-1041.17">target language</entity> .(iii) <entity id="H01-1041.18">Rapid system development</entity> and porting to new <entity id="H01-1041.19">domains</entity> via <entity id="H01-1041.20">knowledge-based automated acquisition of grammars</entity> .

</abstract>

 

Submission format for a combined extraction+classification scenario (same as 1.1, but relation instances are not given in advance):

USAGE(H01-1041.8, H01-1041.9)

MODEL-FEATURE(H01-1041.10, H01-1041.11,REVERSE)

USAGE(H01-1041.14, H01-1041.15,REVERSE)

 

Submission format for an extraction only scenario:

ANY(H01-1041.8, H01-1041.9)

ANY(H01-1041.10, H01-1041.11)

ANY(H01-1041.14, H01-1041.15)
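For the extraction-only scenario, a naive starting point (not the organizers' baseline) is to treat every pair of annotated entities in an abstract as a candidate and emit it with the ANY label; the sketch below orders the pairs by the numeric suffix of the entity IDs, i.e. text order:

from itertools import combinations

def any_candidates(entity_ids):
    # Naive candidate generation: all unordered pairs of entities in the abstract,
    # written in text order (approximated by the numeric suffix of the entity ID).
    # Illustrative sketch only; real systems will filter candidates.
    pos = lambda eid: int(eid.rsplit(".", 1)[1])
    ordered = sorted(entity_ids, key=pos)
    return ["ANY(" + e1 + "," + e2 + ")" for e1, e2 in combinations(ordered, 2)]

for line in any_candidates(["H01-1041.9", "H01-1041.8", "H01-1041.10"]):
    print(line)
# ANY(H01-1041.8,H01-1041.9)
# ANY(H01-1041.8,H01-1041.10)
# ANY(H01-1041.9,H01-1041.10)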

 

1 Kata Gábor, Haïfa Zargayouna, Davide Buscaldi, Isabelle Tellier, Thierry Charnois: Semantic Annotation of the ACL Anthology Corpus for the Automatic Analysis of Scientific Literature. In: LREC 2016, Portoroz, Slovenia, May 2016.



 

Organizers

Davide Buscaldi1, Thierry Charnois1, Kata Gábor1, Behrang QasemiZadeh2, Anne-Kathrin Schumann3, Isabelle Tellier4, Haïfa Zargayouna1

1 LIPN, UMR CNRS, Université Paris 13;

2 DFG Collaborative Research Centre 991, Heinrich-Heine University Düsseldorf

3 ProTechnology GmbH, Dresden, former: Department of Applied Linguistics, Translation and Interpreting, Saarland University

4 Laboratoire Lattice, CNRS and Université Sorbonne Nouvelle

 

Practice - Subtask 1.1

Start: Sept. 1, 2017, midnight

Practice - Subtask 1.2

Start: Sept. 1, 2017, midnight

Practice - Subtask 2

Start: Sept. 1, 2017, midnight

Competition Ends

Never
