SemEval 2019 Task 2

Organized by alfredo



SemEval 2019 Unsupervised Lexical Semantic Frame Induction Task

Lexical frame induction is defined as the process of grouping verbs and their dependent words into typed-feature structures (i.e., frames) in a fully unsupervised manner. The Berkeley FrameNet database is the best-known resource of such typed-feature structures. While lexical frame resources have proved helpful (even essential) in a range of NLP tasks and linguistic investigations, building them for new languages and domains is resource intensive and thus expensive. This problem can be alleviated using unsupervised lexical frame induction methods. The goal of this task is to develop a benchmark and allow comparison of unsupervised frame induction systems for building lexical frame resources for verbs and their arguments.


Please join the dedicated Google group for inquiries about this shared task.

To contact the task organizers directly, please send an email to





Task Description and Subtasks

Target verbs and their arguments in a test set must be clustered into automatically learned frame structures. The test set consists of approximately 5000 frames from 1000 randomly chosen sentences from the PTB 3.0. The organizers provide free evaluation access to the full PTB 3.0 (courtesy of the LDC under the specified evaluation license).

The chosen verbs (and their argument structures) in the test sentences will be revealed to participants at a later stage, close to the evaluation period. However, a development dataset containing gold annotations is available and can be acquired by participants after providing the signed LDC license agreement. The gold annotations for the full test set will be revealed to participants only after the evaluation and submission periods.

The assumed frame structures have a head and an arbitrary number of slots/roles. A verb lexicalizes the head of a frame, and (some of) the verb's arguments are the slot fillers for the frame that the verb evokes. For the test set, the positions of the target verbs and their arguments (roles/frame elements) are given (i.e., subcategorization frames are assumed a priori rather than being determined by systems).

Further instructions regarding obtaining the data and the format of files can be found in the Datasets tab.

Participants are allowed to use additional corpora and tools for training as long as these resources/tools do not contain or use explicit (supervised) semantic annotations regarding word senses, frame groupings, or semantic roles. For instance, while using a lexical semantic resource such as Princeton WordNet is not allowed, participants can (and are encouraged to) use WordNet-like lexical databases that are built automatically.

Participants are invited to partake in one or more of the following subtasks:

Subtask 1: Grouping Verbs to Frame Type Clusters

The aim of this subtask is to identify verbs that evoke the same frame.

For this subtask, participants are required to assign occurrences of the target verbs to a number of clusters, such that verbs belonging to the same cluster evoke the same frame type. Our gold annotations for this subtask are based on FrameNet definitions of frames (where applicable). Following the FrameNet project, a frame is a conceptual structure that describes a certain type of object, situation, or event. In our task, the focus is on event frames evoked by verbs. For instance, in the following examples:

a. Trump leads the world, backward.
b. Disrespecting international laws leads to many complications.
c. Rosenzweig heads the climate impacts section at NASA's Goddard Institute.

we expect the verb `to lead' in ex. a and `to head' in ex. c to end up in one cluster (e.g., call it Leadership, after FrameNet), whereas `to lead' in ex. b will end up in another cluster (e.g., call it Cause) together with instances of verbs such as `originate' and `produce' (when they are used in the same sense). As exemplified above, Subtask 1 goes beyond verb-sense induction by requiring that synonymous, troponymous, and (even) antonymous senses of verbs be grouped together.



Subtask 2.1: Clustering arguments of verbs to frame-specific slots

The aim of this subtask is clustering arguments of verbs according to their corresponding FrameNet Frame Elements.

For this subtask, "semantic arguments" of verbs must be grouped into a number of frame-specific slots, similar to FrameNet. That is, we assume argument groupings are specific to frame types and are not necessarily shared with other frames. As a result, participating in Subtask 2.1 requires participation in Subtask 1, since the evaluation of argument clusters is done per frame cluster. However, one could build frame-specific slot clusters by using a heuristic/assumption such as assuming one frame per verb form.


Subtask 2.2: Clustering arguments of verbs to generic roles

The aim of this subtask is to cluster arguments of verbs according to their Semantic Roles.

In contrast to Subtask 2.1, here arguments are clustered into a set of generic semantic roles that are defined independently of frame definitions. Hence, this subtask is very similar to unsupervised semantic role labeling (e.g., see SemEval 2015 Task 18). Clustering verbs to frames (i.e., Subtask 1) is not mandatory for this subtask, and clusters that assign arguments to the latent semantic roles are evaluated disregarding the frames that the verbs belong to. Our gold annotations are based on an inventory of semantic roles adapted from the VerbNet project.


Evaluation Setting

The evaluation framework consists of a number of clustering evaluation measures. The induced groupings for frame types/slots/roles are evaluated as clusters of examples, which are compared against the gold-standard annotations. The scorer reports performance with respect to a range of evaluation metrics; however, for the official ranking of systems we use

  • Purity, inverse Purity, and their harmonic mean, as proposed by Steinbach et al. (2000);
  • and the harmonic mean of BCubed Precision and Recall, as proposed by Bagga and Baldwin (1998).
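To make these measures concrete, here is a minimal sketch of both scores in Python. The function names and the dict-based input representation (instance id mapped to cluster label) are our own illustration, not the official scorer's interface:

```python
from collections import Counter

def purity(system, gold):
    """Fraction of instances that share their system cluster's majority gold class.
    system, gold: dicts mapping instance id -> cluster/class label."""
    clusters = {}
    for inst, c in system.items():
        clusters.setdefault(c, []).append(gold[inst])
    return sum(max(Counter(members).values())
               for members in clusters.values()) / len(system)

def f_score(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0

def purity_f(system, gold):
    """Purity, inverse Purity (roles swapped), and their harmonic mean."""
    pu = purity(system, gold)
    ipu = purity(gold, system)
    return pu, ipu, f_score(pu, ipu)

def bcubed_f(system, gold):
    """Harmonic mean of item-averaged BCubed precision and recall."""
    insts = list(system)
    prec = rec = 0.0
    for i in insts:
        same_cluster = [j for j in insts if system[j] == system[i]]
        same_class = [j for j in insts if gold[j] == gold[i]]
        overlap = sum(1 for j in same_cluster if gold[j] == gold[i])
        prec += overlap / len(same_cluster)
        rec += overlap / len(same_class)
    return f_score(prec / len(insts), rec / len(insts))
```

For example, a system that splits a single gold class across two clusters gets perfect Purity but reduced inverse Purity, which the harmonic mean penalizes.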

The scorer program can be downloaded from here (source code is also available on GitHub) and run on your local machine (requires Java 1.8).


Evaluation Baselines

For Subtask 1, we report results from clusterings formed by assuming that each verb form evokes one frame (i.e., one frame per verb), by assigning each verb occurrence to its own cluster (i.e., one cluster per instance), and by grouping verbs into randomly generated clusters (i.e., a random baseline). Additionally, we report a baseline using the method proposed by Kallmeyer et al. (2018) [1].

For Subtasks 2.1 and 2.2, we group verb arguments by the type of their syntactic relation to the head verb, e.g., subjects form one cluster, objects form another cluster, and so on (i.e., one cluster per syntactic category), as well as a random baseline.
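The Subtask 1 baselines above are trivial to reproduce. The following sketch shows all three, assuming a hypothetical representation of instances as (instance id, verb lemma) pairs; none of these function names come from the official code:

```python
import random

def one_frame_per_verb(instances):
    """Baseline: every occurrence of the same verb lemma evokes the same frame."""
    return {inst_id: lemma for inst_id, lemma in instances}

def one_cluster_per_instance(instances):
    """Baseline: every verb occurrence gets its own singleton cluster."""
    return {inst_id: inst_id for inst_id, _ in instances}

def random_clusters(instances, k, seed=0):
    """Baseline: assign each occurrence to one of k clusters uniformly at random."""
    rng = random.Random(seed)
    return {inst_id: rng.randrange(k) for inst_id, _ in instances}
```

The one-frame-per-verb baseline is surprisingly strong for Purity-style measures, since most verb forms evoke only one or two frames in newswire text.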






Call for Participation

SemEval-2019 Shared Task #2: Unsupervised Frame Induction



This shared task should be interesting to researchers working on:

* Frame Induction (FrameNet)
* Word Sense Induction 
* Unsupervised Semantic Role Labelling (VerbNet)
* Lexicography

========== Background ==========

Unsupervised identification of semantic frames for lexical items is a challenging task that is useful in many contexts: developing and maintaining lexical frame databases, gaining insight into lexical item behavior at the syntax--semantics interface, as well as practical applications such as information extraction. The 2019 SemEval Task #2 invites participants to tackle this challenge in any of the following sub-tasks:

* Sub-Task 1 -- Verb clustering: identify verbs that evoke the same frame. The task is similar to word sense induction but goes beyond it since, additionally, word senses may need further grouping into one category. For example, given uses of the verb buy, a system must distinguish between its different senses, such as purchase (FrameNet's Commerce_buy) and accepting/rejecting (FrameNet's Be_in_agreement_on_assessment). Additionally, it must be able to cluster `to buy' and `to purchase' together when they are both used in the sense of Commerce_buy.

* Sub-Task 2-1 -- Clustering arguments of verbs according to the corresponding FrameNet Frame Elements: Systems are given a set of pre-determined slot fillers. These slot fillers must be clustered such that the resulting grouping resembles FrameNet's definitions/categorizations of fillers. (For this sub-task, participating in Sub-Task 1 is necessary.)

* Sub-Task 2-2 -- Clustering arguments of verbs according to their Semantic Roles: Similar to Sub-Task 2-1, pre-determined arguments of a set of target verbs must be clustered into a number of semantic/thematic role categories (Agent, Theme, Patient, etc.). We adapted role definitions from VerbNet for developing the test data for this sub-task.

We have provided newly annotated data from a subset of the PTB's WSJ section. Since all the tasks are done in an unsupervised manner, no training data is provided. However, for the purpose of development, we provide 600 annotated records.

Further information about the data, including how to obtain it, can be found via the URL at the top of the page.

We welcome submissions of new systems as well as methods described previously elsewhere. Participants are welcome to use additional data in any form as long as they (1) do not use semantically annotated data (e.g., role annotations, or lexical resources such as FrameNet, WordNet, etc.) and (2) acknowledge the use of any resource other than the PTB.

We will evaluate systems against a number of clustering evaluation measures.

To contact task organizers, please send an email to
Google Groups:!forum/semeval_frameinduction
CodaLab Page:


Important Dates:

Please note that the development/trial dataset is available now for download. The expected schedule for the evaluation phase:

15 Jan 2019: Start evaluation
31 Jan 2019: End evaluation
05 Feb 2019: Results posted
28 Feb 2019: System and Task description paper submissions due by 23:59 GMT
Summer 2019: SemEval 2019

Location: TBA




To obtain a valid license for the full test and training sets, participants must agree to the "SemEval 2019 Task 2 Evaluation Agreement" as set by the LDC to access the Penn Treebank 3.0 (PTB). In a nutshell, the LDC's agreement states that 1) the data can only be used for the purposes of this shared task, and 2) users' access to the data is temporary and they must destroy the data at the completion of the shared task. Accepting this agreement is necessary since the test set is part of the PTB: a subset of 1000 sentences randomly chosen from the texts in the Wall Street Journal section of the PTB.


To obtain a valid license:

Once your request is processed by the LDC, you will receive additional instructions from the LDC (and the organizers) for obtaining the data, including a dev set which contains gold annotations for 200 frames. Please note that obtaining the license from the LDC may take a few days.

The full blind test set will be made available here later (around December, close to the evaluation period).

A minimal example can be downloaded from here.


Data Format

Participants are provided with a CoNLL-U version of the tokenized and automatically parsed (in the Universal Dependencies formalism) sentences of PTB 3.0. For instance, for the input sentence

Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.

which appears as the first sentence of section wsj_0001 of the PTB, the CoNLL-U file contains the following record:


#20001001
1 Pierre Pierre PROPN NNP _ 2 compound _ _
2 Vinken Vinken PROPN NNP _ 9 nsubj _ _
3 , , PUNCT , _ 2 punct _ _
4 61 61 NUM CD _ 5 nummod _ _
5 years year NOUN NNS _ 6 nmod:npmod _ _
6 old old ADJ JJ _ 2 amod _ _
7 , , PUNCT , _ 2 punct _ _
8 will will AUX MD _ 9 aux _ _
9 join join VERB VB _ 0 root _ _
10 the the DET DT _ 11 det _ _
11 board board NOUN NN _ 9 dobj _ _
12 as as ADP IN _ 15 case _ _
13 a a DET DT _ 15 det _ _
14 nonexecutive nonexecutive ADJ JJ _ 15 amod _ _
15 director director NOUN NN _ 9 nmod _ _
16 Nov. Nov. PROPN NNP _ 9 nmod:tmod _ _
17 29 29 NUM CD _ 16 nummod _ _
18 . . PUNCT . _ 9 punct _ _

in which the first line (starting with #) gives the identifier assigned to this sentence. These identifiers, together with the positions assigned to the word forms in the sentence (the numbers in the first column), are used to form evaluation instances. For example, for Subtask 1, the following string:

#20001001 9 join.Becoming-a-member

states that the verb at position 9 (with the lemma `join') of the sentence with id #20001001 evokes a frame of type Becoming-a-member; note the format we use to convey this information, i.e.:

#sent-id position verb-lemma.ftype

In a similar way, for subtask 2.1 we have the record:

#20001001 9 join.Becoming-a-member vinken-:-2-:-New-member board-:-11-:-Group

and for Subtask 2.2 we have

#20001001 9 join.Becoming-a-member vinken-:-2-:-Agent board-:-11-:-Patient

(note the difference between the slot/role labels). Here, we assume that the frame has two core arguments.

As exemplified, for Subtasks 2.1 and 2.2, the records have the format:

#sent-id position verb-lemma.ftype (word-:-position-:-slot-type)+

in which the word, position, and slot-type of each filler are delimited by the character sequence -:-

UPDATE: If a slot/argument filler consists of more than one word, its elements are attached to each other with '_' (the underscore character). For example, in `John works in New York', for New York as an argument filler we will have a representation such as New_York-:-4_5-:-Location; this states that tokens 4 and 5 in the sentence (i.e., New + York) are annotated as a filler of type Location. Please see more examples in the trial data.
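A record in this format can be parsed in a few lines. The following is an illustrative sketch (the function name and the tuple-based return value are our own, not part of the task's tooling); it handles the underscore convention for multi-word fillers:

```python
def parse_record(line):
    """Parse one annotation record, e.g.
    '#20001001 9 join.Becoming-a-member vinken-:-2-:-New-member board-:-11-:-Group'
    Returns (sent_id, verb_position, verb_lemma, frame_type, fillers), where
    fillers is a list of (words, positions, slot_type) tuples."""
    fields = line.split()
    sent_id = fields[0].lstrip('#')
    position = int(fields[1])
    # split only on the first '.', since frame types contain hyphens but no dots
    lemma, ftype = fields[2].split('.', 1)
    fillers = []
    for field in fields[3:]:
        word, pos, slot = field.split('-:-')
        # multi-word fillers join their tokens and positions with '_'
        fillers.append((word.split('_'),
                        [int(p) for p in pos.split('_')],
                        slot))
    return sent_id, position, lemma, ftype, fillers
```

Splitting on the three-character sequence -:- is safe even though slot labels such as New-member contain plain hyphens.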

In the trial dataset, alongside the positional information that determines the shape of frames, the labels for frame types and slots/roles are provided too. In the final test set, the labels that must be predicted by systems are replaced by the keyword UKN. For instance, the test set for Subtask 1 will look like:

#20001001 9 join.UKN

and for Subtask 2.1:

#20001001 9 join.UKN vinken-:-2-:-UKN board-:-11-:-UKN

and for Subtask 2.2 we have

#20001001 9 join.NA vinken-:-2-:-UKN board-:-11-:-UKN

(Note that for Subtask 2.2, prediction of the frame type is not necessary.)
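Producing a submission then amounts to replacing each UKN placeholder with a predicted cluster label while leaving everything else in the record untouched. A hypothetical sketch (the function name and the dict-keyed prediction lookups are our own illustration, not the official submission tooling):

```python
def fill_predictions(test_lines, frame_labels, slot_labels=None):
    """Replace UKN placeholders in test records with predicted cluster labels.
    frame_labels: dict (sent_id, verb_position) -> frame cluster label
    slot_labels:  dict (sent_id, verb_position, filler_position_str) -> slot label
    """
    out = []
    for line in test_lines:
        fields = line.split()
        sent_id = fields[0].lstrip('#')
        vpos = int(fields[1])
        lemma, _, ftype = fields[2].partition('.')
        if ftype == 'UKN':
            ftype = str(frame_labels[(sent_id, vpos)])
        new_fields = [fields[0], fields[1], lemma + '.' + ftype]
        for field in fields[3:]:
            word, fpos, slot = field.split('-:-')
            if slot == 'UKN' and slot_labels is not None:
                slot = str(slot_labels[(sent_id, vpos, fpos)])
            new_fields.append('-:-'.join([word, fpos, slot]))
        out.append(' '.join(new_fields))
    return out
```

Note that the NA frame label used in Subtask 2.2 records is passed through unchanged, since only UKN marks a label to be predicted.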


Evaluation Scripts

The scorer program can be downloaded from here (source code is also available on GitHub) and run on your local machine (requires Java 1.8).

Results from the evaluation period can be seen at the following links in tabular format (to sort the tables by different measures, click on the header descriptions). Final submissions from the evaluation and post-evaluation periods will be available here soon.

Submission of System Description Papers -- Important Dates

System description papers will be part of the official SemEval proceedings and must be in ACL style (see the conference page for more details and for downloading the style files).

To submit your paper:

  • Login to the START Conference Manager system at
  • Click "make a new submission", then under "submission categories" select **system description** as the submission type and our task number (i.e., 2) as the task.



  • February 23: System description papers due 
  • March 16: System description paper reviews due
  • March 29: Author notification
  • April 5: Camera ready submissions

Trial subtask 1: grouping verbs to frame

Start: June 30, 2018, midnight

Description: Deprecated Scorer

DEV-OLD 600 instances --Scorer for all subtasks

Start: June 30, 2018, midnight

Description: Scorer for all sub-tasks: Submit your files per instruction in one bundle and see results in the detailed report for all tasks at once.

Final Test Data 4620 Records

Start: Jan. 13, 2019, noon

Description: All Subtasks

Competition Ends

