Lexical frame induction is the process of grouping verbs and their dependent words into typed-feature structures (i.e., frames) in a fully unsupervised manner. The Berkeley FrameNet database is the best-known resource of such typed-feature structures. While lexical frame resources have proven helpful (even essential) in a range of NLP tasks and linguistic investigations, building them for new languages and domains is resource-intensive and thus expensive. This problem can be alleviated using unsupervised lexical frame induction methods. The goal of this task is to develop a benchmark and enable comparison of unsupervised frame induction systems for building lexical frame resources for verbs and their arguments.
Please join the dedicated Google group for inquiries about this shared task.
To contact the task organizers directly, please send an email to semeval-2019-task-2-organizers@googlegroups.com.
Target verbs and their arguments (syntactic dependents) in a test set must be clustered into automatically learned frame structures. The test set consists of approximately 5000 frames from 1000 randomly chosen sentences from the PTB 3.0. The organizers provide free evaluation access to the full PTB 3.0 (courtesy of the LDC under the specified evaluation license).
Since this is an unsupervised task, sentences are not annotated with frames. The chosen verbs in the test sentences will be revealed to participants at a later stage, near the evaluation period. The gold annotations for the test set will be revealed to participants only after the evaluation and submission periods.
The assumed frame structures have a head and an arbitrary number of slots/roles. A verb lexicalizes the head of a frame, and (some of) the verb's arguments fill the slots of the frame that the verb evokes. For the test set, the positions of the target verbs and their arguments are given (i.e., subcategorization frames are assumed a priori rather than being determined by systems).
Participants are allowed to use additional corpora and tools for training as long as these resources/tools do not contain or use explicit (supervised) semantic annotations regarding word senses, frame groupings, or semantic roles. For instance, while using a lexical semantic resource such as Princeton WordNet is not allowed, participants can (and are encouraged to) use WordNet-like lexical databases that are built automatically.
Participants are invited to partake in one or more of the following subtasks:
For this subtask, participants are required to assign occurrences of the target verbs to a number of clusters, such that verbs belonging to the same cluster evoke the same frame type. For instance, in the following examples:
a. Trump leads the world, backward.
b. Disrespecting international laws leads to many complications.
c. Rosenzweig heads the climate impacts section at NASA's Goddard Institute.
we expect the verb `to lead' in ex. a and `to head' in ex. c to end up in one cluster (e.g., call it Leadership, after FrameNet), whereas `to lead' in ex. b will end up in another cluster (e.g., call it Cause), in which instances of verbs such as `originate', `produce', and so on (when they are used in the same sense) can be found. As exemplified above, subtask 1 goes beyond verb-sense induction by requiring that synonymous, troponymous, and (even) antonymous senses of verbs be grouped together.
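A minimal sketch of the expected grouping for examples (a)-(c), where evaluation instances are (sentence, verb-lemma) pairs and the cluster names are illustrative labels borrowed from FrameNet:

```python
# Hypothetical clustering of the verb occurrences above.
# Cluster names need not match FrameNet; only the grouping is evaluated.
clusters = {
    "Leadership": [("a", "lead"), ("c", "head")],  # same frame, different lemmas
    "Cause":      [("b", "lead")],                 # instances of `originate',
}                                                  # `produce', ... would join here
```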
Our annotations for this subtask are based on FrameNet definitions, wherever FrameNet provides coverage.
For this subtask, arguments of verbs must be grouped into a number of frame-specific slots, similar to FrameNet. That is, we assume argument groupings are specific to frame types and are not necessarily shared with other frames. As a result, participating in subtask 2.1 requires participation in subtask 1, since evaluations of argument groupings are done per frame cluster. However, one could build frame-specific slot clusters by using a heuristic/assumption such as one frame per verb form.
In contrast to subtask 2.1, here verb arguments are clustered into a set of generic roles that are defined independently of frame definitions. Hence, this subtask is very similar to unsupervised semantic role induction. Providing a frame clustering (i.e., subtask 1) is not mandatory for this subtask, and groupings of verb arguments into latent semantic roles are evaluated disregarding the frames that the verbs belong to.
The evaluation framework consists of a number of clustering evaluation measures. The induced groupings for frame types/slots/roles are evaluated as clusters of examples, which are compared against the gold-standard annotations. The scorer reports performance with respect to a range of evaluation metrics; however, for the official ranking of systems we use:
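As an illustration of the kind of clustering metric involved, the following is a minimal sketch of BCubed precision, recall, and F1 — a widely used clustering evaluation measure (shown here for illustration only; consult the official scorer for the metrics actually used in the ranking):

```python
def bcubed(gold, pred):
    """BCubed precision, recall, and F1 for two clusterings.

    gold, pred: dicts mapping each instance id to its cluster label.
    """
    assert gold.keys() == pred.keys()
    items = list(gold)
    prec_sum = rec_sum = 0.0
    for i in items:
        # Instances sharing i's predicted / gold cluster.
        same_pred = [j for j in items if pred[j] == pred[i]]
        same_gold = [j for j in items if gold[j] == gold[i]]
        # Fraction of i's predicted cluster that truly shares i's gold label,
        # and fraction of i's gold cluster recovered in i's predicted cluster.
        prec_sum += sum(1 for j in same_pred if gold[j] == gold[i]) / len(same_pred)
        rec_sum += sum(1 for j in same_gold if pred[j] == pred[i]) / len(same_gold)
    p, r = prec_sum / len(items), rec_sum / len(items)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```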
The scorer program can be downloaded from here (source code is also available on GitHub) and run on your local machine (requires Java 1.8).
For subtask 1, we report results from clusterings formed by assuming that each verb form evokes a frame (i.e., one frame per verb), by assigning each verb occurrence to its own cluster (i.e., one cluster per instance), and by grouping verbs into randomly generated clusters (i.e., a random baseline). Additionally, we report a baseline using the method proposed by Kallmeyer et al. (2018).
For subtasks 2.1 and 2.2, we group verb arguments by the type of their syntactic relation to the head verb, e.g., subjects form one cluster, objects form another cluster, and so on (i.e., one cluster per syntactic relation), alongside a random baseline.
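The two simplest baselines above can be sketched in a few lines (a minimal illustration; the instance and argument tuples here are an assumed representation, not the official format):

```python
from collections import defaultdict

def one_cluster_per_verb(instances):
    """Subtask 1 baseline: every occurrence of the same verb lemma
    goes into one frame cluster (one frame per verb).

    instances: iterable of (sent_id, position, lemma) tuples.
    """
    clusters = defaultdict(list)
    for sent_id, position, lemma in instances:
        clusters[lemma].append((sent_id, position))
    return dict(clusters)

def one_cluster_per_relation(arguments):
    """Subtask 2 baseline: arguments grouped by their dependency
    relation to the head verb (subjects together, objects together, ...).

    arguments: iterable of (sent_id, position, deprel) tuples.
    """
    clusters = defaultdict(list)
    for sent_id, position, deprel in arguments:
        clusters[deprel].append((sent_id, position))
    return dict(clusters)
```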
To obtain a valid license for the full test and training sets, participants must agree to the "SemEval 2019 Task 2 Evaluation Agreement" as set by the LDC to access the Penn Treebank 3.0 (PTB). In a nutshell, the LDC's agreement states 1) that the data can only be used for the purpose of this shared task, and 2) that users' access to the data is temporary and they must destroy the data at the completion of the shared task. Accepting this agreement is necessary since the test set is part of the PTB: a subset of 1000 sentences randomly chosen from the texts in the Wall Street Journal section of the PTB.
Once your request is processed by the LDC, you will receive additional instructions from the LDC (and the organizers) on obtaining the data, including a dev set which contains gold annotations for 200 frames. Please note that obtaining the license from the LDC may take a few days.
The full blind testing set will be made available here later (around December, close to the evaluation period).
A minimal example can be downloaded from here.
Participants are provided with a CoNLL-U-format file of the tokenized and automatically parsed (in the Universal Dependencies formalism) sentences in PTB 3.0. For instance, for the input sentence
Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.
which appears as the first sentence of section wsj_0001 of the PTB, the CoNLL-U file contains the following record:
#20001001
1    Pierre        Pierre        PROPN  NNP  _  2   compound    _  _
2    Vinken        Vinken        PROPN  NNP  _  9   nsubj       _  _
3    ,             ,             PUNCT  ,    _  2   punct       _  _
4    61            61            NUM    CD   _  5   nummod      _  _
5    years         year          NOUN   NNS  _  6   nmod:npmod  _  _
6    old           old           ADJ    JJ   _  2   amod        _  _
7    ,             ,             PUNCT  ,    _  2   punct       _  _
8    will          will          AUX    MD   _  9   aux         _  _
9    join          join          VERB   VB   _  0   root        _  _
10   the           the           DET    DT   _  11  det         _  _
11   board         board         NOUN   NN   _  9   dobj        _  _
12   as            as            ADP    IN   _  15  case        _  _
13   a             a             DET    DT   _  15  det         _  _
14   nonexecutive  nonexecutive  ADJ    JJ   _  15  amod        _  _
15   director      director      NOUN   NN   _  9   nmod        _  _
16   Nov.          Nov.          PROPN  NNP  _  9   nmod:tmod   _  _
17   29            29            NUM    CD   _  16  nummod      _  _
18   .             .             PUNCT  .    _  9   punct       _  _
in which the first line (starting with #) gives the identifier assigned to this sentence. These identifiers, together with the positions assigned to the word forms in the sentence (numbers in the first column), are used to form evaluation instances. For example, for subtask 1, the following string:
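Reading such a record is straightforward; the following is a minimal sketch of a reader for one sentence block, assuming standard tab-separated CoNLL-U columns (the helper name and the dict-based token representation are illustrative choices, not part of the task specification):

```python
def read_conllu_sentence(lines):
    """Parse one CoNLL-U sentence block into (sent_id, tokens).

    tokens is a list of dicts holding the fields relevant to this task:
    id, form, lemma, upos, xpos, head, deprel.
    """
    sent_id, tokens = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('#'):
            # Comment line carrying the sentence identifier.
            sent_id = line.lstrip('#').strip()
            continue
        cols = line.split('\t')
        tokens.append({
            'id': int(cols[0]), 'form': cols[1], 'lemma': cols[2],
            'upos': cols[3], 'xpos': cols[4],
            'head': int(cols[6]), 'deprel': cols[7],
        })
    return sent_id, tokens
```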
#20001001 9 join.Becoming-a-member
states that the verb at position 9 (with the lemma `join') of the sentence with id #20001001 evokes a frame of type Becoming-a-member; note the format used to convey this information, i.e.:
#sent-id position verb-lemma.ftype
In a similar way, for subtask 2.1 we have the record:
#20001001 9 join.Becoming-a-member vinken-:-2-:-New-member board-:-11-:-Group
and for subtask 2.2 we have:
#20001001 9 join.Becoming-a-member vinken-:-2-:-Agent board-:-11-:-Patient
(note the difference between the slot/role labels). Here, we assume that the frame has two core arguments.
As exemplified, for subtasks 2.1 and 2.2, the records have the format:
#sent-id position verb-lemma.ftype (word-:-position-:-slot-type)+
in which the + denotes one or more (word, position, slot-type) triples, each joined internally by the character sequence -:-.
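A record in this format can be decomposed as follows (a minimal sketch; the function name and return shape are illustrative):

```python
def parse_answer_line(line):
    """Split one answer line of the form
    #sent-id position verb-lemma.ftype (word-:-position-:-slot-type)+
    into its components.
    """
    fields = line.split()
    sent_id = fields[0].lstrip('#')
    position = int(fields[1])
    # rpartition keeps any dots inside the lemma intact.
    lemma, _, ftype = fields[2].rpartition('.')
    args = []
    for triple in fields[3:]:
        word, pos, slot = triple.split('-:-')
        args.append((word, int(pos), slot))
    return sent_id, position, lemma, ftype, args
```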
In the trial dataset, alongside the positional information that determines the shape of frames, the labels for frame types and slots/roles are provided too. In the final testing set, labels that must be predicted by systems are replaced by the keyword UKN. For instance, the testing set for subtask 1 will look like:
#20001001 9 join.UKN
and for subtask 2.1:
#20001001 9 join.UKN vinken-:-2-:-UKN board-:-11-:-UKN
and for subtask 2.2:
#20001001 9 join.NA vinken-:-2-:-UKN board-:-11-:-UKN
(Note that for subtask 2.2 the prediction of the frame type is not necessary, hence the placeholder NA.)
Start: June 30, 2018, midnight
Description: Subtask 1: grouping verbs into frame clusters

Start: June 30, 2018, midnight
Description: Subtask 2.1: grouping roles into frame-specific slots

Start: June 30, 2018, midnight
Description: Subtask 2.2: induction of semantic roles

End: Never