Metaphor is a form of figurative language used pervasively in everyday life. Consider the following examples:
... the alligator's teeth are like white daggers ...
... he swam in a sea of diamonds ...
... authority is a chair, it needs legs to stand ...
... in Washington, people change dance partners frequently, but not the dance ...
... Robert Muller is like a bulldog – he will get what he wants ...
The goal of this shared task is to detect, at the word level, all content-word metaphors in a given text. We will also run a separate evaluation for verbs only, since many researchers work specifically on verb metaphors. We will use the VU Amsterdam Metaphor Corpus (VUA) as the dataset; it consists of text fragments sampled across four genres from the British National Corpus (BNC): Academic, News, Conversation, and Fiction. The data is annotated according to the MIPVU procedure, with inter-annotator reliability of κ > 0.8.
The dataset used in this shared task is the VU Amsterdam Metaphor Corpus. We provide a script to parse VUAMC.xml; the XML file itself is not provided here due to licensing concerns. We also provide a set of features used to construct the baseline classification model, which predicts metaphor/non-metaphor classes at the word level, along with instructions on how to replicate our baselines. All submissions will be evaluated against these baselines. The metric for comparison will be the F-measure.
Please refer to this metaphor shared task page for code and further details and instructions on how to get started.
Training phase: Data is released for developing and training your metaphor detection software. You could do cross-validation on the training data, partition the training data further to have a held-out set for preliminary evaluations, and/or set apart a subset of the data for development and tuning of hyper-parameters. However the released data is used, the goal is to have N final systems (or versions of a system) ready for testing when the test data is released.
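As an illustration of the held-out option mentioned above, here is a minimal sketch in Python. The function name `split_held_out`, the 20% fraction, the seed, and the fragment IDs are all hypothetical and not part of the task materials; the sketch splits by text fragment rather than by word, on the assumption that words from the same fragment should not appear on both sides of the split.

```python
import random


def split_held_out(items, held_out_fraction=0.2, seed=13):
    """Shuffle a copy of `items` and split it into (train, held_out)."""
    if not 0.0 < held_out_fraction < 1.0:
        raise ValueError("held_out_fraction must be strictly between 0 and 1")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_held = max(1, round(len(shuffled) * held_out_fraction))
    return shuffled[n_held:], shuffled[:n_held]


# Hypothetical fragment identifiers; the real IDs come from the parsed corpus.
fragment_ids = [f"fragment-{i}" for i in range(10)]
train_ids, dev_ids = split_held_out(fragment_ids)
```

The same function could be called repeatedly with different seeds to approximate repeated hold-out evaluation; for proper k-fold cross-validation a dedicated utility would be more appropriate.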
Testing phase: Test instances are released. Each team generates predictions for the test instances, for up to N models. The submitted predictions are evaluated (by us) against the true labels. Submissions will be anonymized -- only the highest score of all submitted systems per day will be displayed. The metric will be F-measure with Precision and Recall also available via the detailed results link.
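For preliminary evaluation on your own held-out data, the following Python sketch computes Precision, Recall, and F-measure for word-level binary labels. It assumes the F-measure is the balanced F1 score with metaphor (label 1) as the positive class; the official scorer may differ in its details, and the function name and example labels are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Return (precision, recall, F1) treating label 1 (metaphor) as positive."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # true positives
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels for six words: 1 = metaphor, 0 = non-metaphor.
gold = [1, 0, 1, 1, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

In this example there are two true positives, one false positive, and one false negative, so Precision, Recall, and F1 all equal 2/3.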
NOTE: Your submission must be named "answer.txt" and compressed into "answer.zip" for the evaluation to work.
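One way to produce the required archive is sketched below in Python, using only the standard library. The function name `package_submission` and the content of the example prediction lines are hypothetical; the actual line format of "answer.txt" is defined by the task instructions, not by this sketch.

```python
import tempfile
import zipfile
from pathlib import Path


def package_submission(prediction_lines, out_dir="."):
    """Write the lines to answer.txt and compress it into answer.zip."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    answer_txt = out_dir / "answer.txt"
    answer_zip = out_dir / "answer.zip"
    answer_txt.write_text("\n".join(prediction_lines) + "\n", encoding="utf-8")
    with zipfile.ZipFile(answer_zip, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # arcname stores the file at the top level of the archive, with no folders.
        archive.write(answer_txt, arcname="answer.txt")
    return answer_zip


# Placeholder lines only; replace them with predictions in the official format.
example_lines = ["hypothetical-token-id-1,1", "hypothetical-token-id-2,0"]

with tempfile.TemporaryDirectory() as tmp_dir:
    archive_path = package_submission(example_lines, tmp_dir)
    with zipfile.ZipFile(archive_path) as archive:
        archive_names = archive.namelist()
```

Storing the file under the bare name "answer.txt" avoids nesting it inside a directory within the archive, which is a common cause of failed uploads when zipping a folder by hand.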
We plan to have N=12 (up to 12 submissions per team), so that several versions of your system can be evaluated without overwhelming the competition. You can make as many submissions as CodaLab allows (1000), but only the last 12 will be valid for comparative evaluation against other systems participating in the shared task. For the task summary paper, we require that each submission provide the mandatory team name/individual, method name, and method description. Specifically, please list any third-party datasets you train your system on in the method description textbox.
By submitting results to this competition, you consent to the public release of your scores at the Figurative Language Processing-2018 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.
You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.
You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.
Training and testing data will be made available and persist on GitHub after the competition to encourage research in metaphor detection. You agree not to redistribute these data except in the manner prescribed by their licence.
Start: Feb. 12, 2018, midnight
Description: Verbs Testing
Start: Feb. 12, 2018, midnight
Description: All POS Testing
March 8, 2018, 11:59 p.m.