Metaphor Shared Task

Organized by cleong


Verbs Testing
Feb. 12, 2018, midnight UTC


All POS Testing
Feb. 12, 2018, midnight UTC


Competition Ends
March 8, 2018, 11:59 p.m. UTC


** As a service to the metaphor research community, we have kept registration for the shared task open even though the competition has ended. If you are interested, you can access the scripts, data, and instructions for the shared task. You can also evaluate your system to obtain performance numbers (precision, recall, and F-measure) and compare them with those of the systems that participated in the shared task. These and other details of the shared task will appear in the paper “A report on the 2018 VUA metaphor detection shared task”, to be published in the NAACL proceedings.

If you need to cite, the BibTeX entry is:

@inproceedings{cleong-naacl-flp-2018,
  title={A report on the 2018 VUA metaphor detection shared task},
  author={Leong, Chee Wee and Beigman Klebanov, Beata and Shutova, Ekaterina},
  booktitle={Proceedings of the Workshop on Figurative Language Processing},
  month={June},
  address={New Orleans, LA},
  year={2018}
}


This page gives an overview of the metaphor shared task at the First Workshop on Figurative Language Processing, co-located with NAACL 2018.

Important Dates

  • 1/12/18: 1st CFP for the shared task; CodaLab competition is open; training data and auxiliary scripts can be downloaded
  • 2/12/18: 2nd CFP for the shared task; test data can be downloaded and results submitted; performance will be tracked on CodaLab dashboard
  • 3/8/18: Last day for submitting predictions on test data
  • 3/15/18: Papers describing the systems are due; for paper submission information, see here
  • 4/2/18: Notifications for papers are sent
  • 4/16/18: Camera-ready papers are due
  • 6/5/18 or 6/6/18: Workshop

Contact Info and Emails:
  • Ben Leong, Educational Testing Service, cleong [AT] ets [DOT] org
  • Beata Beigman Klebanov, Educational Testing Service, bbeigmanklebanov [AT] ets [DOT] org
  • Ekaterina Shutova, University of Cambridge, es407 [AT] cam [DOT] ac [DOT] uk

Metaphor Detection

Metaphor is a form of figurative language used pervasively in our everyday lives. Consider the following examples:

... the alligator's teeth are like white daggers  ...

... he swam in a sea of diamonds ...

... authority is a chair, it needs legs to stand ...

... in Washington, people change dance partners frequently, but not the dance ...

... Robert Muller is like a bulldog – he will get what he wants ...


The goal of this shared task is to detect, at the word level, all content-word metaphors in a given text (we will also have a separate evaluation for verbs only, as many researchers work specifically on verb metaphors). We will use the VU Amsterdam Metaphor Corpus (VUA) as the dataset, which consists of text fragments sampled across four genres from the British National Corpus (BNC): Academic, News, Conversation, and Fiction. The data is annotated according to the MIPVU procedure, with inter-annotator reliability of κ > 0.8.



Shared Task on VUA Metaphor Dataset

The dataset used in this shared task is the VU Amsterdam Metaphor Corpus. We provide a script to parse VUAMC.xml; the corpus file itself is not provided here due to licensing concerns. We also provide a set of features used to construct the baseline classification model for predicting metaphor/non-metaphor classes at the word level, along with instructions on how to replicate our baselines. All submissions will be evaluated against these baselines. The metric for comparison will be the F-measure.

Please refer to this metaphor shared task page for code and further details and instructions on how to get started.

Training phase: Data is released for developing and training your metaphor detection software. You could do cross-validation on the training data, or partition the training data further to have a held-out set for preliminary evaluations, and/or set apart a subset of the data for development and tuning of hyper-parameters. However the released data is used, the goal is to have N final systems (or versions of a system) ready for testing when the test data is released.
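The cross-validation option mentioned above could be set up roughly as follows. This is only a minimal sketch, not part of the provided scripts: the function name `k_fold_splits` and the fragment identifiers are hypothetical, and it assumes you split at the level of text fragments (rather than individual words) so that words from one fragment do not end up on both sides of a split.

```python
import random


def k_fold_splits(items, k=5, seed=0):
    """Yield (train, held_out) pairs for k-fold cross-validation.

    `items` is any sequence of unit identifiers, e.g. hypothetical
    text-fragment IDs from the training data.
    """
    shuffled = list(items)
    # A fixed seed keeps the folds reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled items into k folds of near-equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, held_out


if __name__ == "__main__":
    # Hypothetical fragment IDs; replace with the IDs in your training data.
    fragment_ids = ["frag-%02d" % n for n in range(10)]
    for fold_no, (train, held_out) in enumerate(k_fold_splits(fragment_ids)):
        print(fold_no, len(train), len(held_out))
```

A single held-out set for preliminary evaluation can be obtained the same way by taking only the first pair the generator yields.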

Testing phase: Test instances are released. Each team generates predictions for the test instances, for up to N models. The submitted predictions are evaluated (by us) against the true labels. Submissions will be anonymized -- only the highest score of all submitted systems per day will be displayed. The metric will be F-measure with Precision and Recall also available via the detailed results link.
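For local checks before submitting, precision, recall, and F-measure can be approximated as below. This is a sketch under stated assumptions, not the official scoring script: it assumes binary word-level labels (1 for metaphor, 0 for non-metaphor), scores the metaphor class only, and uses the balanced F-measure (F1). The function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Return (precision, recall, F1) for the metaphor class (label 1).

    Assumes `gold` and `predicted` are equal-length sequences of 0/1 labels,
    one per word; the official evaluation may differ in detail.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # true positives
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels for illustration only, not real task data.
    gold = [1, 0, 1, 1, 0]
    predicted = [1, 1, 0, 1, 0]
    print(precision_recall_f1(gold, predicted))
```

Undefined ratios (no predicted or no gold metaphors) are returned as 0.0 here; that convention is a choice made for this sketch.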

NOTE: Your submission must be named "answer.txt" and compressed into "" for the evaluation to work.

We plan to have N=12 (up to 12 submissions per team); this way, a number of versions of your system can be evaluated without overwhelming the competition. You can try as many submissions as CodaLab allows (1000), but only the last 12 submissions will be valid for comparative evaluation against other systems participating in the shared task. For the task summary paper, we require that each submission provide the mandatory team or individual name, method name, and method description. Specifically, please list any third-party datasets you trained your system on (if any) in the method description textbox.

Terms and Conditions

By submitting results to this competition, you consent to the public release of your scores at the Figurative Language Processing-2018 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.

You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.

You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.

Training and testing data will be made available and persist on GitHub after the competition to encourage research in metaphor detection. You agree not to redistribute these data except in the manner prescribed by their licence.

