The Second Shared Task on Metaphor Detection

Organized by cleong


This is an overview page of the metaphor shared task section in the Second Workshop on Figurative Language Processing, co-located with ACL 2020.

Metaphor Detection

Metaphor is a form of figurative language used pervasively in our everyday lives. Consider the following examples:

... it would change the trajectory of your legal career ...

... Washington and the media just explodes on you, you just don’t know where you are at the moment ...

... those statements are deeply concerning ....

... Out of the abundance of the heart, the mouth speaks, and the hand tweets ... 

... the fake news were personalized and delivered above the radar and beyond the radar ...

 

The goal in this shared task is to detect, at the word level, all content-word metaphors in a given text (we will also have a separate evaluation for just the verbs, as many researchers are working specifically on verb metaphors). We will use two datasets: (1) a subset of the ETS Corpus of Non-Native Written English, which contains essays written by test-takers for the TOEFL test and was annotated for argumentation-relevant metaphors in [Beata Beigman Klebanov, Chee Wee Leong, and Michael Flor. 2018. A Corpus of Non-Native Written English Annotated for Metaphor. NAACL]; (2) the VU Amsterdam Metaphor Corpus (VUA) dataset (also used in the 2018 shared task), which consists of text fragments sampled across four genres from the British National Corpus (BNC): Academic, News, Conversation, and Fiction. The VUA data is annotated according to the MIPVU procedure as described in [Gerard Steen, Aletta Dorst, Berenike Herrmann, Anna Kaal, Tina Krennmayr, and Trijntje Pasma. 2010. A Method for Linguistic Metaphor Identification. Amsterdam: John Benjamins.]
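Concretely, word-level detection amounts to assigning a binary metaphor/non-metaphor label to each content word. The following minimal sketch illustrates the idea on the first example above; the labels shown are purely illustrative, not gold annotations from either dataset:

    # Illustration only: one binary label per content word (1 = metaphor, 0 = literal).
    # The labels below are made up for illustration, not gold annotations.
    sentence = "it would change the trajectory of your legal career"
    tokens = sentence.split()

    # Hypothetical system output: "trajectory" is used metaphorically here.
    predictions = {"change": 0, "trajectory": 1, "legal": 0, "career": 0}

    for token in tokens:
        label = predictions.get(token, 0)
        print(f"{token}\t{label}")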

 

Important Dates

  • January 12, 2020: 1st CFP for the shared task; CodaLab competition is open; training data and auxiliary scripts can be downloaded
  • February 14, 2020: 2nd CFP for the shared task; test data can be downloaded and results submitted; performance will be tracked on the CodaLab dashboard
  • April 16, 2020, 11:59pm: Last day for submitting predictions on test data
  • April 23, 2020: Papers describing the systems are due (see paper submission information)
  • May 15, 2020: Notifications for papers are sent
  • May 25, 2020: Camera-ready versions of papers are due
  • July 9/10, 2020: Workshop

 

Contact Info and Emails:
  • Ben Leong, Educational Testing Service; cleong [AT] ets [DOT] org
  • Beata Beigman Klebanov, Educational Testing Service, bbeigmanklebanov [AT] ets [DOT] org 
  • Rutuja Ubale, Educational Testing Service, rubale [AT] ets [DOT] org
  • Chris Hamill, Educational Testing Service, chamill [AT] ets [DOT] org

Shared Task on TOEFL Datasets and VUA Metaphor

As mentioned, two datasets will be available for this shared task. You can elect to participate using either or both of the datasets. Within each dataset, you can also elect to participate in the AllPOS and/or Verbs evaluations.

For the TOEFL essays dataset, we will provide a Data License Agreement for all interested participants to gain access to the 240 texts (180 in the training partition, 60 in the testing partition). Please download the agreement here, fill it out, and send it to Chris Hamill (chamill@ets.org). Chris will contact the individuals and teams who completed and submitted the agreement about how to access the 240 texts on the release dates of the training and testing datasets. Additionally, independent of the texts, we also provide a set of features that can be used "as is" to construct the baseline classification model for prediction of metaphor/non-metaphor classes at the word level, along with instructions on how to replicate our baselines. All submissions will be evaluated against these baselines. The metric for comparison will be the F1 measure (for the metaphor class). Please refer to the TOEFL GitHub shared task page for code, further details, and instructions on how to get started.
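As a rough illustration of how pre-extracted word-level features might be turned into a baseline, the sketch below trains a simple classifier and reports metaphor-class F1. The file name, column layout, and choice of logistic regression are assumptions made here for illustration, not the official baseline recipe; the GitHub page above documents the actual feature files and scripts:

    # Sketch of a word-level metaphor baseline from a pre-extracted feature file.
    # ASSUMPTIONS: a CSV named "toefl_train_features.csv" with one row per token,
    # numeric feature columns, and a binary "label" column (1 = metaphor).
    # The official feature files and baseline scripts may differ.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("toefl_train_features.csv")            # hypothetical file name
    X = df.drop(columns=["label"]).select_dtypes("number")  # numeric features only
    y = df["label"]

    # Hold out part of the training data for a quick sanity check.
    X_tr, X_dev, y_tr, y_dev = train_test_split(X, y, test_size=0.2, random_state=0)

    clf = LogisticRegression(max_iter=1000, class_weight="balanced")
    clf.fit(X_tr, y_tr)

    # Shared-task metric: F1 for the metaphor class.
    dev_pred = clf.predict(X_dev)
    print("Metaphor-class F1:", f1_score(y_dev, dev_pred, pos_label=1))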

For the VUA texts dataset, we provide a script to parse VUAMC.xml, which is not provided here due to licensing concerns. Until recently, VUAMC.xml was downloadable from http://ota.ahds.ac.uk/headers/2541.xml, but right now the site appears unresponsive. We have written to the Amsterdam Metaphor Lab to ask them to fix the problem; once the original site becomes responsive or we learn of an alternative download link, we will post a note here. In the meantime, please write to Ben Leong (cleong@ets.org) to obtain the copy that we used for the 2018 shared task. We also provide a set of features used to construct the baseline classification model for prediction of metaphor/non-metaphor classes at the word level, along with instructions on how to replicate our baselines. All submissions will be evaluated against these baselines. The metric for comparison will be the F1 measure (for the metaphor class). Please refer to the VUA GitHub shared task page for code, further details, and instructions on how to get started.
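The provided parsing script is the authoritative way to extract tokens and labels from VUAMC.xml. Purely as an orientation, VUAMC.xml is TEI-encoded and marks metaphor-related words with <seg function="mrw"> elements inside word elements, so a stripped-down extraction could look roughly like the following; the element and attribute names here are assumptions about the corpus markup, and the official script should be deferred to:

    # Orientation-only sketch of pulling (word, metaphor-label) pairs out of VUAMC.xml.
    # ASSUMPTION: TEI markup in which <w> elements carry tokens and metaphor-related
    # words contain a <seg function="mrw"> child. The official parsing script on the
    # VUA GitHub page is the reference implementation; this may not match it exactly.
    import xml.etree.ElementTree as ET

    def local(tag):
        """Drop the XML namespace prefix from a tag name."""
        return tag.rsplit("}", 1)[-1]

    tree = ET.parse("VUAMC.xml")
    tokens = []
    for w in tree.iter():
        if local(w.tag) != "w":
            continue
        text = "".join(w.itertext()).strip()
        # A word counts as metaphor-related if any descendant <seg> has function="mrw".
        is_mrw = any(
            local(child.tag) == "seg" and child.get("function") == "mrw"
            for child in w.iter()
        )
        tokens.append((text, int(is_mrw)))

    print(tokens[:10])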

Training phase: Data is released for developing and training your metaphor detection software. You could do cross-validation on the training data, partition the training data further to have a held-out set for preliminary evaluations, and/or set apart a subset of the data for development/tuning of hyper-parameters. However the released data is used, the goal is to have N final systems (or versions of a system) ready for testing when the test data is released.
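For example, a quick cross-validation run over a feature representation of the training data (reusing the hypothetical feature file from the TOEFL sketch above) could be used to compare system variants before the test data is released:

    # Sketch: 5-fold cross-validation on the training features to compare system
    # variants. Reuses the hypothetical "toefl_train_features.csv" layout from above.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    df = pd.read_csv("toefl_train_features.csv")
    X = df.drop(columns=["label"]).select_dtypes("number")
    y = df["label"]

    clf = LogisticRegression(max_iter=1000, class_weight="balanced")
    scores = cross_val_score(clf, X, y, cv=5, scoring="f1")  # metaphor-class F1
    print("CV F1 per fold:", scores, "mean:", scores.mean())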

Testing phase: Test instances are released. Each team generates predictions for the test instances, for up to N models. The submitted predictions are evaluated (by us) against the true labels. Submissions will be anonymized -- only the highest score of all submitted systems per day will be displayed. The metric will be F-measure with Precision and Recall also available via the detailed results link.

NOTE: Your submission must be named "answer.txt" and compressed into "answer.zip" for the evaluation to work. For more detailed instructions, please refer to this PDF.
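A submission archive can be produced along the following lines. The per-line prediction format shown here (a token identifier and a label) is a placeholder assumption; the linked PDF specifies the exact format expected by the scorer:

    # Sketch: write predictions to answer.txt and package them as answer.zip.
    # The line format below (token_id,label) is a placeholder assumption;
    # follow the format described in the instructions PDF.
    import zipfile

    predictions = {"example_text_1_3": 1, "example_text_1_4": 0}  # hypothetical IDs

    with open("answer.txt", "w") as f:
        for token_id, label in predictions.items():
            f.write(f"{token_id},{label}\n")

    with zipfile.ZipFile("answer.zip", "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write("answer.txt")  # the file inside the archive must be named answer.txt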

We plan to have N=12 (up to 12 submissions per team), so that several versions of your system can be evaluated without overwhelming the competition. You can try as many submissions as CodaLab allows (150 per phase for the entire competition), but only the last 12 submissions will count for comparative evaluation against the other systems participating in the shared task. This means you need to make sure that your best-performing system, as evaluated by CodaLab, is included among the last 12 submissions for each phase you want to participate in. For the task summary paper, we require that each submission provide the mandatory team/individual name, method name, and method description. In particular, please list any third-party datasets you train your system on (if any) in the method description textbox.


Terms and Conditions

By submitting results to this competition, you consent to the public release of your scores at the Figurative Language Processing-2020 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.

You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.

You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.

Training and testing data will be made available and will persist on GitHub after the competition to encourage research in metaphor detection. You agree not to redistribute these data except in the manner prescribed by their licence.

VUA Verbs

Start: Jan. 12, 2020, midnight

Description: VUA dataset, Verbs only

VUA AllPOS

Start: Jan. 12, 2020, midnight

Description: VUA dataset, All Parts of Speech

TOEFL Verbs

Start: Jan. 12, 2020, midnight

Description: TOEFL dataset, Verbs only

TOEFL AllPOS

Start: Jan. 12, 2020, midnight

Description: TOEFL dataset, All Parts of Speech

Competition Ends

April 17, 2020, midnight
