SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge

Organized by simono


News!

The evaluation phase started on January 8 and ends on January 29. The official test data set has been released. To find out more and take part in the competition, click on "Participate"!

This is the official webpage and CodaLab competition page of SemEval-2018 Task 11, Machine Comprehension using Commonsense Knowledge. In this task, systems are presented with narrative texts about everyday activities and are required to answer multiple-choice questions about these texts.

This task assesses how the inclusion of commonsense knowledge, in the form of script knowledge, can benefit machine comprehension systems. Script knowledge is knowledge about everyday activities, i.e. sequences of events describing stereotypical human activities (also called scenarios), such as baking a cake or taking a bus. A substantial number of questions require inference using script knowledge about different scenarios, i.e. answering them requires knowledge beyond the facts mentioned in the text.

Each question is associated with two candidate answers. Answers are short and limited to a few words. The texts used in this task cover more than 100 everyday scenarios and hence include a wide variety of human activities.

Consider the following reading text from the planting a tree scenario...

My backyard was looking a little empty, so I decided I would plant something. I went out and bought tree seeds. I found a spot in my yard that looked like it would get enough sunshine. There, I dug a hole for the seeds. Once that was done, I took my watering can and watered the seeds.


... and the following questions on the text.


A. Why was the tree planted in that spot?
  1. to get enough sunshine
  2. there was no other space
B. What was used to dig the hole?
  1. a shovel
  2. their bare hands
C. Who took the watering can?
  1. the grandmother
  2. the gardener


While for question A it is easy to find the correct answer ("to get enough sunshine") in the text, questions B and C are harder to answer. For a person, it is clear that the most plausible answers are "a shovel" and "the gardener", even though neither is explicitly mentioned in the text. Participating systems should be able to answer such questions using commonsense knowledge or, more specifically, script knowledge. We encourage participants to make use of existing resources for script knowledge, such as DeScript, RKP, or OMCS; script knowledge representations such as narrative chains, event embeddings, or event paraphrase sets; and other knowledge sources such as Wikipedia. We do not put constraints on the form of commonsense knowledge that is used, i.e. participants are encouraged to use any external resources that could improve their systems.
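To make the gap concrete, here is a minimal sketch in Python (purely illustrative, not an official baseline of this task) that picks whichever candidate answer shares more words with the text and question. It answers question A correctly, but for questions B and C the only overlapping tokens are function words such as "a" and "the", so its choice is essentially arbitrary; this is exactly where script knowledge is needed.

# Illustrative lexical-overlap heuristic (not the official baseline).
# It gets question A right, but has no real signal for B and C.

def tokens(s):
    return set(s.lower().replace("?", "").replace(".", "").replace(",", "").split())

def pick_answer(text, question, candidates):
    # Choose the candidate sharing the most words with text + question.
    context = tokens(text) | tokens(question)
    return max(range(len(candidates)), key=lambda i: len(context & tokens(candidates[i])))

text = ("My backyard was looking a little empty, so I decided I would plant something. "
        "I went out and bought tree seeds. I found a spot in my yard that looked like it "
        "would get enough sunshine. There, I dug a hole for the seeds. Once that was done, "
        "I took my watering can and watered the seeds.")

questions = [
    ("Why was the tree planted in that spot?", ["to get enough sunshine", "there was no other space"]),
    ("What was used to dig the hole?", ["a shovel", "their bare hands"]),
    ("Who took the watering can?", ["the grandmother", "the gardener"]),
]

for q, cands in questions:
    print(q, "->", cands[pick_answer(text, q, cands)])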


Preliminary Timeline

Aug 14, 2017: Trial Data Release, Practice Phase starts
Sep 25, 2017: Training and Development Data Release
Jan 8, 2018: Test Data Release, Evaluation Phase starts
Jan 29, 2018: End of Evaluation Phase


Contact

Google Group/Mailing List

To stay up-to-date, please join our Google group!


Organizers

Simon Ostermann [web|mail: simono (at) coli.uni-sb.de]

Ashutosh Modi [web|mail: ashutosh (at) coli.uni-sb.de]

Michael Roth [web|mail: mroth (at) coli.uni-sb.de]

Stefan Thater [web|mail: stth (at) coli.uni-sb.de]

Manfred Pinkal [web|mail: pinkal (at) coli.uni-sb.de]

Evaluation

In our evaluation, we measure how well a system is capable of correctly answering questions that may involve commonsense knowledge. As evaluation metric, we use accuracy, calculated as the ratio of correctly answered questions to all questions in our evaluation data. Additional studies will be performed at a later point to assess system performance with regard to specific question types, and based on whether a question is directly answerable from the text or only inferable from it. As such, the evaluation provides an approximation of how well systems are able to take commonsense knowledge into account.
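For reference, the metric itself is straightforward to compute; the following minimal sketch shows the calculation (the identifiers are illustrative and not taken from the official scoring script).

# Accuracy = correctly answered questions / all questions.
# `gold` and `predicted` map question IDs to the selected answer (e.g. "1" or "2").
def accuracy(gold, predicted):
    correct = sum(1 for qid, ans in gold.items() if predicted.get(qid) == ans)
    return correct / len(gold)

# Example: three questions, two answered correctly -> accuracy = 2/3
print(accuracy({"A": "1", "B": "1", "C": "2"}, {"A": "1", "B": "1", "C": "1"}))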

Terms and Conditions

By submitting results to this competition, you consent to the public release of your scores at the SemEval-2018 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.

You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.

You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.

You agree not to redistribute the test data except in the manner prescribed by its licence.

Practice Phase

Start: Aug. 14, 2017, midnight

Description: Practice Phase for Task 11. Submitted results are compared against the dev data, and scores are posted to the leaderboard.

Evaluation Phase

Start: Jan. 8, 2018, midnight

Description: Evaluation Phase for Task 11. Submitted results are compared against the test data. The leaderboard is hidden.

Post-Evaluation Phase

Start: Jan. 30, 2018, midnight

Description: Post-Evaluation Phase for Task 11. Submitted results are compared against the test data, and scores are posted to the leaderboard.

Competition Ends

Never


Top Three

Rank Username Score
1 jogonba2 1.0000
2 ahashi_syuu 1.0000
3 USDU 1.0000