This is the official webpage and CodaLab competition page of SemEval 2018 Shared Task 11, Machine Comprehension using Commonsense Knowledge. In this task, systems are presented with narrative texts about everyday activities and are required to answer multiple-choice questions about these texts.
This task assesses how the inclusion of commonsense knowledge, in the form of script knowledge, benefits machine comprehension systems. Script knowledge is knowledge about everyday activities, i.e. sequences of events describing stereotypical human activities (also called scenarios), for example baking a cake or taking a bus. A substantial number of questions require inference using script knowledge about different scenarios beyond what is mentioned in the text, i.e. answering them requires knowledge beyond the facts stated explicitly.
Each question is associated with two candidate answers, which are short and limited to a few words. The texts used in this task cover more than 100 everyday scenarios and hence include a wide variety of human activities.
Consider the following reading text from the planting a tree scenario...
... and the following questions on the text.
While for question A it is easy to find the correct answer ("to get enough sunshine") in the text, questions B and C are harder to answer. For a person, it is clear that the most plausible answers are "a shovel" and "the gardener", although neither is explicitly mentioned in the text. Participating systems should be able to answer such questions using commonsense knowledge or, more specifically, script knowledge. We encourage participants to make use of existing resources for script knowledge, such as DeScript, RKP, or OMCS; script knowledge representations such as narrative chains, event embeddings, or event paraphrase sets; and other knowledge sources such as Wikipedia. We do not put constraints on the form of commonsense knowledge that is used, i.e. participants are encouraged to use any external resources that could improve their systems.
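To illustrate the task format, a simple word-overlap heuristic can handle text-based questions like A, while it fails on exactly the inference questions (B, C) that motivate this task. This is a hypothetical sketch, not a provided baseline; the function name and the example text are invented:

```python
import re

def choose_answer(text, answers):
    """Pick the candidate answer with the largest word overlap with the text.

    Illustrative heuristic only; it cannot handle questions whose answers
    are not stated in the text (the commonsense cases this task targets).
    """
    text_words = set(re.findall(r"\w+", text.lower()))
    overlaps = [len(set(re.findall(r"\w+", a.lower())) & text_words)
                for a in answers]
    return overlaps.index(max(overlaps))

# Invented example in the spirit of the "planting a tree" scenario:
text = "I planted the tree in a sunny spot so that it would get enough sunshine."
print(choose_answer(text, ["to get enough sunshine", "a shovel"]))  # picks index 0
```

For a question like B ("What was used to dig the hole?"), neither candidate overlaps with the text, so the heuristic can only guess; a system needs script knowledge to prefer "a shovel".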
Aug 14, 2017: Trial Data Release, Practice Phase starts
Sep 25, 2017: Training and Development Data Release
Jan 8, 2018: Test Data Release, Evaluation Phase starts
Jan 29, 2018: End of Evaluation Phase
To stay up-to-date, please join our Google group!
Simon Ostermann [mail: simono (at) coli.uni-sb.de]
Ashutosh Modi [mail: ashutosh (at) coli.uni-sb.de]
Michael Roth [mail: mroth (at) coli.uni-sb.de]
Stefan Thater [mail: stth (at) coli.uni-sb.de]
Manfred Pinkal [mail: pinkal (at) coli.uni-sb.de]
In our evaluation, we measure how well a system can correctly answer questions that may involve commonsense knowledge. As the evaluation metric, we use accuracy, calculated as the ratio of correctly answered questions to all questions in our evaluation data. Additional studies will be performed at a later point to assess system performance with regard to specific question types and based on whether a question is directly answerable from the text or only inferable from it. As such, the evaluation will approximate how well systems are able to take commonsense knowledge into account.
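The accuracy computation described above can be sketched as follows; the function name and the example labels are illustrative, not part of the official scorer:

```python
def accuracy(gold, predicted):
    """Ratio of correctly answered questions to all questions."""
    assert len(gold) == len(predicted) and gold
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)

# Hypothetical gold and system answers (0/1 = first/second candidate answer):
print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 3 of 4 correct -> 0.75
```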
By submitting results to this competition, you consent to the public release of your scores at the SemEval-2018 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.
You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.
You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.
You agree not to redistribute the test data except in the manner prescribed by its licence.
Start: Aug. 14, 2017, midnight
Description: Practice Phase for Task 11. Submitted results are compared against the dev data and can be submitted to the leaderboard.
Start: Jan. 8, 2018, midnight
Description: Evaluation Phase for Task 11. Submitted results are compared against the test data. The leaderboard is now public.
Start: Jan. 30, 2018, midnight
Description: Post-Evaluation Phase for Task 11. Submitted results are compared against the test data and can be submitted to the leaderboard.