SemEval 2020 Task 4 - Commonsense Validation and Explanation

Organized by Shuailong


Aug. 15, 2019, midnight UTC


Evaluation - Subtask A
Jan. 10, 2020, midnight UTC


Competition Ends


Welcome to the Commonsense Validation and Explanation Challenge!

The task directly tests whether a system can differentiate natural language statements that make sense from those that do not. We designed three subtasks. The first is to choose, from two natural language statements with similar wording, which one makes sense and which one does not. The second is to find, from three options, the key reason why a given statement does not make sense. The third asks the machine to generate the reasons, which we evaluate with BLEU.

Formally, each instance in our dataset is composed of 8 sentences: {s1, s2, o1, o2, o3, r1, r2, r3}. s1 and s2 are two similar statements that share the same syntactic structure and differ by only a few words, but only one of them makes sense while the other does not. They are used in our first subtask, called Validation, which requires the model to identify which one makes sense. For the against-common-sense statement (s1 or s2), we have three optional sentences o1, o2 and o3 to explain why the statement does not make sense. Our subtask 2, named Explanation (Multi-Choice), requires that the one correct reason be identified among two other confusing ones. For the same against-common-sense statement, our subtask 3, named Explanation (Generation), asks the participants to generate the reason why it does not make sense. The 3 referential reasons r1, r2 and r3 are used for evaluating subtask 3.


Task A: Validation
Task: Which statement of the two is against common sense?
Statement1: He put a turkey into the fridge.
Statement2: He put an elephant into the fridge.
Task B: Explanation (Multi-Choice)
Task: Select the most appropriate reason why this statement is against common sense.
Statement: He put an elephant into the fridge.
A: An elephant is much bigger than a fridge.
B: Elephants are usually gray while fridges are usually white.
C: An elephant cannot eat a fridge.
Task C: Explanation (Generation)
Task: Generate the reason why this statement is against common sense. We will use BLEU to evaluate it.
Statement: He put an elephant into the fridge.
Referential Reasons:
1. An elephant is much bigger than a fridge.
2. A fridge is much smaller than an elephant.
3. Most of the fridges aren’t large enough to contain an elephant.
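
The instance layout {s1, s2, o1, o2, o3, r1, r2, r3} and the example above can be pictured as a small record type. The following is a minimal sketch only: the class name `Instance`, the field `false_index` and the helper `nonsensical_statement` are hypothetical names chosen for illustration, and they do not reflect the official data files, whose format is described in the Starting Kit.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Instance:
    """One dataset instance, using the symbols from the task description."""
    s1: str                           # first statement
    s2: str                           # second statement, worded like s1
    options: Tuple[str, str, str]     # o1, o2, o3: candidate reasons (subtask B)
    references: Tuple[str, str, str]  # r1, r2, r3: referential reasons (subtask C)
    false_index: int                  # hypothetical label: 0 if s1 is nonsensical, 1 if s2

    def nonsensical_statement(self) -> str:
        """Return the statement that is against common sense."""
        return (self.s1, self.s2)[self.false_index]


example = Instance(
    s1="He put a turkey into the fridge.",
    s2="He put an elephant into the fridge.",
    options=(
        "An elephant is much bigger than a fridge.",
        "Elephants are usually gray while fridges are usually white.",
        "An elephant cannot eat a fridge.",
    ),
    references=(
        "An elephant is much bigger than a fridge.",
        "A fridge is much smaller than an elephant.",
        "Most of the fridges aren't large enough to contain an elephant.",
    ),
    false_index=1,
)

print(example.nonsensical_statement())
```

Subtask A would predict `false_index`, subtask B would pick one entry of `options` for the nonsensical statement, and subtask C would generate a sentence to be compared against `references`.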


For more detailed information, please refer to this link.

Please contact the task organisers or post on the competition forum if you have any further queries.


The Senmaking task consists of 3 subtasks. Participating teams should take part in at least one of the subtasks. Relevant scripts and datasets are available at: GitHub

Tasks A and B are evaluated by accuracy, and Task C is evaluated using BLEU. To improve the reliability of the evaluation of Task C, we use a random subset of the test set and will carry out a human evaluation of the systems with relatively high BLEU scores.
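
To make the two metrics concrete, here is a rough sketch of how they could be computed. It is not the official scorer (use the organizers' scripts for real results): the function names are hypothetical, and the BLEU variant below assumes lowercased whitespace tokenization, n-grams up to length 4, multiple references and no smoothing.

```python
import math
from collections import Counter
from typing import List, Sequence


def accuracy(predictions: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of instances whose predicted label equals the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: str, references: List[str], max_n: int = 4) -> float:
    """Simplified sentence-level BLEU against several references (no smoothing)."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref_counts: Counter = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # without smoothing, one unmatched n-gram order zeroes the score
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty uses the reference length closest to the candidate length.
    closest_ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    if len(cand) > closest_ref_len:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - closest_ref_len / len(cand))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


referential_reasons = [
    "An elephant is much bigger than a fridge.",
    "A fridge is much smaller than an elephant.",
    "Most of the fridges aren't large enough to contain an elephant.",
]

print(accuracy([1, 1, 0, 1], [1, 0, 0, 1]))  # 3 of 4 labels match: 0.75
print(simple_bleu("An elephant is much bigger than a fridge.", referential_reasons))
```

Because the sketch applies no smoothing, a generated reason that shares no 4-gram with any reference scores 0, which is one reason a human evaluation is a useful complement for Task C.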

Submitted systems

  • Teams are allowed to use the development set for training.
  • Teams can use additional resources such as pretrained language models, knowledge bases etc.
  • Only one final submission will be recorded per team. The CodaLab website will only show an updated submission if its results are higher.


  • All data for this task is released under the CC BY-SA 4.0 License (the license can also be found with the data).
  • Organizers of the competition might choose to publicize, analyze and change in any way any content submitted as a part of this task. Wherever appropriate, academic citation for the sending group would be added (e.g. in a paper summarizing the task).

Teams wishing to participate in SemEval 2020 should strictly adhere to the following deadlines.

Task Schedule for SemEval2020

  • Trial data ready July 31, 2019
  • Training data ready September 4, 2019
  • Test data ready December 3, 2019
  • Evaluation start January 10, 2020
  • Evaluation end January 31, 2020
  • Paper submission due February 23, 2020
  • Notification to authors March 29, 2020
  • Camera ready due April 5, 2020
  • SemEval workshop Summer 2020

Competitors should comply with the general rules of SemEval.

The organizers are free to penalize or disqualify participants for any violation of the above rules, or for misuse, unethical behaviour or other behaviours they agree are not acceptable in a scientific competition in general and in this one in particular.

Submission phases

Practice Phase

In this phase, feel free to familiarize yourself with the task, the input data format, the submission data format, and the submission process.

Evaluation Phase

In the formal evaluation phase, train your models on the provided training set, use the dev set if needed, and make predictions on the official test set. You are also welcome to use any external resources or pretrained models. Results will not be shown on the leaderboard until the end of the evaluation period. To avoid data leakage between subtasks, each subtask has its own phase. Subtask A (choosing the statement that makes sense) is evaluated first. Subtask C (generating the reason why the nonsensical statement does not make sense) follows. Subtask B (choosing the correct reason out of the three candidates) comes last. You are not required to take part in every subtask. The evaluation for each subtask lasts one week. To be evaluated on a particular subtask, simply wait for its evaluation phase to open.
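
The phase ordering can be checked with a little date arithmetic. The sketch below assumes the evaluation start date from the task schedule (January 10, 2020), the order A, C, B and a one-week length per phase; the function name `phase_windows` is hypothetical.

```python
from datetime import date, timedelta

# Assumptions taken from the text: evaluation starts on January 10, 2020,
# the subtasks run in the order A, C, B, and each phase lasts one week.
EVALUATION_START = date(2020, 1, 10)
PHASE_LENGTH = timedelta(weeks=1)
PHASE_ORDER = ["A", "C", "B"]


def phase_windows(start, order, length):
    """Return {subtask: (start_day, end_day)} for back-to-back phases."""
    windows = {}
    for i, subtask in enumerate(order):
        begin = start + i * length
        windows[subtask] = (begin, begin + length)
    return windows


windows = phase_windows(EVALUATION_START, PHASE_ORDER, PHASE_LENGTH)
for subtask, (begin, end) in windows.items():
    print(f"Subtask {subtask}: {begin.isoformat()} to {end.isoformat()}")
# Subtask A: 2020-01-10 to 2020-01-17
# Subtask C: 2020-01-17 to 2020-01-24
# Subtask B: 2020-01-24 to 2020-01-31
```

The last window closes on January 31, 2020, which agrees with the "Evaluation end" entry in the task schedule.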

Submission format

Please refer to Participate -> Files -> Starting Kit for submission file format as well as everything you need to know to make a valid submission.


Practice

Start: Aug. 15, 2019, midnight

Description: Practice phase: submit results on the trial data and receive scores to get a feel for the data and the task.

Evaluation - Subtask A

Start: Jan. 10, 2020, midnight

Description: Evaluation phase: train your model on the official training set; you may also use the official validation set during training. Feel free to use additional resources such as knowledge bases. Submit results on the official test data to receive your competition score. Note that only the final valid submission on CodaLab will be taken as the official submission to the competition.

Evaluation - Subtask C

Start: Jan. 17, 2020, midnight

Evaluation - Subtask B

Start: Jan. 24, 2020, midnight


Start: Jan. 31, 2020, midnight

Competition Ends


# Username Score
1 ehsantaher 100.0
2 Shuailong 100.0
3 xuanwang 98.9