SemEval-2018 Task 12 - The Argument Reasoning Comprehension Task

Organized by ivan.habernal



This is the CodaLab Competition for SemEval-2018 Task 12: The Argument Reasoning Comprehension Task.


Given an argument consisting of a claim and a reason, the goal is to select the correct warrant that explains the reasoning of this particular argument. Only two options are given, and only one is correct.


Topic: There She Is, Miss America

Additional info: In 1968, feminists gathered in Atlantic City to protest the Miss America pageant, calling it racist and sexist. Is this beauty contest bad for women?

Argument: Miss America gives honors and education scholarships. And since ..., Miss America is good for women.

  • a) scholarships would give women a chance to study
  • b) scholarships would take women from the home

Only (a) fills the gap in this argument; (b) would in fact lead to the opposite claim (that Miss America is not good for women).


Reasoning is a crucial part of natural language argumentation. In order to comprehend an argument, one has to reconstruct and analyze its reasoning. As arguments are highly contextualized, most reasoning-related content is left implicit and usually presupposed. Thus, argument comprehension requires not only language understanding and logic skills, but it also heavily depends on common sense. We define a new task, argument reasoning comprehension. Given a natural language argument with a reason and a claim, the goal is to choose the correct implicit reasoning from two options. The challenging factor is that both options are plausible and lexically very close while leading to contradicting claims. We created a new freely licensed dataset based on authentic arguments from news comments.

Additional information


  • Ivan Habernal, UKP TU-Darmstadt
  • Henning Wachsmuth, Webis, Bauhaus-Universität Weimar
  • Iryna Gurevych, UKP TU-Darmstadt
  • Benno Stein, Webis, Bauhaus-Universität Weimar

Evaluation criteria

Systems participating in the task must classify all instances in the test data set.

The classification results must be submitted in a delimited text file named answer.txt. Each line consists of two fields separated by horizontal whitespace (a single tab or space character). The first field is the instance ID. The second field is either 0 (meaning that warrant0 is the predicted answer) or 1 (meaning that warrant1 is the predicted answer).
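As a sketch of the expected format (the instance IDs below are hypothetical, not taken from the real dataset), a prediction file could be written like this:

```python
# Write predictions in the required two-field format:
# <instance ID> <0 or 1>, one instance per line, tab-separated.
predictions = {
    "13319707_476_HYPOTHETICAL_A": 0,  # hypothetical instance ID
    "15281754_516_HYPOTHETICAL_B": 1,  # hypothetical instance ID
}

with open("answer.txt", "w", encoding="utf-8") as f:
    for instance_id, label in predictions.items():
        f.write(f"{instance_id}\t{label}\n")
```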

Sample data, result files, and scorer are available on GitHub.

To submit the results, place answer.txt in a ZIP file (in the top-level directory), and then upload it to CodaLab according to the instructions at Participating in a Competition.
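Packaging the file can be done with Python's standard zipfile module; the stub answer.txt written here only keeps the sketch self-contained, and its instance ID is hypothetical:

```python
import zipfile

# Stub file so the example runs on its own; in practice answer.txt
# already holds your system's predictions.
with open("answer.txt", "w", encoding="utf-8") as f:
    f.write("12345_HYPOTHETICAL_ID\t0\n")  # hypothetical instance ID

# answer.txt must sit in the top-level directory of the archive,
# so pass an arcname with no subdirectory component.
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("answer.txt", arcname="answer.txt")
```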

Systems will be scored using accuracy (accuracy = correct predictions / all instances).
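The metric reduces to a few lines; the gold labels and IDs below are made up purely to illustrate the computation:

```python
# Accuracy = correct predictions / all instances.
gold      = {"id1": 0, "id2": 1, "id3": 1, "id4": 0}  # hypothetical gold labels
predicted = {"id1": 0, "id2": 1, "id3": 0, "id4": 0}  # hypothetical predictions

correct = sum(predicted[i] == gold[i] for i in gold)
accuracy = correct / len(gold)
print(accuracy)  # 3 of 4 correct -> 0.75
```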

Terms and Conditions

By submitting results to this competition, you consent to the public release of your scores at the SemEval-2018 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.

You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.

You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.

You agree not to redistribute the test data except in the manner prescribed by its licence.

Trial (evaluation on dev data)

Start: June 1, 2017, midnight UTC

Test (evaluation on test data)

Start: Jan. 8, 2018, midnight UTC

Post-Competition (evaluation on test data)

Start: Jan. 31, 2018, 10 a.m. UTC

Competition Ends

