SemEval 2019 Task 8: Fact Checking in Community Question Answering Forums

Organized by tsvm

SemEval 2019 Task 8 on Fact-Checking in Community Forums

This is the website of the SemEval-2019 task on fact checking in community question answering forums. The task features two subtasks.

Subtask A: decide whether a question asks for factual information, asks for an opinion or advice, or is just socializing. Subtask B: decide whether an answer to a factual question is true, false, or does not constitute a proper answer.

To illustrate the problem, here is an example from the "Qatar Living" forum:

=======================================================================

  • Q: I have heard its not possible to extend visit visa more than 6 months? Can U please answer me.. Thankzzz...
  • a1: Maximum period is 9 Months....
  • a2: 6 months maximum
  • a3: This has been answered in QL so many times. Please do search for information regarding this. BTW answer is 6 months.

=======================================================================

By the criteria of SemEval-2016 Task 3, all three answers would be good, since each formally answers the question. Nevertheless, a1 contains false information, while a2 and a3 are correct, as can be established from an official government website.

The current task aims at solving the problem of detecting true factual information in online forums.

 

Subtask A: Question Classification

Before deciding whether a statement is true or false, we first need to establish whether it is factual at all. The goal of this subtask is therefore to classify questions from the forum into three categories:

  • Factual: The question is asking for factual information, which can be answered by checking various information sources, and it is not ambiguous (e.g., "What is Ooredoo customer service number?").
  • Opinion: The question asks for an opinion or advice, not for a fact (e.g., "Can anyone recommend a good Vet in Doha?").
  • Socializing: Not a real question, but intended for socializing or for chatting. This can also mean expressing an opinion or sharing some information, without really asking anything of general interest (e.g., "What was your first car?").

 

Subtask B: Answer Classification

Once a question has been identified as asking for a fact (Subtask A), the next step is to predict whether each answer is actually factual and, if so, whether it is true. The goal of this subtask is, given a question that asks about a fact, to classify its answers into the following categories:

  • Factual - TRUE: The answer is True and can be proven with an external resource. (Q: "I wanted to know if there were any specific shots and vaccinations I should get before coming over [to Doha]."; A: "Yes there are; though it varies depending on which country you come from. In the UK; the doctor has a list of all countries and the vaccinations needed for each.").
  • Factual - FALSE: The answer gives a factual response, but it is false or partially false, or the responder is unsure of the answer (Q: "Can I bring my pitbulls to Qatar?"; A: "Yes you can bring it but be careful this kind of dog is very dangerous.").
  • Non-Factual: The answer does not provide factual information in response to the question; it can be an opinion or advice that cannot be verified (e.g., "Its better to buy a new one.").

Usage of external information:

Participants may use external sources of information to perform classification for both Subtask A and Subtask B:

  • Intra-forum evidence: from the QatarLiving forum itself. Old threads in the forum may contain enough information to estimate the factuality of the answers in Subtask B. You can download an archive with QL threads from here: http://alt.qcri.org/semeval2016/task3/data/uploads/QL-unannotated-data-subtaskA.xml.zip
  • Web information: Participants may use any source of web information as they see fit. For example, to perform factuality classification for Subtask B, participants may query a search engine to fetch relevant documents from the Internet.

Example uses of such sources are described in https://arxiv.org/pdf/1803.03178.pdf.

Contacts

For any questions related to the task, please contact the organizers: semeval-2019-task-8-organizers@googlegroups.com

Feel free to join the Google group for task-related news and discussions: semeval-2019-task-8

Evaluation

Both subtasks correspond to three-way classification problems.

Submissions will be scored based on Accuracy, macro-F1 and AvgRec. Furthermore, Subtask B will also be scored based on mean average precision (MAP), where the "Factual - True" instances are considered to be positive, and the remaining instances to be negative.

The official metric for both subtasks is Accuracy.
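The metrics named above can be sketched in plain Python as follows. This is an illustrative sketch, not the official scorer. It assumes that AvgRec means recall averaged over the three classes, that MAP is computed by ranking the answers of each question by a system score and averaging the per-question average precision, and that questions with no positive answer are skipped. The function names, the (score, label) input layout, and the exact label string "Factual - True" are all hypothetical choices made for the example.

```python
def accuracy(gold, pred):
    """Fraction of instances whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_class_counts(gold, pred, labels):
    """Return {label: (true positives, false positives, false negatives)}."""
    counts = {}
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        counts[label] = (tp, fp, fn)
    return counts


def macro_f1(gold, pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for tp, fp, fn in per_class_counts(gold, pred, labels).values():
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def avg_recall(gold, pred, labels):
    """Unweighted mean of the per-class recall (assumed meaning of AvgRec)."""
    recalls = []
    for tp, _fp, fn in per_class_counts(gold, pred, labels).values():
        recalls.append(tp / (tp + fn) if (tp + fn) else 0.0)
    return sum(recalls) / len(recalls)


def mean_average_precision(threads, positive="Factual - True"):
    """MAP over questions.

    `threads` is a list with one entry per question; each entry is a list of
    (system_score, gold_label) pairs, one per answer. Answers are ranked by
    descending score. Questions without any positive answer are skipped,
    which is an assumption of this sketch.
    """
    ap_values = []
    for answers in threads:
        ranked = sorted(answers, key=lambda a: a[0], reverse=True)
        hits = 0
        precisions = []
        for rank, (_score, label) in enumerate(ranked, start=1):
            if label == positive:
                hits += 1
                precisions.append(hits / rank)
        if precisions:
            ap_values.append(sum(precisions) / len(precisions))
    return sum(ap_values) / len(ap_values) if ap_values else 0.0


if __name__ == "__main__":
    labels = ["Factual", "Opinion", "Socializing"]
    gold = ["Factual", "Opinion", "Socializing", "Factual"]
    pred = ["Factual", "Opinion", "Factual", "Opinion"]
    print(accuracy(gold, pred))
    print(macro_f1(gold, pred, labels))
    print(avg_recall(gold, pred, labels))
```

Accuracy, macro-F1 and AvgRec take parallel lists of gold and predicted labels, while the MAP function needs a real-valued score per answer so that the answers to each question can be ranked.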

Terms and Conditions

...

Trial

Start: Aug. 20, 2018, midnight

Description: A small set of trial data is provided to test the scripts, evaluation on CodaLab, etc. Submissions will be scored based on the trial data.

Practice

Start: Oct. 1, 2018, midnight

Description: The training and development sets are available. The submissions are scored on the dev data.

Evaluation

Start: Jan. 10, 2019, midnight

Description: The test set is now available and the submissions are scored on the test data.

Post-Evaluation

Start: Feb. 1, 2019, midnight

Description: The evaluation phase is closed.

Competition Ends

Never

# Username Score
1 tsvm 0.51