Note: The official results are now available here.
This is the website of the SemEval-2019 task on fact checking in community question answering forums. The task features two subtasks.
Subtask A: decide whether a question asks for factual information, asks for an opinion or advice, or is just socializing.
Subtask B: decide whether an answer to a factual question is true, false, or does not constitute a proper answer.
To illustrate the problem, here is an example from the "Qatar Living" forum:
Q: I have heard its not possible to extend visit visa more than 6 months? Can U please answer me.. Thankzzz...
answer 1: Maximum period is 9 Months....
answer 2: 6 months maximum
answer 3: This has been answered in QL so many times. Please do search for information regarding this. BTW answer is 6 months.
According to SemEval-2016 Task 3, all three answers would be good, since they formally answer the question. Nevertheless, answer 1 contains false information, while answers 2 and 3 are correct, as can be established from an official government website.
The current task aims at solving the problem of detecting true factual information in online forums.
To detect whether a factual statement is true or false, we first need to identify whether the statement is actually factual. Therefore, the goal of this subtask is to classify questions from the forum into three categories:
Given a question that asks for factual information (as detected in Subtask A), the goal of this subtask is to predict whether each of its answers is actually factual and, if so, whether the stated fact is true. The answers are to be classified into the following categories:
Participants may use external sources of information to perform classification for both Subtask A and Subtask B:
Example uses of such sources are described in https://arxiv.org/pdf/1803.03178.pdf.
For any questions related to the task, please contact the organizers: firstname.lastname@example.org
Feel free to join the Google group for task-related news and discussions: email@example.com
Both subtasks correspond to three-way classification problems.
Submissions will be scored based on Accuracy, macro-F1 and AvgRec. Furthermore, Subtask B will also be scored based on mean average precision (MAP), where the "Factual - True" instances are considered to be positive, and the remaining instances to be negative.
The official metric for both subtasks is Accuracy.
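The metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the official scorer, and it rests on some assumptions: AvgRec is taken to be recall averaged over the classes, MAP is computed per question over answers already sorted by system confidence, and questions with no "Factual - True" answer are skipped when averaging. The function names and the input layout are hypothetical.

```python
def accuracy(gold, pred):
    """Fraction of instances whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1_and_avg_recall(gold, pred, labels):
    """Return (macro-F1, AvgRec), each averaged with equal weight per class.

    Assumption: AvgRec is the unweighted mean of per-class recall.
    """
    f1_scores, recalls = [], []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        f1_scores.append(f1)
        recalls.append(recall)
    return sum(f1_scores) / len(labels), sum(recalls) / len(labels)


def mean_average_precision(ranked_gold, positive="Factual - True"):
    """MAP over questions.

    ranked_gold maps a question id to the gold labels of its answers,
    listed in the order the system ranked them (most confident first).
    Assumption: questions without any positive answer are skipped.
    """
    average_precisions = []
    for labels in ranked_gold.values():
        hits = 0
        precisions_at_hits = []
        for rank, label in enumerate(labels, start=1):
            if label == positive:
                hits += 1
                precisions_at_hits.append(hits / rank)
        if precisions_at_hits:
            average_precisions.append(
                sum(precisions_at_hits) / len(precisions_at_hits)
            )
    if not average_precisions:
        return 0.0
    return sum(average_precisions) / len(average_precisions)
```

Accuracy rewards getting the frequent class right, while macro-F1 and AvgRec weight all three classes equally, so reporting them side by side shows whether a system ignores the rarer classes.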
Start: Aug. 20, 2018, midnight
Description: A small set of trial data is provided to test the scripts, evaluation on CodaLab, etc. Submissions will be scored based on the trial data.
Start: Oct. 1, 2018, midnight
Description: The training and development sets are available. The submissions are scored on the dev data.
Start: Jan. 20, 2019, 11:59 p.m.
Description: The test set is now available and the submissions are scored on the test data.
Start: Feb. 2, 2019, noon
Description: The evaluation phase is closed. The official results are announced.