The core mission is to automatically determine the veracity of rumours. The task falls into two parts: task A, in which responses to a rumourous post are classified according to stance, and task B, in which the rumourous statements themselves are classified for veracity. Each is described in more detail below.
Latest news! The competition ran successfully in winter 2019. Data are now freely available at https://figshare.com/articles/RumourEval_2019_data/8845580. This site has been restored from an old backup following a Codalab system crash, so it is out of date. The best reference for this competition is the paper: https://www.aclweb.org/anthology/S19-2147. The competition is not running in 2019-2020. The Google group for the task remains active.
Related to the objective of predicting a rumour's veracity, the first subtask will deal with the complementary objective of tracking how other sources orient to the accuracy of the rumourous story. A key step in the analysis of the surrounding discourse is to determine how other users in social media regard the rumour. We propose to tackle this analysis by looking at the replies to the post that presented the rumourous statement, i.e. the originating rumourous (source) post. We will provide participants with a tree-structured conversation formed of posts replying to the originating rumourous post, where each post takes its own stance with regard to the rumour. We frame this in terms of supporting, denying, querying or commenting on (SDQC) the claim. We therefore introduce a subtask where the goal is to label the type of interaction between a given statement (the rumourous post) and a reply post (which may be either a direct or a nested reply). Each tweet in the tree-structured thread will have to be categorised into one of the following four categories:

- Support: the reply supports the veracity of the rumour in the source post
- Deny: the reply denies the veracity of the rumour
- Query: the reply asks for additional evidence in relation to the rumour
- Comment: the reply comments on the conversation without taking a clear stance on the rumour's veracity
The goal of the second subtask is to predict the veracity of a given rumour. The rumour is presented as a post reporting or querying a claim that was deemed unsubstantiated at the time of release. Given such a claim and a set of additional resources, systems should return a label describing the anticipated veracity of the rumour as true or false. The ground truth for this task is manually established by journalist and expert members of the team, who identify official statements or other trustworthy sources of evidence that resolve the veracity of the given rumour. Additional context will be provided as input to veracity prediction systems; this context will consist of snapshots of relevant sources retrieved immediately before the rumour was reported, including a snapshot of an associated Wikipedia article, a Wikipedia dump, news articles from digital news outlets retrieved from NewsDiffs, as well as preceding tweets from the same event. Critically, no external resources may be used that contain information from after the rumour's resolution. To control for this, we will specify precise versions of the external information that participants may use. This is important in order to introduce time sensitivity into the task of veracity prediction. We take a simple approach to this task, using only true/false labels for rumours. In practice, however, many claims are hard to verify; for example, there were many rumours concerning Vladimir Putin's activities in early 2015, many of them wholly unverifiable. Therefore, we also expect systems to return a confidence value in the range 0-1 for each rumour; if the rumour is deemed unverifiable, a confidence of 0 should be returned.
A submission should be a JSON file, called "answer.json", containing one field per subtask and language (task A and task B for each of English, Danish and Russian), like so:
{ "subtaskaenglish": { "tweetid1": "comment", "tweetid2": "query", "tweetid3": "support", "redditid1": "deny" } "subtaskbenglish": { "tweetthreadid1": ["true",1.0] "redditthreadid2": ["false",0.0] } "subtaskadanish": { "tweetid4": "comment", "tweetid5": "query", "tweetid6": "comment", "redditid2": "deny" } "subtaskbdanish": { "tweetthreadid3": ["false",1.0] "redditthreadid4": ["false",0.0] } "subtaskarussian": { "tweetid7": "comment", "tweetid8": "comment", "tweetid9": "support", "redditid3": "comment" } "subtaskbrussian": { "tweetthreadid5": ["true",1.0] "redditthreadid6": ["true",0.0] } }
For example, a Twitter tweet/thread ID might be something like 514957228327907328, and a Reddit post/thread ID might be something like dbmdk4o.
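For orientation, here is a minimal Python sketch of how such a file could be assembled; the IDs, labels and confidence values are placeholders rather than real predictions, and only a couple of the fields are shown.

import json

# Build the answer dictionary. All IDs, labels and confidence values
# below are placeholders, not real predictions.
answers = {
    "subtaskaenglish": {
        "514957228327907328": "support",      # stance label for one reply post
        "dbmdk4o": "comment",
    },
    "subtaskbenglish": {
        "514957228327907328": ["true", 0.8],  # [veracity label, confidence]
    },
    # The remaining subtask/language fields can be added in the same way,
    # or omitted / left empty if not attempted.
}

with open("answer.json", "w") as f:
    json.dump(answers, f, indent=2)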
Subtask A (SDQC) takes one element per comment. Comments should be classified into four categories: support, deny, query and comment. Performance is evaluated using accuracy. Although training threads for task B (verification) come with three labels (true, false and unverified), you should classify into two classes: true and false. Again, accuracy will be calculated. Each class label is followed by a confidence score, which will be used to calculate an RMSE in order to give a more nuanced view of performance on task B. Items you consider unverifiable should receive a confidence score of zero.
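To make the scoring concrete, the sketch below shows one simple way accuracy and a confidence RMSE could be computed for task B. It is an illustration only, not the official scorer; in particular, the assumed gold format ({thread_id: "true"/"false"/"unverified"}) and the choice of target confidences (1 for a correctly labelled verified rumour, 0 otherwise) are assumptions.

import math

def score_task_b(predictions, gold):
    # predictions: {thread_id: [label, confidence]} as in answer.json
    # gold: {thread_id: "true" | "false" | "unverified"} -- assumed format
    correct = 0
    squared_errors = []
    for thread_id, (label, confidence) in predictions.items():
        gold_label = gold[thread_id]
        if gold_label != "unverified" and label == gold_label:
            correct += 1
            target = 1.0  # ideally a correct label comes with full confidence
        else:
            target = 0.0  # wrong label, or an unverifiable rumour
        squared_errors.append((confidence - target) ** 2)
    accuracy = correct / len(predictions)
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    return accuracy, rmse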
You will need to zip up your answer file to submit it; answer.json should sit at the top level of the archive.
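A minimal way to produce such an archive in Python (the archive name "submission.zip" is arbitrary):

import zipfile

# Place answer.json at the top level of the archive, with no enclosing folder.
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("answer.json", arcname="answer.json")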
If you are completing only one subtask, or not all languages, you can omit or leave empty the other fields.
Use of the data indicates acceptance of the Twitter and Reddit terms of service.
Competition phases opened on Aug. 6, 2018, Jan. 10, 2019, and Feb. 1, 2019, with no scheduled end date.