CodaLab - Competition

RumourEval 2018-2019

Organized by ggorrell - Current server time: April 1, 2025, 7:46 a.m. UTC

Evaluation

Jan. 10, 2019, midnight UTC

Current

Post-Evaluation

Feb. 1, 2019, midnight UTC

End

Competition Ends

Never

Overview
Evaluation
Terms and Conditions

Welcome to RumourEval 2018-2019!

The core mission is to automatically determine the veracity of rumours. The task falls into two parts; task A, in which responses to a rumourous post are classified according to stance, and task B, in which the statements themselves are classified for veracity. Each is described in more detail below.

Latest news! The competition ran successfully in winter 2019. Data are now freely available at https://figshare.com/articles/RumourEval_2019_data/8845580

This site has been restored from an old backup following a Codalab system crash, so is out of date. The paper is the best reference for this competition: https://www.aclweb.org/anthology/S19-2147

The competition is not running in 2019-2020.

The Google group for the task is still extant.

Task A (SDQC)

Related to the objective of predicting a rumour's veracity, the first subtask will deal with the complementary objective of tracking how other sources orient to the accuracy of the rumourous story. A key step in the analysis of the surrounding discourse is to determine how other users in social media regard the rumour. We propose to tackle this analysis by looking at the replies to the post that presented the rumourous statement, i.e. the originating rumourous (source) post. We will provide participants with a tree-structured conversation formed of posts replying to the originating rumourous post, where each post presents its own type of support with regard to the rumour. We frame this in terms of supporting, denying, querying or commenting on (SDQC) the claim. Therefore, we introduce a subtask where the goal is to label the type of interaction between a given statement (rumourous post) and a reply post (the latter can be either direct or nested replies). Each tweet in the tree-structured thread will have to be categorised into one of the following four categories:

Support: the author of the response supports the veracity of the rumour they are responding to.
Deny: the author of the response denies the veracity of the rumour they are responding to.
Query: the author of the response asks for additional evidence in relation to the veracity of the rumour they are responding to.
Comment: the author of the response makes their own comment without a clear contribution to assessing the veracity of the rumour they are responding to.

Task B (verification)

The goal of the second subtask is to predict the veracity of a given rumour. The rumour is presented as a post reporting or querying a claim but deemed unsubstantiated at the time of release. Given such a claim, and a set of other resources provided, systems should return a label describing the anticipated veracity of the rumour as true or false. The ground truth of this task is manually established by journalist and expert members of the team who identify official statements or other trustworthy sources of evidence that resolve the veracity of the given rumour. Additional context will be provided as input to veracity prediction systems; this context will consist of snapshots of relevant sources retrieved immediately before the rumour was reported, including a snapshot of an associated Wikipedia article, a Wikipedia dump, news articles from digital news outlets retrieved from NewsDiffs, as well as preceding tweets from the same event. Critically, no external resources may be used that contain information from after the rumour's resolution. To control this, we will specify precise versions of external information that participants may use. This is important to make sure we introduce time sensitivity into the task of veracity prediction. We take a simple approach to this task, using only true/false labels for rumours. In practice, however, many claims are hard to verify; for example, there were many rumours concerning Vladimir Putin's activities in early 2015, many wholly unsubstantiable. Therefore, we also expect systems to return a confidence value in the range of 0-1 for each rumour; if the rumour is unverifiable, a confidence of 0 should be returned.

Evaluation Criteria

A submission should be a JSON format file, called "answer.json", consisting of two fields, one for task A and one for task B, like so:

{
    "subtaskaenglish": {
	"tweetid1": "comment",
	"tweetid2": "query",
	"tweetid3": "support",
	"redditid1": "deny"
    }
    "subtaskbenglish": {
        "tweetthreadid1": ["true",1.0]
        "redditthreadid2": ["false",0.0]
    }
    "subtaskadanish": {
	"tweetid4": "comment",
	"tweetid5": "query",
	"tweetid6": "comment",
	"redditid2": "deny"
    }
    "subtaskbdanish": {
        "tweetthreadid3": ["false",1.0]
        "redditthreadid4": ["false",0.0]
    }
    "subtaskarussian": {
	"tweetid7": "comment",
	"tweetid8": "comment",
	"tweetid9": "support",
	"redditid3": "comment"
    }
    "subtaskbrussian": {
        "tweetthreadid5": ["true",1.0]
        "redditthreadid6": ["true",0.0]
    }
}

E.g. a Twitter tweet/thread ID might be something like 514957228327907328. A Reddit post/thread ID might be something like dbmdk4o.

Subtask A (SDQC) takes one element per comment. Comments should be classified into four categories; support, deny, query and comment. Performance is evaluated using accuracy. Although training threads for task B (verification) come with three labels, true, false and unverified, you should classify into two classes; true and false. Again, accuracy will be calculated. Classes are followed by a confidence score, which will be used to calculate an RMSE, in order to give a more nuanced view of performance on task B. Unverified items should receive a confidence score of zero.

You will need to zip up your answer file to submit it. It should be at the top level of the archive.

If you are completing only one subtask, or not all languages, you can omit or leave empty the other fields.

Terms and Conditions

Use of the data indicates acceptance of the Twitter and Reddit terms of service.

Practice

Start: Aug. 6, 2018, midnight

Evaluation

Start: Jan. 10, 2019, midnight

Post-Evaluation

Start: Feb. 1, 2019, midnight

Competition Ends

Never

You must be logged in to participate in competitions.

Competition

RumourEval 2018-2019

Previous

Current

End

Welcome to RumourEval 2018-2019!

Task A (SDQC)

Task B (verification)

Evaluation Criteria

Terms and Conditions

Practice

Evaluation

Post-Evaluation

Competition Ends