Exploring how artificial intelligence technologies could be leveraged to combat fake news.
The goal of the Fake News Challenge is to explore how artificial intelligence technologies, particularly machine learning and natural language processing, might be leveraged to combat the fake news problem. We believe that these AI technologies hold promise for significantly automating parts of the procedure human fact checkers use today to determine if a story is real or a hoax.
Assessing the veracity of a news story is a complex and cumbersome task, even for trained experts. Fortunately, the process can be broken down into steps or stages. A helpful first step towards identifying fake news is to understand what other news organizations are saying about the topic. We believe automating this process, called Stance Detection, could serve as a useful building block in an AI-assisted fact-checking pipeline. So stage #1 of the Fake News Challenge (FNC-1) focuses on the task of Stance Detection.
Stance Detection involves estimating the perspective (or stance) of two pieces of text relative to a topic, claim, or issue. The version of Stance Detection we have selected for FNC-1 extends the work of Ferreira & Vlachos. For FNC-1 we have chosen the task of estimating the stance of a body text from a news article relative to a headline. Specifically, the body text may agree with, disagree with, discuss, or be unrelated to the headline.
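As a purely illustrative example of the related/unrelated half of this task (not the official baseline), a crude heuristic could flag a pair as related when enough headline tokens also appear in the body:

```python
def related_by_overlap(headline, body, threshold=0.1):
    """Toy heuristic: treat a headline/body pair as 'related' when at
    least `threshold` of the headline's words appear in the body.
    Purely illustrative; real systems use far richer features."""
    headline_words = set(headline.lower().split())
    body_words = set(body.lower().split())
    if not headline_words:
        return False
    overlap = len(headline_words & body_words) / len(headline_words)
    return overlap >= threshold
```

A real entry would still have to distinguish agree, disagree, and discuss among the related pairs, which is where most of the difficulty (and most of the score) lies.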
For additional details, see http://www.fakenewschallenge.org
For competition related announcements, sign up at https://groups.google.com/forum/#!search/fnc-1-compete
Teams will be evaluated based on a weighted, two-level scoring system:
Level 1: Classify headline and body text as related or unrelated (25% score weighting)
Level 2: Classify related pairs as agrees, disagrees, or discusses (75% score weighting)
Rationale: The related/unrelated classification task is expected to be much easier and is less relevant for detecting fake news, so it is given less weight in the evaluation metric. The Stance Detection task (classifying as agrees, disagrees, or discusses) is both more difficult and more relevant to fake news detection, so it is given much more weight in the evaluation metric.
Concretely, if a [HEADLINE, BODY TEXT] pair in the test set has the target label unrelated, a team’s evaluation score will be incremented by 0.25 if it labels the pair as unrelated.
If the [HEADLINE, BODY TEXT] test pair is related, a team’s score will be incremented by 0.25 if it labels the pair as any of the three classes: agrees, disagrees, or discusses.
The team’s evaluation score will also be incremented by an additional 0.75 for each related pair if it labels the pair with the single correct class: agrees, disagrees, or discusses.
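The per-pair scoring rules above can be sketched as a small Python function. This is an unofficial sketch for intuition; the authoritative implementation is the scorer in the official GitHub repository. The label strings follow the forms used in the data files (`agree`, `disagree`, `discuss`, `unrelated`):

```python
RELATED = {"agree", "disagree", "discuss"}

def score_pair(gold, pred):
    """Score one (gold, predicted) stance pair under the weighted
    two-level FNC-1 metric described above."""
    score = 0.0
    if gold == "unrelated":
        if pred == "unrelated":
            score += 0.25  # correct related/unrelated decision
    else:
        if pred in RELATED:
            score += 0.25  # correct related/unrelated decision
        if pred == gold:
            score += 0.75  # correct fine-grained stance
    return score
```

A perfectly classified related pair thus earns 1.0 (0.25 + 0.75), while a related pair assigned the wrong related class still earns 0.25.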
You must submit your predictions as a zip file containing a file called `submission.csv` (with no directories). This file must have the same structure as the file `train_stances.csv` from the training data. Namely, it should be a CSV file with header, containing three columns `Headline`, `Body ID`, `Stance`. The order of rows must be the same as in the training/test data files.
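A file with the required header and column order could be written with Python's `csv` module, for example. The `predictions` list here is a hypothetical placeholder for your system's output, in the same row order as the test data:

```python
import csv

# Hypothetical predictions: (headline, body_id, stance) tuples,
# in the same row order as the stances file they correspond to.
predictions = [
    ("Example headline", 712, "unrelated"),
    ("Another example headline", 158, "agree"),
]

with open("submission.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Headline", "Body ID", "Stance"])  # required header
    for headline, body_id, stance in predictions:
        writer.writerow([headline, body_id, stance])
```

The `csv` module handles the quoting of headlines that contain commas (as in the excerpt below). Remember to zip `submission.csv` at the top level of the archive, with no directories.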
Below is an excerpt from a correctly formatted submission file for predictions on the training data:
Headline,Body ID,Stance
Police find mass graves with at least '15 bodies' near Mexico town where 43 students disappeared after police clash,712,unrelated
Hundreds of Palestinians flee floods in Gaza as Israel opens dams,158,agree
"Christian Bale passes on role of Steve Jobs, actor reportedly felt he wasn't right for part",137,unrelated
HBO and Apple in Talks for $15/Month Apple TV Streaming Service Launching in April,1034,unrelated
Spider burrowed through tourist's stomach and up into his chest,1923,disagree
'Nasa Confirms Earth Will Experience 6 Days of Total Darkness in December' Fake News Story Goes Viral,154,agree
To validate the format of your predictions you can use the Development phase of this competition or the scorer from the official GitHub repository.
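As an additional lightweight local sanity check before uploading (not a substitute for the Development phase or the official scorer), one could verify the header and label vocabulary:

```python
import csv

VALID_STANCES = {"agree", "disagree", "discuss", "unrelated"}

def check_submission(path):
    """Minimal local check: correct header and only valid stance labels.
    Does not check row order or completeness against the test file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        assert header == ["Headline", "Body ID", "Stance"], header
        for row in reader:
            assert row[2] in VALID_STANCES, row
    return True
```

This catches the most common formatting mistakes (wrong header, misspelled labels) before a submission slot is spent.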
Start: May 27, 2017, midnight
Description: Check your submission format for evaluation. This phase is simply to get used to the CodaLab interface and make sure you understand the submission format. Go ahead and make a submission with the `train_stances.csv` file (i.e., output identical to the training data).
Start: May 31, 2017, 11:59 p.m.
Description: Evaluation on Test Data. This is the real deal: you will be submitting results on the test data. Each team gets a maximum of 6 submissions. The leaderboard positions are hidden to prevent teams from fitting to the test set. The 6 submissions simply ensure each team can revise its submission at most five times. Only the latest submission will be considered for the final scoring.
End: June 2, 2017, 11:59 p.m.