Note: This is an archival page for results submitted prior to Sept 2022. For the current FEVER competition page go to https://codalab.lisn.upsaclay.fr/competitions/7308
Participants will be invited to develop systems to identify evidence and reason about the truthfulness of a given claim that we have generated. Our dataset currently contains 200,000 true and false claims. The true claims are written by human annotators extracting information from Wikipedia.
The purpose of the FEVER challenge is to evaluate the ability of a system to verify information using evidence from Wikipedia.
Find out more about the challenge on our website http://fever.ai and submit system descriptions to Softconf.
All deadlines are at 11:59pm Pacific Daylight Time (UTC-7).
Our scoring considers classification accuracy and evidence recall.
For a detailed description of the data annotation process and baseline results, see the paper cited below.
The data will be distributed in JSONL format with one example per line (see http://jsonlines.org/ for more details).
In addition to the task-specific dataset, the full set of Wikipedia pages (segmented at the sentence level) will be distributed on the data tab or on our website https://sheffieldnlp.github.io/fever.
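As a minimal sketch, the JSONL files can be loaded as follows (the file name train.jsonl is an assumption; use the actual file names from the data tab):

```python
import json

def read_jsonl(path):
    """Read one JSON object per line, as used for the FEVER data files."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Hypothetical file name; substitute the actual file from the data tab.
train = read_jsonl("train.jsonl")
print(len(train), train[0]["claim"])
```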
The training and development data will contain 4 fields:
- id: The ID of the claim.
- label: The annotated label for the claim. Can be one of SUPPORTS|REFUTES|NOT ENOUGH INFO.
- claim: The text of the claim.
- evidence: A list of evidence sets (lists of [Annotation ID, Evidence ID, Wikipedia URL, sentence ID] tuples), or a [Annotation ID, Evidence ID, null, null] tuple if the label is NOT ENOUGH INFO.

Below are examples of the data structures for each of the three labels.
{
"id": 62037,
"label": "SUPPORTS",
"claim": "Oliver Reed was a film actor.",
"evidence": [
[
[<annotation_id>, <evidence_id>, "Oliver_Reed", 0]
],
[
[<annotation_id>, <evidence_id>, "Oliver_Reed", 3],
[<annotation_id>, <evidence_id>, "Gladiator_-LRB-2000_film-RRB-", 0]
],
[
[<annotation_id>, <evidence_id>, "Oliver_Reed", 2],
[<annotation_id>, <evidence_id>, "Castaway_-LRB-film-RRB-", 0]
],
[
[<annotation_id>, <evidence_id>, "Oliver_Reed", 1]
],
[
[<annotation_id>, <evidence_id>, "Oliver_Reed", 6]
]
]
}
{
"id": 78526,
"label": "REFUTES",
"claim": "Lorelai Gilmore's father is named Robert.",
"evidence": [
[
[<annotation_id>, <evidence_id>, "Lorelai_Gilmore", 3]
]
]
}
{
"id": 137637,
"label": "NOT ENOUGH INFO",
"claim": "Henri Christophe is recognized for building a palace in Milot.",
"evidence": [
[
[<annotation_id>, <evidence_id>, null, null]
]
]
}
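The nested evidence field can be flattened into (Wikipedia page, sentence ID) pairs as in the sketch below; this is an illustration rather than official tooling, and it simply skips the null entries used for NOT ENOUGH INFO claims.

```python
def evidence_sentences(example):
    """Yield (Wikipedia page, sentence ID) pairs from one annotated example.

    Each element of example["evidence"] is an evidence set: a list of
    [Annotation ID, Evidence ID, Wikipedia URL, sentence ID] entries.
    For NOT ENOUGH INFO claims the page and sentence ID are null (None).
    """
    for evidence_set in example.get("evidence", []):
        for _annotation_id, _evidence_id, page, sentence_id in evidence_set:
            if page is not None:
                yield page, sentence_id

# Placeholder example with the annotation/evidence IDs replaced by None.
example = {
    "id": 62037,
    "label": "SUPPORTS",
    "claim": "Oliver Reed was a film actor.",
    "evidence": [[[None, None, "Oliver_Reed", 0]]],
}
print(list(evidence_sentences(example)))  # [('Oliver_Reed', 0)]
```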
The test data will follow the same format as the training/development examples, with the label and evidence fields removed.
{
"id": 78526,
"claim": "Lorelai Gilmore's father is named Robert."
}
Submissions should contain, for each claim, its id together with a predicted_label and a predicted_evidence field listing [Wikipedia URL, sentence ID] pairs, as in the following example.
{
"id": 78526,
"predicted_label": "REFUTES",
"predicted_evidence": [
["Lorelai_Gilmore", 3]
]
}
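A minimal sketch of writing predictions in this submission format (the output file name predictions.jsonl is a placeholder):

```python
import json

def write_predictions(predictions, path):
    """Write one prediction per line with id, predicted_label and predicted_evidence."""
    with open(path, "w", encoding="utf-8") as f:
        for pred in predictions:
            f.write(json.dumps(pred) + "\n")

# Placeholder output file name and a single example prediction.
write_predictions(
    [
        {
            "id": 78526,
            "predicted_label": "REFUTES",
            "predicted_evidence": [["Lorelai_Gilmore", 3]],
        }
    ],
    "predictions.jsonl",
)
```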
In the first instance, submissions will be measured on the correctness of the label assigned to each claim. If a claim is labelled as supported or refuted, the submitted evidence will additionally be checked against the list of annotated evidence. The label will only be considered correct if at least one piece of provided evidence matches the annotated evidence.
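As a rough illustration of this check (a simplification; the organisers' released scorer is authoritative), a single claim might be scored as follows, counting a prediction as correct when the label matches and, for SUPPORTS/REFUTES claims, at least one annotated evidence set is fully covered by the predicted evidence:

```python
def claim_is_correct(gold, pred):
    """Simplified per-claim check: correct label, plus one fully covered evidence set.

    gold: a training/dev-style example with "label" and "evidence" fields.
    pred: a submission-style record with "predicted_label" and "predicted_evidence".
    """
    if pred["predicted_label"] != gold["label"]:
        return False
    if gold["label"] == "NOT ENOUGH INFO":
        return True  # no evidence is required for NOT ENOUGH INFO claims
    predicted = {tuple(pair) for pair in pred["predicted_evidence"]}
    for evidence_set in gold["evidence"]:
        required = {(page, sent) for _, _, page, sent in evidence_set}
        if required <= predicted:  # every sentence in this set was provided
            return True
    return False
```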
At a later date (before the workshop) we will manually assess evidence marked as false positives and release updated scores and an update to the corpus.
@inproceedings{Thorne18Fever,
  author    = {Thorne, James and Vlachos, Andreas and Christodoulopoulos, Christos and Mittal, Arpit},
  title     = {{FEVER}: a Large-scale Dataset for Fact Extraction and VERification},
  booktitle = {NAACL-HLT},
  year      = {2018}
}
Competition phases (start dates):
- April 3, 2018, midnight
- July 24, 2018, midnight
- July 28, 2018, 7 a.m.
Competition ends: Never
Leaderboard:

# | Username | Score
---|---|---
1 | __Dani__ | 0.3965
2 | mitchell.dehaven | 0.4005
3 | krishnamrith12 | 0.4003