MultiFC

A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims

MultiFC is the largest publicly available dataset of naturally occurring factual claims for the purpose of automatic claim verification. It was collected from 26 English fact-checking websites, paired with textual sources and rich metadata, and labeled for veracity by expert journalists. The figure below shows one example of a claim instance; entities are obtained via entity linking, while article and outlink texts, evidence search snippets, and evidence pages are not shown.

[Figure: example of a claim instance with its metadata]
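
As a minimal loading sketch, the snippet below reads such a file with Python's standard library. Only test.tsv is named in this description, so the file name train.tsv and the assumption that each tab-separated row is one claim instance are illustrative, not official:

```python
# A minimal loading sketch. The file name train.tsv and the layout (one
# claim instance per tab-separated row) are assumptions for illustration.
import csv

def load_tsv(path):
    """Read a tab-separated file into a list of rows (lists of string fields)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f, delimiter="\t"))

train_rows = load_tsv("train.tsv")
print(f"loaded {len(train_rows)} claim instances")
```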

References: 
Isabelle Augenstein, Christina Lioma, Dongsheng Wang, Lucas Chaves Lima, Casper Hansen, Christian Hansen, and Jakob Grue Simonsen. 2019. MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims. In EMNLP. Association for Computational Linguistics.

https://copenlu.github.io/publication/2019_emnlp_augenstein/

MultiFC: Evaluation

The task is a multi-class classification problem. Each sample (a claim) comes with the context in which it occurred, evidence pages, and rich metadata, and you must predict the claim's veracity. The labels include straightforward veracity ratings ('correct', 'incorrect') as well as labels that are harder to map onto a linear veracity scale (e.g. 'grassroots movement!', 'misattributed', 'not the whole story').
You are given labeled training data and development data. You must train a model that predicts the label for each claim in the test.tsv file.
To prepare your submission, make sure that the predictions in the test.predict file are in the same order as the claims in test.tsv. Each line of test.predict should contain the label as a string (e.g. 'correct').

This is the process:

  • Generating predictions: We provide you with labeled training and development (validation) data and unlabeled test data. You may generate predictions for both the development and test sets; however, you will receive feedback on your performance only for the test data.
  • Submitting your predictions: Generate the test.predict file and submit it inside a .zip file. Make sure that each line contains a single label and that each label corresponds to the same line in test.tsv (see the sketch after this list).
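
A minimal sketch of the packaging step, assuming the predicted labels are already ordered to match test.tsv; the archive name submission.zip is an arbitrary example:

```python
# A minimal sketch of writing and packaging a submission. The list of
# predicted labels is a placeholder; producing it is the modelling task.
import zipfile

predictions = ["correct", "incorrect", "not the whole story"]  # placeholder labels

# One label string per line, in the same order as the rows of test.tsv.
with open("test.predict", "w", encoding="utf-8") as f:
    f.write("\n".join(predictions) + "\n")

# Package the prediction file for upload.
with zipfile.ZipFile("submission.zip", "w") as zf:
    zf.write("test.predict")
```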

This competition only accepts prediction results (no code submissions).

Submissions are evaluated using the F1 score, with both 'micro' and 'macro' averaging.
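
To illustrate the difference between the two averaging modes, here is a short scikit-learn sketch; the labels are made up and this is not the organizers' actual scoring script:

```python
# Illustration of micro- vs. macro-averaged F1 with scikit-learn.
from sklearn.metrics import f1_score

y_true = ["correct", "incorrect", "misattributed", "correct"]
y_pred = ["correct", "correct", "misattributed", "correct"]

# Micro-F1 aggregates decisions over all instances; macro-F1 averages the
# per-class F1 scores, so rare labels weigh as much as frequent ones.
print("micro F1:", f1_score(y_true, y_pred, average="micro"))
print("macro F1:", f1_score(y_true, y_pred, average="macro"))
```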

MultiFC: Rules

You may make at most 5 submissions per day and 50 in total.

MultiFC: Organizers

Organizers of the task:

Lucas Chaves Lima [CodaLab competition organizer] 
lcl@di.ku.dk
University of Copenhagen

Isabelle Augenstein [Lead author of EMNLP 2019 paper]
augenstein@di.ku.dk
University of Copenhagen

Christina Lioma
c.lioma@di.ku.dk
University of Copenhagen

Dongsheng Wang
wang@di.ku.dk
University of Copenhagen

Casper Hansen
c.hansen@di.ku.dk
University of Copenhagen

Christian Hansen
chrh@di.ku.dk
University of Copenhagen

Jakob Grue Simonsen
simonsen@di.ku.dk
University of Copenhagen

MultiFC: Test Phase

Start: Aug. 29, 2019, 6:53 p.m. UTC

Description: create models and submit the results on the test data; feedback is provided on the test set only.

End: never (the competition has no closing date).

MultiFC: Leaderboard

  #  Username   Score
  1  igw212     3.0000
  2  wabywang   5.0000
  3  sr5387     4.0000