Hi,
I have spent a few days trying to reproduce the models described in the paper "MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims", including the "claim-only" and "crawled ranked" models, but I cannot match the reported performance; there is always a gap of about 4-5 points. Is there any code for the baseline system available, e.g. data preprocessing, models, or training code?
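For context, the claim-only model in my attempt looks roughly like the sketch below (PyTorch). The shared-encoder / per-domain-head structure follows my reading of the paper's multi-task setup, but the plain BiLSTM encoder and all sizes are my own guesses, so this may well be where the gap comes from:

    # Sketch of my claim-only reimplementation attempt; architecture details
    # beyond "shared encoder + one classification head per domain" are assumptions.
    import torch
    import torch.nn as nn

    class ClaimOnlyMTL(nn.Module):
        def __init__(self, vocab_size, emb_dim, hidden_dim, labels_per_domain):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
            self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                                   bidirectional=True)
            # one softmax head per domain (e.g. "pomt", "snes", ...)
            self.heads = nn.ModuleDict({
                domain: nn.Linear(2 * hidden_dim, n_labels)
                for domain, n_labels in labels_per_domain.items()
            })

        def forward(self, token_ids, domain):
            # encode the claim text, then classify with the head of its domain
            emb = self.embed(token_ids)
            _, (h, _) = self.encoder(emb)
            claim_repr = torch.cat([h[0], h[1]], dim=-1)  # fwd + bwd final states
            return self.heads[domain](claim_repr)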
Hi,
Would you mind open-sourcing your evaluation scripts? My results drop by about 10 points from the validation set to the test set, and I suspect my evaluation script may not be correct.
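For reference, this is roughly how I compute the scores locally (a minimal sketch with scikit-learn; treating the gold and predicted labels as flat lists over all claims, regardless of domain, is just my assumption about how the official score is computed):

    from sklearn.metrics import f1_score

    def score(gold_labels, pred_labels):
        # gold_labels / pred_labels: one veracity label string per claim
        micro = f1_score(gold_labels, pred_labels, average="micro")
        macro = f1_score(gold_labels, pred_labels, average="macro")
        return micro, macro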
Unfortunately, we currently don’t have the bandwidth to wrap this up in a way that is well-documented enough to release. However, you might find the code for the multi-task learning model we extend useful: https://github.com/coastalcph/mtl-disparate
Posted by: lucaschaves @ Oct. 29, 2019, 7:05 p.m.
All right. But I still hope that you can publish the evaluation script. My micro F1 reaches 60% on the validation set, but only 40% after submission on the test set.
Posted by: cococold @ Oct. 30, 2019, 12:58 a.m.
Hi cococold,
Is your work reproducing the baseline system publicly available?
Posted by: jeswan @ March 24, 2020, 7:26 p.m.
Due to the epidemic restrictions, I cannot connect to the private server where we store the project. We will make our work public as soon as possible.
Posted by: cococold @ March 25, 2020, 2:15 a.m.
Hi cococold,
Any luck with accessing your server? I'm planning to try to reproduce these results myself, and it would be a great help to be able to reference your work! Thanks!
Posted by: jeswan @ April 1, 2020, 2:16 a.m.
Hi, are you still planning to publish your work? I have a few questions about details of the baseline system that are not answered by the paper, so I'd be very curious to see the implementation.
Posted by: wangrat @ March 6, 2022, 7:54 p.m.
Hi,
Could it be that the final micro-F1 and macro-F1 results are calculated differently than in the paper?
In the paper, I believe the overall micro and macro F1 are calculated by taking the mean of the per-domain micro and macro F1 values.
However, when I submit predictions only for the pomt domain (with the labels for all other domains set to a label that only pomt uses, e.g. "full flop"), I would expect results below 1/26 for both micro and macro F1, yet I get a micro F1 of 0.101403609281, which seems strange to me.
Could you please give more information on how the micro and macro F1 are calculated on Codalab, and how to convert the results so they are comparable to the method used in the paper?
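To make my assumption concrete, here is a small sketch of the two computations I have in mind (the gold/predicted label lists and the per-claim domain list are placeholders):

    # What I assumed the paper reports: per-domain micro/macro F1, averaged over domains.
    from collections import defaultdict
    from sklearn.metrics import f1_score

    def per_domain_then_average(gold, pred, domains):
        by_domain = defaultdict(lambda: ([], []))
        for g, p, d in zip(gold, pred, domains):
            by_domain[d][0].append(g)
            by_domain[d][1].append(p)
        micro = sum(f1_score(g, p, average="micro")
                    for g, p in by_domain.values()) / len(by_domain)
        macro = sum(f1_score(g, p, average="macro")
                    for g, p in by_domain.values()) / len(by_domain)
        return micro, macro

    # What Codalab might compute instead: one global score over all claims at once.
    def global_scores(gold, pred):
        return (f1_score(gold, pred, average="micro"),
                f1_score(gold, pred, average="macro"))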
Thanks in advance!
Marlon