> Is there any baseline system?

Hi,
I have spent a few days trying to reproduce the models mentioned in the paper "MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims", including the "claim-only" and "crawled ranked" models, but I could not match the performance reported in the paper; there is always a gap of about 4-5 points. So I wonder: is there any code or baseline system available, for example data preprocessing, models, or training code?

Posted by: cococold @ Oct. 28, 2019, 2:38 p.m.

Hi,
Would you mind open-sourcing your evaluation scripts? My results drop about 10 points from the validation set to the test set, so I suspect my evaluation script may not be correct.

Posted by: cococold @ Oct. 29, 2019, 2:33 p.m.

Unfortunately, we currently don’t have the bandwidth to wrap this up in a way that it would be well-documented enough to release it. However, you might find the code for the multi-task learning model we extend useful: https://github.com/coastalcph/mtl-disparate

Posted by: lucaschaves @ Oct. 29, 2019, 7:05 p.m.

All right. But I still hope that you can publish the evaluation script. My result on the validation set reaches a micro-F1 of 60%, but it shows only 40% after submission on the test set.

Posted by: cococold @ Oct. 30, 2019, 12:58 a.m.

Hi cococold,

Is your work reproducing the baseline system publicly available?

Posted by: jeswan @ March 24, 2020, 7:26 p.m.

Due to the restrictions during the epidemic, I cannot connect to our private server where we store the project. We will make our work public as soon as possible.

Posted by: cococold @ March 25, 2020, 2:15 a.m.

Hi cococold,

Any luck with accessing your server? I'm planning to try to reproduce these results manually, and it would be a great help to reference your work! Thanks!

Posted by: jeswan @ April 1, 2020, 2:16 a.m.

Hi, are you still planning to publish your work? I have a few questions about details of the baseline system that are not answered by the paper so I'd be very curious to see the implementation.

Posted by: wangrat @ March 6, 2022, 7:54 p.m.

Hi

Could it be that the final micro-F1 and macro-F1 results are calculated differently than in the paper?
In the paper, I suppose the overall micro and macro F1 are calculated by taking the mean of the per-domain micro and macro F1 values.
However, when I submit predictions only for the pomt domain (with the labels for every other domain set to a label that only pomt uses, e.g. "full flop"), I would expect scores below 1/26 for both micro and macro F1, yet I got 0.101403609281 micro-F1, which seems strange to me.
Could you please give more information on how the micro and macro F1 are calculated on CodaLab, and how to relate the results to the method used in the paper?
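For reference, this is roughly how I am scoring my predictions locally. It is only a sketch of my assumption about the per-domain averaging described in the paper; the claim-id prefix parsing, the tab-separated file format, and the use of scikit-learn are my own guesses, not the official evaluation script:

    # Sketch of per-domain micro/macro F1 averaging (my assumption, not the official script).
    # Assumed input format: one "claim_id<TAB>label" pair per line.
    from collections import defaultdict
    from sklearn.metrics import f1_score

    def load_labels(path):
        """Read tab-separated claim_id / label pairs into a dict."""
        labels = {}
        with open(path) as f:
            for line in f:
                claim_id, label = line.rstrip("\n").split("\t")
                labels[claim_id] = label
        return labels

    def domain_of(claim_id):
        # Assumption: claim ids are prefixed with the source domain, e.g. "pomt-...".
        return claim_id.split("-")[0]

    def evaluate(gold_path, pred_path):
        gold = load_labels(gold_path)
        pred = load_labels(pred_path)

        # Group claim ids by domain.
        by_domain = defaultdict(list)
        for claim_id in gold:
            by_domain[domain_of(claim_id)].append(claim_id)

        # Compute micro and macro F1 per domain.
        micro_scores, macro_scores = [], []
        for domain, ids in by_domain.items():
            y_true = [gold[i] for i in ids]
            y_pred = [pred.get(i, "") for i in ids]
            micro_scores.append(f1_score(y_true, y_pred, average="micro"))
            macro_scores.append(f1_score(y_true, y_pred, average="macro"))

        # Overall scores as the mean of the per-domain values.
        micro = sum(micro_scores) / len(micro_scores)
        macro = sum(macro_scores) / len(macro_scores)
        return micro, macro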

Thanks in advance!
Marlon

Posted by: MarlonSa @ April 21, 2022, 12:20 p.m.