I have 2 questions about the final evaluation procedure.
1. Do we need to submit exactly 215,307 pairs?
2. Will the final evaluation be based on the remaining 50% of the pairs (the part not used for evaluation in Phase 1) or on 100% of the pairs?
1. Yes, you make the same submission file as before.
2. The evaluation will be based on a new hold-out set (50%).
Can you clarify a bit more about the number of pairs we submit?
It's said in the description that "If less than 215,307 pairs are provided, we will calculate the F1 measure only for these records and won't append the rest of the submission with incorrect pairs just because they are missing". Does that mean we can submit fewer than 215,307 pairs?
Yes, you can submit fewer than that number. In this case we will calculate TPR, FPR and FNR using that data and update the leaderboard.
Posted by: spirinus @ Sept. 30, 2016, 5 p.m.
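To make the "no padding" rule concrete, here is a minimal sketch of how a short submission might be scored. It is not the organizers' scorer. It assumes that a match is an unordered pair of IDs, that precision is computed over the submitted pairs only, and that recall (the same quantity as TPR) is computed over the hold-out ground truth. The `score` and `normalize` helpers are hypothetical names.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    # Assumption: (a, b) and (b, a) describe the same match, so sort the two IDs.
    return {tuple(sorted(p)) for p in pairs}


def score(submitted: Iterable[Pair], ground_truth: Iterable[Pair]) -> Dict[str, float]:
    pred = normalize(submitted)
    truth = normalize(ground_truth)
    tp = len(pred & truth)  # submitted pairs that are correct
    fp = len(pred - truth)  # submitted pairs that are wrong
    fn = len(truth - pred)  # true pairs missing from the submission
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0  # recall == TPR
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


truth = [("u1", "u2"), ("u3", "u4"), ("u5", "u6"), ("u7", "u8")]
short = [("u2", "u1"), ("u3", "u9")]                    # 2 pairs, 1 correct
padded = short + [("x1", "x2"), ("x3", "x4")]           # padded with wrong pairs

print(score(short, truth))   # precision 0.5, recall 0.25
print(score(padded, truth))  # precision 0.25, recall 0.25
```

Under these assumptions, a shorter file is not penalized on precision for the pairs it leaves out. Missing true pairs still lower recall, because recall is measured against the whole hold-out set.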
I am also concerned about this issue. The web page clearly states:
For both Phase 1 and Phase 2 the participants have to submit 215,307 matching pairs
It is clear to me that we can submit more or fewer than that number for validation, but in the final submission all participants have to submit 215,307 pairs. Is that right?
Posted by: namkhanhtran @ Oct. 2, 2016, 8:37 a.m.
Let me explain exactly how it works. There is a large file with 215,307 ground-truth labels. We partition it into two parts of 50% each (one part gets the extra line, since 215,307 is odd). The first part is used for Phase 1 and the second for Phase 2. When you submit a file with your predictions (it can have any number of lines, but we tell you that there are 215,307 known labels in total and you can use this information as you wish), we compute precision, recall and F1 against the corresponding test1/test2 collection and update the leaderboard.
Posted by: spirinus @ Oct. 2, 2016, 3:17 p.m.
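The split described above can be sketched as follows. This is only an illustration: the thread does not say how the organizers chose which labels went into each half. The seeded shuffle, the `split_in_half` name and the integer stand-ins for label pairs are all assumptions. What the sketch does show is the arithmetic. With 215,307 labels, the two halves hold 107,654 and 107,653 lines.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_in_half(labels: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    # Hypothetical: shuffle with a fixed seed so the split is reproducible.
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)
    # With an odd count, the first half (test1) gets the extra line.
    mid = (len(shuffled) + 1) // 2
    return shuffled[:mid], shuffled[mid:]


labels = list(range(215_307))  # stand-ins for the 215,307 ground-truth pairs
test1, test2 = split_in_half(labels)
print(len(test1), len(test2))  # 107654 107653
```

Each phase's leaderboard would then score a submission against only its own half (test1 in Phase 1, test2 in Phase 2). The two halves are disjoint, so the Phase 2 result comes from labels that were never used in Phase 1.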