We think there may be a bug in the online evaluation script.
On the validation set, the online result is higher than our local result. We found that if we do not remove the duplicate candidates of a cast in the local `eval.py` script, our result matches the online one. We therefore suspect that the online script does not remove duplicate candidates; a small sketch of the deduplication step we apply locally is below.
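For reference, here is a minimal sketch of what we mean by removing duplicates before scoring. The function names and the naive AP computation are our own illustration, not the actual `eval.py`; it only shows how counting repeated candidate IDs as separate hits can inflate the score.

```python
# Hypothetical sketch -- not the actual eval.py. It illustrates why duplicate
# candidates in a ranked list inflate a naively computed average precision.
from collections import OrderedDict


def dedup_candidates(ranked_candidates):
    """Drop duplicate candidate IDs, keeping the first (highest-ranked) occurrence."""
    return list(OrderedDict.fromkeys(ranked_candidates))


def naive_average_precision(ranked_candidates, positives):
    """Naive AP that counts every occurrence of a relevant ID as a hit.
    With duplicates in the list this overstates (and can even exceed) the true AP."""
    hits, precision_sum = 0, 0.0
    for rank, cand in enumerate(ranked_candidates, start=1):
        if cand in positives:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / max(len(positives), 1)


# Toy example for one cast: the repeated "c1" is counted twice without dedup.
ranked = ["c1", "c1", "c2", "c3"]
positives = {"c1", "c2"}
print(naive_average_precision(ranked, positives))                     # inflated score
print(naive_average_precision(dedup_candidates(ranked), positives))   # deduplicated score
```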
Thanks very much. We have fixed the bug and rerun all the submissions.
Posted by: wider @ July 18, 2019, 2:20 a.m.