I have a question about reproducibility for the final submission. I built my model with the PyTorch framework and applied all the recommended methods to ensure the results are reproducible. However, I cannot get exactly the same result on different machines. The results are very close to each other (within about ±0.4%). I know that it is also very difficult to reproduce an exact result across different platforms with TensorFlow.
Here is the relevant PyTorch documentation: https://pytorch.org/docs/stable/notes/randomness.html.
"Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seed"
Would you count the score that you obtain on your platform after running the submitted code as the final ranking score?
Posted by: wduong @ Sept. 28, 2021, 2:23 p.m.

Hi, thanks for your question.
In your case, ±0.4% is fairly reasonable from a scientific perspective. We rerun your code to check the reliability of the predictions that participants submit on Codalab. If the result is relatively stable (as in your case), we will respect your result on the leaderboard, so don't worry. In extreme cases where two groups' results/accuracies are too close after we compute the final score (over both tasks), and rerunning their code leads to fluctuations that influence the final ranking, we MAY declare tied winners or similar (this will be discussed if we observe such cases during the evaluation stage of the competition).
Xiaoxi