Our online and offline eval metrics differ substantially, even though they should match.
We need the example submission to check whether we used the wrong test/val image lists.
Oh, I found out why our local evaluation differs so much from the online evaluation ... the data lists are different.
I was using the data lists from SODA-2d, with the split generated by SSLAD_Track_3-master (the official code). However, I should actually be using the data split from the Track 3B website (https://competitions.codalab.org/competitions/33993).
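To confirm the mismatch (and to catch it earlier next time), a quick sanity check is to diff the two image lists directly. This is a minimal sketch, assuming each split is a plain text file with one image identifier per line; the file paths in the usage comment are placeholders, not the actual repo paths.

```python
def load_ids(path):
    """Read one image identifier per line, skipping blank lines."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def compare_splits(local_path, official_path):
    """Compare two image lists.

    Returns (ids only in the local split, ids only in the official
    split, size of the overlap). Any non-empty difference means the
    local and online evaluations are scoring different images.
    """
    local = load_ids(local_path)
    official = load_ids(official_path)
    return sorted(local - official), sorted(official - local), len(local & official)


# Example usage (placeholder paths):
# only_local, only_official, overlap = compare_splits(
#     "local_val_list.txt",      # split from SSLAD_Track_3-master
#     "official_val_list.txt",   # split from the Track 3B website
# )
```

If `only_local` or `only_official` is non-empty, the two splits disagree and the offline metrics cannot be compared to the leaderboard.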