ETCI 2021 Competition on Flood Detection Forum


> evaluation procedure

Hi, we are wondering how the models are actually evaluated and how the large number of border tiles is handled in this context. In particular, we (like probably some of the other participants) noticed the following effects:

- changing the axes of the submission file from (#files, w, h) to (#files, h, w) does not affect the IoU score returned by the system
- changing the order of the #files in the submission file does not affect the IoU score returned by the system
- submitting an all-zero array returns the currently leading IoU score (0.7010)

Is this behaviour intended?
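To make the first two observations concrete: one possible (purely hypothetical) explanation would be a scorer that reduces each submission to per-class pixel counts, since any such statistic is invariant under transposing the spatial axes or reordering the files. A small sketch with made-up data:

```python
import numpy as np

# Hypothetical illustration, not the actual scorer: if a metric is computed
# only from per-class pixel counts, it cannot distinguish a submission from
# its transposed or reordered variants.
rng = np.random.default_rng(0)
pred = rng.integers(0, 2, size=(4, 8, 16))   # (#files, h, w), values in {0, 1}

transposed = pred.transpose(0, 2, 1)         # (#files, w, h)
reordered = pred[::-1]                       # files in reverse order

# Per-class counts are identical for all three variants...
for p in (transposed, reordered):
    assert np.bincount(p.ravel(), minlength=2).tolist() == \
           np.bincount(pred.ravel(), minlength=2).tolist()

# ...so any count-based score would return the same value for each of them.
```

A pixel-wise IoU against a fixed ground truth would normally change under both operations, which is why the invariance seems suspicious to us.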

Posted by: mwlan @ April 27, 2021, 8:41 a.m.

Plotting the predictions of some public submissions suggests that the evaluation procedure is strongly biased by the background class, and that the IoU score returned by the system does not adequately account for the positive class. Some imbalance is to be expected for a task that is imbalanced by definition. However, since the background class here appears to include not only valid no-water pixels but also a large number of no-data (border) pixels, the imbalance is amplified further. We can cope with this imbalance during training and validation, but the testing procedure should account for it as well; otherwise predicting all background yields an unduly high IoU (as is currently the case).
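A quick sketch of what we mean, with assumed (not official) numbers: suppose a tile is 50% valid no-water, 20% no-data border, and 30% flood. If no-data is folded into background, an all-zero prediction already gets a background IoU of 0.70 without segmenting anything:

```python
import numpy as np

# Illustrative numbers only: 30% flood, 20% no-data border, 50% valid no-water.
n = 1000
gt = np.zeros(n, dtype=int)      # 0 = background (incl. no-data), 1 = flood
gt[:300] = 1                     # 30% flood pixels
valid = np.ones(n, dtype=bool)
valid[300:500] = False           # 20% no-data border pixels

pred = np.zeros_like(gt)         # all-zero submission

def iou(pred, gt, c, mask=None):
    # Per-class IoU; optionally restricted to valid (non-border) pixels.
    if mask is not None:
        pred, gt = pred[mask], gt[mask]
    inter = np.sum((pred == c) & (gt == c))
    union = np.sum((pred == c) | (gt == c))
    return inter / union

print(iou(pred, gt, 0))          # 0.70: no-data counted as background
print(iou(pred, gt, 0, valid))   # 0.625: no-data excluded from scoring
```

Masking out the border pixels already lowers the free background score, and scoring the positive class (or at least reporting it) would remove the incentive to submit all zeros entirely.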

Below are plots of a few predictions with their respective submission IoUs:

We are well aware that this is only a partial picture. The plots merely underline the statement above: at present, a high IoU does not necessarily represent a good segmentation of the positive class, so some rethinking of the evaluation procedure, or at least more details about it, may be required. Please feel free to correct us if we are wrong - happy to discuss this further.

Posted by: mwlan @ April 28, 2021, 7:36 a.m.

I am also wondering why the all-zero submission can reach an IoU of ~0.7 (which matches the current top-3 results). The task's [evaluation] page states that label 1 (flooded) is our RoI, but the score seems to be the label-0 IoU (if the evaluation procedure is correct). So could you please give the detailed evaluation criteria: is this the positive-class IoU, the mean IoU, or the label-0 / label-1 IoU? And are the all-white / all-black regions in the vh & vv images included in the final result? Feel free to let me know if I misunderstood something above. Waiting for the reply~
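To show why I suspect it is the label-0 IoU, here is a toy calculation with assumed numbers (~70% background ground truth, not the real test data): for an all-zero prediction, only the label-0 variant lands near the reported ~0.7.

```python
import numpy as np

# Assumed toy data: 70 background pixels, 30 flooded pixels, all-zero prediction.
gt = np.concatenate([np.zeros(70, dtype=int), np.ones(30, dtype=int)])
pred = np.zeros_like(gt)

def cls_iou(pred, gt, c):
    # Standard per-class IoU: intersection over union for class c.
    inter = np.sum((pred == c) & (gt == c))
    union = np.sum((pred == c) | (gt == c))
    return inter / union

iou0 = cls_iou(pred, gt, 0)      # 0.70 -> close to the leaderboard score
iou1 = cls_iou(pred, gt, 1)      # 0.00 -> positive-class (flood) IoU
mean_iou = (iou0 + iou1) / 2     # 0.35 -> mean IoU over both classes
```

Only iou0 is consistent with an all-zero submission scoring ~0.7, which is why I ask which variant the system actually computes.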

Posted by: jzsherlock @ April 29, 2021, 8:34 a.m.

Have any more detailed explanations or examples concerning the points above been posted for this challenge?

Posted by: jzsherlock @ May 10, 2021, 12:42 p.m.

I am also wondering what the organizers have to say about this, and whether any updates or explanations of the evaluation procedure are planned for the upcoming phase 2. Based on the observations above, we get the impression that the ranking of submissions is rather meaningless in its current state. Any input on this topic would be highly appreciated.

Posted by: mwlan @ May 11, 2021, 7:46 p.m.

Just a brief update: in phase 2 as well, submitting an all-zero array achieves the top rank, with an IoU of 0.7685.

Posted by: mwlan @ May 17, 2021, 11:35 a.m.

Hi mwlan,

We are aware of this possible hack and we will make an announcement about it soon. Thanks for bringing it up!

Posted by: Shubhankar @ May 22, 2021, 12:04 p.m.