CodaLab -

> A question about evaluation metrics

As mentioned on the Evaluation page of this contest, the competition metrics are MRAE, SID and MOS. The former two are distortion measures, while the third is perception-based. As pointed out in "The perception and distortion tradeoff", higher accuracy w.r.t. the ground truth typically sacrifices perceptual quality.
Moreover, even among the distortion measures, our experiments show that a trade-off has to be made between MRAE and SID.
Extreme optimization for MRAE increases the SID error and vice versa, but there is no indication of how we should weight each loss.
So I was wondering whether we need a single overall metric for this contest to eliminate this ambiguity.
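
To make the trade-off concrete, here is a minimal NumPy sketch of the two distortion measures and a weighted combination of them. It uses the textbook definitions (relative absolute error, and symmetric KL divergence between normalised spectra), which may differ in detail from the challenge's official evaluation script. The `EPS` guard, the `(..., bands)` array layout and the `alpha` weight are assumptions made for illustration.

```python
import numpy as np

EPS = 1e-8  # assumed guard against division by zero and log(0)


def mrae(gt, pred):
    """Mean relative absolute error over all pixels and bands."""
    return float(np.mean(np.abs(gt - pred) / (gt + EPS)))


def sid(gt, pred):
    """Spectral information divergence, averaged over pixels.

    Assumes non-negative arrays of shape (..., bands). Each spectrum is
    normalised to sum to 1, then the symmetric KL divergence is taken.
    """
    p = gt / (gt.sum(axis=-1, keepdims=True) + EPS) + EPS
    q = pred / (pred.sum(axis=-1, keepdims=True) + EPS) + EPS
    per_pixel = np.sum(p * np.log(p / q) + q * np.log(q / p), axis=-1)
    return float(np.mean(per_pixel))


def combined_loss(gt, pred, alpha=0.5):
    """Hypothetical weighted loss; alpha is the weight in question."""
    return alpha * mrae(gt, pred) + (1.0 - alpha) * sid(gt, pred)


# A uniformly 10% too bright reconstruction: the spectral shape is unchanged.
gt = np.ones((2, 2, 3))
pred = 1.1 * gt
print(mrae(gt, pred), sid(gt, pred))
```

With these definitions a pure brightness error is penalised by MRAE but leaves SID near zero, which is one way the two measures can pull an optimizer in different directions.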

Posted by: Vacat @ Aug. 5, 2018, 10:23 a.m.

The final ranking will be based on a fidelity/accuracy measure.
It will be based on MRAE; in the event of a tie we'll use SID, then PSNR, then MSE, in this order, to establish the ranking.

There will also be a MOS ranking, but since it is a subjective/perceptual measure, we have no easy metric to give participants to optimize for.

Posted by: mehrdad.shoeiby @ Aug. 6, 2018, 1:58 a.m.

Thank you for the explanation. But since this is a regression problem, such a tie could not happen unless two submissions are identical.
Consider the following situation:
submission #1: MRAE: 0.1202, SID: 60; submission #2: MRAE: 0.1201, SID: 63.
If we follow the ranking policy, we would say that #2 is better than #1.
But since the MRAE of these two submissions is nearly equal, it seems fairer to declare a draw (w.r.t. MRAE) and then compare the SID, so that #1 surpasses #2.
So I was wondering whether a rounding strategy should be introduced into the ranking policy, e.g. rounding MRAE to 3 decimal places, to address this issue.
Thank you again for your attention.

Posted by: Vacat @ Aug. 6, 2018, 2:57 a.m.

It seems to me too that MRAEs of 0.1202 and 0.1201 are effectively identical. We may round to 2 or 3 decimal places when assessing the results.
We will decide on this when we see the final results, to ensure a fair assessment. For the current results, I think 2 decimal places is reasonable.
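
The ranking policy discussed in this thread (MRAE rounded to a fixed number of decimal places, ties broken by SID, then PSNR, then MSE) could be sketched as follows. This is only an illustration, not the organizers' evaluation code: the `Submission` type and the `rank` function are hypothetical names, and the sketch assumes that only MRAE is rounded and that lower is better for MRAE, SID and MSE while higher is better for PSNR. PSNR and MSE default to 0.0 because the thread gives no values for them.

```python
from typing import List, NamedTuple


class Submission(NamedTuple):
    name: str
    mrae: float
    sid: float
    psnr: float = 0.0  # placeholder, no PSNR values appear in the thread
    mse: float = 0.0   # placeholder, no MSE values appear in the thread


def rank(submissions: List[Submission], decimals: int = 3) -> List[Submission]:
    """Sort best-first by rounded MRAE, then SID, then PSNR, then MSE."""
    def key(s: Submission):
        # PSNR is negated because a higher PSNR should rank earlier.
        return (round(s.mrae, decimals), s.sid, -s.psnr, s.mse)
    return sorted(submissions, key=key)


# The two submissions from the example in this thread.
subs = [
    Submission("#1", mrae=0.1202, sid=60),
    Submission("#2", mrae=0.1201, sid=63),
]

print([s.name for s in rank(subs, decimals=4)])  # MRAE alone decides
print([s.name for s in rank(subs, decimals=3)])  # MRAE ties, SID decides
```

At 4 decimal places #2 ranks first on MRAE alone; at 2 or 3 decimal places both MRAE values round to 0.12, so the lower SID puts #1 first.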

Posted by: mehrdad.shoeiby @ Aug. 6, 2018, 3:55 a.m.


PIRM 2018 Spectral Image Super-Resolution Challenge- Track 1 Forum