> Training from validation data / Using pre-trained models from validation data - not recommended

Dear all,

We have received several questions about using the validation data and models pre-trained on the validation data.
Technically, we cannot prevent anyone from using the validation GT for any purpose, as the data has been publicly available since NTIRE 2019.

However, using the validation data for training defeats the purpose of having a validation set.
Our main goal in hosting competitions and making the dataset public is to benefit the community.

Validation data exists to let people:
1) validate the effectiveness of their methods and modifications on data outside the training set;
2) compare their methods without using the test set.

If the validation data is used for training, we have little or no data left to check how well the developed methods generalize.
If someone chooses to construct their own validation set, comparison with other methods becomes complicated.
Because the training/validation environments differ, it is difficult to analyze the methods against each other scientifically.
Comparing such methods fairly requires additional effort: someone has to retrain all of them in a unified environment.
Without that third-party effort, the community cannot draw concrete conclusions.

We understand that these concerns would not have arisen if we had not released the validation GT.
However, the CodaLab online server has limited capacity and evaluates on only 1/10 of the validation set.
Thus, we decided to release the validation GT so that people can analyze their solutions on their own, not to encourage them to use it for training.
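
As a rough illustration of what analyzing a solution on the released validation GT could look like, here is a minimal sketch that scores restored images against their ground truth without using them for training. The choice of PSNR as the metric, the flattened pixel-list inputs, and the function names `psnr` and `mean_psnr` are assumptions made for this example, not the challenge's official evaluation code.

```python
import math


def psnr(restored, target, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences."""
    if len(restored) != len(target) or not target:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - t) ** 2 for r, t in zip(restored, target)) / len(target)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_psnr(pairs, max_value=255.0):
    """Average PSNR over (restored, ground_truth) pairs from the validation set."""
    scores = [psnr(restored, target, max_value) for restored, target in pairs]
    return sum(scores) / len(scores)
```

Here `pairs` would hold each model output together with its validation GT image, flattened to a sequence of pixel values; the validation images are only read for scoring and never enter the training loop.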

We stand by our statement: for the good of the community, we do not recommend using the validation data for training.
If you have been using a pre-trained model that was trained on the validation data, you can retrain it from scratch using only the training data and then continue development from there.

Best,
Seungjun

Posted by: SeungjunNah @ Jan. 20, 2021, 4:25 a.m.

Great points. Another point is that training with validation data wouldn't matter much in the test phase, since the test GT is not available.

Also, maybe it's a good idea to use the meta info (Extra Data [1] / No Extra Data [0]) to indicate whether the validation set was used for training (like 2), and to ask the top-ranking submissions in the validation phase to confirm that?

Posted by: zhihongp @ Jan. 20, 2021, 8:21 p.m.

Dear zhihongp,

1) Even without the test set GT, using validation data for training is an issue.

One may achieve good performance with a specific method trained on training + validation data.
In such a case, we cannot tell whether the gain comes from the method itself or from the extra data.

What we expect from public competitions is a fair benchmark under a controlled environment, and the knowledge gained by comparing the proposed methods.
When both the method and the data vary, little analysis can be done and we gain little knowledge from the comparison.

2) Thanks for the suggestion.
If we decide to mark the validation data as extra data, we will announce it.

Posted by: SeungjunNah @ Jan. 22, 2021, 3:39 a.m.

NTIRE 2021 Image Deblurring Challenge - Track1. Low Resolution Forum
