First of all, thank you so much for this really great competition. Whether I win or not, I enjoy the whole (learning) process...
One question that has bothered me for quite a while concerns the data allowed for training the model.
In the competition rules it is stated in section 10: "You may use data other than the competition data to develop and test your models and submissions." From this I infer that it is also allowed to use all the auxiliary datasets in the data folder of this challenge to train the model (if you are allowed to use external data to train the model, why not these data...?)
However, the project description states: "Participants are asked to train their models on the training set and submit their results, predicting the labels of the public test set." This suggests that the final model may be trained ONLY on the training set (so no auxiliary datasets and no external datasets).
My question is: is it allowed to use the auxiliary data sets together with the training set to train the final model that makes the predictions for the public and (later) private test sets, or is it absolutely necessary to have ONLY the training set as the sole basis for the predictions?
Thanks in advance for your response!
Posted by: boogie @ Sept. 30, 2020, 12:34 p.m.
Yes, it is allowed to use the auxiliary data and any external dataset (that you have declared) to train your final model.
It is also allowed to use the public test set (and the full public test set that will be released on October 8).
It is not allowed to use the private test set in any way for training your final model or for preliminary analysis. The private test set must be used only for making the final prediction.
(MAFAT Challenge Team)