Hello,
I am a little confused about the datasets and how the evaluation process will work. At the moment I have only worked on Task 1, but I think this also applies to Task 2. Here are my questions:
1. How am I supposed to use the trial and practice datasets? It would be strange to train on part of the practice dataset for the practice evaluation and then evaluate the model on the whole practice dataset (I am referring to the CodaLab evaluation). I thought about training my model on the trial dataset and evaluating it on the practice dataset. Is this correct? I also used the split in the GitHub repository to compare my results with the baseline.
2. To my understanding, the evaluation dataset will be blind, so no gold labels will be given. Can I then use both the trial and practice datasets to train/validate my models?
Thank you,
Andrei
Dear Andrei,
1 - The Trial and Practice leaderboards are there for practising only, to make it easier for contributors to benchmark their own models. Training a model on the trial data and evaluating it on the practice data makes complete sense, and so does the other way round; this is indeed what we had in mind when proposing these leaderboards.
2 - We do our best to make sure the blind test data will have the same distribution and the same semantics as both the trial and practice datasets, so you can train a model by aggregating both trial and practice data and making your own train/dev/test partition (see the sketch below), and, as you suggested, use that model to predict on the blind test data.
3 - I take this opportunity to confirm to everyone that using other data sources is also possible. You may use any complementary data you have access to, or any data augmentation procedure, as long as you detail the augmentation process in your paper.
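For point 2, here is a minimal sketch of aggregating both sets and making a custom train/dev/test partition; the file names, separator, and 80/10/10 ratio are assumptions, not part of the task specification:

```python
# Sketch only: aggregate trial + practice data, then split into train/dev/test
# before predicting on the blind test set. File names are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split

trial = pd.read_csv("trial.csv")        # hypothetical path
practice = pd.read_csv("practice.csv")  # hypothetical path

full = pd.concat([trial, practice], ignore_index=True)

# 80% train, 10% dev, 10% test (an arbitrary but common choice)
train, rest = train_test_split(full, test_size=0.2, random_state=42)
dev, test = train_test_split(rest, test_size=0.5, random_state=42)

train.to_csv("train.csv", index=False)
dev.to_csv("dev.csv", index=False)
test.to_csv("test.csv", index=False)
```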
So I see that my intuition was correct.
Also the fact that we can use data from other sources is great!
Thank you very much for your fast answer!
Posted by: andrei.avram @ March 26, 2020, 2:07 p.m.
You're welcome, thanks for your contribution!
Posted by: dmariko @ March 26, 2020, 2:19 p.m.