VQA Real Image Challenge (Open-Ended) Forum


> Challenge Scores

Will the official Challenge scores be based on a team's best submission score or last submission score?

Posted by: kiblee @ May 25, 2016, 1:22 p.m.

They will be based on best "test-dev" performance.

Posted by: vqateam @ May 25, 2016, 2:57 p.m.

Do you mean best "Challenge test2015" scores? I thought test-dev was for debugging and validation purposes.

Posted by: kiblee @ May 25, 2016, 11:49 p.m.

Out of all the entries you submit to either the Real Challenge test2015 phase or the Real test2015 phase and post to the leaderboard, the entry with the highest accuracy on the test-dev split of the dataset will be counted for the final challenge results.

Posted by: vqateam @ May 26, 2016, 3:18 a.m.

That doesn't make sense... wouldn't that allow for overfitting since we can submit unlimited times to test-dev? Shouldn't the challenge rankings be based on the whole test set?

Posted by: daylen @ May 26, 2016, 6:22 p.m.

It seems like forums on other categories are not being checked, so allow me to copy/paste it here.
In Real images, if you submit to both Test and Challenge Test, then the submission with the highest accuracy on the "test-dev" split will be the final one.
How does it work for abstract scenes, given that there is no test-dev split?

Posted by: MIL @ May 29, 2016, 11:55 a.m.

@daylen -- No, that would not lead to overfitting, because you will be ranked according to your accuracy on the test-challenge split of the dataset. Test-dev will only be used to select which of your entries is ranked.
@MIL -- Please check the forum for abstract open-ended competition.

Posted by: vqateam @ May 29, 2016, 12:26 p.m.
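
For readers following the thread, here is a minimal sketch of the rule as vqateam describes it: test-dev accuracy selects one entry per team, and test-challenge accuracy then ranks those selected entries. The class and function names, the team names, and all accuracy numbers are hypothetical and purely illustrative. This is not the organizers' evaluation code.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical record for one leaderboard submission.
    name: str
    test_dev_acc: float        # accuracy on the test-dev split
    test_challenge_acc: float  # accuracy on the test-challenge split


def select_entry(entries):
    """Pick the team's entry with the highest test-dev accuracy."""
    return max(entries, key=lambda e: e.test_dev_acc)


def rank_teams(team_entries):
    """Rank teams by the test-challenge accuracy of their selected entry."""
    selected = {team: select_entry(es) for team, es in team_entries.items()}
    return sorted(
        selected.items(),
        key=lambda item: item[1].test_challenge_acc,
        reverse=True,
    )


# Made-up numbers, for illustration only.
team_entries = {
    "team_a": [Entry("a1", 58.0, 57.5), Entry("a2", 59.2, 58.1)],
    "team_b": [Entry("b1", 60.1, 57.9), Entry("b2", 59.0, 59.5)],
}

for team, entry in rank_teams(team_entries):
    print(team, entry.name, entry.test_challenge_acc)
```

In this made-up example, team_b's entry b2 has the best test-challenge accuracy overall, but b1 is the one selected because it scores higher on test-dev, so team_b is ranked on b1's test-challenge accuracy. Repeated submissions can therefore tune only the selection step, not the split used for ranking.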