CodaLab

> Possible data leak

Hello,
First, I want to thank the organizers for this challenge. I appreciate all the hard work you've put into collecting and labeling the data. However, while browsing it I found a strange correlation.
In the training and validation data, the TEMPLATE_ID field is correlated with SUBJECT_ID: all of a subject's images are in sequence. As a PoC I've created a Colab notebook (https://colab.research.google.com/drive/1IigvctNRv3Z241tJus4cwGWFV_L-9RUg?usp=sharing) where a simple logistic regression is trained only on the difference between the TEMPLATE_IDs of every subject.
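
For readers who cannot open the notebook, here is a minimal sketch of the idea, not the notebook's actual code. It assumes TEMPLATE_IDs are handed out sequentially, subject by subject, and builds purely synthetic pairs. The helper names (`make_pairs`, `id_gap`, `fit_logistic`, `predict`) and the log-gap feature are hypothetical choices for illustration only.

```python
import math
import random


def make_pairs(n_subjects=50, per_subject=10, n_pairs=2000, seed=0):
    """Synthetic (template_a, template_b, same_subject) pairs.

    Assumption: template IDs are assigned in sequence, so subject k owns
    IDs k*per_subject .. (k+1)*per_subject - 1.
    """
    rng = random.Random(seed)
    n_templates = n_subjects * per_subject
    pairs = []
    for _ in range(n_pairs):
        if rng.random() < 0.5:
            # Draw both templates from one subject's block of IDs.
            start = rng.randrange(n_subjects) * per_subject
            a, b = rng.sample(range(start, start + per_subject), 2)
        else:
            # Draw two templates at random from the whole dataset.
            a, b = rng.sample(range(n_templates), 2)
        same = int(a // per_subject == b // per_subject)
        pairs.append((a, b, same))
    return pairs


def id_gap(a, b):
    """The only feature: log of the absolute TEMPLATE_ID difference."""
    return math.log1p(abs(a - b))


def fit_logistic(xs, ys, lr=0.5, epochs=300):
    """One-feature logistic regression fitted by batch gradient descent."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs)) or 1.0
    zs = [(x - mean) / std for x in xs]  # standardize for stable steps
    w = b = 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for z, y in zip(zs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * z + b)))
            gw += (p - y) * z
            gb += p - y
        w -= lr * gw / len(zs)
        b -= lr * gb / len(zs)
    return w, b, mean, std


def predict(model, x):
    """Return 1 if the model scores the pair as 'same subject'."""
    w, b, mean, std = model
    return int(w * (x - mean) / std + b > 0)


pairs = make_pairs()
split = int(0.8 * len(pairs))
train, held_out = pairs[:split], pairs[split:]
model = fit_logistic([id_gap(a, b) for a, b, _ in train],
                     [y for _, _, y in train])
accuracy = sum(predict(model, id_gap(a, b)) == y
               for a, b, y in held_out) / len(held_out)
print(f"held-out accuracy from TEMPLATE_ID gap alone: {accuracy:.3f}")
```

No image is ever looked at here. If IDs really are sequential per subject, the ID gap alone separates same-subject pairs from different-subject pairs, which is what makes it a leak.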

Is this correlation intended? If not, will you update the dataset, or will participants be filtered by their approach after the competition is over?

P.S. I've submitted predictions from the fitted model, and it is now my latest submission.

Posted by: vuvko @ May 31, 2020, 11:46 a.m.

Hi,

You are right, this was not intended, and unfortunately there is not much we can do about it in the validation phase. However, the test data are different and don't suffer from this issue, so it's definitely worth making an algorithm that doesn't rely on it :-) Nevertheless, thank you for making this observation and sharing it with us.

Tomas

Posted by: tomass @ June 1, 2020, 10:04 a.m.

Thank you for your reply!
I was just concerned because there was no forum thread or information from you regarding this problem.

Posted by: vuvko @ June 1, 2020, 12:27 p.m.


ECCV 2020 ChaLearn Looking at People Fair Face Recognition challenge Forum
