ChaLearn Looking at People 2015 - Track 1: Age Estimation Forum


> Human error level?

Can the organizers provide the human error level for this task? I think it would be interesting to see how the machines compare to that.

Posted by: Raducu @ July 16, 2015, 3:38 p.m.

Dear Raducu,

Thank you for your interesting suggestion. For the workshop we will compute several statistics about the data set and share all of that information with you. However, while the competition is running we prefer not to release it. Just a couple of comments:

- The error metric is somewhat pessimistic: the per-sample error approaches 1 quickly as your prediction moves away from the mean vote, so in my opinion participants are already obtaining very good results at this point, and hopefully they will improve further over the almost two months of competition that remain (a short sketch of the metric follows these comments).

- We analyzed how standard state-of-the-art methods generalize on our dataset when training on real age -> predicting real age, versus training on apparent age -> predicting apparent age, and obtained slightly better results in the second case, showing that it is feasible to design a system with good "apparent age" generalization capabilities on the provided data.
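
To make the first comment concrete, here is a minimal Python sketch of the per-sample error, using the Gaussian-based form 1 - exp(-(x - mu)^2 / (2 * sigma^2)) evaluated against each image's mean vote and deviation; treat it as an illustration, not the official evaluation script:

```python
import math

def per_sample_error(prediction, mean_vote, sigma):
    """Per-image error: 0 at the mean vote, approaching 1 as the
    prediction moves away from it, scaled by the raters' deviation."""
    return 1.0 - math.exp(-((prediction - mean_vote) ** 2) / (2.0 * sigma ** 2))

# How quickly the error grows with the distance from the mean vote,
# measured in units of the per-image standard deviation sigma:
for d in (0.5, 1.0, 2.0, 3.0):
    print(f"{d:.1f} * sigma away -> error {per_sample_error(d, 0.0, 1.0):.2f}")
# 0.5 * sigma away -> error 0.12
# 1.0 * sigma away -> error 0.39
# 2.0 * sigma away -> error 0.86
# 3.0 * sigma away -> error 0.99
```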

best regards

PS: There will be a lot of discussion about the problem, data, results, statistics, and so on at our ICCV 2015 workshop, which we hope most of you can attend :)

Posted by: sergio.escalera.guerrero @ July 18, 2015, 7:21 a.m.

If we assume that the mean distance of the human estimates from the "true age" equals the standard deviation (x - mu = sigma), then the average human-level error would be:

1 - exp(-(x - mu)^2 / (2 * sigma^2)) = 1 - exp(-1/2) ≈ 0.39
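
As a quick sanity check in Python (purely illustrative, plugging in the one-standard-deviation assumption above):

```python
import math

# If a typical human estimate sits one standard deviation away from the
# mean vote (x - mu = sigma), the competition error metric gives:
human_level = 1.0 - math.exp(-0.5)
print(round(human_level, 2))  # -> 0.39
```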

I'm not sure whether this is sound (in particular because the ground truth itself depends on the individual estimates), but the number seems plausible with respect to the leaderboard.

What do you think?

Heikki

Posted by: mahehu @ July 19, 2015, 6:01 p.m.

Yes, I thought about this too, and I think that using the standard deviation provided with the training set gives a good estimate of human judgement. I was actually looking to have this confirmed by the organizers, but if someone else thought of it as well, it is probably a reasonable way to approximate the human error level on this task.

Thank you,
Radu

Posted by: Raducu @ July 19, 2015, 6:22 p.m.

Dear Heikki and Radu,
What you comment is right (if I understood it correctly). As Heikki says, it depends on the individual estimates, so we need to compute that mean over all raters using each image's particular deviation. As you comment, the per-image deviation is a good indicator of the agreement among raters' votes. We will compute that mean human error level and let you know asap.
best regards

Posted by: sergio.escalera.guerrero @ July 19, 2015, 7:41 p.m.

Dear participants,
As suggested by some of you, we estimated the mean human error as the mean of all per-image errors, where each image error is the mean error of all votes with respect to the mean vote of that image (using each image's particular deviation). This is a somewhat "objective" value to compare your results against. The human error level is: 0.34.
Of course there is a bias, since the same votes are used to compute the mean vote that is later used for evaluation; we did not ask for new votes excluded from the mean vote computation (which would provide a more realistic value for the human error), but it gives us an intuition about the complexity of the task.
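
In rough terms the estimate corresponds to the sketch below (the variable names and data layout are illustrative, not the actual evaluation code):

```python
import math

def per_vote_error(vote, mean_vote, sigma):
    # Same Gaussian-based error applied to a single human vote.
    return 1.0 - math.exp(-((vote - mean_vote) ** 2) / (2.0 * sigma ** 2))

def human_error_level(images):
    # `images`: list of dicts with the raw votes plus the per-image
    # mean vote and standard deviation, e.g.
    # {"votes": [24, 27, 30], "mean": 27.0, "sigma": 2.4}.
    image_errors = []
    for img in images:
        errors = [per_vote_error(v, img["mean"], img["sigma"]) for v in img["votes"]]
        image_errors.append(sum(errors) / len(errors))
    # Mean over images of the mean per-vote error of each image.
    return sum(image_errors) / len(image_errors)
```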
best regards

Posted by: sergio.escalera.guerrero @ July 20, 2015, 8:44 a.m.