Secret url:
https://competitions.codalab.org/competitions/15790?secret_key=cf627d89-2ea0-435a-af88-99dc7f371cbd
A computer vision challenge is proposed for undergraduate students, in which the participant must predict the class of a person (adult or minor) based on a picture of his/her face.
“Welcome to Lothlorien, the elven haven of Middle-earth where everything is peaceful. The kingdom holds many secrets which cannot be disclosed to young subjects. Gandalf, thanks to his great deeds, earned the responsibility of guarding young people from accessing the old secrets. Unfortunately there are no ID cards in the venerable kingdom, and Gandalf is not very good at telling adults apart from lying teenagers. During this challenge, a classifier must be trained to help Gandalf decide whether a given elf may access the secrets of the wiser. As for all those who fail: THEY SHALL NOT PASS.”
This challenge addresses the issue of controlling access to resources (websites, drug purchases, violent movies, etc.) based on a person's age. A lot of violent content is accessible on the internet, and 45% of children under 12 are not covered by parental control. We therefore rely on a real-time image of the person to estimate his or her age category. Facial aging effects are mainly correlated with bone movement and growth, skin wrinkles and reduced muscle strength [1]. Since human observation lacks accuracy, we want an automatic algorithm to make this distinction.
A labeled dataset of face images is provided. The dataset was created from the Wiki dataset [2] and preprocessed using a pretrained convolutional neural network, VGG [3]. Your task is to predict whether a given person is over 30 years old (label = 1) or under 30 years old (label = 0).
References:
[1] M. G. Rhodes, 'Age Estimation of Faces: A Review' (2009), Applied Cognitive Psychology, 23, pp. 1–12.
[2] IMDB-WIKI – 500k+ face images with age and gender labels (link)
[3] VGG Face Descriptor (link)
Credits:
This challenge is brought to you by the Cyan team from M2 AIC (2016-2017).
Thanks to the Red team for its valuable testing and remarks.
Cyan team members:
Herilalaina RAKOTOARISON
Anh Khoa NGO HO
Mihaela SOROSTINEAN
Nawel MEDJKOUNE
Laurent CENTINSOY
Ahmed MAZARI (Team coordinator)
Contact: ahmed [dot] mazari [at] outlook [dot] com
Acknowledgements:
This challenge was created using Chalab, the Machine Learning Challenge creator, with the help of Professor Isabelle Guyon (guyon [at] chalearn.org) link
The Chalab wizard is to be found here: link
Logo: link
Submissions will be evaluated using the bac_binary metric (balanced accuracy), described in the Evaluation section below.
How to enter the challenge?
To enter the challenge, go to the “Participate” page. Once logged in, you will have access to everything you need to get started. First, we give you a starting kit and instructions on how to use it. Second, we provide a sample submission that you can use to try a submission on the competition page. Third, we give you access to the data. Note that we provide both raw and preprocessed data; feel free to use either or both.
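As a rough illustration of a baseline on the preprocessed features (the file names, data layout and model choice below are assumptions for the sake of the example, not the actual starting-kit conventions), a first submission could be built along these lines:

```python
# Minimal baseline sketch. File names are hypothetical: the starting kit
# defines the real data format and ingestion code.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Assumed layout: one row per face, VGG features as columns, labels in {0, 1}.
X_train = np.loadtxt("lothlorien_train.data")      # hypothetical path
y_train = np.loadtxt("lothlorien_train.solution")  # hypothetical path
X_valid = np.loadtxt("lothlorien_valid.data")      # hypothetical path

# A simple linear baseline; class_weight="balanced" helps if labels are skewed.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
print("CV accuracy:", cross_val_score(clf, X_train, y_train, cv=5).mean())

clf.fit(X_train, y_train)
np.savetxt("lothlorien_valid.predict", clf.predict(X_valid), fmt="%d")
```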
Evaluation
The goal of classification is to learn a classifier that can distinguish the two classes. There are different ways to place the separator, as shown in the figure below:
Fig. 1: Decision boundaries from various classifiers.
When the classes are imbalanced (i.e. when one label is much more frequent than the other, as in Figure 2), the classical error rate is not an appropriate metric. For instance, if the class frequencies were 99% and 1%, a model always predicting the majority class would make only 1% error.
Fig. 2: In this example, the red class is much less frequent than the blue one.
The points show the distribution of examples in two dimensions: the (X, Y) axes are the coordinates of the examples, and the blue and red colors represent the two classes.
But such a model would still not be a good classifier. To circumvent this issue, the BAC (balanced accuracy) metric, which takes the class imbalance into account, can be used: with this metric, a trivial classifier that always predicts the majority class scores only 0.5, while a perfect classifier scores 1. It is defined as follows:
BAC = (TP/P + TN/N) / 2
where:
TP: number of true positive predictions
TN: number of true negative predictions
P: total number of examples in the positive class
N: total number of examples in the negative class
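As a minimal sketch (not the challenge's actual scoring program), the balanced accuracy defined above could be computed as follows; in recent versions of scikit-learn the same quantity is available as sklearn.metrics.balanced_accuracy_score:

```python
import numpy as np

def bac_binary(y_true, y_pred):
    """Balanced accuracy: (TP/P + TN/N) / 2, as defined above."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    P = np.sum(y_true == 1)                      # positive examples
    N = np.sum(y_true == 0)                      # negative examples
    TP = np.sum((y_pred == 1) & (y_true == 1))   # true positives
    TN = np.sum((y_pred == 0) & (y_true == 0))   # true negatives
    return 0.5 * (TP / P + TN / N)

# A trivial classifier that always predicts the majority class scores 0.5,
# even though its plain accuracy on this toy example is 99%.
y_true = [1] * 99 + [0] * 1
print(bac_binary(y_true, [1] * 100))  # 0.5
```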
Some other techniques for dealing with class imbalance can be considered:
Balancing the training set by oversampling the minority class or undersampling the majority class (see the sketch after this list).
Fig. 3: One way to oversample and undersample the classes.
Find a classifier that adjusts the class weights and decision boundary, or that is otherwise robust to unbalanced classes.
Evaluation metric: plain accuracy cannot be applied to unbalanced data because it is based on a simple count of errors; it does not take into account which classes are being confused, nor how the examples are distributed across classes. For this reason we use the BAC binary metric instead.
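As an illustration of the first two options above, here is a hedged sketch using scikit-learn on toy data (the toy data and the choice of logistic regression are assumptions for illustration, not part of the starting kit):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

# Toy imbalanced data: 95 negatives, 5 positives (illustrative only).
rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y = np.array([0] * 95 + [1] * 5)

# Option 1: oversample the minority class so both classes have the same size.
X_min, y_min = X[y == 1], y[y == 1]
X_min_up, y_min_up = resample(X_min, y_min, replace=True,
                              n_samples=int(np.sum(y == 0)), random_state=0)
X_bal = np.vstack([X[y == 0], X_min_up])
y_bal = np.concatenate([y[y == 0], y_min_up])
clf_oversampled = LogisticRegression().fit(X_bal, y_bal)

# Option 2: keep the data as-is and let the classifier reweight the classes.
clf_weighted = LogisticRegression(class_weight="balanced").fit(X, y)
```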
Submissions must be made before 2017-04-30 23:59. You may make 5 submissions per day and 100 in total.
This competition is organized solely for test purposes. No prizes will be awarded. The authors decline responsibility for mistakes, incompleteness or lack of quality of the information provided on the challenge website. The authors are not responsible for any content linked or referred to from the pages of this site which is external to this site. The authors did not intend to use any copyrighted material or, where that was not possible, intended to indicate the copyright of the respective object. The authors did not intend to violate any patent rights or, where that was not possible, intended to indicate the patents of the respective objects. The payment of royalties or other fees for the use of methods which may be protected by patents remains the responsibility of the users.
ALL INFORMATION, SOFTWARE, DOCUMENTATION, AND DATA ARE PROVIDED "AS-IS". THE ORGANIZERS DISCLAIM ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL ISABELLE GUYON AND/OR OTHER ORGANIZERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF SOFTWARE, DOCUMENTS, MATERIALS, PUBLICATIONS, OR INFORMATION MADE AVAILABLE THROUGH THIS WEBSITE.
Participation in the organized challenge is non-binding and without obligation. Parts of the pages, or the complete publication and information, may be extended, changed, or partly or completely deleted by the authors without notice.
Start: Nov. 11, 2016, 3:21 p.m.
Description: Development phase: create models and submit them, or directly submit results on validation and/or test data; feedback is provided on the validation set only.
Start: April 30, 2017, 11:59 p.m.
Description: Final phase: submissions from the previous phase are automatically cloned and used to compute the final score. The results on the test set will be revealed when the organizers make them available.
Competition ends: Never
# | Username | Score
---|---|---
1 | red | 0.5359 |
2 | nawelmdk | 0.5185 |
3 | guyon | 0.5177 |