Organized by automl.chalearn


Dec. 7, 2015, midnight UTC


Dec. 31, 2050, 11 p.m. UTC


Competition Ends

This is a clone of the AutoML challenge Round 3, set up for practice purposes. Please refer to the main AutoML challenge site for instructions.


Here is a sample result submission for round 3.

Here is a sample code submission (works for all rounds).

More information about the new version of the starting kit is found [here].


This challenge is brought to you by ChaLearn. Contact the organizers.



This challenge is concerned with regression and classification problems (binary, multi-class, or multi-label) from data already formatted in fixed-length feature-vector representations. Each task is associated with a dataset coming from a real application. The domains of application are very diverse and are drawn from: biology and medicine, ecology, energy and sustainability management, image, text, audio, speech, video and other sensor data processing, internet social media management and advertising, market analysis and financial prediction.
All datasets present themselves in the form of data matrices with samples in lines and features (or variables) in columns. For instance, in a medical application, the samples may represent patient records and the features may represent results of laboratory analyses. The goal is to predict a target value, for instance the diagnosis "diseased" or "healthy" in the case of a medical diagnosis problem.
The identity of the datasets and the features is concealed (except in round 0) to avoid the use of domain knowledge and push the participants to design fully automated machine learning solutions.
In addition, the tasks are constrained by:

  • A Time Budget.
  • A Scoring Metric.

Task, scoring metric and time budget are provided with the data, in a special "info" file.

Time Budget

The Codalab platform provides computational resources shared by all participants. To ensure the fairness of the evaluation, when a code submission is evaluated, its execution time is limited to a given Time Budget, which varies from dataset to dataset. The time budget is provided with each dataset in its "info" file. The organizers reserve the right to adjust the time budget by supplying the participants with new info files.
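One way a code submission can stay within the Time Budget is sketched below. This is a minimal illustration and not part of the starting kit: `train_within_budget`, `train_step`, `safety_margin`, and `clock` are hypothetical names, and the sketch assumes that the next training step takes about as long as the previous one.

```python
import time


def train_within_budget(train_step, time_budget, safety_margin=0.1,
                        clock=time.monotonic):
    """Call train_step() repeatedly while the time budget allows it.

    train_step is a hypothetical callable that improves the model a little
    and returns the current predictions. time_budget is in the same unit
    as the values returned by clock (seconds for time.monotonic).
    Returns a list of (elapsed, predictions) pairs, one per completed step.
    """
    start = clock()
    # Keep a fraction of the budget in reserve (assumption of this sketch).
    deadline = start + time_budget * (1.0 - safety_margin)
    curve = []
    last_step = 0.0
    while True:
        before = clock()
        # Always run one step; afterwards stop as soon as another step of
        # similar duration would overrun the deadline.
        if curve and before + last_step > deadline:
            break
        predictions = train_step()
        after = clock()
        last_step = after - before
        curve.append((after - start, predictions))
    return curve
```

Each recorded pair is an intermediate result, so the returned list can also serve as the points of a learning curve (performance as a function of training time).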
The participants who submit results (instead of code) are NOT constrained by the Time Budget, since they can run their code on their own platform. This may be advantageous for entries counting towards the Final phases (immediately following a Tweakathon). The participants wishing to also enter the AutoML phases, which require submitting code, can submit BOTH results and code (simultaneously). See the Instructions for details.

Scoring Metrics

The scoring program computes a score by comparing submitted predictions with reference "target values". For each sample i, i=1:P, the target value is:

  • a continuous numeric coefficient yi, for regression problems;
  • a vector of binary indicators [yik] in {0, 1}, for multi-class or multi-label classification problems (one per class k);
  • a single binary indicator yi in {0, 1}, for binary classification problems.

The participants must turn in prediction values matching as closely as possible the target value, in the form of:

  • a continuous numeric coefficient qi for regression problems;
  • a vector of numeric coefficients [qik] in the range [0, 1] for multi-class or multi-label classification problems (one per class k);
  • a single numeric coefficient qi in the range [0, 1] for binary classification problems.
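The target and prediction formats listed above can be illustrated with a small sketch. The helper names `one_hot` and `is_valid_prediction` are hypothetical, and the zero-based class index is an assumption of this example, not something the challenge specifies.

```python
def one_hot(label, num_classes):
    """Indicator vector [yik] for a class index in 0..num_classes-1."""
    return [1 if k == label else 0 for k in range(num_classes)]


def is_valid_prediction(q, num_classes):
    """True if q has one coefficient per class, each within [0, 1]."""
    return len(q) == num_classes and all(0.0 <= v <= 1.0 for v in q)


# A 3-class target and a matching vector of prediction coefficients.
target = one_hot(2, 3)
prediction = [0.1, 0.2, 0.7]
print(target, is_valid_prediction(prediction, 3))
```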

The Starting Kit contains the Python implementation of all scoring metrics used to evaluate the entries. Each dataset has its own metric (scoring criterion), specified in its "info" file. All scores are re-normalized such that the expected value of the score for a "trivial guess" based on class prior probabilities is 0 and the optimal score is 1. Multi-label problems are treated as multiple binary classification problems and are evaluated by the average of the scores of each binary classification sub-problem.
The scores are taken from the following list:

  • R2: R-square or "coefficient of determination" used for regression problems: R2 = 1-MSE/VAR, where MSE = < (yi - qi)^2 > is the mean-square-error and VAR = < (yi - m)^2 > is the variance, with m = < yi >.
  • ABS: A coefficient similar to the R2 but based on mean absolute error (MAE) and mean absolute deviation (MAD): ABS =  1-MAE/MAD, with MAE=< abs(yi - qi) > and MAD=< abs(yi - m) >.
  • BAC: Balanced accuracy, which is the average of class-wise accuracy for classification problems (or the average of sensitivity (true positive rate) and specificity (true negative rate) for the special case of binary classification). For binary classification problems, the class-wise accuracy is the fraction of correct class predictions when qi is thresholded at 0.5, for each class. The class-wise accuracy is averaged over all classes for multi-label problems. For multi-class classification problems, the predictions are binarized by selecting the class with maximum prediction value, argmax_k qik, before computing the class-wise accuracy. We normalize the BAC with the formula BAC := (BAC-R)/(1-R), where R is the expected value of BAC for random predictions (i.e. R=0.5 for binary classification and R=(1/C) for C-class classification problems).
  • AUC: Area under the ROC curve, used for ranking and for binary classification problems. The ROC curve is the curve of sensitivity vs. 1-specificity, when a threshold is varied on the predictions. The AUC is identical to the BAC for binary predictions. The AUC is calculated for each class separately before averaging over all classes. We normalize it with the formula: AUC := 2AUC-1, making it de-facto identical to the so-called Gini index.
  • F1 score: The harmonic mean of precision and recall. Precision = positive predictive value = true_positive/all_called_positive. Recall = sensitivity = true positive rate = true_positive/all_real_positive. Prediction thresholding and class averaging are handled in the same way as for the BAC. We also normalize F1 with F1 := (F1-R)/(1-R), where R is the expected value of F1 for random predictions (i.e. R=0.5 for binary classification and R=(1/C) for C-class classification problems).
  • PAC: Probabilistic accuracy PAC = exp(- CE) based on the cross-entropy or log loss, CE = - < sum_k yik log(qik) > for multi-class classification and CE = - < yi log(qi) + (1-yi) log(1-qi) > for binary classification and multi-label problems. Class averaging is performed after taking the exponential in the multi-label case. We normalize with PAC := (PAC-R)/(1-R), where R is the score obtained using qi = < yi > or qik = < yik > (i.e. using as predictions the fraction of positive class examples as an estimate of the prior probability).
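The formulas for R2, ABS, BAC, and AUC in the list above can be sketched in plain Python as follows. This is an illustration written from the definitions on this page, not the Starting Kit implementation; the function names are hypothetical. The binary metrics assume that both classes are present, that a prediction counts as positive only when it is strictly above 0.5, and that tied prediction values count as half a win in the AUC.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def r2_score(y, q):
    """R2 = 1 - MSE/VAR (assumes the targets are not all equal)."""
    m = mean(y)
    mse = mean((yi - qi) ** 2 for yi, qi in zip(y, q))
    var = mean((yi - m) ** 2 for yi in y)
    return 1.0 - mse / var


def abs_score(y, q):
    """ABS = 1 - MAE/MAD."""
    m = mean(y)
    mae = mean(abs(yi - qi) for yi, qi in zip(y, q))
    mad = mean(abs(yi - m) for yi in y)
    return 1.0 - mae / mad


def bac_binary(y, q, threshold=0.5):
    """Normalized balanced accuracy for binary targets in {0, 1}."""
    pos = sum(1 for yi in y if yi == 1)
    neg = len(y) - pos
    tp = sum(1 for yi, qi in zip(y, q) if yi == 1 and qi > threshold)
    tn = sum(1 for yi, qi in zip(y, q) if yi == 0 and qi <= threshold)
    bac = 0.5 * (tp / pos + tn / neg)  # mean of sensitivity and specificity
    return (bac - 0.5) / (1.0 - 0.5)   # BAC := (BAC-R)/(1-R) with R = 0.5


def auc_binary(y, q):
    """Normalized AUC (2*AUC - 1) computed by comparing all pos/neg pairs."""
    pos = [qi for yi, qi in zip(y, q) if yi == 1]
    neg = [qi for yi, qi in zip(y, q) if yi == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return 2.0 * auc - 1.0


y = [0, 0, 1, 1]
print(bac_binary(y, [0.1, 0.4, 0.35, 0.8]), auc_binary(y, [0.1, 0.4, 0.35, 0.8]))
```

The pairwise AUC is quadratic in the number of samples; it is used here only because it follows the definition directly.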

We note that for R2, ABS, and PAC the normalization uses a "trivial guess" corresponding to the average target value qi =< yi > or qik=< yik >. In contrast, for BAC, AUC, and F1 the "trivial guess" is a random prediction of one of the classes with uniform probability.
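As an illustration of normalizing against the "trivial guess" of the average target value, the sketch below computes the normalized PAC for a binary problem. The function name `pac_binary` is hypothetical, and clipping the predictions away from 0 and 1 (the `eps` parameter) is an assumption added here to keep the logarithms finite; it is not stated on this page.

```python
import math


def pac_binary(y, q, eps=1e-15):
    """Normalized PAC for binary targets in {0, 1} (both classes present)."""

    def clip(p):
        # Keep probabilities strictly inside (0, 1) so log() stays finite.
        return min(max(p, eps), 1.0 - eps)

    def pac(predictions):
        ce = -sum(
            yi * math.log(clip(p)) + (1 - yi) * math.log(1.0 - clip(p))
            for yi, p in zip(y, predictions)
        ) / len(y)
        return math.exp(-ce)

    prior = sum(y) / len(y)            # trivial guess: qi = < yi >
    baseline = pac([prior] * len(y))   # R in PAC := (PAC-R)/(1-R)
    return (pac(q) - baseline) / (1.0 - baseline)


y = [0, 1, 1, 0]
print(pac_binary(y, [0.2, 0.9, 0.7, 0.1]))
```

Predicting the class prior for every sample gives a score of 0 by construction, and predictions that are worse than the prior give a negative score.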
In all formulas the brackets < . > designate the average over all P samples indexed by i: < yi > = (1/P) sum_i yi. Only R2 and ABS make sense for regression; we compute the other scores for completeness by replacing the target values by binary values after thresholding them in the mid-range.

Leaderboard score calculation

Each round includes five datasets from different application domains, spanning various levels of difficulty. The participants (or their submitted programs) provide prediction results for the withheld target values (called "solution"), for all 5 datasets. Independently of any intervention of the participants, the original version of the scoring program supplied by the organizers is run on the server to compute the scores. For each dataset, the participants are ranked in decreasing order of performance for the prescribed scoring metric associated with the given task. The overall score is computed by averaging the ranks over all 5 datasets and shown in the column <rank> on the leaderboard.
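The rank averaging described above can be sketched as follows. This is an illustration only: the function name `average_ranks` and the input layout are hypothetical, every participant is assumed to have a score on every dataset, and giving tied participants the average of the positions they occupy is an assumption, since this page does not say how ties within a dataset are ranked.

```python
def average_ranks(scores):
    """Average per-dataset ranks; rank 1 is the best (highest) score.

    scores maps dataset name -> {participant: score}.
    Returns {participant: mean rank over all datasets}.
    """
    ranks = {}
    for dataset_scores in scores.values():
        ordered = sorted(dataset_scores.items(),
                         key=lambda item: item[1], reverse=True)
        i = 0
        while i < len(ordered):
            # Find the block of participants tied with position i.
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            tied_rank = ((i + 1) + (j + 1)) / 2  # mean of tied positions
            for name, _ in ordered[i:j + 1]:
                ranks.setdefault(name, []).append(tied_rank)
            i = j + 1
    return {name: sum(r) / len(r) for name, r in ranks.items()}


scores = {
    "dataset1": {"a": 0.9, "b": 0.8, "c": 0.7},
    "dataset2": {"a": 0.5, "b": 0.6, "c": 0.6},
}
print(average_ranks(scores))
```

A smaller average rank is better, which is why the leaderboard rewards the smallest <rank>.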

We ask the participants to test their systems regularly while training to produce intermediate prediction results, which will allow us to make learning curves (performance as a function of training time). Using such learning curves, we will adjust the "time budget" in subsequent rounds (eventually giving you more computational time!). But only the last point (corresponding to the file with the largest order number) is used for leaderboard calculations.

The results of the LAST submission made are used to compute the leaderboard results (so you must re-submit an older entry that you prefer if you want it to count as your final entry). This is what is meant by “Leaderboard modifying disallowed”. In phases marked with a [+], the participants with the three smallest <rank> are eligible for prizes, if they meet the Terms and Conditions.

Training, validation and test sets

For each dataset, a labeled training set is provided for training and two unlabeled sets (validation set and test set) are provided for testing.

Phases and rounds

The challenge is run in multiple Phases grouped in rounds, alternating AutoML contests and Tweakathons. There are six rounds: Round 0 (Preparation round), followed by 5 rounds of progressive difficulty (Novice, Intermediate, Advanced, Expert, and Master). Except for round 0 (preparation) and round 5 (termination), all rounds include 3 phases, alternating Tweakathons and AutoML contests:

Phase in round [n] | Goal | Duration | Submissions | Data | Leaderboard scores | Prizes
[+] AutoML[n] | Blind test of code | Short | NONE (code migrated) | New datasets, not downloadable | Test set results | Yes
Tweakathon[n] | Manual tweaking | 1 month | Code and/or results | Datasets downloadable | Validation set results | No
[+] Final[n] | Results of Tweakathon revealed | Short | NONE (results migrated) | NA | Test set results | Yes

The results of the last submission made are shown on the leaderboard. Submissions are made in Tweakathon phases only. The last submission of one phase migrates automatically to the next one. If code is submitted, this makes it possible to participate in subsequent phases without making new submissions. Prizes are attributed for phases marked with a [+], during which there is NO submission. The total prize pool is $30,000 (see Rewards and Terms and Conditions for details).

Code vs. result submission

To participate in the AutoML[n] phase, code must be submitted in Tweakathon[n-1]. To participate in Final[n], code or results must be submitted in Tweakathon[n]. If both code and (well-formatted) results are submitted in Tweakathon[n], the results are used for scoring in Tweakathon[n] and Final[n] rather than re-running the code. The code is executed when results are unavailable or not well formatted. Hence there is no disadvantage to submitting both results and code. There is no obligation to submit the code that produced the submitted results. Using mixed submissions of results and code, different methods can be used to enter the Tweakathon/Final phases and to enter the AutoML phases.


There are 5 datasets in each round spanning a range of difficulties:

  • Different tasks: regression, binary classification, multi-class classification, multi-label classification.
  • Class balance: Balanced or unbalanced class proportions.
  • Sparsity: Full matrices or sparse matrices.
  • Missing values: Presence or absence of missing values.
  • Categorical variables: Presence or absence of categorical variables.
  • Irrelevant variables: Presence or absence of additional irrelevant variables (distractors).
  • Number Ptr of training examples: Small or large number of training examples.
  • Number N of variables/features: Small or large number of variables.
  • Aspect ratio Ptr/N of the training data matrix: Ptr>>N, Ptr~=N or Ptr<<N.

Some datasets may be recycled from previous challenges, but reformatted into new representations, except for the final MASTER round, which includes only completely new data. We will progressively introduce difficulties from round to round (each round cumulating all the difficulties of the previous ones plus new ones):

  1. NOVICE: Binary classification problems only; no missing data; no categorical variables; moderate number of features (<2000); balanced classes; BUT sparse and full matrices; presence of irrelevant variables; various Ptr/N.
  2. INTERMEDIATE: Multi-class and binary classification problems + additional difficulties including: unbalanced classes; small and large number of classes (several hundred); some missing values; some categorical variables; up to 5000 features.
  3. ADVANCED: All types of classification problems, including multi-label + additional difficulties including: up to 300,000 features.
  4. EXPERT: Classification and regression problems, all difficulties.
  5. MASTER: Classification and regression problems, all difficulties, completely new datasets.




Challenge Rules

  • General Terms: This challenge is governed by the General ChaLearn Contest Rule Terms, the Codalab Terms and Conditions, and the specific rules set forth.
  • Announcements: To receive announcements and be informed of any change in rules, the participants must provide a valid email.
  • Conditions of participation: Participation requires complying with the rules of the challenge. Prize eligibility is restricted by US government export regulations, see the General ChaLearn Contest Rule Terms. The organizers, sponsors, their students, close family members (parents, siblings, spouse or children) and household members, as well as any person having had access to the truth values or to any information about the data or the challenge design giving him (or her) an unfair advantage, are excluded from participation. A disqualified person may submit one or several entries in the challenge and request to have them evaluated, provided that they notify the organizers of their conflict of interest. If a disqualified person submits an entry, this entry will not be part of the final ranking and does not qualify for prizes. The participants should be aware that ChaLearn and the organizers reserve the right to evaluate for scientific purposes any entry made in the challenge, whether or not it qualifies for prizes.
  • Dissemination: The participants will be invited to attend a workshop organized in conjunction with a major machine learning conference and contribute to the proceedings. The challenge is part of the competition program of the IJCNN 2015 conference.
  • Registration: The participants must register on Codalab and provide a valid email address. Teams must register only once and provide a group email, which is forwarded to all team members. Teams or solo participants registering multiple times to gain an advantage in the competition may be disqualified.
  • Anonymity: The participants who do not present their results at the workshop can elect to remain anonymous by using a pseudonym. Their results will be published on the leaderboard under that pseudonym, and their real name will remain confidential. However, the participants must disclose their real identity to the organizers to claim any prize they might win. See our privacy policy for details.
  • Submission method: The results must be submitted through this CodaLab competition site. The participants can make up to 5 submissions per day in the Tweakathon phases. Using multiple accounts to increase the number of submissions is NOT permitted. There are NO submissions in the Final and AutoML phases (the submissions from the previous Tweakathon phase migrate automatically). In case of problems, contact the organizers. The entries must be formatted as specified on the Evaluation page.
  • Awards: The three top ranking participants of each Final or AutoML phase may qualify for awards (cash prize, travel award, and award certificate). To compete for awards, the participants must fill out a fact sheet briefly describing their methods. There is no other publication requirement. If they accept their prize, the winners will be required to make their code publicly available under an OSI-approved license (for instance Apache 2.0, MIT, or a BSD-like license) within a week of the deadline for submitting the final results. The winners of each of the 10 prize-winning phases (indicated by a [+] in the Phases page) will be determined according to leaderboard ranking (see the Evaluation page). In AutoML phases, entries exceeding the total time budget of the 5 tasks will not qualify for prizes. In case of a tie, the prize will go to the participant who submitted his/her entry first. Non-winners or entrants who decline their prize retain all their rights on their entries and are not obliged to publicly release their code.
  • Travel awards: The travel awards may be used to attend a workshop organized in conjunction with the challenge. The award money will be granted in reimbursement of expenses including airfare, ground transportation, hotel, or workshop registration. Reimbursement is conditioned on (i) attending the workshop, (ii) making an oral presentation of the methods used in the challenge, and (iii) presenting original receipts and boarding passes. The reimbursements will be made after the workshop.




Start: Dec. 7, 2015, midnight

Description: For practice only. You may submit code capable of producing predictions on both VALIDATION AND TEST DATA or results. The leaderboard shows scores on validation data only. The final test results will be shown during phase Final 3.


Start: Dec. 31, 2050, 11 p.m.

Description: Results on test data of phase 3. There is NO NEW SUBMISSION. The results on test data of the last submission are shown.

Competition Ends


# Username Score
1 aad_freiburg 2.8000
2 verif 2.8000
3 3.2000