Petersburg hackathon :: AutoML Round 1 ++

Organized by lukasz.romaszko


Tweakathon 1
July 31, 2015, 1 a.m. UTC


Final 1
Aug. 4, 2015, 3 p.m. UTC


AutoML 1.5
Aug. 5, 2020, 9 a.m. UTC


This is a clone of the AutoML challenge presently running specifically made for the hackathon of the MS machine learning summer school of Saint Petersburg, 2015. This clone differs from the regular challenge in that:

  • It is limited to data from round 1 of the AutoML challenge;
  • It is limited to 2 phases: Tweakathon 1 and Final 1 (no AutoML phase);
  • It starts on July 31, 2015 at 14h and ends on August 4 at 18h (Petersburg time; submissions are then migrated automatically to the Final phase for final ranking);
  • It is limited to entering code submissions (no result submission);
  • It has a time budget limited to 10 minutes for a submission (for all 5 datasets) and is NOT governed by the time specified in the "info" file;
  • It has a limit of 10 submissions per day;
  • It has special prizes (you are not entitled to the prizes of the AutoML challenge);
  • It has a special "coopetition" method of ranking people to encourage them to collaborate: the number of downloads of groups sharing their code will be taken into account in the scoring;
  • It is subject to the Terms and Conditions shown on this website.

To get started:

Important advice:

  • When you submit an entry, always write a description that emphasizes the strengths of your submission, to encourage others to download it. Your submission will then appear in the PUBLIC SUBMISSION panel.
  • Make forum posts to advertise your code, comment on anything, and ask questions.
  • Download other people's code and rate it (with likes).


This challenge is brought to you by ChaLearn. Contact the organizers.



This AutoML challenge is concerned with regression and classification problems (binary, multi-class, or multi-label) BUT this hackathon is limited to binary classification problems. Data are already formatted in fixed-length feature-vector representations. Each task is associated with a dataset coming from a real application. The domains of application are very diverse and are drawn from: biology and medicine, ecology, energy and sustainability management, image, text, audio, speech, video and other sensor data processing, internet social media management and advertising, market analysis and financial prediction.
All datasets present themselves in the form of data matrices with samples in lines and features (or variables) in columns. For instance, in a medical application, the samples may represent patient records and the features may represent results of laboratory analyses. The goal is to predict a target value, for instance the diagnosis "diseased" or "healthy" in the case of a medical diagnosis problem.
The identity of the datasets and the features is concealed (except in round 0) to avoid the use of domain knowledge and push the participants to design fully automated machine learning solutions.
In addition, the tasks are constrained by:

  • A Time Budget.
  • A Scoring Metric.

Task, scoring metric and time budget are provided with the data, in a special "info" file. HOWEVER, for the hackathon the time budget is hard coded into the code and limited to a total of 10 minutes.

Scoring Metrics

The scoring program computes a score by comparing submitted predictions with reference "target values". See this paper for details.

Leaderboard score calculation

The participants (or their submitted programs) provide prediction results for the withheld target values (called "solution") for all 5 datasets of round 1. The results of scoring are shown on the leaderboard. We describe the columns:

  • <Rank>: This is the average of the numbers in parentheses in columns "AVG" and "Coopetition". This is the column used for the final evaluation.
  • AVG: This is the average of the scores shown in columns Set1 through Set5.
  • Set i: These are the columns with the scores for each dataset.
  • Duration: Execution time of the code on the 5 datasets.
  • sub. no: Number of submissions made by the participant.
  • Coopetition: Total number of downloads of code by other participants.
  • Diff: Difference in average performance to the best previous submission.
  • Detailed Results: Scores with respect to all metrics.

For each column, the rank of the participant obtained by sorting is shown in parentheses.

The results of the LAST submission made are used to compute the leaderboard results (so you must re-submit an older entry that you prefer if you want it to count as your final entry).

To make your submission public, you need to click the "Make your submission public" button. You should make your submissions public to allow other teams to download them and earn coopetition points (number of downloads). Write code descriptions and make forum posts to inform others about the value of your contribution.

Training, validation and test sets

For each dataset, a labeled training set is provided for training and two unlabeled sets (validation set and test set) are provided for testing.

Submissions are made in Tweakathon phases only. The last submission of that phase migrates automatically to the Final phase.


There are 5 datasets in each round spanning a range of difficulties:

  • Different tasks: regression, binary classification, multi-class classification, multi-label classification.
  • Class balance: Balanced or unbalanced class proportions.
  • Sparsity: Full matrices or sparse matrices.
  • Missing values: Presence or absence of missing values.
  • Categorical variables: Presence or absence of categorical variables.
  • Irrelevant variables: Presence or absence of additional irrelevant variables (distractors).
  • Number Ptr of training examples: Small or large number of training examples.
  • Number N of variables/features: Small or large number of variables.
  • Aspect ratio Ptr/N of the training data matrix: Ptr>>N, Ptr~=N or Ptr<<N.

For Round 1: Binary classification problems only; no missing data; no categorical variables; moderate number of features (<2000); balanced classes; BUT sparse and full matrices; presence of irrelevant variables; various Ptr/N.



Challenge Rules

  • General Terms: This challenge is governed by the General ChaLearn Contest Rule Terms, the Codalab Terms and Conditions, and the specific rules set forth.
  • Announcements: To receive announcements and be informed of any change in rules, the participants must provide a valid email.
  • Conditions of participation: Participation requires complying with the rules of the challenge. Prize eligibility is restricted by US government export regulations, see the General ChaLearn Contest Rule Terms. The organizers, sponsors, their students, close family members (parents, sibling, spouse or children) and household members, as well as any person having had access to the truth values or to any information about the data or the challenge design giving him (or her) an unfair advantage, are excluded from participation. A disqualified person may submit one or several entries in the challenge and request to have them evaluated, provided that they notify the organizers of their conflict of interest. If a disqualified person submits an entry, this entry will not be part of the final ranking and does not qualify for prizes. The participants should be aware that ChaLearn and the organizers reserve the right to evaluate for scientific purposes any entry made in the challenge, whether or not it qualifies for prizes.
  • Registration: The participants are assigned to pre-registered groups. Ask for your group name and password.
  • Submission method: The participants must submit code through this web site. The participants can make up to 10 submissions of 10 minutes per day in the Tweakathon phase. There are NO submissions in the Final phase (the submissions from the Tweakathon phase migrate automatically). In case of problems, contact the organizers. The entries must be formatted as specified on the Evaluation page.
  • Awards: The participants are not entitled to the prizes of the AutoML challenge. Award certificates and possibly small incentive prizes will be distributed to the winners (Azure account credits, goody bags). There will be award certificates for the three top-ranking participants according to the <Rank> in Final 1 and AutoML 1.5, and one prize for the best presentation. In case of ties, the ties will be broken using the best average score (AVG column).





The datasets are downloadable from the Dataset page. Use the round 1 datasets.

Code submission

The participants must submit a zip file with their code (no result submission) via the Submission page. Do not use the general starting kit of the AutoML challenge. Use the hackathon sample code.
Although the sample submission can be used to submit results, code, or both, in this hackathon you will always submit code, using the Python language.

Always add a code description to your submission.

Input format and computational restrictions

The input format is specified on the Dataset page. It includes the prescribed "time budget" for each task (in seconds), which is different for each dataset. IGNORE IT. The time limit is hard coded in the sample code we provide. The total time limit is 10 minutes per run (this is different in the general AutoML challenge).
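As an illustration, a submission might keep track of the hard-coded budget like this. This is only a sketch: the dataset names come from round 1, and the 10% safety margin is an assumption, not a rule.

```python
import time

# Hard-coded total budget: 10 minutes for all 5 datasets
# (per the hackathon rules; the "time_budget" in the info file is ignored).
TOTAL_BUDGET = 600.0
start = time.time()

def time_left():
    """Seconds remaining in the overall budget."""
    return TOTAL_BUDGET - (time.time() - start)

# Budget-aware loop over the round 1 datasets: stop early if fewer
# than 10% of the budget remains (the margin is an assumption).
for dataset in ["adult", "cadata", "digits", "dorothea", "newsgroups"]:
    if time_left() < 0.1 * TOTAL_BUDGET:
        break
    # ... train a model and write prediction files for `dataset` here ...
```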

Result format

All result files are formatted as text files ending with a ".predict" extension, with one result per sample per line, in the order of the samples. Round 1 has only binary classification problems; the regular AutoML challenge also includes:

  • Regression problems: one numeric value per line.
  • Binary classification problems: one numeric value between 0 and 1 per line, indicating a score of class 1 membership (1 is certainty of class 1, 0.5 is a random guess, 0 is certainty of class 0).
  • Multiclass or multilabel problems: for C classes, C numeric values between 0 and 1 per line, indicating the scores of membership of the C classes. The scores add up to 1 for multiclass problems only.

The participants may test their models regularly and produce intermediate prediction results, numbered from num=0 to n. The following naming convention of the files should be respected:

    basename_setname_num.predict

where "basename" is the dataset name (e.g. adult, cadata, digits, dorothea, or newsgroups in the first round), "setname" is either "valid" (validation set) or "test" (test set), and "num" is the order number of the prediction results submitted. Please use the zero-padded format 03d to number your results, because we sort the file names in alphabetical order to determine the result order.

For example, in the first round, you would bundle for submission the following files in a zip archive (no directory structure):

  • adult_valid_000.predict
  • adult_valid_001.predict
  • adult_valid_002.predict
  • ...
  • adult_test_000.predict
  • adult_test_001.predict
  • adult_test_002.predict
  • ...
  • cadata_valid_000.predict
  • cadata_valid_001.predict
  • cadata_valid_002.predict
  • ...
  • cadata_test_000.predict
  • cadata_test_001.predict
  • cadata_test_002.predict
  • ...
  • etc.

The last result file for each set (with largest number num) is used for scoring.
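The writing of these files can be sketched as follows. The helper names are hypothetical; the zero-padding matches the 03d convention above so that alphabetical and numeric order agree.

```python
def prediction_filename(basename, setname, num):
    """Build a file name such as adult_valid_003.predict.

    Zero-pads num to 3 digits so alphabetical order matches numeric
    order, as required by the scoring program.
    """
    return "%s_%s_%03d.predict" % (basename, setname, num)

def write_binary_predictions(path, scores):
    """Write one class-1 membership score per line."""
    with open(path, "w") as f:
        for s in scores:
            s = min(max(s, 0.0), 1.0)  # clip into [0, 1]
            f.write("%.6f\n" % s)

# Example: two snapshots of validation predictions for the 'adult' dataset.
for num, scores in enumerate([[0.1, 0.9], [0.2, 0.8]]):
    write_binary_predictions(prediction_filename("adult", "valid", num), scores)
```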





What should I do for the slide presentation?

Prepare 3 slides for a 3-minute talk: a title slide (with names and affiliations), plus (1) system description, (2) collaboration, (3) feedback (good/bad/wanted).

Submission deadline = Tuesday, August 4 at 18:00.


Send your presentation to 


What is the time budget?

We increased it to 20 minutes total. The sample code from the 'overview' section uses 120 seconds per dataset; you can now double that to 240. Ignore the time budget in the dataset's info file, which comes from the general AutoML challenge.

What if I have a problem with "metadata"?

There are 2 possibilities:

- All the files in your zip are inside a folder (for example, the metadata file is in /sample_code/). You must put them at the root of the zip file (so the metadata file is directly in /).

- The metadata file is missing from your zip. Look at the sample code you downloaded at the beginning; you will see a metadata file. Add it at the root of the zip file and it will work.

How to make a submission that will be executed on the codalab platform?

Your code will only be executed if no precomputed predictions are included in a /res folder inside the .zip. If you use the .zip generated by the sample code, remove the /res folder.

What is the submissions number limit?

The limit is 10 per day (day is 3 am to 3 am). Failed submissions are not counted.

What are the library versions? 

The Python distribution and libraries installed on Codalab come from Anaconda 2.0 for Python 2.7. In particular you have:

scikit-learn 0.15.2
numpy 1.9.1
scipy 0.14


Can we submit code that pre-computes results?

No, we want you to submit real code that trains and tests on Codalab (even though you could submit results, since you have the datasets).


Why did you group people by level?

Because that way everybody will progress better. If beginners are grouped with advanced people, they tend not to do anything. Because we encourage groups to share their code, the strongest teams will share their code and this will give a chance to the weaker teams.


Are we allowed to use multi-threading?

Yes. For each submission you make, you have the full use of an 8 core x86_64 machine with 14 GB RAM. You can either use multiple cores for one dataset or send the various datasets in parallel to different cores.


Can I make a different model for each dataset?



When is the coopetition score updated?

Only when you make a new submission.

When does the hackathon end?

August 4 at 18:00 h Petersburg time (3 p.m. UTC). Preferably submit your final submission before 17:30 h to make sure your submission works. After 18:00 h you will not be able to make changes anymore; your code will be executed again and scored on the test set.


Why do we not have the labels on validation data?

Because the results on the leaderboard until August 4 are computed with the validation labels, and we want to encourage people to submit to Codalab to see their results. To get performance estimates without making submissions, use cross-validation.


How can I get an estimate of performance without submitting code on Codalab?

You have cross-validation functions in scikit-learn, see

In order to compute cross-validation scores that estimate your performance without having to make submissions to the platform (and, for example, to implement hyper-parameter selection), you need to use the correct scoring function. Its name is given by the "metric" field in the info file.

Example for christine:

    usage = 'AutoML challenge 2015'
    name = 'christine'
    task = 'binary.classification'
    target_type = 'Binary'
    feat_type = 'Numerical'
    metric = 'bac_metric'
    time_budget = 1200
    feat_num = 1636
    target_num = 1
    label_num = 2
    train_num = 5418
    valid_num = 834
    test_num = 2084
    has_categorical = 0
    has_missing = 0
    is_sparse = 0

Here the scoring metric is metric = 'bac_metric'.
The scoring function can be obtained from the original starting kit of the AutoML challenge:

This also contains an older version of the sample code.
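For reference, a stand-in for the normalized balanced accuracy (2*BAC - 1) can be plugged into scikit-learn cross-validation like this. This is a sketch, not the official bac_metric code, and it uses the modern scikit-learn import paths (the platform itself runs scikit-learn 0.15.2, where cross-validation lives in a different module); the data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score

def bac_metric(y_true, y_pred):
    """Normalized balanced accuracy, 2*BAC - 1 (a stand-in, not the official code)."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_pred) > 0.5).astype(int)  # outputs thresholded at 0.5
    tpr = np.mean(y_pred[y_true == 1] == 1)  # sensitivity
    tnr = np.mean(y_pred[y_true == 0] == 0)  # specificity
    return 2 * (0.5 * (tpr + tnr)) - 1       # random guessing scores near 0

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=5,
                         scoring=make_scorer(bac_metric))
```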


How can I implement meta-learning?

You can first train a "filter" using datasets (e.g. from the machine learning repository). The filter could be implemented as a Python object that you can save as a pickle. The pickle can then be reloaded and used as part of the code that you submit to Codalab.
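A minimal sketch of that pickle round trip, using an arbitrary classifier trained on synthetic stand-in data (the real filter would be trained on repository datasets; the file name is hypothetical):

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Offline, before submitting: train the "filter" on external data
# (synthetic stand-in here).
X, y = make_classification(n_samples=200, random_state=0)
filter_model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
with open("filter.pickle", "wb") as f:
    pickle.dump(filter_model, f)

# Inside the submitted code: reload the pickle shipped in the zip archive.
with open("filter.pickle", "rb") as f:
    reloaded = pickle.load(f)
probs = reloaded.predict_proba(X)[:, 1]  # class-1 membership scores
```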


What is the scoring metric for round 1?

It is the balanced accuracy (BAC), renormalized, so actually it is 2*BAC -1. In this way, random predictions get a score close to 0. Because in round 1 we have balanced classes, the accuracy and the balanced accuracy are identical.


Do I need to normalize the outputs between 0 and 1?

Not necessarily, but your output will be thresholded at 0.5 to compute the balanced accuracy.


Are we continuing to work on the same datasets throughout the week?

Yes. But now we added an AutoML 1.5 phase.


Are we limited to 10 submissions per day?



Could we get a real AutoML test?

Some people complained that with only a Tweakathon phase, they would not be tested on their ability to build a "perfect black box", since they can tune the hyperparameters for each dataset. So we added an extra AutoML phase in which the code will be blind tested on 5 new binary classification problems you have not seen before.


Are there any differences between the datasets of round 1 and those of AutoML 1.5?

They are similar. All datasets are binary classification problems. The class proportions are not necessarily strictly identical, but they are similar. The input matrices are dense or sparse. They do not include missing data or categorical variables. They sometimes include irrelevant variables (so feature selection may be useful). The ratio Ptr/N of the number of training examples to the number of features varies over a wide range. New: the scoring metric is the auc_metric.


My number of downloads does not match my coopetition score, why?

First, your own downloads do not count. Second, the coopetition score is refreshed only at each new submission.





The organization of this challenge would not have been possible without the help of many people who are gratefully acknowledged.


Any opinions, findings, and conclusions or recommendations expressed in material found on this website are those of their respective authors and do not necessarily reflect the views of the sponsors. The support of the sponsors does not give them any particular right to the software and findings of the participants.


Microsoft supported the organization of this challenge and donated the prizes.


This challenge is part of the official selection of IJCNN competitions.


This project received additional support from the Laboratoire d'Informatique Fondamentale (LIF, UMR CNRS 7279) of the University of Aix Marseille, France, via the LabeX Archimede program, the Laboratoire de Recherche en Informatique of Paris Sud University, and INRIA Saclay, as part of the TIMCO project. Computing resources were provided generously by Joachim Buhmann, ETH Zürich.


Isabelle Guyon, ChaLearn, Berkeley, California, USA
Evelyne Viegas, Microsoft Research, Redmond, Washington, USA

Data providers:

We selected the 30 datasets used in the challenge among 72 datasets that were donated or formatted using data publicly available by:
Yindalon Aphinyanaphongs, New York University, New York, USA
Olivier Chapelle, Criteo, California, USA
Hugo Jair Escalante, INAOE, Puebla, Mexico
Sergio Escalera, University of Barcelona, Catalonia, Spain
Isabelle Guyon, ChaLearn, Berkeley, California, USA
Zainab Iftikhar Malhi, University of Lahore, Pakistan
Vincent Lemaire, Orange research, Lannion, Brittany, France
Chih Jen Lin, National Taiwan University, Taiwan
Meysam Madani, University of Barcelona, Catalonia, Spain
Bisakha Ray, New York University, New York, USA
Mehreen Saeed, University of Lahore, Pakistan
Alexander Statnikov, American Express, New York, USA
Gustavo Stolovitzky, IBM Computational Biology Center, Yorktown Heights, New York, USA
Hans-Jürgen Thiesen, Universität Rostock, Germany
Ioannis Tsamardinos, University of Crete, Greece

Committee members, advisors and beta testers:

Kristin Bennett, RPI, New York, USA
Marc Boullé, Orange research, Lannion, Brittany, France
Cecile Capponi, University of Aix-Marseille, France
Richard Caruana, Microsoft Research, Redmond, Washington, USA
Gavin Cawley, University of East Anglia, UK
Gideon Dror, Yahoo!, Haifa, Israel
Hugo Jair Escalante, INAOE, Puebla, Mexico
Sergio Escalera, University of Barcelona, Catalonia, Spain
Cécile Germain, Université de Paris Sud, France
Tin Kam Ho, IBM Watson Group, Yorktown Heights, New York, USA
Balázs Kégl, Université de Paris Sud, France
Hugo Larochelle, Université de Sherbrooke, Canada
Vincent Lemaire, Orange research, Lannion, Brittany, France
Chih Jen Lin, National Taiwan University, Taiwan
Víctor Ponce López, University of Barcelona, Catalonia, Spain
Nuria Macia, Universitat Ramon Llull, Barcelona, Spain
Simon Mercer, Microsoft, Redmond, Washington, USA
Florin Popescu, Fraunhofer First, Berlin, Germany
Mehreen Saeed, University of Lahore, Pakistan
Michèle Sebag, Université de Paris Sud, France
Danny Silver, Acadia University, Wolfville, Nova Scotia, Canada
Alexander Statnikov, American Express, New York, USA
Ioannis Tsamardinos, University of Crete, Greece

Codalab and other software development

Eric Camichael, Tivix, San Francisco, California, USA
Isabelle Guyon, ChaLearn, Berkeley, California, USA
Ivan Judson, Microsoft, Redmond, Washington, USA
Christophe Poulain, Microsoft Research, Redmond, Washington, USA
Percy Liang, Stanford University, Palo Alto, California, USA
Arthur Pesah, Lycée Henri IV, Paris, France
Lukasz Romaszko, ChaLearn, California, USA
Xavier Baro Sole, University of Barcelona, Barcelona, Spain
Erick Watson, Sabthok International, Redmond, Washington, USA
Michael Zyskowski, Microsoft Research, Redmond, Washington, USA





Tweakathon 1

Start: July 31, 2015, 1 a.m.

Description: Submit code capable of producing predictions on both VALIDATION AND TEST DATA. The leaderboard shows scores on phase 1 validation data only. Max number of submissions: 10


Final 1

Start: Aug. 4, 2015, 3 p.m.

Description: Results on test data of phase 1. There is NO NEW SUBMISSION. The results on test data of the last submission are shown.

AutoML 1.5

Start: Aug. 5, 2020, 9 a.m.

Description: Blind test on 5 new datasets (binary classification). There is NO NEW SUBMISSION. The code of the previous phase is submitted and run automatically.

Competition Ends


# Username Score
1 venus 1.00
2 rhea 2.00
3 ceres 2.50