ChaLearn Automatic Machine Learning Challenge (AutoML)

Organized by automl.chalearn
Reward $30,000

First phase: Tweakathon0, starting Dec. 8, 2014, midnight UTC

Competition ends: June 25, 2016, midnight UTC

Fully Automatic Machine Learning without ANY human intervention

This is a "supervised learning" challenge in machine learning. We are making available 30 datasets, all pre-formatted in given feature representations (this means that each example consists of a fixed number of numerical coefficients). The challenge is to solve classification and regression problems, without any further human intervention.

The difficulty is that there is a broad diversity of data types and distributions (including balanced or unbalanced classes, sparse or dense feature representations, with or without missing values or categorical variables, various metrics of evaluation, and various proportions of number of features to number of examples). The problems are drawn from a wide variety of domains and include medical diagnosis from laboratory analyses, speech recognition, credit rating, prediction of drug toxicity or efficacy, classification of text, prediction of customer satisfaction, object recognition, protein structure prediction, action recognition in video data, etc. While there exist machine learning toolkits including methods that can solve all these problems, it still takes considerable human effort to find, for a given combination of dataset, task, metric of evaluation, and available computational time, the combination of methods and hyper-parameter settings that is best suited. Your challenge is to create the "perfect black box" that eliminates the human in the loop.

This is a challenge with code submission: your code will be executed automatically on our servers to train and test your learning machines with unknown datasets. However, there is NO OBLIGATION TO SUBMIT CODE. Half of the prizes can be won by just submitting prediction results. There are six rounds (Prep, Novice, Intermediate, Advanced, Expert, and Master) in which datasets of progressive difficulty are introduced (5 per round). There is NO PREREQUISITE TO PARTICIPATE IN PREVIOUS ROUNDS to enter a new round. The rounds alternate AutoML phases in which submitted code is "blind tested" in limited time on our platform, using datasets you have never seen before, and Tweakathon phases giving you time to improve your methods by tweaking them on those datasets and running them on your own systems (without computational resource limitation).

CHALEARN

This challenge is brought to you by ChaLearn. Contact the organizers.

Evaluation

Tasks

This challenge is concerned with regression and classification problems (binary, multi-class, or multi-label) from data already formatted in fixed-length feature-vector representations. Each task is associated with a dataset coming from a real application. The domains of application are very diverse and are drawn from: biology and medicine, ecology, energy and sustainability management, image, text, audio, speech, video and other sensor data processing, internet social media management and advertising, market analysis and financial prediction.
All datasets present themselves in the form of data matrices with samples as rows and features (or variables) as columns. For instance, in a medical application, the samples may represent patient records and the features may represent results of laboratory analyses. The goal is to predict a target value, for instance the diagnosis "diseased" or "healthy" in the case of a medical diagnosis problem.
The identity of the datasets and the features is concealed (except in round 0) to avoid the use of domain knowledge and push the participants to design fully automated machine learning solutions.
In addition, the tasks are constrained by:

  • A Time Budget.
  • A Scoring Metric.

Task, scoring metric and time budget are provided with the data, in a special "info" file.

Time Budget

The Codalab platform provides computational resources shared by all participants. To ensure the fairness of the evaluation, when a code submission is evaluated, its execution time is limited to a given Time Budget, which varies from dataset to dataset. The time budget is provided with each dataset in its "info" file. The organizers reserve the right to adjust the time budget by supplying the participants with new info files.
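
For illustration only, here is a minimal sketch of how submitted code might read the time budget and the scoring metric from an "info" file, assuming it consists of simple "key = value" lines; the exact file name and key names ("time_budget", "metric") used here are assumptions, and the authoritative parser ships with the Starting Kit.

    # Minimal sketch (not the official Starting Kit code): parse a "key = value"
    # style info file and recover the time budget and the scoring metric.
    # The file name and the key names below are assumptions for illustration.
    def read_info(path):
        info = {}
        with open(path) as f:
            for line in f:
                if "=" in line:
                    key, value = line.split("=", 1)
                    info[key.strip()] = value.strip().strip("'")
        return info

    info = read_info("adult_public.info")             # hypothetical file name
    time_budget = float(info.get("time_budget", 300))
    metric_name = info.get("metric", "auc_metric")
    print(time_budget, metric_name)
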
The participants who submit results (instead of code) are NOT constrained by the Time Budget, since they can run their code on their own platform. This may be advantageous for entries counting towards the Final phases (immediately following a Tweakathon). The participants wishing to also enter the AutoML phases, which require submitting code, can submit BOTH results and code (simultaneously). See the Instructions for details.

Scoring Metrics

The scoring program computes a score by comparing submitted predictions with reference "target values". For each sample i, i=1:P, the target value is:

  • a continuous numeric coefficient y_i, for regression problems;
  • a vector of binary indicators [y_ik] in {0, 1}, for multi-class or multi-label classification problems (one per class k);
  • a single binary indicator y_i in {0, 1}, for binary classification problems.

The participants must turn in prediction values matching the target values as closely as possible, in the form of:

  • a continuous numeric coefficient q_i for regression problems;
  • a vector of numeric coefficients [q_ik] in the range [0, 1] for multi-class or multi-label classification problems (one per class k);
  • a single numeric coefficient q_i in the range [0, 1] for binary classification problems.

The Starting Kit contains the Python implementation of all scoring metrics used to evaluate the entries. Each dataset has its own metric (scoring criterion), specified in its "info" file. All scores are re-normalized such that the expected value of the score for a "trivial guess" based on class prior probabilities is 0 and the optimal score is 1. Multi-label problems are treated as multiple binary classification problems and are evaluated by the average of the scores of each binary classification sub-problem.
The scores are taken from the following list:

  • R2: R-square or "coefficient of determination", used for regression problems: R2 = 1 - MSE/VAR, where MSE = <(y_i - q_i)^2> is the mean squared error and VAR = <(y_i - m)^2> is the variance, with m = <y_i>.
  • ABS: A coefficient similar to R2 but based on the mean absolute error (MAE) and the mean absolute deviation (MAD): ABS = 1 - MAE/MAD, with MAE = <|y_i - q_i|> and MAD = <|y_i - m|>.
  • BAC: Balanced accuracy, the average of the class-wise accuracies for classification problems (or the average of sensitivity (true positive rate) and specificity (true negative rate) in the special case of binary classification). For binary classification problems, the class-wise accuracy is the fraction of correct class predictions when q_i is thresholded at 0.5, for each class. The class-wise accuracy is averaged over all classes for multi-label problems. For multi-class classification problems, the predictions are binarized by selecting the class with maximum prediction value, argmax_k q_ik, before computing the class-wise accuracy. We normalize the BAC with the formula BAC := (BAC - R)/(1 - R), where R is the expected value of BAC for random predictions (i.e. R = 0.5 for binary classification and R = 1/C for C-class classification problems).
  • AUC: Area under the ROC curve, used for ranking and for binary classification problems. The ROC curve is the curve of sensitivity vs. 1 - specificity obtained by varying a threshold on the predictions. The AUC is identical to the BAC for binary predictions. The AUC is calculated for each class separately before averaging over all classes. We normalize it with the formula AUC := 2 AUC - 1, making it de facto identical to the so-called Gini index.
  • F1 score: The harmonic mean of precision and recall. Precision = positive predictive value = true_positive/all_called_positive. Recall = sensitivity = true positive rate = true_positive/all_real_positive. Prediction thresholding and class averaging are handled as in the case of the BAC. We also normalize F1 with F1 := (F1 - R)/(1 - R), where R is the expected value of F1 for random predictions (i.e. R = 0.5 for binary classification and R = 1/C for C-class classification problems).
  • PAC: Probabilistic accuracy PAC = exp(-CE), based on the cross-entropy or log loss: CE = -<sum_k y_ik log(q_ik)> for multi-class classification and CE = -<y_i log(q_i) + (1 - y_i) log(1 - q_i)> for binary classification and multi-label problems. Class averaging is performed after taking the exponential in the multi-label case. We normalize with PAC := (PAC - R)/(1 - R), where R is the score obtained using q_i = <y_i> or q_ik = <y_ik> (i.e. using as predictions the fraction of positive class examples, as an estimate of the prior probability).

We note that for R2, ABS, and PAC the normalization uses a "trivial guess" corresponding to the average target value, q_i = <y_i> or q_ik = <y_ik>. In contrast, for BAC, AUC, and F1 the "trivial guess" is a random prediction of one of the classes with uniform probability.
In all formulas the brackets < . > designate the average over all P samples indexed by i: <y_i> = (1/P) sum_i y_i. Only R2 and ABS make sense for regression; we compute the other scores for completeness by replacing the target values by binary values after thresholding them at mid-range.
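
To make the normalization concrete, here is a minimal NumPy sketch of two of the scores above (R2 and the normalized BAC, binary case only) with made-up predictions; the reference implementations are the ones shipped with the Starting Kit and may differ in details.

    # Illustrative sketch of two of the metrics above (binary case only);
    # the official Python implementations are provided in the Starting Kit.
    import numpy as np

    def r2(y, q):
        mse = np.mean((y - q) ** 2)                  # mean squared error
        var = np.mean((y - np.mean(y)) ** 2)         # variance of the target
        return 1.0 - mse / var

    def normalized_bac(y, q, threshold=0.5):
        pred = (q >= threshold).astype(int)
        tpr = np.mean(pred[y == 1] == 1)             # sensitivity
        tnr = np.mean(pred[y == 0] == 0)             # specificity
        bac = 0.5 * (tpr + tnr)
        r = 0.5                                      # expected BAC of random predictions (binary case)
        return (bac - r) / (1.0 - r)

    y = np.array([0, 1, 1, 0, 1])                    # made-up targets
    q = np.array([0.2, 0.8, 0.6, 0.4, 0.9])          # made-up predictions
    print(r2(y, q), normalized_bac(y, q))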

Leaderboard score calculation

Each round includes five datasets from different application domains, spanning various levels of difficulty. The participants (or their submitted programs) provide prediction results for the withheld target values (called "solution"), for all 5 datasets. Independently of any intervention of the participants, the original version of the scoring program supplied by the organizers is run on the server to compute the scores. For each dataset, the participants are ranked in decreasing order of performance for the prescribed scoring metric associated with the given task. The overall score is computed by averaging the ranks over all 5 datasets and shown in the column <rank> on the leaderboard.
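
As an illustration of the rank averaging (this is not the actual leaderboard code), the sketch below ranks three hypothetical participants on five datasets using made-up scores where higher is better:

    # Illustration of the <rank> computation: rank participants per dataset
    # (1 = best score), then average the ranks over the 5 datasets.
    import numpy as np

    scores = {                                   # participant -> score on each of 5 datasets (made up)
        "alice": [0.91, 0.72, 0.65, 0.80, 0.55],
        "bob":   [0.89, 0.75, 0.70, 0.78, 0.60],
        "carol": [0.93, 0.70, 0.60, 0.82, 0.58],
    }
    names = list(scores)
    matrix = np.array([scores[n] for n in names])                  # shape (participants, datasets)
    ranks = np.argsort(np.argsort(-matrix, axis=0), axis=0) + 1    # rank 1 = best score per dataset
    for name, avg in zip(names, ranks.mean(axis=1)):
        print(name, avg)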

We ask the participants to test their systems regularly while training to produce intermediate prediction results, which will allow us to make learning curves (performance as a function of training time). Using such learning curves, we will adjust the "time budget" in subsequent rounds (possibly giving you more computational time!). But only the last point (corresponding to the file with the largest order number) is used for leaderboard calculations.

The results of the LAST submission made are used to compute the leaderboard results (so you must re-submit an older entry that you prefer if you want it to count as your final entry). This is what is meant by “Leaderboard modifying disallowed”. In phases marked with a [+], the participants with the three smallest <rank> are eligible for prizes, if they meet the Terms and Conditions.

Training, validation and test sets

For each dataset, a labeled training set is provided for training and two unlabeled sets (validation set and test set) are provided for testing.

Phases and rounds

The challenge is run in multiple Phases grouped in rounds, alternating AutoML contests and Tweakathons. There are six rounds: Round 0 (preparation), followed by 5 rounds of progressive difficulty (Novice, Intermediate, Advanced, Expert, and Master). Except for round 0 (preparation) and round 5 (termination), all rounds include 3 phases, alternating Tweakathons and AutoML contests:

For each phase in round [n]:

  • [+] AutoML[n]: Goal: blind test of code. Duration: short. Submissions: NONE (code migrated). Data: new datasets, not downloadable. Leaderboard scores: test set results. Prizes: yes.
  • Tweakathon[n]: Goal: manual tweaking. Duration: 1 month. Submissions: code and/or results. Data: datasets downloadable. Leaderboard scores: validation set results. Prizes: no.
  • [+] Final[n]: Goal: results of Tweakathon revealed. Duration: short. Submissions: NONE (results migrated). Data: NA. Leaderboard scores: test set results. Prizes: yes.


The results of the last submission made are shown on the leaderboard. Submissions are made in Tweakathon phases only. The last submission of one phase migrates automatically to the next one. If code is submitted, this makes it possible to participate in subsequent phases without making new submissions. Prizes are awarded for phases marked with a [+], during which there is NO submission. The total prize pool is $30,000 (see Rewards and Terms and Conditions for details).

Code vs. result submission

To participate in the AutoML[n] phase, code must be submitted in Tweakathon[n-1]. To participate in Final[n], code or results must be submitted in Tweakathon[n]. If both code and (well-formatted) results are submitted in Tweakathon[n], the results are used for scoring in Tweakathon[n] and Final[n] rather than re-running the code; the code is executed only when results are unavailable or not well formatted. Hence there is no disadvantage to submitting both results and code. There is no obligation to submit the code that produced the submitted results. Using mixed submissions of results and code, different methods can be used to enter the Tweakathon/Final phases and the AutoML phases.

Datasets

There are 5 datasets in each round spanning a range of difficulties:

  • Different tasks: regression, binary classification, multi-class classification, multi-label classification.
  • Class balance: Balanced or unbalanced class proportions.
  • Sparsity: Full matrices or sparse matrices.
  • Missing values: Presence or absence of missing values.
  • Categorical variables: Presence or absence of categorical variables.
  • Irrelevant variables: Presence or absence of additional irrelevant variables (distractors).
  • Number Ptr of training examples: Small or large number of training examples.
  • Number N of variables/features: Small or large number of variables.
  • Aspect ratio Ptr/N of the training data matrix: Ptr>>N, Ptr~=N or Ptr<<N.

We will progressively introduce difficulties from round to round, with each round cumulating the difficulties of the previous ones plus new ones. Some datasets may be recycled from previous challenges, but reformatted into new representations, except for the final MASTER round, which includes only completely new data.

  1. NOVICE: Binary classification problems only; no missing data; no categorical variables; moderate number of features (<2000); balanced classes; BUT sparse and full matrices; presence of irrelevant variables; various Ptr/N.
  2. INTERMEDIATE: Multi-class and binary classification problems + additional difficulties including: unbalanced classes; small and large number of classes (several hundred); some missing values; some categorical variables; up to 7000 features.
  3. ADVANCED: All types of classification problems, including multi-label + additional difficulties including: up to 300,000 features.
  4. EXPERT: Classification and regression problems, all difficulties.
  5. MASTER: Classification and regression problems, all difficulties, completely new datasets.

 


Challenge Rules

  • General Terms: This challenge is governed by the General ChaLearn Contest Rule Terms, the Codalab Terms and Conditions, and the specific rules set forth.
  • Announcements: To receive announcements and be informed of any change in rules, the participants must provide a valid email.
  • Conditions of participation: Participation requires complying with the rules of the challenge. Prize eligibility is restricted by US government export regulations, see the General ChaLearn Contest Rule Terms. The organizers, sponsors, their students, close family members (parents, sibling, spouse or children) and household members, as well as any person having had access to the truth values or to any information about the data or the challenge design giving him (or her) an unfair advantage, are excluded from participation. A disqualified person may submit one or several entries in the challenge and request to have them evaluated, provided that they notify the organizers of their conflict of interest. If a disqualified person submits an entry, this entry will not be part of the final ranking and does not qualify for prizes. The participants should be aware that ChaLearn and the organizers reserve the right to evaluate for scientific purposes any entry made in the challenge, whether or not it qualifies for prizes.
  • Dissemination: The participants will be invited to attend a workshop organized in conjunction with a major machine learning conference and contribute to the proceedings. The challenge is part of the competition program of the IJCNN 2015 conference.
  • Registration: The participants must register to Codalab and provide a valid email address. Teams must register only once and provide a group email, which is forwarded to all team members. Teams or solo participants registering multiple times to gain an advantage in the competition may be disqualified.
  • Anonymity: The participants who do not present their results at the workshop can elect to remain anonymous by using a pseudonym. Their results will be published on the leaderboard under that pseudonym, and their real name will remain confidential. However, the participants must disclose their real identity to the organizers to claim any prize they might win. See our privacy policy for details. If a participant provides their real name, it will appear on the leaderboard and may be used by the Codalab platform provider at its discretion (see Outercurve Privacy and Terms).
  • Submission method: The results must be submitted through this CodaLab competition site. The participants can make up to 5 submissions per day in the Tweakathon phases. Using multiple accounts to increase the number of submissions is NOT permitted. There are NO submissions in the Final and AutoML phases (the submissions from the previous Tweakathon phase migrate automatically). In case of problems, send an email to events@chalearn.org. The entries must be formatted as specified on the Evaluation page.
  • Awards: The three top-ranking participants of each Final or AutoML phase may qualify for awards (cash prize, travel award, and award certificate). To compete for awards, the participants must fill out a fact sheet briefly describing their methods. There is no other publication requirement. The winners will be required to make their code publicly available under an OSI-approved license (such as, for instance, Apache 2.0, MIT, or a BSD-like license) if they accept their prize, within a week of the deadline for submitting the final results. The winners of each of the 10 prize-winning phases (indicated by a [+] on the Phases page) will be determined according to the leaderboard ranking (see the Evaluation page). In AutoML phases, entries exceeding the total time budget of the 5 tasks will not qualify for prizes. In case of a tie, the prize will go to the participant who submitted his/her entry first. Non-winners or entrants who decline their prize retain all rights on their entries and are not obliged to publicly release their code. However, Outercurve, the organization running the Codalab platform, has specific conditions on their use of entries that must be reviewed and accepted by the participants and are out of the control of the organizers.
  • Travel awards: The travel awards may be used to attend a workshop organized in conjunction with the challenge. The award money will be granted in reimbursement of expenses including airfare, ground transportation, hotel, or workshop registration. Reimbursement is conditioned on (i) attending the workshop, (ii) making an oral presentation of the methods used in the challenge, and (iii) presenting original receipts and boarding passes. The reimbursements will be made after the workshop.


Rewards

The key to the treasure is the treasure. We hope that your biggest reward will be the learning and research experience:

  • Discovering or revisiting fascinating machine learning methods.
  • Combining them to beat the "no free lunch theorem" by creating the first universal AutoML machine.
  • Solving along the way the problem of optimizing arbitrary objectives (embodied by various scoring metrics).
  • Solving the problem of learning in constant time (managing a time budget).

Another reward will be the social experience: meeting other smart people and disseminating your ideas at workshops organized in conjunction with high-profile conferences (we are already part of the IJCNN competition program and will be submitting workshop proposals to ICML and NIPS). We will also organize proceedings and invite you to contribute to a crowd-sourced paper written by participants and organizers to disseminate your results.

As an additional incentive, there is a prize pool of USD 30,000 donated by Microsoft. Prizes will be awarded in all 10 phases marked with a [+] (Final or AutoML):

  • First place: USD 1500 (*) + Award certificate
  • Second place: USD 900 (*) + Award certificate
  • Third place: USD 600 (*) + Award certificate

(*) 1/3 in cash and 2/3 as a travel award. See Terms and Conditions.

 


Instructions

Data

The datasets are downloadable from the Dataset page.

Code or result submission

The participants must submit a zip file with their code and/or results via the Submission page. Get started in minutes: we provide a Starting Kit including sample submissions and step-by-step instructions.

Participation does not require submitting code, but, if you submit code for evaluation in a given AutoML phase, it must be submitted during the Tweakathon of the PREVIOUS round. ONLY TWEAKATHON PHASES TAKE SUBMISSIONS. Phases marked with a [+] report results on submissions that are forwarded automatically from the previous phase.

The sample submission can be used to submit results, code, or both:

  • Result submission: To submit prediction results, you must run your code on your own machine. You will first need to download the Datasets and the Starting Kit. Always submit both validation and test set results simultaneously, to be ranked on the leaderboard during the "Tweakathon" phase (using the validation set) and during the "Final" phase (using the test set). Result submissions will NOT allow you to participate in the "AutoML" phase.
  • Code submission: We presently support submission of Python code. An example is given in the Starting Kit. If you want to make entries with other languages, please contact us. In principle, the Codalab platform can accept submissions of any Linux executable, but this has not been tested yet. If you submit code, make sure it produces results on both validation and test data. It will be used for training and testing in all subsequent phases and rounds until you submit new code.
  • Result and code submission: If you submit both results and code, your results will be used for the Tweakathon and Final phases of the present round; your code will be used for the next AutoML phase (and all subsequent phases and rounds), unless you submit new code.

There is no disadvantage to submitting both results and code. The results do not need to have been produced by the code you submit. For instance, you can submit the sample code together with your results if you do not want to submit your own code. You can submit results of models manually tweaked during the Tweakathon phases.

Input format and computational restrictions

The input format is specified on the Dataset page. It includes the prescribed "time budget" for each task (in seconds), which is different for each dataset. In round 0, the total time allowed for all tasks is about half an hour, so BE PATIENT: this is how long it will take for the sample code we provide to run when you submit it. Submissions of results are processed much faster, in a few minutes.

Result submission format

A sample result submission is provided with the Starting Kit. All result files should be formatted as text files ending with a ".predict" extension, with one result per sample per line, in the order of the samples:

  • Regression problems: one numeric value per line.
  • Binary classification problems: one numeric value between 0 and 1 per line, indicating a score of class 1 membership (1 is certainty of class 1, 0.5 is a random guess, 0 is certainty of class 0).
  • Multiclass or multilabel problems: for C classes, C numeric values between 0 and 1 per line, indicating the scores of membership of the C classes. The scores add up to 1 for multiclass problems only.

We ask the participants to test their models regularly and produce intermediate prediction results, numbered from num=0 to n. The following file naming convention should be respected:
    [basename]_[setname]_[num].predict
where "basename" is the dataset name (e.g. adult, cadata, digits, dorothea, or newsgroups in the first round), "setname" is either "valid" (validation set) or "test" (test set), and "num" is the order number of the prediction results submitted. Please use a zero-padded three-digit format (03d) to number your files, because we sort the file names in alphabetical order to determine the result order.

For example, in the first round, you would bundle for submission the following files in a zip archive (no directory structure):

  • adult_valid_000.predict
  • adult_valid_001.predict
  • adult_valid_002.predict
  • ...
  • adult_test_000.predict
  • adult_test_001.predict
  • adult_test_002.predict
  • ...
  • cadata_valid_000.predict
  • cadata_valid_001.predict
  • cadata_valid_002.predict
  • ...
  • cadata_test_000.predict
  • cadata_test_001.predict
  • cadata_test_002.predict
  • ...
  • etc.

The last result file for each set (with largest number num) is used for scoring. It is useful however to provide intermediate results: ALL the results are used by the organizers to make learning curves and infer whether performance improvements could be gained by increasing the time budget. This will affect the time budget allotted in subsequent rounds.
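
Below is a minimal sketch of how such numbered prediction files could be written and bundled into a flat zip archive; the dataset name, the number of checkpoints, and the random predictions are placeholders.

    # Sketch: write numbered prediction files following
    # [basename]_[setname]_[num].predict with zero-padded numbering,
    # then bundle them into a zip archive with no directory structure.
    import zipfile
    import numpy as np

    def save_predictions(basename, setname, num, predictions):
        filename = "%s_%s_%03d.predict" % (basename, setname, num)
        # 1D array: one value per line; 2D array: C values per line (multiclass/multilabel)
        np.savetxt(filename, predictions, fmt="%.6f")
        return filename

    files = []
    for num in range(3):                          # e.g. three intermediate checkpoints
        files.append(save_predictions("adult", "valid", num, np.random.rand(100)))
        files.append(save_predictions("adult", "test", num, np.random.rand(100)))

    with zipfile.ZipFile("submission.zip", "w") as archive:
        for f in files:
            archive.write(f)                      # flat archive, no subdirectories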


Forum

Please subscribe to our Google group to post messages on the forum (send email to automl@googlegroups.com).




Help

Can I enter the challenge if I did not make submissions to previous rounds?

Yes!

Can I enter during a Final or AutoML phase?

No: all entries must be made during the Tweakathon phases.

Where can I download the data?

From the Data page, under the Participate tab. You first need to register to have access to it.

How do I make submissions?

Register and go to the Participate tab where you find data, and a submission form.

Do you provide tips on how to get started?

We provide a Starting Kit, see Step-by-step instructions.

Are there prizes?

Yes, see Rewards and Terms and Conditions.

Do I need to submit code to participate?

No. You can submit prediction results only, if you don't want to submit code. This will allow you to see your performance on the leaderboard during Tweakathon and Final phases (but not AutoML phases), provided that you submit results both on validation data and on test data during the Tweakathon phase. Result submissions allow you to compete and get ranked in Final phases without disclosing your code. Only if you win will you need to make your code publicly available to claim your prize.

If I submit code, do I surrender all rights to that code to the sponsors or organizers?

No. You just grant to the organizers a license to use your code for evaluation purposes during the challenge. You retain all other rights. However, the winners will be required to make their code publicly available under an OSI-approved license such as, for instance, Apache 2.0, MIT or BSD-like license, if they accept their prize. See our Terms and Conditions.

However, Outercurve, the organization running the Codalab platform, has the following disclaimer, see their Privacy and Terms conditions: "Outercurve does not want to receive confidential or proprietary information from User through this site. Any material, information, or other communication User transmits or posts ("Communications") to Outercurve’s Web site will be considered non-confidential and non-proprietary and Outercurve will be under no obligation of any kind with respect to such information. Outercurve will be free to reproduce, make derivative works from, use, disclose, and distribute the Communications to others without limitation. At our sole election, Outercurve may provide authorship attribution by listing User's name."

Can I enter anonymously?

Yes. You may use a pseudonym to make your entries. However, to receive notifications from the organizers, you must provide a valid email. You will only need to reveal your identity if you want to accept a prize. See our Terms and Conditions.

Can I register multiple times?

No. If you accidentally register multiple times or have multiple accounts from members of the same team, please notify the organizers and use a single account to make submissions. Teams or solo participants with multiple accounts may be disqualified.

But I want both to submit code and not to be limited by the time budget in the Final phases. How can I do that with a single account?

This is easy: you can submit both code and results. Follow the instructions of the Starting Kit. In the Final phase, the results of that phase will be used to compute your score. In the AutoML phase, the code will be run on the new datasets. This way you get the best of both!

How much computational power and memory are available?

For each submission you make, you have the full use of an 8 core x86_64 machine with 56 GB RAM. We will ramp up the compute power as needed. You can get the specifics of the current system when you make a submission and look at the scoring error log.

The sample result submission includes code, why?

All submissions are in fact code submissions, but you do not need to supply your own code, you can keep the sample code. All you need to do is to include your own prediction results in the "res/" subdirectory. In this way, the platform will use those results to compute the scores. Respect the file name convention because the scoring program looks for files with dataset names corresponding to the datasets of the corresponding phase.

Do I have to use the Python script run.py as template or can I write my own code?

You can write your own scripts; the sample code is provided only for convenience. But note that the script run.py allows you to submit both code and results. If you make changes and still want to submit both code and results, you need to keep the same functionality, because your code is responsible for copying the results to the output when you submit results (all submissions are code submissions; result submission is handled by a program that simply copies the results to the output).

The sample code submission does not give the same results as the reference submission, why?

The sample code is based in part on ensembles of decision trees, so you may get different results on every run. It may be possible to get perfect reproducibility by controlling the random number generator seed.
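
For example, with scikit-learn ensembles a fixed seed can be passed through the random_state parameter; this is an illustrative sketch with made-up data, not the actual sample code.

    # Sketch: fixing the random seed of an ensemble of decision trees so that
    # successive runs give identical results.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    seed = 42
    rng = np.random.RandomState(seed)
    X = rng.rand(200, 10)                          # made-up data
    y = (X[:, 0] > 0.5).astype(int)

    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X, y)
    print(clf.predict_proba(X)[:5, 1])             # identical on every run with the same seed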

If I submit both results and code, what is taken into account, results or code?

The sample code is written such that the program first searches for results in the "res/" subdirectory. If it finds files named after the datasets of the current phase, the results are copied to the output directory so they can directly be processed by the scoring program. If there is at least one missing file, the program proceeds with training models and making predictions to produce results. You can change that to computing only missing results if you want.
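
A minimal sketch of that copy-or-train logic is given below; the directory names, the expected file list, and the training callback are illustrative, not the actual sample code.

    # Sketch: reuse precomputed results from res/ if they are all present,
    # otherwise fall back to training models and producing predictions.
    import os
    import shutil

    def copy_or_train(res_dir, output_dir, expected_files, train_and_predict):
        precomputed = [os.path.join(res_dir, f) for f in expected_files]
        if all(os.path.isfile(p) for p in precomputed):
            for p in precomputed:                  # results found: just copy them over
                shutil.copy(p, output_dir)
        else:
            train_and_predict(output_dir)          # missing results: train and predict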

Why do I need to submit results both on validation and test data?

Validation results are used to rank the participants on the leaderboard during the Tweakathon phases. Test results are used during the Final phases to determine Tweakathon winners. But, we do not let participants make any submission during Final phases. So you must submit results both on validation and test data during the Tweakathon phases (or submit code that produces those results). In this way, the results will quickly appear during the Final phase (lasting 1 day) because they will be precomputed during the Tweakathon (lasting 4 weeks).

Can I use the unlabeled validation and test data for training?

This is not explicitly forbidden, but it is discouraged. Likewise, we prefer that the validation and test data not be preprocessed together with the training data: the preprocessing parameters should be obtained from the training data only, then applied to the validation and test data.
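
For example, with scikit-learn the preprocessing parameters can be fitted on the training data only and then applied unchanged to the validation and test sets (a sketch with placeholder arrays):

    # Sketch: estimate preprocessing parameters on the training data only,
    # then apply the same transformation to validation and test data.
    import numpy as np
    from sklearn.preprocessing import StandardScaler

    X_train = np.random.rand(100, 5)               # placeholder data
    X_valid = np.random.rand(30, 5)
    X_test = np.random.rand(30, 5)

    scaler = StandardScaler().fit(X_train)         # parameters come from training data only
    X_train_s = scaler.transform(X_train)
    X_valid_s = scaler.transform(X_valid)          # validation and test are only transformed
    X_test_s = scaler.transform(X_test)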

Can I submit results that were not generated by the code submitted?

Yes.

Does it make sense to migrate "result submissions" to the next round?

No. The datasets change between rounds, so the results of one round are useless for the next one. Submission migration from round to round is only useful for code submissions. If you submit results and do not make changes to the sample code, your performance in the next round will be that of the sample code.

Where do I find the time budget of each task?

In the "dataset.info" file.

Will the file dataset.info always be present?

Yes, it will always be present. The sample code does not require dataset.info to be present and can make guesses, but we will always provide this file for the datasets of the AutoML challenge.

Will the data formats and the keys in dataset.info always be the same?

Yes. We are not planning to make changes. If changes occur, the participants will be notified. Please provide a valid email.

Some datasets do not have a dataset_feat.type file, is this a bug or do users have to guess?

No, this is not a bug and users do not have to guess. The dataset.info file provides the type of features. The dataset_feat.type file is only necessary when the feature type is “mixed”.

What is the unit of the time budget?

Seconds.

Does the time budget correspond to wall time or CPU time?

We are presently using wall time to measure the total time spent by a participant on the 5 tasks to be performed (corresponding to the 5 datasets of a particular round). Each participant has the full use of a particular machine during the execution of his/her job. It is up to the participant to make the best use of this time, including spreading the computational load across the multiple cores.
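
For example, many scikit-learn estimators can spread their work over all available cores through the n_jobs parameter; the model choice in this sketch is arbitrary and the data is made up.

    # Sketch: use all cores of the worker machine (wall time is what counts,
    # so parallelizing across cores directly reduces the time spent).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    X = np.random.rand(1000, 20)                   # made-up data
    y = (X[:, 0] > 0.5).astype(int)
    clf = RandomForestClassifier(n_estimators=200, n_jobs=-1)   # n_jobs=-1: use all cores
    clf.fit(X, y)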

My submission seems stuck, how long will it run?

In round 0, in principle no more than 300+200+300+100+300=1200 seconds. We kill the process after 1 hour if something goes wrong.

If I under use my time budget in one phase, can I use it later?

No. Each code run has its own time budget that is the sum of the budgets allocated to the five tasks. You can make 5 runs per day at most.

What happens if I exceed my time budget?

There is some mild tolerance during Tweakathon and Final phases, but the time budget is strictly enforced during AutoML phases. Your process eventually gets killed once the time budget plus a margin is exceeded. To avoid losing all your results, save intermediate results regularly. For AutoML phases, if your submission has a total running time (reported on the leaderboard) exceeding the total time budget of the five tasks, it will not be eligible for prizes.
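
One possible pattern is sketched below: track elapsed wall time and save predictions at every checkpoint, stopping before the budget runs out; the callbacks and the 0.9 safety margin are illustrative choices, not part of the official kit.

    # Sketch: keep an eye on elapsed wall time and stop, after saving the latest
    # predictions, before the time budget is exhausted.
    import time

    def run_within_budget(time_budget, train_one_step, predict_and_save, margin=0.9):
        start = time.time()
        num = 0
        while time.time() - start < margin * time_budget:
            train_one_step()          # e.g. add trees or run one hyper-parameter search step
            predict_and_save(num)     # write [basename]_[setname]_[num].predict files
            num += 1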

Can we distribute the allotted time freely between the 5 datasets?

Each dataset has its own time budget, which should be managed separately. For practical implementation reasons, however, only the total time is recorded by the scoring program, so the participants are only penalized if they go over the total time. We do not encourage participants to re-balance the time between tasks (each task should be solved within its own time budget), but the fact that we only record the total time gives some tolerance and makes it possible to rebalance the time between datasets; there will be no penalty for doing so.

How can "anytime" Tweakathon and Final phases be comparable to AutoML phases with a time budget?

They are not directly comparable. In Tweakathon phases (and their associated Final phases), you may submit results computed using your own system with unbounded computational resources. Thus people having large systems are at an advantage. In AutoML phases, your code runs on the challenge platform. All users are compared in the same fair way.

Will people submitting code be disadvantaged during Tweakathon phases?

No, because they can submit both results and code with their last submission. The results will be taken into account in the Final phase, allowing them to get all the advantages of using their own system. But the code will be migrated to the AutoML phase, allowing them to enter the AutoML contest as well.

The time budget is too small, can you increase it?

We may eventually increase it if we see that the learning curves of the participants are far from reaching an asymptote. This is why it is so important that you compute and save predictions regularly during your model search.

How will the runtimes and dataset sizes change in future rounds?

The runtimes will depend on the number of participants, what the learning curves look like, and our total budget (Microsoft is donating time on its cloud computing facility Azure). We need to split the available time in a smart way. We may increase the time by a factor of 10.

The dataset sizes vary by orders of magnitude in number of training samples P and number of features N. The largest values are of the order of P = 800,000, N = 300,000, and P*N = 2x10^10.

Why are you switching metrics all the time?

This is part of the AutoML problem: each task has its own metric. However, we also compute all the other metrics, so you can see how robust your method is to metric changes. You can of course tune your method (automatically) to the particular metric of the task.

Can I use something else than Python code?

In theory yes: any Linux executable can run on the system. However, we only prepared a starting kit with Python at this stage and have not tested this option. If you are interested, please contact us.

Are there publication opportunities?

Yes, we are part of the IJCNN 2015 competition program and we are planning one or several workshops in conjunction with major machine learning conferences (IJCNN, ICML, or NIPS), with proceedings in JMLR Workshop and Conference Proceedings (pending acceptance).

What is meant by "Leaderboard modifying disallowed"?

Your last submission is shown automatically on the leaderboard. You cannot choose which submission to select. Your last submission before the Tweakathon phase ends is your final submission and the submission that will be forwarded to the next round.

What is the file called metadata?

This is a file that you should have in your submitted bundle to indicate to the platform which program must be executed.

How can I debug my code?

Install on your local computer the exact same versions of Python and the libraries that are installed on Codalab: Anaconda 2.4.0 with Python 2.7.10. This should be sufficient to troubleshoot most problems. In particular, check that you are using the following library versions (a version-check sketch follows the list):

  • scikit-learn 0.16.1
  • numpy 1.10.1
  • scipy 0.16.0
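
A quick way to compare your local environment against the versions listed above (a minimal sketch):

    # Print the local library versions to compare with the Codalab environment.
    import sys
    import numpy, scipy, sklearn

    print("python       ", sys.version.split()[0])
    print("scikit-learn ", sklearn.__version__)
    print("numpy        ", numpy.__version__)
    print("scipy        ", scipy.__version__)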

To exactly reproduce the environment used on Codalab, the participants can perform the following steps:

  • Create an Azure account.
  • Login to management portal.
  • Create a new virtual machine (Quick Create, Linux server, medium size).
  • Go to the dashboard of the new VM and connect via RDP.
  • Install Anaconda.
  • Verify your code runs in this environment.

Can I give an arbitrary hard time to the organizers?

ALL INFORMATION, SOFTWARE, DOCUMENTATION, AND DATA ARE PROVIDED "AS-IS". ISABELLE GUYON, CHALEARN, MICROSOFT AND/OR OTHER ORGANIZERS AND SPONSORS DISCLAIM ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ANY PARTICULAR PURPOSE, AND THE WARRANTY OF NON-INFRINGEMENT OF ANY THIRD PARTY'S INTELLECTUAL PROPERTY RIGHTS. IN NO EVENT SHALL ISABELLE GUYON AND/OR OTHER ORGANIZERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF SOFTWARE, DOCUMENTS, MATERIALS, PUBLICATIONS, OR INFORMATION MADE AVAILABLE FOR THE CHALLENGE. In case of dispute about prize attribution or possible exclusion from the competition, the participants agree not to take any legal action against the organizers or sponsors. Decisions can be appealed by submitting a letter to Vincent Lemaire, secretary of ChaLearn, and disputes will be resolved by the board of ChaLearn.

How do I get informed of possible changes in rules and of new events?

Give a valid contact email. Subscribe to the forum.

Where can I get additional help?

For questions of general interest, the participants may subscribe to our Google group and post messages on the forum by sending email to automl@googlegroups.com.


Credits

The organization of this challenge would not have been possible without the help of many people who are gratefully acknowledged.

Sponsors:

Any opinions, findings, and conclusions or recommendations expressed in material found on this website are those of their respective authors and do not necessarily reflect the views of the sponsors. The support of the sponsors does not give them any particular right to the software and findings of the participants.

MSR

Microsoft supported the organization of this challenge and donated the prizes.

IJCNN

This challenge is part of the official selection of IJCNN competitions.

LIF Archimede AMU ETHCDS

This project received additional support from the Laboratoire d'Informatique Fondamentale (LIF, UMR CNRS 7279) of the University of Aix Marseille, France, via the LabeX Archimede program, the Laboratoire de Recherche en Informatique of Paris Sud University, and INRIA-Saclay as part of the TIMCO project, as well as the support from the Paris-Saclay Center for Data Science (CDS). Computing resources were provided generously by Joachim Buhmann, ETH Zuerich.

Coordinators:

Isabelle Guyon, ChaLearn, Berkeley, California, USA
Evelyne Viegas, Microsoft Research, Redmond, Washington, USA

Data providers:

We selected the 30 datasets used in the challenge from among 72 datasets that were donated, or formatted from publicly available data, by:
Yindalon Aphinyanaphongs, New-York University, New-York, USA
Olivier Chapelle, Criteo, California, USA
Hugo Jair Escalante, INAOE, Puebla, Mexico    
Sergio Escalera, University of Barcelona, Catalonia, Spain
Isabelle Guyon, ChaLearn, Berkeley, California, USA
Zainab Iftikhar Malhi, University of Lahore, Pakistan
Vincent Lemaire, Orange research, Lannion, Brittany, France
Chih Jen Lin, National Taiwan University, Taiwan
Meysam Madani, University of Barcelona, Catalonia, Spain
Bisakha Ray, New-York University, New-York, USA
Mehreen Saeed, University of Lahore, Pakistan
Alexander Statnikov, American Express, New-York, USA
Gustavo Stolovitzky, IBM Computational Biology Center, Yorktown Heights, New York, USA
Hans-Jürgen Thiesen, Universität Rostock, Germany
Ioannis Tsamardinos, University of Crete, Greece

Committee members, advisors and beta testers:

Kristin Bennett, RPI, New-York, USA
Marc Boullé, Orange research, Lannion, Brittany, France
Cecile Capponi, University of Aix-Marseille, France
Richard Caruana, Microsoft Research, Redmond, Washington, USA
Gavin Cawley, University of East Anglia, UK
Gideon Dror, Yahoo!, Haifa, Israel
Hugo Jair Escalante, INAOE, Puebla, Mexico
Sergio Escalera, University of Barcelona, Catalonia, Spain
Cécile Germain, Université de Paris Sud, France
Tin Kam Ho, IBM Watson Group, Yorktown Heights, New-York, USA
Balázs Kégl, Université de Paris Sud, France
Hugo Larochelle, Université de Sherbrooke, Canada
Vincent Lemaire, Orange research, Lannion, Brittany, France
Chih Jen Lin, National Taiwan University, Taiwan
Víctor Ponce López, University of Barcelona, Catalonia, Spain
Nuria Macia, Universitat Ramon Llull, Barcelona, Spain
Simon Mercer, Microsoft, Redmond, Washington, USA
Florin Popescu, Fraunhofer FIRST, Berlin, Germany
Mehreen Saeed, University of Lahore, Pakistan
Michèle Sebag, Université de Paris Sud, France
Danny Silver, Acadia University, Wolfville, Nova Scotia, Canada
Alexander Statnikov, American Express, New-York, USA
Ioannis Tsamardinos, University of Crete, Greece

Codalab and other software development

Eric Camichael, Tivix, San Francisco, California, USA
Isabelle Guyon, ChaLearn, Berkeley, California, USA
Ivan Judson, Microsoft, Redmond, Washington, USA
Christophe Poulain, Microsoft Research, Redmond, Washington, USA
Percy Liang, Stanford University, Palo Alto, California, USA
Arthur Pesah, Lycée Henri IV, Paris, France
Lukasz Romaszko, ChaLearn, California, USA
Xavier Baro Sole, University of Barcelona, Barcelona, Spain
Sebastien Treguer, IA Lab / La Paillasse /ChaLearn, France
Erick Watson, Sabthok International, Redmond, Washington, USA
Flavio Zhingri, Tivix, New-York City, USA
Michael Zyskowski, Microsoft Research, Redmond, Washington, USA

 


Tweakathon0

Start: Dec. 8, 2014, midnight

Description: Practice phase on toy data drawn from well-known publicly available data. In preparation for phase 1, submit code capable of producing predictions on both VALIDATION AND TEST DATA. The phase 0 data are available from the 'Get Data' page. The leaderboard shows scores on phase 0 validation data only.

[+] Final0

Start: Feb. 14, 2015, midnight

Description: Results on test data of phase 0. There is NO NEW SUBMISSION. The results on test data of the last submission are shown. [+] Prize winning phase.

[+] AutoML1

Start: Feb. 15, 2015, midnight

Description: NOVICE phase on binary classification problems. Blind test of the code on NEW DATA: There is NO NEW SUBMISSION. The last code submitted in phase 0 is run automatically on the new phase 1 datasets. [+] Prize winning phase.

Tweakathon1

Start: Feb. 16, 2015, midnight

Description: Continue practicing on the same data (the phase 1 data are now available for download from the 'Get Data' page). In preparation for phase 2, submit code capable of producing predictions on both VALIDATION AND TEST DATA. The leaderboard shows scores on phase 1 validation data only.

[+] Final1

Start: June 14, 2015, 11:59 p.m.

Description: Results on test data of phase 1. There is NO NEW SUBMISSION. The results on test data of the last submission are shown. [+] Prize winning phase.

[+] AutoML2

Start: June 19, 2015, 8 a.m.

Description: INTERMEDIATE phase on multiclass classification problems. Blind test of the code on NEW DATA: There is NO NEW SUBMISSION. The last code submitted in phase 1 is run automatically on the new phase 2 datasets. [+] Prize winning phase.

Tweakathon2

Start: June 24, 2015, 11:30 p.m.

Description: Continue practicing on the same data (the data are now available for download from the 'Get Data' page). In preparation for phase 3, submit code capable of producing predictions on both VALIDATION AND TEST DATA. The leaderboard shows scores on phase 2 validation data only.

[+] Final2

Start: Nov. 14, 2015, 11:59 p.m.

Description: Results on test data of phase 2. There is NO NEW SUBMISSION. The results on test data of the last submission are shown. [+] Prize winning phase.

[+] AutoML3

Start: Nov. 15, 2015, 11:59 p.m.

Description: ADVANCED phase on multiclass and multilabel classification problems. Blind test of the code on NEW DATA: There is NO NEW SUBMISSION. The last code submitted in phase 2 is run automatically on the new phase 3 datasets. [+] Prize winning phase.

Tweakathon3

Start: Nov. 26, 2015, 11:59 p.m.

Description: Continue practicing on the same data. In preparation for phase 4, submit code capable of producing predictions on both VALIDATION AND TEST DATA. The leaderboard shows scores on validation data only.

[+] Final3

Start: Feb. 19, 2016, 11:59 p.m.

Description: Results on test data of phase 3. There is NO NEW SUBMISSION. The results on test data of the last submission are shown. [+] Prize winning phase.

[+] AutoML4

Start: Feb. 20, 2016, 11:59 p.m.

Description: EXPERT phase on classification and regression problems. Blind test of the code on NEW DATA: There is NO NEW SUBMISSION. The last code submitted in phase 3 is run automatically on the new phase 4 datasets. [+] Prize winning phase.

Tweakathon4

Start: Feb. 21, 2016, 11:59 p.m.

Description: Continue practicing on the same data. In preparation for phase 5, submit code capable of producing predictions on both VALIDATION AND TEST DATA. The leaderboard shows scores on phase 4 validation data only.

[+] Final4

Start: May 1, 2016, 11:59 p.m.

Description: Results on test data of phase 4. There is NO NEW SUBMISSION. The results on test data of the last submission are shown. [+] Prize winning phase.

[+] AutoML5

Start: May 2, 2016, 11:59 p.m.

Description: MASTER phase on classification and regression problems. Blind test of the code on NEW DATA: There is NO NEW SUBMISSION. The last code submitted in phase 4 is run automatically on the new phase 5 datasets. [+] Prize winning phase.

Competition Ends

June 25, 2016, midnight
