Here is a sample result submission for round 4.
Here is a sample code submission (works for all rounds).
More information about the new version of the starting kit can be found [here].
This challenge is concerned with regression and classification problems (binary, multi-class, or multi-label) from data already formatted in fixed-length feature-vector representations. Each task is associated with a dataset coming from a real application. The domains of application are very diverse and are drawn from: biology and medicine, ecology, energy and sustainability management, image, text, audio, speech, video and other sensor data processing, internet social media management and advertising, market analysis and financial prediction.
All datasets present themselves in the form of data matrices with samples in lines and features (or variables) in columns. For instance, in a medical application, the samples may represent patient records and the features may represent results of laboratory analyses. The goal is to predict a target value, for instance the diagnosis "diseased" or "healthy" in the case of a medical diagnosis problem.
The identity of the datasets and the features is concealed (except in round 0) to avoid the use of domain knowledge and push the participants to design fully automated machine learning solutions.
In addition, the tasks are constrained by:
Task, scoring metric and time budget are provided with the data, in a special "info" file.
The Codalab platform provides computational resources shared by all participants. To ensure the fairness of the evaluation, when a code submission is evaluated, its execution time is limited to a given Time Budget, which varies from dataset to dataset. The time budget is provided with each dataset in its "info" file. The organizers reserve the right to adjust the time budget by supplying the participants with new info files.
The participants who submit results (instead of code) are NOT constrained by the Time Budget, since they can run their code on their own platform. This may be advantageous for entries counting towards the Final phases (immediately following a Tweakathon). The participants wishing to also enter the AutoML phases, which require submitting code, can submit BOTH results and code (simultaneously). See the Instructions for details.
The participants must turn in prediction values matching as closely as possible the target value, in the form of:
The Starting Kit contains the Python implementation of all scoring metrics used to evaluate the entries. Each dataset has its own metric (scoring criterion), specified in its "info" file. All scores are re-normalized such that the expected value of the score for a "trivial guess" based on class prior probabilities is 0 and the optimal score is 1. Multi-label problems are treated as multiple binary classification problems and are evaluated by the average of the scores of each binary classification sub-problem.
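As an illustration of the re-normalization described above, here is a minimal sketch. It assumes a simple affine rescaling, which is one way to satisfy the two stated anchor points (trivial guess at 0, optimal score at 1); the function names `normalize_score` and `multilabel_score` are hypothetical and are not taken from the Starting Kit.

```python
from statistics import mean


def normalize_score(raw, trivial, optimal=1.0):
    """Rescale a raw score so that the trivial guess maps to 0 and the optimum to 1.

    Assumption: an affine rescaling. The text only states the two anchor points.
    """
    return (raw - trivial) / (optimal - trivial)


def multilabel_score(per_label_scores):
    """Average the (already normalized) scores of the binary sub-problems."""
    return mean(per_label_scores)


# A raw score of 0.75 when the trivial guess scores 0.5 lands halfway to the optimum.
print(normalize_score(0.75, 0.5))
# A multi-label task is scored by averaging its binary sub-problems.
print(multilabel_score([0.2, 0.4, 0.6]))
```

With this convention, a negative normalized score means the entry did worse than the trivial guess.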
The scores are taken from the following list:
We note that for R2, ABS, and PAC the normalization uses a "trivial guess" corresponding to the average target value, q_i = <y_i> or q_ik = <y_ik>. In contrast, for BAC, AUC, and F1 the "trivial guess" is a random prediction of one of the classes with uniform probability.
In all formulas the brackets < . > designate the average over all P samples indexed by i: <y_i> = (1/P) sum_i y_i. Only R2 and ABS make sense for regression; we compute the other scores for completeness by replacing the target values with binary values after thresholding them at mid-range.
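To make the role of the average target value concrete, the sketch below computes R2 and ABS in a form where predicting the mean of the targets scores 0 and a perfect prediction scores 1. The exact formulas are assumptions (the usual "one minus normalized error" definitions, chosen to be consistent with the normalization described above); the function names are hypothetical and this is not the Starting Kit code.

```python
def average(values):
    """The < . > operator: mean over all P samples."""
    values = list(values)
    return sum(values) / len(values)


def r2_score(y, q):
    """R2 = 1 - MSE / VAR, where VAR is the error of the trivial guess q_i = <y_i>."""
    y_mean = average(y)
    mse = average((yi - qi) ** 2 for yi, qi in zip(y, q))
    var = average((yi - y_mean) ** 2 for yi in y)
    return 1.0 - mse / var


def abs_score(y, q):
    """ABS = 1 - MAD / DEV, the absolute-error analogue of R2."""
    y_mean = average(y)
    mad = average(abs(yi - qi) for yi, qi in zip(y, q))
    dev = average(abs(yi - y_mean) for yi in y)
    return 1.0 - mad / dev


y = [1.0, 2.0, 3.0, 4.0]
trivial = [average(y)] * len(y)
print(r2_score(y, trivial), abs_score(y, trivial))  # trivial guess
print(r2_score(y, y), abs_score(y, y))              # perfect prediction
```

Both scores are undefined when all target values are identical, since the denominator is then zero.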
Each round includes five datasets from different application domains, spanning various levels of difficulty. The participants (or their submitted programs) provide prediction results for the withheld target values (called "solution"), for all 5 datasets. Independently of any intervention of the participants, the original version of the scoring program supplied by the organizers is run on the server to compute the scores. For each dataset, the participants are ranked in decreasing order of performance for the prescribed scoring metric associated with the given task. The overall score is computed by averaging the ranks over all 5 datasets and shown in the column <rank> on the leaderboard.
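The rank averaging described above can be sketched as follows. The participant names and scores are made up for illustration, `average_ranks` is a hypothetical helper, and the treatment of ties (here, broken by input order) is an assumption, since the text does not specify it.

```python
def average_ranks(scores):
    """scores maps each participant to a list of scores, one per dataset.

    Returns the mean rank of each participant (1 = best), ranking in
    decreasing order of score on each dataset.
    """
    participants = list(scores)
    n_datasets = len(next(iter(scores.values())))
    totals = {p: 0.0 for p in participants}
    for d in range(n_datasets):
        # Higher score is better, so sort in decreasing order.
        ordered = sorted(participants, key=lambda p: scores[p][d], reverse=True)
        for rank, p in enumerate(ordered, start=1):
            totals[p] += rank
    return {p: totals[p] / n_datasets for p in participants}


example = {
    "alice": [0.9, 0.8, 0.7, 0.6, 0.5],
    "bob": [0.8, 0.9, 0.6, 0.7, 0.4],
    "carol": [0.1, 0.2, 0.3, 0.4, 0.3],
}
print(average_ranks(example))
```

Because only ranks are averaged, a large margin on one dataset counts no more than a narrow win.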
We ask the participants to test their systems regularly during training and save intermediate prediction results, which will allow us to plot learning curves (performance as a function of training time). Using such learning curves, we will adjust the "time budget" in subsequent rounds (possibly giving you more computational time!). However, only the last point (corresponding to the file with the largest order number) is used for leaderboard calculations.
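One possible way to organize such intermediate predictions is sketched below: training and prediction alternate until the time budget is nearly spent, and each prediction file carries an increasing order number. The file naming pattern, the stopping rule, and the callables `fit_more` and `predict` are all assumptions made for illustration; the required file format is given in the Instructions, not here.

```python
import time
from pathlib import Path


def train_with_checkpoints(fit_more, predict, out_dir, basename,
                           time_budget, max_steps=100):
    """Alternate training and prediction, writing one numbered file per step.

    fit_more: callable that improves the model a little (hypothetical).
    predict:  callable returning the current list of predictions (hypothetical).
    """
    start = time.monotonic()
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    longest_step = 0.0
    for order in range(max_steps):
        step_start = time.monotonic()
        # Stop if a step as long as the slowest one so far would overrun the budget.
        if step_start - start + longest_step > time_budget:
            break
        fit_more()
        # Hypothetical naming scheme: the order number grows with training time.
        path = out_dir / f"{basename}_{order:03d}.predict"
        path.write_text("\n".join(str(v) for v in predict()) + "\n")
        written.append(path)
        longest_step = max(longest_step, time.monotonic() - step_start)
    return written
```

The file with the largest order number is the one that would count for the leaderboard, so each step should overwrite nothing and leave earlier files in place.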
The results of the LAST submission made are used to compute the leaderboard results (so you must re-submit an older entry that you prefer if you want it to count as your final entry). This is what is meant by “Leaderboard modifying disallowed”. In phases marked with a [+], the participants with the three smallest <rank> are eligible for prizes, if they meet the Terms and Conditions.
For each dataset, a labeled training set is provided for training and two unlabeled sets (validation set and test set) are provided for testing.
The challenge is run in multiple Phases grouped in rounds, alternating AutoML contests and Tweakathons. There are six rounds: Round 0 (Preparation round), followed by 5 rounds of progressive difficulty (Novice, Intermediate, Advanced, Expert, and Master). Except for round 0 (preparation) and round 5 (termination), all rounds include 3 phases, alternating Tweakathons and AutoML contests:
| Phase in round [n] | Goal | Duration | Submissions | Data | Leaderboard scores | Prizes |
| --- | --- | --- | --- | --- | --- | --- |
| [+] AutoML[n] | Blind test of code | Short | NONE (code migrated) | New datasets, not downloadable | Test set results | Yes |
| Tweakathon[n] | Manual tweaking | 1 month | Code and/or results | Datasets downloadable | Validation set results | No |
| [+] Final[n] | Results of Tweakathon revealed | Short | NONE (results migrated) | NA | Test set results | Yes |
The results of the last submission made are shown on the leaderboard. Submissions are made in Tweakathon phases only. The last submission of one phase migrates automatically to the next one. If code is submitted, this makes it possible to participate in subsequent phases without making new submissions. Prizes are awarded for phases marked with a [+], during which there is NO submission. The total prize pool is $30,000 (see Rewards and Terms and Conditions for details).
To participate in the AutoML[n] phase, code must be submitted in Tweakathon[n-1]. To participate in Final[n], code or results must be submitted in Tweakathon[n]. If both code and (well-formatted) results are submitted in Tweakathon[n], the results are used for scoring in Tweakathon[n] and Final[n] rather than re-running the code. The code is executed when results are unavailable or not well formatted. Hence there is no disadvantage to submitting both results and code. There is no obligation to submit the code that produced the submitted results. Using mixed submissions of results and code, different methods can be used to enter the Tweakathon/Final phases and to enter the AutoML phases.
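The rule for mixed submissions can be summarized by the small decision function below. This is only a sketch of the logic as described in the paragraph above; `scoring_source` and its arguments are hypothetical names, not part of the platform.

```python
def scoring_source(has_code, has_results, results_well_formatted):
    """Return which part of a Tweakathon[n] submission is used for scoring.

    Well-formatted results take precedence; the code is run only as a
    fallback. Returns None when nothing usable was submitted.
    """
    if has_results and results_well_formatted:
        return "results"
    if has_code:
        return "code"
    return None


print(scoring_source(has_code=True, has_results=True, results_well_formatted=True))
print(scoring_source(has_code=True, has_results=True, results_well_formatted=False))
```

Entry into the AutoML[n] phase is a separate matter: it depends only on whether code was submitted in Tweakathon[n-1].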
There are 5 datasets in each round spanning a range of difficulties:
We will progressively introduce difficulties from round to round (each round cumulating all the difficulties of the previous ones plus new ones). Some datasets may be recycled from previous challenges, but reformatted into new representations, except for the final MASTER round, which includes only completely new data.
Start: Dec. 7, 2015, midnight
Description: Continue practicing on the same data. In preparation for round 5, submit code capable of producing predictions on both VALIDATION AND TEST DATA. The leaderboard shows scores on validation data only.
Start: Dec. 31, 2050, 11 p.m.
Description: Results on test data of round 4. There is NO NEW SUBMISSION. The results on test data of the last submission are shown. [+] Prize winning phase.