Fully Automatic Machine Learning
without ANY human intervention
Machine learning has achieved great success in online advertising, recommender systems, financial market analysis, computer vision, linguistics, bioinformatics and many other fields, but these achievements depend crucially on human machine-learning experts. In almost all successful machine learning applications, human experts are involved in every stage: transforming real-world problems into machine learning tasks, collecting data, engineering features, selecting or designing the model architecture, tuning the model's hyperparameters, evaluating the model's performance, deploying the machine learning system in online systems, and so on. Because the complexity of these tasks is often beyond non-experts, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. We call the research area that targets the progressive automation of machine learning AutoML (Automatic Machine Learning).
In this challenge you are asked to provide code that solves real-world classification problems without any human intervention. During the feedback phase you can submit code, which will be evaluated on public datasets, and you will receive immediate feedback on the performance of your method. Since the final goal of the challenge is AutoML, your last code submission in the feedback phase will be run on five other, private datasets. Participants will be ranked by their performance on these private datasets.
There is also a phase in which you can submit predictions, although the focus of the challenge is AutoML.
The goal of this challenge is to expose the research community to real-world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data may differ. The data are provided as preprocessed matrices so that participants can focus on classification, although participants are welcome to use additional feature extraction procedures (as long as they do not violate any rule of the challenge). All problems are binary classification problems and are assessed with the normalized Area Under the ROC Curve (AUC) metric, i.e., 2*AUC - 1.
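As an illustration of the metric, the sketch below computes AUC through its pairwise (Mann-Whitney) formulation: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counting one half. It then rescales the result to 2*AUC - 1. This is a plain-Python reference written for clarity, not the challenge's official scoring program; `sklearn.metrics.roc_auc_score` computes the same AUC much more efficiently.

```python
def auc(y_true, scores):
    """Pairwise (Mann-Whitney) AUC for binary labels in {0, 1}."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


def normalized_auc(y_true, scores):
    """Rescale AUC from [0, 1] to [-1, 1]: 0 is random, 1 is a perfect ranking."""
    return 2 * auc(y_true, scores) - 1


if __name__ == "__main__":
    print(normalized_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.5
```

With this normalization a random ranking scores about 0, a perfect ranking scores 1, and a fully inverted ranking scores -1.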
The identity of the datasets and the type of data are concealed, though their structure is revealed. In Phase 2, participants are ranked on each test dataset, the final score is the average of these rankings over all test datasets, and the winners are determined by the resulting overall ranking.
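The average-rank scoring can be sketched as follows. This is a hedged illustration: the dataset and participant names are hypothetical, every participant is assumed to have a score on every dataset, and tied scores are assumed to share the average of the positions they occupy, since the text does not say how ties are handled.

```python
from statistics import mean


def rank_scores(scores):
    """Rank participants on one dataset (higher score -> smaller, better rank).

    Assumption: tied scores share the average of the 1-based positions they occupy.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over every participant tied with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = shared
        i = j + 1
    return ranks


def final_ranking(results):
    """results: dataset -> {participant: score}; returns (participant, mean rank), best first."""
    per_dataset = [rank_scores(s) for s in results.values()]
    participants = list(per_dataset[0])
    avg = {p: mean(r[p] for r in per_dataset) for p in participants}
    return sorted(avg.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    # Hypothetical normalized-AUC scores for three participants on three datasets.
    results = {
        "set1": {"A": 0.8, "B": 0.6, "C": 0.7},
        "set2": {"A": 0.5, "B": 0.9, "C": 0.4},
        "set3": {"A": 0.7, "B": 0.65, "C": 0.75},
    }
    print([p for p, _ in final_ranking(results)])  # → ['A', 'C', 'B']
```

Averaging ranks rather than raw scores keeps one dataset with an unusually wide score spread from dominating the final result.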
The tasks are constrained by a time budget. The CodaLab platform provides computational resources shared by all participants. Each code submission is executed on a compute worker with 2 cores, 8 GB of memory and a 40 GB SSD, running Ubuntu. To ensure a fair evaluation, the execution time of each code submission is limited.
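Because execution time is limited, a submission has to manage its own time. One common pattern, sketched here under assumptions, is to try candidate models in order and stop before the remaining budget would be exceeded. The candidate interface (zero-argument callables returning a `(name, score)` pair), `budget_s` and `safety_margin_s` are all hypothetical; the actual time limit is set by the challenge platform, not by this sketch.

```python
import time


def run_within_budget(candidates, budget_s, safety_margin_s=5.0):
    """Fit candidates in order and return the best (name, score) found in time.

    candidates: hypothetical zero-argument callables, each returning (name, score).
    Returns None if no candidate could be started within the budget.
    """
    start = time.monotonic()
    best = None
    durations = []
    for fit in candidates:
        elapsed = time.monotonic() - start
        # Conservatively estimate the next candidate's cost as the slowest one so far.
        expected = max(durations, default=0.0)
        if elapsed + expected + safety_margin_s > budget_s:
            break  # starting another candidate would risk exceeding the budget
        t0 = time.monotonic()
        name, score = fit()
        durations.append(time.monotonic() - t0)
        if best is None or score > best[1]:
            best = (name, score)
    return best


if __name__ == "__main__":
    cands = [lambda: ("a", 0.3), lambda: ("b", 0.7), lambda: ("c", 0.5)]
    print(run_within_budget(cands, budget_s=60))
```

The safety margin reserves time for writing predictions after the search stops; a real submission would also need to handle a single candidate that runs longer than expected.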
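The challenge has two phases: a feedback phase, in which you can make multiple code submissions, and an AutoML phase (Phase 2), in which your last code submission is blindly tested on new datasets.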
During the feedback phase, the results of your last submission on test data are shown on the leaderboard. Prizes will be awarded in Phase 2 only.
Prizes sponsored by 4Paradigm will be granted to top-ranking participants, provided they comply with the rules of the challenge (see the Terms and Conditions section). The distribution of prizes will be as follows.
* A fraction of the prize amount may be used as a travel grant to attend the conference and workshop.
Feedback phase
Start: Nov. 30, 2017, midnight
Description: Practice on five datasets similar to those of the AutoML phase. You can make multiple code submissions. The results on test data are shown on the leaderboard.
AutoML phase
Start: March 12, 2018, 11:59 p.m.
Description: Your last code submission from the first phase will be blindly tested on new datasets. No new submissions can be made in this phase.
Competition ends: March 15, 2018, midnight