In many real-world machine learning applications, AutoML is strongly needed because developers often have limited machine learning expertise. Moreover, batches of data in many real-world applications may arrive daily, weekly, monthly, or yearly, and the data distributions change relatively slowly over time. This presents a continuous learning, or Lifelong Machine Learning, challenge for an AutoML system. Typical learning problems of this kind include customer relationship management, on-line advertising, recommendation, sentiment analysis, fraud detection, spam filtering, transportation monitoring, econometrics, patient monitoring, climate monitoring, manufacturing and so on. In this competition, which we are calling AutoML for Lifelong Machine Learning, large-scale datasets collected from some of these real-world applications will be used. Compared with previous AutoML competitions (http://automl.chalearn.org/), the focus of this competition is on drifting concepts, getting away from the simpler i.i.d. cases. Participants are invited to design a computer program capable of autonomously (without any human intervention) developing predictive models that are trained and evaluated in a lifelong machine learning setting under restricted resources and time.
Although the scenario is fairly standard, this challenge introduces the following difficulties:
• Algorithm scalability. We provide datasets that are 10-100 times larger than in previous challenges we organized.
• Varied feature types. The datasets include continuous, binary, ordinal, categorical, multi-value categorical, and temporal features, as well as categorical variables with a large number of values following a power law.
• Concept drift. The data distribution is slowly changing over time.
• Lifelong setting. All datasets included in this competition are chronologically split into 10 batches, meaning that the batches in each dataset are chronologically ordered (note that instances within one batch are not guaranteed to be chronologically ordered). The algorithms will be tested for their capability of adapting to changes in data distribution by exposing them to successive, chronologically ordered test batches. After testing on a batch, its labels will be revealed to the learning machines and incorporated in the training data.
The competition has three phases:
The Feedback phase is a phase with code submission. You can practice on 5 datasets that are of a similar nature to the datasets of the final phase. You can make a limited number of submissions. You can download the labeled training data and the unlabeled test set so that you can prepare your code submission at home and submit it later. The LAST code submission will be forwarded to the next phase for final testing.
The Test phase is the blind test phase with no submission. The last submission of the previous phase is blind tested on 5 new datasets. Your code will be trained and tested automatically, without human intervention. The final score will be determined by the result of the blind testing. However, if your submission fails, for example because of a memory overflow or a timeout, you can see the error log and submit it again within a specified number of submissions, so that everyone's code can run successfully in the final stage.
The AutoML phase is the blind test phase with no submission. The last submission of the previous phase is blind tested on 5 new datasets. Your code will be trained and tested automatically, without human intervention. The final score will be evaluated by the result of the blind testing.
The goal of this challenge is to expose the research community to real-world datasets that exhibit concept drift, under a lifelong ML evaluation scenario. Participants must develop AutoML solutions for dealing with these problems. All datasets are formatted in a uniform way, though the types of features may differ from dataset to dataset (numerical, categorical, multi-valued categorical and time features may be available). The data are provided as preprocessed matrices, so that participants can focus on classification, although participants are welcome to use additional feature transformation or extraction procedures (as long as they do not violate any rule of the challenge). All problems are binary classification tasks and are assessed with the Area Under the ROC Curve (AUC) metric. The datasets exhibit concept drift to varying degrees.
The identity of the datasets and the type of data are concealed, though their structure (number of patterns, inputs, feature types, etc.) is revealed. The final score in phase 2 (the phase considered for delivering prizes) will be the average rank of the participants' performance on individual datasets. Winners will be determined by ranking them according to the final score (smallest average rank is best). The overall duration of solutions will be considered as the tie-breaking criterion.
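The average-rank scoring described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `average_ranks` and the input layout are hypothetical, and giving tied participants the mean of the ranks they span is an assumption (the rules above only state that overall duration breaks ties in the final score).

```python
from typing import Dict, List


def average_ranks(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Return each participant's average rank over datasets (lower is better).

    scores[participant] is a list of per-dataset AUC values, with datasets
    in the same order for every participant (hypothetical layout).
    """
    participants = list(scores)
    n_datasets = len(next(iter(scores.values())))
    totals = {p: 0.0 for p in participants}

    for d in range(n_datasets):
        # Higher AUC is better, so rank 1 goes to the highest score.
        ordered = sorted(participants, key=lambda p: scores[p][d], reverse=True)
        i = 0
        while i < len(ordered):
            # Find the group of participants tied with position i.
            j = i
            while j + 1 < len(ordered) and scores[ordered[j + 1]][d] == scores[ordered[i]][d]:
                j += 1
            # Assumption: tied participants share the mean of their 1-based ranks.
            shared_rank = (i + j) / 2 + 1
            for k in range(i, j + 1):
                totals[ordered[k]] += shared_rank
            i = j + 1

    return {p: totals[p] / n_datasets for p in participants}
```

For instance, with two datasets and three made-up participants, `average_ranks({"a": [0.9, 0.8], "b": [0.8, 0.9], "c": [0.7, 0.7]})` gives `a` and `b` an average rank of 1.5 each and `c` an average rank of 3.0.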
The tasks are constrained by a time budget, where each dataset has a different (not cumulative) budget. The Codalab platform provides computational resources shared by all participants. Each code submission will be executed in a compute worker with the following characteristics: 4 cores / 16 GB memory / 80 GB SSD with Ubuntu OS. To ensure the fairness of the evaluation, when a code submission is evaluated, its execution time is limited (details on the time for each dataset are provided in the input data).
A simulated lifelong ML evaluation scenario is considered (see the figure below). Each dataset is divided into 10 batches of approximately the same number of instances. Instances are chronologically sorted in each batch (and across batches). The code of participants will have access to the data and labels in the first batch (considered a training batch). After that, participants must make predictions for the next batch, the i-th batch (the participant's code will have access to the data of the new batch), and performance will be evaluated. Next, labels for the i-th batch will be revealed to the code, and participants can update their model before making predictions for the next batch, i+1. Your code must implement at least two methods: fitting/training (using the available data at time i; this method could also store data, perform instance selection, subsampling, etc.), and prediction (your model makes predictions for an unlabeled batch). Please look at the sample code submission included in the Starting kit for guidance on how to design your model/code. Average performance across batches will be used for evaluation of each dataset.
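The protocol above (train on the first batch, then predict, score and update for each later batch) can be sketched in plain Python. Everything named here is hypothetical and is not the interface of the starting kit: `ClassMeanModel` is a toy placeholder model, `auc` is a small rank-based AUC implementation, and `evaluate_lifelong` assumes that `fit` is called with only the newly labeled batch, leaving it to the model to decide what to store.

```python
from typing import List, Sequence, Tuple

Batch = Tuple[List[List[float]], List[int]]  # (feature rows, binary labels)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the rank-sum formulation; tied scores get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


class ClassMeanModel:
    """Toy placeholder: scores instances using per-class means of feature 0."""

    def __init__(self) -> None:
        self.sums = {0: 0.0, 1: 0.0}
        self.counts = {0: 0, 1: 0}

    def fit(self, X: List[List[float]], y: List[int]) -> None:
        # Incremental update: accumulate statistics from the newly labeled batch.
        for row, label in zip(X, y):
            self.sums[label] += row[0]
            self.counts[label] += 1

    def predict(self, X: List[List[float]]) -> List[float]:
        mean_neg = self.sums[0] / max(self.counts[0], 1)
        mean_pos = self.sums[1] / max(self.counts[1], 1)
        direction = 1.0 if mean_pos >= mean_neg else -1.0
        midpoint = (mean_pos + mean_neg) / 2
        return [direction * (row[0] - midpoint) for row in X]


def evaluate_lifelong(model: ClassMeanModel, batches: List[Batch]) -> float:
    """Train on batch 0, then predict / score / update on each later batch."""
    first_X, first_y = batches[0]
    model.fit(first_X, first_y)
    per_batch_auc = []
    for X, y in batches[1:]:
        scores = model.predict(X)           # labels are still hidden here
        per_batch_auc.append(auc(y, scores))
        model.fit(X, y)                     # labels revealed, model updated
    return sum(per_batch_auc) / len(per_batch_auc)
```

A real submission would replace the toy model with an actual learner and would also have to respect the per-dataset time budget, which this sketch does not model.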
The challenge has three phases:
During the feedback phase, the results of your last submission on test data are shown on the leaderboard. Prizes will be awarded in Phase 2 only.
Important: For Phase 1, we provide you with the first 4 test batches (in addition to the labeled training batch) so that you can more easily design your models at home. For the final phase, only the training batch will be made available to your code initially (then each test batch will be progressively delivered to your code as outlined above).
See the Terms and Conditions site
Prizes sponsored by 4paradigm will be granted to top-ranking participants (execution time of your submission will be used as the tie-breaking criterion), provided they comply with the rules of the challenge (see the Terms and Conditions section). The distribution of prizes will be as follows.
To be eligible for prizes you must: publicly release your code under an open source license, submit a factsheet describing your solution, present the solution in the competition session at PAKDD2019, sign the prize acceptance form, and adhere to the rules of the challenge.
Start: Dec. 25, 2018, midnight
Description: Practice on five datasets similar to those of the second phase. Code submission only.
Start: March 15, 2019, midnight
Description: No new submissions. Your last submission of the first phase will be blindly tested.
April 30, 2019, 2 p.m.