The chemical company Chems-R-Us has hired your company to consult on data analytics. Chems-R-Us' goal is to create a model that predicts the ready biodegradability of chemicals from molecular descriptors. A data set of molecular descriptors and experimental biodegradation values for 1055 chemicals was collected from the webpage of the National Institute of Technology and Evaluation of Japan (NITE). Your task is to develop a computational model that predicts the ready biodegradability of molecules.
This challenge consists of two problems:
Binary classification: Each data row is labeled (-1) or (1). You have to train a predictive model on the training dataset that predicts the labels of the test dataset as accurately as possible.
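Because the classification task is scored by AUC, which depends only on how the test rows are ranked, a continuous score per row carries more information than hard -1/1 labels. The sketch below illustrates this with a nearest-centroid scorer in plain Python. It is an illustrative stand-in, not a recommended or official method for this challenge; the function names `fit_centroids` and `centroid_scores` are hypothetical, the toy data is invented, and whether the platform accepts real-valued scores instead of labels is an assumption to check against the submission instructions.

```python
import math
from typing import Dict, List, Sequence


def fit_centroids(X: Sequence[Sequence[float]], y: Sequence[int]) -> Dict[int, List[float]]:
    """Mean feature vector of each class; labels are assumed to be -1 or 1."""
    n_features = len(X[0])
    sums = {-1: [0.0] * n_features, 1: [0.0] * n_features}
    counts = {-1: 0, 1: 0}
    for row, label in zip(X, y):
        counts[label] += 1  # any label other than -1/1 raises KeyError
        for j, value in enumerate(row):
            sums[label][j] += value
    if counts[-1] == 0 or counts[1] == 0:
        raise ValueError("training data must contain both classes")
    return {c: [s / counts[c] for s in sums[c]] for c in (-1, 1)}


def centroid_scores(centroids: Dict[int, List[float]], X: Sequence[Sequence[float]]) -> List[float]:
    """Positive score: row is closer to the class-1 centroid; negative: closer to class -1."""
    return [math.dist(row, centroids[-1]) - math.dist(row, centroids[1]) for row in X]


# Hypothetical toy data for illustration only (not the NITE descriptors).
X_train = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
y_train = [-1, -1, 1, 1]
centroids = fit_centroids(X_train, y_train)
test_scores = centroid_scores(centroids, [[0.1, 0.0], [1.0, 0.9]])
hard_labels = [1 if s > 0 else -1 for s in test_scores]
print(hard_labels)  # [-1, 1]
```

Thresholding the scores at 0 recovers -1/1 labels if those are required, but doing so discards the ordering that AUC rewards.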
Feature selection: Some of the 168 features are fake, that is, randomly generated variables that do not help predict the class. The goal of this problem is therefore to classify each feature as fake (0) or real (1).
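One simple way to produce a per-feature score is a univariate filter: a randomly generated feature should be nearly uncorrelated with the label, so the absolute correlation between each feature column and the -1/1 label can serve as a "realness" score. This is a minimal sketch, assuming the submission is one score per feature where higher means more likely real; the helper names `pearson` and `realness_scores` and the toy data are hypothetical. A univariate filter can miss real features that only help in combination with others, so treat it as a baseline rather than a complete method.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((z - mean_b) ** 2 for z in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no information about the label
    return cov / math.sqrt(var_a * var_b)


def realness_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """One score per feature column: |correlation with the label|. Higher = more likely real."""
    n_features = len(X[0])
    return [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]


# Hypothetical toy data: column 0 tracks the label, column 1 does not.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 5.0], [4.0, 3.0]]
y = [-1, -1, 1, 1]
print(realness_scores(X, y))  # first feature ≈ 0.894, second 0.0
```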
For both problems, the evaluation metric is the area under the ROC curve (AUC).
The Receiver Operating Characteristic (ROC) curve is a graphical plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied. It is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.
The AUC is the area under this curve: it equals 1 for a perfect ranking of positives above negatives and about 0.5 for random guessing.
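The area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half (the Mann-Whitney formulation). The sketch below computes it directly from that definition. It is an illustration, not the platform's scoring program; the name `roc_auc` is hypothetical, and the pairwise loop is chosen for clarity rather than speed, which is still fine at the scale of roughly a thousand rows. With `positive_label=1` it works both for the -1/1 row labels and for the 0/1 feature labels.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float], positive_label: int = 1) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties count 0.5."""
    pos = [s for lab, s in zip(labels, scores) if lab == positive_label]
    neg = [s for lab, s in zip(labels, scores) if lab != positive_label]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([-1, -1, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In the example, three of the four positive/negative pairs are ordered correctly (0.35 is ranked below the negative scored 0.4), giving 3/4 = 0.75.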
You may make up to 10 submissions per day and 100 in total.
This challenge is for educational purposes only, no prizes are awarded.
This challenge is governed by the general ChaLearn contest rules.
Start: April 5, 2018, midnight