This is the well-known Iris dataset from Fisher's classic paper (Fisher, 1936). The dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
References and credits:
R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, Part II, 179-188 (1936).
The competition protocol was designed by Isabelle Guyon.
The starting kit was adapted from a Jupyter notebook designed by Balazs Kegl.
The problem is a multiclass classification problem. Each sample (an Iris) is characterized by its sepal and petal width and length (4 features). You must predict the Iris categories: setosa, virginica, or versicolor.
For training, you are given a data matrix X_train of dimension num_training_samples x num_features and an array y_train of labels of dimension num_training_samples. You must train a model that predicts the labels for two test matrices, X_valid and X_test.
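As a rough illustration of this workflow, the sketch below trains a nearest-centroid classifier on X_train and y_train and predicts labels for X_valid and X_test. The classifier choice, the helper names fit_centroids and predict, and the toy measurements are all assumptions made for this example; they are not taken from the starting kit or from the real competition data.

```python
from collections import defaultdict


def fit_centroids(X_train, y_train):
    """Return {label: mean feature vector} computed from the training data."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(X_train, y_train):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(centroids, X):
    """Assign each row to the label of the closest centroid
    (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(centroids, key=lambda label: sq_dist(row, centroids[label]))
            for row in X]


# Toy values for illustration only (hypothetical, NOT the competition data).
# Columns: sepal length, sepal width, petal length, petal width.
X_train = [
    [5.0, 3.4, 1.5, 0.2],
    [4.8, 3.0, 1.4, 0.2],
    [6.0, 2.8, 4.5, 1.4],
    [5.8, 2.7, 4.1, 1.3],
    [6.8, 3.0, 5.6, 2.1],
    [6.5, 3.1, 5.8, 2.2],
]
y_train = ["setosa", "setosa", "versicolor", "versicolor",
           "virginica", "virginica"]
X_valid = [[5.1, 3.5, 1.4, 0.3], [6.1, 2.9, 4.6, 1.4]]
X_test = [[6.7, 3.2, 5.7, 2.3]]

centroids = fit_centroids(X_train, y_train)
y_valid_pred = predict(centroids, X_valid)  # one label per row of X_valid
y_test_pred = predict(centroids, X_test)    # one label per row of X_test
```

Any classifier that maps a num_samples x num_features matrix to one label per row fits the same pattern; nearest centroid is used here only because it needs no external library.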
There are 2 phases: a development phase followed by a final phase.
This sample competition allows you to submit either prediction results, a trained model, or an untrained model.
The submissions are evaluated using the accuracy metric.
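Accuracy is commonly defined as the fraction of samples whose predicted label matches the true label. The sketch below follows that usual definition; the function name accuracy and the example labels are assumptions for illustration, and the competition's actual scoring program may differ in its details.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("accuracy is undefined for empty inputs")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels for illustration: 3 of the 4 predictions are correct.
y_true = ["setosa", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "versicolor", "versicolor", "virginica"]
score = accuracy(y_true, y_pred)
```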
Submissions must be made before the end of phase 1. You may make up to 5 submissions per day and 100 in total.
This challenge is for educational purposes only and no prizes are granted. It is governed by the general ChaLearn contest rules.
Start: Nov. 15, 2018, midnight
Description: Development phase: tune your models and submit prediction results, trained model, or untrained model.
Start: April 30, 2050, midnight
Description: Final phase (no submission, your last submission from the previous phase is automatically forwarded).