“Explosive growth of cities globally signifies the demographic transition from rural to urban, and is associated with shifts from an agriculture-based economy to mass industry, technology, and service. In principle, cities offer a more favorable setting for the resolution of social and environmental problems than rural areas. Cities generate jobs and income, and deliver education, health care and other services. Cities also present opportunities for social mobilization and women's empowerment.” - the World Bank.
Given the importance of urban growth, this project aims to use crowdsourcing to develop a data-driven model of future urban growth as a function of socio-economic indicators. The goal is to find a minimal set of indicators that explains, to a useful degree, the population growth in urban areas. To this end we designed a challenge on how best to model this indicator using the fewest features while keeping good accuracy. The winning model can help governments craft better policies to improve well-being and to regulate urban growth, avoiding the problems brought by either extreme: stagnation and overgrowth. The model can equally serve businesses. Real estate agencies, for example, can use these results to better plan their resources for the upcoming year.
The novelty of this challenge lies in using crowdsourcing for feature selection (an NP-hard problem) on geographic, social and economic indicators for urban growth modelling. Human insight is very important here because of the shape of the data: the dataset has many more features than examples, and the number of possible feature subsets is far greater than the largest estimate of the total number of atoms in the universe.
Your challenge is to predict the urban growth of a given year from a large amount of data about the previous year.
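To give a sense of scale for the subset count mentioned above, here is a rough back-of-the-envelope check. It assumes the 14944 features reported for the preprocessed dataset and takes 10^82 as an upper-end estimate of the number of atoms in the observable universe; that figure is an assumption for illustration, not a value from this challenge.

```python
import math

# Feature count of the preprocessed dataset, as stated on this page.
n_features = 14944

# Each feature is either kept or dropped, so there are 2**n_features subsets.
# We compare orders of magnitude (log10) instead of printing the full number.
log10_subsets = n_features * math.log10(2)

# Assumed upper-end estimate of atoms in the observable universe (~10^82).
log10_atoms = 82

print(f"number of feature subsets ~ 10^{log10_subsets:.0f}")
print(f"assumed number of atoms   ~ 10^{log10_atoms}")
print("subsets exceed atoms:", log10_subsets > log10_atoms)
```

Exhaustive search over all subsets is therefore out of the question, which is why heuristics and human insight matter here.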
Mykola LIASHUHA
Guilherme SALES SANTA CRUZ
Louis LAMALLE
Mohamed Salem MESSOUD
Ousmane Cissé
Romain JAMINET
mykola.liashuha@gmail.com
For this task we use tabular data. The data comes from World Bank Data and contains the main socio-economic indicators of the countries. The full data covers 60 years for 126 countries and geographic areas (e.g. ‘Europe’), i.e. 16104 rows and 17517 columns per area per year. Link to the World Bank website: here.
After preprocessing, our dataset contains 15290 samples with 14944 features. The training set contains 10232 rows, the validation set 2528 rows, and the test set 2529 rows, each with 14944 columns.
Go to the FILES tab to download the data and the starting kit. The starting kit contains a very small subset of the data for debugging purposes. To prepare a challenge submission, you need to use the large dataset, which you download separately. For the large dataset, you do NOT have the labels!
This research was [partially] supported by Labex DigiCosme (project ANR11LABEX0045DIGICOSME) operated by ANR as part of the program Investissement d’Avenir Idex ParisSaclay (ANR11IDEX000302)
The criteria for the task are, in that order, the smallest number of features used by the model and the highest accuracy. The minimum accuracy, measured by R², is 0.8. The R² metric is used because it provides a normalized score for regression problems.
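As an illustration of the metric, the following is a minimal sketch of the standard coefficient of determination, R² = 1 - SS_res / SS_tot, together with the 0.8 threshold mentioned above. The function name `r2_score`, the constant `MIN_R2` and the toy values are illustrative choices; the challenge's own scoring program may differ in its details.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined when y_true is constant (SS_tot == 0); not handled in this sketch.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Minimum accuracy stated in the challenge description.
MIN_R2 = 0.8

# Toy values for illustration only (not challenge data).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

score = r2_score(y_true, y_pred)
print("R2:", round(score, 4))
print("meets minimum:", score >= MIN_R2)
```

A perfect prediction scores 1, and always predicting the mean of the labels scores 0, which is what makes the score comparable across regression targets of different scale.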
You are given for training a data matrix X_train of dimension 10232 x 14944 (num_training_samples x num_features) and an array y_train of labels of dimension num_training_samples. You must train a model which predicts the labels for two test matrices X_valid and X_test.
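One simple way to approach the feature-selection side of this task is a correlation filter: rank each column of the training matrix by its absolute Pearson correlation with the labels and keep the top k. The sketch below uses only the standard library and a small synthetic stand-in for X_train and y_train; the helper names `pearson` and `top_k_features` are hypothetical, and this is neither the starting kit's method nor a recommended solution.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def top_k_features(X, y, k):
    """Indices of the k columns of X with the largest |correlation| with y."""
    n_cols = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_cols)]
    return sorted(range(n_cols), key=lambda j: scores[j], reverse=True)[:k]

# Synthetic stand-in for X_train / y_train: 200 samples, 5 features,
# where only column 2 actually drives the label.
rng = random.Random(0)
X_toy = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y_toy = [3.0 * row[2] + rng.gauss(0, 0.1) for row in X_toy]

selected = top_k_features(X_toy, y_toy, k=1)
print("selected feature indices:", selected)
```

A filter like this looks at features one at a time, so it can miss indicators that are only informative in combination. On the real data you would fit a regressor on the selected columns of X_train and apply the same column selection to X_valid and X_test before predicting.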
There are 2 phases: a development phase and a final phase.
This sample competition allows you to submit either prediction results, a trained model, or an untrained model.
Submissions are evaluated on two metrics: accuracy (R²) and the number of features used.
Submissions must be made before the end of phase 1. You may submit 5 submissions every day and 100 in total.
This challenge is for educational purposes only and no prizes are granted. It is governed by the general ChaLearn contest rules.
Start: Feb. 21, 2021, midnight
Description: Development phase: tune your models and submit prediction results, trained model, or untrained model.
Start: April 15, 2021, midnight
Description: Final phase (no submission, your last submission from the previous phase is automatically forwarded).
# | Username | Score |
---|---|---|
1 | acrulopez | 0.9990 |
2 | Thibaut | 0.9350 |
3 | moha3lans | 0.9292 |