Data Science Game 2016: Final

Organized by antoine77340

First phase

Public phase
Oct. 15, 2014, midnight UTC


Competition Ends
Sept. 11, 2016, 12:30 p.m. UTC

Welcome to the final of the Data Science Game 2016!

Challenge description:

The challenge consists of developing an insurance quote conversion model. Every time someone requests an online quote through AXA's platforms or specialized comparison websites, a large amount of data is stored on AXA's servers. Some of these quotes are converted into actual contracts and some are not. Your goal is to predict whether a quote will be converted. But be careful: many clients request multiple quotes, on different days and sometimes through multiple channels.

The dataset contains automobile insurance quotes from an AXA subsidiary. The target variable indicates whether the person who requested a given quote bought the associated insurance policy. The data consists of requests for quotes received by AXA from different brokers and comparison websites, along with some additional data (internal to AXA or from external sources). When a quote is submitted via a comparison website, it can reach AXA from different brokers simultaneously, so a single user can be associated with multiple quotes.


Beginning of the challenge: Saturday at 7:30 am - Number of submissions: unlimited

End of the public leaderboard phase: Sunday at 1:00 pm Paris time

Beginning of the private leaderboard phase: Sunday at 1:00 pm Paris time - Number of submissions: 2

End of the private leaderboard phase and competition: Sunday at 2:00 pm Paris time
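The schedule above is given in Paris time, while the phase list on this page uses UTC. The sketch below converts between the two, assuming Paris is on Central European Summer Time (UTC+2) on Sunday, Sept. 11, 2016, and using a fixed offset rather than a time-zone database. The helper name `paris_to_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Paris observes CEST (UTC+2) in September 2016.
PARIS_CEST = timezone(timedelta(hours=2), "CEST")


def paris_to_utc(year, month, day, hour, minute=0):
    """Convert a Paris wall-clock time (assumed CEST) to UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=PARIS_CEST)
    return local.astimezone(timezone.utc)


# Sunday, Sept. 11, 2016: start and end of the private leaderboard phase
print(paris_to_utc(2016, 9, 11, 13))  # 2016-09-11 11:00:00+00:00
print(paris_to_utc(2016, 9, 11, 14))  # 2016-09-11 12:00:00+00:00
```

Under this assumption, 1:00 pm Paris time corresponds to the 11 a.m. UTC start listed for the private phase.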

Column Name - Explanation

CustomerMD5Key - Key associated with the customer
ReceivedDateTime - Date when the request/quote was received by AXA
SCID - Broker ID
SelectedPackage - Policy type
FirstDriverMaritalStatus - Marital status of the first registered driver
CarAnnualMileage - Annual mileage of the car
CarFuelId - Type of fuel used
CarUsageId - Purpose of vehicle use: business, pleasure, both, or other
FirstDriverAge - Age of the first registered driver
CarInsuredValue - Insured value of the car
CarAge - Age of the insured car
FirstDriverDrivingLicenseNumberY - First driver's driving license number
VoluntaryExcess - AXA does not cover the full cost of an accident: the customer chooses an amount they are willing to pay themselves (e.g. 100 euros), and AXA covers anything above it. Other possible values include 50, 1000, 2000, etc.
CarParkingTypeId - Parking type (garage, off-street, etc.); there are approximately 5-10 different options
PolicyHolderNoClaimDiscountYears - Number of years the customer has not had any accidents
FirstDriverDrivingLicenceType - Licence type: full car, provisional car, EU provisional, overseas full licence, provisional car but full motorbike licence, etc.; more than 20 different categories
CoverIsNoClaimDiscountSelected - Whether the driver had an accident in the past year
CarDrivingEntitlement - Full, provisional, etc.
CarTransmissionId - Manual, automatic, etc.
SocioDemographicId - Area where the customer resides and education level, e.g. single, PhD-educated, living in a town, working in technology
PolicyHolderResidencyArea - ?
AllDriversNbConvictions - Number of convictions for all drivers
TodayDate - Today's date
RatedDriverNumber - Which driver is the highest risk
IsPolicyholderAHomeowner - Whether the policyholder is a homeowner
CarMakeId - Internal car code that provides information about the car, e.g. Toyota Auris 1.2 cc automatic, Fiat Passat 1.5cc manual, etc.
DaysSinceCarPurchase - Number of days since the car was purchased
NameOfPolicyProduct - Name of the policy product
AffinityCodeId - Similar to Broker ID
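The VoluntaryExcess description above amounts to simple arithmetic: the customer pays up to the chosen excess and AXA pays the remainder. A minimal sketch, assuming a single claim with no other deductibles or policy limits (the function name `insurer_payout` is hypothetical):

```python
def insurer_payout(claim_amount, voluntary_excess):
    """Amount covered by the insurer for one claim.

    Assumes the customer pays the first `voluntary_excess` euros and
    the insurer pays everything above it; other deductibles and
    policy limits are ignored in this sketch.
    """
    if claim_amount < 0 or voluntary_excess < 0:
        raise ValueError("amounts must be non-negative")
    return max(0.0, claim_amount - voluntary_excess)
```

For example, with a 100 euro excess, a 1500 euro claim leads to a 1400 euro payout, while an 80 euro claim is paid entirely by the customer.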

For rejected quotes, the same customer key may appear with different SCID codes. This is because a quote submitted by a customer can be received simultaneously from different brokers. However, when a quote is converted into a policy, there is a single customer key and SCID code.
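Since one customer can generate several quotes through different brokers, a natural first step is to count, per customer, how many quotes and how many distinct SCIDs appear. The sketch below is one possible approach, not the organizers' method; it assumes each row is a dict holding at least the `CustomerMD5Key` and `SCID` columns, and the helper name `quotes_per_customer` is hypothetical.

```python
from collections import defaultdict


def quotes_per_customer(rows):
    """Count quotes and distinct broker IDs (SCID) per CustomerMD5Key.

    `rows` is assumed to be an iterable of dicts with at least the
    'CustomerMD5Key' and 'SCID' keys. Returns a dict mapping each
    customer key to a (number_of_quotes, number_of_distinct_SCIDs) tuple.
    """
    n_quotes = defaultdict(int)
    brokers = defaultdict(set)
    for row in rows:
        key = row["CustomerMD5Key"]
        n_quotes[key] += 1
        brokers[key].add(row["SCID"])
    return {key: (n_quotes[key], len(brokers[key])) for key in n_quotes}
```

Given the description above, a customer with several distinct SCIDs is likely to have rejected quotes among them, since a converted quote is tied to a single SCID; such counts could be joined back to each quote as features.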


If you are not familiar with the Codalab platform, note that your public leaderboard score is not based on your best submission, as on Kaggle, but on your latest submission.

The private leaderboard, however, will use the better of the 2 submissions allowed during the last phase.
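The two rules above differ: the public score is the latest submission's score, while the private score is the best of at most 2 submissions, and for log-loss "best" means lowest. A small sketch of that selection logic (the function name and the loss values in the tests are made up for illustration):

```python
def leaderboard_scores(public_losses, private_losses):
    """Return (public_score, private_score) from log-loss values.

    public_losses: scores of public-phase submissions, in submission order.
    private_losses: scores of the (1 or 2) private-phase submissions.
    """
    if not public_losses:
        raise ValueError("need at least one public submission")
    if not 1 <= len(private_losses) <= 2:
        raise ValueError("the private phase allows 1 or 2 submissions")
    public_score = public_losses[-1]     # latest submission, not the best
    private_score = min(private_losses)  # lower log-loss is better
    return public_score, private_score
```

A consequence of the public rule is that submitting a weaker model late in the public phase replaces a better earlier score on the public leaderboard.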


The evaluation metric for this challenge will be the binary log-loss.

We implemented the same log-loss as Kaggle's log-loss metric. More details about the metric can be found there:
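For N quotes with true labels y_i (1 if converted, 0 otherwise) and predicted conversion probabilities p_i, the binary log-loss is -(1/N) * sum_i [y_i * log(p_i) + (1 - y_i) * log(1 - p_i)], where lower is better. The sketch below computes it in plain Python. The clipping of predictions to [eps, 1 - eps] with eps = 1e-15 is an assumption based on common Kaggle-style implementations; it keeps a confident wrong prediction from producing an infinite loss.

```python
import math


def binary_log_loss(y_true, y_pred, eps=1e-15):
    """Mean binary log-loss; predictions are clipped to [eps, 1 - eps]."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)
```

For example, predicting 0.5 for every quote gives a loss of log(2), about 0.693, regardless of the labels, which is a useful baseline to compare submissions against.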

Data Privacy

You are not allowed to share the data with anyone who is not part of your team. The data must stay on your laptop or on Microsoft Azure.



The 5 best teams on the private leaderboard will receive awards. An additional special prize will also be awarded by the jury.

Public phase

Start: Oct. 15, 2014, midnight

Description: Public phase

Private phase

Start: Sept. 11, 2016, 11 a.m.

Description: Private leaderboard

