TrackML Throughput Phase

Organized by VictorEstrade
Reward $15,000

Current

Development
Sept. 7, 2018, midnight UTC

Next

Final
Nov. 5, 2018, 11:59 p.m. UTC

End

Competition Ends
Nov. 12, 2018, 11:59 p.m. UTC

Welcome!

This competition is an official NIPS 2018 competition.

To explore what our universe is made of, scientists at CERN are colliding protons, essentially recreating mini big bangs, and meticulously observing these collisions with intricate silicon detectors. Event rates have already reached hundreds of millions of collisions per second, meaning physicists must sift through tens of petabytes of data per year. And, as the resolution of detectors improves, ever better software is needed for real-time pre-processing and filtering of the most promising events, producing even more data. To help address this problem, a team of machine learning experts and physicists working at CERN (the world's largest high energy physics laboratory) has partnered with prestigious sponsors to answer the question: can machine learning assist high energy physics in discovering and characterizing new particles?

In this competition, you are challenged to build an algorithm that quickly reconstructs particle tracks from 3D points left in the silicon detectors.

A 3D image of the points (white) and tracks (red):

 

A simplified view in 2D: the name of the game is to associate the points into tracks.


The challenge is organized in two phases:

  • The Accuracy phase (finished) was run on Kaggle from May to August 13, 2018 (winners to be announced at the end of September). This first phase focused on the highest score, irrespective of the evaluation time. It was an official IEEE WCCI competition (Rio de Janeiro, July 2018).
  • The Throughput phase runs NOW (this Codalab competition), starting in September 2018. Participants must submit their software, which is evaluated by the platform. The incentive is on the throughput (or speed) of the evaluation, while reaching a good accuracy score. Note that the new dataset is slightly different. This phase is an official NIPS competition (Montreal, December 2018).

Having participated in the Accuracy phase is not at all necessary to participate in this Throughput phase. All the necessary information for the Throughput phase is available here on Codalab. The overall TrackML challenge web site is there. Kernels developed and discussions on the Kaggle forum are available to jump-start participants in the Throughput phase. Questions should be posted on the forum or directed to trackml.contact at gmail.com.

Evaluation Criteria

Participants submit software (following a provided template), which is run on the platform on 50 test events; these are different from, but share the same characteristics as, the training set.

The software produces a submission file containing particle tracking predictions, from which an accuracy score (how good the software is at finding the tracks) is evaluated. The time to run the software is also measured. Accuracy and time are then combined to produce the ranking score.

Accuracy score

In one line: it is the intersection between the reconstructed tracks and the ground truth particles, normalized to one for each event, and averaged over the events of the test set.

First, each hit is assigned a weight: 

  • the weight is non-zero only for hits left by particles coming from within a cylinder centered at (0,0,0), with its axis along the z-axis, a radius of 2 mm, a half-length of 16.5 cm, and with at least 8 hits (this is the only difference with respect to the scoring of the Accuracy phase, where about 10% of the total possible score was for particles not fulfilling this condition)
  • the first few hits (starting from the center of the detector) and the last few hits left by a particle have a larger weight
  • hits from the straighter tracks (rarer, but more interesting) have a larger weight
  • random hits or hits from very short tracks have weight zero
  • the sum of the weights of all the hits of one event is 1 by construction
  • the hit weights are available in the truth file; they are not revealed for the test dataset
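The per-event normalization of the weights can be checked mechanically. The snippet below uses a synthetic stand-in for a truth table (column names mimic the trackml convention, but the data is made up):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a truth file: 3 events with random positive hit
# weights, normalized so that each event's weights sum to 1 (as the rules
# above guarantee for the real data).
rng = np.random.default_rng(0)
rows = []
for event_id in range(3):
    w = rng.random(100)
    w /= w.sum()                       # per-event normalization
    rows += [(event_id, hit_id, weight) for hit_id, weight in enumerate(w)]
truth = pd.DataFrame(rows, columns=["event_id", "hit_id", "weight"])

# Check: the weights of every event sum to 1.
per_event = truth.groupby("event_id")["weight"].sum()
assert np.allclose(per_event, 1.0)
```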

Then, the accuracy score is constructed as follows: 

  • tracks are uniquely matched to particles by the double majority rule:
    • for a given track, the matching particle is the one to which the absolute majority (strictly more than 50%) of the track points belong
    • the track must also contain the absolute majority of the points of the matching particle; if either of these constraints is not met, the score for this track is zero
  • the score of a surviving track is the sum of the weights of the points in the intersection between the track and the matching particle
  • the score of an event is the sum of the scores of all its tracks
  • the final accuracy score is the average over the events of the test set

A perfect algorithm will have an accuracy score of 1, while a random one will have an accuracy score of 0. An implementation can be found in the trackml python library.
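The matching and scoring rules above can be sketched as follows. This is a simplified, illustrative reimplementation (pandas-based, with hypothetical column names `hit_id`, `particle_id`, `weight`, `track_id`), not the official trackml scorer:

```python
import pandas as pd

def accuracy_score_event(truth, submission):
    """Illustrative per-event accuracy score.

    truth:      DataFrame with columns hit_id, particle_id, weight
    submission: DataFrame with columns hit_id, track_id
    """
    df = submission.merge(truth, on="hit_id")
    # Size and total weight of each (track, particle) intersection.
    inter = (df.groupby(["track_id", "particle_id"])
               .agg(n_shared=("hit_id", "size"), w_shared=("weight", "sum"))
               .reset_index())
    track_size = df.groupby("track_id")["hit_id"].size()
    particle_size = truth.groupby("particle_id")["hit_id"].size()
    inter["track_size"] = inter["track_id"].map(track_size)
    inter["particle_size"] = inter["particle_id"].map(particle_size)
    # Double majority: the intersection must hold a strict majority of both
    # the track's hits and the matching particle's hits.
    good = inter[(2 * inter["n_shared"] > inter["track_size"])
                 & (2 * inter["n_shared"] > inter["particle_size"])]
    # Surviving tracks contribute the total weight of their intersection.
    return float(good["w_shared"].sum())
```

For example, a submission that reproduces every particle exactly scores the full event weight of 1, while one that puts every hit in its own one-hit track fails the particle-majority test and scores 0.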

Evaluation time

The software is run by Codalab on 3 dedicated 48-core machines (Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz), within a Docker container limiting the resources used by the software to two cores and 4 GB of memory.

The single-threaded template software takes care of the I/O and hands the data to the user code one event after the other. The user code may be multi-threaded. Only the wall clock time spent in the user code is accounted for. The input to the ranking score is the average time per event.
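The timing scheme can be sketched as follows. `reconstruct_event` is a hypothetical stand-in for the user code, and the real template's internals may differ; the point is that the clock runs only around the user call, not around the I/O:

```python
import time

def reconstruct_event(hits):
    # Hypothetical user code: assign every hit to a single dummy track.
    return [0] * len(hits)

events = [list(range(1000)) for _ in range(5)]   # stand-in event data

total = 0.0
for hits in events:
    start = time.perf_counter()          # clock starts after I/O is done
    tracks = reconstruct_event(hits)     # only the user code is timed
    total += time.perf_counter() - start

time_per_event = total / len(events)     # this average feeds the ranking score
```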

Tests have shown the time measurement to be reproducible within a few percent, which is sufficient for the online public leaderboard.


It is probably possible to hack the time measurement; the public leaderboard and the submitted software will be regularly inspected by the organisers. Any attempt at deliberate hacking will cause the participant to be disqualified and their contributions deleted.

Ranking score

Given the accuracy and the time per event (in seconds), they are combined into a final score as follows:

if accuracy > 0.5 and time < 600:
    score = sqrt( log(1 + 600 / time) * (accuracy - 0.5)**2 )
else:
    score = 0
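A runnable version of the formula (the base of the logarithm is not stated here; a natural logarithm is assumed in this sketch, which is illustrative rather than the official scoring code):

```python
import math

def ranking_score(accuracy, time_per_event):
    """Combine accuracy and average time per event (in seconds)."""
    if accuracy > 0.5 and time_per_event < 600:
        return math.sqrt(math.log(1 + 600 / time_per_event)
                         * (accuracy - 0.5) ** 2)
    return 0.0
```

Both axes help: at fixed accuracy a faster submission scores higher, and at fixed time a more accurate one does; an accuracy at or below 0.5 scores zero regardless of speed.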

This picture indicates how the ranking score depends on accuracy and time:

 

 

Minimum performance

In practice, the participant's software is first run on one test event (always the same one). If, on this single event, the accuracy is less than 5% or the time is more than 600 seconds, a score of -1 is reported. These values might be adjusted by the organisers during the challenge.

 

Final leaderboard

Once the competition is finished, the public leaderboard will be purged of submissions showing signs of unfair practices. Submissions will be run on a new 50-event dataset several times, in order to obtain an accurate time measurement.

The final scores (and final leaderboard) will then be determined from the average time, and the accuracy.

 

Terms and conditions

  • General Terms: This challenge is governed by the General ChaLearn Contest Rule Terms, the Codalab Terms and Conditions, and the specific rules set forth.

  • Announcements: To receive announcements and be informed of any change in rules, the participants must provide a valid email.

  • Conditions of participation: Participation requires complying with the rules of the challenge. Prize eligibility is restricted by US government export regulations, see the General ChaLearn Contest Rule Terms. The organizers, sponsors, their students, close family members (parents, sibling, spouse or children) and household members, as well as any person having had access to the truth values or to any information about the data or the challenge design giving him (or her) an unfair advantage, are excluded from participation. Participants to the previously run TrackML Accuracy phase on Kaggle can participate.  A disqualified person may submit one or several entries in the challenge and request to have them evaluated, provided that they notify the organizers of their conflict of interest. If a disqualified person submits an entry, this entry will not be part of the final ranking and does not qualify for prizes. The participants should be aware that ChaLearn and the organizers reserve the right to evaluate for scientific purposes any entry made in the challenge, whether or not it qualifies for prizes.

  • Training data access: Only registered participants having accepted the rules can access the training data. By downloading the training data, the participants agree to keep it for their own use and not to re-distribute it in any form, including giving direct access to the URL to download the data.

  • Dissemination: The participants may be invited to attend a workshop organized in conjunction with a major machine learning conference and contribute to the proceedings. This competition is an official NIPS 2018 competition.

  • Registration: The participants must register to Codalab and provide a valid email address. Teams should register only once using a group email, which will be shared by all team members. Teams or solo participants registering multiple times to gain an advantage in the competition may be disqualified.

  • Anonymity: The participants who do not present their results at the workshop can elect to remain anonymous by using a pseudonym. Their results will be published on the leaderboard under that pseudonym, and their real name will remain confidential. However, the participants must disclose their real identity to the organizers to claim any prize they might win. See our privacy policy for details. If a participant provides their real name, it will appear on the leaderboard and may be used by the Codalab platform provider at its discretion.

  • Submission method: The results must be submitted through this CodaLab competition site. The participants can make up to N_SUB_MAX submissions per day. The value of N_SUB_MAX is indicated as "Max submissions per day" on the challenge website. The organisers reserve the right to change it during the challenge. Using multiple accounts to increase the number of submissions is NOT permitted. The public leaderboard will be updated by automatic evaluation of the submissions. The final leaderboard will be determined by running the best submission in a controlled way on a new dataset with the same statistical properties as the one provided. The entries must be formatted as specified on the Evaluation page. As specified in the General ChaLearn Contest Rule Terms, any attempt at hacking, such as trying to export the test dataset, to use more than the allocated resources, or to manipulate the public leaderboard, can lead to disqualification.

  • Prizes: The top performers are eligible for prizes (see the Prizes page). To be eligible for any prize, the participants must make their code publicly available under an OSI-approved license (for instance Apache 2.0, MIT, or a BSD-like license), fill out a fact sheet briefly describing their methods, and provide a short documentation following a template, no later than one week after the deadline for submitting the final results. There is no other publication requirement. In case of a tie, the prize will go to the participant who submitted their entry first. Non-winners or entrants who decline their prize retain all rights to their entries and are not obliged to publicly release their code.

  • Travel awards: The travel awards may be used to attend a workshop organized in conjunction with the challenge. The award money will be granted in reimbursement of expenses including airfare, ground transportation, hotel, or workshop registration. Reimbursement is conditioned on (i) attending the workshop, (ii) making an oral presentation of the methods used in the challenge, and (iii) presenting original receipts and boarding passes. The reimbursements will be made after the workshop.

CHALEARN

This challenge is brought to you by ChaLearn and the sponsors listed on the Sponsors page. Questions should be posted on the forum or directed to trackml.contact at gmail.com.

The TrackML Throughput phase offers a new set of prizes (in addition to the first Accuracy phase prizes).

Cash Prizes

Participants with the best scores on the final evaluation (which will take place after the challenge is over, on a new test set, under tightly controlled timing) are eligible to receive:

  • 1st Place - $ 7,000
  • 2nd Place - $ 5,000
  • 3rd Place - $ 3,000

HEP meets ML prize

A second set of prizes will be attributed by a jury (of international experts in particle physics tracking algorithms and machine learning), which will select the submissions with the most promising balance between score, evaluation speed, and originality with respect to traditional particle physics combinatorial approaches. We will provide an entry form for those of you interested in these prizes.

Prizes that will be distributed under this category:

  • One NVIDIA Tesla V100 GPU.
  • Invitations (at least one) to NIPS (Montreal, December 2018).
  • Invitations (at least one) to a grand finale workshop at CERN (Geneva) in spring 2019.

Conditions

To be eligible for both types of prizes, the teams must submit their complete code (training and evaluation) with an open source license within one week after the end of the competition. The participating team will decide how the amount of the award will be divided internally among the team members. The award will not cover the travel expenses of team members who belong to the ATLAS or CMS collaborations or are based at CERN.

 

 

The (human) organizers are here

The International Advisory Committee is there

 

Platinum sponsors

NVIDIA: NVIDIA GPUs are powering the world's fastest supercomputers. GPU computing is the most pervasive, accessible, energy-efficient path forward for HPC data centers. GPUs are ushering in the era of convergence, where modeling and simulation are combined with AI to spur a wave of innovation and insight unmatched since computers were first applied to science problems.
UNIGE: The University of Geneva and its faculty of science are heavily invested in fundamental research and machine learning applications. Its department of particle physics (DPNC) is a strong and long-standing member of the ATLAS collaboration at CERN's LHC.
Gold sponsors
ChaLearn: a not-for-profit organization dedicated to educating the public through the organization of scientific competitions, particularly in machine learning.
The DATAIA Institute: aims to gather and structure, on one scientific site, multidisciplinary expertise of great scope and high visibility, to better address the major challenges of data science, artificial intelligence and their applications, through decompartmentalization between mathematics, computer science, and the legal, economic and social sciences.
Silver sponsors
CERN openlab is a unique public-private partnership that accelerates the development of cutting-edge ICT solutions for the worldwide LHC community and wider scientific research. Through CERN openlab, CERN collaborates with leading ICT companies and research institutes.
Paris-Saclay CDS: using data science to advance domain sciences.
INRIA is a French research institute for computer science.
ERC mPP: This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement no. 772369). mPP is an ERC Consolidator Grant coordinated by CERN aiming to promote applications based on modern machine learning for particle physics experiments.
ERC RECEPT: This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement no. 724777). RECEPT is an ERC Consolidator Grant concerned with studies of lepton universality and real-time reconstruction and analysis of particle trajectories.
Common Ground: the only professional online platform tailored to academics with STEM qualifications looking for a rewarding career in the private sector. We combine job opportunities in Machine Learning, Data Science and Software Engineering with unique company, education and support services so academics can discover the companies they really want to work for and better prepare themselves for the transition into industry.
University Paris-Sud: a place dedicated to high-level research and a member of the League of European Research Universities. It is particularly famous for its very high level in basic research, especially in mathematics and physics, while hosting numerous research programs in computer science, chemistry and biology.
INQNET (INtelligent Quantum NEtworks and Technologies) is a research program of the Alliance of Quantum Technologies. It aims to accelerate progress in areas of fundamental QIS&T, including quantum AI.
Fermilab is America's particle physics and accelerator laboratory. We bring the world together to solve the mysteries of matter, energy, space and time.
PyTorch is an open source deep learning platform built to be flexible and modular for research, with the stability and support needed for production deployment. It enables fast, flexible experimentation through a tape-based autograd system designed for immediate and Python-like execution.
  • 5 November 2018, midnight UTC: end of submissions.
  • 12 November 2018, 11:59 UTC: deadline to submit the survey and short software document (for leaderboard prizes and the HEP meets ML prize).
  • End of November 2018: winners announcement.

Step by step

How to participate?

  1. Register to Codalab (top right corner).
  2. Register to this competition (click on the Participate folder).
  3. Download the Starting kit and Public Data (made available as soon as you register).
  4. Submit sample_code_submission.zip found in the Starting kit. This is a very simple DBSCAN algorithm: fast, but with poor accuracy, so the score should be zero.
  5. Download additional information and files from the "Data and starting kit description" page, browse the Forum, etc.
  6. Build a new zip file with your own software, following the documentation in the Starting kit.
  7. Submit your own submission. Only when the accuracy exceeds 50% and the time per event is less than 600 s will you get a non-zero score. Note that you can choose to make your submission public, for other participants to download.
  8. Go back to 6 and have fun...
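To give a flavor of what a DBSCAN-based sample submission does, here is a hypothetical, self-contained toy (synthetic hits and made-up column names following the trackml convention, not the actual starting-kit code): cluster hits by their direction as seen from the origin, and emit a submission-style (hit_id, track_id) table.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

# Synthetic stand-in for detector hits: 50 slightly noisy points on each of
# three straight tracks leaving the origin along the coordinate axes.
rng = np.random.default_rng(42)
directions = np.eye(3)
hits = np.vstack([t * d + rng.normal(scale=0.005, size=3)
                  for d in directions for t in np.linspace(1.0, 10.0, 50)])
hit_ids = np.arange(len(hits))

# Naive feature: the unit direction of each hit.  Hits of the same straight
# track share (almost) the same unit vector.
features = hits / np.linalg.norm(hits, axis=1, keepdims=True)

# Cluster in direction space; each cluster label becomes a track id.
labels = DBSCAN(eps=0.05, min_samples=3).fit_predict(features)

# Submission-style table: one (hit_id, track_id) pair per hit.
submission = pd.DataFrame({"hit_id": hit_ids, "track_id": labels})
```

On real TrackML events, tracks curve in the magnetic field, so a plain direction-space clustering like this scores poorly; that is why the sample submission's score is essentially zero.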

This page informs participants of what has happened since the beginning of the Throughput phase of the TrackML challenge.

  • Thu 12 September: maximum number of submissions per day increased to 2
  • Mon 10 September: teething problems, solved
  • Fri 7 September, 2 PM UTC: challenge is online

Use the forum as much as possible. 

Organisers can be reached at trackml.contact at gmail.com

Development

Start: Sept. 7, 2018, midnight

Description: During this phase, participants can submit code that will run on the validation data and get feedback from the platform (the maximum number of submissions per day and in total might be adjusted by the organizers during the competition). Failed submissions are not counted.

Final

Start: Nov. 5, 2018, 11:59 p.m.

Description: In the final phase, each participant's best submission will be tested offline against the private dataset. No new submissions are allowed.

Competition Ends

Nov. 12, 2018, 11:59 p.m.

# Username Score
1 fastrack 0.7774
2 cubus 0.7719
3 Taka 0.0000