Congratulations to the winners:
The fact sheets are available for inspection.
This is a competition with code submission. Contact the organizers for instructions to get admitted.
The problem of attributing causes to effects is pervasive in science, medicine, economics and almost every aspect of our everyday life involving human reasoning and decision making. What affects your health? The economy? Climate change? The gold standard for establishing causal relationships is to perform randomized controlled experiments. However, experiments are costly, while non-experimental "observational" data collected routinely around the world are readily available. Unravelling potential cause-effect relationships from such observational data could save a lot of time and effort.
Consider, for instance, a target variable B, such as the occurrence of "lung cancer" in patients. The goal would be to find whether a factor A, such as "smoking", might cause B. The objective of the challenge is to rank pairs of variables {A, B} to prioritize experimental verifications of the conjecture that A causes B. As is well known, "correlation does not mean causation". More generally, observing a statistical dependency between A and B does not imply that A causes B or that B causes A; A and B could be consequences of a common cause. But is it possible to determine from the joint observation of samples of two variables A and B that A should be a cause of B or vice versa?
This challenge is limited to pairs of variables deprived of their context and deprived of time ordering of the samples. Neither constraint-based methods relying on conditional independence tests and/or graphical models nor Granger-causality-type methods using time ordering are applicable. The goal is to push the state of the art in complementary methods.
We ran a first version of this challenge from March 29 to September 2, 2013, under the name Cause-Effect Pairs challenge. The participants built on top of algorithms from the literature and greatly improved performance (from an AUC near 0.6 to an AUC near 0.8!). The results were discussed at a workshop at NIPS 2013. But most methods remain relatively slow. Your turn to make it better and faster!
This challenge is brought to you by ChaLearn, see our credits page. Contact the organizers.
For the purpose of this challenge, two variables A and B are causally related if:
B = f (A, noise) or A = f (B, noise).
In the former case, A is a cause of B, and in the latter case B is a cause of A. All other factors are lumped into the "noise". We provide samples of joint observations of A and B, not organized in a time series. We exclude feedback loops and consider only 4 types of relationships:
Notation | Relationship | Class |
---|---|---|
A->B | A causes B | Positive class |
B->A | B causes A | Negative class |
A - B | A and B are consequences of a common cause | Null class |
A \| B | A and B are independent | Null class |
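To make the four relationship types concrete, here is a minimal sketch that draws synthetic samples for each of them. The function name `sample_pair`, the particular mechanisms (`x**3`, `tanh`, `z**2`) and the Gaussian noise level are all illustrative assumptions; they are not the generators used to build the challenge data.

```python
import math
import random


def sample_pair(kind, n=200, seed=0):
    """Draw n unordered joint samples (a, b) and the truth label for one pair.

    kind is one of "A->B", "B->A", "A-B" (common cause) or "A|B" (independent).
    All mechanisms and noise levels below are hypothetical choices.
    """
    rng = random.Random(seed)

    def gauss_list(sigma):
        return [rng.gauss(0.0, sigma) for _ in range(n)]

    if kind == "A->B":
        # B = f(A, noise): A is drawn first, B is computed from it.
        a = gauss_list(1.0)
        b = [x ** 3 + e for x, e in zip(a, gauss_list(0.3))]
        label = 1
    elif kind == "B->A":
        # A = f(B, noise): the roles are swapped.
        b = gauss_list(1.0)
        a = [math.tanh(y) + e for y, e in zip(b, gauss_list(0.3))]
        label = -1
    elif kind == "A-B":
        # A and B both depend on a hidden common cause Z.
        z = gauss_list(1.0)
        a = [v ** 2 + e for v, e in zip(z, gauss_list(0.3))]
        b = [math.tanh(v) + e for v, e in zip(z, gauss_list(0.3))]
        label = 0
    elif kind == "A|B":
        # A and B are drawn independently of each other.
        a = gauss_list(1.0)
        b = gauss_list(1.0)
        label = 0
    else:
        raise ValueError("unknown relationship type: %r" % (kind,))
    return a, b, label


if __name__ == "__main__":
    for kind in ("A->B", "B->A", "A-B", "A|B"):
        a, b, label = sample_pair(kind, n=5, seed=1)
        print(kind, label, len(a), len(b))
```

Note that both null-class cases receive the same label 0, even though the variables are statistically dependent in the common-cause case and independent in the other.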
We bring the problem back to a classification problem: for each pair of variables {A, B}, you must answer the question: is A a cause of B? (or, since the problem is symmetrical in A and B, is B a cause of A?)
We expect the participants to produce a score between -Inf and +Inf, large positive values indicating that A is a cause of B with certainty, large negative values indicating that B is a cause of A with certainty. Middle range scores (near zero) indicate that neither A causes B nor B causes A.
For each pair of variables, we have a ternary truth value indicating whether A is a cause of B (+1), B is a cause of A (-1), or neither (0). We use the scores provided by the participants as a ranking criterion and evaluate their entries with the area under the ROC curve (AUC). We average two AUCs for the problem of classifying "A causes B" vs. everything else (forward AUC) and the problem of classifying "B causes A" vs. everything else (backward AUC).
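The averaged metric described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the organizers' scoring program: the function names `auc` and `bidirectional_auc` are hypothetical, the AUC is computed with the rank-sum (Mann-Whitney) formula with tied scores sharing an average rank, and the backward AUC is assumed to rank pairs by the negated score so that large negative values come first.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum formula.

    labels: booleans, True for the class of interest.
    scores: real numbers, higher meaning "more likely True".
    Tied scores share the average of their ranks.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one example of each class")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bidirectional_auc(truth, scores):
    """Average of forward and backward AUC for ternary truth values.

    truth: +1 (A causes B), -1 (B causes A) or 0 (neither).
    Assumption: the backward problem ranks pairs by the negated score.
    """
    forward = auc([t == 1 for t in truth], scores)
    backward = auc([t == -1 for t in truth], [-s for s in scores])
    return (forward + backward) / 2.0


if __name__ == "__main__":
    truth = [1, -1, 0, 0]
    scores = [2.0, -3.0, 0.1, -0.2]
    print(bidirectional_auc(truth, scores))
```

Because only the ranking of the scores matters, any monotonically increasing rescaling of the scores leaves the forward AUC unchanged; under the negation assumption, an odd rescaling (one that preserves the sign convention) leaves the backward AUC unchanged as well.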
The participants must submit a zip file containing a program that calculates a fast causation coefficient. The program must be written in Python. To that end, we provide:
Code submitted as a zip file bundle will be executed automatically on the Codalab platform, and the scoring results will be posted on the leaderboard for immediate feedback. The results in the development phase are computed on validation data. The final ranking will be done during the final phase on the test data. The results on test data will remain hidden from the participants until the end of the challenge.
To develop their code, we encourage the participants to use the exact version of Python and libraries that are installed on Codalab: Anaconda 1.6.2, which can be found at http://repo.continuum.io/archive/. More specifically, we installed http://repo.continuum.io/archive/Anaconda-1.6.2-Windows-x86_64.exe.
Yes, see our Terms and Conditions. There are travel awards for the three top-ranking participants to attend the workshop held in conjunction with the Microsoft Faculty Summit in July 2014, and one Grand Prize of $1000 for the winner of the fast code prize (among the three top-ranking participants).
Yes, you may submit a paper to the JMLR special topic on causality, deadline September 15, 2014.
We ran a first version of this challenge from March 29 to September 2, 2013, under the name Cause-Effect Pairs challenge. A lot of help can be found on the website of our previous challenge:
Codalab lets you choose which entry you want to display on the leaderboard. You must submit it by clicking "Submit to Leaderboard".
To exactly reproduce the environment used on Codalab, the participants can perform the following steps:
ALL INFORMATION, SOFTWARE, DOCUMENTATION, AND DATA ARE PROVIDED "AS-IS". ISABELLE GUYON, CHALEARN, KAGGLE, MICROSOFT AND/OR OTHER ORGANIZERS AND SPONSORS DISCLAIM ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ANY PARTICULAR PURPOSE, AND THE WARRANTY OF NON-INFRINGEMENT OF ANY THIRD PARTY'S INTELLECTUAL PROPERTY RIGHTS. IN NO EVENT SHALL ISABELLE GUYON AND/OR OTHER ORGANIZERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF SOFTWARE, DOCUMENTS, MATERIALS, PUBLICATIONS, OR INFORMATION MADE AVAILABLE FOR THE CHALLENGE. In case of dispute about prize attribution or possible exclusion from the competition, the participants agree not to take any legal action against the organizers or sponsors. Decisions can be appealed by submitting a letter to Vincent Lemaire, secretary of ChaLearn, and disputes will be resolved by the board of ChaLearn.
Post messages of general interest to our Google group.
(*) The travel award may be used to attend a workshop organized in conjunction with the Microsoft Faculty Summit held at Microsoft Research in Redmond in July 2014. The award money will be granted in reimbursement of expenses including airfare, ground transportation, hotel, or workshop registration. Reimbursement is conditioned on (i) attending the workshop, (ii) making an oral presentation of the methods used in the challenge, and (iii) presenting original receipts and boarding passes. For winners traveling from outside North America, the travel award will be doubled.
Start: April 9, 2014, noon
Start: June 15, 2014, midnight
Start: June 18, 2014, 5:31 p.m.
# | Username | Score |
---|---|---|
1 | david.lopez.paz | 91.60 |
2 | reference | 287.77 |