Since Albert Einstein provided a theoretical foundation for Robert Brown’s observation of the movement of particles within pollen grains suspended in water, significant deviations from the laws of Brownian motion have been uncovered in a variety of animate and inanimate systems, from biology to the stock market. Anomalous diffusion, as it has come to be called, is connected to non-equilibrium phenomena, flows of energy and information, and transport in living systems. Typically, anomalous diffusion is characterized by a nonlinear growth of the mean squared displacement (MSD) with respect to time $t$:
$$\mathrm{MSD} \sim t^{\alpha},$$
with $\alpha \neq 1$. This behavior can be generated by a variety of stochastic processes, such as continuous-time random walks, fractional Brownian motion, Lévy walks, annealed transient time motion, and scaled Brownian motion.
Identifying the physical origin of this behavior and calculating its exponent $\alpha$ is crucial to understanding the nature of the systems under observation. However, the measurement of these properties from the analysis of recorded trajectories is often limited, especially for trajectories that are short, irregularly sampled, or feature mixed behaviors. In recent years, several methods have been proposed to quantify anomalous diffusion, going beyond the classical calculation of the mean squared displacement.
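As a point of reference, the sketch below illustrates this classical baseline: it estimates $\alpha$ from a linear fit of the time-averaged MSD in log-log scale for a single, evenly sampled 1D trajectory. The function name and defaults are only illustrative and are not part of the challenge code.

```python
import numpy as np

def estimate_alpha_msd(x, max_lag=None):
    """Estimate the anomalous exponent alpha of a 1D trajectory x (sampled at
    unit time intervals) from a log-log fit of its time-averaged MSD."""
    x = np.asarray(x, dtype=float)
    if max_lag is None:
        max_lag = len(x) // 4  # restrict to short lags, where statistics are better
    lags = np.arange(1, max_lag + 1)
    msd = np.array([np.mean((x[lag:] - x[:-lag]) ** 2) for lag in lags])
    # MSD ~ t^alpha  =>  log(MSD) = alpha * log(t) + const
    alpha, _ = np.polyfit(np.log(lags), np.log(msd), 1)
    return alpha

# Quick check on ordinary Brownian motion, for which alpha should come out close to 1
rng = np.random.default_rng(0)
print(estimate_alpha_msd(np.cumsum(rng.standard_normal(1000))))
```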
The AnDi challenge aims at bringing together a vibrant and multidisciplinary community of scientists working on this problem. The use of the same reference datasets will allow an unbiased assessment of the performance of published and unpublished methods for characterizing anomalous diffusion from single trajectories.
The challenge consists of three main tasks: Task 1, the inference of the anomalous diffusion exponent α; Task 2, the classification of the underlying diffusion model; and Task 3, the segmentation of trajectories that switch behavior, with the determination of the changepoint position and the characterization of each segment.
Each task will further include modalities for different numbers of dimensions (1D, 2D, and 3D), for a total of 9 subtasks. Participants can submit results for an arbitrary number of subtasks.
Although the objective of the AnDi Challenge is mainly scientific, the top-ranking participant of each of the 9 subtasks will be invited to give an oral presentation at the AnDi workshop, where they will be awarded a certificate. Travel expenses will be covered by the organization.
Team | Affiliation | Method |
---|---|---|
QuBI | UVic-UCC (Vic, Spain) | ELM |
Anomalous Unicorns | ICFO-The Institute of Photonic Sciences (Spain) | HYDRA (RNN + CNN) |
Valencian Karatekas | Instituto Universitario de Matemática Pura y Aplicada, Universitat Politècnica de València (Spain) | TBA |
WUST ML A | Faculty of Pure and Applied Mathematics, Wrocław University of Science and Technology (Wrocław, Poland) | RISE for 1D - MrSEQL for 2D and 3D |
WUST ML B | Faculty of Pure and Applied Mathematics, Wrocław University of Science and Technology (Wrocław, Poland) | Gradient Boosting |
UPV-MAT | Instituto Universitario de Matemática Pura y Aplicada, Universitat Politècnica de València (Spain) | Recurrent Neural Networks for trajectory profiling |
DeepSPT | Department of Physics, Pohang University of Science and Technology (Pohang, Korea) | Deep Learning (ResNet_MLP + XGBoost) |
UCL | Department of Chemistry, University College London (UK) | CONDOR |
DecBayComp | Institut Pasteur, Decision and Bayesian Computation lab (Paris, France) | Graphs on random walks |
eduN | Max Planck Institute for the Physics of Complex Systems (MPI-PKS) (Dresden, Germany); University of Gothenburg (Sweden) | MAD-RNN |
Erasmus MC | Erasmus MC, Cell Biology department (Rotterdam, The Netherlands) | FEST |
TSA | Max Planck Institute for the Physics of Complex Systems (Dresden, Germany) | Educated scaling analysis |
BIT | PhyLife, Department of Physics, Chemistry and Pharmacy, University of Southern Denmark (Odense, Denmark); Institute for Physics and Astronomy, University of Potsdam (Potsdam-Golm, Germany); Department of Physics, Pohang University of Science and Technology (Pohang, Korea) | Bayesian inference using annealed importance sampling |
HNU | School of Physics and Electronics, Hunan University (Changsha, China) | Just LSTM it |
NOA | Instituto Universitario de Matemática Pura y Aplicada, Universitat Politècnica de València (Spain) | Convolutional LSTM |
Tom Bland | The Francis Crick Institute (London, UK) | Convolutional Neural Network |
rank | team | T1.1D (MAE)
---|---|---|
1 | UPV-MAT | 0.152070
2 | HNU | 0.152083
3 | eduN | 0.155829
4 | UCL | 0.159670
5 | Tom Bland | 0.176480
6 | Erasmus MC | 0.186271
7 | DecBayComp | 0.186349
8 | DeepSPT | 0.232319
9 | BIT | 0.238565
10 | WUST ML B | 0.252191
11 | QuBI | 0.268770
12 | Anomalous Unicorns | 0.285998
13 | TSA | 0.325419
rank | team | T1.2D (MAE)
---|---|---|
1 | UCL | 0.138895
2 | eduN | 0.143241
3 | HNU | 0.151801
4 | Tom Bland | 0.153457
5 | Erasmus MC | 0.176680
6 | BIT | 0.196906
7 | WUST ML B | 0.251251
8 | UPV-MAT | 0.282528
9 | TSA | 0.309310
10 | DecBayComp | 0.334354
rank | team | T1.3D (MAE)
---|---|---|
1 | UCL | 0.118815
2 | eduN | 0.139198
3 | HNU | 0.148544
4 | Erasmus MC | 0.151086
5 | BIT | 0.196919
6 | DecBayComp | 0.277763
7 | TSA | 0.307556
8 | UPV-MAT | 0.340580
rank | team | T2.1D (F1)
---|---|---|
1 | eduN | 0.8705
2 | UPV-MAT | 0.8687
3 | Tom Bland | 0.8533
4 | UCL | 0.8497
5 | Erasmus MC | 0.8444
6 | Anomalous Unicorns | 0.8265
7 | DecBayComp | 0.8047
8 | NOA | 0.7884
9 | QuBI | 0.7487
10 | DeepSPT | 0.7091
11 | WUST ML B | 0.6104
12 | WUST ML A | 0.5995
13 | BIT | 0.5387
14 | TSA | 0.5058
rank | team | T2.2D (F1)
---|---|---|
1 | eduN | 0.8845
2 | UCL | 0.8764
3 | Tom Bland | 0.8684
4 | UPV-MAT | 0.8666
5 | Erasmus MC | 0.8614
6 | WUST ML A | 0.7237
7 | WUST ML B | 0.6801
8 | BIT | 0.5264
9 | DecBayComp | 0.2262
10 | Valencian Karatekas | 0.2069
rank | team | T2.3D (F1)
---|---|---|
1 | UCL | 0.9236
2 | eduN | 0.9171
3 | UPV-MAT | 0.8904
4 | Erasmus MC | 0.8767
5 | WUST ML A | 0.7921
6 | WUST ML B | 0.6387
7 | BIT | 0.4967
rank | team | T3.1.1D (RMSE) | T3.2.1D (F1) | T3.3.1D (MAE) | total inv. rank (MRR)
---|---|---|---|---|---|
1 | eduN | 36.762731 | 0.65530 | 0.293203 | 0.833333
2 | Tom Bland | 35.993695 | 0.61765 | 0.319098 | 0.611111
3 | BIT | 43.965496 | 0.47920 | 0.309850 | 0.388889
4 | HNU | 81.331490 | 0.20000 | 0.647655 | 0.250000
rank | team | T3.1.2D (RMSE) | T3.2.2D (F1) | T3.3.2D (MAE) | total inv. rank (MRR)
---|---|---|---|---|---|
1 | Tom Bland | 31.031505 | 0.60170 | 0.289556 | 0.833333
2 | eduN | 42.582648 | 0.62715 | 0.321730 | 0.666667
3 | BIT | 47.103605 | 0.41275 | 0.323511 | 0.333333
4 | HNU | 80.692771 | 0.19720 | 0.645350 | 0.250000
rank | team | T3.1.3D (RMSE) | T3.2.3D (F1) | T3.3.3D (MAE) | total inv. rank (MRR)
---|---|---|---|---|---|
1 | eduN | 32.665125 | 0.57565 | 0.291130 | 1.000000
2 | BIT | 51.531586 | 0.38580 | 0.325543 | 0.500000
3 | HNU | 80.799330 | 0.20190 | 0.642918 | 0.333333
The evaluation of the submissions for challenge-ranking purposes will be performed using standard metrics, as described below. However, the organizers reserve the right to analyze the performance of the methods in further detail, e.g., at varying trajectory parameters such as length or noise, or by using other metrics (ROC-AUC, top-n accuracy, Fβ), and to describe the results in the associated publications.
For Task 1, the results will be evaluated by calculating the mean absolute error (MAE) of the predicted anomalous diffusion exponent:
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\alpha_{i,\mathrm{calc}} - \alpha_{i,\mathrm{GT}}\right|,$$
where $N$ is the number of trajectories in the dataset, and $\alpha_{i,\mathrm{calc}}$ and $\alpha_{i,\mathrm{GT}}$ represent the calculated and ground-truth values of the anomalous exponent of the $i$-th trajectory, respectively.
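For instance, assuming the predictions and the ground truth are available as plain arrays (the names below are placeholders), the MAE can be computed as:

```python
import numpy as np

def mae(alpha_pred, alpha_gt):
    """Mean absolute error between predicted and ground-truth anomalous exponents."""
    alpha_pred = np.asarray(alpha_pred, dtype=float)
    alpha_gt = np.asarray(alpha_gt, dtype=float)
    return np.mean(np.abs(alpha_pred - alpha_gt))

# Toy example with made-up values
print(mae([0.9, 1.4, 0.5], [1.0, 1.5, 0.7]))  # ≈ 0.133
```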
For Task 2, the results will be evaluated by calculating the F1 score, i.e., the harmonic mean of the precision and the recall:
$$F_1 = 2\cdot\frac{\mathrm{precision}\cdot\mathrm{recall}}{\mathrm{precision}+\mathrm{recall}}.$$
The precision and the recall are calculated as
$$\mathrm{precision} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}, \qquad \mathrm{recall} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}},$$
with TP, FP, and FN being the counts of true positives, false positives, and false negatives, respectively. In particular, the "micro" version of the F1 score provided by the Scikit-learn Python library will be used, and the metric will be calculated globally by counting the total true positives, false negatives, and false positives.
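As an illustration, the micro-averaged F1 score can be reproduced with scikit-learn's f1_score; the label arrays below are placeholders for the ground-truth and predicted diffusion models:

```python
from sklearn.metrics import f1_score

# Placeholder integer-encoded model labels (ground truth vs. predictions)
y_true = [0, 2, 1, 1, 4, 3, 2, 0]
y_pred = [0, 2, 1, 3, 4, 3, 1, 0]

# average="micro" counts the total TP, FP and FN over all classes
print(f1_score(y_true, y_pred, average="micro"))  # -> 0.75
```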
For Task 3, in addition to the MAE and the F1 score, we will also calculate the root mean squared error (RMSE) of the changepoint relative localization:
$$\mathrm{RMSE} = \left[\frac{1}{N}\sum_{i=1}^{N}\left(t_{i,\mathrm{calc}} - t_{i,\mathrm{GT}}\right)^{2}\right]^{1/2},$$
where $t_{i,\mathrm{calc}}$ and $t_{i,\mathrm{GT}}$ represent the calculated and ground-truth values of the changepoint position, respectively. For the ranking, the precision in determining the changepoint position, the anomalous diffusion exponent $\alpha$, and the diffusion model will be summarized in a single metric, given by the mean reciprocal rank (MRR) obtained over the three metrics:
$$\mathrm{MRR} = \frac{1}{3}\left(\frac{1}{\mathrm{rank}_{\mathrm{MAE}}} + \frac{1}{\mathrm{rank}_{F_1}} + \frac{1}{\mathrm{rank}_{\mathrm{RMSE}}}\right).$$
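A minimal sketch of how the changepoint RMSE and the MRR could be computed is given below; the function names, variable names, and example ranks are purely illustrative:

```python
import numpy as np

def rmse(t_pred, t_gt):
    """Root mean squared error of the predicted changepoint positions."""
    t_pred = np.asarray(t_pred, dtype=float)
    t_gt = np.asarray(t_gt, dtype=float)
    return np.sqrt(np.mean((t_pred - t_gt) ** 2))

def mean_reciprocal_rank(rank_mae, rank_f1, rank_rmse):
    """Combine a team's ranks on the three metrics into a single MRR score."""
    return (1.0 / rank_mae + 1.0 / rank_f1 + 1.0 / rank_rmse) / 3.0

print(rmse([48, 102, 67], [50, 100, 70]))   # changepoint positions in frames
print(mean_reciprocal_rank(2, 1, 1))        # -> 0.8333...
```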
Thus, please note that the leaderboard shows the average rank. Participants can submit results for an arbitrary number of subtasks. In the leaderboard, submissions lacking results for a given task/subtask will be scored with MAE = 100 (Tasks 1 and 3), F1 = 0 (Tasks 2 and 3), and RMSE = 200 (Task 3).
Anyone who registers for this challenge and/or downloads or uses the code and/or some or all of the data associated with the challenge is considered to have read and accepted all the rules mentioned above.
Please check the Terms and Conditions before submission.
An example of a (labeled) training dataset is provided here. Further training data can be generated using the code freely available at AnDiChallenge on GitHub; there is no limit on the amount of data that can be used for training and benchmarking. For Tasks 1 and 2, the datasets will include trajectories of different lengths. For Task 3, all the trajectories will have a fixed length and at most one changepoint.

3D models will not be implemented by simply composing three independent motions along orthogonal axes. Instead, displacements will first be generated according to the specific model and then split into random x, y, z components compatible with the chosen displacement, guaranteeing uniform sampling over the sphere. Trajectories generated according to the theoretical models will be corrupted with Gaussian noise associated with a finite localization precision; the ratio of the standard deviation of the noise to the standard deviation of the displacements of the uncorrupted trajectory will be ≤ 2. Trajectories will be given a short-time diffusion coefficient by randomly rescaling their overall variance. Scaling and normalization will be carried out on 1000-point-long trajectories, which will then be cut to the desired length.

The datasets for each phase will be available at the links provided here. For scientific purposes and further analysis, the datasets might contain trajectories for which the ground truth is not known; these will not be used for scoring.
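The official datasets are produced with the AnDiChallenge code linked above; purely as an illustration of the noise-corruption step described here, a clean 1D trajectory could be perturbed as follows (the function name and defaults are assumptions, not the organizers' implementation):

```python
import numpy as np

def add_localization_noise(x, noise_ratio, rng=None):
    """Corrupt a clean 1D trajectory with Gaussian localization noise.

    noise_ratio is the ratio between the standard deviation of the noise and the
    standard deviation of the displacements of the clean trajectory (<= 2 in the
    challenge datasets). Illustration only; the official data are generated with
    the AnDiChallenge code on GitHub.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    noise_std = noise_ratio * np.std(np.diff(x))
    return x + rng.normal(0.0, noise_std, size=x.shape)

# Example: noise at half the displacement scale on a Brownian trajectory
rng = np.random.default_rng(1)
clean = np.cumsum(rng.standard_normal(500))
noisy = add_localization_noise(clean, noise_ratio=0.5, rng=rng)
```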
Each dataset provided for the Development, Validation, and Challenge phases consists of a single .zip file containing three separate data files, one for each task, formatted as text files (.txt extension). The filename ends with a number from 1 to 3 indicating the task number, according to the following naming convention: [dataset]_[development|validation|challenge]_[task_number].txt.
Each file is organized in lines, separated by a carriage return + line feed terminator. Each line corresponds to a single trajectory. Elements on the same line are separated by the semicolon character ;. Each line starts with the subtask number (i.e., the number of dimensions) and then includes the trajectory coordinates. The number of elements in a line is therefore one plus the trajectory length times the number of dimensions. For 2D and 3D trajectories, the components of the same trajectory along the Cartesian axes are concatenated sequentially: on the same line, you will find first all the x coordinates, then all the y coordinates, and so on.
As an example, a dataset named dataset_training_1.txt could contain the coordinates of 4 trajectories for subtask 1 (1D), 5 trajectories for subtask 2 (2D), and 3 trajectories for subtask 3 (3D). A minimal parsing sketch for this format is shown below.
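The parser below is only an illustration; the function name and the return structure are assumptions, not part of the official AnDi code:

```python
import numpy as np

def read_andi_dataset(path):
    """Parse an AnDi data file: each line is
    'dimension;x_1;...;x_T[;y_1;...;y_T[;z_1;...;z_T]]'.

    Returns a list of (dimension, coordinates) pairs, where coordinates has
    shape (dimension, trajectory_length)."""
    trajectories = []
    with open(path) as f:
        for line in f:
            values = [float(v) for v in line.strip().split(";") if v.strip()]
            dim = int(values[0])
            # x components come first, then y, then z
            coords = np.array(values[1:]).reshape(dim, -1)
            trajectories.append((dim, coords))
    return trajectories

# Usage (filename for illustration only):
# trajectories = read_andi_dataset("dataset_training_1.txt")
```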
To submit prediction results, participants must download the datasets and run their code on their own computer. Result submissions for the Development and Validation phases will NOT automatically allow one to participate in the Challenge phase. A sample result submission is provided in the Starting Kit. Submissions should consist of a single .zip file containing up to three separate result files, one for each task. When compressing the files, please do not put the files into a folder and then compress the folder; instead, select the files themselves and compress them. Also note that hidden files created during compression on macOS might prevent the correct scoring of the submission. As described here, zipping from the Terminal as
zip -r dir.zip . -x ".*" -x "__MACOSX"
prevents the creation of hidden files. Result files should be formatted as text files. Their names should end with a number from 1 to 3 indicating the task number, and they must have a .txt extension, according to the following naming convention: task[task_number].txt.
Each file should contain one result (corresponding to one trajectory) per line, in exactly the same order as the dataset. Each submission file is thus expected to have the same number of lines as the number of trajectories in the corresponding dataset file. Each line should start with the subtask number (i.e., the number of dimensions) and then include the results of the task, separated by the semicolon character ;. Only insert results for the subtask(s) you want to compete in. The formatting of the results depends on the specific task.
For instance, a submission could include task1.txt with results for subtasks 1 (1D) and 3 (3D), task2.txt with results only for subtask 2 (2D), and task3.txt with results for all three subtasks; a minimal sketch of how such a result file could be written is shown below.
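This writer is only an illustration of the expected layout for Task 1; the function name and the prediction data structure are placeholder assumptions:

```python
def write_task1_results(path, predictions):
    """Write a Task 1 result file: one line per trajectory, in the same order
    as the dataset, formatted as 'dimension;alpha'.

    `predictions` is a list of (dimension, alpha) pairs (placeholder structure).
    """
    with open(path, "w") as f:
        for dim, alpha in predictions:
            f.write(f"{dim};{alpha}\n")

# Toy example (values for illustration only)
write_task1_results("task1.txt", [(1, 0.74), (1, 1.52), (3, 0.31)])
```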
The AnDi workshop will be held at the ICFO premises in Castelldefels (Barcelona). The tentative dates are June 2-4, 2021, but we might be forced to change them due to the travel restrictions associated with the COVID-19 pandemic.
Stay tuned!
Start: March 1, 2020, midnight
Description: Set up your code to produce predictions on anomalous diffusion trajectories. This phase is intended for development and practice only. A labeled dataset ("Development dataset for Training") and the code to generate further trajectories are made available. The leaderboard shows scores on the "Development dataset for Scoring" only.
Start: Sept. 14, 2020, midnight
Description: Tweak and tune your code to improve its predictions of anomalous diffusion. An unlabeled "Validation dataset" is made available. The leaderboard shows scores on the Validation dataset only.
Start: Oct. 26, 2020, midnight
Description: For the final ranking and publication, please evaluate your code on the "Challenge dataset" and submit the results. This is the final stage, and only a limited number of submissions is allowed. Results will not be scored upon submission; the leaderboard will be made available after the deadline.
End: Nov. 1, 2020, 11:59 p.m.