Since Albert Einstein provided a theoretical foundation for Robert Brown’s observation of the movement of particles within pollen grains suspended in water, significant deviations from the laws of Brownian motion have been uncovered in a variety of animate and inanimate systems, from biology to the stock market. Anomalous diffusion, as it has come to be called, is connected to non-equilibrium phenomena, flows of energy and information, and transport in living systems. Typically, anomalous diffusion is characterized by a nonlinear growth of the mean squared displacement (MSD) with respect to time t:

MSD(t) ∝ t^α,

with α ≠ 1, and it can be generated by a variety of stochastic processes, such as continuous-time random walks, fractional Brownian motion, Lévy walks, scaled Brownian motion, and annealed transient time motion.
Identifying the physical origin of this behavior and calculating its exponent α are crucial to understanding the nature of the system under observation. However, the measurement of these properties from trajectory data is often limited, especially for trajectories that are short, irregularly sampled, or feature mixed behaviors. In recent years, several methods have been proposed to quantify anomalous diffusion, going beyond the classical calculation of the mean squared displacement.
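As a point of reference for the classical approach mentioned above, the following is a minimal sketch of estimating α from the time-averaged MSD of a single trajectory via a log-log fit. The function names (`tamsd`, `fit_alpha`) and the choice of fitting range are illustrative, not part of the challenge code.

```python
import numpy as np

def tamsd(x, max_lag=None):
    """Time-averaged MSD of a 1D trajectory for lags 1..max_lag."""
    n = len(x)
    max_lag = max_lag or n // 4
    lags = np.arange(1, max_lag + 1)
    msd = np.array([np.mean((x[lag:] - x[:-lag]) ** 2) for lag in lags])
    return lags, msd

def fit_alpha(x, max_lag=100):
    """Estimate the anomalous exponent as the slope of log(MSD) vs log(lag)."""
    lags, msd = tamsd(x, max_lag=max_lag)
    slope, _ = np.polyfit(np.log(lags), np.log(msd), 1)
    return slope

rng = np.random.default_rng(0)
bm = np.cumsum(rng.normal(size=10_000))  # ordinary Brownian motion, alpha close to 1
print(round(fit_alpha(bm), 2))
```

For short or noisy trajectories this fit becomes unreliable, which is exactly the regime the challenge targets.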
The AnDi challenge aims at bringing together a vibrant and multidisciplinary community of scientists working on this problem. The use of the same reference datasets will allow an unbiased assessment of the performance of published and unpublished methods for characterizing anomalous diffusion from single trajectories.
The challenge consists of three main tasks:

Task 1: Inference of the anomalous diffusion exponent α.
Task 2: Classification of the underlying diffusion model.
Task 3: Segmentation of trajectories, i.e., determining the changepoint position and characterizing the diffusion on each side of it.
Each task will further include modalities for different number of dimensions (1D, 2D and 3D), for a total of 9 subtasks. Participants can submit results for an arbitrary number of subtasks.
Although the objective of the AnDi Challenge is mainly scientific, the top-ranking participant of each of the 9 subtasks will be invited to give an oral presentation at the AnDi workshop, where they will be awarded a certificate. Travel expenses will be covered by the organization.
The evaluation of the submissions for challenge ranking purposes will be performed using standard metrics, as described below. However, the organizers reserve the right to analyze the performance of the methods in further detail, e.g., at varying trajectory parameters such as length or noise, or by using other metrics (ROC-AUC, top-n accuracy, Fβ), and to describe the results in the associated publications.
For Task 1, the results will be evaluated by calculating the mean absolute error (MAE):
MAE = 1/N Σi |αi,calc - αi,GT|,
where N is the number of trajectories in the dataset, αi,calc and αi,GT represent the calculated and ground truth values of the anomalous exponent of the i-th trajectory, respectively.
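The MAE above is straightforward to compute; a minimal sketch with NumPy (the function name and sample values are purely illustrative):

```python
import numpy as np

def mae(alpha_pred, alpha_true):
    """Mean absolute error between predicted and ground-truth exponents."""
    alpha_pred = np.asarray(alpha_pred, dtype=float)
    alpha_true = np.asarray(alpha_true, dtype=float)
    return float(np.mean(np.abs(alpha_pred - alpha_true)))

# Errors are |0.2|, |0.0|, |0.3|, so the MAE is 0.5/3
print(mae([1.0, 0.5, 1.8], [1.2, 0.5, 1.5]))
```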
For Task 2, the results will be evaluated by calculating the F1 score, i.e., the harmonic mean of precision and recall:
F1 = 2 · (precision · recall)/(precision + recall).
Precision and recall are calculated as:

precision = TP/(TP + FP) and recall = TP/(TP + FN),

with TP, FP, and FN being the numbers of true positives, false positives, and false negatives, respectively. In particular, the "micro" version of the F1 score provided in the Scikit-learn Python library will be used: the metrics are computed globally by counting the total true positives, false negatives, and false positives.
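Since the page names the Scikit-learn implementation, here is a minimal usage sketch. The integer labels are hypothetical class indices for the diffusion models; in the multiclass single-label case, micro-averaged F1 coincides with accuracy.

```python
from sklearn.metrics import f1_score

# Hypothetical ground-truth and predicted model labels
y_true = [0, 1, 2, 2, 1, 0, 3]
y_pred = [0, 1, 2, 1, 1, 0, 4]

# "micro" F1: TP/FP/FN are counted globally over all classes
score = f1_score(y_true, y_pred, average="micro")
print(score)  # 5 of 7 labels correct
```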
For Task 3, in addition to the MAE and the F1 score, we will also calculate the root mean squared error (RMSE) of the changepoint relative localization:

RMSE = [1/N Σi (ti,calc − ti,GT)²]^(1/2),
where ti,calc and ti,GT represent the calculated and ground truth values of the changepoint position, respectively. For the ranking, the performance in determining the changepoint position, the anomalous diffusion exponent α, and the diffusion model will be summarized in a single metric, the mean reciprocal rank (MRR) over the three metrics:

MRR = (1/rankMAE + 1/rankF1 + 1/rankRMSE)/3.
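To make the Task 3 metrics concrete, here is a sketch of the RMSE and of the MRR combination, assuming the three ranks of a participant are already known (function names and sample values are illustrative):

```python
import numpy as np

def rmse(t_pred, t_true):
    """Root mean squared error of predicted changepoint positions."""
    d = np.asarray(t_pred, dtype=float) - np.asarray(t_true, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

def mrr(rank_mae, rank_f1, rank_rmse):
    """Mean reciprocal rank over a participant's three per-metric ranks."""
    return (1 / rank_mae + 1 / rank_f1 + 1 / rank_rmse) / 3

# Off by 10 frames on one of two changepoints
print(rmse([100, 120], [110, 120]))
# Ranked 1st by MAE, 2nd by F1, 4th by RMSE
print(mrr(1, 2, 4))
```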
Note that the leaderboard shows the average rank. Participants can submit results for an arbitrary number of subtasks. In the leaderboard, submissions lacking results for some task/subtask will be scored with MAE = 100 (Tasks 1 and 3), F1 = 0 (Tasks 2 and 3), and RMSE = 200 (Task 3).
Anyone who registers for this challenge, downloads/uses the code, and/or uses some or all of the data associated with the challenge is considered to have read and accepted all the rules mentioned above.
Please check the Terms and Conditions before submission.
An example of a (labeled) training dataset is provided here. Further training data can be generated using the code freely available at AnDiChallenge on GitHub; there is no limit on the amount of data used for training and benchmarking. For Tasks 1 and 2, the datasets will include trajectories of different lengths. For Task 3, all trajectories will have a fixed length and at most one changepoint.

3D models will not be implemented by simply composing 3 independent motions along orthogonal axes. Instead, displacements are first generated according to the specific model, and random x, y, z components compatible with each displacement are then obtained; in this way, uniform sampling over a sphere is guaranteed.

Trajectories generated according to theoretical models will be corrupted with Gaussian noise associated with a finite localization precision. The ratio of the standard deviation of the noise to the standard deviation of the displacements of the uncorrupted trajectory will be ≤ 2. Trajectories will be given a short-time diffusion coefficient by randomly rescaling their overall variance. Scaling and normalization will be carried out over 1000-point-long trajectories that are finally cut to the desired length.

The datasets for each phase will be available at the links provided here. For scientific purposes and further analysis, the datasets might contain trajectories for which the ground truth is not known; these will not be used for scoring.
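The isotropic 3D construction described above, turning scalar displacements into directions sampled uniformly on a sphere, can be sketched with the standard normalized-Gaussian trick. This is an illustration of the stated principle, not the challenge generator itself:

```python
import numpy as np

def isotropic_steps_3d(step_lengths, rng):
    """Turn scalar displacements into 3D steps with directions uniform on the sphere."""
    step_lengths = np.asarray(step_lengths, dtype=float)
    # A normalized 3D Gaussian vector is uniformly distributed on the unit sphere
    v = rng.normal(size=(len(step_lengths), 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return v * step_lengths[:, None]

rng = np.random.default_rng(1)
lengths = np.abs(rng.normal(size=1000))
steps = isotropic_steps_3d(lengths, rng)
# Each 3D step's norm matches its prescribed scalar displacement
print(np.allclose(np.linalg.norm(steps, axis=1), lengths))
```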
Each dataset provided for the Development, Validation, and Challenge phases consists of a single .zip file containing three separate data files, one for each task, formatted as text files (.txt extension). The filename ends with a number from 1 to 3 indicating the task number (task1.txt, task2.txt, task3.txt).
Each file is organized in lines, separated by a carriage return + line feed terminator. Each line corresponds to a single trajectory, with elements separated by the semicolon character ;. Each line starts with the subtask number (i.e., the number of dimensions) and then includes the trajectory coordinates. The number of elements in a line is one plus the trajectory length times the number of dimensions. For 2D and 3D trajectories, the components of the same trajectory along the Cartesian axes are sequentially concatenated: on the same line, you will have first all the x's, then all the y's, and so on.
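The line layout described above can be parsed with a few lines of Python. This is an illustrative reader, not official challenge code; the function name is hypothetical:

```python
import numpy as np

def parse_line(line):
    """Split 'D;x1;...;xT;y1;...;yT;...' into (dimension, array of shape (T, D))."""
    parts = line.strip().split(";")
    dim = int(float(parts[0]))           # first element is the subtask number (1, 2, or 3)
    coords = np.array([float(p) for p in parts[1:]])
    T = len(coords) // dim               # trajectory length
    # Components are concatenated axis by axis: all x's, then all y's, ...
    return dim, coords.reshape(dim, T).T

dim, traj = parse_line("2;0.0;1.0;2.0;5.0;6.0;7.0")
print(dim, traj.shape)  # a 2D trajectory of 3 points
```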
An example dataset named dataset_training_1.txt shows the coordinates of 4 trajectories for subtask 1 (1D), 5 trajectories for subtask 2 (2D), and 3 trajectories for subtask 3 (3D).
To submit prediction results, participants must download the datasets and run their code on their own computer. Result submissions for the Development and Validation phases will NOT automatically allow participation in the Challenge phase. A sample result submission is provided in the Starting Kit.

Submissions should consist of a single .zip file containing up to three separate result files, one for each task. When compressing the files, please do not put them into a folder first and then compress the folder; instead, select the files and compress them directly. Also, hidden files created during compression on macOS might prevent the correct scoring of the submission. As described here, zipping from the Terminal as

zip -r dir.zip . -x ".*" -x "__MACOSX"

prevents the creation of hidden files.

Result files should be formatted as text files with a .txt extension. Their names should end with a number from 1 to 3 indicating the task number (task1.txt, task2.txt, task3.txt).
Each file should contain one result (corresponding to one trajectory) per line, in the same exact order as the dataset; each submission file is thus expected to have the same number of lines as the corresponding dataset file. Each line should start with the subtask number (i.e., the number of dimensions) and then include the results of the task, separated by the semicolon character ;. Only insert results for the subtask(s) you want to compete for. The formatting of the results depends on the specific task:
task1.txt, including subtasks 1 (1D) and 3 (3D):
task2.txt, including only subtask 2 (2D):
task3.txt, including the three subtasks:
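A generic writer for this line format can look as follows. The field meanings per task are not reproduced here (see the Starting Kit); the Task 1 rows below, pairing a subtask dimension with a predicted α, are purely hypothetical values:

```python
def write_results(path, rows):
    """Write one semicolon-separated result line per trajectory.

    Each row is a tuple (subtask_dimension, value1, value2, ...);
    which values a task expects is defined by the challenge, not here.
    """
    with open(path, "w") as f:
        for row in rows:
            f.write(";".join(str(v) for v in row) + "\n")

# Hypothetical Task 1 predictions: (subtask dimension, predicted alpha)
write_results("task1.txt", [(1, 0.82), (1, 1.31), (3, 0.55)])
print(open("task1.txt").read())
```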
The AnDi workshop will be held at ICFO premises in Castelldefels (Barcelona) on February 17-20, 2021.
Start: March 1, 2020, midnight
Description: Set up code capable of producing predictions on anomalous diffusion trajectories. This phase is intended for development and practice only. A labeled dataset ("Development dataset for Training") and the code to generate further trajectories are made available. The leaderboard shows scores on the "Development dataset for Scoring" only.
Start: Sept. 14, 2020, midnight
Description: Tweak and tune your code to improve its prediction of anomalous diffusion. An unlabeled "Validation dataset" is made available. The leaderboard shows scores on the Validation dataset only.
Start: Oct. 26, 2020, midnight
Description: For the final ranking and publication, evaluate your code on the "Challenge dataset" and submit the results. This is the final stage, and only a limited number of submissions is allowed. Results will not be scored upon submission; the leaderboard will be published after the deadline.
Deadline: Nov. 1, 2020, 11:59 p.m.