the anomalous diffusion challenge

Organized by cmanzo

Development
March 1, 2020, midnight UTC

Validation
Sept. 14, 2020, midnight UTC

Competition Ends
Nov. 1, 2020, 11:59 p.m. UTC

AnDi: The anomalous diffusion challenge

Since Albert Einstein provided a theoretical foundation for Robert Brown's observation of the movement of particles within pollen grains suspended in water, significant deviations from the laws of Brownian motion have been uncovered in a variety of animate and inanimate systems, from biology to the stock market. Anomalous diffusion, as it has come to be called, is connected to non-equilibrium phenomena, flows of energy and information, and transport in living systems. Typically, anomalous diffusion is characterized by a nonlinear growth of the mean squared displacement (MSD) with respect to time t:
$\mathrm{MSD} \sim t^{\alpha}$,
with α ≠ 1 (α < 1 corresponds to subdiffusion and α > 1 to superdiffusion). Anomalous diffusion can be generated by a variety of stochastic processes, such as:
  • annealed transient time motion (ATTM),
  • continuous-time random walk (CTRW),
  • fractional Brownian motion (FBM),
  • Lévy walk (LW),
  • scaled Brownian motion (SBM).

Identifying the physical origin of this behavior and calculating its exponent α is crucial to understanding the nature of the systems under observation. However, measuring these properties from trajectory data is often difficult, especially for trajectories that are short, irregularly sampled, or feature mixed behaviors. In recent years, several methods have been proposed to quantify anomalous diffusion, going beyond the classical calculation of the mean squared displacement.
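As a point of reference for the classical MSD calculation mentioned above, the sketch below estimates α by computing the time-averaged MSD of a single 1D trajectory and fitting the slope of log MSD against log lag time. It is a minimal, illustrative baseline in plain Python; the function names and the choice of `max_lag` are assumptions, not part of the challenge code, and, as noted above, estimators of this kind can be unreliable for short or noisy trajectories.

```python
import math
from typing import List, Sequence


def time_averaged_msd(x: Sequence[float], max_lag: int) -> List[float]:
    """Time-averaged MSD of a 1D trajectory for lags 1..max_lag (illustrative helper)."""
    n = len(x)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must be between 1 and len(x) - 1")
    msd = []
    for lag in range(1, max_lag + 1):
        # Average the squared displacement over all pairs of points `lag` steps apart
        sq = [(x[i + lag] - x[i]) ** 2 for i in range(n - lag)]
        msd.append(sum(sq) / len(sq))
    return msd


def estimate_alpha(x: Sequence[float], max_lag: int) -> float:
    """Slope of log(MSD) vs. log(lag), fitted by ordinary least squares."""
    if max_lag < 2:
        raise ValueError("need at least two lags to fit a slope")
    msd = time_averaged_msd(x, max_lag)  # assumes a non-constant trajectory (MSD > 0)
    log_t = [math.log(lag) for lag in range(1, max_lag + 1)]
    log_m = [math.log(m) for m in msd]
    mean_t = sum(log_t) / len(log_t)
    mean_m = sum(log_m) / len(log_m)
    cov = sum((a - mean_t) * (b - mean_m) for a, b in zip(log_t, log_m))
    var = sum((a - mean_t) ** 2 for a in log_t)
    return cov / var


# Ballistic motion x_t = t: the MSD at lag k is exactly k**2, so alpha is 2
print(estimate_alpha([float(t) for t in range(50)], max_lag=10))
```

For the ballistic trajectory in the usage line, the time-averaged MSD equals the squared lag exactly, so the fitted exponent is 2 up to floating-point rounding.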

The AnDi challenge aims at bringing together a vibrant and multidisciplinary community of scientists working on this problem. The use of the same reference datasets will allow an unbiased assessment of the performance of published and unpublished methods for characterizing anomalous diffusion from single trajectories.

The challenge consists of three main tasks:

  • Task 1 - Inference of the anomalous diffusion exponent α.
  • Task 2 - Classification of the diffusion model.
  • Task 3 - Segmentation of trajectories.
    As in classical time-series segmentation, an input trajectory showing a change of anomalous diffusion exponent α and/or diffusion model must be divided into two discrete segments in order to reveal the underlying properties of each.

Each task will further include modalities for different numbers of dimensions (1D, 2D and 3D), for a total of 9 subtasks. Participants can submit results for an arbitrary number of subtasks.
Although the objective of the AnDi Challenge is mainly scientific, the top-ranking participant of each of the 9 subtasks will be invited to give an oral presentation at the AnDi workshop, where they will be awarded a certificate. Travel expenses will be covered by the organization.

Evaluation

The evaluation of the submissions for challenge ranking purposes will be performed using standard metrics, as described below. However, the organizers reserve the right to analyze the performance of the methods in further detail, e.g., at varying trajectory parameters such as length or noise, or by using other metrics (ROC-AUC, top-n accuracy, Fβ), and to describe the results in the associated publications.

Inference of the anomalous diffusion exponent α (Task 1)

The results will be evaluated by calculating the mean absolute error (MAE):
$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left|\alpha_{i,\mathrm{calc}} - \alpha_{i,\mathrm{GT}}\right|$,
where N is the number of trajectories in the dataset, and $\alpha_{i,\mathrm{calc}}$ and $\alpha_{i,\mathrm{GT}}$ represent the calculated and ground-truth values of the anomalous exponent of the i-th trajectory, respectively.
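The MAE definition above translates directly into code. This is a minimal sketch in plain Python; the function name is illustrative and not part of the organizers' scoring program.

```python
from typing import Sequence


def mean_absolute_error(alpha_calc: Sequence[float], alpha_gt: Sequence[float]) -> float:
    """MAE = (1/N) * sum_i |alpha_calc_i - alpha_gt_i| over N trajectories."""
    if len(alpha_calc) != len(alpha_gt) or not alpha_gt:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(c - g) for c, g in zip(alpha_calc, alpha_gt)) / len(alpha_gt)


print(mean_absolute_error([1.0, 2.0], [0.5, 2.5]))  # 0.5
```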

Classification of the diffusion model (Task 2)

The results will be evaluated by calculating the F1 score, i.e., the harmonic mean of the precision and the recall:
$F_1 = 2\,\frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$.
The precision and the recall are calculated as:
$\mathrm{precision} = \frac{TP}{TP+FP}$ and $\mathrm{recall} = \frac{TP}{TP+FN}$, where TP, FP and FN are the numbers of true positives, false positives and false negatives, respectively. In particular, the "micro" version of the F1 score provided in the scikit-learn Python library will be used, i.e., the metrics will be calculated globally by counting the total true positives, false negatives and false positives.
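To make the "micro" averaging explicit, the sketch below pools TP, FP and FN over all classes before computing precision and recall. It takes one integer class label per trajectory; how the organizers turn the five submitted scores into a single predicted model is not specified here, and taking the highest-scoring model would only be an assumption. With scikit-learn installed, `sklearn.metrics.f1_score(y_true, y_pred, average="micro")` should return the same value.

```python
from typing import Sequence


def micro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Micro-averaged F1 for single-label multiclass predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = fp = fn = 0
    # Count TP, FP and FN per class, then pool them over all classes
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1
            elif t == label:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Labels follow the order ATTM=0, CTRW=1, FBM=2, LW=3, SBM=4
y_true = [0, 1, 2, 3, 4, 2]
y_pred = [0, 1, 2, 3, 0, 1]
print(micro_f1(y_true, y_pred))  # 4 of 6 correct, about 0.667
```

Because each wrong prediction counts once as a false positive (for the predicted class) and once as a false negative (for the true class), micro-averaged F1 on single-label multiclass data coincides with plain accuracy.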

Segmentation of trajectories (Task 3)

For this task, in addition to the MAE and the F1 score, we will also calculate the root mean squared error (RMSE) of the changepoint relative localization:
$\mathrm{RMSE} = \left[\frac{1}{N}\sum_{i=1}^{N}\left(t_{i,\mathrm{calc}} - t_{i,\mathrm{GT}}\right)^2\right]^{1/2}$,
where $t_{i,\mathrm{calc}}$ and $t_{i,\mathrm{GT}}$ represent the calculated and ground-truth values of the changepoint position, respectively. For the ranking, the precision in determining the changepoint position, the anomalous diffusion exponent α and the diffusion model will be summarized in a single metric, the mean reciprocal rank (MRR) obtained for the three metrics:
$\mathrm{MRR} = \frac{1}{3}\left(\frac{1}{\mathrm{rank}_{\mathrm{MAE}}} + \frac{1}{\mathrm{rank}_{F1}} + \frac{1}{\mathrm{rank}_{\mathrm{RMSE}}}\right)$.
Please note that the leaderboard shows the average rank. Participants can submit results for an arbitrary number of subtasks. In the leaderboard, submissions lacking results for some task/subtask will be scored with MAE = 100 (tasks 1 and 3), F1 = 0 (tasks 2 and 3), and RMSE = 200 (task 3).
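The RMSE and MRR definitions above can be sketched as follows. The ranks themselves come from comparing all participants on each metric (lower MAE and RMSE are better, higher F1 is better); since tie handling is not described here, the sketch simply takes the three ranks as inputs. Function names are illustrative.

```python
import math
from typing import Sequence


def changepoint_rmse(t_calc: Sequence[float], t_gt: Sequence[float]) -> float:
    """Root mean squared error of predicted vs. ground-truth changepoint positions."""
    if len(t_calc) != len(t_gt) or not t_gt:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((c - g) ** 2 for c, g in zip(t_calc, t_gt)) / len(t_gt))


def mean_reciprocal_rank(rank_mae: int, rank_f1: int, rank_rmse: int) -> float:
    """Average of the reciprocal ranks (1 = best) obtained on the three metrics."""
    return (1 / rank_mae + 1 / rank_f1 + 1 / rank_rmse) / 3


print(changepoint_rmse([95, 62], [98, 58]))  # sqrt((9 + 16) / 2), about 3.536
print(mean_reciprocal_rank(1, 2, 4))         # (1 + 0.5 + 0.25) / 3, about 0.583
```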

Terms and Conditions

  • General Terms: This challenge is governed by the Codalab Privacy and Terms and by the specific rules set forth.
  • Conditions of participation: Participation requires complying with the rules of the challenge. The organizers and any person having had access to the ground truth values or to any information about the data or the challenge design giving an unfair advantage are excluded from participation. Excluded participants may still submit one or several entries in the challenge and request to have them evaluated, provided that they notify the organizers of their conflict of interest. If an excluded participant submits an entry, this entry will not be part of the final ranking and will not qualify for awards. The organizers reserve the right to evaluate for scientific purposes any entry made in the challenge, whether or not it qualifies for awards. Participants have full responsibility for the ownership of the algorithms and tools used for the challenge, and are accountable for any improper use of someone else's algorithm.
  • Registration: Participants must register on CodaLab and provide a valid email address, which is needed to receive announcements and be informed of any change in rules. Teams must register following the instructions provided here. Teams or individual participants registering multiple times to gain an advantage in the competition may be excluded.
  • Submission method: The results must be submitted through the CodaLab competition site. The submissions must be formatted as specified on the Instructions page. Participants can make up to 5 submissions per day in the Development phase, 3 submissions per day in the Validation phase, and 1 submission per day in the Challenge phase. Using multiple accounts to increase the number of submissions is NOT permitted. All submitted results will be evaluated, but only one score will be shown on the leaderboard. The score of each submission can be downloaded from the Submit/View Results page, and participants can choose which score to show on the leaderboard. Evaluation results cannot be shared privately, and the leaderboard cannot be modified, e.g., by removing lower scores and keeping only higher ones. Teams must use a single, unique name; any submission coming from the same team under different nicknames will be removed from the leaderboard. In case of issues, please start a new topic in the Forums.
  • Dissemination: The top participants will be invited to contribute to a joint article describing and summarizing the methods used and the results obtained for the challenge. The paper will be submitted to the arXiv first and then to an indexed journal. In order to be included as authors, participants have to:
    • send the organizers a detailed description of their methods and be available to clarify any doubt that might arise concerning the methods, the code, and/or the results. The organizers will review the description for sufficient detail to understand and reproduce the method, and reserve the right to exclude participants from the article if their method description is not adequate or their results cannot be reproduced.
    • provide an open-source version of the code used to analyze the test set under an OSI-approved license, such as Apache 2.0, MIT or a BSD-like license. The link must be provided to the organizers by email.
    Furthermore, the participants will be invited to contribute articles to the Special Issue of Journal of Physics A edited by the organizers.
    The participants will be invited to attend the AnDi workshop, which will be held in Castelldefels (Barcelona).
  • Awards: The top-ranking participants of each of the 9 award-winning subtasks may qualify for awards (travel award, invited oral contribution and award certificate). To receive the award it is compulsory to provide a link to a repository containing:
    • An open-source version of the code used to analyze the test set under an OSI-approved license such as, for instance, Apache 2.0, MIT or BSD-like license. The link must be provided to the organizers by email.
    • A markdown or pdf file with a brief description of the algorithm and instructions to run the code.
    The winners of each subtask will be determined according to a ranking calculated as described on the Evaluation page. In the case of a tie, the prize will go to the participant who submitted their entry first.
  • Travel awards: The travel awards will cover the expenses to attend the workshop organized in conjunction with the challenge, including airfare (economy class), hotel and workshop registration. The award is conditioned on (i) attending the workshop and (ii) giving an invited oral presentation of the methods used in the challenge. If the winner is a team, the award will cover the expenses of only one of the team members.
  • Code and Data: The code and the data provided for this challenge on CodaLab and at AnDiChallenge on GitHub are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0). The data can be used for scientific and educational purposes; any commercial use of the data is forbidden. Appropriate citations must be included in scientific publications (journal publications, conference papers, technical reports, presentations at conferences and meetings, theses) that use the code and/or the data shared in this challenge. The citation must refer to this website and to the DOI 10.5281/zenodo.3707702, and later to the publications that will describe the results of this challenge. Teams must notify the organizers of any publication that is even partly based on the results or data published on this site, so that a list of publications associated with the challenge can be maintained.

Anyone who registers for this challenge, or who downloads or uses the code and/or some or all of the data associated with the challenge, is considered to have read and accepted all the rules mentioned above.

Instructions

Please check the Terms and Conditions before submission.

Datasets

An example of a (labeled) training dataset is provided here. Further training data can be generated by means of the code freely available at AnDiChallenge on GitHub; there is no limit on the amount of data that can be used for training and benchmarking. For tasks 1 and 2, the datasets will include trajectories of different lengths. For task 3, all the trajectories will have a fixed length and at most one changepoint. Trajectories will be corrupted with different levels of noise associated with a finite localization precision and will be scaled to have a short-time diffusion coefficient (i.e., variance of the displacement distribution) between 0 and 1. The datasets for each phase will be available at the links provided here. For scientific purposes and further analysis, the datasets might contain trajectories whose ground truth is not known; these will not be used for scoring.
Each dataset provided for the Development, Validation and Challenge phases consists of a single .zip file containing three separate data files, one for each task, formatted as text files (.txt extension). The filename ends with a number from 1 to 3 indicating the task number, according to the following naming convention:
[dataset]_[development|validation|challenge]_[task_number].txt.
Each file is organized in lines, separated by a carriage return + line feed (CRLF) terminator. Each line corresponds to a single trajectory. Elements in the same line are separated by a semicolon (;). Each line starts with the subtask number (i.e., the number of dimensions) and then includes the trajectory coordinates. The number of elements in a line is therefore one plus the trajectory length times the number of dimensions. For 2D and 3D trajectories, the components of the same trajectory along the Cartesian axes are sequentially concatenated: on the same line, you will have first all the x's, then all the y's, and so on.
Example of a dataset named dataset_training_1.txt, showing the coordinates of 4 trajectories for subtask 1 (1D), 5 trajectories for subtask 2 (2D), and 3 trajectories for subtask 3 (3D):

1.0; -2.510; -3.003; -0.669; -1.631; -0.302; -0.818; 3.465; 5.997; 4.749; 6.077; 9.145
1.0; 0.047; -0.485; -0.355; -0.716; -1.088; -0.703; -1.494
1.0; 0.991; 1.042; 0.238; 0.909; 0.625; 0.443; -0.639; -0.210; -0.183
1.0; 1.044; 0.947; -0.267; -3.394; -0.269; 1.450; 4.976; 6.972; 9.422; 12.491; 9.825; 8.997
....
2.0; 0.374; 0.040; 0.208; 0.692; 0.538; 0.428; 2.111; 1.471; 1.307; 1.375; 1.669; 1.524
2.0; 2.514; 1.567; 1.777; 2.768; 4.325; -1.821; -6.890; -9.927; -19.495; -10.6649
2.0; -0.393; -0.787; -1.180; -1.573; -3.967; -4.36; -4.754; -5.147
2.0; -0.049; -0.102; -0.233; -0.164; 1.268; 1.27; 1.221; 1.207
2.0; -8.646;-17.293;-25.983;-34.674;-43.365;-81.008;-71.838;-62.667;-53.497;-44.326
....
3.0; -1.775; -2.441; -1.383; -1.247; 7.374; 7.291; 6.788; 7.444; 3.367; 2.746; 2.937; 3.685
3.0; 0.282; 0.478; 0.320; 3.417; 3.498; 3.128; -0.601; -0.203; -0.224;
3.0; 2.829; 1.048; 0.855; 2.837;-6.484;-7.895;-9.96;-13.692; 13.006; 8.89; 5.295; 4.983
....
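The line format described above can be read with a few lines of Python. This is a minimal sketch under the stated conventions (semicolon-separated fields, a leading subtask number, axes concatenated one after the other); the function names are hypothetical, and empty fields are skipped so that a trailing semicolon, as in one of the 3D example lines, does not break parsing.

```python
from typing import List, Tuple


def parse_trajectory_line(line: str) -> Tuple[int, List[List[float]]]:
    """Split one dataset line into (number of dimensions, one coordinate list per axis)."""
    # Skip empty fields, e.g. produced by a trailing ';'
    fields = [f.strip() for f in line.strip().split(";") if f.strip()]
    dim = int(float(fields[0]))  # subtask number, written as e.g. "2.0"
    coords = [float(f) for f in fields[1:]]
    if dim < 1 or len(coords) % dim != 0:
        raise ValueError(f"cannot split {len(coords)} values into {dim} axes")
    length = len(coords) // dim
    # Axes are concatenated: all x's first, then all y's, then all z's
    return dim, [coords[k * length:(k + 1) * length] for k in range(dim)]


def load_dataset(path: str) -> List[Tuple[int, List[List[float]]]]:
    """Parse all non-empty lines of a task file; text mode normalizes CRLF endings."""
    with open(path) as fh:
        return [parse_trajectory_line(line) for line in fh if line.strip()]


dim, axes = parse_trajectory_line("2.0; 0.374; 0.040; 0.208; 0.692")
print(dim, axes)  # 2 [[0.374, 0.04], [0.208, 0.692]]
```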

Result submission format

To submit prediction results, participants must download the datasets and run their code on their own computer. Result submissions for the Development and Validation phases will NOT automatically allow one to participate in the Challenge phase. A sample result submission is provided in the Starting Kit. Submissions should consist of a single .zip file containing up to three separate result files, one for each task. When compressing the files, please do not put them into a folder and then compress the folder; instead, select the files themselves and compress them. Also, hidden files created during compression on macOS might prevent the correct scoring of the submission. As described here, zipping from the Terminal as
zip -r dir.zip . -x ".*" -x "__MACOSX"
prevents the creation of hidden files. Result files should be formatted as text files. Their names should end with a number from 1 to 3 indicating the task number, and they must have a .txt extension, according to the following naming convention:
task[task_number].txt.
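If you prefer to build the archive from a script rather than from the Terminal, Python's standard `zipfile` module can do the same job. The sketch below is an illustrative alternative to the `zip` command above, not an official tool: the function name is hypothetical, and it stores each result file at the root of the archive (no enclosing folder) while skipping hidden dotfiles such as `.DS_Store`.

```python
import os
import tempfile
import zipfile
from typing import Iterable


def make_submission_zip(zip_path: str, result_files: Iterable[str]) -> None:
    """Write result files at the root of a new .zip archive, skipping hidden files."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in result_files:
            name = os.path.basename(path)
            if name.startswith("."):
                continue  # e.g. .DS_Store created by macOS Finder
            # arcname=name stores the file without any parent folder
            zf.write(path, arcname=name)


# Demo in a throwaway directory: task1.txt is archived, .DS_Store is not
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for fname in ("task1.txt", ".DS_Store"):
        path = os.path.join(tmp, fname)
        with open(path, "w") as fh:
            fh.write("1; 0.70\n")
        paths.append(path)
    zip_path = os.path.join(tmp, "submission.zip")
    make_submission_zip(zip_path, paths)
    with zipfile.ZipFile(zip_path) as zf:
        archived = zf.namelist()

print(archived)  # ['task1.txt']
```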
Each file should contain one result (corresponding to one trajectory) per line, in exactly the same order as the dataset. In each submission file, we expect the same number of lines as the number of trajectories in the dataset file. Each line should start with the subtask number (i.e., the number of dimensions) and then include the results of the task, separated by semicolons (;). Only include the results for the subtask(s) you want to compete for. The formatting of the results depends on the specific task:

  • Task 1 - Inference of the anomalous diffusion exponent α.
    This is a regression problem: the results must include one numeric value per line, representing the anomalous diffusion exponent α of each trajectory.
    Example of submission for task 1 named task1.txt, including subtask 1 (1D) and 3 (3D):
    1; 0.70
    1; 0.732
    1; 0.85
    1; 1.21
    ....
    3; 0.341
    3; 1.24
    3; 0.671
    ....
  • Task 2 - Classification of the diffusion model.
    This is a multiclass classification problem: the results must include five numeric values between 0 and 1 per line, representing the membership scores of each trajectory for the models, in the following order: ATTM; CTRW; FBM; LW; SBM. The scores must add up to 1. Please note that in this case the format of the submission file differs from that of the reference file used for scoring.
    Example of submission for task 2 named task2.txt, including only subtask 2 (2D):
    2; 0.32; 0.09; 0.03; 0.51; 0.05
    2; 1; 0; 0; 0; 0
    2; 0.038; 0.255; 0.081; 0.623; 0.003
    2; 0.739; 0.027; 0.152; 0.004; 0.078
    ....
  • Task 3 - Segmentation of trajectories.
    This is a combined problem, including segmentation, regression and classification. The results must include five numeric values per line (after the subtask number). The first value is the changepoint, i.e., the time at which the change of behavior occurs, where 0 corresponds to the beginning of the trajectory and T to its end. The second value must be an integer between 0 and 4, indicating the model associated with the first segment of the trajectory, according to the following convention: [0: ATTM; 1: CTRW; 2: FBM; 3: LW; 4: SBM]. The third value must be the anomalous diffusion exponent of the first segment of the trajectory. The fourth value must be an integer between 0 and 4, indicating the model associated with the second segment of the trajectory, according to the same convention. The fifth value must be the anomalous diffusion exponent of the second segment of the trajectory. If no changepoint is detected, the first value should be either 0 or T, and the second and third values must be identical to the fourth and fifth, respectively.
    Example of submission for task 3 named task3.txt, including the three subtasks:
    1; 95; 2; 1.4; 3; 1.1
    1; 62; 4; 1.8; 1; 0.2
    1; 143; 2; 0.7; 0; 1.4
    ....
    2; 124; 0; 0.60; 3; 1.65
    2; 72; 3; 1.95; 1; 0.45
    2; 143; 0; 0.05; 4; 0.25
    ....
    3; 50; 1; 0.73; 3; 0.47
    3; 22; 4; 1.40; 4; 1.79
    3; 55; 1; 0.65; 0; 0.92
    ....
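The three line formats above can be produced with small helper functions. The following is a minimal sketch, not the organizers' Starting Kit: the function names are hypothetical, task 2 scores are simply rescaled by their sum so that they add up to 1, and when no changepoint is detected, task 3 reports T, which is assumed here to equal the trajectory length passed in by the caller. Values are written with Python's default float formatting.

```python
from typing import Optional, Sequence

# Model order used by Task 2 (columns) and Task 3 (integer codes 0-4)
MODELS = ("ATTM", "CTRW", "FBM", "LW", "SBM")


def task1_line(dim: int, alpha: float) -> str:
    """One Task 1 line: subtask number, then the estimated exponent."""
    return f"{dim}; {alpha}"


def task2_line(dim: int, scores: Sequence[float]) -> str:
    """One Task 2 line: five membership scores rescaled to add up to 1."""
    if len(scores) != len(MODELS) or any(s < 0 for s in scores):
        raise ValueError("expected five non-negative scores")
    total = sum(scores)
    if total == 0:
        raise ValueError("scores must not all be zero")
    return f"{dim}; " + "; ".join(str(s / total) for s in scores)


def task3_line(dim: int, t_change: Optional[float], length: int,
               model1: int, alpha1: float,
               model2: Optional[int] = None, alpha2: Optional[float] = None) -> str:
    """One Task 3 line: changepoint, then (model, alpha) for each segment."""
    if t_change is None:
        # No changepoint detected: report T (assumed = length) and repeat segment 1
        t_change, model2, alpha2 = length, model1, alpha1
    if model2 is None or alpha2 is None:
        raise ValueError("second segment required when a changepoint is given")
    if not all(m in range(len(MODELS)) for m in (model1, model2)):
        raise ValueError("model codes must be integers between 0 and 4")
    return f"{dim}; {t_change}; {model1}; {alpha1}; {model2}; {alpha2}"


print(task1_line(1, 0.7))                      # 1; 0.7
print(task2_line(2, [2, 1, 1, 0, 0]))          # 2; 0.5; 0.25; 0.25; 0.0; 0.0
print(task3_line(1, 95, 200, 2, 1.4, 3, 1.1))  # 1; 95; 2; 1.4; 3; 1.1
print(task3_line(2, None, 200, 0, 0.6))        # 2; 200; 0; 0.6; 0; 0.6
```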

The AnDi Colloquium Series

The AnDi workshop

The AnDi workshop will be held at ICFO premises in Castelldefels (Barcelona) on February 17-20, 2021.

Organizers

Development

Start: March 1, 2020, midnight

Description: Set up your code so that it can produce predictions on anomalous diffusion trajectories. This phase is intended for development and practice only. A labeled dataset ("Development dataset for Training") and the code to generate further trajectories are made available. The leaderboard shows scores on the "Development dataset for Scoring" only.

Validation

Start: Sept. 14, 2020, midnight

Description: Tweak and tune your code to improve its predictions of anomalous diffusion. An unlabeled "Validation dataset" is made available. The leaderboard shows scores on the Validation dataset only.

Challenge

Start: Oct. 26, 2020, midnight

Description: For the final ranking and publication, please evaluate your code on the "Challenge dataset" and submit the results. This is the final stage, and only a limited number of submissions is available. Results will not be scored upon submission; the leaderboard will be available after the deadline.

Competition Ends

Nov. 1, 2020, 11:59 p.m.
