Health Data Challenge 2019

Organized by Alexis_Arnaud

First phase

CHALLENGE #1
Nov. 26, 2019, 9 a.m. UTC

End

Competition Ends
Dec. 1, 2019, midnight UTC

Successful treatment of cancer remains a challenge, partly because tumor composition is highly heterogeneous across the patient population. Accounting for this heterogeneity is difficult: clinical evaluation of tumor heterogeneity often requires the expertise of anatomical pathologists and radiologists.

This challenge is dedicated to the quantification of intra-tumor heterogeneity using appropriate statistical methods on cancer omics data.

In particular, it focuses on estimating cell types and their proportions in biological samples based on methylation and transcriptome data sets. The goal is to explore various statistical methods for source separation/deconvolution analysis (Non-negative Matrix Factorization, Surrogate Variable Analysis, Principal Component Analysis, Latent Factor Models, …).

You will participate in 2 challenges:

- CHALLENGE #1 // aim: learn how to use CodaLab, discover the dataset, and submit a deconvolution script on either RNA-seq or methylome data

- CHALLENGE #2: phase 1 // aim: test and develop methods to quantify tumor heterogeneity using both RNA-seq and methylome data (submit results only)

- CHALLENGE #2: phase 2 // aim: validate your method by submitting your script on the platform (evaluation on an independent dataset)

How to start?

[1] Go to the challenge page and open the Files item in the Participate tab. For the 1st phase, download the starting kit by clicking the Starting Kit button and the public data sets by clicking the Public Data button.

[2] On your local machine, unzip the two downloaded files, starting_kit_p1.zip and data_public_p1.zip. Copy the data sets (DC1_D_met.rds, DC1_D_rna.rds) into the unzipped starting_kit directory.

The unzipped starting_kit directory now contains:

• A starting_kit_p1.html corresponding to the vignette of the Challenge (all useful information can be found here).
• A submission_script_p1.Rmd to modify and to use to submit your predictions.
• The data sets for methylation (DC1_D_met.rds) and transcriptome (DC1_D_rna.rds).

[3] Then open R in the starting_kit directory (e.g. open submission_script_p1.Rmd with RStudio) and run the following command to generate the baseline submission:

rmarkdown::render("submission_script_p1.Rmd")

How to submit your results?

Now submit your prediction (a zip file generated by the submission_script_p1.Rmd file) in the Participate tab of the CodaLab challenge.

How is the scoring metric computed?

The discriminating metric will be computed on the A matrix: the mean absolute error (MAE) between the estimate and the ground truth.

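The matrix D is provided; it covers N patients and M methylation sites. The model is D = T A, where T holds the cell-type profiles (M variables, k cell types) and A holds the cell-type proportions per patient (k cell types, N patients). For the product T A to be defined, D must be read as an M x N matrix (sites in rows, patients in columns).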
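A toy example can make the shapes in the mixture model concrete. The sketch below is illustrative only: it assumes T is stored as M x k and A as k x N (the layout in which the product T A is defined), it assumes each column of A sums to 1, and all numbers are invented. It uses plain Python lists, not the R objects of the starting kit, and the helper name matmul is hypothetical.

```python
# Toy illustration of the mixture model D = T A (all numbers are made up).
# Assumed layout: T is M x k (sites x cell types), A is k x N (cell types x patients).

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    assert len(X[0]) == len(Y), "inner dimensions must match"
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

# M = 3 sites, k = 2 cell types: one column per cell-type profile.
T = [[0.9, 0.1],
     [0.2, 0.8],
     [0.5, 0.5]]

# k = 2 cell types, N = 4 patients: one column of proportions per patient.
# Assumption: the proportions of each patient sum to 1.
A = [[1.0, 0.75, 0.5, 0.0],
     [0.0, 0.25, 0.5, 1.0]]

# Observed mixture: M x N = 3 sites x 4 patients.
D = matmul(T, A)

for row in D:
    print(row)
```

In the challenge only D is given; participants estimate A (and, depending on the method, T) from it.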

Participants have to identify an estimate of A matrix.

During challenge #1, they have to submit a reproducible script (with their implemented solution) that computes A.

2 D matrices will be assessed, and the sum of the MAEs on the 2 corresponding estimates of A will be used for scoring.

During challenge #2 - Phase 1, they have to submit the estimates of A directly (to avoid computation delay).

During challenge #2 - Phase 2, they have to submit their final script that computes A. This script will be executed on a new noisy realisation of the simulated dataset.

For each challenge, the discriminating metric will be computed on the A matrix (mean absolute error between the estimate and the ground truth).

The root mean squared error on the D matrix is reported as an additional indicator.
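The two metrics named above can be sketched as follows. This is a minimal illustration, not the platform's scoring program: it assumes both errors are averaged over all matrix entries and that the rows (cell types) of the estimate are already in the same order as the ground truth. The function names and the toy values are hypothetical.

```python
import math

def mean_absolute_error(est, truth):
    """Average of |est - truth| over all entries of two same-shaped matrices."""
    diffs = [abs(e - t)
             for row_e, row_t in zip(est, truth)
             for e, t in zip(row_e, row_t)]
    return sum(diffs) / len(diffs)

def root_mean_squared_error(est, truth):
    """Square root of the average squared difference over all entries."""
    squares = [(e - t) ** 2
               for row_e, row_t in zip(est, truth)
               for e, t in zip(row_e, row_t)]
    return math.sqrt(sum(squares) / len(squares))

# Toy proportion matrices (k = 2 cell types, N = 2 patients), invented values.
A_true = [[0.5, 0.25],
          [0.5, 0.75]]
A_est = [[0.75, 0.25],
         [0.25, 0.75]]

# Toy mixture matrices (1 site, 2 patients), invented values.
D_true = [[1.0, 2.0]]
D_est = [[1.0, 4.0]]

mae_A = mean_absolute_error(A_est, A_true)
rmse_D = root_mean_squared_error(D_est, D_true)
print(mae_A, rmse_D)
```

A lower MAE on A means the estimated proportions are closer to the true ones; the RMSE on D only indicates how well the fitted model reconstructs the observed data.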

By participating in this challenge, you agree to share your submissions publicly.

CHALLENGE #1

Start: Nov. 26, 2019, 9 a.m.

Description: Estimate the proportion matrix A from a mixture matrix D (either DNAm or RNA-seq). Submit your script (program) on the platform. (Data type correspondence: 1.0 = met, 2.0 = rna)

CHALLENGE #2 // Phase 1 // Exploration

Start: Nov. 27, 2019, 8 a.m.

Description: Estimate the proportion matrix A from 2 mixture matrices D (DNAm & RNA-seq). Submit your results.rds only. (Data type correspondence: 1.0 = met, 2.0 = rna, 3.0 = both)

CHALLENGE #2 // Phase 2 // Validation

Start: Nov. 28, 2019, 10 a.m.

Description: Estimate the proportion matrix A from 2 mixture matrices D (DNAm & RNA-seq). Submit your script (program) on the platform. (Data type correspondence: 1.0 = met, 2.0 = rna, 3.0 = both)

Competition Ends

Dec. 1, 2019, midnight
