Organized by AArnaud

**Start:** Feb. 15, 2020, midnight UTC

**End:** Feb. 15, 2021, midnight UTC

**/!\ This benchmark will open soon. If you need early access, please send an email to the organizers. /!\**

Successful treatment of cancer remains a challenge, partly because cancer composition is highly heterogeneous across the patient population. Unfortunately, accounting for such heterogeneity is very difficult: clinical evaluation of tumor heterogeneity often requires the expertise of anatomical pathologists and radiologists.

This benchmark is dedicated to the quantification of intra-tumor heterogeneity using appropriate statistical methods on cancer omics data.

In particular, it focuses on estimating cell types and their proportions in biological samples from methylation and transcriptome data sets. The goal is to explore various statistical methods for source separation/deconvolution analysis (Non-negative Matrix Factorization, Surrogate Variable Analysis, Principal Component Analysis, Latent Factor Models, …) using both RNA-seq and methylome data.

[1] Go to the challenge page. In the `Participate` tab, under the `Files` item, download the starting kit by clicking the `Starting Kit` button, and the public data sets by clicking the `Public Data` button for the 1st phase.

[2] On your local machine, unzip the two downloaded zip files, `starting_kit.zip` and `data_public.zip`. Copy the data sets (`D_met1_public.rds`, `D_rna1_public.rds`, `A_1_public.rds`) into the unzipped `starting_kit` directory. Then open *R* in the `starting_kit` directory (e.g. open `starting_kit.Rmd` with *RStudio*).

The unzipped `starting_kit` directory now contains:

- A `starting_kit.html` file, the vignette of the benchmark (all useful information can be found there).
- A `submission_script.Rmd` file to modify and use to submit your code.
- The methylation and transcriptome **D** matrices, and the associated **A** matrix.
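Once the files are in place, the public data sets can be loaded and inspected from the *R* console. The following is a minimal sketch, assuming each `.rds` file holds a single R object (presumably a matrix) and sits in the given directory. The helper name `load_public_data` is hypothetical and not part of the starting kit.

```r
# Hypothetical helper (not part of the starting kit): read the three public
# data sets from a directory and return them as a named list.
load_public_data <- function(dir = ".") {
  files <- c(D_met = "D_met1_public.rds",
             D_rna = "D_rna1_public.rds",
             A     = "A_1_public.rds")
  paths <- file.path(dir, files)
  missing <- files[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing file(s): ", paste(missing, collapse = ", "))
  }
  # setNames keeps the short names (D_met, D_rna, A) on the result
  setNames(lapply(paths, readRDS), names(files))
}

# Usage, from the starting_kit directory:
# public <- load_public_data(".")
# lapply(public, dim)  # dimensions of each object, if they are matrices
```

Failing early on a missing file makes it easier to spot a data set that was not copied into the directory.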

[3] In the *R* console, launch the following command:

`rmarkdown::render(input = "submission_script.Rmd")`

Now submit your code (a zip file generated by the `submission_script.Rmd` file) in the `Participate` tab of the Codalab challenge.

The discriminating metric is computed on the **A** matrix: the **mean absolute error** (MAE) between the estimate and the ground truth.
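As an illustration of the metric, the mean absolute error between two proportion matrices can be computed in one line of base *R*. This sketch assumes both matrices have the same shape and the same cell-type ordering; whether the official scoring program matches cell types before comparing is not stated here. The toy values are made up.

```r
# Mean absolute error between an estimated and a reference proportion matrix.
# Assumes identical shapes and identical cell-type ordering in both matrices.
mae <- function(A_est, A_true) {
  stopifnot(identical(dim(A_est), dim(A_true)))
  mean(abs(A_est - A_true))
}

# Toy example: 2 cell types (rows) x 3 samples (columns), made-up values.
A_true <- matrix(c(0.2, 0.8,  0.5, 0.5,  0.9, 0.1), nrow = 2)
A_est  <- matrix(c(0.3, 0.7,  0.5, 0.5,  0.8, 0.2), nrow = 2)
mae(A_est, A_true)
```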

The matrix **D** (M methylation sites × N patients) is provided. **D = T A**, with **T** the cell-type profiles (M variables × k cell types) and **A** the cell-type proportions per patient (k cell types × N patients). The dimensions are ordered here so that the product **T A** is well defined.

Participants have to estimate the **A** matrix.
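To make the task concrete, here is a sketch of one of the approaches named above, Non-negative Matrix Factorization, written in base *R* with the classic multiplicative updates. It is not the reference method of the benchmark. It assumes **D** is stored with features in rows and samples in columns, so that **D = T A** is a valid product, and it runs on simulated data with made-up sizes. The function name `nmf_deconvolve` is hypothetical.

```r
# Sketch: NMF by multiplicative updates, minimising ||D - T A||_F^2.
# Assumed orientation: D is M x N (features x samples), T is M x k, A is k x N.
nmf_deconvolve <- function(D, k, n_iter = 500, seed = 1) {
  set.seed(seed)
  M <- nrow(D); N <- ncol(D)
  T_hat <- matrix(runif(M * k), M, k)
  A_hat <- matrix(runif(k * N), k, N)
  eps <- 1e-9  # avoids division by zero
  for (i in seq_len(n_iter)) {
    A_hat <- A_hat * (crossprod(T_hat, D) / (crossprod(T_hat) %*% A_hat + eps))
    T_hat <- T_hat * (tcrossprod(D, A_hat) / (T_hat %*% tcrossprod(A_hat) + eps))
  }
  # Heuristic post-processing: rescale each sample's column to sum to 1
  A_hat <- sweep(A_hat, 2, colSums(A_hat), "/")
  list(T = T_hat, A = A_hat)
}

# Simulated toy data (made-up sizes): 200 features, 30 samples, 3 cell types.
set.seed(42)
T_true <- matrix(runif(200 * 3), 200, 3)
A_raw  <- matrix(runif(3 * 30), 3, 30)
A_true <- sweep(A_raw, 2, colSums(A_raw), "/")
D <- T_true %*% A_true

fit <- nmf_deconvolve(D, k = 3)
dim(fit$A)
```

NMF recovers the factors only up to a permutation of the cell types, so the rows of `fit$A` may need to be matched to the reference cell types before any comparison with a known **A**.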

During this benchmark, they have to submit a reproducible script (with their implemented solution) that computes **A**. This script will be applied to 10 simulated data sets to estimate 10 **A** matrices, and the mean of the MAE between those estimates and the simulated **A** matrices will be used for scoring.
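The scoring scheme described above can be sketched as a loop over data sets. This is an illustration only, not the organizers' scoring program: the simulator, the sizes, and the placeholder estimator `estimate_uniform` are all hypothetical, and **D** is assumed to have samples in columns.

```r
# Sketch of the scoring loop: apply one estimator to several simulated
# data sets and average the per-data-set MAE. All sizes are made up.
simulate_case <- function(seed, M = 100, N = 20, k = 3) {
  set.seed(seed)
  T_true <- matrix(runif(M * k), M, k)
  A_raw  <- matrix(runif(k * N), k, N)
  A_true <- sweep(A_raw, 2, colSums(A_raw), "/")
  list(D = T_true %*% A_true, A = A_true)
}

# Placeholder estimator: equal proportions for every sample.
# A real submission would replace this with an actual deconvolution method.
estimate_uniform <- function(D, k = 3) {
  matrix(1 / k, nrow = k, ncol = ncol(D))
}

# Mean over data sets of the MAE between estimated and simulated A.
score_method <- function(estimator, cases) {
  maes <- vapply(cases, function(case) {
    mean(abs(estimator(case$D) - case$A))
  }, numeric(1))
  mean(maes)
}

cases <- lapply(1:10, simulate_case)
score_method(estimate_uniform, cases)
```

Passing the estimator as a function keeps the scoring code independent of the method under test, which makes it easy to compare several candidates on the same simulated cases.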

**By participating in this challenge, you agree to publicly share your submissions.**

You can freely test your methods on this benchmark and compare them to the reference methods. When your development is finished or stable, please create a team with the following nomenclature: "Feature selection for DNAm" [met] + "Feature selection for RNA" [rna] / "Deconvolution method" [both]. The idea is to gather similar approaches under the same team. Don't hesitate to send an email to the organizers if you have any questions or issues.

**Start:** Feb. 15, 2020, midnight

**Description:** Estimate the proportion matrix A from the DNAm and/or RNA-seq matrices D. The score "Time" is the average time in seconds (over 10 cases) to estimate one A matrix. The score "Data Type" simply indicates from which matrix D ("met" = 1, "rna" = 2, "both" = 3) the matrix A was estimated.
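For local testing, the two auxiliary scores can be approximated as follows. This is a sketch under assumptions: how the platform actually measures time is not specified here, so `system.time` elapsed seconds are used as a stand-in, and `toy_estimator` and the random inputs are hypothetical placeholders.

```r
# Sketch: average elapsed time (seconds) of an estimator over several cases.
# `estimator` and the toy inputs are hypothetical placeholders.
mean_elapsed <- function(estimator, D_list) {
  times <- vapply(D_list, function(D) {
    system.time(estimator(D))[["elapsed"]]
  }, numeric(1))
  mean(times)
}

# Integer codes for the "Data Type" score, as listed in the description.
data_type_code <- c(met = 1, rna = 2, both = 3)

set.seed(1)
D_list <- replicate(10, matrix(runif(50 * 10), 50, 10), simplify = FALSE)
toy_estimator <- function(D) matrix(1 / 3, nrow = 3, ncol = ncol(D))

mean_elapsed(toy_estimator, D_list)
data_type_code[["both"]]
```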

**Feb. 15, 2021, midnight**

| # | Username | Score |
|---|---|---|
| 1 | codabench_4 | 0.0240 |
| 2 | codabench_3 | 0.0370 |
| 3 | codabench_1 | 0.0569 |