MolHack 2019

Organized by insilico_taiwan

Development
Feb. 24, 2019, 4 p.m. UTC

Final
March 24, 2019, 4 p.m. UTC

Competition Ends
March 31, 2019, 4 p.m. UTC

MolHack 2019

Requirements

  • Send your CodaLab user ID to taiwan@insilico.com.
  • Verify your residency of Taiwan, ROC, by sending a scan or photo of your personal ID.
  • Submit your predictions by the submission deadline of March 31, 2019.

Introduction

An important aspect of drug discovery is the ability to design new molecules that have some characteristics of interest. These characteristics can be (approximately) encoded in the molecule's fingerprint, such as its MACCS fingerprint, so the ability to generate new molecules with fingerprints of interest is very important. This task can be approached with a carefully designed and trained Deep Neural Network (DNN) that is conditioned on molecular fingerprints: a Conditional Generative DNN model that generates new and unique molecules from target MACCS fingerprints.

Goal

We model the generation of small molecules with specific properties/features as a Deep Learning problem. Since the characteristics of a molecule can be encoded in its MACCS fingerprint, the goal is to build and train a Conditional Generative DNN model that generates new and unique molecules from target MACCS fingerprints.

Challenge

At a minimum, the MolHack challenge requires participants to build and train a Conditional Generative DNN model that can generate new small molecules whose MACCS fingerprints are similar to the target MACCS fingerprint on which the generation is conditioned.

MolHack 2019: Evaluation

The generation of small molecules with specific properties/features is modeled as a Deep Learning problem. Each sample molecule (represented by its SMILES string) can be characterized by its MACCS fingerprint. You must build and train a Conditional Generative DNN model that generates new and unique molecules from target MACCS fingerprints. 
Preparing your submission with the starting kit is highly recommended. 

There are 2 phases:

  • Phase 1: Development phase. We provide you with training data and test data. Generate new small molecules whose MACCS fingerprints are similar to the target MACCS fingerprints in the test data; you will receive feedback on your performance on the test set. The performance of your LAST submission is displayed on the leaderboard.
  • Phase 2: Final phase. We provide you with the evaluation data. Generate new small molecules whose MACCS fingerprints are similar to the target MACCS fingerprints in the evaluation data. Your performance on the evaluation set will appear on the leaderboard once the organizers have finished checking the submissions.

This competition allows you to submit:

  • The generated small molecules (please see the starting kit for more information on the format and a sample file).
  • A runnable Docker image containing your trained conditional generative model.

The Generated Small Molecules

For each of the evaluation MACCS fingerprints, the participant must generate 1,000 unique new molecules. For example, if there are 100 evaluation MACCS fingerprints, then the participant must generate 100,000 new molecules in total. Note that the uniqueness of the molecules is checked within each set of 1,000 (one set per evaluation MACCS fingerprint), so there is no penalty for generating the same molecule for different fingerprints, as long as it appears only once within each fingerprint's set.
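A minimal sketch of the per-fingerprint uniqueness rule described above. It assumes molecules are submitted as canonical SMILES strings, so plain string comparison stands in for the chemical-identity check the organizers actually perform (which would typically involve a toolkit such as RDKit); the `unique_per_fingerprint` helper is hypothetical, not part of the starting kit.

```python
def unique_per_fingerprint(submissions):
    """Drop duplicate molecules within each target fingerprint's set.

    submissions: dict mapping fingerprint id -> list of SMILES strings.
    Returns a dict mapping fingerprint id -> de-duplicated list.
    Duplicates within a set would be scored 0.0; the same molecule may
    appear under different fingerprints without penalty.
    """
    result = {}
    for fp_id, smiles_list in submissions.items():
        seen = set()
        kept = []
        for smi in smiles_list:
            if smi not in seen:  # uniqueness is checked per fingerprint only
                seen.add(smi)
                kept.append(smi)
        result[fp_id] = kept
    return result

# "CCO" repeated under fp1 is dropped, but it may also appear under fp2:
subs = {"fp1": ["CCO", "CCO", "CCN"], "fp2": ["CCO"]}
print(unique_per_fingerprint(subs))  # {'fp1': ['CCO', 'CCN'], 'fp2': ['CCO']}
```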

The submissions are evaluated using the Tanimoto similarity metric, computed between the MACCS fingerprints of the newly generated small molecules and the target MACCS fingerprints. Only unique generated molecules are considered, and a generated molecule must not be identical to any molecule in the training data. Generated molecules that are duplicates, or that match a molecule in the training data set, are scored 0.0. The final score combines (1) the mean score of the best 100 unique molecules generated for each evaluation MACCS fingerprint and (2) the mean score of all 1,000 generated molecules for each target MACCS fingerprint. 
The final score equation: 0.7 * mean_top_100 + 0.3 * mean_total

A Runnable Docker Image

A runnable Docker image containing the trained model should also be provided. Push the image to a private repository on Docker Hub (https://hub.docker.com/) so that your results are protected from other participants. To allow us to run a container based on your private image, you must add "insilicotaiwan" as a collaborator to the private repository.
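The build-and-push workflow might look like the following. The image and repository names (`molhack-model`, `yourname/molhack2019`) are placeholders, and the Dockerfile is assumed to live in the current directory; substitute your own names.

```shell
# Build the image from your project directory (Dockerfile assumed present).
docker build -t molhack-model .

# Tag it for your private Docker Hub repository
# ("yourname/molhack2019" is a placeholder -- use your own repository name).
docker tag molhack-model yourname/molhack2019:final

# Log in and push to Docker Hub.
docker login
docker push yourname/molhack2019:final
```

After pushing, add "insilicotaiwan" as a collaborator from the repository's settings page on the Docker Hub website.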

You can get started with Docker by checking its documentation https://docs.docker.com/docker-hub/. The Docker Forum https://forums.docker.com/c/docker-hub and the Docker Success Center https://success.docker.com/q/docker-hub may also be helpful.

MolHack 2019: Rules

One Account per Participant

You may not sign up for CodaLab with multiple accounts, and therefore you may not submit from multiple accounts.

Submission Limits

Submissions must be made before the end of phase 1 and phase 2 respectively. You may make 2 submissions per day, up to 60 in total.

Additional Rules

MolHack will be held in two stages. The first stage lasts until March 25 and takes place on the CodaLab in-class platform. After this stage ends, you will get access to the new test set and a separate leaderboard. You will then have 6 days, until March 31, to make predictions on the new dataset and submit them to the platform.

See complete rules at molhack.com/molhack_official_rules

Development

Start: Feb. 24, 2019, 4 p.m.

Description: Development phase: create models and submit them, or directly submit results on the test data; feedback is provided on the test set.

Final

Start: March 24, 2019, 4 p.m.

Description: Final phase: submissions on the evaluation set are used to compute the final score. The results on the evaluation set will be revealed when the organizers release them.

Competition Ends

March 31, 2019, 4 p.m.

# Username Score
1 vincentl 0.8956
2 simonw80 0.7877
3 atom1231 0.6673