Send CodaLab user ID to firstname.lastname@example.org
Verify your residency in Taiwan, ROC, by sending a scan or photo of your personal ID
Submit your predictions by the submission deadline of March 31, 2019
An important aspect of drug discovery is the capability to design new molecules that have some characteristics of interest. Such characteristics may be (approximately) encoded in the molecule's fingerprint (such as the MACCS fingerprint). The capability to generate new molecules whose fingerprints match a fingerprint of interest is therefore very important. Such a task may be achieved through a carefully designed and trained Deep Neural Network (DNN) that is conditioned on molecular fingerprints. Such a model can be described as a Conditional Generative DNN model for generating new and unique molecules based on target MACCS fingerprints.
We seek to model the generation of small molecules that have some specific properties/features as a Deep Learning problem. Since the characteristics of a molecule may be encoded in its MACCS fingerprint, the goal is to build and train a Conditional Generative DNN model for generating new and unique molecules based on target MACCS fingerprints.
At a minimum, the MolHack challenge requires participants to build and train a Conditional Generative DNN model that can generate new small molecules whose MACCS fingerprints are similar to the target MACCS fingerprint on which the generation is conditioned.
The generation of small molecules that have some specific properties/features is modeled as a Deep Learning problem. Each sample molecule (represented by its SMILES string) can be characterized by its MACCS fingerprint. You must build and train a Conditional Generative DNN model for generating new and unique molecules based on target MACCS fingerprints.
Preparing your submission with the starting kit is highly recommended.
There are 2 phases: a development phase and a final phase.
For each of the evaluation MACCS fingerprints, the participant must generate 1,000 unique new molecules. For example, if there are 100 evaluation MACCS fingerprints, then the participant must generate 100,000 new molecules. Please note that the uniqueness of the molecules is checked within each set of 1,000 (that is, per evaluation MACCS fingerprint), so there is no penalty for generating the same molecule for different fingerprints as long as it is generated only once for each fingerprint.
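The per-fingerprint uniqueness rule above can be checked before submitting. The following is only a minimal sketch, not the organizers' checker: the function name check_submission, the dictionary layout (fingerprint ID mapped to a list of SMILES strings), and the use of exact string comparison are all assumptions made for illustration. In particular, two different SMILES strings can describe the same molecule, so a real check would likely canonicalize the strings with a cheminformatics toolkit first.

```python
from collections import Counter
from typing import Dict, List


def check_submission(generated_by_target: Dict[str, List[str]],
                     per_target: int = 1000) -> Dict[str, dict]:
    """Report targets whose molecule list has the wrong size or repeats.

    generated_by_target is a hypothetical layout: one list of SMILES
    strings per evaluation fingerprint ID. Uniqueness is checked only
    within each list, never across lists.
    """
    problems = {}
    for target_id, smiles_list in generated_by_target.items():
        counts = Counter(smiles_list)
        # Exact string match only; canonicalization is not attempted here.
        duplicates = sorted(s for s, n in counts.items() if n > 1)
        if len(smiles_list) != per_target or duplicates:
            problems[target_id] = {
                "count": len(smiles_list),
                "duplicates": duplicates,
            }
    return problems


if __name__ == "__main__":
    # Tiny illustration with 2 molecules per fingerprint instead of 1,000.
    submission = {
        "fp0": ["CCO", "CCN"],  # fine
        "fp1": ["CCO", "CCO"],  # "CCO" repeated within the same fingerprint
    }
    print(check_submission(submission, per_target=2))
```

Note that "CCO" appearing under both fp0 and fp1 is not flagged, which matches the rule that repeats across different fingerprints are not penalized.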
Submissions are evaluated using the Tanimoto similarity metric, computed between the MACCS fingerprint of each newly generated small molecule and the target MACCS fingerprint. Only unique generated small molecules are considered, and a generated molecule must not be identical to any molecule in the training data. Newly generated molecules that are not unique or that are identical to any molecule in the training data set will be scored 0.0. The final score combines (1) the mean score of the best 100 unique molecules generated for each evaluation MACCS fingerprint and (2) the mean score of all 1,000 molecules generated for each target MACCS fingerprint.
The final score equation: 0.7 * mean_top_100 + 0.3 * mean_total
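The scoring described above can be sketched in a few lines. This is an illustrative approximation under stated assumptions, not the official evaluation script: fingerprints are represented as sets of on-bit indices rather than computed from SMILES, the first occurrence of a repeated molecule keeps its score while later repeats score 0.0, and the per-fingerprint scores are averaged over all evaluation fingerprints. The function names tanimoto, score_target, and final_score are hypothetical.

```python
from statistics import mean
from typing import Iterable, Sequence, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def score_target(target_bits: Set[int],
                 generated: Sequence[Tuple[str, Set[int]]],
                 training_smiles: Set[str],
                 top_k: int = 100) -> float:
    """Score the molecules generated for one target fingerprint.

    generated holds (smiles, fingerprint_bits) pairs. Repeats and
    molecules found in the training set are scored 0.0.
    """
    seen = set()
    scores = []
    for smiles, bits in generated:
        if smiles in seen or smiles in training_smiles:
            scores.append(0.0)
        else:
            scores.append(tanimoto(bits, target_bits))
        seen.add(smiles)
    mean_top = mean(sorted(scores, reverse=True)[:top_k])
    mean_total = mean(scores)
    # Weights taken from the final score equation above.
    return 0.7 * mean_top + 0.3 * mean_total


def final_score(per_target_scores: Iterable[float]) -> float:
    """Average over evaluation fingerprints (an assumption, see lead-in)."""
    return mean(per_target_scores)


if __name__ == "__main__":
    target = {1, 2, 3, 4}
    generated = [
        ("CCO", {1, 2, 3, 4}),  # exact fingerprint match
        ("CCN", {1, 2}),        # partial match
        ("CCO", {1, 2, 3, 4}),  # repeat, scored 0.0
        ("CCC", {1, 2, 3, 4}),  # in training data, scored 0.0
    ]
    print(score_target(target, generated, {"CCC"}, top_k=2))
```

Because the top-100 mean carries 70% of the weight, a model that produces 100 very close matches per fingerprint is rewarded more than one that produces 1,000 moderately close ones.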
A runnable Docker image containing the trained model should also be provided. Push the image to a private repository on Docker Hub so that your results are protected from other participants. You can create a private repository at https://hub.docker.com/. So that we can run a container from your private image, you must add "insilicotaiwan" as a collaborator on the private repository.
You can get started with Docker by checking its documentation https://docs.docker.com/docker-hub/. The Docker Forum https://forums.docker.com/c/docker-hub and the Docker Success Center https://success.docker.com/q/docker-hub may also be helpful.
You may not sign up for CodaLab with multiple accounts, and therefore you may not submit from multiple accounts.
Submissions must be made before the end of phase 1 and phase 2. You may submit 2 submissions every day and 60 in total.
MolHack will be held in two stages. The first stage will last until March 25 and will take place on the CodaLab in-class platform. After this stage ends, you will get access to a new test set and a separate leaderboard. You will then have 6 days, until March 31, to make predictions on the new dataset and submit them to the platform.
See complete rules at molhack.com/molhack_official_rules
Start: Feb. 24, 2019, 4 p.m.
Description: Development phase: create models and submit them, or directly submit results on the test data; feedback is provided on the test set.
Start: March 24, 2019, 4 p.m.
Description: Final phase: submissions on evaluation set are used to compute the final score. The results on the evaluation set will be revealed when the organizers make them available.
Competition ends: March 31, 2019, 4 p.m.