Predicting Generalization in Deep Learning

Organized by ydjiang

First phase

Development Phase
July 14, 2020, midnight UTC


Competition Ends
Nov. 1, 2020, 1:41 a.m. UTC

Predicting the generalization performance of neural networks!

This competition is part of the NeurIPS 2020 competition track.

Welcome! In this competition, competitors are asked to submit a Python function whose input is a trained neural network and its training data and whose output is a complexity measure or generalization predictor that quantifies how well the trained model generalizes to the test data. You can find general information about the competition at

Getting Started

You can get a starting kit, which contains a sample submission, in the get_starting_kit tab to the left. You can also download sample data that we have prepared, but you are not obligated to use it (it is not drawn from the same distribution as the test data). The sample submission in the starting kit can be submitted as-is. The starting kit also contains a number of example baselines that demonstrate how to use the APIs effectively, in case you are not familiar with Keras. When testing locally, please make sure that you are using TensorFlow 2.2. If you are familiar with Docker, it is a good idea to test your code in our Docker environment to make sure everything runs correctly. Alternatively, we prepared a Colab notebook where you can test whether your submission runs correctly on a single model without setting up all the software dependencies.

Make / Join Teams

You are expected to form teams; however, there is no minimum or maximum number of participants per team. Each participant should be on only one team. Teams must be approved by the organizers, but you may add team members later. For more details, check If you have a potential conflict of interest with the organizing teams and are thus not eligible to win the competition, please apply to join the team COI or create a team with the COI prefix (e.g. COI-Google).

Make Submissions

To make a submission, click on the Participate tab above, then select the Submit/View Result tab on the left, and finally click the Submit button to upload your submission. If you are new to Codalab and want to learn more, please check for more information about how to use the platform.

Note: Please make sure you have a metadata file in your submission. Otherwise Codalab will not recognize your submission. The content of the file does not matter.

You may use the Forums above to start discussions. You can also reach the organizers at


In this competition, competitors are asked to write a Python function whose input is a trained neural network and its training data and whose output is a complexity measure or generalization predictor that quantifies how well the trained model generalizes to the test data. The competition is separated into 2 phases, a development phase and an evaluation phase, each with its own set of neural networks. Please make sure that your submission contains a Python script called and that inside it there is a function called complexity that takes a Keras model and a TensorFlow dataset and outputs a scalar value that is your solution.
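As a rough illustration of the expected interface, the sketch below defines a `complexity` function that returns a single scalar. The measure itself (the sum of squared weights) is only a placeholder chosen for brevity and is not a baseline endorsed by the organizers. It assumes the model behaves like a Keras model, whose `get_weights()` method returns a list of NumPy arrays, and it ignores the dataset entirely.

```python
def complexity(model, dataset):
    """Toy complexity measure: the sum of squared weights of the model.

    `model` is assumed to behave like a Keras model, i.e. to provide
    `get_weights()`, which returns a list of NumPy arrays.
    `dataset` is the training data; this placeholder measure ignores it.
    """
    total = 0.0
    for w in model.get_weights():
        # Square every entry of the weight array and add up the results.
        total += float((w ** 2).sum())
    return total
```

A real submission would typically also iterate over `dataset` to probe the model's behavior on the training data.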

Competitors can submit only a fixed number of solutions every day, and each submission must finish within a given time budget. In the evaluation phase, competitors have a limited number of chances to submit new solutions. Solutions in this phase are first run on the development phase data to make sure that they finish within the time budget.

There are 2 phases:

  • Phase 1: development phase. Competitors can submit their solutions, which are evaluated on private dataset 1. This dataset contains different neural network architectures, trained on different data, from those provided to competitors.
  • Phase 2: evaluation phase. Competitors have a limited number of chances to submit new solutions. Solutions in this phase are first run on the development phase data and then on private dataset 2 if they finish within the time budget.

The submissions are evaluated using the conditional mutual information metric outlined in this document, originally proposed in this paper. The minimum score is 0.0 and the maximum score is 1.0. We multiply the final score by 100 so the score ranges from 0 to 100.



A participant may submit 3 submissions every day and 150 in total. While there is no limit on submissions per team, please be mindful that computational resources are shared among all participants of the competition; please be reasonable and kind, and refrain from submitting large numbers of submissions in parallel!

This challenge is governed by these rules. Participating in the competition means that you have read and agree with these rules.

Download       Size (MB)   Phase
Starting Kit   3.850       #1 Development Phase
Public Data    8994.553    #1 Development Phase

Note: Information below may be adjusted as the competition continues based on the computational demand and availability.

Hardware Specs and Compute Budget

The submissions will be run on virtual machines on Google Cloud with the following hardware specs:

  • 4 virtual CPUs on the Intel Broadwell platform
  • 26 GB memory
  • 1 Nvidia K80 GPU

We are allowing 3 submissions per day per team, and 150 submissions over the course of the competition. Although models differ in size, each submission is expected to finish within 5 minutes per model on average (amortized). Submissions exceeding this time limit will receive the minimum score.
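To get a rough sense of whether a solution respects the amortized budget, one could time it locally over a handful of models. The sketch below is only an assumption about how such a check might look: the helper names `amortized_seconds` and `within_budget` are hypothetical, and timings on a local machine will not match the competition hardware.

```python
import time

# "5 minutes per model (amortized)", as stated in the text above.
BUDGET_SECONDS = 5 * 60


def amortized_seconds(complexity_fn, tasks):
    """Return the mean wall-clock time of complexity_fn per (model, dataset) pair.

    `tasks` is a non-empty list of (model, dataset) pairs.
    """
    durations = []
    for model, dataset in tasks:
        start = time.perf_counter()
        complexity_fn(model, dataset)
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)


def within_budget(complexity_fn, tasks):
    """True if the average time per model stays within the budget."""
    return amortized_seconds(complexity_fn, tasks) <= BUDGET_SECONDS
```

Because the limit is amortized, a slow run on one large model can be offset by faster runs on smaller ones, which is why the check uses the mean rather than the maximum.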

Python packages and Docker Image

Your submission will be run in the following Docker image: scottyak00/codalab-00:gpu-worker-tf-2.2

You can find this docker image on Docker Hub. This image contains most common packages; however, if you believe there are necessary packages missing, please contact the organizing team and we will try to add your package to the best of our abilities.

If you are not familiar with Docker, make sure your local environment or virtual environment uses TensorFlow 2.2.
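A quick way to confirm the local version is to compare the installed version string against the expected series. The helper below is a hypothetical sketch and not part of the starting kit; it only inspects a version string, so it can be tried without TensorFlow installed.

```python
def is_expected_tf_version(version, expected="2.2"):
    """True if `version` (e.g. tf.__version__) is in the expected minor series."""
    # Compare only the major and minor components, e.g. "2.2.0" -> ["2", "2"].
    return version.split(".")[:2] == expected.split(".")[:2]


# Typical use, assuming TensorFlow is installed locally:
#   import tensorflow as tf
#   assert is_expected_tf_version(tf.__version__)
```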

Potential for new tracks

During our pilot testing phase, we tested and confirmed that data augmentation is one of the easiest ways for participants to earn a high score in this competition. One of the goals of this competition is to help researchers discover measures that can grant theoretical insight into the phenomenon of generalization in deep learning. We would therefore like to note that, if necessary, we may create a separate track for submissions that do not use data augmentation (or other potentially “exploitative” techniques), in order to encourage submissions that produce new insight into generalization. We also reserve the right to change the data and models during phase 1 if we learn that the competition is too exploitable.

Best of luck to everyone!

Access files in your submission

Some submissions may need to access files that cannot be included in the Python program (e.g. trained weights). In this case, you may add an additional named argument "program_dir" to your complexity function. The absolute path to your submission directory will be passed to this argument.
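A minimal sketch of how the extra argument might be used is shown below. The file name `settings.json` and its `scale` key are hypothetical and exist only for illustration; the returned value is a placeholder rather than a meaningful measure.

```python
import json
import os


def complexity(model, dataset, program_dir):
    # `program_dir` is the absolute path to the submission directory.
    # "settings.json" is a hypothetical file name used for illustration.
    path = os.path.join(program_dir, "settings.json")
    with open(path) as f:
        settings = json.load(f)
    # Placeholder: return a value read from the file instead of a real measure.
    return float(settings["scale"])
```

Building the path with `os.path.join` from `program_dir`, rather than relying on the current working directory, keeps the lookup independent of where the evaluation script is launched.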


Development Phase

Start: July 14, 2020, midnight

Description: Development phase: submit and test your solutions on the phase 1 private data.

Final Phase

Start: Oct. 22, 2020, midnight

Description: Your solutions will be run on the phase 2 private data. Please include the names and email addresses of your team members inside the metadata.

Competition Ends

Nov. 1, 2020, 1:41 a.m.
