Webvision - Video track

Organized by hildekuehne


First phase
Feb. 15, 2020, midnight UTC




Welcome to the Webvision Video Challenge!


Held in conjunction with the 4th Workshop on Visual Understanding by Learning from Web Data at CVPR 2020.

The goal of the workshop is to advance the learning of knowledge and representations from web data. In this challenge, the task is to learn the actions mentioned in web videos, e.g. "crack egg" or "add butter", from subtitles only, without any human-generated labels.

This challenge is for learning from videos/frames!

Please find all details and links here: https://data.vision.ee.ethz.ch/cvl/webvision/challenge2.html



WebVision Video Track

We use the task of video alignment to test the quality of your classifier. Alignment means that the transcript of the actions (i.e. the action labels in the right order) is already given, and the task is to find the correct boundaries for those actions in the video. We know from previous work on weakly supervised learning for video sequences (see e.g. https://ieeexplore.ieee.org/document/8585084, https://arxiv.org/abs/1610.02237) that this task is usually a good surrogate for overall classification accuracy. Here it also helps to avoid language inconsistencies, because the output is aligned to the correct action labels only and all other classes are ignored. It is therefore not important which score was given to "mix_egg" or "beat_egg": if the annotation is "whisk_egg", only the scores of the class "whisk_egg" are considered.
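To make the alignment task concrete, here is a minimal sketch of how a given transcript could be aligned to frame-wise classifier scores with dynamic programming. It is an illustration under stated assumptions, not the baseline's implementation: the function name `align`, the additive score model, and the rule that every action covers at least one frame are all assumptions made for this example.

```python
def align(scores, transcript):
    """Align an ordered transcript to frame-wise scores.

    scores[t][c] is the (assumed additive) score of class c at frame t.
    transcript is the ordered list of class indices known to occur.
    Returns one class index per frame, following the transcript order.
    """
    num_frames, num_actions = len(scores), len(transcript)
    if num_actions == 0 or num_frames < num_actions:
        raise ValueError("need at least one frame per transcript entry")

    neg_inf = float("-inf")
    # best[t][n]: best total score for frames 0..t when frame t
    # belongs to the n-th action of the transcript.
    best = [[neg_inf] * num_actions for _ in range(num_frames)]
    # advanced[t][n]: True if the n-th action starts at frame t.
    advanced = [[False] * num_actions for _ in range(num_frames)]

    best[0][0] = scores[0][transcript[0]]
    for t in range(1, num_frames):
        for n in range(min(t + 1, num_actions)):
            stay = best[t - 1][n]
            move = best[t - 1][n - 1] if n > 0 else neg_inf
            if move > stay:
                best[t][n] = move + scores[t][transcript[n]]
                advanced[t][n] = True
            else:
                best[t][n] = stay + scores[t][transcript[n]]

    # Backtrack from the last frame, which must belong to the last action.
    labels = [0] * num_frames
    n = num_actions - 1
    for t in range(num_frames - 1, -1, -1):
        labels[t] = transcript[n]
        if advanced[t][n]:
            n -= 1
    return labels


# Toy example: 6 frames, 3 classes, transcript "class 2, then class 0".
# Class 1 scores highest at frame 1 but is ignored, since it is not
# part of the transcript.
toy_scores = [
    [0.1, 0.2, 0.7],
    [0.1, 0.8, 0.1],
    [0.2, 0.1, 0.7],
    [0.6, 0.3, 0.1],
    [0.7, 0.2, 0.1],
    [0.8, 0.1, 0.1],
]
toy_alignment = align(toy_scores, [2, 0])
print(toy_alignment)  # [2, 2, 2, 0, 0, 0]
```

Only the columns of the classes named in the transcript are ever read, which is why scores assigned to similar but unlisted classes do not affect the result.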

We use Intersection over Union (IoU) as the accuracy measure.
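As an illustration of the measure, the sketch below computes a frame-wise IoU per class and averages it over the classes present in the ground truth. The exact evaluation protocol is not spelled out on this page, so the averaging scheme and the function name `mean_iou` are assumptions made for this example.

```python
from collections import defaultdict


def mean_iou(predicted, ground_truth):
    """Mean per-class IoU between two frame-level label sequences."""
    if len(predicted) != len(ground_truth):
        raise ValueError("sequences must have the same number of frames")
    if not ground_truth:
        raise ValueError("sequences must not be empty")

    intersection = defaultdict(int)
    union = defaultdict(int)
    for pred, true in zip(predicted, ground_truth):
        if pred == true:
            # The frame lies in both the predicted and the true segment.
            intersection[true] += 1
            union[true] += 1
        else:
            # The frame extends the union of both classes involved.
            union[pred] += 1
            union[true] += 1

    classes = set(ground_truth)
    return sum(intersection[c] / union[c] for c in classes) / len(classes)


# Toy example: the predicted boundary is one frame too late.
truth = ["crack_egg"] * 4 + ["whisk_egg"] * 4
guess = ["crack_egg"] * 5 + ["whisk_egg"] * 3
iou = mean_iou(guess, truth)  # (4/5 + 3/4) / 2 = 0.775
```

A perfect alignment yields 1.0, and every frame placed on the wrong side of a boundary lowers the IoU of both neighbouring actions.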

Submission Policy

To encourage more teams to participate in this challenge, we will maintain a leaderboard showing the recognition results of all teams. Each team can submit once, with 5 results. The final rank is based on the best of the 5 results in each team's final submission.

General rules

Please find all details and links here: https://data.vision.ee.ethz.ch/cvl/webvision/challenge2.html

Training data: You are allowed to use the yes/no validation data listed in the 'val_yes_no.txt' file (here:) for validation and/or training. It contains only a few clips per class, so we assume it will not get you all the way, but any new ideas are welcome.

Subtitles: You are only allowed to use the original subtitles or the generated labels from the baseline. Please do not download new subtitles! They can change over time, and we would no longer be able to compare your method to others.

Validation and testing: You can use the test set of the original dataset as a validation set. You are not allowed to include data from the test set as additional training data!

As a rule of thumb, please keep everything reproducible!


Important Dates

June 07, 2020 Final submission deadline
June 10, 2020 Challenge results are released
June 18, 2020 Workshop date (co-located with CVPR 2020)

All deadlines are at 23:59 Pacific Standard Time.


Awards will be given to the top two performers of each track. In addition, the two top-ranked participants will be invited to give an oral presentation at the CVPR 2020 workshop. The award is conditional on (i) attending the workshop and (ii) giving an oral presentation of the methods used in the challenge.

Terms and Conditions

By downloading the image data for this challenge you agree to the following terms:

  1. You will not distribute the images.
  2. ETH Zurich makes no representations or warranties regarding the data, including but not limited to warranties of non-infringement or fitness for a particular purpose.
  3. You accept full responsibility for your use of the data and shall defend and indemnify ETH Zurich, including its employees, officers and agents, against any and all claims arising from your use of the data, including but not limited to your use of any copies of copyrighted images that you may create from the data.

# Username Score
1 qwang 0.103250