The 2nd Large-scale Video Object Segmentation Challenge - Track 2: Video Instance Segmentation

Organized by fyc0624


Validation phase start: June 1, 2019, midnight UTC
Test phase start: Aug. 15, 2019, midnight UTC
Competition ends: Aug. 30, 2019, 11:59 p.m. UTC



Video object segmentation has been studied extensively over the past decade because of its importance for understanding the spatial-temporal structure of videos and its value in industrial applications. Data-driven algorithms (e.g. deep learning) have recently become the dominant approach to computer vision problems, and one key to their success is the availability of large-scale datasets. Last year, we presented the first large-scale video object segmentation dataset, YouTube-VOS, and hosted the 1st Large-scale Video Object Segmentation Challenge in conjunction with ECCV 2018. This year, we are thrilled to invite you to the 2nd Large-scale Video Object Segmentation Challenge in conjunction with ICCV 2019. The benchmark is an augmented version of the YouTube-VOS dataset with more annotations, and some incorrect annotations have been corrected. For more details, see our website for the workshop and challenge.



Timeline

  • Sep 5th: The final competition results will be announced, and top-performing teams will be invited to give oral/poster presentations at our ICCV 2019 workshop.
  • Aug 15th-30th: Release the test dataset and open the submission of test results.
  • Jun 1st: Set up the submission server on CodaLab and open the submission of validation results.
  • May 20th: Release the training and validation datasets.



Task

Video instance segmentation extends instance segmentation from the image domain to the video domain. The new problem aims at simultaneous detection, segmentation, and tracking of object instances in videos. Given a test video, a method must not only produce masks for all instances belonging to a predefined category set but also associate the identities of those instances across frames. A detailed explanation of the new task can be found in our paper.
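To make the three sub-tasks concrete, the sketch below shows what a single predicted instance has to carry: a category (detection), a per-frame mask (segmentation), and one identity shared by all of its frames (tracking). The class name `InstancePrediction` and its fields are hypothetical and chosen for illustration only; this is not the challenge's official submission format, so consult the challenge's own instructions for that.

```python
from dataclasses import dataclass
from typing import List, Optional

import numpy as np


@dataclass
class InstancePrediction:
    """One predicted object instance in a video (hypothetical structure)."""

    instance_id: int  # identity shared by every frame of this instance (tracking)
    category_id: int  # index into the predefined category set (detection)
    score: float  # confidence in [0, 1] for the predicted category
    # One boolean HxW mask per frame (segmentation); None where the instance is absent.
    masks: List[Optional[np.ndarray]]

    def __post_init__(self) -> None:
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must lie in [0, 1]")

    def frames_present(self) -> List[int]:
        """Indices of the frames in which this instance has a mask."""
        return [t for t, mask in enumerate(self.masks) if mask is not None]


# A 3-frame example: the instance is visible in frames 0 and 2 only.
mask = np.ones((2, 2), dtype=bool)
pred = InstancePrediction(instance_id=1, category_id=3, score=0.9, masks=[mask, None, mask])
print(pred.frames_present())  # [0, 2]
```

Representing an absent frame as `None` rather than as an empty mask is a design choice of this sketch; it keeps an instance that leaves and re-enters the view tied to a single record with a single identity.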


Dataset

We collected the first large-scale dataset for video instance segmentation, called YouTube-VIS, which is built on our earlier YouTube-VOS dataset. It has the following features:

  • 2,883 high-resolution YouTube videos
  • A label set of 40 common object categories, such as person, animals, and vehicles
  • 4,883 unique video instances
  • 131k high-quality manual annotations

We split the YouTube-VIS dataset into 2,238 training videos, 302 validation videos and 343 test videos.
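As a quick consistency check on the figures above, the three splits add up to the 2,883 videos listed in the feature list; the short snippet below also prints each split's share of the dataset.

```python
# Split sizes as stated in the text above.
splits = {"train": 2238, "validation": 302, "test": 343}

total = sum(splits.values())
assert total == 2883  # equals the 2,883 videos in the feature list

for name, count in splits.items():
    # Share of the full dataset, rounded to one decimal place.
    print(f"{name}: {count} videos ({count / total:.1%})")
```

This prints shares of 77.6% (train), 10.5% (validation), and 11.9% (test).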


Evaluation Criteria

We adopt the standard evaluation metrics of image instance segmentation, with modifications for our new task. Specifically, the metrics are:

  • Average Precision (AP). AP is defined as the area under the precision-recall (PR) curve. Plotting the PR curve requires a confidence score between 0 and 1 for each predicted instance, measuring the confidence of its predicted category. AP is averaged over multiple intersection-over-union (IoU) thresholds: following the COCO evaluation metrics, we use 10 IoU thresholds from 50% to 95% in steps of 5%.
  • Average Recall (AR). AR is defined as the maximum recall given a fixed number of segmented instances per video.

Both metrics are first evaluated per category and then averaged over the category set.
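The following is a minimal sketch of how the two metrics can be computed, assuming COCO-style conventions that the text above does not spell out: greedy matching of predictions to ground truth in descending score order, 101-point interpolated precision, and AR averaged over the same 10 IoU thresholds. Each video is given as a pair `(scores, ious)` for a single category, where `ious[i, j]` is the video-level IoU between prediction `i` and ground-truth instance `j`. All function names are hypothetical; the official evaluation toolkit remains the reference for scoring.

```python
from typing import List, Tuple

import numpy as np

# 10 IoU thresholds: 0.50, 0.55, ..., 0.95 (COCO convention).
IOU_THRESHOLDS = np.linspace(0.5, 0.95, 10)

# One video for one category: (scores of shape (P,), video IoUs of shape (P, G)).
Video = Tuple[np.ndarray, np.ndarray]


def match(scores: np.ndarray, ious: np.ndarray, thr: float) -> np.ndarray:
    """Greedily match predictions, highest score first, to unmatched ground truths.

    Returns a boolean (P,) array, in the input order, marking true positives.
    """
    tp = np.zeros(len(scores), dtype=bool)
    taken = np.zeros(ious.shape[1], dtype=bool)
    for i in np.argsort(-scores, kind="stable"):
        best, best_iou = -1, thr
        for j in range(ious.shape[1]):
            # Pick the unmatched ground truth with the highest IoU >= thr.
            if not taken[j] and ious[i, j] >= best_iou:
                best, best_iou = j, ious[i, j]
        if best >= 0:
            taken[best] = True
            tp[i] = True
    return tp


def category_ap(videos: List[Video], thr: float) -> float:
    """AP of one category at one IoU threshold, pooling predictions over videos."""
    num_gt = sum(ious.shape[1] for _, ious in videos)
    if num_gt == 0:
        return float("nan")  # no ground truth: skipped when averaging
    scores = np.concatenate([s for s, _ in videos])
    tp = np.concatenate([match(s, iou, thr) for s, iou in videos])
    tp = tp[np.argsort(-scores, kind="stable")]  # rank all predictions by score
    tp_cum = np.cumsum(tp)
    recall = tp_cum / num_gt
    precision = tp_cum / np.arange(1, len(tp) + 1)
    # Interpolate: precision at recall r is the best precision at any recall >= r.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 101):  # 101-point sampling as in COCO
        k = np.searchsorted(recall, r, side="left")
        ap += precision[k] if k < len(precision) else 0.0
    return float(ap / 101)


def category_ar(videos: List[Video], max_dets: int) -> float:
    """Recall with at most `max_dets` top-scoring predictions per video,
    averaged over the IoU thresholds (COCO convention, assumed here)."""
    num_gt = sum(ious.shape[1] for _, ious in videos)
    if num_gt == 0:
        return float("nan")
    recalls = []
    for thr in IOU_THRESHOLDS:
        hits = 0
        for scores, ious in videos:
            keep = np.argsort(-scores, kind="stable")[:max_dets]
            hits += int(match(scores[keep], ious[keep], thr).sum())
        recalls.append(hits / num_gt)
    return float(np.mean(recalls))


def mean_ap(per_category: List[List[Video]]) -> float:
    """AP averaged over the 10 IoU thresholds, then over categories."""
    per_cat = [np.mean([category_ap(v, t) for t in IOU_THRESHOLDS]) for v in per_category]
    return float(np.nanmean(per_cat))


# One video, one category: two ground truths, two predictions.
scores = np.array([0.9, 0.8])
ious = np.array([[0.96, 0.0], [0.0, 0.62]])
print(mean_ap([[(scores, ious)]]), category_ar([(scores, ious)], max_dets=1))
```

Pooling predictions across videos before ranking them by score mirrors how COCO-style AP treats images, while the per-video cap in `category_ar` is what makes AR depend on the number of instances allowed per video.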


The only modification to the two standard metrics for our new task is the IoU computation. Unlike in image instance segmentation, each instance in a video consists of a sequence of masks. The IoU is therefore computed not only in the spatial domain but also in the temporal domain, i.e. as the sum of the intersections over all frames divided by the sum of the unions over all frames.
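The spatio-temporal IoU described above can be sketched with NumPy as follows. The function name `video_iou` is hypothetical, and the handling of frames where an instance is missing (treated as an empty mask) and of an all-empty union (returning 0) are assumptions of this sketch rather than documented behavior of the official toolkit.

```python
from typing import Optional, Sequence

import numpy as np


def video_iou(pred: Sequence[Optional[np.ndarray]],
              gt: Sequence[Optional[np.ndarray]]) -> float:
    """Sum of per-frame intersections divided by sum of per-frame unions.

    Each sequence holds one HxW mask per frame, or None where the instance is
    absent; a missing mask is treated as empty (an assumption of this sketch).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must cover the same frames")
    inter = union = 0
    for p, g in zip(pred, gt):
        if p is None and g is None:
            continue  # absent in both: adds nothing to either sum
        p = np.zeros_like(g, dtype=bool) if p is None else p.astype(bool)
        g = np.zeros_like(p, dtype=bool) if g is None else g.astype(bool)
        inter += int(np.logical_and(p, g).sum())
        union += int(np.logical_or(p, g).sum())
    return inter / union if union > 0 else 0.0  # empty union: 0 by convention here


a = np.array([[1, 1], [0, 0]], dtype=bool)
b = np.array([[1, 0], [0, 0]], dtype=bool)
print(video_iou([a, a], [b, None]))  # 0.25
```

In this example the prediction covers two pixels in both frames, while the ground truth covers one of those pixels in frame 0 and is absent in frame 1, so the summed intersection is 1, the summed union is 4, and the IoU is 0.25.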

The evaluation toolkit is publicly available on GitHub.

Note that this dataset is built on top of the video object segmentation dataset of Track 1. Using the segmentation masks provided in the validation and test phases of Track 1 is not allowed.

Terms and Conditions

The annotations in this dataset belong to the organizers of the challenge and are licensed under a Creative Commons Attribution 4.0 License.

The data is released for non-commercial research purposes only.

The organizers of the dataset as well as their employers make no representations or warranties regarding the Database, including but not limited to warranties of non-infringement or fitness for a particular purpose. Researcher accepts full responsibility for his or her use of the Database and shall defend and indemnify the organizers against any and all claims arising from Researcher's use of the Database, including but not limited to Researcher's use of any copies of copyrighted videos that he or she may create from the Database. Researcher may provide research associates and colleagues with access to the Database provided that they first agree to be bound by these terms and conditions. The organizers reserve the right to terminate Researcher's access to the Database at any time. If Researcher is employed by a for-profit, commercial entity, Researcher's employer shall also be bound by these terms and conditions, and Researcher hereby represents that he or she is fully authorized to enter into this agreement on behalf of such employer.


Leaderboard

Rank  Username  Score
1     gb7       0.512
2     mikirui   0.482
3     linhj     0.482