Video object segmentation has been studied extensively over the past decade due to its importance in understanding the spatial-temporal structure of videos as well as its value in industrial applications. Recently, data-driven algorithms (e.g. deep learning) have become the dominant approach to computer vision problems, and one of the keys to their success is the availability of large-scale datasets. Last year, we presented the first large-scale video object segmentation dataset, named YouTube-VOS, and hosted the 1st Large-scale Video Object Segmentation Challenge in conjunction with ECCV 2018. This year, we are thrilled to invite you to the 2nd Large-scale Video Object Segmentation Challenge in conjunction with ICCV 2019. The benchmark is an augmented version of the YouTube-VOS dataset with more annotations, and some incorrect annotations have been corrected. For more details, please check our website for the workshop and challenge.
Video instance segmentation extends the image instance segmentation task from the image domain to the video domain. The new problem aims at simultaneous detection, segmentation and tracking of object instances in videos. Given a test video, the task requires not only labeling the masks of all instances belonging to a predefined category set, but also associating instance identities across frames. A detailed explanation of the new task can be found in our paper.
We collected the first large-scale dataset for video instance segmentation, called YouTube-VIS, which is based on our initial YouTube-VOS dataset. Specifically, our new dataset has the following features.
We split the YouTube-VIS dataset into 2,238 training videos, 302 validation videos and 343 test videos.
We borrow the standard evaluation metrics from image instance segmentation, with modifications adapted to our new task. Specifically, the metrics are
Both metrics are first evaluated per category and then averaged over the category set.
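To make the averaging order concrete, here is a minimal sketch of the last step: a metric (e.g. AP) is computed independently for each category, and the final score is the mean over categories rather than a pooled score over all detections. The function name and input format are our own illustration, not part of the evaluation toolkit.

```python
def mean_over_categories(per_category_scores):
    """Average a metric that was first computed per category.

    per_category_scores: dict mapping category name -> score for that
    category (categories absent from the ground truth already excluded).
    Returns the mean over the category set.
    """
    return sum(per_category_scores.values()) / len(per_category_scores)


# Example: AP of 0.5 for one category and 0.7 for another averages to 0.6,
# regardless of how many instances each category contains.
score = mean_over_categories({"person": 0.5, "dog": 0.7})
```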
The only modification made to the two standard metrics for our new task is the IoU computation. Unlike image instance segmentation, each instance in a video is represented by a sequence of masks. Therefore, the IoU is computed not only in the spatial domain but also in the temporal domain, i.e. the sum of intersections at every single frame divided by the sum of unions at every single frame.
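A minimal sketch of this spatio-temporal IoU, assuming each instance is given as a list of per-frame boolean masks (an all-False mask standing in for frames where the instance is absent); the function name and input format are our own illustration, not the toolkit's API:

```python
import numpy as np

def video_iou(pred_masks, gt_masks):
    """Spatio-temporal IoU between two mask sequences of equal length.

    Intersections and unions are accumulated over all frames before
    dividing, so a prediction must overlap the ground truth consistently
    across the whole video, not just in individual frames.
    """
    inter = sum(np.logical_and(p, g).sum() for p, g in zip(pred_masks, gt_masks))
    union = sum(np.logical_or(p, g).sum() for p, g in zip(pred_masks, gt_masks))
    return inter / union if union > 0 else 0.0
```

Note that a frame where the instance is missing from the prediction but present in the ground truth (or vice versa) still contributes to the union, which penalizes tracking failures.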
The evaluation toolkit is publicly available on GitHub.
Note that the dataset is built on top of the video object segmentation dataset in Track 1. Using the segmentation masks provided by Track 1 during the validation and test phases is not allowed.
The annotations in this dataset belong to the organizers of the challenge and are licensed under a Creative Commons Attribution 4.0 License.
The data is released for non-commercial research purpose only.
The organizers of the dataset as well as their employers make no representations or warranties regarding the Database, including but not limited to warranties of non-infringement or fitness for a particular purpose. Researcher accepts full responsibility for his or her use of the Database and shall defend and indemnify the organizers, against any and all claims arising from Researcher's use of the Database, including but not limited to Researcher's use of any copies of copyrighted videos that he or she may create from the Database. Researcher may provide research associates and colleagues with access to the Database provided that they first agree to be bound by these terms and conditions. The organizers reserve the right to terminate Researcher's access to the Database at any time. If Researcher is employed by a for-profit, commercial entity, Researcher's employer shall also be bound by these terms and conditions, and Researcher hereby represents that he or she is fully authorized to enter into this agreement on behalf of such employer.
Start: June 1, 2019, midnight
Start: Aug. 15, 2019, midnight
Aug. 30, 2019, 11:59 p.m.