OVIS (short for Occluded Video Instance Segmentation) is a new large-scale benchmark dataset for the video instance segmentation task. It is designed with the philosophy of perceiving object occlusions in videos, which can reveal the complexity and diversity of real-world scenes.
Can our video understanding systems perceive objects when a heavy occlusion exists in a scene?
To answer this question, we collect a large-scale dataset called OVIS for occluded video instance segmentation, that is, to simultaneously detect, segment, and track instances in occluded scenes. OVIS consists of 296k high-quality instance masks from 25 semantic categories, where object occlusions usually occur. While our human vision systems can understand those occluded instances by contextual reasoning and association, our experiments suggest that current video understanding systems cannot. On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 16.3, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario. For more details, please refer to our website and paper.
OVIS consists of:

- 296k high-quality instance masks
- 25 semantic categories in which object occlusions frequently occur
Given a video, all the objects belonging to the pre-defined category set are exhaustively annotated. Every video is annotated every 5 frames.
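As a quick orientation, the sketch below loads an annotation file and prints a few basic statistics. It assumes OVIS ships a YouTube-VIS-style JSON layout with top-level `videos`, `categories`, and `annotations` lists; the file name and field names here are assumptions, so check the official toolkit for the authoritative format.

```python
import json

# Hypothetical file name; the actual annotation file may differ.
with open("annotations_train.json") as f:
    data = json.load(f)

# Assumed YouTube-VIS-style layout: top-level "videos", "categories",
# and "annotations" lists (field names are assumptions).
print(len(data["videos"]), "videos")
print(len(data["categories"]), "categories")

for ann in data["annotations"][:3]:
    # Each annotation is assumed to describe one instance track:
    # a category id plus one segmentation per annotated frame
    # (null where the instance is absent or the frame is unannotated).
    segs = ann.get("segmentations", [])
    n_frames = sum(s is not None for s in segs)
    print(f"instance {ann['id']}: category {ann['category_id']}, "
          f"annotated in {n_frames} frames")
```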
Distinctive Properties

The defining property of OVIS is severe object occlusion: objects in its videos frequently and heavily occlude one another, which is what makes simultaneously detecting, segmenting, and tracking them so challenging.
Categories
The 25 semantic categories in OVIS are Person, Bird, Cat, Dog, Horse, Sheep, Cow, Elephant, Bear, Zebra, Giraffe, Poultry, Giant panda, Lizard, Parrot, Monkey, Rabbit, Tiger, Fish, Turtle, Bicycle, Motorcycle, Airplane, Boat, and Vehicle.
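If you need this label set in code, a plain mapping can be built from the list above. Note that the numeric IDs below (1-indexed, in the listed order) are an assumption for illustration; the authoritative IDs are the `categories` entries in the official annotation file.

```python
OVIS_CATEGORIES = [
    "Person", "Bird", "Cat", "Dog", "Horse", "Sheep", "Cow", "Elephant",
    "Bear", "Zebra", "Giraffe", "Poultry", "Giant panda", "Lizard",
    "Parrot", "Monkey", "Rabbit", "Tiger", "Fish", "Turtle", "Bicycle",
    "Motorcycle", "Airplane", "Boat", "Vehicle",
]

# Assumed 1-indexed IDs in the order listed above; verify against the
# "categories" entries in the official annotation JSON.
ID_TO_NAME = {i + 1: name for i, name in enumerate(OVIS_CATEGORIES)}
NAME_TO_ID = {name: i for i, name in ID_TO_NAME.items()}

assert len(OVIS_CATEGORIES) == 25
```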
For a detailed description of OVIS, please refer to our paper.
For any questions or suggestions, please contact Jiyang Qi (jiyangqi AT hust.edu.cn).
Start: Aug. 1, 2021, midnight
End: Never (the phase remains open)
Description: Development phase: create models and submit results on the validation set.
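The exact submission format is defined by the competition page. As a hedged illustration only, many video instance segmentation benchmarks (e.g. YouTube-VIS) expect a single JSON list of per-instance predictions, each with a video id, category id, confidence score, and per-frame RLE-encoded masks. The sketch below writes such a file; every field name here is an assumption to be checked against the official evaluation kit.

```python
import json

# One hypothetical predicted instance track (all field names assumed,
# following the common YouTube-VIS-style results layout).
predictions = [
    {
        "video_id": 1,
        "category_id": 4,        # e.g. "Dog" under a 1-indexed label map
        "score": 0.87,           # track-level confidence
        # One entry per frame: COCO-style RLE dict, or None where the
        # instance is not predicted in that frame.
        "segmentations": [
            {"size": [720, 1280], "counts": "..."},  # placeholder RLE
            None,
        ],
    }
]

with open("results.json", "w") as f:
    json.dump(predictions, f)
```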
Leaderboard

# | Username | Score
---|---|---
1 | timi | 42.57 |
2 | deahuang | 38.83 |
3 | Anwesa | 38.22 |