OVIS (short for Occluded Video Instance Segmentation) is a new large-scale benchmark dataset for the video instance segmentation task. It is designed with the philosophy of perceiving object occlusions in videos, which can reveal the complexity and diversity of real-world scenes.
Can our video understanding systems perceive objects when a heavy occlusion exists in a scene?
To answer this question, we collect a large-scale dataset called OVIS for occluded video instance segmentation, that is, to simultaneously detect, segment, and track instances in occluded scenes. OVIS consists of 296k high-quality instance masks from 25 semantic categories, where object occlusions usually occur. While our human vision systems can understand those occluded instances by contextual reasoning and association, our experiments suggest that current video understanding systems cannot. On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 16.3, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario. For more details, please refer to our website and paper.
OVIS consists of:

- 296k high-quality instance masks
- 25 semantic categories in which object occlusions frequently occur
Given a video, all the objects belonging to the pre-defined category set are exhaustively annotated. Every video is annotated every 5 frames.
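As a quick orientation, the sketch below loads an annotation file and prints a few basic statistics. It assumes OVIS ships a YouTube-VIS-style JSON layout with top-level `videos`, `categories`, and `annotations` lists; the file name and field names here are assumptions, so check the official toolkit for the authoritative format.

```python
import json

# Hypothetical file name; the actual annotation file may differ.
with open("annotations_train.json") as f:
    data = json.load(f)

# Assumed YouTube-VIS-style layout: top-level "videos", "categories",
# and "annotations" lists (field names are assumptions).
print(len(data["videos"]), "videos")
print(len(data["categories"]), "categories")

for ann in data["annotations"][:3]:
    # Each annotation is assumed to describe one instance track:
    # a category id plus one segmentation per annotated frame
    # (null where the instance is absent or the frame is unannotated).
    segs = ann.get("segmentations", [])
    n_frames = sum(s is not None for s in segs)
    print(f"instance {ann['id']}: category {ann['category_id']}, "
          f"annotated in {n_frames} frames")
```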
Distinctive Properties

The defining property of OVIS is severe object occlusion: objects in its videos frequently and heavily occlude one another, which is what makes simultaneously detecting, segmenting, and tracking them so challenging.
Categories
The 25 semantic categories in OVIS are Person, Bird, Cat, Dog, Horse, Sheep, Cow, Elephant, Bear, Zebra, Giraffe, Poultry, Giant panda, Lizard, Parrot, Monkey, Rabbit, Tiger, Fish, Turtle, Bicycle, Motorcycle, Airplane, Boat, and Vehicle.
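If you need this label set in code, a plain mapping can be built from the list above. Note that the numeric IDs below (1-indexed, in the listed order) are an assumption for illustration; the authoritative IDs are the `categories` entries in the official annotation file.

```python
OVIS_CATEGORIES = [
    "Person", "Bird", "Cat", "Dog", "Horse", "Sheep", "Cow", "Elephant",
    "Bear", "Zebra", "Giraffe", "Poultry", "Giant panda", "Lizard",
    "Parrot", "Monkey", "Rabbit", "Tiger", "Fish", "Turtle", "Bicycle",
    "Motorcycle", "Airplane", "Boat", "Vehicle",
]

# Assumed 1-indexed IDs in the order listed above; verify against the
# "categories" entries in the official annotation JSON.
ID_TO_NAME = {i + 1: name for i, name in enumerate(OVIS_CATEGORIES)}
NAME_TO_ID = {name: i for i, name in ID_TO_NAME.items()}

assert len(OVIS_CATEGORIES) == 25
```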
For a detailed description of OVIS, please refer to our paper.
For any questions or suggestions, please contact Jiyang Qi (jiyangqi AT hust.edu.cn).
Start: Aug. 1, 2021, midnight
End: Never (the phase remains open)
Description: Development phase: create models and submit results on the validation set.
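The exact submission format is defined by the competition page. As a hedged illustration only, many video instance segmentation benchmarks (e.g. YouTube-VIS) expect a single JSON list of per-instance predictions, each with a video id, category id, confidence score, and per-frame RLE-encoded masks. The sketch below writes such a file; every field name here is an assumption to be checked against the official evaluation kit.

```python
import json

# One hypothetical predicted instance track (all field names assumed,
# following the common YouTube-VIS-style results layout).
predictions = [
    {
        "video_id": 1,
        "category_id": 4,        # e.g. "Dog" under a 1-indexed label map
        "score": 0.87,           # track-level confidence
        # One entry per frame: COCO-style RLE dict, or None where the
        # instance is not predicted in that frame.
        "segmentations": [
            {"size": [720, 1280], "counts": "..."},  # placeholder RLE
            None,
        ],
    }
]

with open("results.json", "w") as f:
    json.dump(predictions, f)
```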
Leaderboard

# | Username | Score
---|---|---
1 | timi | 42.57 |
2 | deahuang | 38.83 |
3 | Anwesa | 38.22 |