YouCook2 Dense Video Captioning

Organized by youcook2

Current phase: Final Phase (started Dec. 29, 2018, midnight UTC)

Competition ends: Never

Overview

YouCook2 is the largest task-oriented, instructional video dataset in the vision community. It contains 2000 long untrimmed videos from 89 cooking recipes; on average, each distinct recipe has 22 videos. The procedure steps for each video are annotated with temporal boundaries and described by imperative English sentences (see the example below). The videos were downloaded from YouTube and are all shot from a third-person viewpoint. They are unconstrained: the recipes can be performed by individuals in their own homes and are recorded with non-fixed cameras. YouCook2 covers a rich variety of recipe types and cooking styles from all over the world. Explore the dataset or read more details.
YouCook2 is currently suitable for video-language research, weakly-supervised activity and object recognition in video, common object and action discovery across videos, and procedure learning.
This evaluation server is for dense video captioning on the YouCook2 test set. More details regarding the task and dataset can be found here.

Evaluation

We use the same evaluation code as in here. We evaluate the model on both localizing and describing events. The metric first finds the proposals whose temporal intersection over union (tIoU) with any ground-truth (GT) segment is higher than a threshold (in our case 0.3, 0.5, 0.7, and 0.9). It then measures the quality of each such proposal's caption against the GT caption (e.g., BLEU, METEOR). Proposals without sufficient overlap receive language scores of 0. Up to 1000 proposals are considered. For more details, please refer to the original paper.
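The matching step described above can be illustrated with a minimal sketch. This is not the official evaluation code: the function names, the (start, end) segment format in seconds, the per-pair scoring, the strict ">" comparison at the threshold, and the toy unigram scorer (a stand-in for BLEU or METEOR) are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Segment = Tuple[float, float]      # (start_sec, end_sec); format is an assumption
Event = Tuple[Segment, str]        # a segment paired with its caption

THRESHOLDS = (0.3, 0.5, 0.7, 0.9)  # tIoU thresholds named in the text
MAX_PROPOSALS = 1000               # proposal cap named in the text


def tiou(a: Segment, b: Segment) -> float:
    """Temporal intersection over union of two segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def toy_caption_score(candidate: str, reference: str) -> float:
    """Hypothetical stand-in for a language metric: unigram precision."""
    cand = candidate.lower().split()
    ref = set(reference.lower().split())
    if not cand:
        return 0.0
    return sum(word in ref for word in cand) / len(cand)


def score_proposals(
    proposals: List[Event],
    ground_truth: List[Event],
    threshold: float,
    caption_score: Callable[[str, str], float] = toy_caption_score,
) -> List[float]:
    """Score every (proposal, GT) pair whose tIoU is higher than the threshold.

    A proposal that matches no GT segment contributes a single 0.0.
    How matched pairs are aggregated is an assumption of this sketch.
    """
    scores: List[float] = []
    for seg, caption in proposals[:MAX_PROPOSALS]:
        matched = [
            caption_score(caption, gt_caption)
            for gt_seg, gt_caption in ground_truth
            if tiou(seg, gt_seg) > threshold
        ]
        scores.extend(matched if matched else [0.0])
    return scores


if __name__ == "__main__":
    gts = [((0.0, 10.0), "add oil to the pan")]
    props = [((0.0, 9.0), "add oil to pan"), ((50.0, 60.0), "add oil to pan")]
    for t in THRESHOLDS:
        print(t, score_proposals(props, gts, t))
```

In this sketch a well-localized proposal keeps its caption score at the looser thresholds but drops to 0 once its tIoU no longer exceeds the threshold, which is why results are reported at several thresholds.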

Rules

You may submit 1 submission per day and 100 in total.

This challenge is governed by the general ChaLearn contest rules.
