Convolutional neural networks have yielded unprecedented progress on a wide range of image-centric benchmarks, driven by a combination of well-annotated datasets and end-to-end training. However, naively extending this approach from images to higher-level video understanding tasks quickly becomes prohibitive with respect to the computation and data annotation required to jointly train multi-modal high-capacity models. An attractive alternative is to repurpose collections of existing pretrained models as "experts", offering representations that have been specialised for semantically relevant machine perception tasks. In addition to efficacy, this approach offers a second key advantage: it encourages researchers without access to industrial computing clusters to contribute towards questions of fundamental importance to video understanding. How should temporal information be used to maximum effect? How best to exploit complementary and redundant signals across different modalities? How can models be designed that function robustly across different video domains? To stimulate research into these questions, we are hosting a challenge focused on learning from videos and language with experts: we make available a diverse collection of carefully curated, pre-extracted visual and audio features across a set of five influential video datasets as part of a "pentathlon" of video understanding.
Samuel Albanie, VGG, University of Oxford
Yang Liu, VGG, University of Oxford
Arsha Nagrani, VGG, University of Oxford
Antoine Miech, INRIA
Ernesto Coto, VGG, University of Oxford
Ivan Laptev, INRIA
Rahul Sukthankar, Google/Robotics Institute - Carnegie Mellon University
Bernard Ghanem, VCC, King Abdullah University of Science and Technology
Andrew Zisserman, VGG, University of Oxford
We are extremely grateful to the authors of the original dataset papers for allowing us to use their data as part of the challenge.
Our use of a pentathlon scoring metric for this workshop was inspired by the visual decathlon challenge.
This work is supported by the EPSRC programme grant Seebibyte EP/M013774/1: Visual Search for the Era of Big Data.
The goal of this challenge is to build a system that retrieves videos from natural language queries. The data for the challenge consist of videos and queries from a "pentathlon" of five video retrieval benchmarks. Each dataset comprises a collection of videos with natural language descriptions (some statistics are given below). The final score of a system is a weighted combination of its performance on the test set of each of the five datasets (more details on the metrics are given below). Systems may either train a single shared model across all five datasets, or train a separate model per dataset.
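As a minimal sketch of how such a system is typically scored, the snippet below computes standard text-to-video retrieval metrics (recall@K, median rank) from a query-video similarity matrix and combines per-dataset results with a weighted average. The exact metrics and dataset weights used by the challenge are defined on the workshop's challenge page; the function names, the choice of R@1 as the per-dataset summary, and the uniform default weights here are illustrative assumptions, not the official scoring code.

```python
import numpy as np

def retrieval_metrics(sim):
    """Retrieval metrics from a similarity matrix.

    sim[i, j] is the similarity between text query i and video j; the
    ground-truth match for query i is assumed to be video i (diagonal pairing).
    """
    # For each query, rank of the correct video (0 = retrieved first).
    order = np.argsort(-sim, axis=1)
    ranks = np.argmax(order == np.arange(sim.shape[0])[:, None], axis=1)
    return {
        "R@1": float(np.mean(ranks < 1)),
        "R@5": float(np.mean(ranks < 5)),
        "MedR": float(np.median(ranks) + 1),  # conventionally 1-indexed
    }

def pentathlon_score(per_dataset_scores, weights=None):
    """Hypothetical weighted combination of per-dataset summary scores."""
    scores = list(per_dataset_scores.values())
    if weights is None:
        weights = [1.0 / len(scores)] * len(scores)  # assumed uniform weights
    return sum(w * s for w, s in zip(weights, scores))

if __name__ == "__main__":
    # Perfect retrieval: every query ranks its own video first.
    perfect = retrieval_metrics(np.eye(4))
    print(perfect["R@1"], perfect["MedR"])  # 1.0 1.0
    print(pentathlon_score({"datasetA": 1.0, "datasetB": 0.5}))  # 0.75
```

A single shared model would populate `per_dataset_scores` from one set of weights evaluated on all five test sets; per-dataset models would each contribute their own entry.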
Via this CodaLab competition, teams can evaluate their system against the competition's blind test set during (and even after) the evaluation period. To participate, open the competition page and click on the "Participate" tab, then accept the Terms and Conditions and click the "Register" button.
Registration requires an institutional e-mail address (e.g., your university or company e-mail). The organizers will check your registration details and approve your registration provided an institutional e-mail was used.
After your registration is approved, you will be able to upload files via the "Participate" tab. Note that teams submit only the outputs of their system to CodaLab, not the system itself. Make sure you have read all the submission instructions under the "Participate" tab before uploading any files.
Statistics of the five datasets that form the pentathlon are given on the workshop's challenge page, together with information about the different partitions of the data.
The score of your submission will be calculated according to the metrics described on the workshop's challenge page.
Participants are welcome to use their own codebase if preferred. However, to get started, we recommend using the baseline implementation provided as part of the Collaborative Experts codebase. This baseline will produce submission results in the format required for upload to the CodaLab server.
The results of this CodaLab competition will be announced at the challenge workshop, where we will invite presentations from the most exciting and novel submissions, as well as from the challenge winners. The challenge workshop will be held on the 15th of June 2020, in conjunction with CVPR 2020.
We ask that all participants who wish to appear on the final leaderboard submit a short PDF summary describing their submission. Details of this summary will be published on the workshop's challenge page. The leaderboard will determine the challenge winners.
Participation in this competition is open to all who are interested and willing to comply with the rules laid out under the "Learn the Details" and "Participate" tabs, as well as on the workshop's challenge page. We reserve the right to revoke access to the competition for any team or participant who breaks these rules.
There is no cost to participate, although teams are encouraged to submit a paper to the corresponding CVPR 2020 workshop, to be held on the 15th of June 2020.
By accepting these Terms and Conditions, you also agree to us storing your submission results for evaluation purposes.
In case of any issues, all decisions made by the Organizing Committee will be final.
Instructions for downloading the data can be found under "Downloading the data" on the workshop's challenge page.
Start: April 9, 2020, midnight
Description: VAL partition submissions for the challenge workshop that will be held in conjunction with CVPR 2020
Start: May 9, 2020, midnight
Description: Submissions to the VAL partition for comparison with previous ones. Not to be taken into account for the challenge workshop
Start: May 9, 2020, 12:01 a.m.
Description: TEST partition submissions for the challenge workshop that will be held in conjunction with CVPR 2020
Start: June 3, 2020, midnight
Description: THE CHALLENGE IS FINISHED. SUBMISSIONS TO THIS PHASE WILL FAIL. Please submit to: Permanent - VAL partition