If you have any questions, please email torabi.atousa@gmail.com
Natural language-based video and image search has been a long-standing topic of research in the information retrieval, multimedia, and computer vision communities. Several existing online platforms (e.g. YouTube) rely on massive human curation efforts and manually assigned tags; however, as the amount of unlabeled video content grows with the advent of inexpensive mobile recording devices (e.g. smartphones), the focus is rapidly shifting to automated understanding, tagging, and search. In this challenge, we would like to explore a variety of joint language-visual learning models for the video annotation and retrieval task, based on a unified version of the recently published large-scale movie datasets (M-VAD and MPII-MD). More information about the datasets and the challenge can be found here.
Multiple-Choice Test: Given a video query and 5 captions, find the correct caption for the video among the 5 possible choices. The evaluation is performed only on the public test set that we provide for the multiple-choice test.
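As an illustration of the task, here is a minimal sketch of how a trained joint language-visual model might be applied at test time. The embedding functions are assumed to come from whatever model a participant trains; only the final selection step (picking the caption most similar to the video) is shown, and the function and variable names are hypothetical.

```python
import numpy as np

def answer_multiple_choice(video_embedding, caption_embeddings):
    """Return the index (0-4) of the caption most similar to the video.

    video_embedding: 1-D array representing the video query.
    caption_embeddings: iterable of five 1-D arrays, one per candidate caption.
    """
    video = video_embedding / np.linalg.norm(video_embedding)
    scores = []
    for caption in caption_embeddings:
        caption = caption / np.linalg.norm(caption)
        scores.append(float(np.dot(video, caption)))  # cosine similarity
    return int(np.argmax(scores))
```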
To participate, you should first create an account on CodaLab. In order to submit your results, please perform these steps:
Note that we allow up to 10 submissions per day. The maximum total number of submissions per team is 100.
The evaluation is based on accuracy, i.e. the percentage of correctly answered questions in the multiple-choice test.
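For reference, here is a minimal sketch of this metric in Python. The actual submission and ground-truth file formats are defined by the organizers; the dictionary-based format below is only an assumption for illustration.

```python
def multiple_choice_accuracy(predictions, ground_truth):
    """Percentage of questions whose predicted choice matches the correct one.

    predictions: dict mapping question id -> predicted choice index (0-4).
    ground_truth: dict mapping question id -> correct choice index (0-4).
    """
    correct = sum(1 for qid, answer in ground_truth.items()
                  if predictions.get(qid) == answer)
    return 100.0 * correct / len(ground_truth)
```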
Winners will be selected based on the maximum accuracy of their submissions on the multiple-choice test.
Start: Aug. 25, 2016, midnight
End: Never
# | Username | Score
---|---|---
1 | MERLOT | 81.730 |
2 | yj | 78.150 |
3 | danieljf24 | 75.170 |