Automatically describing open-domain videos using rich natural sentences is among the most challenging tasks of computer vision, natural language processing and machine learning. This year we introduce Large Scale Movie Description Challenge v2 (LSMDCv2), aiming at a more realistic and practical setting of multi-sentence movie description generation. Specifically, movie descriptions are evaluated on sets of 5 clips. When describing sequences of events, it becomes important to distinguish "who is who" in order to provide a coherent and informative narrative. Thus, the challenge will have a focus on identifying characters, rather than predicting generic "SOMEONE"-s in place of all the occurring character names.
We are interested in predicting local character IDs (see example above). That is, characters do not need to be recognized globally (across an entire movie), only locally (within a set of 5 clips). In practice, submissions should predict unique character IDs that are consistent within a given set of 5 clips, i.e. perform local character re-identification.
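To illustrate what "local" IDs mean, here is a minimal sketch that relabels characters by order of first appearance within one set of clips. The bracketed `[NAME]` markup, the `PERSON<n>` label format and the function name `localize_ids` are assumptions made for this example, not the official annotation format.

```python
import re

# Hypothetical markup: character mentions are written as [NAME].
CHARACTER_TAG = re.compile(r"\[([^\[\]]+)\]")


def localize_ids(clip_sentences):
    """Relabel character tags as PERSON1, PERSON2, ... within one set of clips.

    IDs are assigned by order of first appearance, so the same character
    keeps the same ID across all sentences passed in one call.
    """
    mapping = {}  # fresh per call, so IDs are local to this set only

    def replace(match):
        name = match.group(1)
        if name not in mapping:
            mapping[name] = f"PERSON{len(mapping) + 1}"
        return f"[{mapping[name]}]"

    return [CHARACTER_TAG.sub(replace, sentence) for sentence in clip_sentences]


if __name__ == "__main__":
    clips = ["[Harry] opens the door.", "[Ron] waves at [Harry]."]
    for sentence in localize_ids(clips):
        print(sentence)
```

Because the mapping is rebuilt for every set, the same character may receive a different ID in another set of 5 clips, which is consistent with the local setting described above.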
The challenge consists of two phases: public test set evaluation and blind test set evaluation (for which we will not provide the sentence descriptions). The evaluation is performed on sets of 5 clips, i.e. reference and predicted descriptions are grouped for every 5 consecutive clips. For completeness, we also report results evaluated per individual clip.
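The grouping into sets of 5 consecutive clips can be sketched as follows. This is an illustration under assumptions: the input is taken to be an ordered list of per-clip descriptions from one movie, sentences are joined with a single space, and a final shorter group is kept as-is; the official evaluation code may handle these details differently. The function name `group_clips` is hypothetical.

```python
def group_clips(descriptions, set_size=5):
    """Join per-clip descriptions into one text per set of consecutive clips.

    Assumes `descriptions` is ordered by clip. The last group may contain
    fewer than `set_size` clips if the total is not a multiple of it.
    """
    return [
        " ".join(descriptions[start:start + set_size])
        for start in range(0, len(descriptions), set_size)
    ]


if __name__ == "__main__":
    per_clip = [f"Sentence {i}." for i in range(1, 8)]
    for grouped in group_clips(per_clip):
        print(grouped)
```

The same grouping would be applied to both the reference and the predicted descriptions, so that each multi-sentence prediction is compared against the matching multi-sentence reference.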
In our automatic evaluation here we will focus on sentence content but ignore the predicted IDs. Participants are therefore required to also submit to the complementary challenge track: "The Large Scale Movie Description Challenge (LSMDC) v2: Fill-in the Characters". Every approach will be evaluated in terms of (1) sentence quality (this track) and (2) ability to fill in the character IDs when the rest of a sentence is given ("Fill-in the Characters" track). (We are also considering an additional human evaluation to assess the correctness of the predicted IDs.)
To participate, you should first create an account on CodaLab. To submit your results, please perform these steps:
Note that we allow up to 5 submissions per day / 100 in total for the public test phase and 1 submission per day / 5 in total for the blind test phase.
We provide baseline code for generating movie descriptions with "SOMEONE"-s: https://github.com/jamespark3922/lsmdc-baseline
We thank the "Microsoft COCO Image Captioning Challenge" organizers for sharing the evaluation code.
The MS COCO Caption Evaluation API is used to evaluate results. The software takes both candidate and reference captions, applies sentence tokenization, and outputs several performance metrics, including BLEU-1, BLEU-2, BLEU-3, BLEU-4, ROUGE-L, METEOR and CIDEr-D. More details can be found in the paper Microsoft COCO Captions: Data Collection and Evaluation Server.
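For intuition about what the simplest of these metrics measures, here is a simplified sentence-level BLEU-1 sketch: clipped unigram precision multiplied by a brevity penalty. It is not the MS COCO implementation. It assumes a single reference, lowercased whitespace tokenization and no corpus-level aggregation, so its numbers will not match the official scores. The function name `bleu1` is hypothetical.

```python
import math
from collections import Counter


def bleu1(candidate, reference):
    """Simplified BLEU-1 for one candidate against one reference."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens:
        return 0.0

    # Clip each candidate token count by its count in the reference,
    # so repeating a matching word does not inflate the precision.
    ref_counts = Counter(ref_tokens)
    clipped = sum(
        min(count, ref_counts[token])
        for token, count in Counter(cand_tokens).items()
    )
    precision = clipped / len(cand_tokens)

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand_tokens) >= len(ref_tokens):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref_tokens) / len(cand_tokens))

    return brevity_penalty * precision


if __name__ == "__main__":
    print(bleu1("SOMEONE opens the door", "SOMEONE opens the door slowly"))
```

Higher-order BLEU scores extend the same idea to longer n-grams, while ROUGE-L, METEOR and CIDEr-D use different matching schemes described in the paper cited above.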
Start: Sept. 1, 2019, midnight
Description: Public Test set
Start: Sept. 1, 2019, midnight
Description: Blind Test set
Oct. 1, 2019, 11:59 p.m.