The Large Scale Movie Description Challenge (LSMDC) v2: Multi-Sentence Description with Characters

Organized by arohrbach


Overview

Automatically describing open-domain videos with rich natural sentences is among the most challenging tasks in computer vision, natural language processing and machine learning. This year we introduce the Large Scale Movie Description Challenge v2 (LSMDCv2), aiming at a more realistic and practical setting of multi-sentence movie description generation. Specifically, movie descriptions are evaluated on sets of 5 clips. When describing sequences of events, it becomes important to distinguish "who is who" in order to provide a coherent and informative narrative. Thus, the challenge focuses on identifying characters, rather than predicting generic "SOMEONE"-s in place of all occurring character names.

We are interested in predicting local character IDs. That means it is not required to recognize each character globally (across an entire movie), only locally (within a set of 5 clips). Practically, submissions should predict unique character IDs that are consistent within a given set of 5 clips, i.e. perform local character re-identification; for instance, the same person should be referred to by the same ID (say, "PERSON1") in every clip of the set in which they appear.

The challenge consists of two phases: evaluation on the public test set and evaluation on the blind test set (for which we do not provide the sentence descriptions). The evaluation is performed on sets of 5 clips, i.e. reference and predicted descriptions are grouped for every 5 consecutive clips. For completeness, we also report results evaluated per individual clip. The sketch below illustrates the grouping.
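For illustration, here is a minimal Python sketch of the grouping described above (the server's own grouping logic may differ in detail, and the clip captions are made up):

    # Join every 5 consecutive clip captions into one multi-sentence description.
    def group_by_five(captions):
        return [" ".join(captions[i:i + 5]) for i in range(0, len(captions), 5)]

    clips = ["PERSON1 enters the room.", "PERSON1 sits down.", "PERSON2 waves.",
             "PERSON1 smiles.", "PERSON2 leaves."]
    print(group_by_five(clips))  # -> one 5-sentence description for the set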

Our automatic evaluation here focuses on sentence content but ignores the predicted IDs. It is therefore required that participants also submit to the complementary challenge track: "The Large Scale Movie Description Challenge (LSMDC) v2: Fill-in the Characters". Every approach will be evaluated in terms of (1) sentence quality (this track) and (2) the ability to fill in the character IDs when the rest of a sentence is given (the "Fill-in the Characters" track). (We are also considering an additional human evaluation to assess the correctness of the predicted IDs.)

Participation

To participate, you should first create an account on CodaLab. In order to submit your results, please perform these steps:

      • To officially take part in the challenge, you have to submit your results on both the public and blind test sets.
      • Convert your generated descriptions into the following JSON format:
        [
          {
            "video_id": int,
            "caption": str
          },
          ...
        ]
        where "video_id" are integer numbers starting with 1.
      • Name your JSON file publictest_[your_algorithm_name]_results.json or blindtest_[your_algorithm_name]_results.json, depending on the challenge phase, and zip it into an archive (see the packaging sketch after this list).
      • Go to "Participate" tab, click "Submit / View Results" and select the respective challenge phase.
      • Fill in the form (specify any external training data used by your algorithm in the "Method description" field) and upload your ZIP archive.
      • Click "Refresh Status" to see how your submission is being processed. In case of errors, please, check and correct your submission.
      • Once the submission is successfully processed, you can view your scores via "View scoring output log" and click "Post to leaderboard" to make your results publicly available. You can access the detailed evaluation output via "Download evaluation output from scoring step".
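For convenience, here is a minimal Python sketch of packaging a public test submission in the format described above (the algorithm name and captions are placeholders):

    import json
    import zipfile

    captions = ["SOMEONE enters the kitchen.", "SOMEONE opens the fridge."]  # toy output

    # "video_id" must be an integer, numbered starting at 1.
    results = [{"video_id": i + 1, "caption": c} for i, c in enumerate(captions)]

    json_name = "publictest_myalgorithm_results.json"
    with open(json_name, "w") as f:
        json.dump(results, f)

    # Zip the JSON file for upload on the "Participate" tab.
    with zipfile.ZipFile("publictest_myalgorithm_results.zip", "w") as z:
        z.write(json_name)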

Note that we allow up to 5 submissions per day / 100 in total for the public test phase, and 1 submission per day / 5 in total for the blind test phase.

Baseline

We provide baseline code for generating movie descriptions with "SOMEONE"-s: https://github.com/jamespark3922/lsmdc-baseline

Acknowledgement

We thank the "Microsoft COCO Image Captioning Challenge" organizers for sharing the evaluation code.

The MS COCO Caption Evaluation API is used to evaluate results. The software takes both candidate and reference captions, applies sentence tokenization, and outputs several performance metrics, including BLEU-1, BLEU-2, BLEU-3, BLEU-4, ROUGE-L, METEOR and CIDEr-D. More details can be found in the paper Microsoft COCO Captions: Data Collection and Evaluation Server.
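For illustration, here is a minimal Python sketch of scoring a candidate caption against a reference using the pycocoevalcap package (a pip distribution of the COCO caption evaluation code; the PTB tokenizer and METEOR require Java, and the toy captions are made up):

    from pycocoevalcap.tokenizer.ptbtokenizer import PTBTokenizer
    from pycocoevalcap.bleu.bleu import Bleu
    from pycocoevalcap.meteor.meteor import Meteor
    from pycocoevalcap.rouge.rouge import Rouge
    from pycocoevalcap.cider.cider import Cider

    # References (gts) and candidates (res) keyed by video id.
    gts = {1: [{"caption": "SOMEONE opens the door and walks in."}]}
    res = {1: [{"caption": "SOMEONE walks through the door."}]}

    tokenizer = PTBTokenizer()
    gts, res = tokenizer.tokenize(gts), tokenizer.tokenize(res)

    scorers = [(Bleu(4), ["BLEU-1", "BLEU-2", "BLEU-3", "BLEU-4"]),
               (Meteor(), "METEOR"), (Rouge(), "ROUGE-L"), (Cider(), "CIDEr-D")]
    for scorer, name in scorers:
        score, _ = scorer.compute_score(gts, res)
        if isinstance(name, list):  # BLEU returns one score per n-gram order
            for n, s in zip(name, score):
                print(n, round(s, 4))
        else:
            print(name, round(score, 4))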

Public Test

Start: Sept. 1, 2019, midnight

Description: Public Test set

Blind Test

Start: Sept. 1, 2019, midnight

Description: Blind Test set

Competition Ends

Never

Leaderboard

#   Username     Score
1   JiwanChung   0.088