Automatically describing open-domain videos using rich natural sentences is among the most challenging tasks in computer vision, natural language processing and machine learning. When describing sequences of events, it is important to distinguish "who is who" in order to provide a coherent and informative narrative. In this challenge track we focus on locally identifying characters, given the rest of a description.
Predicting local character IDs means that it is not required to recognize each character globally (in an entire movie), but only locally (within a set of 5 clips). In practice, submissions should predict unique character IDs that are consistent within a given set of 5 clips, that is, perform local character re-identification.
Note that while the provided annotations do contain global character IDs for completeness, it is not required to generate such global IDs, but only to predict consistent IDs within each set of 5 clips. The annotations are segmented into sets of 5 clips sequentially.
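The sequential segmentation described above can be sketched as follows. This is a minimal illustration, not the organizers' code: the function name `chunk_sequentially` is hypothetical, and the handling of a final set with fewer than 5 clips (kept here as a shorter set) is an assumption, since the description does not specify it.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_sequentially(clips: Sequence[T], size: int = 5) -> List[List[T]]:
    """Split clips, in their given order, into consecutive sets of `size`.

    Assumption: if the number of clips is not a multiple of `size`,
    the last set is simply shorter.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(clips[i:i + size]) for i in range(0, len(clips), size)]


# Example: 12 clips yield two full sets and one shorter set.
sets_of_clips = chunk_sequentially(list(range(12)))
```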
To participate, you should first create an account on CodaLab. To submit your results, please perform these steps:
where multiple IDs are separated with ",";
Name your results file test_[your_algorithm_name]_results.csv and zip it in an archive.
Note that we allow up to 5 submissions per day and 100 in total.
For each consecutive set of 5 clips (over the entire test set), we construct a list of all occurring ground-truth IDs, e.g.: [1020_PERSON11], [1020_PERSON5], [1020_PERSON11], [1020_PERSON6], [1020_PERSON5]. We transform this list into "local" IDs: 1, 2, 1, 3, 2. Similarly, we transform the predicted IDs, obtaining another list of local IDs. The two lists are compared and an accuracy is computed. The accuracy is then averaged over all sets of 5 clips, and we report this final (average) accuracy.
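The evaluation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the official scoring script: the function names are hypothetical, local IDs are assumed to be assigned in order of first appearance (which reproduces the 1, 2, 1, 3, 2 example), and "compared" is assumed to mean a position-by-position match between the two local ID lists.

```python
from typing import Hashable, Iterable, List, Sequence


def to_local_ids(ids: Iterable[Hashable]) -> List[int]:
    """Relabel IDs as 1, 2, 3, ... in order of first appearance."""
    mapping = {}
    local = []
    for item in ids:
        if item not in mapping:
            mapping[item] = len(mapping) + 1
        local.append(mapping[item])
    return local


def set_accuracy(gt_ids: Sequence[Hashable], pred_ids: Sequence[Hashable]) -> float:
    """Fraction of positions where the local IDs agree (assumed comparison)."""
    gt_local = to_local_ids(gt_ids)
    pred_local = to_local_ids(pred_ids)
    if len(gt_local) != len(pred_local):
        raise ValueError("ground truth and prediction must have equal length")
    if not gt_local:
        raise ValueError("cannot score an empty set")
    matches = sum(g == p for g, p in zip(gt_local, pred_local))
    return matches / len(gt_local)


def mean_accuracy(gt_sets: Sequence[Sequence[Hashable]],
                  pred_sets: Sequence[Sequence[Hashable]]) -> float:
    """Average the per-set accuracy over all sets of clips."""
    if len(gt_sets) != len(pred_sets) or not gt_sets:
        raise ValueError("need the same, non-zero number of sets")
    scores = [set_accuracy(g, p) for g, p in zip(gt_sets, pred_sets)]
    return sum(scores) / len(scores)


# The ground-truth list from the example above.
gt = ["1020_PERSON11", "1020_PERSON5", "1020_PERSON11",
      "1020_PERSON6", "1020_PERSON5"]
local = to_local_ids(gt)
```

Because both lists are relabeled before comparison, the actual ID strings in a prediction do not matter under this sketch; only their consistency within the set does.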
Start: Aug. 1, 2019, midnight
Description: Public Test set