Automatically describing open-domain videos using rich natural sentences is among the most challenging tasks in computer vision, natural language processing and machine learning. When describing sequences of events, it is important to distinguish "who is who" in order to provide a coherent and informative narrative. In this challenge track we focus on locally identifying characters, given the rest of a description.
Predicting local character IDs means that characters do not have to be recognized globally (across an entire movie), but only locally (within a set of 5 clips). In practice, submissions should predict unique character IDs that are consistent within a given set of 5 clips, that is, perform local character re-identification.
Note that while the provided annotations contain global character IDs for completeness, generating such global IDs is not required; only consistent IDs within each set of 5 clips need to be predicted. The annotations are segmented into sets of 5 clips sequentially.
To participate, first create an account on CodaLab. To submit your results, follow these steps:
...
1020_01.34.22.115-01.34.25.586 [1020_PERSON1]
1020_01.34.25.586-01.34.27.708 [1020_PERSON2]
1020_01.34.27.729-01.34.30.622 [1020_PERSON1]
1020_01.34.32.867-01.34.35.540 [1020_PERSON2]
1020_01.34.35.540-01.34.40.073 [1020_PERSON1],[1020_PERSON3]
...
where multiple IDs are separated by ",";
Name your results file test_[your_algorithm_name]_results.csv and zip it in an archive.
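As a rough illustration of the line format shown above, the following Python sketch parses and writes one result line. The function names `parse_line` and `format_line` are hypothetical, and the sketch assumes that the clip ID is separated from the bracketed character IDs by whitespace (a single space is used when writing); the official format may differ in that detail.

```python
import re

# Matches the content of each bracketed ID, e.g. "[1020_PERSON1]" -> "1020_PERSON1".
ID_PATTERN = re.compile(r"\[([^\]]+)\]")


def parse_line(line):
    """Split one result line into (clip_id, list_of_character_ids).

    Assumption: the clip ID is the first whitespace-delimited token and
    everything after it is a comma-separated list of bracketed IDs.
    """
    parts = line.strip().split(None, 1)
    clip_id = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    return clip_id, ID_PATTERN.findall(rest)


def format_line(clip_id, character_ids, sep=" "):
    """Build one result line; `sep` (a single space here) is an assumption."""
    ids = ",".join("[{}]".format(cid) for cid in character_ids)
    return "{}{}{}".format(clip_id, sep, ids)


if __name__ == "__main__":
    example = "1020_01.34.35.540-01.34.40.073 [1020_PERSON1],[1020_PERSON3]"
    clip_id, ids = parse_line(example)
    print(clip_id, ids)
    print(format_line(clip_id, ids))
```

Parsing a line and formatting it again reproduces the original line, which can serve as a quick sanity check before zipping the file.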
Note that we allow up to 5 submissions per day and 100 in total.
For consecutive sets of 5 clips (in the test set) we construct lists of all occurring ground-truth IDs, e.g. [1020_PERSON11], [1020_PERSON5], [1020_PERSON11], [1020_PERSON6],[1020_PERSON5]. If a set of clips contains no IDs or only a single ID, we skip it, as such sets are trivial. For sets with 2 or more IDs we construct an upper triangular matrix of pairwise comparisons between IDs, where 1 is a "match" and 0 is "not a match", skipping the diagonal (which always consists of 1s). The same is done for the submitted predicted IDs. The two upper triangular matrices are then compared to each other, and the accuracy is obtained as the ratio of correct correspondences to the total number of elements. The final accuracy is averaged over all considered sets of clips.
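The procedure described above can be sketched in Python as follows. This is a minimal illustration, not the official evaluation script: the function names are hypothetical, "a single ID" is read as fewer than two ID occurrences in a set, and predictions are assumed to contain the same number of IDs per set as the ground truth.

```python
from itertools import combinations


def pairwise_matches(ids):
    """Upper triangle (without diagonal) of the ID comparison matrix,
    flattened row by row: True is a "match", False is "not a match"."""
    return [a == b for a, b in combinations(ids, 2)]


def set_accuracy(gt_ids, pred_ids):
    """Accuracy for one set of clips, or None if the set is skipped."""
    if len(gt_ids) != len(pred_ids):
        raise ValueError("ground truth and prediction must have the same number of IDs")
    if len(gt_ids) < 2:  # assumption: fewer than two ID occurrences is "trivial"
        return None
    gt = pairwise_matches(gt_ids)
    pred = pairwise_matches(pred_ids)
    correct = sum(g == p for g, p in zip(gt, pred))
    return correct / len(gt)


def local_id_accuracy(gt_clips, pred_clips, set_size=5):
    """Average accuracy over consecutive sets of `set_size` clips.

    Each argument is a list with one entry per clip; every entry is the
    list of character IDs for that clip (possibly empty).
    """
    scores = []
    for start in range(0, len(gt_clips), set_size):
        gt_ids = [i for clip in gt_clips[start:start + set_size] for i in clip]
        pred_ids = [i for clip in pred_clips[start:start + set_size] for i in clip]
        score = set_accuracy(gt_ids, pred_ids)
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    gt = ["1020_PERSON11", "1020_PERSON5", "1020_PERSON11", "1020_PERSON6", "1020_PERSON5"]
    pred = ["A", "B", "A", "C", "C"]  # only consistency within the set matters
    print(set_accuracy(gt, pred))
```

Because only pairwise agreement is compared, the predicted ID strings themselves do not need to equal the ground-truth strings; they only need to be consistent within the set.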
The evaluation script is provided here for your convenience to enable offline evaluation on the validation set.
Start: Aug. 1, 2019, midnight
Description: Test set
Never
# | Username | Score |
---|---|---|
1 | JiwanChung | 0.673 |
2 | YASA | 0.648 |