Just a couple of updates:
1. The test data is now available.
2. It has been noted that two of the datasets (MSR-VTT and MSVD) have multiple descriptions per video. It is permitted to exploit this information (e.g. by combining the queries, or by averaging their embeddings when retrieving with a group of queries that belong to the same video). These groupings can be seen by inspecting the raw-captions.pkl files contained in the releases, in which captions are grouped by video in a Python list.
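
As a rough illustration of the averaging option mentioned in point 2, here is a minimal sketch. It assumes you already have one embedding per query and know which video each query belongs to; the function name average_by_video, the toy embeddings and the video ids are all made up for the example and are not part of the released code or data.

```python
from collections import defaultdict


def average_by_video(query_embeddings, video_ids):
    """Return one mean embedding per video (hypothetical helper, not from the release).

    query_embeddings: list of equal-length lists of floats, one per query.
    video_ids: list of ids, where video_ids[i] is the video that query i describes.
    """
    # Collect the embeddings of all queries that describe the same video.
    groups = defaultdict(list)
    for embedding, video_id in zip(query_embeddings, video_ids):
        groups[video_id].append(embedding)

    # Average each group dimension by dimension.
    means = {}
    for video_id, embeddings in groups.items():
        dim = len(embeddings[0])
        means[video_id] = [
            sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)
        ]
    return means


# Toy data: two queries for "video_a", one for "video_b" (made-up values).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
toy_video_ids = ["video_a", "video_a", "video_b"]
averaged = average_by_video(toy_embeddings, toy_video_ids)
```

The averaged embedding can then be used in place of each individual query in the group when ranking videos. Whether averaging or combining the raw queries works better will likely depend on the model.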