Searching for persons in large-scale image databases using natural language descriptions has important applications in video surveillance. Given the textual description of a person, a person search algorithm is required to rank all the samples in the person database and retrieve the most relevant samples corresponding to the queried description. The dataset adopted here is a large-scale person description dataset with detailed natural language annotations and person samples from various sources, termed the CUHK Person Description Dataset (CUHK-PEDES) [1]. Please note that the validation data of the original CUHK-PEDES dataset will be added to the train set, and the test data of the original CUHK-PEDES dataset will be used as the validation set. New test data will be collected from MSMT17 [7]. You can also find example code for training and validation on the original CUHK-PEDES dataset here.
We collected 43,264 images of 14,533 persons from six existing person re-identification datasets, CUHK03 [2], Market-1501 [3], SSM [4], VIPeR [5], CUHK01 [6], and MSMT17 [7], as the subjects for language descriptions. Since persons in Market-1501 and CUHK03 have many similar samples, to balance the number of persons from different domains, we randomly selected four images for each person in these two datasets. All the images were labeled by crowd workers from Amazon Mechanical Turk (AMT), where each image was annotated with two sentence descriptions; a total of 86,528 sentences were collected. The dataset incorporates rich details about person appearances, actions, poses, and interactions with other objects. The sentence descriptions are generally long (more than 23 words on average) and have an abundant vocabulary with little repetitive information.
We provide all the metadata for this task in a JSON file. The structure of the JSON file is:
[
{
"file_path": "train_query/p11600_s14885.jpg",
"captions": [
"A woman is wearing a gray shirt, a pair of brown pants and a pair of shoes.",
"She is wearing a dark grey top and light colored pants."
],
"id": 11003,
"processed_tokens": [
["a", "woman", "wearing", "a", "gray", "shirt", "a", "pair", "of", "brown", "pants", "and", "a", "pair", "of", "shoes"],
["she", "has", "shoulder", "length", "brown", "hair", "she", "is", "wearing", "a", "dark", "grey", "top", "and", "light", "colored", "pants"]
],
"split": "train"
},
...
]
"split" Belonging to train or test set.
"captions" Two natural language descriptions.
"file_path" The save path of the image.
"processed_tokens" The processed sentence.
"id" Person ID of the image. There are 13,003 persons, so the "id" ranges from 1 to 13,003.
The submission file should be a zipped txt file. Please do not put the txt file in a folder; zip it directly.
For each query sentence in the test set, you must predict a comma-delimited list of candidate images. The list should be sorted such that the first candidate is considered the most relevant one and the last the least relevant one. Each line of the file should contain the language description as the query and the candidate list, delimited by '#'. An example is shown below:
Woman with her hair up is wearing a dark blue jacket with a bag on her right arm, black veiled stockings and black shoes. # 2376.jpg,2318.jpg,2060.jpg,2481.jpg,86.jpg,742.jpg,2710.jpg,1398.jpg,1614.jpg,1821.jpg
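For illustration, such a file could be produced and zipped with the following Python sketch; the query string, candidate list, and output filenames here are hypothetical placeholders, not part of any official tooling.

import zipfile

# Hypothetical results: query sentence -> candidate images, most relevant first.
results = {
    "Woman with her hair up is wearing a dark blue jacket.": [
        "2376.jpg", "2318.jpg", "2060.jpg",
    ],
}

# One line per query: "<description> # <comma-delimited candidates>".
with open("submission.txt", "w") as f:
    for query, candidates in results.items():
        f.write(query + " # " + ",".join(candidates) + "\n")

# Zip the txt file directly, without an enclosing folder.
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("submission.txt")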
Please check the terms and conditions for further details.
[1] S. Li, T. Xiao, H. Li, B. Zhou, D. Yue, and X. Wang. Person search with natural language description. In CVPR, 2017.
[2] W. Li, R. Zhao, T. Xiao, and X. Wang. DeepReID: Deep filter pairing neural network for person re-identification. In CVPR, pages 152–159, 2014.
[3] L. Zheng, L. Shen, L. Tian, S. Wang, J. Bu, and Q. Tian. Person re-identification meets image search. arXiv preprint arXiv:1502.02171, 2015.
[4] T. Xiao, S. Li, B. Wang, L. Lin, and X. Wang. Joint detection and identification feature learning for person search. In CVPR, 2017.
[5] D. Gray, S. Brennan, and H. Tao. Evaluating appearance models for recognition, reacquisition, and tracking. In Proc. IEEE International Workshop on Performance Evaluation for Tracking and Surveillance (PETS), number 5, 2007.
[6] W. Li, R. Zhao, and X. Wang. Human reidentification with transferred metric learning. In ACCV, pages 31–44, 2012.
[7] L. Wei, S. Zhang, W. Gao, and Q. Tian. Person transfer GAN to bridge domain gap for person re-identification. In CVPR, 2018.
We adopt the top-1 accuracy to evaluate the performance of person retrieval. Given a query sentence, all test images are ranked according to their affinities with the query; a query counts as a success if the top-ranked image has the same person ID as the query.
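As a minimal sketch (not the official evaluation script), top-1 accuracy can be computed as follows, assuming a ranked candidate list per query and a lookup from image filename to person ID; all names and the toy data are illustrative assumptions.

def top1_accuracy(rankings, query_ids, image_to_id):
    """rankings[i]: candidate image filenames for query i, most relevant first.
    query_ids[i]: ground-truth person ID of query i.
    image_to_id: maps an image filename to its person ID."""
    correct = sum(
        1 for ranked, qid in zip(rankings, query_ids)
        if image_to_id[ranked[0]] == qid
    )
    return correct / len(rankings)

# Toy example with two queries over a three-image gallery.
image_to_id = {"a.jpg": 5, "b.jpg": 7, "c.jpg": 9}
print(top1_accuracy([["a.jpg", "c.jpg"], ["b.jpg", "a.jpg"]], [5, 7], image_to_id))  # 1.0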
Participants are encouraged, but not restricted, to train their algorithms on the provided train and val sets. The CodaLab page of each track has links to the respective data. The test set is divided into two splits: test-dev and test-challenge. Test-dev is the default test set for testing under general circumstances and is used to maintain a public leaderboard. Test-challenge is used for the workshop competition; results will be revealed at the workshop. When participating in the task, please be reminded that:
The datasets are released for academic research only and are free to researchers from educational or research institutions for non-commercial purposes. By downloading the dataset, you agree not to reproduce, duplicate, copy, sell, trade, resell, or exploit for any commercial purpose any portion of the images or any portion of the derived data.
Copyright © 2019, WIDER Consortium. All rights reserved. Redistribution and use of the software in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
THIS SOFTWARE AND ANNOTATIONS ARE PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
For more information, please refer to the challenge webpage or contact us at wider-challenge@ie.cuhk.edu.hk.
Start: May 10, 2019, 6:59 a.m.
Description: In this phase, you can submit results on the validation set and see your rank on the leaderboard.
Start: June 11, 2019, 6:59 a.m.
Description: In this phase, we will release the test set, and the leaderboard will show results on the test set.
End: Aug. 2, 2019, 6:59 a.m.
# | Username | Score |
---|---|---|
1 | Xiaojing | 0.5106 |
2 | Nuanyang | 0.5049 |
3 | ac5462 | 0.4431 |