After my submission finished, all I got is shown below. Is that normal? How can I see my model's performance on the AP or AR metrics? Does the "SCORE" correspond to AP?
"test_keypoints.json
/tmp/codalab/tmpa5hBVu/run/input/res/test_keypoints.json
loading annotations into memory...
Done (t=18.14s)
creating index...
index created!
Loading and preparing results...
DONE (t=56.81s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *keypoints*
DONE (t=127.88s).
Accumulating evaluation results...
DONE (t=4.38s). "
The score is the AP at OKS=.50:.05:.95, which is the primary challenge metric.
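If you want the full AP/AR breakdown rather than the single leaderboard number, you can run the same COCO-style evaluation locally. Below is a minimal sketch with file names taken from this thread; it assumes the deepfashion2_api from the evaluation folder is installed in place of stock pycocotools (stock pycocotools uses OKS sigmas for the 17 COCO person keypoints, not DeepFashion2's 294 landmarks) and that the fork keeps the standard pycocotools import names.

# Minimal local evaluation sketch (file names are examples from this thread).
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground truth in COCO format, e.g. json_for_validation/keypoints_val_vis_and_occ.json.
coco_gt = COCO('keypoints_val_vis_and_occ.json')
# Your detections in the COCO keypoint results format, e.g. the uploaded test_keypoints.json.
coco_dt = coco_gt.loadRes('test_keypoints.json')

coco_eval = COCOeval(coco_gt, coco_dt, iouType='keypoints')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints the full AP/AR table
print('primary metric (AP at OKS=.50:.05:.95):', coco_eval.stats[0])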
Posted by: geyuying @ March 28, 2020, 2:22 p.m.

Thank you.
For the validation set, I got a much lower score from this online evaluation than from local testing with the deepfashion2_api (https://github.com/switchablenorms/DeepFashion2/tree/master/evaluation).
Could you please give me some advice?
Hi, I didn't see your submission in the development phase, which is evaluated on the validation set. Did you upload your result to the development phase rather than the test phase to get validation results?
Posted by: geyuying @ March 28, 2020, 3:09 p.m.

Can you see it now? I'm confused about why the online evaluation result is not the same as my local testing with the deepfashion2_api.
Is there anything wrong with my json file? Please help me.
Hi, I saw several submissions. Which one does not match your local testing with the deepfashion2_api?
Posted by: geyuying @ March 29, 2020, 2:59 a.m.

Submission-6 is the one that gets a lower score from the online evaluation than from my local testing with the deepfashion2_api, and the ground truth was generated with deepfashion2_to_coco.py.
Could you please help me check whether my result json is in the correct format? Or is there anything else I can do to solve the problem? Thanks a lot.
We provide ground truth for evaluating the validation set as keypoints_val_vis_and_occ.json in the dataset folder json_for_validation. Have you tried testing your result locally against keypoints_val_vis_and_occ.json instead of the ground-truth file you generated yourself?
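As a quick sanity check before uploading, you can also verify the structure of your result file. This is only a rough sketch: it assumes the standard COCO-style keypoint results format (a flat list of detections, each with image_id, category_id, keypoints and score, where keypoints is a flat list of x, y, v triples; DeepFashion2 defines 294 landmarks in total), and it reuses the test_keypoints.json name from your log.

# Rough format check for a COCO-style keypoint results file (sketch only).
import json

with open('test_keypoints.json') as f:
    results = json.load(f)

assert isinstance(results, list), 'expected a flat list of detections'
for det in results[:100]:  # spot-check the first entries
    missing = {'image_id', 'category_id', 'keypoints', 'score'} - set(det)
    assert not missing, f'missing fields: {missing}'
    # keypoints must be (x, y, v) triples; with all 294 landmarks this is 882 numbers
    assert len(det['keypoints']) % 3 == 0, 'keypoints length is not a multiple of 3'
print(f'spot-checked {min(len(results), 100)} of {len(results)} detections')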
Posted by: geyuying @ March 29, 2020, 8:24 a.m.

Yes, I just tried evaluating my result against the ground truth in json_for_validation/keypoints_val_vis_and_occ.json and got exactly the same performance as with my own generated ground truth.
Posted by: litepose @ March 29, 2020, 8:36 a.m.

Hi, I just downloaded your submission file and the deepfashion2_api from GitHub. With json_for_validation/keypoints_val_vis_and_occ.json as ground truth, the result is below:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.268
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.435
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.283
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.172
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.269
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.368
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.552
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.394
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.170
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.370
This Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.268 is consistent with the score shown for your submission.
Posted by: geyuying @ March 29, 2020, 1:28 p.m.

Thank you for your help. I think I've found the problem; there is more work for me to do.
Posted by: litepose @ March 29, 2020, 1:48 p.m.