DeepFashion2 Challenge 2020 - Track 1 Clothes Landmark Estimation Forum


> "View scoring output log" doesn't show AP or AR results

After my submission finished, all I got is the output shown below. Is that normal? How can I see my model's performance on the AP or AR metrics? Does the "SCORE" correspond to AP?
"test_keypoints.json
/tmp/codalab/tmpa5hBVu/run/input/res/test_keypoints.json
loading annotations into memory...
Done (t=18.14s)
creating index...
index created!
Loading and preparing results...
DONE (t=56.81s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *keypoints*
DONE (t=127.88s).
Accumulating evaluation results...
DONE (t=4.38s). "

Posted by: litepose @ March 28, 2020, 2:03 p.m.

The score is the AP at OKS=.50:.05:.95, which is the primary challenge metric.
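
If you run the evaluation locally and call summarize(), the ten printed values land at fixed positions in the stats array, and the one used as SCORE is index 0. A rough sketch of that layout, given only as a reading aid (it mirrors the keypoint summarize() of pycocotools with iouType='keypoints' and maxDets=20):

    # Reading aid: index layout of coco_eval.stats after summarize() for a
    # pycocotools-style keypoint evaluation (iouType='keypoints', maxDets=20).
    # Index 0 is the value reported as SCORE on the leaderboard.
    KEYPOINT_STATS = [
        "AP @ OKS=0.50:0.95 | area=all",     # 0: primary challenge metric (SCORE)
        "AP @ OKS=0.50      | area=all",     # 1
        "AP @ OKS=0.75      | area=all",     # 2
        "AP @ OKS=0.50:0.95 | area=medium",  # 3
        "AP @ OKS=0.50:0.95 | area=large",   # 4
        "AR @ OKS=0.50:0.95 | area=all",     # 5
        "AR @ OKS=0.50      | area=all",     # 6
        "AR @ OKS=0.75      | area=all",     # 7
        "AR @ OKS=0.50:0.95 | area=medium",  # 8
        "AR @ OKS=0.50:0.95 | area=large",   # 9
    ]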

Posted by: geyuying @ March 28, 2020, 2:22 p.m.

Thank you.
For the validation set, I get a much lower score from the online evaluation than from local evaluation with the deepfashion2_api (https://github.com/switchablenorms/DeepFashion2/tree/master/evaluation).
Could you please give me some advice?

Posted by: litepose @ March 28, 2020, 3:02 p.m.

Hi, I didn't see your submission in the development phase, which is evaluated on the validation set. To get validation results, did you upload your result to the development phase rather than the test phase?

Posted by: geyuying @ March 28, 2020, 3:09 p.m.

Can you see it now? I'm confused about why the online evaluation result does not match my local testing with the deepfashion2_api.
Is there anything wrong with my json file? Please help me.

Posted by: litepose @ March 28, 2020, 3:28 p.m.

Hi, I saw several submissions. Which one does not match your local testing with the deepfashion2_api?

Posted by: geyuying @ March 29, 2020, 2:59 a.m.

Submission-6 is the result that gets a lower score from the online evaluation than from my local testing with the deepfashion2_api. The ground truth I used locally was generated with deepfashion2_to_coco.py.
Could you please check whether my result json is in the correct format? Or is there anything else I can do to solve the problem? Thanks a lot.
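
As far as I understand, the online evaluation expects a standard COCO-style keypoint results file, so my entries roughly follow the sketch below (a minimal, illustrative example only; the ids, coordinates, and keypoint count are placeholders, not the actual DeepFashion2 landmark definition):

    # Minimal, illustrative sketch of the COCO keypoint-results layout that
    # pycocotools' loadRes() accepts; all values and the keypoint count are
    # placeholders, not the actual DeepFashion2 landmark definition.
    import json

    results = [
        {
            "image_id": 1,                    # id of the validation image
            "category_id": 1,                 # clothing category id
            "keypoints": [120.5, 88.0, 2,     # flattened (x, y, visibility)
                          130.2, 95.4, 2,     # triplets, one per landmark
                          0.0, 0.0, 0],       # zeros for landmarks not predicted
            "score": 0.87,                    # detection confidence
        },
    ]

    with open("test_keypoints.json", "w") as f:
        json.dump(results, f)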

Posted by: litepose @ March 29, 2020, 6:59 a.m.

We provide the ground truth for evaluating the validation set as keypoints_val_vis_and_occ.json in the dataset folder json_for_validation. Have you tried evaluating your result locally against keypoints_val_vis_and_occ.json instead of the ground-truth file you generated yourself?
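
If it helps, here is a minimal sketch of such a local check with pycocotools (as far as I recall, the deepfashion2_api evaluation follows the same COCOeval interface; the paths below are placeholders, and for exact DeepFashion2 numbers you should keep using the deepfashion2_api, since it configures the OKS sigmas for the DeepFashion2 landmarks):

    # Minimal local-evaluation sketch with pycocotools; for exact DeepFashion2
    # numbers use the deepfashion2_api, which configures the per-landmark OKS
    # sigmas. Paths are placeholders for your local files.
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO("json_for_validation/keypoints_val_vis_and_occ.json")  # provided ground truth
    coco_dt = coco_gt.loadRes("test_keypoints.json")                      # your result file

    coco_eval = COCOeval(coco_gt, coco_dt, iouType="keypoints")
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()                 # prints the AP/AR table
    print("SCORE:", coco_eval.stats[0])   # AP @ OKS=0.50:0.95, the primary metric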

Posted by: geyuying @ March 29, 2020, 8:24 a.m.

Yes, I just evaluated my result against json_for_validation/keypoints_val_vis_and_occ.json and got exactly the same performance as with my own generated ground truth.

Posted by: litepose @ March 29, 2020, 8:36 a.m.

Hi, I just downloaded your submission file and the deepfashion2_api from GitHub. Using json_for_validation/keypoints_val_vis_and_occ.json as ground truth, the result is below:

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.268
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.435
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.283
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.172
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.269
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.368
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.552
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.394
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.170
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.370

The Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.268 is consistent with the score shown for your submission.

Posted by: geyuying @ March 29, 2020, 1:28 p.m.

Thank you for your help. I think I have found the problem; there is more work for me to do.

Posted by: litepose @ March 29, 2020, 1:48 p.m.