We are allowed to predict at most 10 human boxes. I think that's not enough, and it's a little unfair for scenes with lots of people. For example, suppose there are 20 people in an image and your prediction contains 10 of them, but unfortunately the ground truth ("gt") labels the other 10; then you may get 0 after the evaluation. Maybe we could loosen the limit to 20. As far as I can see, 20 would be enough.
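To make the scenario concrete, here is a minimal sketch in Python. It assumes a simple greedy IoU match at a 0.5 threshold, which is only a stand-in and not necessarily the metric the challenge actually uses. The names `count_matches` and `MAX_BOXES` and the toy box layout are made up for illustration.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def count_matches(preds, gts, thresh=0.5):
    """Greedy one-to-one matching (assumed metric, not the official one).

    Returns how many predictions hit a not-yet-matched ground-truth box.
    """
    used = set()
    hits = 0
    for p in preds:
        for j, g in enumerate(gts):
            if j not in used and iou(p, g) >= thresh:
                used.add(j)
                hits += 1
                break
    return hits


MAX_BOXES = 10  # the current limit discussed above

# Toy image: 20 people side by side, boxes do not overlap.
people = [(i * 20, 0, i * 20 + 10, 30) for i in range(20)]

# The annotators happened to label only the first 10 people.
gts = people[:10]

# Unlucky case: the model keeps the other 10 people under the limit.
preds_unlucky = people[10:][:MAX_BOXES]

# With a limit of 20, all people could be submitted.
preds_20 = people[:20]

print(count_matches(preds_unlucky, gts))  # 0 matched boxes
print(count_matches(preds_20, gts))       # 10 matched boxes
```

In this toy case every one of the 10 submitted boxes is a real person, yet none of them matches the gt, while a limit of 20 would recover all 10 labeled people.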
Posted by: fangwudi @ June 13, 2021, 7:08 a.m.