As I understand it, in the track with known association we can run our method and then, per frame, pick the best-matching detected 3D pose for each ground-truth 2D pose.
In the track without known association, how do we proceed? Several of the 3DPW videos contain multiple people, some of whom are not annotated. Can we perform a sequence-level (not frame-level) matching to the ground truth in order to know which people are annotated? The eval code on GitHub seems to assume that the predictions are already ordered the same way as the ground-truth pose sequences.
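To make the per-frame matching concrete, here is a sketch of what I have in mind. The function name, array shapes, and the use of mean per-joint 2D distance as the matching cost are my own assumptions, not anything specified by the challenge; `linear_sum_assignment` is SciPy's Hungarian-algorithm solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_per_frame(gt_poses_2d, pred_poses_2d):
    """Match each GT 2D pose to the closest predicted pose in one frame.

    gt_poses_2d:   (G, J, 2) ground-truth 2D joint positions
    pred_poses_2d: (P, J, 2) predicted poses projected into the image
    Returns a list of length G: matches[g] is the index of the
    prediction assigned to GT pose g (None if left unassigned).
    """
    # Cost matrix: mean per-joint 2D distance for every GT/pred pair
    diffs = gt_poses_2d[:, None] - pred_poses_2d[None, :]   # (G, P, J, 2)
    cost = np.linalg.norm(diffs, axis=-1).mean(axis=-1)     # (G, P)
    gt_idx, pred_idx = linear_sum_assignment(cost)
    matches = [None] * len(gt_poses_2d)
    for g, p in zip(gt_idx, pred_idx):
        matches[g] = int(p)
    return matches
```

Unannotated extra people in the frame would simply end up unmatched, since the Hungarian assignment uses at most one prediction per GT pose.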
Posted by: isarandi @ July 15, 2020, 4:15 p.m.

The website has been updated to clear up all ambiguities. The first-frame GT data can be used to ascertain the number and identity of the people tracked.
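For example, the sequence-level assignment could be done once on the first frame and then used to reorder the predicted tracks so that the eval code's ordering assumption holds. This is only a sketch under my own assumptions about array shapes and cost function; nothing here is the official eval code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def reorder_tracks(gt_first, pred_first, pred_sequences):
    """Align predicted track order with GT identity order via frame 0.

    gt_first:       (G, J, D) GT poses in the first frame
    pred_first:     (P, J, D) predicted poses in the first frame, P >= G
    pred_sequences: list of P per-track pose sequences, same order
                    as pred_first
    Returns the G sequences reordered to match the GT sequences.
    """
    # Mean per-joint distance between every GT/prediction pair in frame 0
    cost = np.linalg.norm(
        gt_first[:, None] - pred_first[None, :], axis=-1
    ).mean(axis=-1)                                          # (G, P)
    gt_ids, track_ids = linear_sum_assignment(cost)
    # Keep the assignment fixed for the whole sequence
    order = [t for _, t in sorted(zip(gt_ids, track_ids))]
    return [pred_sequences[t] for t in order]
```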
Posted by: aymen @ July 17, 2020, 10:07 p.m.