Can we use the camera extrinsic/intrinsic calibration?
Posted by: isarandi @ July 12, 2020, 2:13 p.m.

Camera intrinsics are never used during evaluation. It doesn't make sense to use them. In 3DPW, the camera coordinate frame and the joint-position frame are not aligned, whereas in most 3D pose estimation algorithms the camera coordinates and the body/joint coordinates are in the same frame (see, for example, https://arxiv.org/abs/1912.05656). The camera extrinsics are used to bring the two into alignment. If the camera and joint coordinate frames in your predictions are not the same, then you need a post-processing step that brings the joint positions into the camera coordinate frame, but you cannot use the GT camera extrinsics in that post-processing step of your algorithm.
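For concreteness, a minimal sketch of such a post-processing step (variable names are illustrative; the point is that the GT values of R and t below are exactly what you may not use):

```python
import numpy as np

def world_to_camera(joints_world, R, t):
    """Map world-frame joint positions (J, 3) into the camera frame.

    R: (3, 3) world-to-camera rotation, t: (3,) translation, so that
    X_cam = R @ X_world + t for each joint.
    """
    return joints_world @ R.T + t
```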
Posted by: aymen @ July 13, 2020, 7:45 a.m.

So, as I understand it, the intrinsics can theoretically be used (though they are not necessary), but the GT extrinsics are not part of the input.
Posted by: isarandi @ July 15, 2020, 4:10 p.m.

Neither the GT extrinsics nor the intrinsics can be used during evaluation.
Posted by: aymen @ July 17, 2020, 4:33 p.m.

3D pose methods often use the camera intrinsics for various purposes, for example as part of scale recovery, or to adjust the prediction to account for the implied camera rotation. See Fig. 2 in the supplementary material of https://arxiv.org/pdf/1611.09813.pdf (Mehta et al., 3DV'17).
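Roughly, that correction rotates the prediction onto the viewing ray through the subject, something like the following sketch (my own reconstruction of the idea with illustrative names, not code from the paper):

```python
import numpy as np

def perspective_correction(pose3d, center2d, K):
    """Rotate a root-relative 3D pose prediction onto the camera ray
    through the subject's 2D position -- the idea behind the correction
    of Mehta et al. (conventions here are illustrative).

    pose3d:   (J, 3) joints, predicted as if the subject sat on the
              optical axis
    center2d: (2,) pixel position of the subject (e.g. box center)
    K:        (3, 3) camera intrinsic matrix
    """
    # Back-project the 2D center to a viewing ray in camera coordinates
    ray = np.linalg.inv(K) @ np.array([center2d[0], center2d[1], 1.0])
    ray /= np.linalg.norm(ray)

    # Rodrigues' formula for the rotation taking the optical axis onto the ray
    z = np.array([0.0, 0.0, 1.0])
    axis = np.cross(z, ray)
    sin_a, cos_a = np.linalg.norm(axis), float(z @ ray)
    if sin_a < 1e-8:
        return pose3d  # subject already (nearly) on the optical axis
    axis /= sin_a
    skew = np.array([[0, -axis[2], axis[1]],
                     [axis[2], 0, -axis[0]],
                     [-axis[1], axis[0], 0]])
    R = np.eye(3) + sin_a * skew + (1.0 - cos_a) * (skew @ skew)
    return pose3d @ R.T
```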
Papers often argue that the intrinsic calibration is cheap and simple to obtain in practical scenarios (either by calibrating the camera once, or by reading the focal length from image metadata). Therefore, I assume many participants will simply use it. However, if it definitely cannot be used, I guess I'll just assume a field of view typical of smartphone cameras, or find some automatic calibration algorithm.
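In that case, the fallback intrinsics would be built from the assumed field of view, e.g. (the 60-degree value is just my guess at a typical phone camera, not a calibrated number):

```python
import numpy as np

def intrinsics_from_fov(img_w, img_h, fov_deg=60.0):
    """Construct a pinhole intrinsic matrix from an assumed horizontal
    field of view, with the principal point at the image center."""
    f = (img_w / 2.0) / np.tan(np.radians(fov_deg) / 2.0)
    return np.array([[f, 0.0, img_w / 2.0],
                     [0.0, f, img_h / 2.0],
                     [0.0, 0.0, 1.0]])
```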
Posted by: isarandi @ July 17, 2020, 6:46 p.m.

Actually, we don't use the perspective camera model. For monocular 3D mesh recovery, we usually use the weak-perspective camera model as a simplification.
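Concretely, weak perspective replaces the depth-dependent division with a single per-person scale and a 2D translation; a minimal sketch (illustrative names, in the spirit of HMR):

```python
import numpy as np

def weak_perspective_project(joints_cam, scale, trans_2d):
    """Weak-perspective projection as used in HMR-style mesh recovery:
    instead of dividing by per-joint depth, apply one scale s and a 2D
    translation t to the X-Y coordinates: x = s * X[:2] + t."""
    return scale * joints_cam[:, :2] + trans_2d
```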
Have a nice day!
Thank you! I am still a bit confused. Quoting from Mehta et al. 3DV'17: "On our MPI-INF-3DHP test set perspective correction improves the PCK by 3 percent points. On HumanEva the improvement is up to 3 mm MPJPE, see Table 1. The correction is most pronounced for cameras with a large field of view, e.g. Go-Pro and similar outdoor cameras, and when the subject is located at the border of the view". So in general, the difference is noteworthy.
I'm not quite sure how to interpret the statement that you did not use the perspective camera model. Are you referring to the construction of the ground truth of the 3DPW dataset? Does it mean that the ground truth itself is somewhat distorted in this case, so that NOT applying Mehta et al.'s correction would give better results on 3DPW, while applying it would make the score worse?
Posted by: isarandi @ July 18, 2020, 5:06 p.m.

Of course, I understand your idea. It's all about the setting of the problem. 3DPW is usually used to evaluate methods for monocular 3D mesh recovery. The key difference is that we take the uncalibrated image as input; please refer to the introduction of HMR (CVPR 2018). During evaluation, we take randomly selected internet videos as input, for which it is hard to obtain an accurate intrinsic matrix. As the evaluation protocol specifies, 3DPW is used for evaluation only, without any fine-tuning. Based on this reasoning, we generally consider the camera matrix to be unknown.
Posted by: Arthursy @ July 20, 2020, 2:05 a.m.