In our recent publication we presented the challenging FreiHAND dataset, which can serve both as a training and a benchmarking dataset for deep learning algorithms. Here, we provide the possibility to evaluate approaches on our evaluation split, guaranteeing a fair protocol through a centralized evaluation server and withheld annotations. For more information see our project page, the respective publication, or the accompanying GitHub repository.
If our work helps your research, consider citing our publication:
@InProceedings{Freihand2019,
  author    = {Christian Zimmermann and Duygu Ceylan and Jimei Yang and Bryan Russell and Max Argus and Thomas Brox},
  title     = {FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape from Single RGB Images},
  booktitle = {IEEE International Conference on Computer Vision (ICCV)},
  year      = {2019}
}
We currently support evaluation for two tasks: estimation of keypoints and estimation of hand shape. For both tasks we score the predictions with and without alignment to the ground truth. For the aligned scores we estimate scale, rotation and translation to align the prediction to the ground truth (Procrustes analysis).
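For illustration, below is a minimal NumPy sketch of such a similarity (Procrustes) alignment. It is not the official evaluation code (see the GitHub repository for that), and the function name is ours.

```python
import numpy as np

def procrustes_align(pred, gt):
    """Align pred (N, 3) to gt (N, 3) by estimating scale, rotation and translation.

    A minimal sketch of a similarity (Procrustes) alignment; the official
    evaluation script may differ in details.
    """
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    p, g = pred - mu_p, gt - mu_g

    # Optimal rotation via SVD of the cross-covariance matrix.
    U, S, Vt = np.linalg.svd(p.T @ g)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:   # avoid reflections
        Vt[-1] *= -1
        S[-1] *= -1
        R = Vt.T @ U.T

    # Scale that minimizes the squared error after rotation.
    scale = S.sum() / (p ** 2).sum()

    return scale * (R @ p.T).T + mu_g  # aligned prediction
```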
See our accompanying GitHub repository for the evaluation code used.
Please note: Compared to the camera-ready version of the paper we improved the alignment algorithm and additionally estimate the scale difference. Therefore, the respective scores improved slightly compared to Table 3 of the camera-ready paper version.
Provided scale information
At evaluation time, information about the scale of the hand is provided. For this purpose, the metric length of a reference bone is given, which algorithms can decide to use. By definition, the reference bone is the proximal phalanx of the middle finger (i.e. between keypoints 9 and 10) and its length is provided in meters.
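As a small illustration (not part of the official tooling), a prediction in arbitrary scale could be rescaled with the provided reference bone length like this; the keypoint indices 9 and 10 follow the definition above, all names are ours:

```python
import numpy as np

def rescale_with_ref_bone(keypoints_xyz, ref_bone_length):
    """Scale a predicted 3D hand so its middle-finger proximal phalanx
    (keypoint 9 -> keypoint 10) matches the provided metric length.

    keypoints_xyz: (21, 3) predicted keypoints in arbitrary scale.
    ref_bone_length: length of the reference bone in meters (given at evaluation time).
    """
    predicted_length = np.linalg.norm(keypoints_xyz[10] - keypoints_xyz[9])
    return keypoints_xyz * (ref_bone_length / predicted_length)
```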
Keypoint estimation

The task is to predict the 3D location of 21 keypoints from a single RGB image as input, using the keypoint definition of [1]. For keypoint evaluation we follow [2]: we calculate the mean error between prediction and ground truth as well as the area under the curve (AUC) of the percentage of correct keypoints (PCK) curve, evaluated in an interval from 0 cm to 5 cm with 100 equally spaced thresholds.
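A minimal sketch of how these two scores can be computed (assuming aligned predictions in meters; the official script in the GitHub repository is authoritative, and the function name is ours):

```python
import numpy as np

def keypoint_scores(pred, gt, max_thresh_m=0.05, steps=100):
    """Mean per-keypoint error and AUC of the PCK curve.

    pred, gt: (N, 21, 3) keypoints in meters (already aligned, if desired).
    """
    errors = np.linalg.norm(pred - gt, axis=-1)         # (N, 21) per-keypoint errors
    mean_error = errors.mean()

    thresholds = np.linspace(0.0, max_thresh_m, steps)  # 0 cm .. 5 cm
    pck = np.array([(errors <= t).mean() for t in thresholds])
    auc = np.trapz(pck, thresholds) / max_thresh_m      # normalize the AUC to [0, 1]
    return mean_error, auc, pck, thresholds
```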
Because inference of scale is ill-posed for monocular approaches, we additionally provide the camera intrinsics and the metric length of the reference bone for every sample.
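For reference, a standard pinhole projection with the provided intrinsics K looks like this; this is generic camera geometry, not code from the evaluation server, and the names are ours:

```python
import numpy as np

def project(keypoints_xyz, K):
    """Project (21, 3) camera-space keypoints to (21, 2) pixel coordinates with intrinsics K (3, 3)."""
    uvw = keypoints_xyz @ K.T          # apply the pinhole camera model
    return uvw[:, :2] / uvw[:, 2:3]    # perspective division by depth
```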
Hand shape estimation

The task is to predict a shape from a single RGB image as input that models the hand, including fingers and palm, up to the wrist. The shape is represented by a set of points (a point cloud) that resembles the outer hull of the hand. If an algorithm predicts a triangle mesh, a natural choice is to use its vertex locations as the points.
For the predictions we calculate the following scores:
- Mesh error: the average Euclidean distance between corresponding vertices of the prediction and the ground truth.
- F-score at 5 mm and 15 mm: the harmonic mean of precision and recall between the predicted and ground truth point clouds at the given distance threshold, following [3].
There are two caveats with these scores if the predicted points differ in number or semantic meaning from the ground truth:
1. Mesh error: Correspondences between points of the prediction and the ground truth are needed for its calculation. Because it is not well defined how to find these correspondences in the general case, our evaluation script only calculates this measure when the shape prediction provides exactly 778 vertices, assuming they follow the vertex definition of MANO [4]; otherwise -1 is reported.
2. F-score: While this score is well defined between shape representations with an unequal number of points, the Procrustes approach to align the prediction with the ground truth fails. Therefore, if the number of vertices differs from 778, we resort to estimating the Procrustes transformation from the predicted keypoints and applying it to the shape prediction (a small sketch of both scores follows below).
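A minimal sketch of both scores under the assumptions above (778 MANO vertices for the mesh error, already-aligned point clouds for the F-score in the spirit of [3]); the helper names are ours and the official evaluation script may handle details differently:

```python
import numpy as np

def mesh_error(pred_verts, gt_verts):
    """Average Euclidean distance between corresponding vertices (requires 778 MANO vertices)."""
    if pred_verts.shape[0] != 778:
        return -1.0
    return np.linalg.norm(pred_verts - gt_verts, axis=-1).mean()

def f_score(pred_points, gt_points, threshold):
    """F-score between two aligned point clouds at a given distance threshold."""
    # Dense pairwise distances; fine for a few hundred points, use a KD-tree for larger clouds.
    dists = np.linalg.norm(pred_points[:, None, :] - gt_points[None, :, :], axis=-1)
    precision = (dists.min(axis=1) <= threshold).mean()  # predicted points close to ground truth
    recall = (dists.min(axis=0) <= threshold).mean()     # ground truth points covered by prediction
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```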
Due to a bug in the public submission board, we provide the results of the baseline methods here for now. If you want your paper/method added here, please contact me.
| Paper | Method | Mesh error al [cm] | F@5mm al | F@15mm al | Full results |
|---|---|---|---|---|---|
| [5] | Mean shape * | 1.64 | 0.336 | 0.837 | full results |
| [5] | MANO fit with Inverse Kinematics | 1.37 | 0.439 | 0.892 | full results |
| [5] | MANO CNN | 1.09 | 0.516 | 0.934 | full results |
| [6] | RGB only *‡ | 1.32 | 0.427 | 0.894 | full results |
| [7] | Hands only †‡ | 1.33 | 0.429 | 0.907 | full results |
*: Method predicts relative coordinates, i.e. there is no global alignment and therefore only the 'aligned' scores are meaningful.
‡: Code and trained model provided by the authors, evaluated by us.
†: Global root position computed from the 2D keypoint estimates and the root-relative 3D estimates.
We also provide the PCK and PCV curves of the baseline methods, together with a script for plotting them, here. If you want to add your method's results to these plots, simply take the pck_data.json file from the output of the scoring step.
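If you want to overlay your own curve, something along the following lines should work; note that the key names used below ('thresholds', 'pck') are placeholders, so check the actual structure of your pck_data.json first:

```python
import json
import matplotlib.pyplot as plt

# NOTE: the key names below are placeholders; inspect your pck_data.json for the actual structure.
with open('pck_data.json') as f:
    data = json.load(f)

plt.plot(data['thresholds'], data['pck'], label='my method')
plt.xlabel('threshold')
plt.ylabel('PCK')
plt.legend()
plt.savefig('pck_curve.png')
```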
1: Simon et al., 'Hand keypoint detection in single images using multiview bootstrapping', CVPR 2017
2: Zimmermann et al., 'Learning to Estimate 3D Hand Pose from Single RGB Images', ICCV 2017
3: Knapitsch et al., 'Tanks and temples: Benchmarking large-scale scene reconstruction', ACM TOG 2017
4: Romero et al., 'Embodied hands: Modeling and capturing hands and bodies together', ACM TOG 2017
5: Zimmermann et al., 'FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape from Single RGB Images', ICCV 2019
6: Boukhayma et al., '3D Hand Shape and Pose from Images in the Wild', CVPR 2019
7: Hasson et al., 'Learning joint reconstruction of hands and manipulated objects', CVPR 2019
This dataset is provided for research purposes only and without any warranty. Any commercial use is prohibited. If you use the dataset or parts of it in your research, you must cite the respective paper mentioned on the Overview page.
Start: Sept. 1, 2019, midnight
End: Never
# | Username | Score |
---|---|---|
1 | SSI_S.LSI_RD_CV | 0.19 |
2 | vector | 0.40 |
3 | SSI_S.LSI_RD_CV_mobile | 0.43 |