ChaLearn LAP Large-scale Continuous Gesture Recognition Challenge Forum


> About capture settings

Hello,
If possible, I'd like to know more on the process that transforms the 3D points cloud to depth videos. It doesn't seem to be revertible because the data is scaled down between 0 and 255 but the original range is not provided as it is the case in other databases. Also is black color (0, 0, 0) the label for unknown depth?
Is there any non-technical reason why you did not provide the skeleton tracking as in the 2013 dataset? Do you want to put the emphasis on dense feature extraction this year?
Regards,
Nicolas

Posted by: Nicolas @ July 8, 2016, 9:02 a.m.

Dear Nicolas,
Thanks very much for your questions. The answers are below:
1. 3D point cloud
In both the IsoGD and ConGD datasets, we provide depth images but no 3D point cloud. However, from the CGD dataset we know that the depth images were produced by the normalization f(x) = (x - mini) / (maxi - mini), scaled to the range [0, 255], where mini is the minimum distance to the camera and maxi the maximum distance to the camera for an entire batch. To restore the data from the AVI files:
step 1: Average the R,G, and B values to get a value v.
step 2: Compute v / 255 * (maxi - mini) + mini.
The maxi and mini values for each of the 480 batches of the CGD dataset are provided at the end of the following text:
http://www.causality.inf.ethz.ch/Gesture/CGD2011/README.txt
Therefore, for the CGD dataset, the 3D point cloud can be computed.
For both of our datasets, we cannot provide the maxi and mini values, because we do not want participants to find the corresponding ground-truth labels of our datasets in the CGD dataset by matching the maxi and mini values. However, participants can still approximately compute the 3D point cloud via the above steps using the average maxi and mini values over the 480 batches, taken from the end of the text at the above-mentioned link:
http://www.causality.inf.ethz.ch/Gesture/CGD2011/README.txt
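For what it is worth, a minimal sketch of this restoration in Python (not the organizers' code) might look like the following. The MINI and MAXI constants are placeholders that would have to be computed as the averages of the per-batch values listed at the end of the CGD README, and restore_depth is just an illustrative helper name.

```python
# Minimal sketch: approximate depth restoration from a depth AVI, assuming
# average maxi/mini values from the CGD README. The exact per-batch constants
# are not released for IsoGD/ConGD, so this is only an approximation.
import cv2
import numpy as np

# Hypothetical average normalization constants (replace with the averages of
# the 480 per-batch values from the CGD README).
MINI = 1.0   # assumed average minimum distance to the camera
MAXI = 4.0   # assumed average maximum distance to the camera

def restore_depth(avi_path, mini=MINI, maxi=MAXI):
    """Yield one approximate depth map per frame of a depth AVI."""
    cap = cv2.VideoCapture(avi_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Step 1: average the R, G and B values to get v in [0, 255].
        v = frame.astype(np.float32).mean(axis=2)
        # Step 2: invert the normalization: depth = v / 255 * (maxi - mini) + mini.
        yield v / 255.0 * (maxi - mini) + mini
    cap.release()
```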
2. Skeleton tracking
We cannot provide the skeleton information. That is because both of our datasets are derived from the CGD dataset, which does not provide skeleton information.
3. We hope participants will use any methods they like, not only dense feature extraction.
best wishes,
Jun Wan (one of the contest organizers)

Posted by: gesture_challenge @ July 9, 2016, 2:28 p.m.

Thank you very much for your answer.
Just as a suggestion for future releases of this dataset: as far as I can tell, the three color channels are identical in the depth videos. Wouldn't it be possible to encode the 16-bit depth data on two channels instead? Video encoding algorithms may alter the resulting values, though. A rough sketch of the idea is below.
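Purely as an illustration of the suggestion (not something the dataset provides), the packing and unpacking could look like this, with the function names being hypothetical:

```python
# Rough sketch: split 16-bit depth across two 8-bit channels (high byte /
# low byte) and recombine it. Lossy video codecs could still corrupt the
# low byte, as noted above.
import numpy as np

def encode_depth16(depth16):
    """Pack a uint16 depth map into an 8-bit, 3-channel image."""
    high = (depth16 >> 8).astype(np.uint8)   # most significant byte
    low = (depth16 & 0xFF).astype(np.uint8)  # least significant byte
    zero = np.zeros_like(high)
    return np.dstack([high, low, zero])

def decode_depth16(frame):
    """Recover the uint16 depth map from the packed frame."""
    high = frame[..., 0].astype(np.uint16)
    low = frame[..., 1].astype(np.uint16)
    return (high << 8) | low
```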

Posted by: Nicolas @ July 11, 2016, 7:39 a.m.

Dear Nicolas,
We cannot provide the 16-bit depth data. We may release the original mini and maxi values after both gesture challenges.

best wishes,
Jun Wan (one of the contest organizers)

Posted by: joewan10 @ July 11, 2016, 7:53 a.m.