> The labels should be considered "weak" cross-modal annotations

Dear participants,

this is a message regarding the accuracy of the annotations provided:

The provided labels should be considered "weak" cross-modal annotations. This is now stated more clearly on the dataset webpage (

Remember that bounding boxes were annotated in the Color modality, not directly in Depth and Thermal (mainly because human blobs are much harder for annotators to discern visually in those modalities, which makes direct annotation impractical for large amounts of data). Unfortunately, because these are cross-modal annotations, their accuracy also depends on the accuracy of the alignment to Color, i.e. the spatial registration and temporal synchronization. Although spatial registration is accurate among the 3 modalities, the temporal synchronization has its flaws (even between Color and Depth), which can affect annotations in the more dynamic scenes. We have provided more detail on the sources of temporal misalignment on the dataset webpage (

For the final evaluation (test), we are trying to minimize this problem as much as possible through exhaustive, careful supervision of the annotations directly in the Depth and Thermal modalities. We expect this, together with the relatively lenient IoU threshold of 0.5 in the AP@0.5 metric, to account for the remaining misalignment and, hence, ensure a fair comparison of results.
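
To illustrate why a 0.5 IoU threshold tolerates moderate misalignment, here is a minimal sketch in Python. It is not the official evaluation code: the `(x1, y1, x2, y2)` pixel box format, the function name `iou`, and the example box sizes and shifts are all assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are assumed to be (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical 100x200 px person box and two horizontally shifted copies of it.
reference = (100, 100, 200, 300)
small_shift = (110, 100, 210, 300)   # 10 px offset
large_shift = (140, 100, 240, 300)   # 40 px offset

print(iou(reference, small_shift) >= 0.5)  # still counts as a match at AP@0.5
print(iou(reference, large_shift) >= 0.5)  # no longer a match at AP@0.5
```

In this toy case a 10 px offset on a 100 px wide box leaves the IoU well above 0.5, while a 40 px offset drops it below the threshold, so only large misalignments in very dynamic scenes would turn a correct detection into a miss.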

Kind regards,
The organization team.

Posted by: aclapes @ Nov. 26, 2019, 12:02 p.m.
