Welcome to the first Australian Centre for Robotic Vision (ACRV) robotic vision challenge!
In this challenge, participants are tasked with object detection on a video stream, where each detection must provide accurate estimates of spatial and semantic uncertainty using probabilistic bounding boxes (PBoxes). Participants are evaluated using a new probability-based detection quality (PDQ) measure which will reward accurate uncertainty estimates.
Our robotic vision challenges are designed to encourage research into the specific needs of robotic vision by providing new data, metrics, and tests which focus on these needs.
Unlike computer vision, robotic vision needs to:
This challenge focusses on the first two points. Future challenges will support active vision, continuous learning and interaction. Stay tuned!
If you want to quickly get started with this challenge just follow the procedure below.
For robotics applications, detections must not only indicate where and what an object is, but also how certain the detector is of that detection. Failing to do so can lead to catastrophic consequences from over- or under-confident detections. To encourage researchers to provide accurate estimates for both spatial and semantic uncertainty, our object detection challenge provides a new detection format and evaluation measure suited to analysing detections which provide such uncertainty. We test on video data from multiple realistic simulated indoor environments, complete with cluttered surfaces, day and night lighting variations, and multiple camera heights.
Our challenge uses new high-fidelity simulated data. You can find further information and download the full set of data in the "Participate" tab under the heading "Get Data". In summary, our data contains:
Example of data used in this challenge showing both the lighting variations (day and night) and camera height variations (tall, medium and short) provided. Playlist of all the test image sequences in this challenge.
To enable detections with spatial uncertainty, we provide a new detection format which:
Example of generating spatial probability heatmap for PBox defined by two 2D Gaussian corners
To evaluate detections in a manner which incorporates both the semantic and spatial uncertainty which should be included in a robotic vision system, we developed the probability-based detection quality (PDQ) measure for this challenge. Full implementation details are supplied here and a summary is provided under the "Evaluation" heading.
The benefits of PDQ as a measure are that it:
This competition is evaluated using the probability-based detection quality (PDQ) measure.
We provide a summary here but direct you to the following paper for full details on implementation http://arxiv.org/abs/1811.10800. The evaluation code is publicly available here: https://github.com/jskinn/rvchallenge-evaluation
In brief, PDQ is calculated as follows:
Pairwise PDQ is measured between any given detection and ground-truth object and is defined as the geometric mean of the spatial quality and label quality of the detection with respect to the ground-truth object.
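The geometric mean above can be sketched in a couple of lines (illustrative only; the official implementation is in the evaluation repository linked above):

```python
import math

def pairwise_pdq(spatial_quality, label_quality):
    """Pairwise PDQ: geometric mean of spatial and label quality,
    both of which lie in [0, 1]."""
    return math.sqrt(spatial_quality * label_quality)

# A detection with perfect label quality but imperfect spatial
# quality still scores below 1.0 overall.
print(pairwise_pdq(0.64, 1.0))  # 0.8
```

Because it is a geometric mean, a zero in either component forces the pairwise score to zero, so a detection cannot compensate for a completely wrong label with good localisation or vice versa.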
Spatial quality is a measure of how well a detection captures the spatial extent of a ground-truth object, taking into account the spatial probabilities for individual pixels as calculated by the detector. It is a merged measure of foreground loss (loss from missing the detected object's pixels) and background loss (loss from detecting background pixels as foreground). The less uncertainty that a detection provides for an incorrect prediction (e.g. background pixels detected as foreground) the higher the loss which is incurred.
Note that as we are considering box-style detections, we ignore pixels which are not a part of the ground-truth object but are within the bounding box which encapsulates it when calculating either type of spatial loss. This can be seen visually below.
Example of foreground (white) and background (red) regions evaluated when calculating spatial quality. Detection pixels (orange) within the ground-truth object's bounding box (blue) that are not part of the foreground are simply ignored.
Final spatial quality will always be between zero and one.
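As a simplified sketch of this behaviour (following the exponentiated-loss formulation in the linked paper; the exact pixel normalisation used by the official evaluation may differ slightly):

```python
import math

def spatial_quality(fg_probs, bg_probs):
    """Sketch of spatial quality as exp(-(L_FG + L_BG)), where each
    loss is a mean negative log probability over the relevant pixels.

    fg_probs: spatial probabilities the detector assigned to
              ground-truth foreground pixels (ideally near 1).
    bg_probs: spatial probabilities assigned to evaluated background
              pixels (ideally near 0).
    """
    eps = 1e-14  # guard against log(0) for fully confident mistakes
    l_fg = -sum(math.log(max(p, eps)) for p in fg_probs) / len(fg_probs)
    l_bg = -sum(math.log(max(1.0 - p, eps)) for p in bg_probs) / len(bg_probs)
    return math.exp(-(l_fg + l_bg))

# A perfectly confident, perfectly correct detection scores 1.0.
print(spatial_quality([1.0, 1.0], [0.0, 0.0]))  # 1.0
```

Note how the `eps` guard illustrates the text above: a pixel predicted with full confidence in the wrong direction incurs an enormous loss, driving spatial quality towards zero.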
Label quality measures how effectively a detection identifies what the object is. Within a given detection's label probability distribution, label quality is defined as the probability the detection gave to the correct class of the object being identified.
Note that this is irrespective of whether the class has the highest rank within the label probability distribution.
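In other words, label quality is a simple lookup into the detection's label distribution (illustrative sketch, with made-up class names):

```python
def label_quality(label_probs, classes, true_class):
    """Label quality: the probability the detection assigned to the
    ground-truth class, regardless of that class's rank."""
    return label_probs[classes.index(true_class)]

# 'bat' is not the top-ranked class, but label quality is still 0.2.
print(label_quality([0.1, 0.2, 0.7], ['cat', 'bat', 'ball'], 'bat'))  # 0.2
```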
For each image, every detection is optimally assigned to a single ground-truth object. This is done using the Hungarian algorithm based upon the pPDQ scores between all detections and ground-truth objects. Any detection which is either unassigned or assigned to a ground-truth object with zero pPDQ (no association) is considered a false positive (FP). Any ground-truth object which is either unassigned or assigned to a detection with zero pPDQ is considered a false negative (FN). All other detection-object pairings with some level of association are called "true positives" (TPs) for simplicity in terminology.
Final PDQ score across a set of ground-truth objects and detections is the total pPDQ for each optimally assigned pairing in each image divided by the total number of TPs, FNs and FPs. This final scoring gives a result between zero and one. Note however that for ease of visualisation on the challenge leaderboard graph, PDQ scores are multiplied by 100 and so the final score provided will be between 0 and 100.
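The final score described above reduces to a short calculation (a sketch of the aggregation step only; matching and pPDQ computation happen beforehand):

```python
def final_pdq(tp_ppdq_scores, num_fps, num_fns):
    """Final PDQ: total pPDQ over all true-positive assignments
    divided by the total number of TPs, FPs, and FNs."""
    num_tps = len(tp_ppdq_scores)
    total = num_tps + num_fps + num_fns
    return sum(tp_ppdq_scores) / total if total > 0 else 0.0

# Two TPs scoring 0.8 and 0.6, with one FP and one FN:
# (0.8 + 0.6) / (2 + 1 + 1) = 0.35, shown as 35.0 on the leaderboard.
print(final_pdq([0.8, 0.6], 1, 1))  # 0.35
```

Because FPs and FNs enlarge the denominator without adding to the numerator, both spurious detections and missed objects directly lower the final score.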
After submitting results in the format outlined in the "Submission" tab, evaluation shall be performed using the PDQ measure across all frames of every video sequence of our test data. This PDQ score is multiplied by 100 for ease of visualisation on the challenge leaderboard graph. For more information about the test data please go to the "Participate" tab under the heading "Get Data". Please note that as the PDQ performs pixel-wise evaluations on probabilistic output across many detections and ground-truths in every image, this process may take some time. Please be patient.
Alongside the main PDQ score (listed as "Score" in the leaderboard) we also provide overall pPDQ, spatial quality, and label quality scores averaged across all TP detections, and the number of TPs, FPs, and FNs. In the leaderboard these are named "Average Overall Quality", "Average Spatial Quality", "Average Label Quality", "True Positives", "False Positives", and "False Negatives" respectively.
Error messages and further evaluation information shall be supplied within the output file to help with debugging. For more information about interpreting error messages, please see the "Troubleshooting" tab.
If the sum of the final label probability distribution for a detection exceeds 1.0, the probability distribution shall be re-normalized as part of the evaluation process.
If the sum of the final label probability distribution for a detection is less than 0.5, the detection is not used in evaluation.
Note that during evaluation we ignore objects which are on-screen but have height or width of less than 10 pixels or a total area of less than 100 pixels.
The data provided for this challenge along with this website belong to the ARC Centre of Excellence in Robotic Vision and are licensed under a Creative Commons Attribution 4.0 License.
Copyright (c) 2018, John Skinner, David Hall, Niko Sünderhauf, and Feras Dayoub, ARC Centre of Excellence for Robotic Vision, Queensland University of Technology
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
To enter the competition, first download the test data (see the "Participate" tab under the heading "Get Data"). The downloaded image data consists of several folders, each containing a sequence of images that the robot sees.
You may also want to download the starter kit, located here: https://github.com/jskinn/rvchallenge-starter-kit
The submission format is a zipped folder containing a json file for each sequence ('000000.json', '000001.json', etc.). These .json files can either be in the root directory or in some sub-directory within the zip file submission.
Each json file must contain the following structure:

{
    "classes": [<an ordered list of class names>],
    "detections": [
        [
            {
                "bbox": [x1, y1, x2, y2],
                "covars": [
                    [[xx1, xy1], [xy1, yy1]],
                    [[xx2, xy2], [xy2, yy2]]
                ],
                "label_probs": [<an ordered list of probabilities for each class>]
            },
            ...
        ],
        ...
    ]
}
In the starter kit, 'submission_builder.py' contains helper code to generate json files in this format; see the comments there for more examples.
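Independent of the starter kit helpers, a minimal submission file can also be built with the standard library alone. The sequence below is purely illustrative (placeholder class names, coordinates, and probabilities):

```python
import json

# Illustrative two-image sequence: one detection in the first image,
# none in the second.
submission = {
    "classes": ["cat", "bat", "ball"],
    "detections": [
        [  # image 0: a single detection
            {
                "bbox": [10.0, 20.0, 110.0, 220.0],
                "covars": [
                    [[4.0, 0.0], [0.0, 4.0]],  # top-left corner
                    [[9.0, 0.0], [0.0, 9.0]],  # bottom-right corner
                ],
                "label_probs": [0.1, 0.2, 0.7],
            }
        ],
        [],  # image 1: no detections, but the entry must still exist
    ],
}

with open("000000.json", "w") as f:
    json.dump(submission, f)
```

Note the empty inner list for the second image: as described under Detections below, every image in the sequence needs an entry even when nothing was detected.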
The "classes" element is a list of class name strings for each class which the detector can detect and provides probability outputs for. The list is ordered to match the order of the label probabilities ("label_probs") output provided for each detection. For example if your classes are ['cat', 'bat', 'ball'], and your detection provides a label probability distribution of [0.1, 0.2, 0.3], it is assumed the detection is providing a 0.1 probability of 'cat', 0.2 of 'bat', and 0.3 of 'ball'.
The classes evaluated within this challenge are a subset of 30 classes from the COCO detection challenge. The full list of classes is as follows:
[bottle, cup, knife, bowl, wine glass, fork, spoon, banana, apple, orange, cake, potted plant, mouse, keyboard, laptop, cell phone, book, clock, chair, dining table, couch, bed, toilet, television, microwave, toaster, refrigerator, oven, sink, person]
It is recommended that the classes list in the submission file matches this list of classes; however, there are provisions in place for handling different class lists if provided.
If a detector returns more classes than the 30 being evaluated, the extra classes will not be considered when evaluating on our data. For example, if the system was trained on the full set of 80 COCO classes and returns probability distributions accordingly, probabilities not associated with our 30 classes shall be removed from the distribution.
If the class list is in a different order than the one provided, this shall also be handled appropriately.
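The class handling described in the two points above can be sketched as a simple re-mapping (illustrative only; the actual evaluation code handles this internally):

```python
def align_class_probs(sub_classes, probs, eval_classes):
    """Drop probabilities for classes outside the evaluated list and
    re-order the remainder to match the evaluated class order.
    Classes missing from the submission get probability 0.0."""
    lookup = dict(zip(sub_classes, probs))
    return [lookup.get(c, 0.0) for c in eval_classes]

# 'dog' is not an evaluated class, so its probability is dropped.
print(align_class_probs(['dog', 'cup', 'bowl'], [0.5, 0.3, 0.2],
                        ['bowl', 'cup']))  # [0.2, 0.3]
```

Dropping the 'dog' probability shrinks the distribution's sum, which is why the re-normalisation and minimum-sum rules below are applied after this filtering step.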
We allow for some synonyms of classes in the submitted "classes" list to be converted to match the naming conventions in the main class list. Conversions are as follows:
The detections element is a list of lists of detections (represented by dictionaries). Each entry in the outer list represents an image, and each entry in the inner list represents the detections present in that image. Note that there must be an entry in the detections list for every image in the sequence, even if the inner list is empty (no detections for that image).
Individual detections are described by dictionaries containing the keys "bbox", "covars", and "label_probs" representing bounding box corners, covariance matrices, and label probability distributions respectively.
The "bbox" value for a detection dictionary is a 4-element list describing the locations of the corners of the predicted bounding box.
The format of this is [x1, y1, x2, y2] where x1, y1, x2, and y2 are the far left, top, right, and bottom coordinates of the box respectively.
The "covars" value for a detection dictionary is a list of two, 2D lists outlining the covariance matrices for the top-left and bottom-right corners of the detected bounding box respectively.
Each covariance matrix is a list of lists formatted as [[xx, xy], [xy, yy]] where xx is the variance along the x axis, yy is the variance along the y axis and xy is the covariance between x and y. Covariance matrices are supplied for both the top-left and bottom-right corner of the bounding box in turn.
Any supplied covariance matrix must be positive semi-definite.
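For a symmetric 2x2 matrix, positive semi-definiteness reduces to non-negative diagonal entries and a non-negative determinant, so a submitted covariance can be sanity-checked without any linear algebra library (an illustrative check, not the evaluation code itself):

```python
def is_valid_covar(m):
    """Check a 2x2 covariance matrix [[xx, xy], [yx, yy]] is symmetric
    and positive semi-definite. For a symmetric 2x2 matrix, PSD holds
    iff xx >= 0, yy >= 0, and the determinant xx*yy - xy^2 >= 0."""
    (xx, xy), (yx, yy) = m
    if xy != yx:
        return False  # not symmetric
    return xx >= 0 and yy >= 0 and (xx * yy - xy * xy) >= 0

print(is_valid_covar([[4.0, 1.0], [1.0, 9.0]]))  # True
print(is_valid_covar([[4.0, 1.0], [2.0, 9.0]]))  # False (asymmetric)
print(is_valid_covar([[1.0, 5.0], [5.0, 1.0]]))  # False (negative determinant)
```

Running a check like this before zipping your results avoids the covariance-related submission errors listed under Troubleshooting.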
Unlike the other keys in the detection dictionary, covars is optional for reasons outlined below.
If a detection does not contain the "covars" key, a standard bounding box (BBox) without any uncertainty is utilised within evaluation. Note that these are treated as having no spatial uncertainty and any pixel within the BBox will have 100% spatial probability of describing the detected object (corners inclusive).
Detections with "covars" provided are used to generate probabilistic bounding box (PBox) detections. The corners of PBox detections are 2D Gaussians depicting where BBox corners which fully encapsulate the given object may exist. This provides spatial uncertainty, particularly as it relates to the extremes of the detected object with spatial uncertainty increasing for pixels further from the centre of the detection. Full details about how PBoxes are implemented can be found in the following paper http://arxiv.org/abs/1811.10800. An example of a PBox generated from two Gaussian corners can be seen below.
Example of generating spatial probability heatmap for PBox defined by two 2D Gaussian corners
Please note that for reasons of speed, the calculation of probability heatmaps after submission is an approximation which contains some low level of error.
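To build intuition for how two Gaussian corners yield a heatmap, the per-pixel probability can be roughly sketched under the simplifying assumptions of diagonal covariances and independent corners (this is not the official approximation; see the linked paper and evaluation code for the real computation):

```python
import math

def gauss_cdf(x, mean, var):
    """CDF of a 1D Gaussian, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def pbox_pixel_prob(px, py, bbox, covars):
    """Rough sketch: the probability a pixel lies inside the box is
    the probability the top-left corner falls up-and-left of it times
    the probability the bottom-right corner falls down-and-right of it
    (diagonal covariances assumed, so x and y factorise)."""
    x1, y1, x2, y2 = bbox
    c_tl, c_br = covars
    p_tl = gauss_cdf(px, x1, c_tl[0][0]) * gauss_cdf(py, y1, c_tl[1][1])
    p_br = ((1.0 - gauss_cdf(px, x2, c_br[0][0]))
            * (1.0 - gauss_cdf(py, y2, c_br[1][1])))
    return p_tl * p_br

covs = [[[4.0, 0.0], [0.0, 4.0]], [[4.0, 0.0], [0.0, 4.0]]]
# Near the box centre, probability approaches 1; exactly at a Gaussian
# corner mean, each of the two 1D CDFs contributes 0.5, giving 0.25.
print(pbox_pixel_prob(50, 50, [0, 0, 100, 100], covs))
print(pbox_pixel_prob(0, 0, [0, 0, 100, 100], covs))
```

This matches the qualitative behaviour described above: certainty is highest at the detection's centre and falls away towards the extremes, at a rate set by the corner covariances.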
The "label_probs" value for a detection dictionary is a list of class probabilities for each class outlined in the main "classes" element of the result. The "label_probs" list in each detection must match the "classes" list at the top level, such that the i-th entry in "label_probs" is the probability of the i-th class in the class list as outlined previously under Classes.
If the sum of the final label probability distribution for a detection (after removing unevaluated class probabilities) exceeds 1.0, the probability distribution shall be re-normalized as part of the evaluation process.
If the sum of the final label probability distribution for a detection (after removing unevaluated class probabilities) is less than 0.5, the detection is not used in evaluation.
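The two rules above can be sketched as a small preprocessing step (illustrative only; the official evaluation applies its own version of this):

```python
def preprocess_label_probs(probs):
    """Apply the stated rules to a detection's label distribution
    (after unevaluated class probabilities have been removed):
    - sum > 1.0 -> re-normalise so the distribution sums to 1.0
    - sum < 0.5 -> discard the detection (returns None here)
    """
    total = sum(probs)
    if total < 0.5:
        return None  # detection is ignored in evaluation
    if total > 1.0:
        return [p / total for p in probs]
    return probs

print(preprocess_label_probs([0.8, 0.8]))  # [0.5, 0.5]
print(preprocess_label_probs([0.1, 0.2]))  # None
```

A distribution summing to between 0.5 and 1.0 is left untouched, so deliberately under-confident distributions are permitted as long as they retain at least half their probability mass.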
The standard output within this challenge has several types of error messaging to help with debugging submissions. What follows are the different error and warning messages that can be raised and what information they provide.
If any of these messages get raised your code will be stopped and you shall not receive a score for your submission.
If this error has occurred, no detections were provided for a given sequence. This means you have not provided a json results file for those sequences. After the message you will be provided with the list of sequences which have no detections provided.
If this error has occurred, the submission has provided more than one json file for the given sequence. In the preamble you will be provided with the name of the sequence with duplicate .json files and after the message you will be provided with the locations where the duplicate detections exist.
If this error has occurred, there is no detection entry for each image in the given sequence. The list of detections for a sequence must have an entry for every image, even if that entry is an empty list (no detections made for that image). The message will include which detections file is missing detections, how many entries were provided and how many entries should have been supplied.
If this error has occurred, a sequence's json results file was found not to contain the key 'classes'. Most likely, this means you have not supplied the class list that corresponds to your probability distributions for your results for that given sequence. Which sequence has caused the error will be supplied in the error message preamble.
If this error has occurred, none of the recognised classes appear in the .json file for the given sequence. Which sequence has caused the error will be supplied in the error message preamble.
If this error has occurred, a sequence's json results file was found not to contain the key 'detections'. Most likely, this means you have not supplied the list of detections within your detections result for that given sequence. Which sequence has caused the error will be supplied in the error message preamble.
If this error has occurred it means that a given detection does not contain the key 'label_probs'. Most likely this means that you have not supplied a given detection with a probability distribution. In the preamble you will be provided with the sequence name, image index, and detection index where this error has occurred.
If this error has occurred, the label probability distribution provided by a given detection does not match the size of the class list provided with the sequence's json results file. All label probability distributions need to be a full probability distribution over all classes which the detector may output. In the preamble you will be provided with the sequence name, image index, and detection index where this error has occurred.
If this error has occurred, a given detection does not contain the key 'bbox'. Most likely this means you have not provided the bounding box corner locations for the given detection. In the preamble you will be provided with the sequence name, image index, and detection index where this error has occurred.
If this error has occurred, a given detection's bounding box has been provided in an invalid format. The list of bounding box values must be a list containing four elements [x1, y1, x2, y2] denoting the upper-left and lower-right corners of the bounding box. In the preamble you will be provided with the sequence name, image index, and detection index where this error has occurred.
If this error has occurred, a given detection's bounding box's x1 (left) coordinate was bigger than the box's x2 (right) coordinate. x1 must always be less than or equal to x2 as we use a coordinate frame where (0,0) is the upper-left corner of the image. Therefore the leftmost coordinate must be a smaller number than the rightmost one. If the two are equal, you have a box with a width of one pixel. In the preamble you will be provided with the sequence name, image index, and detection index where this error has occurred.
If this error has occurred, a given detection's bounding box's y1 (upper) coordinate was bigger than the box's y2 (lower) coordinate. y1 must always be less than or equal to y2 as we use a coordinate frame where (0,0) is the upper-left corner of the image. Therefore the uppermost coordinate must be a smaller number than the lowermost one. If the two are equal, you have a box with a height of one pixel. In the preamble you will be provided with the sequence name, image index, and detection index where the error has occurred.
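The three bounding box checks above can be sketched as a small validator to run over your own results before submitting (illustrative; the real checks live in the evaluation code):

```python
def check_bbox(bbox):
    """Validate a bounding box against the rules described above:
    (0,0) is the image's upper-left corner, so x1 <= x2 and y1 <= y2.
    Returns an error string, or None if the box is valid."""
    if len(bbox) != 4:
        return "bbox must be a four-element list [x1, y1, x2, y2]"
    x1, y1, x2, y2 = bbox
    if x1 > x2:
        return "x1 (left) must not exceed x2 (right)"
    if y1 > y2:
        return "y1 (top) must not exceed y2 (bottom)"
    return None

print(check_bbox([10, 20, 5, 40]))   # x1 (left) must not exceed x2 (right)
print(check_bbox([10, 20, 30, 40]))  # None
```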
If this error has occurred, the covariances you have supplied for a given detection are invalid. If a detection has a 'covars' key, it must supply 2 covariance matrices. One for the upper-left corner of the PBox and one for the lower-right corner of the PBox. The format of the content within 'covars' for a given detection must have a shape (2, 2, 2). In the preamble you will be provided with the sequence name, image index, and detection index where this error has occurred.
If this error has occurred, at least one of the covariances provided for a given detection are invalid. All covariance matrices provided for generating PBoxes must be symmetric. In the preamble you will be provided with the sequence name, image index, and detection index where this error has occurred.
If this error has occurred, the upper-left covariance provided for a given detection was invalid. All covariance matrices provided for generating PBoxes must be positive semi-definite. In the preamble you will be provided with the sequence name, image index, and detection index where this error has occurred.
If this error has occurred, the lower-right covariance provided for a given detection was invalid. All covariance matrices provided for generating PBoxes must be positive semi-definite. In the preamble you will be provided with the sequence name, image index, and detection index where this error has occurred.
If any of these messages get raised your code will still run and finish, but there may be a condition you should be made aware of that might affect your performance.
If this warning occurs, more results have been provided than there are sequences to be evaluated. This shall not cause an error but any extra results provided will not be evaluated. The text after the message will outline which named sequences shall not be evaluated as they do not correspond with any ground-truth sequences.
Start: Dec. 1, 2018, midnight
Description: Object Detection: Create models that detect objects in images