ACRV Robotic Vision Challenge 1

Organized by ACRV

Object Detection
Start: Dec. 1, 2018, midnight UTC
Competition ends: never

The First Australian Centre for Robotic Vision Challenge

Welcome to the first Australian Centre for Robotic Vision (ACRV) robotic vision challenge!

In this challenge, participants are tasked with object detection on a video stream, where each detection must provide accurate estimates of spatial and semantic uncertainty using probabilistic bounding boxes (PBoxes). Participants are evaluated using a new probability-based detection quality (PDQ) measure which will reward accurate uncertainty estimates.

Our robotic vision challenges are designed to encourage research into the specific needs of robotic vision by providing new data, metrics, and tests which focus on these needs.

Unlike computer vision, robotic vision needs to:

  • Operate using streams of video information
  • Provide estimates of uncertainty when exploring and understanding the world
  • Be active within its environment, moving and interacting to better understand the world around it
  • Be capable of continuous learning, updating its understanding of its environment

This challenge focuses on the first two points. Future challenges will support active vision, continuous learning, and interaction. Stay tuned!

Quick Start

If you want to quickly get started with this challenge just follow the procedure below.

  1. Download the starter kit from https://github.com/jskinn/rvchallenge-starter-kit
  2. Run script "download_test_data.sh" to download every folder of images for each image sequence in our test set (approx 24 GB of data)
    • Alternatively you can download individual sequences in the "Participate" tab under the heading "Get Data". Note that to do this you must be logged in to CodaLab.
  3. Run your detection algorithm (trained on the full COCO class list or our subset thereof) over all images in all sequences. Each detection requires:
    • A full class label probability distribution
    • Bounding box top-left and bottom-right corner locations
    • Covariances for top-left and bottom-right corners indicating corner location uncertainty
  4. Use tools in "submission_builder.py" in the starter kit to help format the final submission as a single .json file for each image sequence, describing all detections therein.
  5. On this website in the "Participate" tab under the heading "Submit/View Results", submit a zip archive of your .json results. Note that you must be logged in to CodaLab to submit a result.
  6. After some evaluation time (please be patient), your detection results will be scored under our new PDQ evaluation measure and posted on the leaderboard.

Introduction

For robotics applications, detections must not only report where and what an object is, but also how certain the detector is of that detection. Failing to do so can lead to catastrophic consequences from over- or under-confident detections. To encourage researchers to provide accurate estimates of both spatial and semantic uncertainty, our object detection challenge provides a new detection format and evaluation measure suited to analysing detections which provide such uncertainty. We test on video data from multiple realistic simulated indoor environments, complete with cluttered surfaces, day and night lighting variations, and multiple camera heights.

Data

Our challenge uses new high-fidelity simulated data. You can find further information and download the full set of data in the "Participate" tab under the heading "Get Data". In summary, our data contains:

  • Over 56,000 images from 18 simulated indoor video sequences, approximately 24GB
  • 30 object classes
  • Day and night lighting variations
  • Cluttered surfaces
  • Varied camera heights (tall, medium and short)

Example of data used in this challenge, showing both the lighting variations (day and night) and camera height variations (tall, medium and short).

Playlist of all the test image sequences in this challenge.

Probabilistic Bounding Box (PBox) Detections

To enable detections with spatial uncertainty, we provide a new detection format which:

  • Describes bounding box corners as Gaussian distributions
  • Can be used to calculate pixel-wise spatial probabilities
  • Increases spatial uncertainty closer to box edges

Example of generating a spatial probability heatmap for a PBox defined by two 2D Gaussian corners.
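The pixel-wise probability implied by two Gaussian corners can be sketched as follows. This is a simplified illustration which assumes independent per-axis Gaussians at each corner; the official evaluation uses full 2D Gaussian corners and a faster approximation.

```python
import math

def gaussian_cdf(x, mean, var):
    """Phi((x - mean) / sigma) for a 1D Gaussian; var is the variance."""
    if var <= 0:
        return 1.0 if x >= mean else 0.0
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def pbox_pixel_prob(u, v, bbox, covars):
    """Probability that pixel (u, v) lies inside the box, treating each
    corner coordinate as an independent Gaussian (a simplification of
    the full 2D-Gaussian corners used by the official evaluation)."""
    x1, y1, x2, y2 = bbox
    c1, c2 = covars  # 2x2 covariance matrices for the two corners
    # P(X1 <= u) * P(Y1 <= v) * P(X2 >= u) * P(Y2 >= v)
    return (gaussian_cdf(u, x1, c1[0][0])
            * gaussian_cdf(v, y1, c1[1][1])
            * (1.0 - gaussian_cdf(u, x2, c2[0][0]))
            * (1.0 - gaussian_cdf(v, y2, c2[1][1])))
```

Under this sketch, probability approaches one deep inside the box and falls off smoothly near the edges, exactly the behaviour the bullet points above describe.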

Probability-based Detection Quality (PDQ) Evaluation Measure

To evaluate detections in a manner which incorporates both the semantic and spatial uncertainty which should be included in a robotic vision system, we developed the probability-based detection quality (PDQ) measure for this challenge. Full implementation details are supplied in the paper at http://arxiv.org/abs/1811.10800, and a summary is provided under the "Evaluation" heading.

The benefits of PDQ as a measure are that it:

  • Evaluates based on joint spatial and semantic uncertainties
  • Rewards accurate spatial uncertainty estimation
  • Explicitly measures semantic probability
  • Optimally assigns detections and ground-truth objects by combined spatial and semantic quality
  • Contains no variable thresholds which define success

Organizers

  • Niko Sünderhauf
  • Feras Dayoub
  • John Skinner
  • David Hall

Sponsors

  • ACRV
  • Google

Evaluation Measure

This competition is evaluated using the probability-based detection quality (PDQ) measure.

We provide a summary here, but direct you to the following paper for full implementation details: http://arxiv.org/abs/1811.10800. The evaluation code is publicly available at https://github.com/jskinn/rvchallenge-evaluation

In brief, PDQ is calculated as follows:

  1. Calculate pairwise PDQ (pPDQ) for all detection-object pairs in each image as the geometric mean of a detection's:
    • Spatial quality
    • Label quality
  2. Find the optimal assignment between detections and objects in each image from the pPDQs in that image.
  3. Compute the overall PDQ from all optimally assigned pPDQs across all images.

Pairwise PDQ (pPDQ)

Pairwise PDQ is measured between any given detection and ground-truth object and is defined as the geometric mean of the spatial quality and label quality of the detection with respect to the ground-truth object.
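In code, the geometric mean is simply:

```python
import math

def pairwise_pdq(spatial_quality, label_quality):
    # Geometric mean of the two qualities; zero if either quality is zero.
    return math.sqrt(spatial_quality * label_quality)
```

Because it is a geometric mean, a detection must score well on both qualities to achieve a high pPDQ.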

Spatial Quality

Spatial quality is a measure of how well a detection captures the spatial extent of a ground-truth object, taking into account the spatial probabilities for individual pixels as calculated by the detector. It is a merged measure of foreground loss (loss from missing pixels of the ground-truth object) and background loss (loss from detecting background pixels as foreground). The more confidently a detection makes an incorrect prediction (e.g. background pixels detected as foreground), the higher the loss incurred.

Note that as we are considering box-style detections, we ignore pixels which are not a part of the ground-truth object but are within the bounding box which encapsulates it when calculating either type of spatial loss. This can be seen visually below.

PDQ Foreground and Background

Example of foreground (white) and background (red) regions evaluated when calculating spatial quality. Detection pixels (orange) within the ground-truth object's bounding box (blue) that are not part of the foreground are simply ignored.

Final spatial quality will always be between zero and one.

Label Quality

Label quality measures how effectively a detection identifies what the object is. Within a given detection's label probability distribution, label quality is defined as the probability the detection gave to the correct class of the object being identified.

Note that this is irrespective of whether the class has the highest rank within the label probability distribution.
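Label quality therefore reduces to a single lookup in the detection's probability distribution (class names below are illustrative):

```python
def label_quality(classes, label_probs, gt_class):
    """Probability the detection assigned to the ground-truth class,
    regardless of whether that class was ranked highest."""
    try:
        return label_probs[classes.index(gt_class)]
    except ValueError:
        return 0.0  # detector does not output the ground-truth class at all
```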

Optimal Assignment

For each image, every detection is optimally assigned to a single ground-truth object. This is done using the Hungarian algorithm based upon the pPDQ scores between all detections and ground-truth objects. Any detection which is either unassigned or assigned to a ground-truth object with zero pPDQ (no association) is considered a false positive (FP). Any ground-truth object which is either unassigned or assigned to a detection with zero pPDQ is considered a false negative (FN). All other detection-object pairings with some level of association are called "true positives" (TPs) for simplicity in terminology.
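The assignment step can be illustrated with a brute-force search, which returns the same optimum the Hungarian algorithm would on small inputs (the real evaluation uses the Hungarian algorithm for efficiency; this demo also assumes equal numbers of detections and objects, which the real evaluation does not require):

```python
from itertools import permutations

def optimal_assignment(ppdq):
    """Pair each detection (row) with a ground-truth object (column)
    to maximise total pPDQ. Zero-pPDQ pairs are dropped from the
    result, since they count as FPs/FNs rather than TPs."""
    n = len(ppdq)
    best, best_total = [], -1.0
    for cols in permutations(range(n)):
        total = sum(ppdq[r][c] for r, c in enumerate(cols))
        if total > best_total:
            best_total = total
            best = [(r, c) for r, c in enumerate(cols) if ppdq[r][c] > 0]
    return best
```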

PDQ Score

The final PDQ score across a set of ground-truth objects and detections is the total pPDQ of all optimally assigned pairings in each image, divided by the total number of TPs, FNs and FPs. This gives a result between zero and one. Note, however, that for ease of visualisation on the challenge leaderboard graph, PDQ scores are multiplied by 100, so the final score provided will be between 0 and 100.
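The final score can be sketched as:

```python
def pdq_score(tp_ppdqs, num_fps, num_fns):
    """Overall PDQ: summed pPDQ of true-positive pairings divided by
    the total count of TPs, FPs and FNs. Result is in [0, 1]; the
    leaderboard multiplies it by 100."""
    denom = len(tp_ppdqs) + num_fps + num_fns
    return sum(tp_ppdqs) / denom if denom else 0.0
```

For example, two TPs with pPDQs of 0.6 and 0.4 plus one FP and one FN give 1.0 / 4 = 0.25, displayed as 25 on the leaderboard.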

Evaluation Process

After submitting results in the format outlined in the "Submission" tab, evaluation shall be performed using the PDQ measure across all frames of every video sequence of our test data. This PDQ score is multiplied by 100 for ease of visualisation on the challenge leaderboard graph. For more information about the test data please go to the "Participate" tab under the heading "Get Data".  Please note that as the PDQ performs pixel-wise evaluations on probabilistic output across many detections and ground-truths in every image, this process may take some time. Please be patient.

Alongside the main PDQ score (listed as "Score" in the leaderboard) we also provide overall pPDQ, spatial quality, and label quality scores averaged across all TP detections, as well as the number of TPs, FPs, and FNs. In the leaderboard these are named "Average Overall Quality", "Average Spatial Quality", "Average Label Quality", "True Positives", "False Positives", and "False Negatives" respectively.

Error messages and further evaluation information shall be supplied within the output file to help with debugging. For more information about interpreting error messages, please see the "Troubleshooting" tab.

If the sum of the final label probability distribution for a detection exceeds 1.0, the probability distribution shall be re-normalized as part of the evaluation process.

If the sum of the final label probability distribution for a detection is less than 0.5, the detection is not used in evaluation.

Note that during evaluation we ignore objects which are on-screen but have height or width of less than 10 pixels or a total area of less than 100 pixels.

Terms and Conditions

Data and Website

The data provided for this challenge along with this website belong to the ARC Centre of Excellence in Robotic Vision and are licensed under a Creative Commons Attribution 4.0 License.

Software

Copyright (c) 2018, John Skinner, David Hall, Niko Sünderhauf, and Feras Dayoub, ARC Centre of Excellence for Robotic Vision, Queensland University of Technology
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Submission

To enter the competition, first download the test data (see the "Participate" tab under the heading "Get Data"). The downloaded image data consists of several folders, each containing a sequence of images that the robot sees.

You may also want to download the starter kit, located here: https://github.com/jskinn/rvchallenge-starter-kit

The submission format is a zipped folder containing a .json file for each sequence ('000000.json', '000001.json', etc.). These .json files can be either in the root directory or in a sub-directory within the zip file submission.
Each json file must contain the following structure:

{
    "classes": [<an ordered list of class names>],
    "detections": [
        [
            {
                "bbox": [x1, y1, x2, y2],
                "covars": [
                    [[xx1, xy1], [xy1, yy1]],
                    [[xx2, xy2], [xy2, yy2]]
                ],
                "label_probs": [<an ordered list of probabilities for each class>]
            },
            {
            }
        ],
        [],
        []
        ...
    ]
}

In the starter kit, 'submission_builder.py' contains helper code to generate json files in this format; see the comments there for more examples.
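If you prefer not to use the helper, a minimal submission can be assembled by hand with the standard library. All class names and numeric values below are illustrative only:

```python
import json
import zipfile

# Hypothetical two-image sequence: one detection in the first image,
# none in the second.
sequence = {
    "classes": ["bottle", "cup", "knife"],
    "detections": [
        [
            {
                "bbox": [10.0, 20.0, 110.0, 220.0],
                "covars": [
                    [[4.0, 0.0], [0.0, 4.0]],   # top-left corner
                    [[9.0, 0.0], [0.0, 9.0]],   # bottom-right corner
                ],
                "label_probs": [0.7, 0.2, 0.1],
            }
        ],
        [],  # second image: no detections, but the entry must exist
    ],
}

with open("000000.json", "w") as f:
    json.dump(sequence, f)

# Zip all per-sequence .json files into a single submission archive.
with zipfile.ZipFile("submission.zip", "w") as zf:
    zf.write("000000.json")
```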

Classes

The "classes" element is a list of class name strings for each class which the detector can detect and provides probability outputs for. The list is ordered to match the order of the label probabilities ("label_probs") output provided for each detection. For example if your classes are ['cat', 'bat', 'ball'], and your detection provides a label probability distribution of [0.1, 0.2, 0.3], it is assumed the detection is providing a 0.1 probability of 'cat', 0.2 of 'bat', and 0.3 of 'ball'.

The classes evaluated within this challenge are a subset of 30 classes from the COCO detection challenge. The full list of classes is as follows:

[bottle, cup, knife, bowl, wine glass, fork, spoon, banana, apple, orange, cake, potted plant, mouse, keyboard, laptop, cell phone, book, clock, chair, dining table, couch, bed, toilet, television, microwave, toaster, refrigerator, oven, sink, person]

It is recommended that the classes list in the submission file matches this list of classes, however, there are provisions in place for handling different class lists if provided.

If a detector returns more classes than the 30 being evaluated, the extra classes will not be considered when evaluating on our data. For example, if the system were trained on the full set of 80 COCO classes and returns probability distributions accordingly, the probabilities not associated with our 30 classes shall be removed from the distribution.

If the class list is in a different order than the one provided, this shall also be handled appropriately.
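The reordering and removal of unevaluated classes described above might be sketched as follows (an illustration of the described behaviour, not the evaluation server's actual code):

```python
def project_probs(sub_classes, probs, eval_classes):
    """Project a submitted probability distribution onto the evaluated
    class list: drop probabilities for unevaluated classes and reorder
    to match the evaluated ordering."""
    index = {c: i for i, c in enumerate(sub_classes)}
    return [probs[index[c]] if c in index else 0.0 for c in eval_classes]
```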

We allow for some synonyms of classes in the submitted "classes" list to be converted to match the naming conventions in the main class list. Conversions are as follows:

  • "tv" -> "television"
  • "tvmonitor" -> "television"
  • "computer monitor" -> "television"
  • "stool" -> "chair"
  • "diningtable" -> "dining table"
  • "pottedplant" -> "potted plant"
  • "cellphone" -> "cell phone"
  • "wineglass" -> "wine glass"
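These conversions amount to a simple lookup table, sketched here:

```python
# Synonym map applied to submitted class names (as listed above).
SYNONYMS = {
    "tv": "television",
    "tvmonitor": "television",
    "computer monitor": "television",
    "stool": "chair",
    "diningtable": "dining table",
    "pottedplant": "potted plant",
    "cellphone": "cell phone",
    "wineglass": "wine glass",
}

def normalise_class(name):
    """Map a submitted class name to the challenge's naming convention."""
    name = name.lower()
    return SYNONYMS.get(name, name)
```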

Detections

The detections element is a list of lists of detections (represented by dictionaries). Each entry in the outer list represents an image, and each entry in the inner list represents the detections present in that image. Note that there must be an entry in the detections list for every image in the sequence, even if the inner list is empty (no detections for that image).

Individual detections are described by dictionaries containing the keys "bbox", "covars", and "label_probs" representing bounding box corners, covariance matrices, and label probability distributions respectively.

Bounding Box Corners

The "bbox" value for a detection dictionary is a 4-element list describing the locations of the corners of the predicted bounding box.

The format of this is [x1, y1, x2, y2] where x1, y1, x2, and y2 are the far left, top, right, and bottom coordinates of the box respectively.

Covariance Matrices

The "covars" value for a detection dictionary is a list of two 2x2 covariance matrices for the top-left and bottom-right corners of the detected bounding box respectively.

Each covariance matrix is a list of lists formatted as [[xx, xy], [xy, yy]] where xx is the variance along the x axis, yy is the variance along the y axis and xy is the covariance between x and y. Covariance matrices are supplied for both the top-left and bottom-right corner of the bounding box in turn.

Any supplied covariance matrix must be positive semi-definite.
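For a 2x2 matrix, both requirements can be checked without a linear-algebra library, since a symmetric 2x2 matrix is positive semi-definite exactly when both diagonal entries and the determinant are non-negative. A minimal sketch you could use to validate your own output:

```python
def is_valid_covar(m, tol=1e-9):
    """Check that a 2x2 covariance matrix m = [[a, b], [c, d]] is
    symmetric and positive semi-definite (diagonals and determinant
    non-negative), within a small numerical tolerance."""
    (a, b), (c, d) = m
    symmetric = abs(b - c) <= tol
    det = a * d - b * c
    return symmetric and a >= -tol and d >= -tol and det >= -tol
```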

Unlike the other keys in the detection dictionary, covars is optional for reasons outlined below.

If a detection does not contain the "covars" key, a standard bounding box (BBox) without any uncertainty is utilised within evaluation. Note that these are treated as having no spatial uncertainty and any pixel within the BBox will have 100% spatial probability of describing the detected object (corners inclusive).

Detections with "covars" provided are used to generate probabilistic bounding box (PBox) detections. The corners of PBox detections are 2D Gaussians depicting where BBox corners which fully encapsulate the given object may exist. This provides spatial uncertainty, particularly as it relates to the extremes of the detected object with spatial uncertainty increasing for pixels further from the centre of the detection. Full details about how PBoxes are implemented can be found in the following paper http://arxiv.org/abs/1811.10800. An example of a PBox generated from two Gaussian corners can be seen below.

Example of generating a spatial probability heatmap for a PBox defined by two 2D Gaussian corners.

Please note that, for reasons of speed, the calculation of probability heatmaps after submission is an approximation which contains a small amount of error.

Label Probability Distributions

The "label_probs" value for a detection dictionary is a list of class probabilities for each class outlined in the main "classes" element of the result. The "label_probs" list in each detection must match the "classes" list at the top level, such that the i-th entry in "label_probs" is the probability of the i-th class in the class list, as outlined previously under Classes.

If the sum of the final label probability distribution for a detection (after removing unevaluated class probabilities) exceeds 1.0, the probability distribution shall be re-normalized as part of the evaluation process.

If the sum of the final label probability distribution for a detection (after removing unevaluated class probabilities) is less than 0.5, the detection is not used in evaluation.
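These two rules might be applied as follows (an illustration of the described behaviour, not the evaluation server's actual code):

```python
def prepare_label_probs(probs):
    """Apply the evaluation's handling of label probability sums:
    renormalise if the total exceeds 1.0, and discard the detection
    (return None) if the total is below 0.5."""
    total = sum(probs)
    if total < 0.5:
        return None                        # detection ignored in evaluation
    if total > 1.0:
        return [p / total for p in probs]  # renormalised to sum to 1.0
    return probs
```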

 

Troubleshooting Messages

The standard output within this challenge has several types of error messaging to help with debugging submissions. What follows are the different error and warning messages that can be raised and what information they provide.

Error Messages:

If any of these messages is raised, your code will be stopped and you will not receive a score for your submission.

The following sequences do not have any detections submitted

If this error has occurred, no detections were provided for one or more sequences, meaning you have not provided a json results file for those sequences. The sequences with no detections provided are listed after the message.

more than one json file found for sequence

If this error has occurred, the submission has provided more than one json file for the given sequence. In the preamble you will be provided the name of the sequence with duplicate .json files and after the message you will be provided with the locations where the duplicate detections exist.

The number of detections does not match the number of images ...

If this error has occurred, the number of detection entries does not match the number of images in the given sequence. The list of detections for a sequence must have an entry for every image, even if that entry is an empty list (no detections made for that image). The message will include which detections file is affected, how many entries were provided, and how many entries should have been supplied.

Missing key 'classes'

If this error has occurred, a sequence's json results file was found not to contain the key 'classes'. Most likely, this means you have not supplied the class list that corresponds to your probability distributions for your results for that given sequence. Which sequence has caused the error will be supplied in the error message preamble.

classes does not contain any recognized classes

If this error has occurred, the .json file for the given sequence contains none of the recognised classes. The sequence which caused the error will be supplied in the error message preamble.

Missing key 'detections'

If this error has occurred, a sequence's json results file was found not to contain the key 'detections'. Most likely, this means you have not supplied the list of detections within your results for that given sequence. Which sequence has caused the error will be supplied in the error message preamble.

missing key 'label_probs'

If this error has occurred it means that a given detection does not contain the key 'label_probs'. Most likely this means that you have not supplied a given detection with a probability distribution. In the preamble you will be provided with the sequence name, image index, and detection index where this error has occurred.

The number of class probabilities doesn't match the number of classes

If this error has occurred, the label probability distribution provided by a given detection does not match the size of the class list provided with the sequence's json results file. All label probability distributions need to be a full probability distribution over all classes which the detector may output. In the preamble you will be provided with the sequence name, image index, and detection index where this error has occurred.

missing key 'bbox'

If this error has occurred, it means that a given detection does not contain the key 'bbox'. Most likely this means you have not provided the bounding box corner locations for the given detection. In the preamble you will be provided with the sequence name, image index, and detection index where this error has occurred.

The bounding box must contain exactly 4 entries

If this error has occurred, a given detection's bounding box has been provided in an invalid format. The list of bounding box values must contain exactly four elements [x1, y1, x2, y2] denoting the upper-left and lower-right corners of the bounding box. In the preamble you will be provided with the sequence name, image index, and detection index where this error has occurred.

The x1 coordinate must be less than the x2 coordinate

If this error has occurred, a given detection's bounding box's x1 (left) coordinate was bigger than the box's x2 (right) coordinate. x1 must always be less than or equal to x2, as we use a coordinate frame where (0,0) is the upper-left corner of the image, so the leftmost coordinate must be the smaller number. If the two are equal, you have a box with a width of one pixel. In the preamble you will be provided with the sequence name, image index, and detection index where this error has occurred.

The y1 coordinate must be less than the y2 coordinate

If this error has occurred, a given detection's bounding box's y1 (upper) coordinate was bigger than the box's y2 (lower) coordinate. y1 must always be less than or equal to y2 as we use a coordinate frame where (0,0) is the upper-left corner of the image. Therefore the uppermost coordinate must be a smaller number than the lowermost one. If the two are equal, you have a box with a height of one pixel. In the preamble you will be provided with the sequence name, image index, and detection index where the error has occurred.

Key 'covars' must contain 2 2x2 matrices

If this error has occurred, the covariances you have supplied for a given detection are invalid. If a detection has a 'covars' key, it must supply two covariance matrices: one for the upper-left corner of the PBox and one for the lower-right corner. The content of 'covars' for a given detection must have shape (2, 2, 2). In the preamble you will be provided with the sequence name, image index, and detection index where this error has occurred.

Given covariances are not symmetric

If this error has occurred, at least one of the covariances provided for a given detection is invalid. All covariance matrices provided for generating PBoxes must be symmetric. In the preamble you will be provided with the sequence name, image index, and detection index where this error has occurred.

The upper-left covariance is not positive semi-definite

If this error has occurred, the upper-left covariance provided for a given detection was invalid. All covariance matrices provided for generating PBoxes must be positive semi-definite. In the preamble you will be provided with the sequence name, image index, and detection index where this error has occurred.

The lower-right covariance is not positive semi-definite

If this error has occurred, the lower-right covariance provided for a given detection was invalid. All covariance matrices provided for generating PBoxes must be positive semi-definite. In the preamble you will be provided with the sequence name, image index, and detection index where this error has occurred.

Warning Messages:

If any of these messages is raised, your code will still run and finish, but there may be a condition you should be made aware of that might affect your performance.

The submission has data for the following extra sequences which is being ignored

If this warning occurs, more results have been provided than there are sequences to be evaluated. This shall not cause an error, but any extra results provided will not be evaluated. The names of the sequences that will not be evaluated, as they do not correspond to any ground-truth sequences, are listed after the message.

Contact Us

If you have any major issues or queries beyond what is covered on this website, or want to find out what we have in store, feel free to contact us.

Website: roboticvisionchallenge.org

Twitter: @robVisChallenge

e-mail: contact@roboticvisionchallenge.org