CodaLab - Competition

ICCV DeeperAction Challenge - MultiSports Track on Spatiotemporal Action Detection (Test Version)

Organized by yixuanli

First phase

Development

June 1, 2021, midnight UTC

End

Competition Ends

Sept. 12, 2021, 11:59 p.m. UTC


MultiSports Track on Spatio-Temporal Action Detection

Welcome to the ICCV DeeperAction Challenge - MultiSports Track on Spatio-Temporal Action Detection.

Description

This challenge is Track 2 of the ICCV DeeperAction Challenge. It addresses spatio-temporal action localization in untrimmed videos and is carried out on the MultiSports dataset. More information on the dataset can be found on the MultiSports page.

Goal

Given an untrimmed video, the goal is spatio-temporal action detection: participants should find the frames that contain actions and localize where in those frames the actions occur.

Evaluation Criteria

Following standard practice [1,2], we use frame-mAP and video-mAP to evaluate action localization performance. For video-mAP, we use the 3D IoU, defined as the temporal IoU of two tracks multiplied by the average spatial IoU over their overlapping frames. The IoU threshold is 0.5 for frame-mAP. For video-mAP, we report the single thresholds 0.2 and 0.5 and the averaged ranges 0.05:0.45, 0.5:0.95, and 0.1:0.9. V@0.05:0.45 is the average of V@0.05 through V@0.45 in steps of 0.05. V@0.5:0.95 is the average of V@0.5 through V@0.95 in steps of 0.05. V@0.1:0.9 is the average of V@0.1 through V@0.9 in steps of 0.1. Submissions are ranked by V@0.1:0.9.
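The 3D IoU and the averaged thresholds described above can be sketched as follows. This is only an illustration under stated assumptions, not the official evaluation code: tubes are represented here as dictionaries mapping a frame number to a box (x1, y1, x2, y2), the temporal IoU is computed over sets of frame numbers, and box areas use plain continuous coordinates. The function names are made up for this example, and the official code may use different conventions.

```python
def box_iou(a, b):
    """Spatial IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou3d(tube_a, tube_b):
    """3D IoU: temporal IoU times the mean spatial IoU on shared frames.

    Each tube is a dict {frame_number: (x1, y1, x2, y2)} (assumed layout).
    """
    common = set(tube_a) & set(tube_b)
    if not common:
        return 0.0
    temporal_iou = len(common) / len(set(tube_a) | set(tube_b))
    mean_spatial_iou = sum(box_iou(tube_a[f], tube_b[f]) for f in common) / len(common)
    return temporal_iou * mean_spatial_iou


def threshold_range(start, stop, step):
    """Inclusive list of thresholds, e.g. 0.1, 0.2, ..., 0.9."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(count)]


def averaged_video_map(video_map_at, thresholds):
    """Average a per-threshold video-mAP callable over the given thresholds."""
    return sum(video_map_at(t) for t in thresholds) / len(thresholds)


# Thresholds behind the ranking metric V@0.1:0.9.
RANKING_THRESHOLDS = threshold_range(0.1, 0.9, 0.1)
```

Here video_map_at stands for any callable that returns the video-mAP at a single 3D IoU threshold; computing that value is left to the evaluation code.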

Following AVA [3], we evaluate on the 60 classes that have at least 25 instances in the validation and test splits. The excluded classes are 'aerobic kick jump', 'aerobic off axis jump', 'aerobic butterfly jump', 'aerobic balance turn', 'basketball save', and 'basketball jump ball'.
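As a small illustration of the class filtering, the sketch below selects the label indices that remain once the six excluded classes are removed. The helper name and the idea of passing an ordered list of class names are assumptions made for this example. The real class list and its ordering come from the dataset files, and this sketch does not settle how label_index in a submission relates to the filtered list.

```python
# Classes excluded from evaluation, as listed above.
EXCLUDED_CLASSES = {
    "aerobic kick jump",
    "aerobic off axis jump",
    "aerobic butterfly jump",
    "aerobic balance turn",
    "basketball save",
    "basketball jump ball",
}


def evaluated_label_indices(class_names):
    """Return the indices of classes that are not excluded from evaluation.

    class_names is assumed to be the ordered list of class names from the
    dataset annotation (hypothetical input for this sketch).
    """
    return [i for i, name in enumerate(class_names) if name not in EXCLUDED_CLASSES]
```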

[1] Learning to track for spatio-temporal action localization. https://arxiv.org/abs/1506.01929

[2] Action Tubelet Detector for Spatio-Temporal Action Localization. https://arxiv.org/abs/1705.01861

[3] AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions. https://arxiv.org/abs/1705.08421

Terms and Conditions

You agree to us storing your submission results for evaluation purposes.
You agree that if you place in the top-10 at the end of the challenge you will submit your code so that we can verify that you have not cheated.
You agree not to distribute the DeeperAction ICCV2021 MultiSports dataset without prior written permission.

Submissions

To submit, upload a .zip file containing the frame_detections.pkl and video_detections.pkl files. We calculate the frame mAP from frame_detections.pkl and the video mAP from video_detections.pkl. Please save the pkl files with protocol=2, e.g. pickle.dump(result, result_file, protocol=2).
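A minimal sketch of packaging a submission is shown below. It assumes that both files should sit at the top level of the archive, which this page does not state explicitly. The function name write_submission is made up for the example.

```python
import pickle
import zipfile


def write_submission(frame_detections, video_detections, zip_path):
    """Pickle both result objects with protocol 2 and store them in one .zip.

    Assumption: the two .pkl files are placed at the root of the archive.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(
            "frame_detections.pkl", pickle.dumps(frame_detections, protocol=2)
        )
        archive.writestr(
            "video_detections.pkl", pickle.dumps(video_detections, protocol=2)
        )
```

pickle.dumps(obj, protocol=2) produces the same bytes that pickle.dump(obj, file, protocol=2) would write, so either form satisfies the protocol requirement.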

frame_detections.pkl is a list. Every item is a numpy array of shape (8,) holding <video_index><frame_number><label_index><score><x1><y1><x2><y2>. video_index is the 0-based index of the video in multisports_half_test['test_videos'][0] or multisports_test['test_videos'][0]. For example, index 0 is 'aerobic_gymnastics/v_E32m4g8M1lo_c045' in multisports_half_test['test_videos'][0] and 'aerobic_gymnastics/v_LLCYT3XTyMU_c014' in multisports_test['test_videos'][0]. frame_number starts from 1 and label_index starts from 0. score is the confidence of the box and affects the frame-mAP result.
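The sketch below builds one frame_detections entry in the layout described above. The helper name, the float64 dtype, and the one-element test_videos list are assumptions for illustration. In practice the video list would be read from multisports_half_test['test_videos'][0] or multisports_test['test_videos'][0], and the box values here are arbitrary example numbers.

```python
import numpy as np


def make_frame_detection(video_index, frame_number, label_index, score, box):
    """Return one detection as an array of shape (8,).

    Layout: video_index, frame_number, label_index, score, x1, y1, x2, y2.
    The dtype (float64) is an assumption; the page only fixes the shape.
    """
    if video_index < 0 or label_index < 0:
        raise ValueError("video_index and label_index start from 0")
    if frame_number < 1:
        raise ValueError("frame_number starts from 1")
    x1, y1, x2, y2 = box
    return np.array(
        [video_index, frame_number, label_index, score, x1, y1, x2, y2],
        dtype=np.float64,
    )


# Hypothetical stand-in for multisports_test['test_videos'][0].
test_videos = ["aerobic_gymnastics/v_LLCYT3XTyMU_c014"]
video_index = test_videos.index("aerobic_gymnastics/v_LLCYT3XTyMU_c014")

frame_detections = [
    make_frame_detection(video_index, 1, 0, 0.9, (10.0, 20.0, 110.0, 220.0)),
]
```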

video_detections.pkl is a dictionary that maps each label index (starting from 0) to a list of tubes. A tube is a tuple (tube_v, tube_score, tube_boxes). tube_v is the video name, such as 'aerobic_gymnastics/v_2KroSzspz-c_c024'. tube_score is the score of the tube and affects the video-mAP result. tube_boxes is a numpy array with one row per frame of the tube and 6 columns: <frame number> <x1> <y1> <x2> <y2> <box_score>. frame number starts from 1. box_score is the confidence of the box in that single frame and does not affect the video-mAP result.
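The tube layout can be sketched as follows. The helper name make_tube, the float64 dtype, and the numeric values are illustrative assumptions. Only the tuple structure and the six-column array follow the description above.

```python
import numpy as np


def make_tube(video_name, tube_score, boxes):
    """Build a (tube_v, tube_score, tube_boxes) tuple.

    boxes is a sequence of rows (frame_number, x1, y1, x2, y2, box_score),
    one row per frame of the tube.
    """
    tube_boxes = np.asarray(boxes, dtype=np.float64)
    if tube_boxes.ndim != 2 or tube_boxes.shape[1] != 6:
        raise ValueError("tube_boxes needs one 6-column row per frame")
    if tube_boxes[:, 0].min() < 1:
        raise ValueError("frame numbers start from 1")
    return (video_name, float(tube_score), tube_boxes)


# Dictionary from label index (starting from 0) to a list of tubes.
video_detections = {}
tube = make_tube(
    "aerobic_gymnastics/v_2KroSzspz-c_c024",
    0.8,
    [(1, 10.0, 20.0, 110.0, 220.0, 0.9), (2, 12.0, 21.0, 112.0, 222.0, 0.85)],
)
video_detections.setdefault(0, []).append(tube)
```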

The evaluation code and an example submission file are available at https://github.com/MCG-NJU/MultiSports.

Development

Start: June 1, 2021, midnight

Description: The Development Leaderboard is based on a fixed random subset of 50% of the test dataset. To submit, upload a .zip file containing the frame_detections.pkl and video_detections.pkl files.

Testing

Start: Sept. 1, 2021, midnight

Description: The Test Leaderboard is based on the whole test dataset. To submit, upload a .zip file containing the frame_detections.pkl and video_detections.pkl files. The file with the best V@0.1:0.9 will be used to determine the winner.

Competition Ends

Sept. 12, 2021, 11:59 p.m.
