EPIC-KITCHENS-100 Action Detection

Organized by antoninofurnari


CVPR 2021 Challenge
Aug. 5, 2020, midnight UTC


Competition Ends
May 28, 2021, 11:59 p.m. UTC

EPIC-KITCHENS-100 Action Detection Challenge

Welcome to the EPIC-KITCHENS-100 Action Detection Challenge.


The challenge requires detecting and recognising all action instances within an untrimmed video. It is carried out on the EPIC-KITCHENS-100 dataset. More information on the dataset and downloads can be found at https://epic-kitchens.github.io/2020-100.


Given a video, we aim to predict the set of all action instances $\{A_i\}_{i=1}^{M}$, where $A_i = (t_s, t_e, v, n, a)$. Here $t_s$ and $t_e$ are the start and end times of the action, and $v$, $n$, $a$ are the predicted verb, noun and action classes.

For further details about the challenge, please see Sec. 4.3 of [1].
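
As a minimal sketch of the tuple defined above, one action instance could be represented in Python as follows. The class name `ActionInstance` and its field names are illustrative assumptions and are not part of any challenge toolkit.

```python
from typing import NamedTuple


class ActionInstance(NamedTuple):
    """One predicted action A_i = (t_s, t_e, v, n, a); names are illustrative."""
    t_start: float  # start time t_s, in seconds
    t_end: float    # end time t_e, in seconds
    verb: int       # predicted verb class v
    noun: int       # predicted noun class n
    action: str     # predicted action class a, written as "<verb_class>,<noun_class>"


# Placeholder values, not real predictions.
instance = ActionInstance(t_start=6.13, t_end=9.20, verb=1, noun=34, action="1,34")
duration = instance.t_end - instance.t_start  # length of the segment in seconds
```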

Dataset details

EPIC-KITCHENS-100 is an unscripted egocentric action dataset collected from 45 kitchens from 4 cities across the world.

  • 100 hours of video
  • 20M frames
  • Full HD
  • 90k action segments
  • 20k unique narrations
  • 97 verb classes, 300 noun classes


[1] Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Vangelis Kazakos, Davide Moltisanti, Jonathan Munro, Will Price, Michael Wray. Rescaling Egocentric Vision. arXiv, 2020.

Evaluation Criteria

Submissions are evaluated on the test set. We report mean Average Precision (mAP) for verbs, nouns and actions at five IoU thresholds (0.1, 0.2, 0.3, 0.4, 0.5), together with the average mAP across these thresholds, on the overall test set.

We consider mAP as implemented in [1]. Methods are ranked by average action mAP.
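
The IoU thresholds above presumably refer to the temporal overlap between a predicted segment and a ground-truth segment. The following is a minimal sketch of that overlap measure only. It is not the evaluation code referenced in [1]; the names `temporal_iou` and `hits_at_thresholds` are hypothetical, and counting IoU >= threshold as a match is an assumption.

```python
from typing import Dict, Sequence, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds

IOU_THRESHOLDS = (0.1, 0.2, 0.3, 0.4, 0.5)


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def hits_at_thresholds(pred: Segment, gt: Segment,
                       thresholds: Sequence[float] = IOU_THRESHOLDS) -> Dict[float, bool]:
    """For each threshold, report whether the overlap is large enough to match."""
    iou = temporal_iou(pred, gt)
    return {t: iou >= t for t in thresholds}


# A prediction covering 6-10 s against a ground truth covering 8-12 s:
# intersection 2 s, union 6 s, so the IoU is 1/3.
hits = hits_at_thresholds((6.0, 10.0), (8.0, 12.0))
```

With these illustrative segments the prediction would count as a match at thresholds 0.1 to 0.3 but not at 0.4 or 0.5, which is why a single detection can contribute differently to the mAP reported at each threshold.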


Terms and Conditions

  • You agree to us storing your submission results for evaluation purposes.
  • You agree that if you place in the top-10 at the end of the challenge you will submit your code so that we can verify that you have not cheated.
  • You agree not to distribute the EPIC-KITCHENS-100 dataset without prior written permission.


To submit your results to the leaderboard, you must construct a submission zip file containing a single file, test.json, with the model’s results on the test set. This file should follow the format detailed in the next section.

JSON Submission Format

The JSON submission format is composed of a single JSON object containing entries for every detected action in all the videos of the test set. Specifically, the JSON file should contain:

  • a 'version' property, set to '0.2'
  • a 'challenge' property, set to 'action_detection'
  • a set of sls properties (see the Supervision Levels Scale (SLS) page for more details):
    • sls_pt: SLS Pretraining level.
    • sls_tl: SLS Training Labels level.
    • sls_td: SLS Training Data level.
  • a 'results' object containing entries for every video in the test set (e.g., 'P01_101').

Each video entry is a list of objects describing each detected action. Each of these objects should contain:

  • a 'verb' property, reporting the detected verb class (e.g., 1).
  • a 'noun' property, reporting the detected noun class (e.g., 34).
  • an 'action' property, reporting the detected action class in the format '<verb_class>,<noun_class>' (e.g., '1,34'). The predicted action may differ from the pair of predicted verb and noun classes.
  • a 'score' property, reporting the confidence score of the prediction (e.g., 0.78).
  • a 'segment' property, which is a list containing the starting and ending timestamps of the detected action in seconds (e.g., [6.13, 9.20]).

An example of the overall structure (segment timestamps are elided as [...]):

  {
    "version": "0.2",
    "challenge": "action_detection",
    "sls_pt": -1,
    "sls_tl": -1,
    "sls_td": -1,
    "results": {
      "P26_122": [
        {
          "verb": 0,
          "noun": 16,
          "action": "0,16",
          "score": 0.7398802638053894,
          "segment": [...]
        },
        {
          "verb": 0,
          "noun": 58,
          "action": "0,58",
          "score": 0.0001102862200564619,
          "segment": [...]
        }
      ],
      "P36_102": [
        {
          "verb": 9,
          "noun": 27,
          "action": "9,27",
          "score": 0.8049795031547546,
          "segment": [...]
        },
        {
          "verb": 17,
          "noun": 65,
          "action": "17,65",
          "score": 0.0006565209107522163,
          "segment": [...]
        }
      ]
    }
  }

You can provide scores and timestamps in any float format that numpy is capable of reading (i.e. you do not need to stick to a given number of decimal places).
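
A file in this layout could be assembled in Python roughly as follows. This is a sketch under assumptions: the helper names `make_detection` and `build_submission` are hypothetical, the detection values are placeholders, and the sls values of -1 are copied from the example rather than chosen for any particular method. For simplicity the helper derives the action from the verb and noun pair, although the format allows the predicted action to differ from that pair.

```python
import json
from typing import Dict, List


def make_detection(verb: int, noun: int, score: float,
                   start: float, end: float) -> dict:
    """Build one detection entry; the action is formed from the verb/noun pair here."""
    return {
        "verb": verb,
        "noun": noun,
        "action": f"{verb},{noun}",
        "score": float(score),
        "segment": [float(start), float(end)],  # timestamps in seconds
    }


def build_submission(results: Dict[str, List[dict]],
                     sls_pt: int, sls_tl: int, sls_td: int) -> dict:
    """Wrap per-video detection lists in the top-level submission object."""
    return {
        "version": "0.2",
        "challenge": "action_detection",
        "sls_pt": sls_pt,
        "sls_tl": sls_tl,
        "sls_td": sls_td,
        "results": results,
    }


# Placeholder detection for a single video; a real submission needs an
# entry for every video in the test set.
results = {"P01_101": [make_detection(1, 34, 0.78, 6.13, 9.20)]}
submission = build_submission(results, sls_pt=-1, sls_tl=-1, sls_td=-1)

with open("test.json", "w") as f:
    json.dump(submission, f)
```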

Submission archive

To upload your results to CodaLab you have to zip the test file into a flat zip archive (it can’t be inside a folder within the archive).

You can create a flat archive using the following command, provided the JSON file is in your current directory.

$ zip -j my-submission.zip test.json
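
The same flat archive can also be produced from Python with the standard-library `zipfile` module. This is a minimal sketch, assuming test.json may live in another directory; the function name `make_flat_zip` is hypothetical.

```python
import os
import zipfile


def make_flat_zip(json_path: str, zip_path: str = "my-submission.zip") -> None:
    """Write json_path into zip_path at the archive root, dropping any directories."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname keeps only the file name, mirroring what `zip -j` does.
        zf.write(json_path, arcname=os.path.basename(json_path))
```

Calling `make_flat_zip("outputs/test.json")`, where `outputs/` is an illustrative path, would store the file as `test.json` at the top level of the archive. Listing the result with `zipfile.ZipFile("my-submission.zip").namelist()` is a quick way to confirm that no folder prefix was included.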

