The 4th Look Into Person (LIP) Challenge - Track 2 Video Multi-Person Human Parsing Challenge

Organized by ZhenyuXie


Start: Feb. 20, 2020, midnight UTC

Competition Ends: Sept. 30, 2021, midnight UTC


This track addresses video instance-level human parsing.

Data Description

The VIP (Video Instance-level Parsing) dataset, the first video multi-person human parsing benchmark, consists of 404 videos covering various scenarios. In each video, one out of every 25 consecutive frames is densely annotated with pixel-wise semantic part categories and instance-level identification, yielding 21,247 densely annotated images in total. The 404 sequences are divided into 304 train sequences, 50 validation sequences, and 50 test sequences.

You can download the dataset at VIP (OneDrive) or VIP (Baidu Drive).

  • VIP_Fine: all annotated images and fine annotations for the train and val sets.
  • VIP_Sequence: the 20 frames surrounding each VIP_Fine image (-10 | +10).
  • VIP_Videos: all 404 video sequences of the VIP dataset.
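Assuming frame indices are consecutive integers, the -10 | +10 window that VIP_Sequence provides around each annotated frame can be sketched as follows (`sequence_window` is a hypothetical helper for illustration, not part of the dataset tooling, and the actual VIP file naming may differ):

```python
def sequence_window(annotated_idx, radius=10):
    """Indices of the frames from -radius to +radius around an annotated
    frame (inclusive of the annotated frame itself), clamped at 0 so the
    window stays valid near the start of a video."""
    return [i for i in range(annotated_idx - radius, annotated_idx + radius + 1)
            if i >= 0]

print(sequence_window(25))  # frames 15..35 around the annotated frame 25
```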

Class Definition

Class ID   Class Name
0          Background
1          Hat
2          Hair
3          Glove
4          Sunglasses
5          Upper-clothes
6          Dress
7          Coat
8          Socks
9          Pants
10         Torso-skin
11         Scarf
12         Skirt
13         Face
14         Left-arm
15         Right-arm
16         Left-leg
17         Right-leg
18         Left-shoe
19         Right-shoe
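For scripting convenience, the class table above maps directly to a lookup dictionary (a minimal sketch; the name `VIP_CLASSES` is ours, not an official identifier):

```python
# Class ID -> class name, taken from the class definition table.
VIP_CLASSES = {
    0: "Background", 1: "Hat", 2: "Hair", 3: "Glove", 4: "Sunglasses",
    5: "Upper-clothes", 6: "Dress", 7: "Coat", 8: "Socks", 9: "Pants",
    10: "Torso-skin", 11: "Scarf", 12: "Skirt", 13: "Face",
    14: "Left-arm", 15: "Right-arm", 16: "Left-leg", 17: "Right-leg",
    18: "Left-shoe", 19: "Right-shoe",
}

print(VIP_CLASSES[10])  # Torso-skin
```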

Dataset Examples

Evaluation Metrics

For video instance-level human parsing, we use three metrics for multi-human parsing evaluation. The final score is the average of these three metrics.

  • Mean IoU (%) for semantic part segmentation, as introduced in the FCN paper.
  • Following the Mask R-CNN paper, we use the mean of the mean Average Precision (mAP) values at IoU thresholds from 0.5 to 0.95 to evaluate human instance segmentation, referred to as APr.
  • APrvol for instance-level human parsing, as introduced in "Holistic, Instance-Level Human Parsing".
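As a sketch of the first metric, per-class IoU between two label maps can be averaged as below, along with the IoU threshold grid used by APr (`mean_iou` is an illustrative helper; the official evaluation script may differ, e.g. in how classes absent from both maps are handled):

```python
import numpy as np

def mean_iou(pred, gt, num_classes=20):
    """Mean intersection-over-union between two integer label maps,
    averaged over classes present in at least one of the maps."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both maps; skip it
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))

# APr averages mAP over ten IoU thresholds: 0.50, 0.55, ..., 0.95.
THRESHOLDS = [0.5 + 0.05 * i for i in range(10)]
```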

Submission Format

The results should be packed into a single zip file; an example zip file is available for reference.

Specifically, the zip file contains 50 sub-folders, one per test-set video. Each video folder contains three sub-folders:

A folder of PNG images, named "global_parsing". The content of id.png is the global (instance-agnostic) human parsing result for the corresponding image, at exactly the same size.

A folder named "instance_parsing", consisting of two things:
1) An indexed PNG image with the segmentation, where each index corresponds to a unique part instance. 0 is always assumed to be the background label.
2) A text file. Each line is of the format < class_id score >. The first line of this file corresponds to index 1 in the indexed PNG, the second line to index 2, and so on.

A folder named "instance_segmentation", consisting of two things:
1) The content of id.png is the instance segmentation index image, at exactly the same size. Each human instance has a unique human index, and 0 is always assumed to be the background label.
2) A text file id.txt, one line per instance. The first line of this file corresponds to human instance index 1 in the instance segmentation index image, the second line to index 2, and so on.
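The folder layout described above can be assembled as in the following sketch (names such as "videos1" and "0001" are placeholders; the real folder and file names must match the provided example zip):

```python
import os
import zipfile

def write_instance_txt(path, detections):
    """Write an instance_parsing text file: line k (1-based) gives
    '<class_id> <score>' for the region labeled k in the indexed PNG."""
    with open(path, "w") as f:
        for class_id, score in detections:
            f.write(f"{class_id} {score}\n")

def build_submission(root, video_ids):
    """Create one sub-folder per test video, each containing the three
    required result folders."""
    for vid in video_ids:
        for sub in ("global_parsing", "instance_parsing", "instance_segmentation"):
            os.makedirs(os.path.join(root, vid, sub), exist_ok=True)

def zip_submission(root, out_zip):
    """Pack everything under root into a single zip, with paths stored
    relative to root as the submission format requires."""
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _, files in os.walk(root):
            for name in files:
                full = os.path.join(dirpath, name)
                zf.write(full, os.path.relpath(full, root))
```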

After uploading your results, please wait about 2 hours and refresh the page to see your scores.

Terms and Conditions

General Rules

  • Each entry must be associated with a team and its affiliation.
  • Using multiple accounts to increase the number of submissions is strictly prohibited.
  • Results must follow the correct format and be uploaded to the evaluation server through the CodaLab competition site. Detailed information about how results are evaluated is presented on the evaluation page.
  • The best entry of each team will be public on the leaderboard at all times.
  • The organizer reserves the absolute right to disqualify entries that are incomplete or illegible, late entries, or entries that violate the rules.


The datasets are released for academic research only and are free to researchers from educational or research institutions for non-commercial purposes. By downloading the dataset you agree not to reproduce, duplicate, copy, sell, trade, resell, or exploit for any commercial purpose any portion of the images or any portion of the derived data.

Contact Us

For more information, please contact the organizers.


Leaderboard

#   Username            Score
1   Tencent_YouTu_Lab   71.10
2   soeaver             70.53
3   PAT_CV_HUMAN        67.02