The 4th Look Into Person (LIP) Challenge - Track 4 Video Virtual Try-on Challenge

Organized by ZhenyuXie

First phase: Feb. 20, 2020, midnight UTC

Competition ends: Sept. 30, 2020, midnight UTC


This task targets video virtual try-on. Specifically, given a clothing image, a target person image, and a pose sequence, participants are asked to design algorithms that transfer the desired clothing onto the person to produce a high-quality video sequence, while preserving the person's identity, the clothing's texture details, and the temporal coherence of the synthesized video.

Data Description

We constructed a new video dataset, VVT, for the video virtual try-on task. It contains 791 catwalk videos of fashion models, split into a training set of 661 videos and a testing set of 130 videos, with 160,492 and 31,191 frames respectively. We also crawled 791 person images and 791 clothes images and associated every video with a person image and a clothes image. A sample in the dataset therefore consists of a video, a person image, and a clothes image.

You can download the dataset at VVT (Google Drive) or VVT (Baidu Drive). The Baidu Drive extraction code is ddoy.

Specifically, the dataset contains the following folders:

  • lip_train_frames contains 661 sub-folders, one per training video. Each sub-folder contains the frames extracted from the corresponding video.
  • lip_train_frames_keypoints contains the pose sequences of the training set, estimated by OpenPose.
  • lip_clothes_person contains the person image and the clothing image associated with each video. Note that the person image is selected from the video frames.
  • lip_test_frames_keypoints contains the pose sequences of the testing set, estimated by OpenPose. Participants are required to synthesize the videos from these pose sequences.
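
OpenPose typically stores per-frame keypoints as JSON. As a minimal sketch (assuming the standard OpenPose JSON layout with a `people` array and a flat `pose_keypoints_2d` list; the exact key names in the released files may differ), the keypoint files could be parsed like this:

```python
import json

def load_openpose_frame(json_path):
    """Parse one OpenPose keypoint file into a list of (x, y, confidence)
    triples per detected person.  Assumes the standard OpenPose output:
    {"people": [{"pose_keypoints_2d": [x0, y0, c0, x1, y1, c1, ...]}]}."""
    with open(json_path) as f:
        data = json.load(f)
    people = []
    for person in data.get("people", []):
        flat = person.get("pose_keypoints_2d", [])
        # OpenPose flattens keypoints as [x, y, c, x, y, c, ...];
        # regroup them into (x, y, confidence) triples.
        triples = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
        people.append(triples)
    return people
```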

In the Final phase, we provide person-clothes-pose tuples. Each tuple has the format "video1 video2 video3", where each entry is a video name. Given such a tuple, participants must use the person image associated with video1, the clothing image associated with video2, and the pose sequence from video3 to synthesize the virtual try-on video. You can download the tuple file through Google Drive or Baidu Drive. The Baidu Drive extraction code is a448.
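
Given the "video1 video2 video3" line format described above, the tuple file can be read with a short helper (a sketch, assuming one whitespace-separated tuple per line; `load_tuples` is a hypothetical name, not part of the challenge kit):

```python
def load_tuples(path):
    """Read the Final-phase tuple file.  Each non-empty line is assumed to
    hold three whitespace-separated video names, interpreted as
    (person video, clothes video, pose video)."""
    tuples = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 3:
                tuples.append(tuple(parts))
    return tuples
```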

For more details about the video virtual try-on algorithm and VVT, please refer to FW-GAN (ICCV 2019).

Evaluation Metrics

For video virtual try-on, we use two evaluation metrics: SSIM for the Development phase and AMT for the Final phase.

Specifically, during the Development phase, we measure the SSIM score between the synthesized frames and the ground-truth frames in the testing set. The due date for the Development phase is May 20, 2020, 00:00 UTC. The top 10 participants will be invited to attend the Final phase.
During the Final phase, we shuffle the testing set so that the person image differs from the person in the associated video, and the clothing image differs from the clothing in the associated person image. We then provide the shuffled person-clothes-pose tuples, and participants are asked to synthesize virtual try-on videos according to these tuples. The top 10 participants of the Development phase will take part in the AMT evaluation; they must upload their new results for AMT evaluation before June 6, 2020, 00:00 UTC. We will announce the AMT scores before June 12, 2020.
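
To illustrate the SSIM formula used in the Development phase, here is a simplified single-window (global) variant in NumPy. The official evaluation presumably uses a sliding-window implementation such as `skimage.metrics.structural_similarity`; this sketch only shows the underlying formula, and `global_ssim` is a hypothetical helper name:

```python
import numpy as np

def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two equally sized grayscale frames.
    Uses the standard constants C1 = (0.01*L)^2 and C2 = (0.03*L)^2."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    # Luminance/contrast/structure terms combined into the usual ratio.
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```

An identical pair of frames scores exactly 1.0; any distortion lowers the score.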

Submission Format

A folder (click to download a template file) contains your synthesized virtual try-on video frames in .png format. The number of synthesized videos, and the number of synthesized frames in each video folder, should match our testing set. Verify these counts, then package the folder in zip format. Submit your zip file and wait to see your rank.
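
The count checks and zipping described above can be sketched as follows. This is only an illustration: `package_submission` is a hypothetical helper, the folder layout (one sub-folder of .png frames per test video) follows the description above, and the expected per-video frame counts would come from the testing set:

```python
import os
import zipfile

def package_submission(result_dir, expected_counts, zip_path="submission.zip"):
    """Check that every test video folder holds the right number of .png
    frames, then zip the whole result directory for upload."""
    for video, n_frames in expected_counts.items():
        folder = os.path.join(result_dir, video)
        pngs = [f for f in os.listdir(folder) if f.endswith(".png")]
        assert len(pngs) == n_frames, f"{video}: {len(pngs)} != {n_frames}"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _, files in os.walk(result_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to the parent of result_dir so the
                # archive keeps the top-level folder.
                zf.write(full, os.path.relpath(full, os.path.dirname(result_dir)))
    return zip_path
```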

Terms and Conditions

General Rules

  • Each entry is required to be associated with a team and its affiliation.
  • Using multiple accounts to increase the number of submissions is strictly prohibited.
  • Results should follow the correct format and must be uploaded to the evaluation server through the CodaLab competition site. Detailed information about how results are evaluated is presented on the evaluation page.
  • The best entry of each team will be public on the leaderboard at all times.
  • The organizer reserves the absolute right to disqualify entries that are incomplete or illegible, late entries, or entries that violate the rules.


  • The datasets are released for academic research only and are free to researchers from educational or research institutions for non-commercial purposes. By downloading the dataset, you agree not to reproduce, duplicate, copy, sell, trade, resell, or exploit for any commercial purpose any portion of the images or any portion of the derived data.
  • Please cite the following paper if you use the VVT dataset.

Haoye Dong, Xiaodan Liang, Xiaohui Shen, Bowen Wu, Bing-Cheng Chen, and Jian Yin. FW-GAN: Flow-navigated warping GAN for video virtual try-on. In ICCV, pages 9206–9035, 2019.

Contact Us

For more information, please contact us.

