The 2nd Remote Physiological Signal Sensing (RePSS) Challenge & Workshop Associated with ICCV 2021

Organized by sunhm15

Final test phase starts: June 26, 2021, midnight UTC

Competition ends: July 16, 2021, midnight UTC

Challenge on Remote Physiological Signal Sensing (RePSS)

An introduction to the RePSS series: https://vipl.ict.ac.cn/view_forumchallen.php?id=2

Notice:

1. The workshop schedule can be found under the Workshop tab. We have reserved 15 minutes for each accepted challenge paper for presentation and Q&A.

Presenters will give live presentations and answer questions. Presenters should:

1) prepare presentation slides for a talk of 10–12 minutes.

2) send one copy of the PowerPoint slides (and, if you pre-recorded your presentation as a video, the recorded video) to xiaobai.li@oulu.fi before Oct. 10, 2021. The materials will be made available on the conference platform during the conference, so please do not include any confidential information that might raise privacy issues; e.g., face sample figures should be masked.

2. The competition phase has ended, and the final ranking and results are shown below. Detailed results can also be found at this link.

Challenge overview

Remote measurement of physiological signals from videos is an emerging topic. It draws great interest from both researchers and companies, and the number of published papers grows every year. Despite the thriving research interest, the lack of publicly available benchmark databases and of a fair validation platform is a major issue that hinders further development. Researchers have to make repetitive efforts to self-collect small datasets for testing proposed methods, which makes it difficult to fairly compare and evaluate each method's actual strengths and weaknesses, as self-collected data vary in recording conditions and quality.

This year we organize the 2nd RePSS challenge (for the 1st RePSS, please see this link) in conjunction with an ICCV 2021 workshop. The 2nd RePSS includes the workshop, invited talks, and the challenge itself. There are two parallel challenge tasks (tracks):

1) Track 1: inter-beat-interval (IBI) curve measurement from facial videos. Many previous studies measured only the average heart rate (HR) from facial videos, but in many applications the average HR is not enough: more detailed information, such as heart rate variability (HRV) features, is needed, which requires accurate measurement of the time location of each heartbeat, i.e., the IBI curve. This task requires participants to reconstruct the IBI curve from facial videos, which can then be processed for detailed cardiac activity analysis. Face videos and corresponding BVP/ECG curves will be provided for training.

Figure 1 shows how a peak binary signal and an IBI curve are obtained from a facial video. We first locate the peaks of the remote-PPG signal and calculate the time interval between every two consecutive peaks; the red arrow marks one such interval. We can then plot the IBI curve with the beat index on the x-axis and the IBI values on the y-axis.
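For illustration only (this is not part of the official challenge code), a minimal Python sketch of the conversion described above, assuming a per-frame binary peak signal and a constant frame rate:

```python
import numpy as np

def peaks_to_ibi(peak_binary, fps):
    """Convert a per-frame binary peak signal into an IBI curve.

    peak_binary: 1-D array of 0/1 values, one entry per video frame (1 = heartbeat peak).
    fps: video frame rate in frames per second (assumed constant for this sketch).
    Returns the IBI values in milliseconds, one value per pair of consecutive peaks.
    """
    peak_frames = np.flatnonzero(peak_binary)        # frame indices where peaks occur
    peak_times_ms = peak_frames / fps * 1000.0       # frame index -> time in milliseconds
    return np.diff(peak_times_ms)                    # time interval between consecutive peaks

# Example: a 10-second, 30 fps video with a peak roughly once per second (~60 bpm)
peaks = np.zeros(300, dtype=int)
peaks[np.arange(15, 300, 30)] = 1
ibi_curve = peaks_to_ibi(peaks, fps=30)              # x-axis: beat index, y-axis: IBI in ms
```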

2) Track 2: respiration measurement from facial videos. Teams can use any cue or approach, e.g., color, motion, or frequency-domain analysis, to estimate the respiration frequency from facial videos. Face videos and corresponding breathing curves will be provided for training.
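As one possible illustration of the frequency-domain route mentioned above (a sketch only, not a prescribed baseline; the input signal, sampling rate, and band limits are assumptions), the respiration rate can be estimated by picking the dominant frequency of a breathing-related signal within a plausible breathing band:

```python
import numpy as np

def respiration_rate_bpm(signal, fs, low_hz=0.1, high_hz=0.7):
    """Estimate the respiration rate (breaths per minute) from a 1-D breathing-related signal.

    signal: e.g., an averaged motion or facial color trace (an assumption for this sketch).
    fs: sampling rate of the signal in Hz.
    The dominant FFT frequency within [low_hz, high_hz] (~6-42 breaths/min) is returned.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                  # remove the DC component
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum = np.abs(np.fft.rfft(x))
    band = (freqs >= low_hz) & (freqs <= high_hz)     # restrict to a plausible breathing band
    peak_freq = freqs[band][np.argmax(spectrum[band])]
    return peak_freq * 60.0                           # Hz -> breaths per minute

# Example: a synthetic 0.25 Hz (15 breaths/min) signal sampled at 30 Hz for 60 seconds
t = np.arange(0, 60, 1 / 30)
print(respiration_rate_bpm(np.sin(2 * np.pi * 0.25 * t), fs=30))   # approximately 15
```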

Important Dates (may be adjusted later)

  • Apr. 28    Challenge opens online
  • May 15    Training data release
  • Jun. 10    Team registration deadline
  • Jun. 25    Test data release
  • Jul. 09    Final test submission deadline
  • Jul. 12    Challenge results release
  • Jul. 29    Paper submission deadline
  • Aug. 10    Notification to authors

 

Workshop schedule

The workshop will be held virtually, in conjunction with ICCV 2021, on Saturday, Oct. 16, from 8 a.m. to noon (Eastern time).

8:00 – 8:10 opening (Prof. Hu Han)

8:10 – 9:00 invited talk 1 + discussion (Dr. Daniel McDuff)

Seeing Inside Out: Camera-based Physiological Sensing with Applications in Telehealth and Wellbeing [video]
Abstract: The growing need for technology that supports remote healthcare has been acutely highlighted by the SARS-CoV-2 (COVID-19) pandemic. A specific example of how the face of healthcare is transforming is in the number of medical appointments held via teleconference, which has increased by more than an order of magnitude because of stay-at-home orders and greater burdens on healthcare systems. Experts suggest that particular attention should be given to cardiovascular and pulmonary protection during treatment of many conditions; however, in most telehealth scenarios physicians lack access to objective measurements of a patient’s condition because of the inability to capture vital signs. I will present work focusing on methods that leverage ordinary ubiquitous sensors (e.g., webcams) to measure physiological signals (e.g., peripheral blood flow, heart rate, respiration, blood oxygenation, A.Fib., blood pressure) without contact with the body. We are developing state-of-the-art, on-device neural models and a synthetic data pipeline to help us learn more robust representations and achieve performance close to that of contact sensors.

9:00 – 9:50 invited talk 2 + discussion (Prof. Stephen Porges)

Polyvagal Theory: Implications for Sensor Development [video]
Abstract: Polyvagal Theory emphasizes the role of the autonomic nervous system as an intervening variable that dynamically mediates behavioral and physiological reactivity to the challenges of life. According to the theory, the ventral vagus provides a unique neural pathway that is capable of instantaneously down-regulating threat reactions and supporting both sociality and the homeostatic processes of health, growth, and restoration. The talk will focus on a signal processing strategy informed by several disparate disciplines (e.g., evolutionary biology, comparative neuroanatomy, physiology, time series analyses, systems theory, and clinical medicine) that have been applied to extract ‘neural metrics.’ An example of this strategy will be provided by describing the quantification of respiratory sinus arrhythmia as a valid neural metric of cardioinhibitory ventral vagal tone. Valid ‘neural’ metrics enable the testing of theory-driven hypotheses related to vagal function with noninvasive and noncontact technologies. The PhysioCam, a non-contact video imaging device, will be described, and data will be presented documenting the potential real-time application of the device in dynamically measuring ventral vagal tone. The talk highlights the potential value of ‘neural’-informed signal processing strategies that lead to smart sensors providing neural metrics.

9:50 – 10:00 coffee break

10:00 – 10:20 challenge summary (Xiaobai Li)

10:20 – 10:25 challenge award (Prof. Hu Han)

10:25 – 10:40 challenge paper presentation 1 + discussion (Ke-Yue Zhang, track 1.3)

An End-to-end Efficient Framework for Remote Physiological Signal Sensing [ppt][video]

10:40 – 10:55 challenge paper presentation 2 + discussion (Yuhang Dong, track 1.1)

Time Lab's approach to the Challenge on Computer Vision for Remote Physiological Measurement [ppt][video]

10:55 – 11:10 challenge paper presentation 3 + discussion (Xuenan Liu, track 1.2 + track 2)

MANet: a Motion-Driven Attention Network for Detecting the Pulse from a Facial Video with Drastic Motions [ppt][video]

11:10 – 11:25 challenge paper presentation 4 + discussion (Jingda Du, track 2)

Weakly Supervised rPPG Estimation for Respiratory Rate Estimation [ppt][video]

11:25 – 11:30 closing (Prof. Hu Han)

 

 

Invited Speakers

Invited speaker 1:

Stephen W. Porges (sporges@indiana.edu), Indiana University, US

https://www.stephenporges.com/ 

Stephen W. Porges, Ph.D., is Distinguished University Scientist at Indiana University, where he is the founding director of the Traumatic Stress Research Consortium. He is Professor of Psychiatry at the University of North Carolina, and Professor Emeritus at both the University of Illinois at Chicago and the University of Maryland. He served as president of the Society for Psychophysiological Research and the Federation of Associations in Behavioral & Brain Sciences and is a former recipient of a National Institute of Mental Health Research Scientist Development Award. He has published more than 300 peer-reviewed papers across several disciplines including anesthesiology, biomedical engineering, critical care medicine, ergonomics, exercise physiology, gerontology, neurology, neuroscience, obstetrics, pediatrics, psychiatry, psychology, psychometrics, space medicine, and substance abuse. In 1994 he proposed the Polyvagal Theory, a theory that links the evolution of the mammalian autonomic nervous system to social behavior and emphasizes the importance of physiological state in the expression of behavioral problems and psychiatric disorders. The theory is leading to innovative treatments based on insights into the mechanisms mediating symptoms observed in several behavioral, psychiatric, and physical disorders. He is the author of The Polyvagal Theory: Neurophysiological Foundations of Emotions, Attachment, Communication, and Self-Regulation (Norton, 2011) and The Pocket Guide to the Polyvagal Theory: The Transformative Power of Feeling Safe (Norton, 2017), and co-editor of Clinical Applications of the Polyvagal Theory: The Emergence of Polyvagal-Informed Therapies (Norton, 2018). He is the creator of a music-based intervention, the Safe and Sound Protocol, which is currently used by more than 1,400 therapists to improve spontaneous social engagement, reduce hearing sensitivities, and improve language processing and state regulation.

Invited speaker 2:

Daniel McDuff (djmcduff@media.mit.edu), Microsoft Research, Redmond 

https://www.microsoft.com/en-us/research/people/damcduff/

Daniel McDuff is a Principal Researcher at Microsoft, where he leads research and development of affective technology. Daniel completed his Ph.D. at the MIT Media Lab in 2014 and has a B.A. and a Master's degree from Cambridge University. Daniel’s work on non-contact physiological measurement helped to popularize a new field of low-cost health monitoring using webcams. Previously, Daniel worked at the UK MoD, was Director of Research at MIT Media Lab spin-out Affectiva, and was a post-doctoral research affiliate at MIT. His work has received nominations and awards from Popular Science magazine as one of the top inventions of 2011, South-by-South-West Interactive (SXSWi), The Webby Awards, ESOMAR, and the Center for Integrated Medicine and Innovative Technology (CIMIT). His projects have been reported in many publications including The Times, the New York Times, The Wall Street Journal, BBC News, New Scientist, Scientific American, and Forbes magazine. Daniel was named a 2015 WIRED Innovation Fellow and an ACM Future of Computing Academy member, and has spoken at TEDx and SXSW. Daniel has published over 100 peer-reviewed papers on machine learning (NeurIPS, ICLR, ICCV, ECCV, ACM TOG), human-computer interaction (CHI, CSCW, IUI), and biomedical engineering (TBME, EMBC).

 

Evaluation

Results Submission

To simplify the evaluation process, participants should submit one peak binary signal for each test video. There are two requirements for the submitted binary signals.

1. The binary signals should contain only zeros and ones. Ones mark the peaks of the remote-PPG signal and zeros mean no peak. Your submitted binary signal should look like the peak binary signal in Fig. 1 (shown in the Data tab).

2. The length of the peak signal should equal the number of video frames. For example, if the video has 300 frames, the peak signal should be a 300-dimensional binary vector.
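A minimal sketch of how a valid peak vector could be built and checked against the two requirements above (the peak frame indices and function names are illustrative, not part of the official submission tools):

```python
import numpy as np

def make_peak_vector(peak_frame_indices, n_frames):
    """Build a binary peak signal that satisfies both submission requirements."""
    peak_vec = np.zeros(n_frames, dtype=int)                    # requirement 2: one entry per frame
    peak_vec[np.asarray(peak_frame_indices, dtype=int)] = 1     # requirement 1: ones at peak frames
    return peak_vec

def check_peak_vector(peak_vec, n_frames):
    assert len(peak_vec) == n_frames, "length must equal the video frame count"
    assert set(np.unique(peak_vec)).issubset({0, 1}), "values must be zeros and ones only"

# Example: a 300-frame (10 s at 30 fps) video with four detected peaks
vec = make_peak_vector([14, 44, 75, 105], n_frames=300)
check_peak_vector(vec, 300)
```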

Evaluation Metrics

We will use the submitted peak binary signals to compute both HR and IBI metrics for evaluation and ranking. The following evaluation metrics [1] will be calculated from your submission.

1. mean of IBI error: M_IBI

For two IBI curves R1(t) and R2(t), the IBI error is computed between the two curves over their time length T, and M_IBI is the mean of this error over all K test videos (see [1] and the linked evaluation code for the exact definition).
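As a rough illustration only (the exact definition is given in [1] and in the linked evaluation code), a sketch of M_IBI under the assumption that the per-video IBI error is the mean absolute difference between predicted and ground-truth curves sampled on the same grid:

```python
import numpy as np

def mean_ibi_error(pred_curves, true_curves):
    """Illustrative M_IBI: per-video IBI error averaged over all K test videos.

    Assumes each predicted/ground-truth curve pair shares the same time grid of length T,
    and that the per-video error is a mean absolute difference (an assumption for this
    sketch; the official evaluation code is authoritative).
    """
    per_video_error = [
        np.mean(np.abs(np.asarray(p, dtype=float) - np.asarray(t, dtype=float)))
        for p, t in zip(pred_curves, true_curves)
    ]
    return float(np.mean(per_video_error))            # average over the K videos
```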

2. standard deviation of IBI error: SD_IBI

3. mean absolute error of heart rate: MAE_HR

4. root mean squared error of heart rate: RMSE_HR

5. Pearson correlation coefficient of heart rate: R_HR

For the final ranking in the leaderboard, we will only consider mean of IBI error, but other metrics will also be shown for reference. Participants who have questions about the evaluation metrics can refer to this code.
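For orientation, the HR metrics can be sketched as follows, assuming one predicted and one ground-truth heart rate per test video (the linked evaluation code remains the reference implementation):

```python
import numpy as np

def hr_metrics(hr_pred, hr_true):
    """Compute MAE_HR, RMSE_HR, and the Pearson correlation R_HR over all test videos."""
    hr_pred = np.asarray(hr_pred, dtype=float)
    hr_true = np.asarray(hr_true, dtype=float)
    err = hr_pred - hr_true
    mae = np.mean(np.abs(err))                        # MAE_HR
    rmse = np.sqrt(np.mean(err ** 2))                 # RMSE_HR
    r = np.corrcoef(hr_pred, hr_true)[0, 1]           # R_HR
    return mae, rmse, r

print(hr_metrics([72.0, 65.0, 90.0], [70.0, 66.0, 95.0]))
```

The Track 2 respiration metrics below (MAE_RR, RMSE_RR, R_RR) follow the same pattern, with respiration rates in place of heart rates.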

 

For the respiration estimation task, the following evaluation metrics will be calculated from your submission.

1. mean absolute error of respiration rate: MAE_RR

2. root mean squared error of respiration rate: RMSE_RR

3. Pearson correlation coefficient of respiration rate: R_RR

For the final ranking of track 2, we will only consider MAE_RR, but the other metrics will also be shown for reference.

 

[1] Xuenan Liu et al., “Detecting Pulse Wave From Unstable Facial Videos Recorded From Consumer-Level Cameras: A Disturbance-Adaptive Orthogonal Matching Pursuit,” IEEE Transactions on Biomedical Engineering 67, no. 12 (December 2020): 3352–62, https://doi.org/10.1109/TBME.2020.2984881.

 

Data

Data Access

Data for the challenge consist of two parts: one part is provided by the Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS), and the other part is provided by the University of Oulu. Two license agreements (LA) must be signed before participants can get access to any data. The license agreements can be downloaded from Baidu Drive (with the extraction code ga8w) or Google Drive. Please follow the instructions below carefully when preparing and signing the license agreements:

1). The LA must be signed by a person with an email address affiliated with an institution or company (e.g., xx.xx@mit.edu, xx.xx@microsoft.com), which means that the person holds a fixed position in the institution or company (e.g., a professor at a university, or an employee of a company). Personal emails (e.g., xxx@gmail.com, xxx@qq.com) are NOT valid. Students should ask their supervisor to sign the LA.

2). One signer can be associated with multiple registered competition IDs, i.e., one professor can sign the LA and multiple students from his/her group can register for the competition.

3). The signer must read through the LA carefully, and only sign the document when he/she fully understands and agrees to all the items listed in the LA.

4). The signed LA should be scanned to PDF, named RePSS 2021 data agreement_Yourname.pdf, and sent to Dr. Hu Han (hanhu[at]ict.ac.cn) and Dr. Xiaobai Li (xiaobai.li[at]oulu.fi). We will send you the download link and password after receiving the signed LA.

5). The signer is fully responsible (for all users whose IDs are associated with him/her) for making sure that all associated users are fully aware of the LA contents and that the data are accessed and used properly according to the LA. Data users have no right to distribute the data in any form.

6). The data are shared only for the research purposes of this competition and not for any other use. All data must be deleted after the competition, by 10.08.2021.

 

Training Data 

Track 1:

VIPL-HR-V2 is the second version of the VIPL-HR database for remote heart rate (HR) estimation from face videos under less-constrained conditions. It contains 2500 RGB videos of 500 subjects recorded with a RealSense F200 camera at a resolution of 960 by 720. For each subject, we cut five ten-second clips from a thirty-second video with a five-second stride. Each video comes with a corresponding mean heart rate and BVP signal. Note that the frame rate of the 10-second videos is not fixed, so some interpolation processing is needed.
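Because the frame rate is not fixed, one simple option is to linearly interpolate per-frame signals onto a uniform time grid; the sketch below assumes per-frame timestamps are available (variable names are illustrative, not part of the released tools):

```python
import numpy as np

def resample_to_fixed_fps(values, timestamps, target_fps=30.0):
    """Linearly interpolate an unevenly sampled per-frame signal onto a uniform time grid.

    values: signal samples (e.g., a BVP value or an extracted per-frame feature).
    timestamps: acquisition time of each sample in seconds (possibly unevenly spaced).
    """
    timestamps = np.asarray(timestamps, dtype=float)
    values = np.asarray(values, dtype=float)
    uniform_t = np.arange(timestamps[0], timestamps[-1], 1.0 / target_fps)
    return uniform_t, np.interp(uniform_t, timestamps, values)

# Example: ~10 s of samples with jittered spacing, resampled to a fixed 30 fps grid
t = np.cumsum(np.random.uniform(0.028, 0.038, size=300))
x = np.sin(2 * np.pi * 1.2 * t)                       # pulse-like signal at ~72 bpm
t_uniform, x_uniform = resample_to_fixed_fps(x, t, target_fps=30.0)
```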

Track 2:

There are three folders in the OBF Respiration training data.

- videos

This folder contains 100 subjects' face videos with a resolution of 1920*1080 at 30 fps. For each subject, there are 10 video clips (60 s per clip) from 2 sessions. For example, in the file name '005_RGB_2_60s-120s.avi', '005' is the subject number, '2' indicates the second session, and '60s-120s' gives the start and end times of the clip. These times are used to locate the landmarks in the provided .csv files (see the parsing sketch after this list).

- resp

This folder has the corresponding ground truth respiration waves. The sampling rate is 256 Hz.

- landmarks

Due to data privacy concerns, all faces were masked with mosaics. To compensate for this, we provide facial landmarks generated with OpenFace. See https://github.com/TadasBaltrusaitis/OpenFace/wiki/Output-Format for the landmark format. For video '005_RGB_2_60s-120s.avi', you can find the landmarks in 005_RGB_2.csv between timestamps 60 s and 120 s. Please note that the landmarks were generated from our original 60 fps videos. Since we only provide 30 fps videos for this challenge, you should downsample the landmarks along with the timestamp column, as shown in the sketch below.
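A minimal sketch of how a clip's filename could be parsed and the matching landmark rows reduced from 60 fps to 30 fps (the 'timestamp' column name follows the OpenFace output format linked above; paths and helper names are illustrative):

```python
import re
import pandas as pd

def parse_clip_name(filename):
    """Parse e.g. '005_RGB_2_60s-120s.avi' into subject, session, start and end seconds."""
    m = re.match(r"(\d+)_RGB_(\d+)_(\d+)s-(\d+)s\.avi", filename)
    return m.group(1), int(m.group(2)), int(m.group(3)), int(m.group(4))

subject, session, start_s, end_s = parse_clip_name("005_RGB_2_60s-120s.avi")

# Load the matching landmark file and keep only the rows inside the clip's time range.
lm = pd.read_csv(f"landmarks/{subject}_RGB_{session}.csv")
lm.columns = lm.columns.str.strip()                   # OpenFace headers may contain leading spaces
clip_lm = lm[(lm["timestamp"] >= start_s) & (lm["timestamp"] < end_s)]

# The landmarks were generated from 60 fps video; keep every second row to match the 30 fps clips.
clip_lm_30fps = clip_lm.iloc[::2].reset_index(drop=True)
```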

 

Test Data:

Track 1:

The test set of Track 1 contains two parts, OBF and VIPL-HR, with 1000 videos of 200 subjects in total. The videos of OBF and VIPL-HR are in 1080p and 720p resolution, respectively. Each video has a fixed length of 10 seconds. However, it should be noted that the frame rate of the 10-second video is not fixed, so some interpolation processing is needed. Each team should report the IBI signal for each video.

Track 2:

The test set of Track 2 was captured while the subjects were running and cycling in a gym. This test set is very challenging due to the large body movements and environmental changes during exercise. There are 283 videos of 10 subjects, with a resolution of 720p at 30 fps. Each video clip is 1 minute long. Each team should report a single respiratory rate for each 1-minute video.

 

Submission Sample

The submission samples can be found at this link.

 

https://github.com/sunhm15/REPSS-2021
