xliucs / mtts-can

Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement (NeurIPS 2020)

Home Page: https://proceedings.neurips.cc/paper_files/paper/2020/file/e1228be46de6a0234ac22ded31417bc7-Paper.pdf

License: MIT License

Python 100.00%
cardiovascular computer-vision deep-learning healthcare mobilehealth neurips neurips-2020 physiological-signals physiologicalsensing rppg

mtts-can's Introduction

MTTS-CAN: Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement


Paper

Xin Liu, Josh Fromm, Shwetak Patel, Daniel McDuff, “Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement”, NeurIPS 2020, Oral Presentation (105 out of 9454 submissions)

New Results (Trained only on PURE, and Tested on UBFC)

Model     MAE (BPM)   MAPE    RMSE (BPM)   Pearson Coef.
TS-CAN    1.47        1.56%   2.31         0.99

Please use rPPG-Toolbox for training.

New Pre-Trained Model (Updated March 2023)

Please refer to rPPG-Toolbox

Abstract

Telehealth and remote health monitoring have become increasingly important during the SARS-CoV-2 pandemic and it is widely expected that this will have a lasting impact on healthcare practices. These tools can help reduce the risk of exposing patients and medical staff to infection, make healthcare services more accessible, and allow providers to see more patients. However, objective measurement of vital signs is challenging without direct contact with a patient. We present a video-based and on-device optical cardiopulmonary vital sign measurement approach. It leverages a novel multi-task temporal shift convolutional attention network (MTTS-CAN) and enables real-time cardiovascular and respiratory measurements on mobile platforms. We evaluate our system on an ARM CPU and achieve state-of-the-art accuracy while running at over 150 frames per second which enables real-time applications. Systematic experimentation on large benchmark datasets reveals that our approach leads to substantial (20%-50%) reductions in error and generalizes well across datasets.

Waveform Samples

Pulse

[pulse waveform sample image]

Respiration

[respiration waveform sample image]

Citation

@article{liu2020multi,
  title={Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement},
  author={Liu, Xin and Fromm, Josh and Patel, Shwetak and McDuff, Daniel},
  journal={arXiv preprint arXiv:2006.03790},
  year={2020}
}

Demo

Try out our live demo via link here.

Our demo code: https://github.com/ubicomplab/rppg-web

TVM

If you want to use TVM, please follow this tutorial to set it up. Then, replace the code in incubator-tvm/python/tvm/relay/frontend/keras.py with our code/tvm-ops-mtts-can.py, which implements the tensor operations required by the attention and tensor-shift modules used in our models.

Training

python code/train.py --exp_name [e.g., test] --data_dir [DATASET_PATH] --temporal [e.g., MMTS_CAN]

Inference

python code/predict_vitals.py --video_path [VIDEO_PATH]

The default video sampling rate is 30Hz.
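For reference, a heart-rate value can be derived from the predicted pulse waveform with a band-pass filter and an FFT peak. The sketch below is a minimal illustration rather than the repository's exact post-processing; pulse_pred is a hypothetical stand-in for the waveform produced by predict_vitals.py, and the 30 Hz sampling rate follows the default above.

import numpy as np
from scipy import signal

fs = 30  # default video sampling rate (Hz)

# pulse_pred: hypothetical placeholder for the 1-D pulse waveform from
# predict_vitals.py; random data stands in so the sketch runs on its own.
pulse_pred = np.random.randn(10 * fs)

# Band-pass to a plausible heart-rate range (0.75-2.5 Hz, i.e. 45-150 BPM).
b, a = signal.butter(2, [0.75, 2.5], btype="bandpass", fs=fs)
pulse_filt = signal.filtfilt(b, a, signal.detrend(pulse_pred))

# Take the dominant in-band frequency as the heart-rate estimate.
freqs = np.fft.rfftfreq(len(pulse_filt), d=1.0 / fs)
spectrum = np.abs(np.fft.rfft(pulse_filt))
mask = (freqs >= 0.75) & (freqs <= 2.5)
hr_bpm = 60.0 * freqs[mask][np.argmax(spectrum[mask])]
print(f"Estimated heart rate: {hr_bpm:.1f} BPM")

A similar band-pass over a respiration-appropriate band (roughly 0.1-0.5 Hz) would apply to the respiration output.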

Note

During inference, the program will generate a sample pre-processed frame. Please ensure it is in portrait orientation; if it is not, you can comment out line 30 (the rotation) in inference_preprocess.py.

Requirements

TensorFlow 2.2-2.4

conda create -n tf-gpu tensorflow-gpu cudatoolkit=10.1 (this command takes care of both the CUDA and TensorFlow environments)

pip install opencv-python scipy numpy matplotlib

If pip install opencv-python does not work, I have found that the following commands always work on my Mac:

conda install -c menpo opencv -y
pip install opencv-python

Contact

Please post your technical questions regarding this repo via GitHub Issues.


mtts-can's Issues

labeling the training data

I am currently working on a research project where I aim to retrain the model using my own dataset. However, I have encountered some difficulties in the process of labeling the data, and I was wondering if you could kindly provide some assistance.

Specifically, I am seeking help with the following:
  • Sample labeled data: if you have any sample labeled data that you could share as a reference, it would be immensely helpful for understanding the labeling process better. This would enable me to align my labeling efforts with the intended structure and format.
  • Labeling tools: if you have any recommendations for labeling tools or software that could streamline the labeling process, I would highly value your insights. I am open to any suggestions that could improve efficiency and accuracy while labeling a large dataset.

Pure dataset

I am facing an issue with the dataset: the timestamps in the JSON file containing the readings do not match the images in the folders. The timestamps also appear in the image names, but I cannot find any images with the same timestamps as those in the provided JSON file. Furthermore, could you please provide a dataloader for the PURE dataset that is not preprocessed?

Error with the new weight value

Hi! The new weights file gives an error:
"ValueError: You are trying to load a weight file containing 12 layers into a model with 14 layers."
You should update the model.py script.

dataset problem

Hello,

I am interested in your work, but I cannot obtain the AFRL dataset. I tried to contact the authors of reference [33] in your paper, but no one has replied to me. Do you know how to access this dataset?

Besides, I have successfully received the VIPL-HR v1 dataset, and I am wondering whether you can provide the benchmark on that dataset.

Best Wishes,
Yue

Data input

Hi! I would like to know how you provide the training signal. Do you do any sampling?

Video processing issues

The problem is that whenever I try to run the script on a video, it shows a sample processed image, then waits for a while, and then the code breaks. I have tried it with multiple files in which the user's face is in the center of the frame and the background is well illuminated, but the code breaks even when the file size is small.

Here are a few screenshots of the issue:

User 1: [error output and sample processed frame screenshots]

User 2: [error output and sample processed frame screenshots]

Test respiration rate

Hi, I would like to know how you test respiration rate on the UBFC dataset. I cannot find any information about respiration in that dataset. Can you tell me how to do it?

In predict_vitals.py, what is the meaning of the frame_depth parameter?

In predict_vitals.py, the loaded data is a video file, and the total number of frames is treated as batch * frame_depth; in other words, the video is split into many clips whose frame length equals frame_depth. However, frame_depth is set to 10. Isn't that clip length too short? And is the video clip length also 10 during the training phase?
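As a rough illustration of the chunking described above (the array name and shapes are placeholders, not the repository's variables):

import numpy as np

frame_depth = 10  # clip length discussed above

# frames: placeholder for the pre-processed video tensor (N frames of 36x36
# crops with motion + appearance channels); random data for illustration only.
frames = np.random.rand(123, 36, 36, 6).astype(np.float32)

# Drop trailing frames that do not fill a whole clip; the remaining array is
# passed to the network as (num_clips * frame_depth, H, W, C), so each
# contiguous block of 10 frames forms one clip along the batch axis.
num_clips = frames.shape[0] // frame_depth
clipped = frames[: num_clips * frame_depth]
print(clipped.shape)  # (120, 36, 36, 6) -> 12 clips of 10 frames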

Enquiry about training time and project code

I'm planning to train the TS-CAN and MTTS-CAN models on the UBFC and PURE datasets and would like to know whether it's possible to train these models on a single Nvidia RTX 2060 GPU with 6 GB of VRAM, since the code says "Only supporting 4 GPUs or 8 GPUs now. Please adjust learning rate in the training script":

raise Exception('Only supporting 4 GPUs or 8 GPUs now. Please adjust learning rate in the training script!')

I also see that you are loading in the video frames from MATLAB along with their labels and the normalized frames. May I know how the frames and the labels were stored?

Also, why is the transpose taken when storing data in dXsub as shown below?

dXsub = np.transpose(np.array(f1["dXsub"])) #dRsub for respiration
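One plausible (unconfirmed) explanation for the transpose: MATLAB v7.3 .mat files are HDF5 files written in column-major order, so h5py returns the array with its dimensions reversed, and np.transpose with no axes argument restores the MATLAB ordering. A minimal sketch, with a hypothetical file name:

import h5py
import numpy as np

# "P1T1.mat" is a hypothetical pre-processed file; the dataset name mirrors
# the snippet above.
with h5py.File("P1T1.mat", "r") as f1:
    raw = np.array(f1["dXsub"])   # dimensions arrive reversed, e.g. (C, W, H, N)
    dXsub = np.transpose(raw)     # reverse all axes back, e.g. (N, H, W, C)
print(dXsub.shape)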

TS-CAN and Hybrid-Can dataloaders

From the paper, I see that TS-CAN and Hybrid-CAN have the same input (the appearance branch is an averaged 1x36x36x3 frame and the motion branch has 10x36x36 normalized frames), but there are two different loaders for Hybrid-CAN and TS-CAN in the dataloaders.py file. I fail to understand why there is a difference between them; any clue would help.

About the order of the Butterworth filter

Hi @xliucs,

In your paper, I think you mentioned that you used the 2nd-order Butterworth filter. But in the "predict_vitals.py", you used the 1st-order Butterworth filter. May I know why or if this is a subtle mistake?

Best,
Zechen Zhang

how to use the code

I am sorry if my issue is a basic one, but I really don't know how to pass the videos as input to the code, and I don't know what "args" means in the code...

Face or RoI Detection

Is there a face detection part in the project, either with mtcnn or opencv? Actually, our main concern is to find cheeks. Any kind of help/comment would be useful.

Could you provide more details how you pre-process the dataset?

Dear Sir/Madam,

Thanks for your code. I tried to reproduce your code on the VIPL-HR v1 dataset, but I am not sure how the dataset is pre-processed. Could you provide more details, such as how the video is preprocessed, what the format of the input data path is, and how the ground-truth labels are preprocessed and saved? I checked your pre_process.py file, and it seems that the dataset is saved into an HDF5 (.mat) file. Could you at least provide an example of the h5py file so that I know how to preprocess my dataset into the correct format?
I also studied your predict_vitals.py file, but it seems that it only makes a prediction for a single video. Could you also provide code for the evaluation metrics and for how the ground-truth labels are pre-processed?

Best Wishes,
Yue

dataset

Can you tell me where I could get the AFRL dataset?

calculating the snr and denoising the signal.

How can I calculate the SNR of the pulse signal obtained from MTTS-CAN? And could you please tell me whether there is a way to denoise the PPG signal so that it more closely matches the ground-truth PPG signal?
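One common definition in the rPPG literature (a de Haan-style SNR) compares the spectral power near a reference heart-rate frequency and its first harmonic with the remaining in-band power. The sketch below is only an illustration under that assumption; the function name and the synthetic data are placeholders, and a real reference HR (e.g. from a contact sensor) is required in practice.

import numpy as np

def rppg_snr_db(pulse, fs, hr_ref_hz, band=(0.75, 2.5), tol=0.1):
    # Power near the reference HR and its first harmonic vs. the rest of the band.
    freqs = np.fft.rfftfreq(len(pulse), d=1.0 / fs)
    power = np.abs(np.fft.rfft(pulse)) ** 2
    in_band = (freqs >= band[0]) & (freqs <= 2 * band[1])
    near_hr = (np.abs(freqs - hr_ref_hz) <= tol) | (np.abs(freqs - 2 * hr_ref_hz) <= tol)
    return 10.0 * np.log10(power[in_band & near_hr].sum() / power[in_band & ~near_hr].sum())

# Synthetic example: a 1.2 Hz "pulse" plus noise, 30 s at 30 Hz.
fs = 30
t = np.arange(0, 30, 1.0 / fs)
pulse = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.random.randn(len(t))
print(rppg_snr_db(pulse, fs, hr_ref_hz=1.2))

As for denoising, band-pass filtering the predicted pulse to the plausible heart-rate range (as in the sketch under the Inference section) is the usual first step.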

video input

I want to know what the 'M.mat' file denotes. Is it the video, or something else?

Data preprocessing and interface

Hello, I want to ask what the requirements are for the data preprocessing part of the program, and how to write the interface to the main model framework. I would appreciate it if you could answer.

Batch size for model predict.

First of all, thank you for providing such a useful algorithm for predicting BVP and respiratory rate. My questions are:

  1. Why do we need to set the batch size for model prediction in this scenario (predicting BVP and respiratory rate)?
  2. How do I determine the batch size?
  3. Will the batch size affect the prediction results?

Using time series data

Hello, is it possible to use time-series data instead of MP4 videos as input? Thanks in advance.

Heart Rate Estimation

Is there a way to get heart rate values instead of the pulse and respiration predictions?
Or, if they are convertible, how can we do that?

Enquiry about training on UBFC-RPPG and PURE dataset

Hi, I want to train the models on the UBFC-RPPG and PURE datasets, but I don't know how.

I read through the code in train.py but cannot fully understand it. I would like to know where in the code the ground-truth signals are passed into the model.

I have applied preprocessing to the UBFC and PURE datasets that puts all the ground-truth data in an Excel file (per subject) and also resamples all physiological sensor data to 30 Hz for both datasets.

I also have the original data that is not preprocessed. In that case, the ground-truth data is in a text file for each subject of the UBFC dataset and in a JSON file for each subject of the PURE dataset.

Any advice would be appreciated.

Real-Time Query

I'm a little confused about the real-time implications of MTTS-CAN. From what I can gather, the model only operates on a complete video and does so in isolation (i.e., the temporal information is only leveraged per prediction; nothing is stored or reused across subsequent calls to the model). So while it can retrospectively give an HR and BR estimate for each frame provided to it, it can only do so after being given all of the frames at once. Is my understanding correct, and is it therefore not possible to process a live video feed and provide HR and BR estimates truly in real time?

If that is correct, then my follow-up question would be: in your experimentation, what is the minimum number of frames that still provides an accurate estimate? The only alternative would seem to be processing the n previous frames each time a new live frame is received and only using the final values output by the model each time, as sketched below.
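For what it's worth, the sliding-window idea described in the question can be sketched as follows. This is only an illustration; model, the window length, and the frame shape are assumptions, not the repository's API.

from collections import deque
import numpy as np

WINDOW = 300  # assumed: roughly 10 s of history at 30 fps
frame_buffer = deque(maxlen=WINDOW)

def on_new_frame(frame, model=None):
    # Keep only the most recent WINDOW pre-processed frames and, once the
    # buffer is full, run a prediction over that window; in practice only the
    # newest outputs would be kept as the "live" HR/BR estimate.
    frame_buffer.append(frame)
    if len(frame_buffer) < WINDOW:
        return None
    clip = np.stack(frame_buffer)  # (WINDOW, H, W, C)
    if model is None:              # placeholder so the sketch runs standalone
        return float(clip.mean())
    return model.predict(clip)

for _ in range(320):  # synthetic frames standing in for a live camera feed
    out = on_new_frame(np.random.rand(36, 36, 6).astype(np.float32))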

Request for infos regarding environment setup (strange results with tensorflow 2.8.0)

Hi Xin. It would be very helpful if you could deliver some information about the environment within which you ran your experiments (some kind of requirements.txt). It would be good to know the exact versions of the packages you used - especially regarding tensorflow.

I discovered that when using tensorflow==2.8.0 (the latest version at the time of writing) the network outputs nonsense. I checked earlier versions, and the problem starts with tensorflow==2.6.x. I have no clue what causes this issue, but I think it is important to know! With tensorflow==2.4.x everything seems to work fine.

Steps to reproduce:

  • Run inferencing on the same video (I used the UBFC)
    • once with tensorflow==2.8.0
    • once with tensorflow==2.4.1

You can even check this by feeding a batch of zeros into the network: you will get significantly different results depending on the TensorFlow version.

I tested this on two different machines with the same observation. Maybe someone can check this; I am still not sure whether I am doing anything wrong, as this is very strange behavior.

Custom dataset

How can I train your model on a custom dataset? I have recorded several videos at 30 fps with synchronized oximeter data in .mat format.

Any one got NaN as output when tested on CPU machine?

I have tested on TensorFlow 2.2.0 and 2.3.0 on both GPU and CPU machines, and I noticed that running on the CPU resulted in NaN outputs while the same video and packages worked on the GPU. Has anyone else encountered this issue, or does anyone have an idea of how to resolve it?

Values of the diagram

Hey guys,
We are a group from a university, and we are trying to use your implementation to analyze the face in order to obtain the pulse and respiration diagrams.
The problem is that we are getting strange values on the axes of the diagram despite using a video of 30 s in length.
We would like to know how to change those values in the code.
Thank you very much.
[screenshot]
