mkocabas / pose-residual-network

Code for the Pose Residual Network introduced in the paper 'MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network (ECCV 2018)'

Home Page: https://arxiv.org/abs/1807.04067

Shell 0.68% Python 99.32%

deep-learning deep-neural-networks eccv eccv-2018 keras pose-estimation python tensorflow

pose-residual-network's People

xingyizhou wcy116 zumbalamambo jacke121 tjussh aifihenryma hengshan123 accountcwd newskylark trantorrepository conanjm azuredsky chenxingqiang yushenxiang wangzheallen jiyongma lc82111 yoyokitartora igormunizs nmxnql sy1300 hlig senyao-han xxy90 sivanke wanjinchang weiweili123 mornydew soulempty b-xiang gherao dltensor my3jie karimamd qianrenjian kk52099 liyifeng123 flyingbird93 mathpopo ieee820 chdzhen bboyhanat aidarikako fendaq insightai vidproc fendou201398 gm19900510 yinrui1991 miaochenguo ieyer lunalulu tragedyn anglebinbin kelvinson khanha2 koryako zero2er0 andersonyangoh pzw520125 donglibing donproc mantianlong mars-y470 zhaoxueyu hxhh daydreamer2023 nawalgao feng-leaf daxiafresh jinfei3459 dung38tn zbxzc35 civcpeihongwei jonyboy2000 madonokouki sailychen hell-to-heaven lobsterbaby todosthing bruinxiong wuyangfeng iamsoroush xiaozhimabing xrosliang zeta1999 niranths zw20yz naveenkumarmulabitmovin

pose-residual-network's Issues

How to build a complete network with video or image as input?

@mkocabas Thanks for your excellent work. I read the code and found that it is a key part of the solution described in your paper.
Do you have a complete solution that takes video or an image as input?
Thanks!

Evaluating the detection part with a frozen backbone

Hi,
Could you please tell me how you evaluated the detection part after training it with a frozen backbone (trained for the keypoint net)?
I can't achieve the reported mAP... not even close.

Reproducing the results reported in the paper

Given that the authors have not released the full network and training pipeline, has anyone actually succeeded in fully reproducing the results reported in the paper (both the accuracy and the speed of 23 FPS)? Any DL framework is fine. Thank you.

How can I run prediction on an image?

Thanks for the work. How can I run prediction on an image?

Can't reach the performance reported in the project

I trained your code from scratch without any changes and got mAP = 0.861 on COCO val2017 after 16 epochs.
How can I reach mAP = 0.894?

Hocam, this turned out really great

Emre, so this does pose estimation, instance segmentation, and runs in real time, all at once.
Please publish the code so we can use it.

About the inference FPS and the network

Hi, I heard that your implementation runs at 23 FPS on a GTX 1080 Ti. Did you use MobileNetV2 or some other lightweight architecture to achieve 23 FPS? Or did you just use standard convolutions, which would mean it could be even faster with MobileNetV2?

A question about the speed

As mentioned in the paper:

In terms of running time, our method appears to be the fastest of all multi-person 2D pose estimation methods. Depending on the number of people in the input image, our method runs at between 27 frames/sec (FPS) (for one person detection) and 15 FPS (for 20 person detections). For a typical COCO image, which contains ∼3 people on average, we achieve ∼23 FPS.

However, as the paper "Focal Loss for Dense Object Detection" shows, RetinaNet's inference time is almost 70 ms. So is there any way to speed it up?

Is there any pre-trained model available?

Hi,
Your work is great. It looks like your method can run at 23 frames/sec while achieving such good results. I'd like to ask whether you plan to publish pre-trained models.
Thank you very much.
Quan Hua

Where should the COCO images be placed?

Hi! Thank you for your excellent work. When I trained on the COCO dataset, the annotations were placed in
/data/annotations/, but where should the COCO images be placed?

Complete results and a full pretrained model

Hey,
I have read your paper, but it's unjust not to release a pretrained model for the entire network. I agree you need not release any code base. But promising to do something and failing to do it is a bad thing. If you cannot release the entire pipeline, please release the testing framework for your model. I'm doubtful that the results reported in the paper are reproducible.
Because of such extensive hyperparameter tuning, no one will be able to reproduce the exact results you report.
I don't think you would have a "license" issue releasing the code for testing the entire pretrained model; everyone does it. At least point us to the pretrained models you are using.
It has already been approximately 6 months since the ECCV camera-ready deadline. It's high time to release the testing framework for the pretrained model, or we would like to bring this to the attention of higher authorities.
We will soon open a Reddit post to discuss this paper, as a lot of people are having trouble reproducing the results.

Weird evaluation code in evaluate.py

Starting from line #140 in evaluation.py, it seems to me that you are using the ground-truth keypoints to obtain your keypoint estimates, which should not happen when you evaluate your PRN network. This makes the evaluation improper.

In brief: first, you generate indexes from old_weights_bbox (the ground truth). Then you seem to use those indexes to place a window around the ground-truth position and compute your estimated scores. Finally, the output keypoints are derived from those scores.

I found this issue in the PyTorch version, and someone else found the same issue there (Issue #17). Then I came here and found the same issue in this Keras version. @mkocabas please respond to our concerns. Thanks!
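To make the concern above concrete, here is a minimal NumPy sketch (hypothetical function names, not the repository's code) contrasting a keypoint read-out that searches the whole heatmap with one that only searches a window around the ground-truth location. When the heatmap contains a stronger false peak elsewhere, the windowed read-out still lands on the correct joint, which is why a read-out of that form would inflate evaluation scores.

```python
import numpy as np


def argmax_global(heatmap: np.ndarray) -> tuple[int, int]:
    """Prediction-only read-out: (row, col) of the strongest response anywhere."""
    r, c = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(r), int(c)


def argmax_near_gt(heatmap: np.ndarray, gt: tuple[int, int], radius: int = 3) -> tuple[int, int]:
    """Leaky read-out (the pattern described in the issue): only search a window
    centred on the ground-truth position, so GT information guides the answer."""
    r0, c0 = gt
    top, left = max(r0 - radius, 0), max(c0 - radius, 0)
    window = heatmap[top:r0 + radius + 1, left:c0 + radius + 1]
    r, c = np.unravel_index(np.argmax(window), window.shape)
    return int(top + r), int(left + c)


# Toy 56x36 heatmap: a weak peak at the true joint, a stronger false peak elsewhere.
hm = np.zeros((56, 36))
hm[10, 10] = 0.4  # true joint location
hm[40, 30] = 0.9  # spurious response
gt = (10, 10)

print(argmax_global(hm))       # (40, 30): the network's own answer is wrong
print(argmax_near_gt(hm, gt))  # (10, 10): the GT window hides the error
```

A fair evaluation would use only the first kind of read-out (or any read-out that never touches ground-truth coordinates) before comparing against the annotations.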

No module named 'gaussian'

Traceback (most recent call last):
  File "main.py", line 6, in <module>
    from src.utils import train_bbox_generator, val_bbox_generator
  File "/home/dh/github/pose-residual-network/src/utils.py", line 3, in <module>
    from gaussian import gaussian, gaussian_multi_input_mp, gaussian_multi_output
ModuleNotFoundError: No module named 'gaussian'
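One common cause of this error, assuming `gaussian.py` sits next to `utils.py` inside `src/` (the repository layout is not verified here): `main.py` is run from the repository root, so `src/` itself is not on `sys.path`, and the absolute `from gaussian import ...` inside `src/utils.py` cannot be resolved. The self-contained sketch below rebuilds a hypothetical miniature of that layout in a temporary directory and shows that the absolute import fails while the package-relative form `from .gaussian import ...` succeeds.

```python
import importlib
import sys
import tempfile
from pathlib import Path

DEMO_MODULES = ("src", "src.utils", "src.gaussian", "gaussian")


def build_demo_repo(root: Path, relative: bool) -> None:
    """Write a hypothetical miniature layout: src/__init__.py, src/gaussian.py, src/utils.py."""
    src = root / "src"
    src.mkdir()
    (src / "__init__.py").write_text("")
    (src / "gaussian.py").write_text("def gaussian():\n    return 'ok'\n")
    # utils.py imports gaussian either absolutely (as in the traceback) or package-relative.
    stmt = "from .gaussian import gaussian\n" if relative else "from gaussian import gaussian\n"
    (src / "utils.py").write_text(stmt)


def forget_demo_modules() -> None:
    """Drop cached demo modules so each run starts from a clean import state."""
    for name in DEMO_MODULES:
        sys.modules.pop(name, None)


def utils_imports_cleanly(relative: bool) -> bool:
    """Mimic `python main.py` from the repo root: only the root is on sys.path."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_repo(Path(tmp), relative)
        forget_demo_modules()
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("src.utils")
            return True
        except ModuleNotFoundError:
            return False
        finally:
            sys.path.remove(tmp)
            forget_demo_modules()


print(utils_imports_cleanly(relative=False))  # False: No module named 'gaussian'
print(utils_imports_cleanly(relative=True))   # True
```

Under the same assumption, equivalent workarounds are changing the import in `src/utils.py` to `from src.gaussian import ...`, or adding `src` to `PYTHONPATH` before running `main.py`.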

Predict on my own data

Hi,
Thank you very much for your amazing job.

I've used python main.py to train the model. How can I use the weights to run inference on my own images?
I know how to load the weights, but the input has to be (56, 36, 17).

If anyone has a code snippet, it would be much appreciated.

Thank you!
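As a possible starting point, here is a NumPy-only sketch of how a per-person input of shape (56, 36, 17) could be built from full-image keypoint heatmaps, in line with the paper's description of PRN operating on heatmaps cropped to each person box and resized to a fixed size. The `(x, y, w, h)` box format in heatmap pixels, the nearest-neighbour resize, and the function name are assumptions for illustration; the repository's own preprocessing may differ, for example in interpolation or box padding.

```python
import numpy as np

PRN_H, PRN_W = 56, 36  # from the input shape (56, 36, 17) quoted above


def crop_and_resize(heatmaps: np.ndarray, box: tuple[int, int, int, int]) -> np.ndarray:
    """Crop a person box from full-image heatmaps (H, W, K) and resize to (56, 36, K).

    `box` is a hypothetical (x, y, w, h) in heatmap pixel coordinates.
    Nearest-neighbour sampling keeps the sketch dependency-free.
    """
    x, y, w, h = box
    H, W, _ = heatmaps.shape
    # Clip the box to the heatmap so slicing never runs out of bounds.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, W), min(y + h, H)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("box does not overlap the heatmap")
    crop = heatmaps[y0:y1, x0:x1, :]
    # Map each output row/col to a source row/col inside the crop.
    rows = (np.arange(PRN_H) * crop.shape[0] / PRN_H).astype(int)
    cols = (np.arange(PRN_W) * crop.shape[1] / PRN_W).astype(int)
    return crop[rows[:, None], cols[None, :], :]


full = np.random.rand(120, 160, 17).astype(np.float32)  # stand-in for keypoint-subnet output
x = crop_and_resize(full, (40, 20, 30, 70))
batch = x[np.newaxis]  # add a batch axis: (1, 56, 36, 17)
print(batch.shape)
```

With a trained Keras model loaded, `model.predict(batch)` would then return PRN outputs for that person; whether the saved weights expect exactly this preprocessing is an assumption worth checking against the training data generator.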

When can I see the code?

Can this model be trained for both keypoints and segmentation?

I have my own dataset, similar to COCO, which includes keypoint and segmentation annotations, so I was wondering whether I can get results that output both keypoints and body segmentation.

How to evaluate the human segmentation?

Hi,
How do you evaluate the human segmentation part as described in the paper? Just by using the last channel of the (K+1)-channel keypoint heatmaps? How do you extract the segmentation from the PRN network?
Is there any post-processing step to generate the final segmentation?

Thanks!
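If, as the question supposes, the person segmentation is the last of the K+1 output channels, a simple evaluation could threshold that channel into a binary mask and compare it with the ground-truth mask by intersection over union. The 0.5 threshold and the helper names below are assumptions for illustration, not the paper's documented post-processing; COCO-style segmentation evaluation additionally matches instances and averages over several IoU thresholds.

```python
import numpy as np


def mask_from_heatmaps(heatmaps: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binary person mask from the last channel of a (H, W, K+1) output (assumed layout)."""
    return heatmaps[..., -1] > threshold


def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two boolean masks; returns 0.0 when both are empty."""
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)


K = 17
out = np.zeros((8, 8, K + 1))
out[2:6, 2:6, -1] = 0.9  # predicted person region: 16 pixels
gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:7] = True      # ground-truth region: 20 pixels, 16 of them overlapping

pred = mask_from_heatmaps(out)
print(mask_iou(pred, gt))  # 16 / 20 = 0.8
```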

Missing license

Could you please add a license to this repo? Thanks.

Configurations clustering?

Checkpoint?

Could you provide your model?

Questions about the speed reported in this paper. Please help me.

In your paper's abstract: "the fastest real time system with ∼23 frames/sec". And in Section 4.5 (runtime analysis) we find "Keypoint and person detections take 35 ms while PRN takes 2 ms per instance". So we get roughly 1000/40 = 25 FPS, close to the reported ∼23 FPS.
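The arithmetic in the question can be written as a simple linear timing model, assuming the 35 ms for keypoint and person detection is a fixed per-frame cost and PRN adds 2 ms per detected person (the two figures quoted from Section 4.5). The model reproduces the ∼27 FPS single-person figure and gives about 24 FPS for three people, but predicts about 13 FPS for 20 people, below the 15 FPS the paper reports, so the simple model does not fully explain the numbers at high person counts.

```python
def fps(num_people: int, base_ms: float = 35.0, prn_ms_per_person: float = 2.0) -> float:
    """Frames per second under a linear model: fixed detection cost + per-person PRN cost."""
    frame_ms = base_ms + prn_ms_per_person * num_people
    return 1000.0 / frame_ms


for n in (1, 3, 20):
    print(n, round(fps(n), 1))
# 1 27.0
# 3 24.4
# 20 13.3
```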

My questions are:
1. Does the 35 ms include loading the image, resizing it, converting it to a tensor, normalizing it, moving it to CUDA, and model inference? Or is the 35 ms only for model inference?

2. I tested the speed of PRN and it matches the 2 ms/instance reported in the paper, but what about the time for selecting the boxes from RetinaNet, cropping the feature map for every instance, and resizing every instance?

I tried to implement the whole repo in PyTorch, but "load image, resize, to tensor ..." and "select boxes, NMS, crop heatmaps for every person" take too much time. So how long does your code take for those operations?
^-^ Thanks!
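One way to answer question 2 for one's own implementation is to time each pipeline stage separately rather than the whole loop. The sketch below uses only `time.perf_counter`; the stage functions are hypothetical placeholders to be replaced with real image loading, detection and NMS, cropping, and PRN calls. With a GPU framework such as PyTorch, each timed stage should end with a device synchronisation (for example `torch.cuda.synchronize()`) before the timer is read, because CUDA kernels run asynchronously and the measured times are otherwise misleading.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings_ms = defaultdict(list)


@contextmanager
def stage(name: str):
    """Record the wall-clock duration of the enclosed block under `name`, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings_ms[name].append((time.perf_counter() - start) * 1000.0)


# Hypothetical placeholders standing in for the real pipeline stages.
def load_and_preprocess():
    time.sleep(0.002)


def detect_and_nms():
    time.sleep(0.005)


def crop_instances():
    time.sleep(0.001)


def run_prn():
    time.sleep(0.001)


for _ in range(5):  # a few frames
    with stage("preprocess"):
        load_and_preprocess()
    with stage("detect+nms"):
        detect_and_nms()
    with stage("crop+resize"):
        crop_instances()
    with stage("prn"):
        run_prn()

for name, values in timings_ms.items():
    print(f"{name:12s} {sum(values) / len(values):6.2f} ms/frame")
```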

Run the model on webcam

@mkocabas Hey, thanks for the interesting paper. I am reviewing the code and I am wondering whether there is a function that directly returns the 2D joints. I need to run the program on a webcam to check real-time performance. Thanks.
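Going from a PRN output back to 2D joints is mostly coordinate bookkeeping. A minimal sketch, assuming the output has the same (56, 36, 17) layout as the input, that each channel's peak is taken as that joint, and that the person box is `(x, y, w, h)` in image pixels (all assumptions; the function name is hypothetical):

```python
import numpy as np


def joints_from_prn_output(out: np.ndarray, box: tuple[float, float, float, float]) -> np.ndarray:
    """Return a (K, 3) array of (x, y, score) in image pixels.

    `out` is one PRN output of assumed shape (56, 36, K); `box` is an
    assumed (x, y, w, h) person box in image pixels.
    """
    h_out, w_out, k = out.shape
    bx, by, bw, bh = box
    joints = np.zeros((k, 3), dtype=np.float32)
    for j in range(k):
        r, c = np.unravel_index(np.argmax(out[..., j]), (h_out, w_out))
        # Map the centre of the peak cell from output-grid coordinates back into the box.
        joints[j, 0] = bx + (c + 0.5) * bw / w_out
        joints[j, 1] = by + (r + 0.5) * bh / h_out
        joints[j, 2] = out[r, c, j]
    return joints


out = np.zeros((56, 36, 17), dtype=np.float32)
out[28, 18, 0] = 1.0  # joint 0 peaks at row 28, column 18
joints = joints_from_prn_output(out, (100.0, 50.0, 72.0, 112.0))
print(joints[0])  # x=137, y=107, score=1.0
```

In a webcam loop this would run once per detected person on every frame, after the detection and cropping steps.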

Will the rest of the MultiPoseNet framework be added to this?

Hi. I'm very interested in this code, but unfortunately it only seems usable once the major part of the system is assembled: the backbone, the keypoint detector, and the person detector described in the paper.

They all seem to come from different GitHub repositories. Will there be a streamlined integration to allow easy testing and reproduction?

Thanks.

A question about performance on embedded systems

Hi, great work!
I am researching applications of person detection and pose estimation in robots,
and I'm wondering whether you tested the performance of the entire model on an embedded device like the Jetson TX2.
Do you think it's possible to replace RetinaNet with SqueezeDet (https://arxiv.org/pdf/1612.01051.pdf) for detection?

Thanks!

What is the input data of PRN?

I use the input heatmap for each instance, cropped from the global heatmaps and resized to 1×K×36×56, but when I visualize the output of PRN, it is not right.
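One frequent source of wrong-looking outputs when porting to PyTorch is axis order. If the reference input is channels-last with shape (56, 36, 17), i.e. height 56 and width 36, the channels-first equivalent is (1, 17, 56, 36), obtained with a transpose; reshaping to that shape, or resizing crops to 36 × 56 with height and width swapped, silently scrambles the heatmaps. This is a guess at a possible cause, not a diagnosis of the code in question:

```python
import numpy as np

# A channels-last PRN input: height 56, width 36, 17 keypoint channels.
hwc = np.random.rand(56, 36, 17).astype(np.float32)

# Correct: move the channel axis to the front, then add a batch axis -> (1, 17, 56, 36).
nchw_ok = np.transpose(hwc, (2, 0, 1))[np.newaxis]

# Wrong: same target shape, but reshape only reinterprets the memory order.
nchw_bad = hwc.reshape(1, 17, 56, 36)

print(nchw_ok.shape, nchw_bad.shape)                 # both (1, 17, 56, 36)
print(np.array_equal(nchw_ok[0, 5], hwc[:, :, 5]))   # True: channel 5 intact
print(np.array_equal(nchw_bad[0, 5], hwc[:, :, 5]))  # False: channel 5 scrambled
```

On a PyTorch tensor the equivalent of the correct version is `x.permute(2, 0, 1).unsqueeze(0)`.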

Code Release

The method and results shown in the paper are amazing. When do you plan to release the code?

Congratulations on your work!