mks0601 / TF-SimpleHumanPose
TensorFlow implementation of "Simple Baselines for Human Pose Estimation and Tracking", ECCV 2018
Are there any other indicators, besides the epoch loss, that tell me when to stop training?
The epoch loss fluctuates heavily, and it's hard to determine whether the model has converged or not.
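One common heuristic is to smooth the noisy per-epoch losses and stop when the smoothed curve plateaus; a minimal sketch (all names are hypothetical, not part of this repo):

# Smooth noisy per-epoch losses with an exponential moving average
# and stop when the smoothed value has stopped improving.
def smoothed(losses, alpha=0.9):
    ema, out = None, []
    for l in losses:
        ema = l if ema is None else alpha * ema + (1 - alpha) * l
        out.append(ema)
    return out

def should_stop(losses, patience=10, tol=1e-3):
    s = smoothed(losses)
    if len(s) <= patience:
        return False
    return s[-patience - 1] - s[-1] < tol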
Hey,
Thanks for the pre-trained models.
I want to do transfer learning on top of them with my custom dataset.
I have the custom training images annotated in COCO format and saved as JSON.
Can you please guide me on how to proceed from here?
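As a first step, it may help to sanity-check the custom JSON with pycocotools before wiring it into the dataset code; a minimal sketch (the file path is hypothetical):

from pycocotools.coco import COCO

coco = COCO('custom_keypoints.json')  # hypothetical path
img_ids = coco.getImgIds()
print('images:', len(img_ids))
for img_id in img_ids[:3]:
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
    # COCO keypoints are flat [x1, y1, v1, x2, y2, v2, ...] lists.
    for ann in anns:
        assert len(ann['keypoints']) % 3 == 0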
Hello, I have trained my dataset with this code, but the result is very strange; it seems the model can't learn anything. I have checked my dataset carefully and it is correct. Could you give me some advice on how to solve this? The result below is from about 40 epochs of training. Any suggestions would help me a lot, thank you very much.
Evaluate annotation type keypoints
DONE (t=52.07s).
Accumulating evaluation results...
DONE (t=11.49s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.001
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.001
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.001
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.001
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.057
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.080
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.060
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.056
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.061
Hello, I want to use your code to run inference on a single image, and I followed the code in test.py. But when I call tester = Tester(Model(), cfg), I get this error:
TypeError: Can't instantiate abstract class Tester with abstract methods _make_data
Do you know why, and how can I fix this? Thanks.
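The error itself is standard Python behavior: a class that inherits an unimplemented @abstractmethod cannot be instantiated. A self-contained illustration (the names only mirror the error message, not the repo's actual Tester):

from abc import ABCMeta, abstractmethod

class Base(metaclass=ABCMeta):
    @abstractmethod
    def _make_data(self):
        pass

# Base()  # TypeError: Can't instantiate abstract class Base ...

class Concrete(Base):
    def _make_data(self):
        return []  # supply or stub out the data pipeline here

Concrete()  # fine once _make_data is implemented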
Hi, you write very good code, and recently I have been working on PoseTrack (all three tasks) too.
I fine-tuned MSRA's PyTorch code (Simple Baselines) on the PoseTrack 2018 dataset, with all parameters set according to the paper.
Using ground-truth boxes on the PoseTrack 2018 dataset for human pose estimation, I get results that are worse than the ones you reported:
my eval results:
Loading data
# gt frames : 3902
# pred frames: 3902
Evaluation of per-frame multi-person pose estimation
saving results to ./out/total_AP_metrics.json
Average Precision (AP) metric:
& Head & Shou & Elb & Wri & Hip & Knee & Ankl & Total \\
& 49.4 & 89.2 & 83.9 & 76.1 & 81.0 & 80.3 & 74.6 & 74.6 \\
Can you tell me how to fine-tune the model? I want to get the same results as you!
Thank you!
BTW, do you plan to implement the tracking code from this paper (Simple Baselines for Human Pose Estimation and Tracking)? I am working on that part ...
I got the 12 GB MPII dataset and ran mpii2coco.py, but got this error:
FileNotFoundError: [Errno 2] No such file or directory: '../images/015601864.jpg'
I checked the images, and 015601864.jpg is indeed missing.
How can I solve this?
Has anyone tried using the weights from the original PyTorch repo with this code?
The data folder has 3 subfolders, each with a dataset.py in it. I tried running the code, but I couldn't get the datasets. The train and test code also does not work without the data.
In the original paper, the encoder part is frozen. Is there a reason why you set trainable to True in the encoder (ResNet-50) of this code?
Thanks.
I am sorry to ask this question, as I barely know TF. Could you please describe what you did and what parameters you used during training? Did you follow the training technique used on PoseTrack 17 in the original paper?
Hello, I am confused about the test results. As we know, every keypoint consists of 3 parts: (x, y, v), where v is the visibility flag. But I found that in your test results (https://cv.snu.ac.kr/research/TF-SimpleHumanPose/COCO/pose_result/person_keypoints_256x192_resnet50_val2017_results.json), v is always 1, and the test code sets v=1 for all keypoints.
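For context, the COCO keypoint evaluator scores detections with OKS computed from the predicted (x, y) and the ground-truth visibility only; the visibility slot of a detection is never read, so writing v=1 everywhere does not affect the AP numbers. A sketch of one result entry (the values are hypothetical):

# One COCO keypoint result. 'keypoints' is a flat
# [x1, y1, v1, x2, y2, v2, ...] list of 17 triplets; the v slots of
# a detection are ignored by COCOeval, so 1 is a safe filler.
result = {
    'image_id': 397133,  # hypothetical id
    'category_id': 1,    # person
    'keypoints': [v for kp in [(102.5, 80.3, 1)] * 17 for v in kp],
    'score': 0.98,
}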
Simple Baselines for Human Pose Estimation and Tracking
I have not changed anything in the code and am using the configurations described in the project. Training finished, but now I am getting an error. Does anyone have any idea? The traceback is:
Traceback (most recent call last):
  File "test.py", line 24, in <module>
    from nms.nms import oks_nms
  File "/datadrive/common/Ishrak/Integral_Pose_estimation/TF-SimpleHumanPose/main/../lib/nms/nms.py", line 14, in <module>
    from .gpu_nms import gpu_nms
ImportError: /datadrive/common/Ishrak/Integral_Pose_estimation/TF-SimpleHumanPose/main/../lib/nms/gpu_nms.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cudaRegisterFatBinaryEnd
Best
Ishrak
Technos Data Science Engineering Inc, Tokyo
I ran the test.py program on the COCO test2017 dataset, with human_detection_test-dev2017.json renamed to human_detection.json. However, all the Average Precision and Average Recall metrics in the output are -1. Here is the snapshot:
I am unsure whether the problem is that human_detection_test-dev2017.json is not the right file to use.
Could you please help? Thanks in advance.
Hi,
I wonder who the author of the tfflat modules is. I have seen similar code in another repo. It reminds me a bit of the tensorpack style, but simpler. Is it part of some general training library? Just curious.
Regards,
Hi @mks0601, I used the provided 256x192_resnet50 model and human detection results to test the accuracy on the PoseTrack 2018 dataset. The result is as follows:
Loading data
gt frames : 8923
pred frames: 8923
Evaluation of per-frame multi-person pose estimation
saving results to ./out/total_AP_metrics.json
Average Precision (AP) metric:
& Head & Shou & Elb & Wri & Hip & Knee & Ankl & Total \\
& 39.1 & 41.3 & 39.6 & 35.6 & 38.5 & 38.9 & 36.5 & 38.5 \\
Furthermore, I downloaded the pose results from the provided link (https://cv.snu.ac.kr/research/TF-SimpleHumanPose/PoseTrack/pose_result/person_keypoints_256x192_resnet50_val_results.zip), and the number of included JSON files is only 74, while the val set has 170 ground-truth files, so the evaluation cannot be completed. I am not sure where the problem is. Could you give me some suggestions? Thanks in advance.
def make_network(self, is_train):
    backbone = eval(cfg.backbone)
    resnet_fms = backbone(image, is_train, bn_trainable=True)
    heatmap_outs = self.head_net(resnet_fms, is_train)
What does eval do in backbone = eval(cfg.backbone)? I do not understand this line of code.
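For context, Python's built-in eval() evaluates a string as an expression, so eval(cfg.backbone) turns the configured string (e.g. 'resnet50') into the function of that name in the current scope. A standalone illustration:

# eval() turns a configuration string into the object with that
# name in the current namespace.
def resnet50():
    return 'resnet50 feature extractor'

backbone_name = 'resnet50'      # what cfg.backbone would hold
backbone = eval(backbone_name)  # resolves to the function resnet50
print(backbone())

# An explicit registry is a safer, equivalent pattern:
BACKBONES = {'resnet50': resnet50}
backbone = BACKBONES[backbone_name]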
Do you have a pre-trained model for ResNet-152?
Hi,
Can I get the input stream from OpenCV?
Thank you very much.
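In principle yes, since the model just consumes image tensors; a hedged sketch of pulling frames from a webcam with OpenCV (preprocess_and_predict is a hypothetical placeholder, not a function in this repo):

import cv2

cap = cv2.VideoCapture(0)  # 0 = default camera
while True:
    ok, frame_bgr = cap.read()
    if not ok:
        break
    # OpenCV delivers BGR; convert before feeding an RGB-trained net.
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    # poses = preprocess_and_predict(frame_rgb)  # hypothetical
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()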
Hi! Thanks for your repo.
The number of joints in PoseTrack is a little different from COCO. How did you handle the missing joints like left_ear and right_ear, and the different joints head_bottom and head_top?
Hi,
Could someone explain why we need tf.stop_gradient applied to gt_heatmap? I mean this line from model.py:
gt_heatmap = tf.stop_gradient(self.render_gaussian_heatmap(target_coord, cfg.output_shape, cfg.sigma))
I don't see any dependency on tf.Variables in the render_gaussian_heatmap() graph.
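For background, tf.stop_gradient marks a tensor as a constant for backpropagation: even if nothing upstream is a tf.Variable today, the call guarantees no gradient ever flows into the heatmap-rendering branch and documents that the rendered target is ground truth, not a learnable quantity. A minimal TF 1.x illustration:

import tensorflow as tf  # TF 1.x, matching this repo

x = tf.placeholder(tf.float32, [None])
target = tf.stop_gradient(2.0 * x)  # treated as a constant by backprop
loss = tf.reduce_sum(target * x)
# The gradient flows only through the second x; the 2*x branch is
# cut, so d(loss)/dx is just `target` rather than 4*x.
grads = tf.gradients(loss, [x])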
Hi mks,
First of all, thanks for making such an amazing implementation of Simple Baselines. However, I encountered an OOM issue while training the model: the longer I train, the more memory is occupied. The way I fixed it was to simply call graph.finalize() before the training loop.
Please let me know if you need more info.
Thanks
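For anyone hitting the same issue: graph.finalize() makes the TF 1.x graph read-only, so any code that accidentally adds ops inside the training loop (a common cause of steadily growing memory) raises a RuntimeError immediately instead of leaking. A minimal standalone illustration:

import tensorflow as tf  # TF 1.x

x = tf.placeholder(tf.float32, [None])
loss = tf.reduce_mean(tf.square(x))

sess = tf.Session()
sess.graph.finalize()  # graph is now read-only

for step in range(100):
    sess.run(loss, feed_dict={x: [1.0, 2.0]})  # fine: no new ops
    # tf.square(x)  # would raise: Graph is finalized and cannot be modified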
Hi mks,
I am running inference by loading the ckpt file, and the input node name I chose is "tower_0/Placeholder", based on the tester.predict function you wrote. The ideal shape should be (None, 256, 192, 3), but I don't know why the shape I get is (32, 256, 192, 3). The code follows.
saver = tf.train.import_meta_graph('location of meta file',clear_devices=True)
sess = tf.Session()
saver.restore(sess, 'location of ckpt file')
_input = sess.graph.get_tensor_by_name("tower_0/Placeholder:0")
print(_input)
<tf.Tensor 'tower_0/Placeholder:0' shape=(32, 256, 192, 3) dtype=float32>
Any idea to change the batch size to None? Thanks
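One way to get a dynamic batch dimension is to build a fresh placeholder with None as the first dimension and remap the saved graph's input through the input_map argument of tf.train.import_meta_graph. A sketch with hypothetical checkpoint paths:

import tensorflow as tf  # TF 1.x

# Build a replacement input with a dynamic batch dimension and map
# the stored graph's placeholder onto it while importing.
new_input = tf.placeholder(tf.float32, [None, 256, 192, 3], name='input')
saver = tf.train.import_meta_graph(
    'model.ckpt.meta',  # hypothetical path
    clear_devices=True,
    input_map={'tower_0/Placeholder:0': new_input})

sess = tf.Session()
saver.restore(sess, 'model.ckpt')  # hypothetical path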
Hi, can you suggest a model to create the human detection JSON? We are Technos Data Science Inc. from Japan.
Hi,
Thank you very much for the work. Can you give an example of how to run inference on an image with your pretrained weights? Thanks a lot.
Thanks for your great work.
I would like to know how to run it on my own images or video, and what the speed is like.
Hello!
What is the license of usage?
I can't see the visualization in the vis output after running the test. The test generated results and saved them in result.pkl. To bypass the __cudaRegisterFatBinaryEnd error, I commented out this line from /lib/nms/nms.py:
from .gpu_nms import gpu_nms
Do you have any suggestions for why the vis folder is empty?
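Rather than deleting the import outright, a hedged alternative is a try/except fallback to a CPU path when the compiled GPU extension fails to load; this assumes a CPU implementation (e.g. a cpu_nms module) is available next to gpu_nms, which may not match this repo exactly:

# Prefer the compiled GPU NMS; fall back to CPU if the .so fails to
# load (e.g. the __cudaRegisterFatBinaryEnd symbol error).
try:
    from .gpu_nms import gpu_nms
    def run_nms(dets, thresh):
        return gpu_nms(dets, thresh, device_id=0)
except ImportError:
    from .cpu_nms import cpu_nms  # assumed to exist alongside gpu_nms
    def run_nms(dets, thresh):
        return cpu_nms(dets, thresh)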
Thanks for your work! Is it possible to upload the pre-trained models to Google Drive? The currently provided link is too slow for downloading (less than 100 KB/s here in Germany).
Hello. You seem to be Korean, so I am asking in Korean. I am trying to run this project on macOS but am having difficulties, so I have a few questions. Thank you :-)
Hello, I have two questions.
I am trying to test on COCO val2017 by restoring the pre-trained model you uploaded, but I get an error in MultiGPUFunc.work() in test.py. Can snapshot_140.ckpt only be restored in a multi-GPU environment (2 GPUs)?
Additionally, when I run test.py, load_pkl raises an error saying the tmp_result_0.pkl file does not exist.
I wonder whether this is related to the issue above, or whether there is a file I have not added.
Thank you.
How do I get the human detection JSON? Does the MPII dataset have it?
Does it mean getting the people bounding boxes from the images?
I want to test a model trained on MPII; where can I get human_detection.json for MPII?
In the config, the learning rate code is:
if epoch < self.lr_dec_epoch[-1]:
    i = self.lr_dec_epoch.index(e)
What is lr_dec_epoch.index(e)? lr_dec_epoch is [90, 120], but what does index(e) return?
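For reference, list.index(e) returns the position of e in the list, which the config uses to decide how many times the learning rate has been decayed. A standalone sketch of this step-decay logic (the 10x decay factor is an assumption mirroring typical Simple Baselines schedules, not read from this repo):

# lr_dec_epoch.index(e) gives the position of e in the list:
# [90, 120].index(90) == 0, [90, 120].index(120) == 1.
lr = 5e-4
lr_dec_epoch = [90, 120]
lr_dec_factor = 10  # assumed decay factor

def get_lr(epoch):
    for e in lr_dec_epoch:
        if epoch < e:
            i = lr_dec_epoch.index(e)
            return lr / (lr_dec_factor ** i)
    return lr / (lr_dec_factor ** len(lr_dec_epoch))

print(get_lr(0), get_lr(100), get_lr(130))  # 0.0005 5e-05 5e-06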
Did you implement Flow Based Pose Tracking algorithm in the paper?
Could you please share the code?
Hi, I just tried to change the input image size from 256x192 to 384x288, which resulted in 62 AP. No idea why this happens. All I did was download the model and modify the image-size parameter in the config file.
Hello,
I am trying to run test.py on the provided pre-trained model 256x192_resnet50_coco, but I am getting the following errors:
Though I don't really understand what it does yet, I skimmed over the code and you seemed to be trying to delete a pickle file. As I don't have that pickle file, I tried to bypass those blocks of code by setting the argument dump_method=2 in test.py, like so: MultiGPUFunc = MultiProc(len(args.gpu_ids.split(',')), func, dump_method=2), but then I encountered another error, as follows:
The command I used to run the program is python test.py --gpu 0 --test_epoch 140, as I have only 1 GPU card. I also tried changing the gpu argument to 1 and to 0-1, but the error is still the same.
As an aside, I am not sure which file is right for dets/human_detection.json, so I used human_detection_test-dev2017.json (the human detection result on test-dev2017, 57.2 AP on the human class) and renamed it to human_detection.json. I wonder if that is the cause of the error?
I need your help. Thanks in advance.
Thank you for your work; you are very cool. I have looked at both of your projects, and I have some questions about this repo. Does this repo only implement 2D human pose estimation with the bounding boxes already in place? The Simple Baselines paper has a lot of material about detector boxes, box propagation, optical-flow tracking, and so on, which neither the original repo nor yours seems to cover. So the tracking part of the algorithm is not open source?
There are some unused dependencies that should be removed from requirement.txt, and some dependencies that should be added.
Here is my conda list; it works, but may contain unused dependencies. You can copy it to a file and use conda env create -n TF-SimpleHumanPose -f your_file_name to create an identical conda environment. Hope it will be helpful:
name: vac_py3_n
channels:
- conda-forge
- anaconda
- defaults
dependencies:
- _tflow_180_select=1.0=gpu
- absl-py=0.6.1=py36_0
- arrow-cpp=0.11.1=py36h5c3f529_0
- astor=0.7.1=py36_0
- bleach=1.5.0=py36_0
- c-ares=1.15.0=h7b6447c_1
- cudatoolkit=9.0=h13b8566_0
- cudnn=7.1.2=cuda9.0_0
- cupti=9.0.176=0
- gast=0.2.0=py36_0
- gflags=2.2.2=he6710b0_0
- glog=0.3.5=hf484d3e_1
- html5lib=0.9999999=py36_0
- intel-openmp=2019.1=144
- keras-applications=1.0.6=py36_0
- keras-base=2.2.4=py36_0
- keras-gpu=2.2.4=0
- keras-preprocessing=1.0.5=py36_0
- libboost=1.67.0=h46d08c1_4
- libedit=3.1.20170329=h6b74fdf_2
- libffi=3.2.1=hd88cf55_4
- libgcc-ng=8.2.0=hdf63c60_1
- libgfortran-ng=7.3.0=hdf63c60_0
- libopenblas=0.3.3=h5a2b251_3
- libprotobuf=3.6.1=hd408876_0
- libsodium=1.0.16=h1bed415_0
- libstdcxx-ng=8.2.0=hdf63c60_1
- lz4-c=1.8.1.2=h14c3975_0
- markdown=3.0.1=py36_0
- mkl=2019.1=144
- ncurses=6.1=he6710b0_1
- numpy-base=1.15.4=py36h2f8d375_0
- olefile=0.46=py36_0
- pandas=0.23.4=py36h04863e7_0
- pillow=5.3.0=py36h34e0f95_0
- pip=18.1=py36_0
- protobuf=3.6.1=py36he6710b0_0
- pyarrow=0.11.1=py36he6710b0_0
- python-dateutil=2.7.5=py36_0
- pytz=2018.7=py36_0
- pyyaml=3.13=py36h14c3975_0
- pyzmq=17.1.2=py36h14c3975_0
- readline=7.0=h7b6447c_5
- setuptools=40.6.2=py36_0
- six=1.11.0=py36_1
- snappy=1.1.7=hbae5bb6_3
- tabulate=0.8.2=py36_0
- tensorboard=1.8.0=py36hf484d3e_0
- tensorflow=1.8.0=hb11d968_0
- tensorflow-base=1.8.0=py36hc1a7637_0
- tensorflow-gpu=1.8.0=h7b35bdc_0
- termcolor=1.1.0=py36_1
- tk=8.6.8=hbc83047_0
- tqdm=4.28.1=py36h28b3542_0
- werkzeug=0.14.1=py36_0
- wheel=0.32.3=py36_0
- xz=5.2.4=h14c3975_4
- yaml=0.1.7=had09818_2
- zeromq=4.2.5=hf484d3e_1
- zlib=1.2.11=h7b6447c_3
- zstd=1.3.3=h84994c4_0
- atk=2.25.90=hf2eb9ee_1001
- blas=1.1=openblas
- boost-cpp=1.68.0=h11c811c_1000
- bzip2=1.0.6=h14c3975_1002
- ca-certificates=2018.11.29=ha4d7672_0
- cairo=1.14.12=h80bd089_1005
- certifi=2018.11.29=py36_1000
- cycler=0.10.0=py_1
- dbus=1.13.0=h4e0c4b3_1000
- expat=2.2.5=hf484d3e_1002
- ffmpeg=4.1=h6dce934_1000
- fontconfig=2.13.1=h2176d3f_1000
- freetype=2.9.1=h3cfcefd_1004
- gdk-pixbuf=2.36.12=h4f1c04b_1001
- gettext=0.19.8.1=h9745a5d_1001
- giflib=5.1.4=h14c3975_1001
- glib=2.56.2=had28632_1001
- gmp=6.1.2=hf484d3e_1000
- gnutls=3.6.5=hd3a4fd2_1001
- gobject-introspection=1.56.1=py36h9e29830_1001
- graphite2=1.3.13=hf484d3e_1000
- grpcio=1.16.0=py36h4f00d22_1000
- gst-plugins-base=1.12.5=h3865690_1000
- gstreamer=1.12.5=h0cc0488_1000
- gtk2=2.24.31=h5baeb44_1000
- h5py=2.9.0=py36h31fdc65_1000
- harfbuzz=1.9.0=he243708_1001
- hdf5=1.10.4=nompi_h11e915b_1105
- icu=58.2=hf484d3e_1000
- jasper=1.900.1=h07fcdf6_1005
- jpeg=9c=h14c3975_1001
- kiwisolver=1.0.1=py36h6bb024c_1002
- libevent=2.0.22=hb7f436b_1002
- libiconv=1.15=h14c3975_1004
- libpng=1.6.36=h84994c4_1000
- libtiff=4.0.10=h648cc4a_1001
- libuuid=2.32.1=h14c3975_1000
- libwebp=1.0.1=h576950b_1000
- libxcb=1.13=h14c3975_1002
- libxml2=2.9.8=h143f9aa_1005
- matplotlib=3.0.2=py36h8a2030e_1001
- matplotlib-base=3.0.2=py36h167e16e_1001
- mkl_fft=1.0.10=py36_0
- mkl_random=1.0.2=py36_0
- msgpack-python=0.6.0=py36h6bb024c_1000
- nettle=3.4.1=h14c3975_1002
- numpy=1.16.0=py36_blas_openblash1522bff_1000
- openblas=0.3.3=h9ac9557_1001
- opencv=3.4.4=py36_blas_openblash85ad109_1203
- openh264=1.8.0=hdbcaa40_1000
- openssl=1.0.2p=h14c3975_1002
- pango=1.40.14=hf0c64fd_1003
- parsedatetime=2.4=py_1
- pcre=8.41=hf484d3e_1003
- pixman=0.34.0=h14c3975_1003
- pthread-stubs=0.4=h14c3975_1001
- pyparsing=2.3.1=py_0
- pyqt=5.6.0=py36h13b7fb3_1008
- python=3.6.6=hd21baee_1003
- qt=5.6.2=hf516382_1011
- scikit-learn=0.20.2=py36_blas_openblashebff5e3_1400
- scipy=1.2.0=py36_blas_openblash1522bff_1200
- setproctitle=1.1.10=py36h14c3975_1001
- sip=4.18.1=py36hf484d3e_1000
- thrift-cpp=0.11.0=h23e226f_1003
- tornado=5.1.1=py36h14c3975_1000
- x264=1!152.20180717=h14c3975_1001
- xorg-kbproto=1.0.7=h14c3975_1002
- xorg-libice=1.0.9=h14c3975_1004
- xorg-libsm=1.2.3=h4937e3b_1000
- xorg-libx11=1.6.6=h14c3975_1000
- xorg-libxau=1.0.8=h14c3975_1006
- xorg-libxdmcp=1.1.2=h14c3975_1007
- xorg-libxext=1.3.3=h14c3975_1004
- xorg-libxrender=0.9.10=h14c3975_1002
- xorg-libxt=1.1.5=h14c3975_1002
- xorg-renderproto=0.11.1=h14c3975_1002
- xorg-xextproto=7.3.0=h14c3975_1002
- xorg-xproto=7.0.31=h14c3975_1007
- cython=0.29.2=py36he6710b0_0
- sqlite=3.26.0=h7b6447c_0
- pip:
- keras==2.2.4
- msgpack==0.6.0
- msgpack-numpy==0.4.4.2
prefix: /data1/anaconda3/envs/vac_py3_n
blocks = [
    resnet_utils.Block('block1', bottleneck,
        [(256, 64, 1)] * 2 + [(256, 64, 1)]),
    resnet_utils.Block('block2', bottleneck,
        [(512, 128, 2)] + [(512, 128, 1)] * 3),
    resnet_utils.Block('block3', bottleneck,
        [(1024, 256, 2)] + [(1024, 256, 1)] * 5),
    resnet_utils.Block('block4', bottleneck,
        [(2048, 512, 2)] + [(2048, 512, 1)] * 2)
]
Do these blocks make up ResNet-50? And why does the code compute net, net2, net3, and net4 and then return them all:
resnet_features = [net, net2, net3, net4]
return resnet_features
What does this mean?
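For context on the Block tuples: in tf-slim's resnet_utils, each Block is (scope, unit_fn, args), and every entry in args is (depth, depth_bottleneck, stride) for one bottleneck unit. Counting the units recovers the standard ResNet-50 layout, and returning [net, net2, net3, net4] simply exposes the feature map after each of the four stages, even if the pose head ends up using only the last one. A small standalone illustration:

# Each tuple in a slim resnet_utils.Block is
# (output depth, bottleneck depth, stride). Counting units per block
# gives the ResNet-50 layout: 3 + 4 + 6 + 3 bottleneck units.
block_args = {
    'block1': [(256, 64, 1)] * 2 + [(256, 64, 1)],
    'block2': [(512, 128, 2)] + [(512, 128, 1)] * 3,
    'block3': [(1024, 256, 2)] + [(1024, 256, 1)] * 5,
    'block4': [(2048, 512, 2)] + [(2048, 512, 1)] * 2,
}
for name, args in sorted(block_args.items()):
    print(name, 'units:', len(args), 'first stride:', args[0][2])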
Hello, thanks for your work.
I am currently struggling to get the same results as those announced on the GitHub page.
Here are the results I get using your code and your pretrained model :
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.278
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.666
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.172
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.290
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.285
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.341
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.726
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.271
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.324
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.365
With the useGTbbox flag on, I get :
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.281
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.673
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.169
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.292
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.284
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.324
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.700
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.253
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.314
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.340
Is there something I misunderstood about the way to evaluate the model?
Should I pass the result.pkl into the COCO API to get new scores?
Thank you very much for your time.
Hello, and thank you for this amazing work.
I have an image, and I used a pre-trained object detection model to get class IDs, scores, and bounding boxes.
Now, using these, how can I apply the pose estimation model?
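For context, the usual pipeline is: crop each detected person box, resize the crop to the network's input size, run the pose model on the crop, then map the keypoints back to image coordinates. A hedged sketch (run_pose_model is a hypothetical stand-in for this repo's tester, and the aspect-ratio handling is simplified):

import cv2

def pose_from_bbox(image, bbox, input_hw=(256, 192)):
    # bbox = (x, y, w, h) in pixels, as produced by the detector.
    x, y, w, h = bbox
    crop = image[int(y):int(y + h), int(x):int(x + w)]
    inp = cv2.resize(crop, (input_hw[1], input_hw[0]))
    kps = run_pose_model(inp[None])  # hypothetical, (1, K, 2) in crop coords
    # Map keypoints from network-input coordinates back to the image.
    kps[..., 0] = kps[..., 0] * (w / input_hw[1]) + x
    kps[..., 1] = kps[..., 1] * (h / input_hw[0]) + y
    return kps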