
vitpose_pytorch's Introduction

ViTPose (simple version w/o mmcv)

An unofficial implementation of ViTPose [Y. Xu et al., 2022]
[result image]

Usage

| Inference

python inference.py --image-path './examples/img1.jpg'

| Training

python train.py --config-path config.yaml --model-name 'b'
  • model_name must be one of: b, l, h

Note

  1. Download the trained model (.pth)
  2. Set the config according to the trained model
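
A rough loading sketch for these two steps (the checkpoint filename and its nesting are assumptions; adjust them to the file you downloaded):

import torch

ckpt = torch.load('vitpose-b.pth', map_location='cpu')   # hypothetical filename
state_dict = ckpt.get('state_dict', ckpt)                # some checkpoints nest the weights
# model.load_state_dict(state_dict)                      # `model`: ViTPose built from the matching config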

Reference

All code was written with reference to the official ViTPose repo.

vitpose_pytorch's People

Contributors

jaehyunnn

vitpose_pytorch's Issues

Python version?

Hi, it is not clear what version of Python you used for the project. Could you please add it to the README?

Config File Issues while training

Hey @jaehyunnn,
I was trying to finetune ViTPose-L on my own custom dataset using your repository. While doing that, I encountered the following error: "lr_mult = cfg.optimizer['paramwise_cfg']['layer_decay_rate']. KeyError: 'paramwise_cfg'". This error occurred in the train_model function, where the optimizer's config values are accessed. However, this file was imported directly from the official ViTPose repository, and I think they use other base config files (e.g. one for COCO) to set all these attributes. I would like to know how to resolve the issue, or whether I should just create a new yaml file for the ViT-L version and use that directly.

Thanking you in advance,
Bavesh Balaji
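
One possible workaround, sketched below, is to read the optimizer config defensively instead of indexing it unconditionally. The key names are taken from the error message above; the default value is an assumption:

def get_layer_decay_rate(optimizer_cfg: dict, default: float = 1.0) -> float:
    # Fall back to `default` when the YAML has no paramwise_cfg section,
    # instead of raising KeyError as the current train_model code does.
    return optimizer_cfg.get('paramwise_cfg', {}).get('layer_decay_rate', default)

# e.g.: lr_mult = get_layer_decay_rate(cfg.optimizer)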

Learning rate drops to 0?

🏋️> Epoch [000/210] | Loss 0.0029 | LR 0.00000000 | Step: 100%|██████████| 1827/1827 [03:38<00:00, 8.38it/s]
2023-04-24 21:32:07,523 - utils - INFO - [Summary-train] Epoch [000/210] | Average Loss (train) 0.0047 --- 218.09494 sec. elapsed
2023-04-24 21:32:40,968 - utils - INFO - [Summary-valid] Epoch [000/210] | Average Loss (valid) 0.0032 --- 32.84739 sec. elapsed
🏋️> Epoch [001/210] | Loss 0.0037 | LR 0.00000000 | Step: 100%|██████████| 1827/1827 [03:08<00:00, 9.69it/s]
2023-04-24 21:35:49,590 - utils - INFO - [Summary-train] Epoch [001/210] | Average Loss (train) 0.0034 --- 188.62070 sec. elapsed
2023-04-24 21:36:15,265 - utils - INFO - [Summary-valid] Epoch [001/210] | Average Loss (valid) 0.0032 --- 25.22072 sec. elapsed

Hi, when training on COCO the learning rate drops to 0 right after the first epoch. Could you advise on what might be causing this? Thanks!
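
One way to narrow this down, as a generic debugging sketch (standard torch.optim API, not this repo's actual scheduler setup): log the LR the scheduler actually produces each step. A warmup/decay schedule stepped at the wrong granularity (per epoch vs. per iteration) can collapse the LR to ~0 after one epoch.

import torch

model = torch.nn.Linear(2, 2)  # stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=500)
for step in range(3):
    optimizer.step()
    scheduler.step()  # step per iteration, not once per epoch
    print(step, optimizer.param_groups[0]['lr'])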

RuntimeError: Given groups=1, weight of size [768, 3, 16, 16], expected input[1, 1, 256, 192] to have 3 channels, but got 1 channels instead

#========= [Train Configs] =========#
# - Num GPUs: 1
# - Batch size (per gpu): 1
# - LR:  0.000063
# - Num params: 89,994,513
# - AMP: True
#===================================# 

🏋️> Epoch [000/210] | Loss 0.0014 | LR 0.000006 | Step: 0%| | 51/149813 [00:08<6:55:19, 6.01it/s]
Traceback (most recent call last):
  File "/home/ubuntu/mxy/workspace/ViTPose_pytorch/train.py", line 163, in <module>
    main()
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/ubuntu/mxy/workspace/ViTPose_pytorch/train.py", line 150, in main
    train_model(
  File "/home/ubuntu/mxy/workspace/ViTPose_pytorch/utils/train_valid_fn.py", line 131, in train_model
    outputs = model(images)
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/mxy/workspace/ViTPose_pytorch/models/model.py", line 24, in forward
    return self.keypoint_head(self.backbone(x))
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/mxy/workspace/ViTPose_pytorch/models/backbone/vit.py", line 399, in forward
    x = self.forward_features(x)
  File "/home/ubuntu/mxy/workspace/ViTPose_pytorch/models/backbone/vit.py", line 379, in forward_features
    x, (Hp, Wp) = self.patch_embed(x)
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/mxy/workspace/ViTPose_pytorch/models/backbone/vit.py", line 226, in forward
    x = self.proj(x)
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [768, 3, 16, 16], expected input[1, 1, 256, 192] to have 3 channels, but got 1 channels instead
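
The patch embedding is a Conv2d that expects 3-channel input, so grayscale samples need to be converted to RGB before batching. A minimal sketch of two possible fixes (the file path and tensor shapes are illustrative):

from PIL import Image
import torch

# PIL route: force 3 channels when loading the image.
img = Image.open('example.jpg').convert('RGB')

# Tensor route: if the dataloader already yields (N, 1, H, W) batches.
def to_three_channels(x: torch.Tensor) -> torch.Tensor:
    return x.repeat(1, 3, 1, 1) if x.shape[1] == 1 else x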

Wrong preprocessing

The training-time preprocessing is missing during inference: images are only scaled to the [0, 1] range but not normalized with the ImageNet mean / std.

Inference code:

img_tensor = transforms.Compose([
    transforms.Resize((img_size[1], img_size[0])),
    transforms.ToTensor(),
])

Training code:

self.transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

Reference: this was originally reported in an issue on my fork: JunkyByte/easy_ViTPose#12
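
A minimal sketch of an inference transform that mirrors the training preprocessing above (the (w, h) ordering of img_size is an assumption based on the inference snippet):

from torchvision import transforms

img_size = (192, 256)  # (w, h), illustrative
img_transform = transforms.Compose([
    transforms.Resize((img_size[1], img_size[0])),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),  # same stats as training
])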

Wholebody inference

Hi, thank you for sharing a nice codebase.

Is there a way to get COCO WholeBody keypoints in this repo, like in the original ViTPose?

License

Thanks for the awesome work. I would like to know if this work follows the same Apache license as the original ViTPose repository. Thanks in advance :)

Other data + other joints

Very nice repo! I am wondering if you have a process for modifying the joint definitions as well as the input size for different data. We are looking at keypointing a variety of objects, and I am wondering what to modify to allow for that: basically, how to change the keypoints as well as the input/output size of the network.

Thank you and awesome work!
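
For context, in a heatmap-based head like ViTPose's, the final conv emits one heatmap per keypoint, so retargeting to K custom joints mainly means changing that output channel count (and matching K in the dataset annotations). A hypothetical sketch; the 256 input channels are an assumption, not this repo's actual head width:

import torch.nn as nn

num_keypoints = 25  # example: a custom joint set
final_layer = nn.Conv2d(in_channels=256, out_channels=num_keypoints, kernel_size=1)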

About accuracy.

Thanks for your code contributions. I would like to ask whether your repo can be trained to the same accuracy as the original ViTPose repo.

Converting official checkpoints to the format used in this repo

Hello and thank you for this very useful implementation (I have managed to run it on Jetson devices, unlike the official implementation).

I have tried to use the official weights and configs for other models, provided here: https://github.com/ViTAE-Transformer/ViTPose, but I get key-mismatch errors such as:

RuntimeError: Error(s) in loading state_dict for ViTPose:
	Unexpected key(s) in state_dict: "backbone.cls_token". 

If you could provide info about the parameter names you used in your model, I think I could work out something similar to this conversion script: open-mmlab/mmsegmentation#1473

Best
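
Until a proper conversion script exists, one hedged sketch is to drop the keys this re-implementation does not define (such as "backbone.cls_token" from the error above) and load the rest non-strictly, so remaining name mismatches are reported instead of raising:

import torch

def load_official_weights(model: torch.nn.Module, path: str) -> None:
    ckpt = torch.load(path, map_location='cpu')
    state_dict = ckpt.get('state_dict', ckpt)  # mm-style checkpoints nest under 'state_dict'
    state_dict = {k: v for k, v in state_dict.items() if not k.endswith('cls_token')}
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print('missing:', missing)
    print('unexpected:', unexpected)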

about 'train_custom' coco dataset

Hello,
The annotations file, person_keypoints_train_custom.json, is not mentioned in the README. It appears to be a modified version of person_keypoints_train2017.json. What changes were made? Could you please upload the custom files?

Thanks for your time.
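
For reference, a custom annotations file would normally follow the standard COCO keypoints schema. A hypothetical skeleton of that structure, written as a Python dict with all values illustrative:

annotation_file = {
    'images': [{'id': 1, 'file_name': 'img1.jpg', 'width': 640, 'height': 480}],
    'annotations': [{
        'id': 1, 'image_id': 1, 'category_id': 1,
        'num_keypoints': 17,
        'keypoints': [0] * (17 * 3),         # (x, y, visibility) per joint
        'bbox': [10.0, 20.0, 100.0, 200.0],  # x, y, w, h
        'area': 20000.0, 'iscrowd': 0,
    }],
    'categories': [{'id': 1, 'name': 'person',
                    'keypoints': ['nose', 'left_eye'],   # ...17 names in total for COCO
                    'skeleton': [[16, 14], [14, 12]]}],  # joint-index pairs
}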

Only extract the posture of one person

Hello, thanks for your great work!

When an image contains multiple people, ViTPose usually detects multiple boxes and plots all of their poses.

Is there a way to extract only the pose of the person with the highest confidence?

Thank you very much, and looking forward to your reply.
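
One simple approach, assuming the person detector returns scored boxes (the tuple layout below is illustrative, not this repo's actual API): keep only the highest-scoring box before running pose estimation.

boxes = [(10, 20, 120, 300, 0.91), (200, 40, 310, 320, 0.63)]  # (x1, y1, x2, y2, score)
best = max(boxes, key=lambda b: b[-1])
print('estimate the pose only for:', best[:4])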

How to run it on 224x224 image?

As far as I understand, all pretrained models are based on a 256x192 input resolution. Is it possible to run them on a 224x224 image?
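
With a 16x16 patch size, 256x192 yields a 16x12 token grid while 224x224 yields 14x14, so the pretrained position embedding would need resizing. A hypothetical sketch, assuming a (1, H*W, C) embedding with no class token (any extra token must be split off first):

import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed: torch.Tensor, old_hw=(16, 12), new_hw=(14, 14)) -> torch.Tensor:
    b, n, c = pos_embed.shape                                # (1, H*W, C)
    grid = pos_embed.transpose(1, 2).reshape(b, c, *old_hw)  # -> (1, C, H, W)
    grid = F.interpolate(grid, size=new_hw, mode='bilinear', align_corners=False)
    return grid.flatten(2).transpose(1, 2)                   # back to (1, H'*W', C)

pe = torch.randn(1, 16 * 12, 768)
print(resize_pos_embed(pe).shape)  # torch.Size([1, 196, 768])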

Reference to your work - fork for 25 keypoints skeleton

Hi, thanks for your code.
In the last few days I have worked on a finetuned version on a COCO + feet dataset (a 25-keypoint skeleton like OpenPose), with easy and fast inference using ONNX / TensorRT.
I started with a fork but ended up publishing a standalone repository; if you want to be cited in any particular way, please feel free to ask (I already put a reference to your work).

https://github.com/JunkyByte/easy_ViTPose

Also, if any part of the code I wrote is interesting for your repo, just tell me and I can work on a PR.
