
vitpose_pytorch's Introduction

ViTPose (simple version w/o mmcv)

An unofficial implementation of ViTPose [Y. Xu et al., 2022]
[result image]

Usage

| Inference

python inference.py --image-path './examples/img1.jpg'

| Training

python train.py --config-path config.yaml --model-name 'b'
  • model_name must be one of: b, l, h

Note

  1. Download the trained model (.pth)
  2. Set the config according to the trained model
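
A rough loading sketch for these two steps (the checkpoint filename and its nesting are assumptions; adjust them to the file you downloaded):

import torch

ckpt = torch.load('vitpose-b.pth', map_location='cpu')   # hypothetical filename
state_dict = ckpt.get('state_dict', ckpt)                # some checkpoints nest the weights
# model.load_state_dict(state_dict)                      # `model`: ViTPose built from the matching config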

Reference

All code was written with reference to the official ViTPose repo.

vitpose_pytorch's People

Contributors

jaehyunnn

vitpose_pytorch's Issues

Python version?

Hi, it is not clear what version of Python you used for the project. Could you please add it to the README?

Config File Issues while training

Hey @jaehyunnn,
I was trying to finetune ViTPose-L on my own custom dataset using your repository. While doing that, I encountered the following error: "lr_mult = cfg.optimizer['paramwise_cfg']['layer_decay_rate']. KeyError: 'paramwise_cfg'". This error occurred in the train_model function, where the optimizer's config values are accessed. However, this file was imported directly from the official ViTPose repository, and I think they use other base config files (e.g. one for COCO) to set all these attributes. I would like to know how to resolve the issue, or whether I should just create a new yaml file for the ViT-L version and use that directly.

Thanking you in advance,
Bavesh Balaji
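
One possible workaround, sketched below, is to read the optimizer config defensively instead of indexing it unconditionally. The key names are taken from the error message above; the default value is an assumption:

def get_layer_decay_rate(optimizer_cfg: dict, default: float = 1.0) -> float:
    # Fall back to `default` when the YAML has no paramwise_cfg section,
    # instead of raising KeyError as the current train_model code does.
    return optimizer_cfg.get('paramwise_cfg', {}).get('layer_decay_rate', default)

# e.g.: lr_mult = get_layer_decay_rate(cfg.optimizer)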

Learning rate drops to 0?

🏋️> Epoch [000/210] | Loss 0.0029 | LR 0.00000000 | Step: 100%|██████████| 1827/1827 [03:38<00:00, 8.38it/s]
2023-04-24 21:32:07,523 - utils - INFO - [Summary-train] Epoch [000/210] | Average Loss (train) 0.0047 --- 218.09494 sec. elapsed
2023-04-24 21:32:40,968 - utils - INFO - [Summary-valid] Epoch [000/210] | Average Loss (valid) 0.0032 --- 32.84739 sec. elapsed
🏋️> Epoch [001/210] | Loss 0.0037 | LR 0.00000000 | Step: 100%|██████████| 1827/1827 [03:08<00:00, 9.69it/s]
2023-04-24 21:35:49,590 - utils - INFO - [Summary-train] Epoch [001/210] | Average Loss (train) 0.0034 --- 188.62070 sec. elapsed
2023-04-24 21:36:15,265 - utils - INFO - [Summary-valid] Epoch [001/210] | Average Loss (valid) 0.0032 --- 25.22072 sec. elapsed

Hi, when training on COCO the learning rate drops to 0 right after the first epoch. Could you advise on what might be causing this? Thanks!
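
One way to narrow this down, as a generic debugging sketch (standard torch.optim API, not this repo's actual scheduler setup): log the LR the scheduler actually produces each step. A warmup/decay schedule stepped at the wrong granularity (per epoch vs. per iteration) can collapse the LR to ~0 after one epoch.

import torch

model = torch.nn.Linear(2, 2)  # stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=500)
for step in range(3):
    optimizer.step()
    scheduler.step()  # step per iteration, not once per epoch
    print(step, optimizer.param_groups[0]['lr'])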

RuntimeError: Given groups=1, weight of size [768, 3, 16, 16], expected input[1, 1, 256, 192] to have 3 channels, but got 1 channels instead

#========= [Train Configs] =========#
# - Num GPUs: 1
# - Batch size (per gpu): 1
# - LR:  0.000063
# - Num params: 89,994,513
# - AMP: True
#===================================# 

🏋️> Epoch [000/210] | Loss 0.0014 | LR 0.000006 | Step: 0%| | 51/149813 [00:08<6:55:19, 6.01it/s]
Traceback (most recent call last):
  File "/home/ubuntu/mxy/workspace/ViTPose_pytorch/train.py", line 163, in <module>
    main()
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/ubuntu/mxy/workspace/ViTPose_pytorch/train.py", line 150, in main
    train_model(
  File "/home/ubuntu/mxy/workspace/ViTPose_pytorch/utils/train_valid_fn.py", line 131, in train_model
    outputs = model(images)
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/mxy/workspace/ViTPose_pytorch/models/model.py", line 24, in forward
    return self.keypoint_head(self.backbone(x))
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/mxy/workspace/ViTPose_pytorch/models/backbone/vit.py", line 399, in forward
    x = self.forward_features(x)
  File "/home/ubuntu/mxy/workspace/ViTPose_pytorch/models/backbone/vit.py", line 379, in forward_features
    x, (Hp, Wp) = self.patch_embed(x)
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/mxy/workspace/ViTPose_pytorch/models/backbone/vit.py", line 226, in forward
    x = self.proj(x)
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/ubuntu/anaconda3/envs/vitpose/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [768, 3, 16, 16], expected input[1, 1, 256, 192] to have 3 channels, but got 1 channels instead
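
The patch embedding is a Conv2d that expects 3-channel input, so grayscale samples need to be converted to RGB before batching. A minimal sketch of two possible fixes (the file path and tensor shapes are illustrative):

from PIL import Image
import torch

# PIL route: force 3 channels when loading the image.
img = Image.open('example.jpg').convert('RGB')

# Tensor route: if the dataloader already yields (N, 1, H, W) batches.
def to_three_channels(x: torch.Tensor) -> torch.Tensor:
    return x.repeat(1, 3, 1, 1) if x.shape[1] == 1 else x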

Wrong preprocessing

The training-time preprocessing is missing during inference: images are only scaled to the [0, 1] range but not normalized with the ImageNet mean / std.

Inference code:

img_tensor = transforms.Compose([
    transforms.Resize((img_size[1], img_size[0])),
    transforms.ToTensor(),
])

Training code:

self.transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

Reference: this was originally reported in an issue on my fork: JunkyByte/easy_ViTPose#12
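
A minimal sketch of an inference transform that mirrors the training preprocessing above (the (w, h) ordering of img_size is an assumption based on the inference snippet):

from torchvision import transforms

img_size = (192, 256)  # (w, h), illustrative
img_transform = transforms.Compose([
    transforms.Resize((img_size[1], img_size[0])),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),  # same stats as training
])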

Wholebody inference

Hi, thank you for sharing a nice codebase.

Is there a way to get COCO WholeBody keypoints in this repo, like in the original ViTPose?

License

Thanks for the awesome work. I would like to know if this work follows the same Apache license as the original ViTPose repository. Thanks in advance :)

Other data + other joints

Very nice repo! I am wondering if you have a process for modifying the joint definitions as well as the input size for different data. We are looking at keypointing a variety of objects, and I am wondering what to modify to allow for that: basically, how to change the keypoints as well as the input/output size of the network.

Thank you and awesome work!
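
For context, in a heatmap-based head like ViTPose's, the final conv emits one heatmap per keypoint, so retargeting to K custom joints mainly means changing that output channel count (and matching K in the dataset annotations). A hypothetical sketch; the 256 input channels are an assumption, not this repo's actual head width:

import torch.nn as nn

num_keypoints = 25  # example: a custom joint set
final_layer = nn.Conv2d(in_channels=256, out_channels=num_keypoints, kernel_size=1)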

About accuracy.

Thanks for your code contributions. I would like to ask whether your repo can be trained to the same accuracy as the original ViTPose repo.

Converting official checkpoints to the format used in this repo

Hello and thank you for this very useful implementation (I have managed to run it on Jetson devices, unlike the official implementation).

I have tried to use the official weights and configs for other models, provided here: https://github.com/ViTAE-Transformer/ViTPose, but I get key-mismatch errors such as:

RuntimeError: Error(s) in loading state_dict for ViTPose:
	Unexpected key(s) in state_dict: "backbone.cls_token". 

If you could provide info about the parameter names you used in your model, I think I could work out something similar to this conversion script: open-mmlab/mmsegmentation#1473

Best
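
Until a proper conversion script exists, one hedged sketch is to drop the keys this re-implementation does not define (such as "backbone.cls_token" from the error above) and load the rest non-strictly, so remaining name mismatches are reported instead of raising:

import torch

def load_official_weights(model: torch.nn.Module, path: str) -> None:
    ckpt = torch.load(path, map_location='cpu')
    state_dict = ckpt.get('state_dict', ckpt)  # mm-style checkpoints nest under 'state_dict'
    state_dict = {k: v for k, v in state_dict.items() if not k.endswith('cls_token')}
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print('missing:', missing)
    print('unexpected:', unexpected)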

about 'train_custom' coco dataset

Hello,
The annotations file, person_keypoints_train_custom.json, is not mentioned in the README. It appears to be a modified version of person_keypoints_train2017.json. What changes were made? Could you please upload the custom files?

Thanks for your time.
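
For reference, a custom annotations file would normally follow the standard COCO keypoints schema. A hypothetical skeleton of that structure, written as a Python dict with all values illustrative:

annotation_file = {
    'images': [{'id': 1, 'file_name': 'img1.jpg', 'width': 640, 'height': 480}],
    'annotations': [{
        'id': 1, 'image_id': 1, 'category_id': 1,
        'num_keypoints': 17,
        'keypoints': [0] * (17 * 3),         # (x, y, visibility) per joint
        'bbox': [10.0, 20.0, 100.0, 200.0],  # x, y, w, h
        'area': 20000.0, 'iscrowd': 0,
    }],
    'categories': [{'id': 1, 'name': 'person',
                    'keypoints': ['nose', 'left_eye'],   # ...17 names in total for COCO
                    'skeleton': [[16, 14], [14, 12]]}],  # joint-index pairs
}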

Only extract the posture of one person

Hello, thanks for your great work!

When an image contains multiple people, ViTPose usually detects multiple boxes and plots all of their poses.

Is there a way to extract only the pose of the person with the highest confidence?

Thank you very much, and looking forward to your reply.
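
One simple approach, assuming the person detector returns scored boxes (the tuple layout below is illustrative, not this repo's actual API): keep only the highest-scoring box before running pose estimation.

boxes = [(10, 20, 120, 300, 0.91), (200, 40, 310, 320, 0.63)]  # (x1, y1, x2, y2, score)
best = max(boxes, key=lambda b: b[-1])
print('estimate the pose only for:', best[:4])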

How to run it on 224x224 image?

As far as I understand, all pretrained models are based on a 256x192 input resolution. Is it possible to run them on a 224x224 image?
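
With a 16x16 patch size, 256x192 yields a 16x12 token grid while 224x224 yields 14x14, so the pretrained position embedding would need resizing. A hypothetical sketch, assuming a (1, H*W, C) embedding with no class token (any extra token must be split off first):

import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed: torch.Tensor, old_hw=(16, 12), new_hw=(14, 14)) -> torch.Tensor:
    b, n, c = pos_embed.shape                                # (1, H*W, C)
    grid = pos_embed.transpose(1, 2).reshape(b, c, *old_hw)  # -> (1, C, H, W)
    grid = F.interpolate(grid, size=new_hw, mode='bilinear', align_corners=False)
    return grid.flatten(2).transpose(1, 2)                   # back to (1, H'*W', C)

pe = torch.randn(1, 16 * 12, 768)
print(resize_pos_embed(pe).shape)  # torch.Size([1, 196, 768])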

Reference to your work - fork for 25 keypoints skeleton

Hi, thanks for your code.
In the last few days I have worked on a finetuned version on a COCO + feet dataset (a 25-keypoint skeleton like OpenPose), with easy and fast inference using ONNX / TensorRT.
I started with a fork but ended up publishing a standalone repository; if you want to be cited in any particular way, please feel free to ask (I already put a reference to your work).

https://github.com/JunkyByte/easy_ViTPose

Also, if any part of the code I wrote is interesting for your repo, just tell me and I can work on a PR.
