
motionbert's People

Contributors

baitian752, shirleymaxx, viewsetting, walter0807


motionbert's Issues

mesh prediction error

I ran this command:
!python3 infer_wild_mesh.py --vid_path ./4.mp4 --json_path ./alphapose-results.json --out_path /content/MotionBERT

I have saved the best_epoch here:
/content/MotionBERT/checkpoint/mesh/FT_MB_release_MB_ft_pw3d/best_epoch.bin

Traceback (most recent call last):
  File "/content/MotionBERT/infer_wild_mesh.py", line 64, in <module>
    smpl = SMPL(args.data_root, batch_size=1).cuda()
  File "/content/MotionBERT/lib/utils/utils_smpl.py", line 62, in __init__
    super(SMPL, self).__init__(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/smplx/body_models.py", line 133, in __init__
    assert osp.exists(smpl_path), 'Path {} does not exist!'.format(
AssertionError: Path data/mesh does not exist!
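For context, smplx raises this assertion when it cannot find the SMPL model files under the data root, and infer_wild_mesh.py appears to build SMPL(args.data_root, ...) with a relative default of data/mesh. A minimal, hedged sanity check (the absolute path below is an assumption; adjust it to your own setup, and make sure the SMPL model files have already been downloaded there):

import os.path as osp

# Hedged sketch: run the script from the repository root, or point data_root
# at an absolute directory that already contains the SMPL model files.
data_root = '/content/MotionBERT/data/mesh'   # assumed location; adjust as needed
assert osp.isdir(data_root), data_root + ' does not exist - download the SMPL model files there first'
# smpl = SMPL(data_root, batch_size=1).cuda()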

some train questions

Thanks for your great work! I have some questions about how the Dual-stream Spatio-temporal Transformer (DSTformer) can accelerate training in parallel. Also, isn't T = 243 too computationally intensive for T-MHSA?
Thank you very much!

Missing key(s) in state_dict: "temp_embed", "pos_embed",

Loading checkpoint checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin
Traceback (most recent call last):
  File "/content/MotionBERT/infer_wild.py", line 45, in <module>
    model_backbone.load_state_dict(checkpoint['model_pos'], strict=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DSTformer:
    Missing key(s) in state_dict: "temp_embed", "pos_embed", ...
    Unexpected key(s) in state_dict: "module.temp_embed", "module.pos_embed", ...

I tried to solve this problem by following this blog:
https://blog.csdn.net/yangwangnndd/article/details/100207686
However, more problems followed.

I followed this guide:
https://github.com/Walter0807/MotionBERT/blob/main/docs/inference.md
I used this 3D pose model:
https://onedrive.live.com/?authkey=%21ALuKCr9wihi87bI&id=A5438CD242871DF0%21190&cid=A5438CD242871DF0

about 2d projection.

Thanks for your great work.

In-the-wild RGB videos have no 3D ground truth, no depth information, and no camera intrinsics K; how is the reprojection done?
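Not the authors' method, just to illustrate what reprojection without camera intrinsics usually means in this setting: many in-the-wild pipelines fit a weak-perspective camera (a scale plus a 2D translation) so that the predicted root-relative 3D joints align with the detected 2D joints. A rough numpy sketch:

import numpy as np

def weak_perspective_reproject(X3d, x2d):
    """Fit scale s and translation t so that s * X3d[:, :2] + t ~= x2d.
    X3d: (J, 3) predicted 3D joints; x2d: (J, 2) detected 2D joints."""
    X = X3d[:, :2] - X3d[:, :2].mean(axis=0)     # centered 3D x/y
    x = x2d - x2d.mean(axis=0)                   # centered 2D
    s = (X * x).sum() / (X * X).sum()            # least-squares scale
    t = x2d.mean(axis=0) - s * X3d[:, :2].mean(axis=0)
    return s * X3d[:, :2] + t                    # reprojected 2D joints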

License Information

Hi! Thank you so much for sharing this code. Can you please include the license information so that we know the restrictions/limitations if there are any?

Finetune data format

Hello @Walter0807, I want to fine-tune the pose3d task on my own dataset. What data format should I prepare, i.e. what goes inside the .pkl file? Right now I have a 2D skeleton video and the JSON file from AlphaPose; what should I do next?
Sorry to keep bothering you.

Real time application

Hi!

I was just wondering if you have some results on the speed and if this model (in the Lite variant) would be suitable for a real-time 3d pose estimation problem?

Thanks

Code release

Thanks for your great work! When will you release the code?

About half body mesh regression issue

Hello, I recently found that MotionBERT does not seem very good at half-body mesh regression. I wonder if you have tested this before and what the reason could be.

Image-based models like PARE and SPIN can hallucinate the occluded parts, but MotionBERT fails completely in this scenario.

The formula notation in the paper

Thank you for your great work. I have a question about one of the formulas in the paper: ◦ is said to denote the element-wise product, but I do not know what that symbol means. Could you explain it?
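For reference, ◦ is the Hadamard (element-wise) product: the two tensors have the same shape, and each output entry is the product of the corresponding entries, i.e. (A ◦ B)_ij = A_ij * B_ij. A one-line PyTorch illustration:

import torch

A = torch.tensor([[1., 2.], [3., 4.]])
B = torch.tensor([[10., 20.], [30., 40.]])
C = A * B   # Hadamard / element-wise product: [[10., 40.], [90., 160.]]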

Fine-tuning part layers

Hi,
You mentioned fine-tuning partial layers in your paper, but the code fine-tunes the entire model, which is computationally costly. May I ask what "partial layers" refers to?

About Amass trainset

Hello, may I ask whether the GRAB and SOMA subsets of the AMASS training set are used in training the pretrained model? If they are, tools/compress_amass.py seems to be incorrect.

Some excellent demo

Hi, just posting some FBX demos here (real 3D, not renders); the results are impressive:

Clip_len 24

2023-04-25-14-47-18 00_00_00-00_00_30

Clip_len 48

2023-04-25-15-08-26 00_00_00-00_00_30

The video I tested is a very challenging one, and I still got some nice results!

I still have one issue: the poses sometimes blink in the middle of frames. Do you have any thoughts on that? Also, what is the best clip length for real-time applications? (We cannot use a very large clip length in real time.)

An error in infer_wild.py

Hello, I have run into a problem and do not know how to solve it. The error is below. Could you tell me what happened and how to fix it? I'm using a Windows 10 computer. Thank you so much!

(motionbert) F:\DeepLearning\MotionBERT\MotionBERT-main> python infer_wild.py --vid_path video/me.mp4 --json_path video_json/vis_me.mp4.json --out_path output

Loading checkpoint checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin
  0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin
  0%|          | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "F:\Software\Anaconda\envs\motionbert\lib\runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "F:\Software\Anaconda\envs\motionbert\lib\runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "F:\Software\Anaconda\envs\motionbert\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "F:\DeepLearning\MotionBERT\MotionBERT-main\infer_wild.py", line 70, in <module>
    for batch_input in tqdm(test_loader):
  File "F:\Software\Anaconda\envs\motionbert\lib\site-packages\tqdm\std.py", line 1178, in __iter__
    for obj in iterable:
  File "F:\Software\Anaconda\envs\motionbert\lib\site-packages\torch\utils\data\dataloader.py", line 439, in __iter__
    self._iterator = self._get_iterator()
  File "F:\Software\Anaconda\envs\motionbert\lib\site-packages\torch\utils\data\dataloader.py", line 390, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "F:\Software\Anaconda\envs\motionbert\lib\site-packages\torch\utils\data\dataloader.py", line 1077, in __init__
    w.start()
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
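For reference, this is the standard Windows multiprocessing behaviour: DataLoader workers are created with spawn, which re-imports the main script, so the top-level code of infer_wild.py must be guarded (or num_workers set to 0 in the test loader). A minimal, hedged sketch of the guard:

# Hedged sketch: move the top-level inference code of infer_wild.py into main()
# so that spawned DataLoader worker processes can re-import the module safely.
def main():
    ...  # argument parsing, model/checkpoint loading, and the test_loader loop

if __name__ == '__main__':
    main()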

Keypoint format

I can get COCO-17 keypoints (or another keypoint format) for my own custom data, and I know I should convert the COCO format to Human3.6M, but how? The joint definitions of COCO and Human3.6M differ, especially for the torso. Is there a way to convert between these datasets' formats?
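Not an official converter, only a commonly used approximation (the repository's in-the-wild loader performs a similar Halpe-26 to H36M conversion): joints that exist in both formats are copied, and the H36M joints that COCO lacks (pelvis, spine, thorax, head) are synthesized as midpoints. The H36M-17 order assumed below is the VideoPose3D-style one (0 pelvis, 1-3 right leg, 4-6 left leg, 7 spine, 8 thorax, 9 nose, 10 head, 11-13 left arm, 14-16 right arm):

import numpy as np

def coco17_to_h36m17(kpts):
    """Approximate COCO-17 -> H36M-17 conversion (a sketch, not the repo's exact helper).
    kpts: (T, 17, C) array in COCO order; returns (T, 17, C) in the assumed H36M order."""
    h36m = np.zeros_like(kpts)
    h36m[:, 0] = (kpts[:, 11] + kpts[:, 12]) * 0.5   # pelvis = mid-hip
    h36m[:, 1] = kpts[:, 12]                         # right hip
    h36m[:, 2] = kpts[:, 14]                         # right knee
    h36m[:, 3] = kpts[:, 16]                         # right ankle
    h36m[:, 4] = kpts[:, 11]                         # left hip
    h36m[:, 5] = kpts[:, 13]                         # left knee
    h36m[:, 6] = kpts[:, 15]                         # left ankle
    h36m[:, 8] = (kpts[:, 5] + kpts[:, 6]) * 0.5     # thorax = mid-shoulder
    h36m[:, 7] = (h36m[:, 0] + h36m[:, 8]) * 0.5     # spine = midpoint of pelvis and thorax
    h36m[:, 9] = kpts[:, 0]                          # nose
    h36m[:, 10] = (kpts[:, 1] + kpts[:, 2]) * 0.5    # head ~ mid-eye (rough)
    h36m[:, 11] = kpts[:, 5]                         # left shoulder
    h36m[:, 12] = kpts[:, 7]                         # left elbow
    h36m[:, 13] = kpts[:, 9]                         # left wrist
    h36m[:, 14] = kpts[:, 6]                         # right shoulder
    h36m[:, 15] = kpts[:, 8]                         # right elbow
    h36m[:, 16] = kpts[:, 10]                        # right wrist
    return h36m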

PyTorch version for reproducing the result

I want to reproduce the result of "3D Pose (H36M-SH, scratch), 39.1mm", but I can only get 40.0mm. Which PyTorch version did you use to train the model?

how to preprocess NTU dataset?

The 3D coordinates I obtained are pixel values. Can you help me convert them into values in 3D space?

Cannot load checkpoint in docs/inference.MD

Hi,

Thanks for the great work!

I tried to follow the instructions in docs/inference.MD and got the following error while loading the checkpoint:

Error logs
(motionbert) H4dr1en@H4dr1en MotionBERT % /opt/miniconda3/envs/motionbert/bin/python /Users/H4dr1en/projects/MotionBERT/infer_wild_test.py
Loading checkpoint checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin
Traceback (most recent call last):
  File "/Users/H4dr1en/projects/MotionBERT/infer_wild_test.py", line 37, in <module>
    model_backbone.load_state_dict(checkpoint['model_pos'], strict=True)
  File "/opt/miniconda3/envs/motionbert/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DSTformer:
        Missing key(s) in state_dict: "temp_embed", "pos_embed", "joints_embed.weight", "joints_embed.bias", "blocks_st.0.norm1_s.weight", "blocks_st.0.norm1_s.bias", "blocks_st.0.norm1_t.weight", "blocks_st.0.norm1_t.bias", "blocks_st.0.attn_s.proj.weight", "blocks_st.0.attn_s.proj.bias", "blocks_st.0.attn_s.qkv.weight", "blocks_st.0.attn_s.qkv.bias", "blocks_st.0.attn_t.proj.weight", "blocks_st.0.attn_t.proj.bias", "blocks_st.0.attn_t.qkv.weight", "blocks_st.0.attn_t.qkv.bias", "blocks_st.0.norm2_s.weight", "blocks_st.0.norm2_s.bias", "blocks_st.0.norm2_t.weight", "blocks_st.0.norm2_t.bias", "blocks_st.0.mlp_s.fc1.weight", "blocks_st.0.mlp_s.fc1.bias", "blocks_st.0.mlp_s.fc2.weight", "blocks_st.0.mlp_s.fc2.bias", "blocks_st.0.mlp_t.fc1.weight", "blocks_st.0.mlp_t.fc1.bias", "blocks_st.0.mlp_t.fc2.weight", "blocks_st.0.mlp_t.fc2.bias", "blocks_st.1.norm1_s.weight", "blocks_st.1.norm1_s.bias", "blocks_st.1.norm1_t.weight", "blocks_st.1.norm1_t.bias", "blocks_st.1.attn_s.proj.weight", "blocks_st.1.attn_s.proj.bias", "blocks_st.1.attn_s.qkv.weight", "blocks_st.1.attn_s.qkv.bias", "blocks_st.1.attn_t.proj.weight", "blocks_st.1.attn_t.proj.bias", "blocks_st.1.attn_t.qkv.weight", "blocks_st.1.attn_t.qkv.bias", "blocks_st.1.norm2_s.weight", "blocks_st.1.norm2_s.bias", "blocks_st.1.norm2_t.weight", "blocks_st.1.norm2_t.bias", "blocks_st.1.mlp_s.fc1.weight", "blocks_st.1.mlp_s.fc1.bias", "blocks_st.1.mlp_s.fc2.weight", "blocks_st.1.mlp_s.fc2.bias", "blocks_st.1.mlp_t.fc1.weight", "blocks_st.1.mlp_t.fc1.bias", "blocks_st.1.mlp_t.fc2.weight", "blocks_st.1.mlp_t.fc2.bias", "blocks_st.2.norm1_s.weight", "blocks_st.2.norm1_s.bias", "blocks_st.2.norm1_t.weight", "blocks_st.2.norm1_t.bias", "blocks_st.2.attn_s.proj.weight", "blocks_st.2.attn_s.proj.bias", "blocks_st.2.attn_s.qkv.weight", "blocks_st.2.attn_s.qkv.bias", "blocks_st.2.attn_t.proj.weight", "blocks_st.2.attn_t.proj.bias", "blocks_st.2.attn_t.qkv.weight", "blocks_st.2.attn_t.qkv.bias", "blocks_st.2.norm2_s.weight", "blocks_st.2.norm2_s.bias", "blocks_st.2.norm2_t.weight", "blocks_st.2.norm2_t.bias", "blocks_st.2.mlp_s.fc1.weight", "blocks_st.2.mlp_s.fc1.bias", "blocks_st.2.mlp_s.fc2.weight", "blocks_st.2.mlp_s.fc2.bias", "blocks_st.2.mlp_t.fc1.weight", "blocks_st.2.mlp_t.fc1.bias", "blocks_st.2.mlp_t.fc2.weight", "blocks_st.2.mlp_t.fc2.bias", "blocks_st.3.norm1_s.weight", "blocks_st.3.norm1_s.bias", "blocks_st.3.norm1_t.weight", "blocks_st.3.norm1_t.bias", "blocks_st.3.attn_s.proj.weight", "blocks_st.3.attn_s.proj.bias", "blocks_st.3.attn_s.qkv.weight", "blocks_st.3.attn_s.qkv.bias", "blocks_st.3.attn_t.proj.weight", "blocks_st.3.attn_t.proj.bias", "blocks_st.3.attn_t.qkv.weight", "blocks_st.3.attn_t.qkv.bias", "blocks_st.3.norm2_s.weight", "blocks_st.3.norm2_s.bias", "blocks_st.3.norm2_t.weight", "blocks_st.3.norm2_t.bias", "blocks_st.3.mlp_s.fc1.weight", "blocks_st.3.mlp_s.fc1.bias", "blocks_st.3.mlp_s.fc2.weight", "blocks_st.3.mlp_s.fc2.bias", "blocks_st.3.mlp_t.fc1.weight", "blocks_st.3.mlp_t.fc1.bias", "blocks_st.3.mlp_t.fc2.weight", "blocks_st.3.mlp_t.fc2.bias", "blocks_st.4.norm1_s.weight", "blocks_st.4.norm1_s.bias", "blocks_st.4.norm1_t.weight", "blocks_st.4.norm1_t.bias", "blocks_st.4.attn_s.proj.weight", "blocks_st.4.attn_s.proj.bias", "blocks_st.4.attn_s.qkv.weight", "blocks_st.4.attn_s.qkv.bias", "blocks_st.4.attn_t.proj.weight", "blocks_st.4.attn_t.proj.bias", "blocks_st.4.attn_t.qkv.weight", "blocks_st.4.attn_t.qkv.bias", "blocks_st.4.norm2_s.weight", "blocks_st.4.norm2_s.bias", "blocks_st.4.norm2_t.weight", 
"blocks_st.4.norm2_t.bias", "blocks_st.4.mlp_s.fc1.weight", "blocks_st.4.mlp_s.fc1.bias", "blocks_st.4.mlp_s.fc2.weight", "blocks_st.4.mlp_s.fc2.bias", "blocks_st.4.mlp_t.fc1.weight", "blocks_st.4.mlp_t.fc1.bias", "blocks_st.4.mlp_t.fc2.weight", "blocks_st.4.mlp_t.fc2.bias", "blocks_ts.0.norm1_s.weight", "blocks_ts.0.norm1_s.bias", "blocks_ts.0.norm1_t.weight", "blocks_ts.0.norm1_t.bias", "blocks_ts.0.attn_s.proj.weight", "blocks_ts.0.attn_s.proj.bias", "blocks_ts.0.attn_s.qkv.weight", "blocks_ts.0.attn_s.qkv.bias", "blocks_ts.0.attn_t.proj.weight", "blocks_ts.0.attn_t.proj.bias", "blocks_ts.0.attn_t.qkv.weight", "blocks_ts.0.attn_t.qkv.bias", "blocks_ts.0.norm2_s.weight", "blocks_ts.0.norm2_s.bias", "blocks_ts.0.norm2_t.weight", "blocks_ts.0.norm2_t.bias", "blocks_ts.0.mlp_s.fc1.weight", "blocks_ts.0.mlp_s.fc1.bias", "blocks_ts.0.mlp_s.fc2.weight", "blocks_ts.0.mlp_s.fc2.bias", "blocks_ts.0.mlp_t.fc1.weight", "blocks_ts.0.mlp_t.fc1.bias", "blocks_ts.0.mlp_t.fc2.weight", "blocks_ts.0.mlp_t.fc2.bias", "blocks_ts.1.norm1_s.weight", "blocks_ts.1.norm1_s.bias", "blocks_ts.1.norm1_t.weight", "blocks_ts.1.norm1_t.bias", "blocks_ts.1.attn_s.proj.weight", "blocks_ts.1.attn_s.proj.bias", "blocks_ts.1.attn_s.qkv.weight", "blocks_ts.1.attn_s.qkv.bias", "blocks_ts.1.attn_t.proj.weight", "blocks_ts.1.attn_t.proj.bias", "blocks_ts.1.attn_t.qkv.weight", "blocks_ts.1.attn_t.qkv.bias", "blocks_ts.1.norm2_s.weight", "blocks_ts.1.norm2_s.bias", "blocks_ts.1.norm2_t.weight", "blocks_ts.1.norm2_t.bias", "blocks_ts.1.mlp_s.fc1.weight", "blocks_ts.1.mlp_s.fc1.bias", "blocks_ts.1.mlp_s.fc2.weight", "blocks_ts.1.mlp_s.fc2.bias", "blocks_ts.1.mlp_t.fc1.weight", "blocks_ts.1.mlp_t.fc1.bias", "blocks_ts.1.mlp_t.fc2.weight", "blocks_ts.1.mlp_t.fc2.bias", "blocks_ts.2.norm1_s.weight", "blocks_ts.2.norm1_s.bias", "blocks_ts.2.norm1_t.weight", "blocks_ts.2.norm1_t.bias", "blocks_ts.2.attn_s.proj.weight", "blocks_ts.2.attn_s.proj.bias", "blocks_ts.2.attn_s.qkv.weight", "blocks_ts.2.attn_s.qkv.bias", "blocks_ts.2.attn_t.proj.weight", "blocks_ts.2.attn_t.proj.bias", "blocks_ts.2.attn_t.qkv.weight", "blocks_ts.2.attn_t.qkv.bias", "blocks_ts.2.norm2_s.weight", "blocks_ts.2.norm2_s.bias", "blocks_ts.2.norm2_t.weight", "blocks_ts.2.norm2_t.bias", "blocks_ts.2.mlp_s.fc1.weight", "blocks_ts.2.mlp_s.fc1.bias", "blocks_ts.2.mlp_s.fc2.weight", "blocks_ts.2.mlp_s.fc2.bias", "blocks_ts.2.mlp_t.fc1.weight", "blocks_ts.2.mlp_t.fc1.bias", "blocks_ts.2.mlp_t.fc2.weight", "blocks_ts.2.mlp_t.fc2.bias", "blocks_ts.3.norm1_s.weight", "blocks_ts.3.norm1_s.bias", "blocks_ts.3.norm1_t.weight", "blocks_ts.3.norm1_t.bias", "blocks_ts.3.attn_s.proj.weight", "blocks_ts.3.attn_s.proj.bias", "blocks_ts.3.attn_s.qkv.weight", "blocks_ts.3.attn_s.qkv.bias", "blocks_ts.3.attn_t.proj.weight", "blocks_ts.3.attn_t.proj.bias", "blocks_ts.3.attn_t.qkv.weight", "blocks_ts.3.attn_t.qkv.bias", "blocks_ts.3.norm2_s.weight", "blocks_ts.3.norm2_s.bias", "blocks_ts.3.norm2_t.weight", "blocks_ts.3.norm2_t.bias", "blocks_ts.3.mlp_s.fc1.weight", "blocks_ts.3.mlp_s.fc1.bias", "blocks_ts.3.mlp_s.fc2.weight", "blocks_ts.3.mlp_s.fc2.bias", "blocks_ts.3.mlp_t.fc1.weight", "blocks_ts.3.mlp_t.fc1.bias", "blocks_ts.3.mlp_t.fc2.weight", "blocks_ts.3.mlp_t.fc2.bias", "blocks_ts.4.norm1_s.weight", "blocks_ts.4.norm1_s.bias", "blocks_ts.4.norm1_t.weight", "blocks_ts.4.norm1_t.bias", "blocks_ts.4.attn_s.proj.weight", "blocks_ts.4.attn_s.proj.bias", "blocks_ts.4.attn_s.qkv.weight", "blocks_ts.4.attn_s.qkv.bias", "blocks_ts.4.attn_t.proj.weight", "blocks_ts.4.attn_t.proj.bias", 
"blocks_ts.4.attn_t.qkv.weight", "blocks_ts.4.attn_t.qkv.bias", "blocks_ts.4.norm2_s.weight", "blocks_ts.4.norm2_s.bias", "blocks_ts.4.norm2_t.weight", "blocks_ts.4.norm2_t.bias", "blocks_ts.4.mlp_s.fc1.weight", "blocks_ts.4.mlp_s.fc1.bias", "blocks_ts.4.mlp_s.fc2.weight", "blocks_ts.4.mlp_s.fc2.bias", "blocks_ts.4.mlp_t.fc1.weight", "blocks_ts.4.mlp_t.fc1.bias", "blocks_ts.4.mlp_t.fc2.weight", "blocks_ts.4.mlp_t.fc2.bias", "norm.weight", "norm.bias", "pre_logits.fc.weight", "pre_logits.fc.bias", "head.weight", "head.bias", "ts_attn.0.weight", "ts_attn.0.bias", "ts_attn.1.weight", "ts_attn.1.bias", "ts_attn.2.weight", "ts_attn.2.bias", "ts_attn.3.weight", "ts_attn.3.bias", "ts_attn.4.weight", "ts_attn.4.bias". 
        Unexpected key(s) in state_dict: "module.temp_embed", "module.pos_embed", "module.joints_embed.weight", "module.joints_embed.bias", "module.blocks_st.0.norm1_s.weight", "module.blocks_st.0.norm1_s.bias", "module.blocks_st.0.norm1_t.weight", "module.blocks_st.0.norm1_t.bias", "module.blocks_st.0.attn_s.proj.weight", "module.blocks_st.0.attn_s.proj.bias", "module.blocks_st.0.attn_s.qkv.weight", "module.blocks_st.0.attn_s.qkv.bias", "module.blocks_st.0.attn_t.proj.weight", "module.blocks_st.0.attn_t.proj.bias", "module.blocks_st.0.attn_t.qkv.weight", "module.blocks_st.0.attn_t.qkv.bias", "module.blocks_st.0.norm2_s.weight", "module.blocks_st.0.norm2_s.bias", "module.blocks_st.0.norm2_t.weight", "module.blocks_st.0.norm2_t.bias", "module.blocks_st.0.mlp_s.fc1.weight", "module.blocks_st.0.mlp_s.fc1.bias", "module.blocks_st.0.mlp_s.fc2.weight", "module.blocks_st.0.mlp_s.fc2.bias", "module.blocks_st.0.mlp_t.fc1.weight", "module.blocks_st.0.mlp_t.fc1.bias", "module.blocks_st.0.mlp_t.fc2.weight", "module.blocks_st.0.mlp_t.fc2.bias", "module.blocks_st.1.norm1_s.weight", "module.blocks_st.1.norm1_s.bias", "module.blocks_st.1.norm1_t.weight", "module.blocks_st.1.norm1_t.bias", "module.blocks_st.1.attn_s.proj.weight", "module.blocks_st.1.attn_s.proj.bias", "module.blocks_st.1.attn_s.qkv.weight", "module.blocks_st.1.attn_s.qkv.bias", "module.blocks_st.1.attn_t.proj.weight", "module.blocks_st.1.attn_t.proj.bias", "module.blocks_st.1.attn_t.qkv.weight", "module.blocks_st.1.attn_t.qkv.bias", "module.blocks_st.1.norm2_s.weight", "module.blocks_st.1.norm2_s.bias", "module.blocks_st.1.norm2_t.weight", "module.blocks_st.1.norm2_t.bias", "module.blocks_st.1.mlp_s.fc1.weight", "module.blocks_st.1.mlp_s.fc1.bias", "module.blocks_st.1.mlp_s.fc2.weight", "module.blocks_st.1.mlp_s.fc2.bias", "module.blocks_st.1.mlp_t.fc1.weight", "module.blocks_st.1.mlp_t.fc1.bias", "module.blocks_st.1.mlp_t.fc2.weight", "module.blocks_st.1.mlp_t.fc2.bias", "module.blocks_st.2.norm1_s.weight", "module.blocks_st.2.norm1_s.bias", "module.blocks_st.2.norm1_t.weight", "module.blocks_st.2.norm1_t.bias", "module.blocks_st.2.attn_s.proj.weight", "module.blocks_st.2.attn_s.proj.bias", "module.blocks_st.2.attn_s.qkv.weight", "module.blocks_st.2.attn_s.qkv.bias", "module.blocks_st.2.attn_t.proj.weight", "module.blocks_st.2.attn_t.proj.bias", "module.blocks_st.2.attn_t.qkv.weight", "module.blocks_st.2.attn_t.qkv.bias", "module.blocks_st.2.norm2_s.weight", "module.blocks_st.2.norm2_s.bias", "module.blocks_st.2.norm2_t.weight", "module.blocks_st.2.norm2_t.bias", "module.blocks_st.2.mlp_s.fc1.weight", "module.blocks_st.2.mlp_s.fc1.bias", "module.blocks_st.2.mlp_s.fc2.weight", "module.blocks_st.2.mlp_s.fc2.bias", "module.blocks_st.2.mlp_t.fc1.weight", "module.blocks_st.2.mlp_t.fc1.bias", "module.blocks_st.2.mlp_t.fc2.weight", "module.blocks_st.2.mlp_t.fc2.bias", "module.blocks_st.3.norm1_s.weight", "module.blocks_st.3.norm1_s.bias", "module.blocks_st.3.norm1_t.weight", "module.blocks_st.3.norm1_t.bias", "module.blocks_st.3.attn_s.proj.weight", "module.blocks_st.3.attn_s.proj.bias", "module.blocks_st.3.attn_s.qkv.weight", "module.blocks_st.3.attn_s.qkv.bias", "module.blocks_st.3.attn_t.proj.weight", "module.blocks_st.3.attn_t.proj.bias", "module.blocks_st.3.attn_t.qkv.weight", "module.blocks_st.3.attn_t.qkv.bias", "module.blocks_st.3.norm2_s.weight", "module.blocks_st.3.norm2_s.bias", "module.blocks_st.3.norm2_t.weight", "module.blocks_st.3.norm2_t.bias", "module.blocks_st.3.mlp_s.fc1.weight", "module.blocks_st.3.mlp_s.fc1.bias", 
"module.blocks_st.3.mlp_s.fc2.weight", "module.blocks_st.3.mlp_s.fc2.bias", "module.blocks_st.3.mlp_t.fc1.weight", "module.blocks_st.3.mlp_t.fc1.bias", "module.blocks_st.3.mlp_t.fc2.weight", "module.blocks_st.3.mlp_t.fc2.bias", "module.blocks_st.4.norm1_s.weight", "module.blocks_st.4.norm1_s.bias", "module.blocks_st.4.norm1_t.weight", "module.blocks_st.4.norm1_t.bias", "module.blocks_st.4.attn_s.proj.weight", "module.blocks_st.4.attn_s.proj.bias", "module.blocks_st.4.attn_s.qkv.weight", "module.blocks_st.4.attn_s.qkv.bias", "module.blocks_st.4.attn_t.proj.weight", "module.blocks_st.4.attn_t.proj.bias", "module.blocks_st.4.attn_t.qkv.weight", "module.blocks_st.4.attn_t.qkv.bias", "module.blocks_st.4.norm2_s.weight", "module.blocks_st.4.norm2_s.bias", "module.blocks_st.4.norm2_t.weight", "module.blocks_st.4.norm2_t.bias", "module.blocks_st.4.mlp_s.fc1.weight", "module.blocks_st.4.mlp_s.fc1.bias", "module.blocks_st.4.mlp_s.fc2.weight", "module.blocks_st.4.mlp_s.fc2.bias", "module.blocks_st.4.mlp_t.fc1.weight", "module.blocks_st.4.mlp_t.fc1.bias", "module.blocks_st.4.mlp_t.fc2.weight", "module.blocks_st.4.mlp_t.fc2.bias", "module.blocks_ts.0.norm1_s.weight", "module.blocks_ts.0.norm1_s.bias", "module.blocks_ts.0.norm1_t.weight", "module.blocks_ts.0.norm1_t.bias", "module.blocks_ts.0.attn_s.proj.weight", "module.blocks_ts.0.attn_s.proj.bias", "module.blocks_ts.0.attn_s.qkv.weight", "module.blocks_ts.0.attn_s.qkv.bias", "module.blocks_ts.0.attn_t.proj.weight", "module.blocks_ts.0.attn_t.proj.bias", "module.blocks_ts.0.attn_t.qkv.weight", "module.blocks_ts.0.attn_t.qkv.bias", "module.blocks_ts.0.norm2_s.weight", "module.blocks_ts.0.norm2_s.bias", "module.blocks_ts.0.norm2_t.weight", "module.blocks_ts.0.norm2_t.bias", "module.blocks_ts.0.mlp_s.fc1.weight", "module.blocks_ts.0.mlp_s.fc1.bias", "module.blocks_ts.0.mlp_s.fc2.weight", "module.blocks_ts.0.mlp_s.fc2.bias", "module.blocks_ts.0.mlp_t.fc1.weight", "module.blocks_ts.0.mlp_t.fc1.bias", "module.blocks_ts.0.mlp_t.fc2.weight", "module.blocks_ts.0.mlp_t.fc2.bias", "module.blocks_ts.1.norm1_s.weight", "module.blocks_ts.1.norm1_s.bias", "module.blocks_ts.1.norm1_t.weight", "module.blocks_ts.1.norm1_t.bias", "module.blocks_ts.1.attn_s.proj.weight", "module.blocks_ts.1.attn_s.proj.bias", "module.blocks_ts.1.attn_s.qkv.weight", "module.blocks_ts.1.attn_s.qkv.bias", "module.blocks_ts.1.attn_t.proj.weight", "module.blocks_ts.1.attn_t.proj.bias", "module.blocks_ts.1.attn_t.qkv.weight", "module.blocks_ts.1.attn_t.qkv.bias", "module.blocks_ts.1.norm2_s.weight", "module.blocks_ts.1.norm2_s.bias", "module.blocks_ts.1.norm2_t.weight", "module.blocks_ts.1.norm2_t.bias", "module.blocks_ts.1.mlp_s.fc1.weight", "module.blocks_ts.1.mlp_s.fc1.bias", "module.blocks_ts.1.mlp_s.fc2.weight", "module.blocks_ts.1.mlp_s.fc2.bias", "module.blocks_ts.1.mlp_t.fc1.weight", "module.blocks_ts.1.mlp_t.fc1.bias", "module.blocks_ts.1.mlp_t.fc2.weight", "module.blocks_ts.1.mlp_t.fc2.bias", "module.blocks_ts.2.norm1_s.weight", "module.blocks_ts.2.norm1_s.bias", "module.blocks_ts.2.norm1_t.weight", "module.blocks_ts.2.norm1_t.bias", "module.blocks_ts.2.attn_s.proj.weight", "module.blocks_ts.2.attn_s.proj.bias", "module.blocks_ts.2.attn_s.qkv.weight", "module.blocks_ts.2.attn_s.qkv.bias", "module.blocks_ts.2.attn_t.proj.weight", "module.blocks_ts.2.attn_t.proj.bias", "module.blocks_ts.2.attn_t.qkv.weight", "module.blocks_ts.2.attn_t.qkv.bias", "module.blocks_ts.2.norm2_s.weight", "module.blocks_ts.2.norm2_s.bias", "module.blocks_ts.2.norm2_t.weight", 
"module.blocks_ts.2.norm2_t.bias", "module.blocks_ts.2.mlp_s.fc1.weight", "module.blocks_ts.2.mlp_s.fc1.bias", "module.blocks_ts.2.mlp_s.fc2.weight", "module.blocks_ts.2.mlp_s.fc2.bias", "module.blocks_ts.2.mlp_t.fc1.weight", "module.blocks_ts.2.mlp_t.fc1.bias", "module.blocks_ts.2.mlp_t.fc2.weight", "module.blocks_ts.2.mlp_t.fc2.bias", "module.blocks_ts.3.norm1_s.weight", "module.blocks_ts.3.norm1_s.bias", "module.blocks_ts.3.norm1_t.weight", "module.blocks_ts.3.norm1_t.bias", "module.blocks_ts.3.attn_s.proj.weight", "module.blocks_ts.3.attn_s.proj.bias", "module.blocks_ts.3.attn_s.qkv.weight", "module.blocks_ts.3.attn_s.qkv.bias", "module.blocks_ts.3.attn_t.proj.weight", "module.blocks_ts.3.attn_t.proj.bias", "module.blocks_ts.3.attn_t.qkv.weight", "module.blocks_ts.3.attn_t.qkv.bias", "module.blocks_ts.3.norm2_s.weight", "module.blocks_ts.3.norm2_s.bias", "module.blocks_ts.3.norm2_t.weight", "module.blocks_ts.3.norm2_t.bias", "module.blocks_ts.3.mlp_s.fc1.weight", "module.blocks_ts.3.mlp_s.fc1.bias", "module.blocks_ts.3.mlp_s.fc2.weight", "module.blocks_ts.3.mlp_s.fc2.bias", "module.blocks_ts.3.mlp_t.fc1.weight", "module.blocks_ts.3.mlp_t.fc1.bias", "module.blocks_ts.3.mlp_t.fc2.weight", "module.blocks_ts.3.mlp_t.fc2.bias", "module.blocks_ts.4.norm1_s.weight", "module.blocks_ts.4.norm1_s.bias", "module.blocks_ts.4.norm1_t.weight", "module.blocks_ts.4.norm1_t.bias", "module.blocks_ts.4.attn_s.proj.weight", "module.blocks_ts.4.attn_s.proj.bias", "module.blocks_ts.4.attn_s.qkv.weight", "module.blocks_ts.4.attn_s.qkv.bias", "module.blocks_ts.4.attn_t.proj.weight", "module.blocks_ts.4.attn_t.proj.bias", "module.blocks_ts.4.attn_t.qkv.weight", "module.blocks_ts.4.attn_t.qkv.bias", "module.blocks_ts.4.norm2_s.weight", "module.blocks_ts.4.norm2_s.bias", "module.blocks_ts.4.norm2_t.weight", "module.blocks_ts.4.norm2_t.bias", "module.blocks_ts.4.mlp_s.fc1.weight", "module.blocks_ts.4.mlp_s.fc1.bias", "module.blocks_ts.4.mlp_s.fc2.weight", "module.blocks_ts.4.mlp_s.fc2.bias", "module.blocks_ts.4.mlp_t.fc1.weight", "module.blocks_ts.4.mlp_t.fc1.bias", "module.blocks_ts.4.mlp_t.fc2.weight", "module.blocks_ts.4.mlp_t.fc2.bias", "module.norm.weight", "module.norm.bias", "module.pre_logits.fc.weight", "module.pre_logits.fc.bias", "module.head.weight", "module.head.bias", "module.ts_attn.0.weight", "module.ts_attn.0.bias", "module.ts_attn.1.weight", "module.ts_attn.1.bias", "module.ts_attn.2.weight", "module.ts_attn.2.bias", "module.ts_attn.3.weight", "module.ts_attn.3.bias", "module.ts_attn.4.weight", "module.ts_attn.4.bias". 

I guess there is a mismatch between the checkpoint and the configuration file (and maybe code?). I am sure that I downloaded the checkpoint from the link in the inference.MD. Could you please double-check?

Note: I tried to load the checkpoint with all other configuration files in /configs/pose3d, none worked

Here is the code I am running
import os
import argparse
import torch
import torch.nn as nn
import os, sys
sys.path.append(os.getcwd())
from lib.utils.tools import *
from lib.utils.learning import *


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", type=str, default="configs/pose3d/MB_ft_h36m_global_lite.yaml",
                        help="Path to the config file.")
    parser.add_argument('-e', '--evaluate', default='checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin',
                        type=str, metavar='FILENAME', help='checkpoint to evaluate (file name)')
    # parser.add_argument('-j', '--json_path', type=str, help='alphapose detection result json path')
    # parser.add_argument('-v', '--vid_path', type=str, help='video path')
    parser.add_argument('-o', '--out_path', type=str, help='output path')
    parser.add_argument('--pixel', action='store_true', help='align with pixel coordinates')
    parser.add_argument('--focus', type=int, default=None, help='target person id')
    parser.add_argument('--clip_len', type=int, default=243, help='clip length for network input')
    opts = parser.parse_args()
    return opts


opts = parse_args()
args = get_config(opts.config)

model_backbone = load_backbone(args)

print('Loading checkpoint', opts.evaluate)
checkpoint = torch.load(opts.evaluate, map_location="cpu")
model_backbone.load_state_dict(checkpoint['model_pos'], strict=True)
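For what it's worth, when every key is present but prefixed with "module.", the checkpoint was saved from an nn.DataParallel-wrapped model while a plain DSTformer is being loaded (if I remember the script correctly, infer_wild.py wraps the backbone in nn.DataParallel when CUDA is available, which would explain why this only shows up on CPU-only machines). A hedged workaround is to strip the prefix before loading:

# Hedged workaround: strip the DataParallel 'module.' prefix so the checkpoint
# matches the un-wrapped backbone (e.g. when running on a CPU-only machine).
state_dict = checkpoint['model_pos']
state_dict = {k[len('module.'):] if k.startswith('module.') else k: v
              for k, v in state_dict.items()}
model_backbone.load_state_dict(state_dict, strict=True)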

In-the-wild Inference input format

Hello and thanks for sharing your code.
May I ask about the structure of the .json file needed for in-the-wild 3D pose inference? I want to use 2D estimates from a network other than AlphaPose and am not sure how to structure my 2D poses so that they are compatible with your code.
Thanks in advance for your help.
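Not an official spec, but for orientation: the in-the-wild loader reads an AlphaPose-style results.json, i.e. a flat list of per-frame detections with flattened [x, y, score] keypoints and a person/track id. A hedged sketch of what one entry typically looks like (field names and joint count are assumptions; double-check them against lib/data/dataset_wild.py):

import json

# Hypothetical single entry of an AlphaPose-style results.json (Halpe-26 assumed).
detection = {
    "image_id": "0.jpg",                    # frame identifier
    "category_id": 1,
    "keypoints": [100.0, 200.0, 0.9] * 26,  # flattened x, y, score for each joint
    "score": 2.5,                           # overall person confidence
    "idx": 0,                               # person / track id (used by --focus)
    "box": [90.0, 150.0, 80.0, 300.0],      # x, y, w, h
}
with open("alphapose-results.json", "w") as f:
    json.dump([detection], f)               # the file is a flat list of such entries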

Usage

Hi, thanks for the release, looks very cool!
Can you please give me a hint on how to use your pre-trained model to run inference on my own video?
Given a video, do I need to run 2D pose estimation first before I can use MotionBERT? Or do you already provide that?
How should I generate 3D points on my video?
How can I get the motion embedding? I tried:

E = MotionBERT.get_representation(x)

but get_representation does not exist!
Thank you!

If you could just give me high level hints I would appreciate it! Thanks!
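For what it's worth, the representation comes from the pretrained DSTformer backbone rather than a MotionBERT wrapper object; a hedged sketch following the README usage, with assumed config and checkpoint paths (substitute your own):

import torch
from lib.utils.tools import get_config
from lib.utils.learning import load_backbone

# Assumed paths; adjust to wherever your config and pretrained checkpoint live.
args = get_config('configs/pretrain/MB_pretrain.yaml')
model_backbone = load_backbone(args)
ckpt = torch.load('checkpoint/pretrain/MB_release/best_epoch.bin', map_location='cpu')
state_dict = {k[len('module.'):] if k.startswith('module.') else k: v
              for k, v in ckpt['model_pos'].items()}   # strip DataParallel prefix if present
model_backbone.load_state_dict(state_dict, strict=True)
model_backbone.eval()

x = torch.randn(1, 243, 17, 3)                 # (batch, frames, 17 joints, [x, y, conf])
with torch.no_grad():
    E = model_backbone.get_representation(x)   # motion embedding, per the README usage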

Prediction Scale

Hi,
I have questions about the dimensions of the predicted poses both in inference and evaluation code.
I noticed that the predictions of the network in the evaluation function in train.py are being multiplied by a factor and I traced it back to data['test']['2.5d_factor'] in h36m_sh_conf_cam_source_final.pkl. Could you please help me understand how these factors are being calculated?
Does this mean that the outputs of the network are not expected to have the correct human scale (in meters), and that only the relative pose is the goal? In particular, when I run the inference code and plot the outputs, I notice the dimensions of the person change (which I guess comes from this), even when using the MB_ft_h36m model with rootrel set to True.

In general, it would be really appreciated if you could help me understand the scale of the output and how I can convert it to meters.

Thanks in advance for your help.
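Not a definitive answer, but for illustration: in the H36M-SH evaluation the network predicts normalized, root-relative coordinates, and a per-clip 2.5D scale factor (precomputed from the ground truth and stored in the .pkl) restores the metric scale; in the wild there is no such factor, so the output scale is arbitrary. A sketch of the conversion, with factor_2_5d assumed to come from data['test']['2.5d_factor']:

# pred: (T, 17, 3) normalized, root-relative network output for one clip
# factor_2_5d: per-clip scale, assumed to come from data['test']['2.5d_factor']
def to_millimetres(pred, factor_2_5d):
    pred_mm = pred * factor_2_5d        # restore metric scale (mm)
    pred_mm = pred_mm - pred_mm[:, :1]  # keep it root-relative (pelvis at origin)
    return pred_mm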

Velocity loss in the paper

Thank you for your great work. I would like to ask about the velocity loss term in the loss function mentioned in the paper. What is the purpose of adding the velocity loss?
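For reference, the velocity loss penalizes the difference between predicted and ground-truth frame-to-frame displacements, which suppresses temporal jitter that a per-frame position loss alone does not. A minimal sketch of the usual formulation:

import torch

def velocity_loss(pred, gt):
    """pred, gt: (B, T, J, 3). Mean distance between predicted and GT per-frame velocities."""
    v_pred = pred[:, 1:] - pred[:, :-1]
    v_gt = gt[:, 1:] - gt[:, :-1]
    return torch.norm(v_pred - v_gt, dim=-1).mean()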

wild video infer too slow

Hello, may I ask why inference on my own video is so slow? A 10-second video takes more than ten minutes, and there are some warning messages:

IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (923, 924) to (928, 928) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).

  0%|▋         | 1/296 [00:00<04:16, 1.15it/s]
[swscaler @ 000001df1adc4300] Warning: data is not aligned! This can lead to a speed loss

My testloader settings are as follows:

testloader_params = {
    'batch_size': 1,
    'shuffle': False,
    'num_workers': 0,
    'pin_memory': True,
    'prefetch_factor': 2,
    'persistent_workers': False,
    'drop_last': False
}

About Comparison of Model Architectural Designs

Hello, thank you for your work.
In my opinion, it is inappropriate to compare methods (a) and (f) directly. From what I understand, method (a) contains one S-T block per module, whereas method (f) contains both an S-T and a T-S block.
That is, method (f) has roughly twice as many parameters as method (a), so I think it would be more appropriate to set the depth of method (a) to 10 for the comparison.
Thank you!

About the hips coordinates to world

Hi, I got some real-time 3D pose results and visualized them in Open3D; they look good:

ezgif-4-a234c576ce

However, I am wondering how to map the hip coordinates to the real world. I currently add 0.65 to the z axis, but it is not aligned well; it looks like it should be some normalized ratio of hip height to body height. Do you know what the exact value is?

How to accelerate model infer speed

Hi, I use the script infer_wild.sh to infer 3D poses. Can I use my GPU, or some other method, to accelerate the rendering speed? I found that the GPU utilization rate is very low.
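Not a definitive diagnosis, but low GPU utilization during infer_wild.sh often means the bottleneck is the matplotlib-based rendering rather than the network. A hedged way to check is to time the forward pass separately from the rest of the loop (variable names below are taken from infer_wild.py as I recall them, so treat them as assumptions):

import time
import torch

# Hedged timing sketch inside the inference loop: compare the forward-pass time
# against the total per-clip time to see whether the model or the rendering dominates.
t0 = time.time()
with torch.no_grad():
    predicted_3d_pos = model_pos(batch_input)   # network forward pass
if torch.cuda.is_available():
    torch.cuda.synchronize()                    # make GPU timing accurate
print('model forward: %.3fs per clip' % (time.time() - t0))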

Broken link pyskl

The pyskl link in action.md is broken. Is it possible to find a new link, or some similar reference?

how to get the action recognition result for custom videos

Hey, thanks for this wonderful work; the 2D-to-3D reconstruction performance is eye-opening. I am wondering whether the action recognition inference code for custom videos has been released yet. I can only find the evaluation code for action recognition, which is meant for the NTU-RGBD dataset.

Something about train.

Thank you for your great work. I have the following question: I see there are three training sections in the docs folder, namely pretrain, scratch, and finetune. Is there any connection between the three? If I only care about lifting 2D keypoints to 3D keypoints, which one should I focus on? Thank you very much; I look forward to your answer.

about the root trans in 3d pose and mesh

Hello, this is really impressive work, clean and fantastic. However, I have some questions:

  1. In the demo videos, the translation in the 3D keypoint task is relatively stable, but the mesh looks bumpy in the vertical direction (height). Is there a way to combine the strengths of both, in a multi-modality-style model, to make the mesh translation more accurate?
  2. The 3D pose results look really good, but in most situations we need body rotations (like SMPL). Is there a way to obtain SMPL-like rotations directly from the 3D pose?

Mesh with HybrIK

Will you provide the code for mesh recovery using HybrIK (reported in Table 3)? I would appreciate it if you could release the code related to this part.

demonstration of pose estimation

Hello, thanks for your wonderful work. I recently tried to use MotionBERT, but it seems to only output metrics such as MPJPE. If I want to demo real-time video pose estimation like the animation on the cover, what should I do? Thank you.

Pre-Training Time

Hi,

Thank you for your great work! How long does the pretraining take on 8 V100 machines? Thanks!

about the speed

Even though the model is not big, the speed is quite slow, about 1 second per frame. Is that normal?

About the conf input

Hello, I notice that the input can be provided with or without keypoint confidence, but I did not see any ablation on this. If confidence is used, the model becomes tightly coupled to the 2D pose estimator itself (some estimators may not produce comparably high scores). Do the final metrics change depending on whether confidence is included?

Input keypoint structure

Hi, the documentation says to use the H36M keypoint format or the Halpe 26 keypoints. Since these two formats differ and I’m trying to use YOLOv7 to extract the 2D poses, which keypoints and ordering does MotionBERT expect? Is there an example json available? Thank you😊
