walter0807 / MotionBERT
[ICCV 2023] PyTorch Implementation of "MotionBERT: A Unified Perspective on Learning Human Motion Representations"
License: Apache License 2.0
I ran this code:
!python3 infer_wild_mesh.py --vid_path ./4.mp4 --json_path ./alphapose-results.json --out_path /content/MotionBERT
I have saved the best_epoch here:
/content/MotionBERT/checkpoint/mesh/FT_MB_release_MB_ft_pw3d/best_epoch.bin
Traceback (most recent call last):
File "/content/MotionBERT/infer_wild_mesh.py", line 64, in
smpl = SMPL(args.data_root, batch_size=1).cuda()
File "/content/MotionBERT/lib/utils/utils_smpl.py", line 62, in init
super(SMPL, self).init(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/smplx/body_models.py", line 133, in init
assert osp.exists(smpl_path), 'Path {} does not exist!'.format(
AssertionError: Path data/mesh does not exist!
Loading checkpoint checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin
Traceback (most recent call last):
File "/content/MotionBERT/infer_wild.py", line 45, in <module>
model_backbone.load_state_dict(checkpoint['model_pos'], strict=True)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DSTformer:
Missing key(s) in state_dict: "temp_embed", "pos_embed", ...
Unexpected key(s) in state_dict: "module.temp_embed", "module.pos_embed", ...
I tried to solve this problem by following this blog:
https://blog.csdn.net/yangwangnndd/article/details/100207686
However, more problems followed.
I followed this guide
https://github.com/Walter0807/MotionBERT/blob/main/docs/inference.md
I used this 3D pose model:
https://onedrive.live.com/?authkey=%21ALuKCr9wihi87bI&id=A5438CD242871DF0%21190&cid=A5438CD242871DF0
Hi, I looked through LCN and all the data processing scripts, but none of them ever mentions the 2.5D-related fields. I wonder how the 2.5d_factor in the pkl file is calculated?
Hi! Thank you so much for sharing this code. Can you please include the license information so that we know the restrictions/limitations if there are any?
Hello @Walter0807, I want to fine-tune the pose3d task on my own dataset. What data format should I prepare, i.e., what goes inside the .pkl file? Right now I have a 2D skeleton video and the JSON file from AlphaPose; what should I do next?
Sorry to keep bothering you.
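A quick way to see what the fine-tuning code expects is to inspect the released Human3.6M pickle yourself. The sketch below only assumes the file path (adjust it to wherever you placed the data) and prints whatever keys and array shapes the file actually contains:

# Minimal sketch: inspect the released H36M pickle to learn the expected structure
# before building a custom .pkl. The path below is an example; adjust as needed.
import pickle

with open("data/motion3d/h36m_sh_conf_cam_source_final.pkl", "rb") as f:
    data = pickle.load(f)

def describe(obj, indent=0):
    # Recursively print dict keys, value types, and array shapes so the layout is visible.
    pad = " " * indent
    if isinstance(obj, dict):
        for k, v in obj.items():
            print(pad + str(k), type(v).__name__, getattr(v, "shape", ""))
            if isinstance(v, dict):
                describe(v, indent + 2)
    else:
        print(pad + type(obj).__name__, getattr(obj, "shape", ""))

describe(data)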
How can I run the model on an image?
Hi!
I was just wondering if you have some results on the speed and if this model (in the Lite variant) would be suitable for a real-time 3d pose estimation problem?
Thanks
Thanks for your great work! When will you release the code?
Hello, I recently found that MB does not seem very good at half-body mesh regression. I just wonder if you have tested this before and what the reason could be.
Image-based models like PARE and SPIN can hallucinate the occluded parts, but MB totally fails in this scenario.
I want to generate synthetic_noise.pth for a different skeleton type.
Thank you for your great work. I have a question about one of the formulas in the paper: ◦ is said to denote element-wise product, but I don't know what that symbol means. Could you explain it?
Hi,
You mention fine-tuning part of the layers in your paper, but the code fine-tunes the entire model, which is computationally costly. May I ask which layers "part layers" refers to?
Hi, I used the training code and found something wrong.
https://github.com/Walter0807/MotionBERT/blob/ec48976542ba746fd1b48054502be03888fbab86/train.py#LL340C89-L340C89 The test dataloader includes AMASS, and in dataset_motion_3d.py the 2D input uses motion_file['data_input'], which is None when generating the AMASS test set. So, how should I use the AMASS test set, or should I just use the H3.6M test set? Looking forward to your reply, thanks!
Hello, may I ask whether GRAB and SOMA in the AMASS training set were left out of training the pre-trained model? If they were used, tools/compress_amass.py seems to be incorrect.
Hello, this is really an impressive work!
I have a question about how to use HybrIK's 17-point output for mesh inference. I found that the output data of HybrIK is inconsistent with the orientation expected by MB.
Training the 243-frame model requires too much computation and memory for me. Can you provide a pre-trained model with a smaller sequence length (e.g., 27 or 81)? Thanks!
Hi, just posting some FBX demos here (not renders, real 3D); the results are impressive:
Clip_len 24
Clip_len 48
The video I tested is a very challenging one, yet it still gives some nice results!
There is still one issue: the poses sometimes flicker in the middle of frames. Do you have any thoughts? Also, what is the best clip length for real-time applications? (We can't use a very large clip length in real time.)
Hello, I have met a difficulty and do not know how to solve it. My error is as follows. May I ask what happened and how to correct it? I'm using a Windows10 computer. Thank you so much!!
(motionbert) F:\DeepLearning\MotionBERT\MotionBERT-main> python infer_wild.py --vid_path video/me.mp4 --json_path video_json/vis_me.mp4.json --out_path output
Loading checkpoint checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin
0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin
0% | | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "< string>" , line 1, in < module>
File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "F:\Software\Anaconda\envs\motionbert\lib\runpy.py", line 289, in run_path
return _run_module_code(code, init_globals, run_name,
File "F:\Software\Anaconda\envs\motionbert\lib\runpy.py", line 96, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "F:\Software\Anaconda\envs\motionbert\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "F:\DeepLearning\MotionBERT\MotionBERT-main\infer_wild.py", line 70, in < module>
for batch_input in tqdm(test_loader):
File "F:\Software\Anaconda\envs\motionbert\lib\site-packages\tqdm\std.py", line 1178, in iter
for obj in iterable:
File "F:\Software\Anaconda\envs\motionbert\lib\site-packages\torch\utils\data\dataloader.py", line 439, in iter
self._iterator = self._get_iterator()
File "F:\Software\Anaconda\envs\motionbert\lib\site-packages\torch\utils\data\dataloader.py", line 390, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "F:\Software\Anaconda\envs\motionbert\lib\site-packages\torch\utils\data\dataloader.py", line 1077, in init
w.start()
File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\context.py", line 336, in _Popen
return Popen(process_obj)
File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\popen_spawn_win32.py", line 45, in init
prep_data = spawn.get_preparation_data(process_obj._name)
File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
    freeze_support()
    ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
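This RuntimeError is the standard Windows multiprocessing issue: DataLoader workers are started with spawn, which re-imports the script, so any top-level code in infer_wild.py gets re-executed. A minimal sketch of the two usual workarounds (assuming, as the traceback suggests, that the inference loop currently sits at module level):

# Workaround 1: disable worker processes so nothing is spawned on Windows.
testloader_params = {
    'batch_size': 1,
    'num_workers': 0,   # 0 = load data in the main process
    'pin_memory': True,
}

# Workaround 2: protect the entry point so spawned workers can safely
# re-import infer_wild.py without re-running the inference loop.
def main():
    ...  # existing code: parse args, load the checkpoint, iterate over test_loader

if __name__ == '__main__':
    main()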
I can get the COCO 17 keypoints (or any other keypoint format) for my own custom data, and I know I should convert the COCO format to Human3.6M, but how? The joint definitions of COCO and Human3.6M are different, especially for the body. Is there any way to convert between these formats?
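One approximate mapping that several 2D-to-3D lifting repos use to convert COCO-17 keypoints into an H36M-17 layout is sketched below. The target joint order and the midpoint constructions are assumptions borrowed from those repos, not from this repository, so please verify them against the joint definitions MotionBERT actually uses:

# Sketch: approximate COCO-17 -> H36M-17 keypoint conversion.
import numpy as np

def coco2h36m_approx(coco):
    # coco: (..., 17, C) array in COCO order
    #   [nose, l_eye, r_eye, l_ear, r_ear, l_sho, r_sho, l_elb, r_elb,
    #    l_wri, r_wri, l_hip, r_hip, l_knee, r_knee, l_ank, r_ank]
    # Returns an array in (assumed) H36M order
    #   [pelvis, r_hip, r_knee, r_ankle, l_hip, l_knee, l_ankle, spine,
    #    thorax, nose, head, l_sho, l_elb, l_wri, r_sho, r_elb, r_wri]
    h36m = np.zeros_like(coco)
    h36m[..., 0, :] = (coco[..., 11, :] + coco[..., 12, :]) / 2   # pelvis = mid-hips
    h36m[..., 1, :] = coco[..., 12, :]                            # right hip
    h36m[..., 2, :] = coco[..., 14, :]                            # right knee
    h36m[..., 3, :] = coco[..., 16, :]                            # right ankle
    h36m[..., 4, :] = coco[..., 11, :]                            # left hip
    h36m[..., 5, :] = coco[..., 13, :]                            # left knee
    h36m[..., 6, :] = coco[..., 15, :]                            # left ankle
    h36m[..., 8, :] = (coco[..., 5, :] + coco[..., 6, :]) / 2     # thorax = mid-shoulders
    h36m[..., 7, :] = (h36m[..., 0, :] + h36m[..., 8, :]) / 2     # spine = mid pelvis/thorax
    h36m[..., 9, :] = coco[..., 0, :]                             # nose
    h36m[..., 10, :] = (coco[..., 1, :] + coco[..., 2, :]) / 2    # head ~ mid-eyes
    h36m[..., 11, :] = coco[..., 5, :]                            # left shoulder
    h36m[..., 12, :] = coco[..., 7, :]                            # left elbow
    h36m[..., 13, :] = coco[..., 9, :]                            # left wrist
    h36m[..., 14, :] = coco[..., 6, :]                            # right shoulder
    h36m[..., 15, :] = coco[..., 8, :]                            # right elbow
    h36m[..., 16, :] = coco[..., 10, :]                           # right wrist
    return h36m

Joints that COCO lacks (pelvis, spine, thorax, head) can only be synthesized as midpoints, so keypoints converted this way will differ slightly from poses produced by an H36M-native 2D detector.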
I want to reproduce the result of "3D Pose (H36M-SH, scratch), 39.1mm", but I can only get 40.0mm. What PyTorch version did you use to train the model?
Does the network output the 3D poses in camera coordinates?
The 3D coordinates I receive are in pixel values; can you help me convert them into values corresponding to 3D space?
Hi,
Thanks for the great work!
I tried to follow the instructions in docs/inference.MD
and got the following error while loading the checkpoint:
(motionbert) H4dr1en@H4dr1en MotionBERT % /opt/miniconda3/envs/motionbert/bin/python /Users/H4dr1en/projects/MotionBERT/infer_wild_test.py
Loading checkpoint checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin
Traceback (most recent call last):
File "/Users/H4dr1en/projects/MotionBERT/infer_wild_test.py", line 37, in <module>
model_backbone.load_state_dict(checkpoint['model_pos'], strict=True)
File "/opt/miniconda3/envs/motionbert/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DSTformer:
Missing key(s) in state_dict: "temp_embed", "pos_embed", "joints_embed.weight", "joints_embed.bias", "blocks_st.0.norm1_s.weight", "blocks_st.0.norm1_s.bias", "blocks_st.0.norm1_t.weight", "blocks_st.0.norm1_t.bias", "blocks_st.0.attn_s.proj.weight", "blocks_st.0.attn_s.proj.bias", "blocks_st.0.attn_s.qkv.weight", "blocks_st.0.attn_s.qkv.bias", "blocks_st.0.attn_t.proj.weight", "blocks_st.0.attn_t.proj.bias", "blocks_st.0.attn_t.qkv.weight", "blocks_st.0.attn_t.qkv.bias", "blocks_st.0.norm2_s.weight", "blocks_st.0.norm2_s.bias", "blocks_st.0.norm2_t.weight", "blocks_st.0.norm2_t.bias", "blocks_st.0.mlp_s.fc1.weight", "blocks_st.0.mlp_s.fc1.bias", "blocks_st.0.mlp_s.fc2.weight", "blocks_st.0.mlp_s.fc2.bias", "blocks_st.0.mlp_t.fc1.weight", "blocks_st.0.mlp_t.fc1.bias", "blocks_st.0.mlp_t.fc2.weight", "blocks_st.0.mlp_t.fc2.bias", "blocks_st.1.norm1_s.weight", "blocks_st.1.norm1_s.bias", "blocks_st.1.norm1_t.weight", "blocks_st.1.norm1_t.bias", "blocks_st.1.attn_s.proj.weight", "blocks_st.1.attn_s.proj.bias", "blocks_st.1.attn_s.qkv.weight", "blocks_st.1.attn_s.qkv.bias", "blocks_st.1.attn_t.proj.weight", "blocks_st.1.attn_t.proj.bias", "blocks_st.1.attn_t.qkv.weight", "blocks_st.1.attn_t.qkv.bias", "blocks_st.1.norm2_s.weight", "blocks_st.1.norm2_s.bias", "blocks_st.1.norm2_t.weight", "blocks_st.1.norm2_t.bias", "blocks_st.1.mlp_s.fc1.weight", "blocks_st.1.mlp_s.fc1.bias", "blocks_st.1.mlp_s.fc2.weight", "blocks_st.1.mlp_s.fc2.bias", "blocks_st.1.mlp_t.fc1.weight", "blocks_st.1.mlp_t.fc1.bias", "blocks_st.1.mlp_t.fc2.weight", "blocks_st.1.mlp_t.fc2.bias", "blocks_st.2.norm1_s.weight", "blocks_st.2.norm1_s.bias", "blocks_st.2.norm1_t.weight", "blocks_st.2.norm1_t.bias", "blocks_st.2.attn_s.proj.weight", "blocks_st.2.attn_s.proj.bias", "blocks_st.2.attn_s.qkv.weight", "blocks_st.2.attn_s.qkv.bias", "blocks_st.2.attn_t.proj.weight", "blocks_st.2.attn_t.proj.bias", "blocks_st.2.attn_t.qkv.weight", "blocks_st.2.attn_t.qkv.bias", "blocks_st.2.norm2_s.weight", "blocks_st.2.norm2_s.bias", "blocks_st.2.norm2_t.weight", "blocks_st.2.norm2_t.bias", "blocks_st.2.mlp_s.fc1.weight", "blocks_st.2.mlp_s.fc1.bias", "blocks_st.2.mlp_s.fc2.weight", "blocks_st.2.mlp_s.fc2.bias", "blocks_st.2.mlp_t.fc1.weight", "blocks_st.2.mlp_t.fc1.bias", "blocks_st.2.mlp_t.fc2.weight", "blocks_st.2.mlp_t.fc2.bias", "blocks_st.3.norm1_s.weight", "blocks_st.3.norm1_s.bias", "blocks_st.3.norm1_t.weight", "blocks_st.3.norm1_t.bias", "blocks_st.3.attn_s.proj.weight", "blocks_st.3.attn_s.proj.bias", "blocks_st.3.attn_s.qkv.weight", "blocks_st.3.attn_s.qkv.bias", "blocks_st.3.attn_t.proj.weight", "blocks_st.3.attn_t.proj.bias", "blocks_st.3.attn_t.qkv.weight", "blocks_st.3.attn_t.qkv.bias", "blocks_st.3.norm2_s.weight", "blocks_st.3.norm2_s.bias", "blocks_st.3.norm2_t.weight", "blocks_st.3.norm2_t.bias", "blocks_st.3.mlp_s.fc1.weight", "blocks_st.3.mlp_s.fc1.bias", "blocks_st.3.mlp_s.fc2.weight", "blocks_st.3.mlp_s.fc2.bias", "blocks_st.3.mlp_t.fc1.weight", "blocks_st.3.mlp_t.fc1.bias", "blocks_st.3.mlp_t.fc2.weight", "blocks_st.3.mlp_t.fc2.bias", "blocks_st.4.norm1_s.weight", "blocks_st.4.norm1_s.bias", "blocks_st.4.norm1_t.weight", "blocks_st.4.norm1_t.bias", "blocks_st.4.attn_s.proj.weight", "blocks_st.4.attn_s.proj.bias", "blocks_st.4.attn_s.qkv.weight", "blocks_st.4.attn_s.qkv.bias", "blocks_st.4.attn_t.proj.weight", "blocks_st.4.attn_t.proj.bias", "blocks_st.4.attn_t.qkv.weight", "blocks_st.4.attn_t.qkv.bias", "blocks_st.4.norm2_s.weight", "blocks_st.4.norm2_s.bias", "blocks_st.4.norm2_t.weight", "blocks_st.4.norm2_t.bias", 
"blocks_st.4.mlp_s.fc1.weight", "blocks_st.4.mlp_s.fc1.bias", "blocks_st.4.mlp_s.fc2.weight", "blocks_st.4.mlp_s.fc2.bias", "blocks_st.4.mlp_t.fc1.weight", "blocks_st.4.mlp_t.fc1.bias", "blocks_st.4.mlp_t.fc2.weight", "blocks_st.4.mlp_t.fc2.bias", "blocks_ts.0.norm1_s.weight", "blocks_ts.0.norm1_s.bias", "blocks_ts.0.norm1_t.weight", "blocks_ts.0.norm1_t.bias", "blocks_ts.0.attn_s.proj.weight", "blocks_ts.0.attn_s.proj.bias", "blocks_ts.0.attn_s.qkv.weight", "blocks_ts.0.attn_s.qkv.bias", "blocks_ts.0.attn_t.proj.weight", "blocks_ts.0.attn_t.proj.bias", "blocks_ts.0.attn_t.qkv.weight", "blocks_ts.0.attn_t.qkv.bias", "blocks_ts.0.norm2_s.weight", "blocks_ts.0.norm2_s.bias", "blocks_ts.0.norm2_t.weight", "blocks_ts.0.norm2_t.bias", "blocks_ts.0.mlp_s.fc1.weight", "blocks_ts.0.mlp_s.fc1.bias", "blocks_ts.0.mlp_s.fc2.weight", "blocks_ts.0.mlp_s.fc2.bias", "blocks_ts.0.mlp_t.fc1.weight", "blocks_ts.0.mlp_t.fc1.bias", "blocks_ts.0.mlp_t.fc2.weight", "blocks_ts.0.mlp_t.fc2.bias", "blocks_ts.1.norm1_s.weight", "blocks_ts.1.norm1_s.bias", "blocks_ts.1.norm1_t.weight", "blocks_ts.1.norm1_t.bias", "blocks_ts.1.attn_s.proj.weight", "blocks_ts.1.attn_s.proj.bias", "blocks_ts.1.attn_s.qkv.weight", "blocks_ts.1.attn_s.qkv.bias", "blocks_ts.1.attn_t.proj.weight", "blocks_ts.1.attn_t.proj.bias", "blocks_ts.1.attn_t.qkv.weight", "blocks_ts.1.attn_t.qkv.bias", "blocks_ts.1.norm2_s.weight", "blocks_ts.1.norm2_s.bias", "blocks_ts.1.norm2_t.weight", "blocks_ts.1.norm2_t.bias", "blocks_ts.1.mlp_s.fc1.weight", "blocks_ts.1.mlp_s.fc1.bias", "blocks_ts.1.mlp_s.fc2.weight", "blocks_ts.1.mlp_s.fc2.bias", "blocks_ts.1.mlp_t.fc1.weight", "blocks_ts.1.mlp_t.fc1.bias", "blocks_ts.1.mlp_t.fc2.weight", "blocks_ts.1.mlp_t.fc2.bias", "blocks_ts.2.norm1_s.weight", "blocks_ts.2.norm1_s.bias", "blocks_ts.2.norm1_t.weight", "blocks_ts.2.norm1_t.bias", "blocks_ts.2.attn_s.proj.weight", "blocks_ts.2.attn_s.proj.bias", "blocks_ts.2.attn_s.qkv.weight", "blocks_ts.2.attn_s.qkv.bias", "blocks_ts.2.attn_t.proj.weight", "blocks_ts.2.attn_t.proj.bias", "blocks_ts.2.attn_t.qkv.weight", "blocks_ts.2.attn_t.qkv.bias", "blocks_ts.2.norm2_s.weight", "blocks_ts.2.norm2_s.bias", "blocks_ts.2.norm2_t.weight", "blocks_ts.2.norm2_t.bias", "blocks_ts.2.mlp_s.fc1.weight", "blocks_ts.2.mlp_s.fc1.bias", "blocks_ts.2.mlp_s.fc2.weight", "blocks_ts.2.mlp_s.fc2.bias", "blocks_ts.2.mlp_t.fc1.weight", "blocks_ts.2.mlp_t.fc1.bias", "blocks_ts.2.mlp_t.fc2.weight", "blocks_ts.2.mlp_t.fc2.bias", "blocks_ts.3.norm1_s.weight", "blocks_ts.3.norm1_s.bias", "blocks_ts.3.norm1_t.weight", "blocks_ts.3.norm1_t.bias", "blocks_ts.3.attn_s.proj.weight", "blocks_ts.3.attn_s.proj.bias", "blocks_ts.3.attn_s.qkv.weight", "blocks_ts.3.attn_s.qkv.bias", "blocks_ts.3.attn_t.proj.weight", "blocks_ts.3.attn_t.proj.bias", "blocks_ts.3.attn_t.qkv.weight", "blocks_ts.3.attn_t.qkv.bias", "blocks_ts.3.norm2_s.weight", "blocks_ts.3.norm2_s.bias", "blocks_ts.3.norm2_t.weight", "blocks_ts.3.norm2_t.bias", "blocks_ts.3.mlp_s.fc1.weight", "blocks_ts.3.mlp_s.fc1.bias", "blocks_ts.3.mlp_s.fc2.weight", "blocks_ts.3.mlp_s.fc2.bias", "blocks_ts.3.mlp_t.fc1.weight", "blocks_ts.3.mlp_t.fc1.bias", "blocks_ts.3.mlp_t.fc2.weight", "blocks_ts.3.mlp_t.fc2.bias", "blocks_ts.4.norm1_s.weight", "blocks_ts.4.norm1_s.bias", "blocks_ts.4.norm1_t.weight", "blocks_ts.4.norm1_t.bias", "blocks_ts.4.attn_s.proj.weight", "blocks_ts.4.attn_s.proj.bias", "blocks_ts.4.attn_s.qkv.weight", "blocks_ts.4.attn_s.qkv.bias", "blocks_ts.4.attn_t.proj.weight", "blocks_ts.4.attn_t.proj.bias", "blocks_ts.4.attn_t.qkv.weight", 
"blocks_ts.4.attn_t.qkv.bias", "blocks_ts.4.norm2_s.weight", "blocks_ts.4.norm2_s.bias", "blocks_ts.4.norm2_t.weight", "blocks_ts.4.norm2_t.bias", "blocks_ts.4.mlp_s.fc1.weight", "blocks_ts.4.mlp_s.fc1.bias", "blocks_ts.4.mlp_s.fc2.weight", "blocks_ts.4.mlp_s.fc2.bias", "blocks_ts.4.mlp_t.fc1.weight", "blocks_ts.4.mlp_t.fc1.bias", "blocks_ts.4.mlp_t.fc2.weight", "blocks_ts.4.mlp_t.fc2.bias", "norm.weight", "norm.bias", "pre_logits.fc.weight", "pre_logits.fc.bias", "head.weight", "head.bias", "ts_attn.0.weight", "ts_attn.0.bias", "ts_attn.1.weight", "ts_attn.1.bias", "ts_attn.2.weight", "ts_attn.2.bias", "ts_attn.3.weight", "ts_attn.3.bias", "ts_attn.4.weight", "ts_attn.4.bias".
Unexpected key(s) in state_dict: "module.temp_embed", "module.pos_embed", "module.joints_embed.weight", "module.joints_embed.bias", "module.blocks_st.0.norm1_s.weight", "module.blocks_st.0.norm1_s.bias", "module.blocks_st.0.norm1_t.weight", "module.blocks_st.0.norm1_t.bias", "module.blocks_st.0.attn_s.proj.weight", "module.blocks_st.0.attn_s.proj.bias", "module.blocks_st.0.attn_s.qkv.weight", "module.blocks_st.0.attn_s.qkv.bias", "module.blocks_st.0.attn_t.proj.weight", "module.blocks_st.0.attn_t.proj.bias", "module.blocks_st.0.attn_t.qkv.weight", "module.blocks_st.0.attn_t.qkv.bias", "module.blocks_st.0.norm2_s.weight", "module.blocks_st.0.norm2_s.bias", "module.blocks_st.0.norm2_t.weight", "module.blocks_st.0.norm2_t.bias", "module.blocks_st.0.mlp_s.fc1.weight", "module.blocks_st.0.mlp_s.fc1.bias", "module.blocks_st.0.mlp_s.fc2.weight", "module.blocks_st.0.mlp_s.fc2.bias", "module.blocks_st.0.mlp_t.fc1.weight", "module.blocks_st.0.mlp_t.fc1.bias", "module.blocks_st.0.mlp_t.fc2.weight", "module.blocks_st.0.mlp_t.fc2.bias", "module.blocks_st.1.norm1_s.weight", "module.blocks_st.1.norm1_s.bias", "module.blocks_st.1.norm1_t.weight", "module.blocks_st.1.norm1_t.bias", "module.blocks_st.1.attn_s.proj.weight", "module.blocks_st.1.attn_s.proj.bias", "module.blocks_st.1.attn_s.qkv.weight", "module.blocks_st.1.attn_s.qkv.bias", "module.blocks_st.1.attn_t.proj.weight", "module.blocks_st.1.attn_t.proj.bias", "module.blocks_st.1.attn_t.qkv.weight", "module.blocks_st.1.attn_t.qkv.bias", "module.blocks_st.1.norm2_s.weight", "module.blocks_st.1.norm2_s.bias", "module.blocks_st.1.norm2_t.weight", "module.blocks_st.1.norm2_t.bias", "module.blocks_st.1.mlp_s.fc1.weight", "module.blocks_st.1.mlp_s.fc1.bias", "module.blocks_st.1.mlp_s.fc2.weight", "module.blocks_st.1.mlp_s.fc2.bias", "module.blocks_st.1.mlp_t.fc1.weight", "module.blocks_st.1.mlp_t.fc1.bias", "module.blocks_st.1.mlp_t.fc2.weight", "module.blocks_st.1.mlp_t.fc2.bias", "module.blocks_st.2.norm1_s.weight", "module.blocks_st.2.norm1_s.bias", "module.blocks_st.2.norm1_t.weight", "module.blocks_st.2.norm1_t.bias", "module.blocks_st.2.attn_s.proj.weight", "module.blocks_st.2.attn_s.proj.bias", "module.blocks_st.2.attn_s.qkv.weight", "module.blocks_st.2.attn_s.qkv.bias", "module.blocks_st.2.attn_t.proj.weight", "module.blocks_st.2.attn_t.proj.bias", "module.blocks_st.2.attn_t.qkv.weight", "module.blocks_st.2.attn_t.qkv.bias", "module.blocks_st.2.norm2_s.weight", "module.blocks_st.2.norm2_s.bias", "module.blocks_st.2.norm2_t.weight", "module.blocks_st.2.norm2_t.bias", "module.blocks_st.2.mlp_s.fc1.weight", "module.blocks_st.2.mlp_s.fc1.bias", "module.blocks_st.2.mlp_s.fc2.weight", "module.blocks_st.2.mlp_s.fc2.bias", "module.blocks_st.2.mlp_t.fc1.weight", "module.blocks_st.2.mlp_t.fc1.bias", "module.blocks_st.2.mlp_t.fc2.weight", "module.blocks_st.2.mlp_t.fc2.bias", "module.blocks_st.3.norm1_s.weight", "module.blocks_st.3.norm1_s.bias", "module.blocks_st.3.norm1_t.weight", "module.blocks_st.3.norm1_t.bias", "module.blocks_st.3.attn_s.proj.weight", "module.blocks_st.3.attn_s.proj.bias", "module.blocks_st.3.attn_s.qkv.weight", "module.blocks_st.3.attn_s.qkv.bias", "module.blocks_st.3.attn_t.proj.weight", "module.blocks_st.3.attn_t.proj.bias", "module.blocks_st.3.attn_t.qkv.weight", "module.blocks_st.3.attn_t.qkv.bias", "module.blocks_st.3.norm2_s.weight", "module.blocks_st.3.norm2_s.bias", "module.blocks_st.3.norm2_t.weight", "module.blocks_st.3.norm2_t.bias", "module.blocks_st.3.mlp_s.fc1.weight", "module.blocks_st.3.mlp_s.fc1.bias", 
"module.blocks_st.3.mlp_s.fc2.weight", "module.blocks_st.3.mlp_s.fc2.bias", "module.blocks_st.3.mlp_t.fc1.weight", "module.blocks_st.3.mlp_t.fc1.bias", "module.blocks_st.3.mlp_t.fc2.weight", "module.blocks_st.3.mlp_t.fc2.bias", "module.blocks_st.4.norm1_s.weight", "module.blocks_st.4.norm1_s.bias", "module.blocks_st.4.norm1_t.weight", "module.blocks_st.4.norm1_t.bias", "module.blocks_st.4.attn_s.proj.weight", "module.blocks_st.4.attn_s.proj.bias", "module.blocks_st.4.attn_s.qkv.weight", "module.blocks_st.4.attn_s.qkv.bias", "module.blocks_st.4.attn_t.proj.weight", "module.blocks_st.4.attn_t.proj.bias", "module.blocks_st.4.attn_t.qkv.weight", "module.blocks_st.4.attn_t.qkv.bias", "module.blocks_st.4.norm2_s.weight", "module.blocks_st.4.norm2_s.bias", "module.blocks_st.4.norm2_t.weight", "module.blocks_st.4.norm2_t.bias", "module.blocks_st.4.mlp_s.fc1.weight", "module.blocks_st.4.mlp_s.fc1.bias", "module.blocks_st.4.mlp_s.fc2.weight", "module.blocks_st.4.mlp_s.fc2.bias", "module.blocks_st.4.mlp_t.fc1.weight", "module.blocks_st.4.mlp_t.fc1.bias", "module.blocks_st.4.mlp_t.fc2.weight", "module.blocks_st.4.mlp_t.fc2.bias", "module.blocks_ts.0.norm1_s.weight", "module.blocks_ts.0.norm1_s.bias", "module.blocks_ts.0.norm1_t.weight", "module.blocks_ts.0.norm1_t.bias", "module.blocks_ts.0.attn_s.proj.weight", "module.blocks_ts.0.attn_s.proj.bias", "module.blocks_ts.0.attn_s.qkv.weight", "module.blocks_ts.0.attn_s.qkv.bias", "module.blocks_ts.0.attn_t.proj.weight", "module.blocks_ts.0.attn_t.proj.bias", "module.blocks_ts.0.attn_t.qkv.weight", "module.blocks_ts.0.attn_t.qkv.bias", "module.blocks_ts.0.norm2_s.weight", "module.blocks_ts.0.norm2_s.bias", "module.blocks_ts.0.norm2_t.weight", "module.blocks_ts.0.norm2_t.bias", "module.blocks_ts.0.mlp_s.fc1.weight", "module.blocks_ts.0.mlp_s.fc1.bias", "module.blocks_ts.0.mlp_s.fc2.weight", "module.blocks_ts.0.mlp_s.fc2.bias", "module.blocks_ts.0.mlp_t.fc1.weight", "module.blocks_ts.0.mlp_t.fc1.bias", "module.blocks_ts.0.mlp_t.fc2.weight", "module.blocks_ts.0.mlp_t.fc2.bias", "module.blocks_ts.1.norm1_s.weight", "module.blocks_ts.1.norm1_s.bias", "module.blocks_ts.1.norm1_t.weight", "module.blocks_ts.1.norm1_t.bias", "module.blocks_ts.1.attn_s.proj.weight", "module.blocks_ts.1.attn_s.proj.bias", "module.blocks_ts.1.attn_s.qkv.weight", "module.blocks_ts.1.attn_s.qkv.bias", "module.blocks_ts.1.attn_t.proj.weight", "module.blocks_ts.1.attn_t.proj.bias", "module.blocks_ts.1.attn_t.qkv.weight", "module.blocks_ts.1.attn_t.qkv.bias", "module.blocks_ts.1.norm2_s.weight", "module.blocks_ts.1.norm2_s.bias", "module.blocks_ts.1.norm2_t.weight", "module.blocks_ts.1.norm2_t.bias", "module.blocks_ts.1.mlp_s.fc1.weight", "module.blocks_ts.1.mlp_s.fc1.bias", "module.blocks_ts.1.mlp_s.fc2.weight", "module.blocks_ts.1.mlp_s.fc2.bias", "module.blocks_ts.1.mlp_t.fc1.weight", "module.blocks_ts.1.mlp_t.fc1.bias", "module.blocks_ts.1.mlp_t.fc2.weight", "module.blocks_ts.1.mlp_t.fc2.bias", "module.blocks_ts.2.norm1_s.weight", "module.blocks_ts.2.norm1_s.bias", "module.blocks_ts.2.norm1_t.weight", "module.blocks_ts.2.norm1_t.bias", "module.blocks_ts.2.attn_s.proj.weight", "module.blocks_ts.2.attn_s.proj.bias", "module.blocks_ts.2.attn_s.qkv.weight", "module.blocks_ts.2.attn_s.qkv.bias", "module.blocks_ts.2.attn_t.proj.weight", "module.blocks_ts.2.attn_t.proj.bias", "module.blocks_ts.2.attn_t.qkv.weight", "module.blocks_ts.2.attn_t.qkv.bias", "module.blocks_ts.2.norm2_s.weight", "module.blocks_ts.2.norm2_s.bias", "module.blocks_ts.2.norm2_t.weight", 
"module.blocks_ts.2.norm2_t.bias", "module.blocks_ts.2.mlp_s.fc1.weight", "module.blocks_ts.2.mlp_s.fc1.bias", "module.blocks_ts.2.mlp_s.fc2.weight", "module.blocks_ts.2.mlp_s.fc2.bias", "module.blocks_ts.2.mlp_t.fc1.weight", "module.blocks_ts.2.mlp_t.fc1.bias", "module.blocks_ts.2.mlp_t.fc2.weight", "module.blocks_ts.2.mlp_t.fc2.bias", "module.blocks_ts.3.norm1_s.weight", "module.blocks_ts.3.norm1_s.bias", "module.blocks_ts.3.norm1_t.weight", "module.blocks_ts.3.norm1_t.bias", "module.blocks_ts.3.attn_s.proj.weight", "module.blocks_ts.3.attn_s.proj.bias", "module.blocks_ts.3.attn_s.qkv.weight", "module.blocks_ts.3.attn_s.qkv.bias", "module.blocks_ts.3.attn_t.proj.weight", "module.blocks_ts.3.attn_t.proj.bias", "module.blocks_ts.3.attn_t.qkv.weight", "module.blocks_ts.3.attn_t.qkv.bias", "module.blocks_ts.3.norm2_s.weight", "module.blocks_ts.3.norm2_s.bias", "module.blocks_ts.3.norm2_t.weight", "module.blocks_ts.3.norm2_t.bias", "module.blocks_ts.3.mlp_s.fc1.weight", "module.blocks_ts.3.mlp_s.fc1.bias", "module.blocks_ts.3.mlp_s.fc2.weight", "module.blocks_ts.3.mlp_s.fc2.bias", "module.blocks_ts.3.mlp_t.fc1.weight", "module.blocks_ts.3.mlp_t.fc1.bias", "module.blocks_ts.3.mlp_t.fc2.weight", "module.blocks_ts.3.mlp_t.fc2.bias", "module.blocks_ts.4.norm1_s.weight", "module.blocks_ts.4.norm1_s.bias", "module.blocks_ts.4.norm1_t.weight", "module.blocks_ts.4.norm1_t.bias", "module.blocks_ts.4.attn_s.proj.weight", "module.blocks_ts.4.attn_s.proj.bias", "module.blocks_ts.4.attn_s.qkv.weight", "module.blocks_ts.4.attn_s.qkv.bias", "module.blocks_ts.4.attn_t.proj.weight", "module.blocks_ts.4.attn_t.proj.bias", "module.blocks_ts.4.attn_t.qkv.weight", "module.blocks_ts.4.attn_t.qkv.bias", "module.blocks_ts.4.norm2_s.weight", "module.blocks_ts.4.norm2_s.bias", "module.blocks_ts.4.norm2_t.weight", "module.blocks_ts.4.norm2_t.bias", "module.blocks_ts.4.mlp_s.fc1.weight", "module.blocks_ts.4.mlp_s.fc1.bias", "module.blocks_ts.4.mlp_s.fc2.weight", "module.blocks_ts.4.mlp_s.fc2.bias", "module.blocks_ts.4.mlp_t.fc1.weight", "module.blocks_ts.4.mlp_t.fc1.bias", "module.blocks_ts.4.mlp_t.fc2.weight", "module.blocks_ts.4.mlp_t.fc2.bias", "module.norm.weight", "module.norm.bias", "module.pre_logits.fc.weight", "module.pre_logits.fc.bias", "module.head.weight", "module.head.bias", "module.ts_attn.0.weight", "module.ts_attn.0.bias", "module.ts_attn.1.weight", "module.ts_attn.1.bias", "module.ts_attn.2.weight", "module.ts_attn.2.bias", "module.ts_attn.3.weight", "module.ts_attn.3.bias", "module.ts_attn.4.weight", "module.ts_attn.4.bias".
I guess there is a mismatch between the checkpoint and the configuration file (and maybe the code?). I am sure that I downloaded the checkpoint from the link in inference.md. Could you please double-check?
Note: I tried to load the checkpoint with all other configuration files in /configs/pose3d; none worked.
import os
import argparse
import torch
import torch.nn as nn
import os, sys
sys.path.append(os.getcwd())
from lib.utils.tools import *
from lib.utils.learning import *
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", type=str, default="configs/pose3d/MB_ft_h36m_global_lite.yaml",
                        help="Path to the config file.")
    parser.add_argument('-e', '--evaluate', default='checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin',
                        type=str, metavar='FILENAME', help='checkpoint to evaluate (file name)')
    # parser.add_argument('-j', '--json_path', type=str, help='alphapose detection result json path')
    # parser.add_argument('-v', '--vid_path', type=str, help='video path')
    parser.add_argument('-o', '--out_path', type=str, help='output path')
    parser.add_argument('--pixel', action='store_true', help='align with pixel coordinates')
    parser.add_argument('--focus', type=int, default=None, help='target person id')
    parser.add_argument('--clip_len', type=int, default=243, help='clip length for network input')
    opts = parser.parse_args()
    return opts
opts = parse_args()
args = get_config(opts.config)
model_backbone = load_backbone(args)
print('Loading checkpoint', opts.evaluate)
checkpoint = torch.load(opts.evaluate, map_location="cpu")
model_backbone.load_state_dict(checkpoint['model_pos'], strict=True)
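The Unexpected key(s) all carry a module. prefix, which usually means the checkpoint was saved from a model wrapped in nn.DataParallel while the bare DSTformer backbone is being loaded here. Two possible fixes, shown as a sketch (whether they match this particular checkpoint should be verified):

# Option A: strip the "module." prefix added by nn.DataParallel before loading.
state_dict = {k.replace('module.', '', 1): v for k, v in checkpoint['model_pos'].items()}
model_backbone.load_state_dict(state_dict, strict=True)

# Option B: wrap the backbone the same way the training code presumably did,
# so the "module." prefix in the checkpoint matches the model's parameter names.
# model_backbone = nn.DataParallel(model_backbone)
# model_backbone.load_state_dict(checkpoint['model_pos'], strict=True)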
Hello and thanks for sharing your code.
May I please ask about the structure of the .json file needed for the in-the-wild inference for 3D pose estimation? I want to use 2D estimations from a network other than AlphaPose and I'm not sure how to structure my 2D poses so they're compatible with your code.
Thanks in advance for your help.
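For reference, the in-the-wild inference script consumes an AlphaPose-style results JSON. The sketch below illustrates that layout; the exact field names are an assumption based on AlphaPose's usual results.json and should be checked against lib/data/dataset_wild.py before converting another detector's output:

# Sketch of an AlphaPose-style results JSON: one entry per detected person per frame.
# "keypoints" is a flat [x1, y1, score1, x2, y2, score2, ...] list
# (17*3 values for COCO, 26*3 for Halpe-26).
import json

example = [
    {
        "image_id": "0.jpg",            # frame identifier
        "category_id": 1,
        "keypoints": [0.0] * (26 * 3),  # replace with your detector's 2D joints + scores
        "score": 0.95,                  # person detection confidence
        "idx": 0                        # person/track id (what --focus appears to select)
    },
]

with open("alphapose-results.json", "w") as f:
    json.dump(example, f)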
Hi, thanks for the release, looks very cool!
Can you please give me a hint on how I can use your pre-trained model to run inference on my own video?
Given a video, do I need to run 2D pose estimation first before I can use MotionBERT? Or do you already provide that?
How should I generate 3D points on my video?
How can I get the motion embedding? I tried:
E = MotionBERT.get_representation(x)
but get_representation does not exist!
Thank you!
If you could just give me high level hints I would appreciate it! Thanks!
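A sketch of one way to obtain the embedding, under the assumption that get_representation lives on the DSTformer backbone itself (not on the task-specific wrapper) and that the config/checkpoint paths below match your local setup; if the method is absent in your copy, look for an equivalent return_rep flag on the backbone's forward in lib/model/DSTformer.py:

# Sketch: extract motion representations with the pretrained backbone.
import torch
from lib.utils.tools import get_config
from lib.utils.learning import load_backbone

args = get_config('configs/pretrain/MB_pretrain.yaml')              # example config path
model = load_backbone(args)
ckpt = torch.load('checkpoint/pretrain/MB_release/best_epoch.bin',  # example checkpoint path
                  map_location='cpu')
state_dict = {k.replace('module.', '', 1): v for k, v in ckpt['model_pos'].items()}
model.load_state_dict(state_dict, strict=True)
model.eval()

x = torch.randn(1, 243, 17, 3)        # [batch, frames, joints, (x, y, confidence)]
with torch.no_grad():
    E = model.get_representation(x)   # motion embedding
print(E.shape)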
Hi,
I have questions about the dimensions of the predicted poses both in inference and evaluation code.
I noticed that the predictions of the network in the evaluation function in train.py are multiplied by a factor, and I traced it back to data['test']['2.5d_factor'] in h36m_sh_conf_cam_source_final.pkl. Could you please help me understand how these factors are calculated?
Does this mean that the outputs of the network are not expected to have the correct human scale (in meters), and only the relative pose is the goal? In particular, when I plot the outputs of the inference code, I notice that the dimensions of the person change (which I guess comes from this), even when using the MB_ft_h36m model with rootrel set to True.
In general, it would be really appreciated if you could help me understand the scale of the output and how I can convert it to meters.
Thanks in advance for your help.
Thank you for your great work. I would like to ask about the loss function mentioned in the paper, specifically the velocity (speed) loss part. What is the purpose of adding the velocity loss?
Hello, may I ask why inference on my own video is so slow? A 10-second video takes more than ten minutes, and there are also some warning messages:
IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (923, 924) to (928, 928) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).
0%|▋ | 1/296 [00:00<04:16, 1.15 it/s]
[swscaler @ 000001df1adc4300] Warning: data is not aligned! This can lead to a speed loss
My testloader settings are as follows:
testloader_params = {
'batch_size': 1,
'shuffle': False,
'num_workers': 0,
'pin_memory': True,
'prefetch_factor': 2,
'persistent_workers': False,
'drop_last': False
}
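As a side note, batch_size=1 with num_workers=0 means clips are processed one at a time and all loading happens in the main process, which by itself can account for much of the slowness; also, recent PyTorch versions reject (or ignore) prefetch_factor and persistent_workers when num_workers is 0. A sketch of settings that are typically faster for pure inference, assuming enough GPU memory (values are illustrative):

# Sketch: illustrative test-time DataLoader settings for faster inference.
testloader_params = {
    'batch_size': 8,             # batch several clips per forward pass
    'shuffle': False,
    'num_workers': 4,            # parallel data loading (keep 0 on Windows without a __main__ guard)
    'pin_memory': True,
    'prefetch_factor': 4,        # only valid when num_workers > 0
    'persistent_workers': True,  # only meaningful when num_workers > 0
    'drop_last': False
}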
Hello, thank you for your work.
In my opinion, it is not an apples-to-apples comparison between methods (a) and (f). From what I understand, method (a) contains one S-T block per module, whereas method (f) contains both S-T and T-S blocks.
That is, I think method (f) has twice as many parameters as method (a), so it would be fairer to compare against method (a) with the depth set to 10.
thank you!
Hi, I got some real-time 3D pose results and visualized them in Open3D; it looks good:
However, I am wondering how to map the hip coordinates to the real world. I am currently adding 0.65 to the z axis, but it does not align well; it looks like it should be some value related to the normalized hip-to-height ratio. Do you know what the exact value is?
May I ask why this happens?
The link to pyskl in action.md is broken. Is it possible to get a new link, or some similar reference?
Hello, how do I generate motion_all.npy and id_all.npy from the InstaVariety training dataset?
They are not described on the page https://github.com/Walter0807/MotionBERT/blob/main/docs/pretrain.md. Thank you!
Hey, thanks for this wonderful work; the 2D-to-3D reconstruction performance is just eye-opening. I am wondering whether the action recognition inference code for custom videos has been released yet. I can only find the evaluation code for action recognition, which is meant for the NTU-RGBD dataset.
Thank you for your great work. I have a question: I see that there are three training sections in the docs folder, namely pretrain, scratch, and finetune. Is there any connection between these three? If I focus only on lifting 2D keypoints to 3D keypoints, which one should I focus on? Thank you very much; I look forward to your answer.
Hello, this is really impressive work, clean and fantastic. However, I have a question I'd like to ask for help with:
Will you provide the code of mesh recovery using HybrIK (report in Table 3)? I’d appreciate it if you can release the code related to this part.
Hello, thanks for your wonderful work. I recently tried to use MotionBERT, but it seems it can only output information like MPJPE. If I want to demo real-time video pose estimation, just like the animation in the cover, what should I do? Thank you.
Hi,
Thank you for your great work! How long does the pretraining take on 8 V100 machines? Thanks!
Even though the model is not big, the speed is quite slow, about 1 second per frame. Is that normal?
Hello, I notice that the input can be with or without confidence scores, but I didn't see any ablation on this part. If the model uses confidence, it is highly coupled with the 2D pose model itself (some models might not produce relatively high scores). Do the final metrics change accordingly depending on whether confidence is added?
Hi, the documentation says to use the H36M keypoint format or the Halpe 26 keypoints. Since these two formats differ and I’m trying to use YOLOv7 to extract the 2D poses, which keypoints and ordering does MotionBERT expect? Is there an example json available? Thank you😊
Hello, is the MAED mesh head mentioned in the paper included in the code?