lisiyao21 / bailando
Code for CVPR 2022 paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory"
License: Other
Thank you for your excellent work. I tried to follow "Choreographic for music in the wild", but when I run the command, I don't know how to set the node name. Can anyone help me?
Thank you very much.
Great work!
When will the code be released?
download all original music pieces (wav) into './aist_plusplus_final/all_musics'.
I can't access it; could you help me with this?
What type of file is inside? How can I generate it?
What does 'your node name' mean in your training and testing commands? I don't understand it. Could you provide an example?
Thank you for your work!
I followed the steps of "Choreographic for music in the wild" and got the output videos of the model.
Problem:
But I find that the person in the videos sometimes disappears, and there is no sound in the video.
Q1: Is there a way to solve this problem?
Q2: Will this kind of disappearance affect the locations of the 3D keypoints?
In Bailando/extract_aist_features.py, line 5 in 125b529
Hey, I face this problem when using the first-step command:
python -u main.py --config configs/sep_vqvae.yaml --train
THE OUTPUT IS
using SepVQVAE
We use bottleneck! No motion regularization!
We use bottleneck! No motion regularization!
train with AIST++ dataset!
test with AIST++ dataset!
{'structure': {'name': 'SepVQVAE', 'up_half': {'levels': 1, 'downs_t': [3], 'strides_t': [2], 'emb_width': 512, 'l_bins': 512, 'l_mu': 0.99, 'commit': 0.02, 'hvqvae_multipliers': [1], 'width': 512, 'depth': 3, 'm_conv': 1.0, 'dilation_growth_rate': 3, 'sample_length': 240, 'use_bottleneck': True, 'joint_channel': 3, 'vel': 1, 'acc': 1, 'vqvae_reverse_decoder_dilation': True, 'dilation_cycle': None}, 'down_half': {'levels': 1, 'downs_t': [3], 'strides_t': [2], 'emb_width': 512, 'l_bins': 512, 'l_mu': 0.99, 'commit': 0.02, 'hvqvae_multipliers': [1], 'width': 512, 'depth': 3, 'm_conv': 1.0, 'dilation_growth_rate': 3, 'sample_length': 240, 'use_bottleneck': True, 'joint_channel': 3, 'vel': 1, 'acc': 1, 'vqvae_reverse_decoder_dilation': True, 'dilation_cycle': None}, 'use_bottleneck': True, 'joint_channel': 3, 'l_bins': 512}, 'loss_weight': {'mse_weight': 1}, 'optimizer': {'type': 'Adam', 'kwargs': {'lr': 3e-05, 'betas': [0.5, 0.999], 'weight_decay': 0}, 'schedular_kwargs': {'milestones': [100, 200], 'gamma': 0.1}}, 'data': {'name': 'aist', 'train_dir': 'data/aistpp_train_wav', 'test_dir': 'data/aistpp_test_full_wav', 'seq_len': 240, 'data_type': 'None'}, 'testing': {'height': 540, 'width': 960, 'ckpt_epoch': 500}, 'expname': 'sep_vqvae', 'epoch': 500, 'batch_size': 32, 'save_per_epochs': 20, 'test_freq': 20, 'log_per_updates': 1, 'seed': 42, 'rotmat': False, 'cuda': True, 'global_vel': False, 'ds_rate': 8, 'move_train': 40, 'sample_code_length': 150, 'sample_code_rate': 16, 'analysis_sequence': [[126, 81]], 'config': 'configs/sep_vqvae.yaml', 'train': True, 'eval': False, 'visgt': False, 'anl': False, 'sample': False}
07/03/2022 03:06:44 Epoch: 1
Traceback (most recent call last):
  File "main.py", line 56, in <module>
    main()
  File "main.py", line 40, in main
    agent.train()
  File "/home/fuyang/project/Bailando/motion_vqvae.py", line 107, in train
    'loss': loss.item(),
ValueError: only one element tensors can be converted to Python scalars
My environment is
pytorch 1.11.0+cu102
8xGPU NVIDIA TITAN Xp(12196MiB)
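One hedged guess at the cause (an assumption based on the 8-GPU environment, not a confirmed fix): under nn.DataParallel each replica returns its own loss element, so `loss` arrives as an 8-element tensor, and `.item()` only works on scalar tensors. Reducing across replicas first sidesteps the error:

```python
import torch

# `loss` as a stand-in for the 8 per-GPU losses returned by nn.DataParallel
# (an assumption about motion_vqvae.py, not its actual code).
loss = torch.rand(8)
# loss.item() here would raise:
#   ValueError: only one element tensors can be converted to Python scalars
log_value = loss.mean().item()  # reduce across replicas before .item()
print(type(log_value))
```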
Hi, thanks for sharing your code!
I downloaded the data.zip that you released. However, I got an error here. I checked the data and found that aistpp_music_feat_7.5fps/mJB4.json is empty. Is this a mistake?
Hi, I get an error when running _prepro_aistpp_music.py. It seems that the size of the feature is not correct. How can I solve this? Thanks! @lisiyao21
Thanks for sharing your wonderful work!
I wonder how you got the specific number, 15360 * 2, as the sampling rate for 60 FPS. Can you elaborate on how this specific rate is obtained through calculation?
Another concern is with beat extraction using librosa: in _prepro_aistpp_music.py, I found that onset_env, onset_beat, and tempogram are all zeros. Is this correct?
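A hedged reconstruction of the arithmetic behind that number (an assumption, not confirmed by the authors): with librosa's default hop length of 512 samples, a sampling rate of 15360 * 2 = 30720 Hz yields exactly one feature frame per motion frame at 60 fps.

```python
# Assumed rationale: feature frame rate = sampling rate / hop length.
sr = 15360 * 2          # 30720 Hz
hop_length = 512        # librosa's default hop size
fps = sr / hop_length
print(fps)  # 60.0, matching the 60 fps AIST++ motion data
```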
Great work!!
Following the README, it outputs the video.
But the video generated by the actor_critic model is 2D.
How can I drive a 3D character?
Your pre-processed data is missing gPO_sBM_cAll_d10_mPO0_ch06, which confused me.
I noticed that you mention the supplementary file many times, including for the detailed derivation of L_AC, the structures of the convolutional encoder and decoders, and so on.
Where can I find it? I searched your website but could not find it.
Hello!
In the "Choreographic for music in the wild" step, I ran the reinforcement-learning code, i.e., the train command.
In the ./experiments/actor_critic_for_demo/vis/videos folder there are only the folders for the testing done every 5 epochs, but those folders are empty when opened. Why is that, and what should I do next?
In .\experiments\actor_critic_for_demo_paper\vis\imgs, images were generated, but no video was composed in the end.
Should I run the eval command after the reinforcement-learning train command finishes, and the videos will then be generated?
I'm a beginner and don't know much about this area.
I want to generate character animation; how can I do that?
Hi @lisiyao21
Thank you for releasing the code!
Where is the definition of "from .utils.torch_utils import assert_shape"?
Line 4 in 27fe2b6
According to the code you provided, I can only get the dancing stickman (skeleton). How can I get the dancing character animation? Is there any relevant code or tutorial about Mixamo?
Thanks a lot.
Hello, thanks for your great paper and code!
When reading your evaluation code, I noticed what may be a mistake in utils/features/utils.py, line 129.
Is the correct code:
    v1 = (
        positions[i + j][joint_idx]
        - positions[i + j - 1][joint_idx]
    ) / frame_time

The current function:
def calc_average_acceleration(
    positions, i, joint_idx, sliding_window, frame_time
):
    current_window = 0
    average_acceleration = np.zeros(len(positions[0][joint_idx]))
    for j in range(-sliding_window, sliding_window + 1):
        if i + j - 1 < 0 or i + j + 1 >= len(positions):
            continue
        v2 = (
            positions[i + j + 1][joint_idx] - positions[i + j][joint_idx]
        ) / frame_time
        v1 = (
            positions[i + j][joint_idx]
            - positions[i + j - 1][joint_idx] / frame_time
        )
        average_acceleration += (v2 - v1) / frame_time
        current_window += 1
    return np.linalg.norm(average_acceleration / current_window)
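A minimal numeric check of the parenthesization issue reported above (a standalone sketch; `positions` here is synthetic, not AIST++ data): for a joint moving at constant velocity, the average acceleration should be approximately zero, but the misplaced division makes it spuriously large.

```python
import numpy as np

frame_time = 1.0 / 60.0
# One joint moving at constant velocity (0.1 units per frame).
positions = [np.array([[0.1 * t, 0.0, 0.0]]) for t in range(10)]

def accel_term(i, j, joint_idx, fixed):
    v2 = (positions[i + j + 1][joint_idx] - positions[i + j][joint_idx]) / frame_time
    if fixed:
        # proposed fix: divide the whole displacement by frame_time
        v1 = (positions[i + j][joint_idx] - positions[i + j - 1][joint_idx]) / frame_time
    else:
        # current line 129: the division binds only to the second term
        v1 = positions[i + j][joint_idx] - positions[i + j - 1][joint_idx] / frame_time
    return np.linalg.norm((v2 - v1) / frame_time)

print(accel_term(5, 0, 0, fixed=True))   # ~0 for constant velocity
print(accel_term(5, 0, 0, fixed=False))  # large spurious acceleration
```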
You have completed a very good model!
I also achieved very good results when working with your model, but some things are still not clear to me. Did you experience gradient explosion when implementing the Actor-Critic Learning module? My model converged at the first epoch and did show some improvement over GPT. However, during subsequent iterations, L_AC increased significantly and could not continue to converge, and the visualization results also became very strange.
Looking forward for your reply!
ailando/models/vqvae_root.py", line 7, in
from .utils.logger import average_metrics
ModuleNotFoundError: No module named 'models.utils.logger'
Bailando/utils/functional.py", line 130, in img2video
music_names = sorted(os.listdir(audio_dir))
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/lustre/share/lisiyao1/original_videos/aistpp-audio/'
I want to compare the generated results with the ground truth.
It seems that the code also supports visualizing the ground truth by passing the visgt parameter.
However, when I call the script with the visgt parameter, it seems that the code is not fully implemented:
the last two parameters of the visualizeAndWrite function are not set correctly.
How should I set these two parameters (especially the last, quants) to make the function execute correctly?
Hi, thank you for your work again!
It really inspires me and has sparked my interest in deep learning!
amazing job!
Problem:
I found that the generated dance videos are all in the same style, which may not be coordinated with my music ('青花瓷-jay_chou').
I suppose it may be caused by the starting pose code, but I cannot find how to choose it or where to set it.
Q1: Is there a way to change the 'starting pose codes' mentioned in your paper?
Q2: How should I choose the starting pose codes? Is there a table that explicitly maps starting pose codes to dance styles?
Thank you again!
Aleeyanger
here (坚果云): there is no 坚果云 download link
Hi authors,
Thank you for your fantastic work!
I have a small question: In Table 1, FID_k, FID_g of groundtruth are reported. I am a little bit confused with this. Do they mean to compute FID_k and FID_g between the two same sets of groundtruth data? In other words, why FID_k and FID_g of the groundtruth are not 0?
Thank you,
Best
Hi @lisiyao21 ,
Thank you very much for your valuable code!
I set "rotmat: True" and use SMPL, but I get the following error:
Hi Siyao,
I'd like to run the code by myself, if I don't use slurm manager, could you tell me the pure command to run the code?
Hi there,
When I ran the second step following your instructions, I hit an "out of memory" problem.
I tried to debug it and found the cause: music_data is float64, so memory is consumed rapidly when converting the list music_data to music_np (in utils/functional.py).
Have you ever met the same problem?
Is it possible to use float32 for the training data (music_np) without decreasing the performance of the final model?
BTW: my machine has 120 GB of memory.
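A minimal sketch of the proposed workaround (variable names follow the description above; the shapes are hypothetical, not the real feature dimensions): casting each chunk to float32 before stacking halves the size of the final array.

```python
import numpy as np

# Hypothetical float64 feature chunks standing in for music_data.
music_data = [np.random.rand(240, 438) for _ in range(4)]
# Cast each chunk before stacking so the float64 copies never coexist
# with a full-size float64 result array.
music_np = np.stack([m.astype(np.float32) for m in music_data])
print(music_np.dtype, music_np.nbytes)  # half the float64 footprint
```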
Hi,
There is no link to the processed data, even though the README says "Otherwise, directly download our preprocessed feature from here as ./data folder if you don't wish to process the data."
Can you add the data link? Thanks!
Hello, I'm fairly new to working with GitHub repositories. I want to use the pretrained model of Bailando++, but the instructions are pretty daunting. Could anyone provide a thorough instruction set on how to set it up to run on Google Colab?
Hi, Siyao! I ran the command python extract_aist_features.py
to extract the (kinetic & manual) features of all AIST++ motions. However, I got a warning:
WARNING: You are using a SMPL model, with only 10 shape coefficients.
Do you know the reason?
Hi, Siyao~
Thanks for releasing and cleaning the code!!
May I ask why, in the pre-processing part, the audio (music) features are extracted twice with different sampling rates?
Precisely, in _prepro_aistpp.py the audio features are extracted with sampling rate 15360*2,
while in _prepro_aistpp_music.py they are extracted with sampling rate 15360*2/8.
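One hedged guess at the rationale (an assumption, not an authors' statement): with a hop length of 512 samples, the two rates give feature streams at 60 fps and 7.5 fps, matching the motion frame rate and its 8x temporal downsampling in the VQ-VAE.

```python
# Assumed relation: feature fps = sampling rate / hop length.
hop_length = 512
fps_full = (15360 * 2) / hop_length       # rate used in _prepro_aistpp.py
fps_down = (15360 * 2 / 8) / hop_length   # rate used in _prepro_aistpp_music.py
print(fps_full, fps_down)  # 60.0 7.5
```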
Where is the gt_root file?
I meet error at Step 1 by running python -u main.py --config configs/sep_vqvae.yaml --train
Traceback (most recent call last):
File "main.py", line 56, in <module>
main()
File "main.py", line 40, in main
agent.train()
File "/share/yanzhen/Bailando/motion_vqvae.py", line 94, in train
loss.backward()
File "/root/anaconda3/envs/workspace/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/root/anaconda3/envs/workspace/lib/python3.8/site-packages/torch/autograd/__init__.py", line 166, in backward
grad_tensors_ = _make_grads(tensors, grad_tensors_, is_grads_batched=False)
File "/root/anaconda3/envs/workspace/lib/python3.8/site-packages/torch/autograd/__init__.py", line 67, in _make_grads
raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs
After printing the loss, it looks like tensor([0.2667, 0.2735, 0.2687, 0.2584, 0.2701, 0.2697, 0.2571, 0.2658], device='cuda:0', grad_fn=<GatherBackward>),
so do I need to take a mean or sum?
However, even if I take the mean, training still seems problematic: the loss decreases normally, while in the eval stage the output quants are all zero. Any suggestions?
The training log is attached for reference.
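A hedged sketch of the likely cause (an assumption, not a confirmed fix): each nn.DataParallel replica returns its own loss element, so backward() sees a vector and raises "grad can be implicitly created only for scalar outputs". Reducing to a scalar first resolves the error; whether mean or sum is appropriate depends on how each replica's loss is already normalized.

```python
import torch

# First four per-GPU loss values from the printout above, as a stand-in
# for the gathered DataParallel loss.
loss = torch.tensor([0.2667, 0.2735, 0.2687, 0.2584], requires_grad=True)
scalar_loss = loss.mean()  # one value averaged across replicas
scalar_loss.backward()     # valid now: backward() on a scalar output
```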
For me, a beginner in DL:
sh srun.sh configs/sep_vqvae.yaml train [your node name] 1
What does '[your node name]' mean?
Can you give me a more specific command?
Thank you a lot!
Thank you for sharing the code.
Thank you very much.
Hi siyao,
Thanks for your great work. I have a question.
When I train step 3 (Train motion GPT), an error occurs: "AttributeError: 'MCTall' object has no attribute 'training_data'".
I checked cc_motion_gpt.yaml and found "need_not_train_data: True", which prevents _build_train_loader(self) from running. Is that correct, or should I change "need_not_train_data" to "false"?
When training the CrossCondGPT2 using the cc_motion_gpt.yaml
config and loading the AIST++ data using load_data_aist
[line 538] in ./utils/functional.py
, I'm confused by the for loop in line 572.
Because of the 8x downsample rate, the np_music feature currently read is at 7.5 fps, while np_dance is at 60 fps. However, in the for loop, seq_len
is associated with np_music, the step for range(...)
is move (which is 8 in the config yaml), the step for np_music is i/music_sample_rate (music_sample_rate, I think, is also 8), and the step for np_dance is 8. As a result, the music training sample actually steps by 1 frame per iteration of the range(...)
loop. If the music sample steps by 1 frame, we should get len(np_music)
training samples. But since the step in range(...)
is 8, len(np_music) - len(np_music)//8
training samples are left behind.
In a nutshell, I think the split in the for loop doesn't match.
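A simplified sketch of the indexing concern described above (this is not the actual load_data_aist code; the names and values are assumptions taken from the issue text): when `i` steps by move=8 but music is indexed by i // music_sample_rate, the music window start advances only 1 frame per iteration while the dance window advances 8.

```python
move = 8
music_sample_rate = 8
np_music = list(range(75))    # 10 s of music features at 7.5 fps (hypothetical)
np_dance = list(range(600))   # 10 s of dance frames at 60 fps (hypothetical)

# (music window start, dance window start) per loop iteration.
starts = [(i // music_sample_rate, i) for i in range(0, len(np_dance), move)]
print(starts[:3])  # [(0, 0), (1, 8), (2, 16)]
```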
When following the steps outlined in the documentation to prepare the AIST++ dataset, after downloading the annotations, unzipping them into the './aist_plusplus_final/' folder, and running the './prepare_aistpp_data.sh' script, the resulting number of files in the 'data/aistpp_train_wav' directory is 980. However, your paper mentions 952 files intended for training. Could you clarify this discrepancy?
Hello, thank you for your wonderful work!
I have a question about using your work that I want to confirm before proceeding: if I use a pretrained model, can I directly perform the choreographic stage for music in the wild without an evaluation phase?
Hi, thanks for sharing the codebase of the Bailando project. Could you also share the code for the IK process and rotmat so we can transform the outputs into a bvh or fbx file? Right now the solutions I found are not compatible with the output format of the Actor Critic model. The code that was commented out in utils/functional.py also didn't work. I'd really appreciate if you can share some code of how your team did it! Thanks!