lisiyao21 / bailando
Code for CVPR 2022 paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory"
License: Other
Thank you for your excellent work. I tried to follow "Choreographic for music in the wild", but when I run the command, I don't know how to set the node name. Can anyone help me?
Thank you very much.
Great work!
When will the code be released?
download all original music pieces (wav) into './aist_plusplus_final/all_musics'.
I can't access it; could you help me with this?
What type of file is inside? How can I generate it?
What does 'your node name' mean in your training and testing commands? I don't understand it. Could you provide an example?
Thank you for your work!
I followed the steps of "Choreographic for music in the wild" and got the output videos of the model.
Problem:
But I find that the person in the videos sometimes disappears, and there is no sound in the video.
Q1: Is there a way to solve this problem?
Q2: Will this kind of disappearance affect the locations of the 3D keypoints?
In Bailando/extract_aist_features.py, line 5 in 125b529
Hey, I face this problem when using the first-step command:
python -u main.py --config configs/sep_vqvae.yaml --train
THE OUTPUT IS
using SepVQVAE
We use bottleneck! No motion regularization!
We use bottleneck! No motion regularization!
train with AIST++ dataset!
test with AIST++ dataset!
{'structure': {'name': 'SepVQVAE', 'up_half': {'levels': 1, 'downs_t': [3], 'strides_t': [2], 'emb_width': 512, 'l_bins': 512, 'l_mu': 0.99, 'commit': 0.02, 'hvqvae_multipliers': [1], 'width': 512, 'depth': 3, 'm_conv': 1.0, 'dilation_growth_rate': 3, 'sample_length': 240, 'use_bottleneck': True, 'joint_channel': 3, 'vel': 1, 'acc': 1, 'vqvae_reverse_decoder_dilation': True, 'dilation_cycle': None}, 'down_half': {'levels': 1, 'downs_t': [3], 'strides_t': [2], 'emb_width': 512, 'l_bins': 512, 'l_mu': 0.99, 'commit': 0.02, 'hvqvae_multipliers': [1], 'width': 512, 'depth': 3, 'm_conv': 1.0, 'dilation_growth_rate': 3, 'sample_length': 240, 'use_bottleneck': True, 'joint_channel': 3, 'vel': 1, 'acc': 1, 'vqvae_reverse_decoder_dilation': True, 'dilation_cycle': None}, 'use_bottleneck': True, 'joint_channel': 3, 'l_bins': 512}, 'loss_weight': {'mse_weight': 1}, 'optimizer': {'type': 'Adam', 'kwargs': {'lr': 3e-05, 'betas': [0.5, 0.999], 'weight_decay': 0}, 'schedular_kwargs': {'milestones': [100, 200], 'gamma': 0.1}}, 'data': {'name': 'aist', 'train_dir': 'data/aistpp_train_wav', 'test_dir': 'data/aistpp_test_full_wav', 'seq_len': 240, 'data_type': 'None'}, 'testing': {'height': 540, 'width': 960, 'ckpt_epoch': 500}, 'expname': 'sep_vqvae', 'epoch': 500, 'batch_size': 32, 'save_per_epochs': 20, 'test_freq': 20, 'log_per_updates': 1, 'seed': 42, 'rotmat': False, 'cuda': True, 'global_vel': False, 'ds_rate': 8, 'move_train': 40, 'sample_code_length': 150, 'sample_code_rate': 16, 'analysis_sequence': [[126, 81]], 'config': 'configs/sep_vqvae.yaml', 'train': True, 'eval': False, 'visgt': False, 'anl': False, 'sample': False}
07/03/2022 03:06:44 Epoch: 1
Traceback (most recent call last):
  File "main.py", line 56, in <module>
    main()
  File "main.py", line 40, in main
    agent.train()
  File "/home/fuyang/project/Bailando/motion_vqvae.py", line 107, in train
    'loss': loss.item(),
ValueError: only one element tensors can be converted to Python scalars
My environment is
pytorch 1.11.0+cu102
8xGPU NVIDIA TITAN Xp(12196MiB)
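One hedged guess at the cause (an assumption based on the 8-GPU environment, not a confirmed fix): under nn.DataParallel each replica returns its own loss element, so `loss` arrives as an 8-element tensor, and `.item()` only works on scalar tensors. Reducing across replicas first sidesteps the error:

```python
import torch

# `loss` as a stand-in for the 8 per-GPU losses returned by nn.DataParallel
# (an assumption about motion_vqvae.py, not its actual code).
loss = torch.rand(8)
# loss.item() here would raise:
#   ValueError: only one element tensors can be converted to Python scalars
log_value = loss.mean().item()  # reduce across replicas before .item()
print(type(log_value))
```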
Hi, thanks for sharing your code!
I downloaded the data.zip that you released. However, I got an error here. I checked the data and found that aistpp_music_feat_7.5fps/mJB4.json is empty. Is this a mistake?
Hi, I get an error when running _prepro_aistpp_music.py. It seems that the size of the feature is not correct. How can I solve this? Thanks! @lisiyao21
Thanks for sharing your wonderful work!
I wonder how you got the specific number, 15360 * 2, as the sampling rate for 60 FPS. Can you elaborate on how this specific rate is obtained through calculation?
Another concern is with beat extraction using librosa: in _prepro_aistpp_music.py, I found that onset_env, onset_beat, and tempogram are all zeros. Is this correct?
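A hedged reconstruction of the arithmetic behind that number (an assumption, not confirmed by the authors): with librosa's default hop length of 512 samples, a sampling rate of 15360 * 2 = 30720 Hz yields exactly one feature frame per motion frame at 60 fps.

```python
# Assumed rationale: feature frame rate = sampling rate / hop length.
sr = 15360 * 2          # 30720 Hz
hop_length = 512        # librosa's default hop size
fps = sr / hop_length
print(fps)  # 60.0, matching the 60 fps AIST++ motion data
```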
Great work!!
Following the README, it outputs the video.
But the video generated by the actor_critic model is 2D.
How can I drive a 3D character?
Your pre-processed data is missing gPO_sBM_cAll_d10_mPO0_ch06, which confused me.
I noticed that you mention the supplementary file many times, including for the detailed derivation of L_AC, the structures of the convolutional encoder and decoders, and so on.
Where can I find it? I searched your website but could not find it.
Hello!
In the "Choreographic for music in the wild" step, I ran the reinforcement-learning code, i.e., the train command.
In the ./experiments/actor_critic_for_demo/vis/videos folder there are only the folders for the testing done every 5 epochs, but those folders are empty when opened. Why is that, and what should I do next?
In .\experiments\actor_critic_for_demo_paper\vis\imgs, images were generated, but no video was composed in the end.
Should I run the eval command after the reinforcement-learning train command finishes, and the videos will then be generated?
I'm a beginner and don't know much about this area.
I want to generate character animation; how can I do that?
Hi @lisiyao21
Thank you for releasing the code!
Where is the definition of "from .utils.torch_utils import assert_shape"?
Line 4 in 27fe2b6
According to the code you provided, I can only get the dancing stickman (skeleton). How can I get the dancing character animation? Is there any relevant code or tutorial about Mixamo?
Thanks a lot.
Hello, thanks for your great paper and code!
When reading your evaluation code, I noticed what may be a mistake in utils/features/utils.py, line 129.
Is the correct code:
    v1 = (
        positions[i + j][joint_idx]
        - positions[i + j - 1][joint_idx]
    ) / frame_time

The current function:
def calc_average_acceleration(
    positions, i, joint_idx, sliding_window, frame_time
):
    current_window = 0
    average_acceleration = np.zeros(len(positions[0][joint_idx]))
    for j in range(-sliding_window, sliding_window + 1):
        if i + j - 1 < 0 or i + j + 1 >= len(positions):
            continue
        v2 = (
            positions[i + j + 1][joint_idx] - positions[i + j][joint_idx]
        ) / frame_time
        v1 = (
            positions[i + j][joint_idx]
            - positions[i + j - 1][joint_idx] / frame_time
        )
        average_acceleration += (v2 - v1) / frame_time
        current_window += 1
    return np.linalg.norm(average_acceleration / current_window)
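A minimal numeric check of the parenthesization issue reported above (a standalone sketch; `positions` here is synthetic, not AIST++ data): for a joint moving at constant velocity, the average acceleration should be approximately zero, but the misplaced division makes it spuriously large.

```python
import numpy as np

frame_time = 1.0 / 60.0
# One joint moving at constant velocity (0.1 units per frame).
positions = [np.array([[0.1 * t, 0.0, 0.0]]) for t in range(10)]

def accel_term(i, j, joint_idx, fixed):
    v2 = (positions[i + j + 1][joint_idx] - positions[i + j][joint_idx]) / frame_time
    if fixed:
        # proposed fix: divide the whole displacement by frame_time
        v1 = (positions[i + j][joint_idx] - positions[i + j - 1][joint_idx]) / frame_time
    else:
        # current line 129: the division binds only to the second term
        v1 = positions[i + j][joint_idx] - positions[i + j - 1][joint_idx] / frame_time
    return np.linalg.norm((v2 - v1) / frame_time)

print(accel_term(5, 0, 0, fixed=True))   # ~0 for constant velocity
print(accel_term(5, 0, 0, fixed=False))  # large spurious acceleration
```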
You have completed a very good model!
I also achieved very good results when working with your model, but some things are still not clear to me. Did you experience gradient explosion when implementing the Actor-Critic Learning module? My model converged at the first epoch and did show some improvement over GPT. However, during subsequent iterations, L_AC increased significantly and could not continue to converge, and the visualization results also became very strange.
Looking forward for your reply!
ailando/models/vqvae_root.py", line 7, in
from .utils.logger import average_metrics
ModuleNotFoundError: No module named 'models.utils.logger'
Bailando/utils/functional.py", line 130, in img2video
music_names = sorted(os.listdir(audio_dir))
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/lustre/share/lisiyao1/original_videos/aistpp-audio/'
I want to compare the generated results with the ground truth.
It seems that the code also supports visualizing the ground truth by passing the visgt parameter.
However, when I call the script with the visgt parameter, it seems that the code is not fully implemented:
the last two parameters of the visualizeAndWrite function are not set correctly.
How should I set these two parameters (especially the last, quants) to make the function execute correctly?
Hi, thank you for your work again!
It really inspires me and has sparked my interest in deep learning!
amazing job!
Problem:
I found that the generated dance videos are all in the same style, which may not be coordinated with my music ('青花瓷-jay_chou').
I suppose it may be caused by the starting pose code, but I cannot find how to choose it or where to set it.
Q1: Is there a way to change the 'starting pose codes' mentioned in your paper?
Q2: How should I choose the starting pose codes? Is there a table that explicitly maps starting pose codes to dance styles?
Thank you again!
Aleeyanger
here (坚果云): there is no 坚果云 download link
Hi authors,
Thank you for your fantastic work!
I have a small question: In Table 1, FID_k, FID_g of groundtruth are reported. I am a little bit confused with this. Do they mean to compute FID_k and FID_g between the two same sets of groundtruth data? In other words, why FID_k and FID_g of the groundtruth are not 0?
Thank you,
Best
Hi @lisiyao21 ,
Thank you very much for your valuable code!
I set "rotmat: True" and use SMPL, but I get the following error:
Hi Siyao,
I'd like to run the code by myself, if I don't use slurm manager, could you tell me the pure command to run the code?
Hi there,
When I ran the second step following your instructions, I hit an "out of memory" problem.
I tried to debug it and found the cause: music_data is float64, so memory is consumed rapidly when converting the list music_data to music_np (in utils/functional.py).
Have you ever met the same problem?
Is it possible to use float32 for the training data (music_np) without decreasing the performance of the final model?
BTW: my machine has 120 GB of memory.
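A minimal sketch of the proposed workaround (variable names follow the description above; the shapes are hypothetical, not the real feature dimensions): casting each chunk to float32 before stacking halves the size of the final array.

```python
import numpy as np

# Hypothetical float64 feature chunks standing in for music_data.
music_data = [np.random.rand(240, 438) for _ in range(4)]
# Cast each chunk before stacking so the float64 copies never coexist
# with a full-size float64 result array.
music_np = np.stack([m.astype(np.float32) for m in music_data])
print(music_np.dtype, music_np.nbytes)  # half the float64 footprint
```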
Hi,
There is no link to the processed data, even though the README says "Otherwise, directly download our preprocessed feature from here as ./data folder if you don't wish to process the data."
Can you add the data link? Thanks!
Hello, I'm fairly new to working with GitHub repositories. I want to use the pretrained model of Bailando++, but the instructions are pretty daunting. Could anyone provide a thorough instruction set on how to set it up to run on Google Colab?
Hi, Siyao! I ran the command python extract_aist_features.py
to extract the (kinetic & manual) features of all AIST++ motions. However, I got a warning:
WARNING: You are using a SMPL model, with only 10 shape coefficients.
Do you know the reason?
Hi, Siyao~
Thanks for releasing and cleaning the code!!
May I ask why, in the pre-processing part, the audio (music) features are extracted twice with different sampling rates?
Precisely, in _prepro_aistpp.py the audio features are extracted with sampling rate 15360*2,
while in _prepro_aistpp_music.py they are extracted with sampling rate 15360*2/8.
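One hedged guess at the rationale (an assumption, not an authors' statement): with a hop length of 512 samples, the two rates give feature streams at 60 fps and 7.5 fps, matching the motion frame rate and its 8x temporal downsampling in the VQ-VAE.

```python
# Assumed relation: feature fps = sampling rate / hop length.
hop_length = 512
fps_full = (15360 * 2) / hop_length       # rate used in _prepro_aistpp.py
fps_down = (15360 * 2 / 8) / hop_length   # rate used in _prepro_aistpp_music.py
print(fps_full, fps_down)  # 60.0 7.5
```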
Where is the gt_root file?
I meet error at Step 1 by running python -u main.py --config configs/sep_vqvae.yaml --train
Traceback (most recent call last):
File "main.py", line 56, in <module>
main()
File "main.py", line 40, in main
agent.train()
File "/share/yanzhen/Bailando/motion_vqvae.py", line 94, in train
loss.backward()
File "/root/anaconda3/envs/workspace/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/root/anaconda3/envs/workspace/lib/python3.8/site-packages/torch/autograd/__init__.py", line 166, in backward
grad_tensors_ = _make_grads(tensors, grad_tensors_, is_grads_batched=False)
File "/root/anaconda3/envs/workspace/lib/python3.8/site-packages/torch/autograd/__init__.py", line 67, in _make_grads
raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs
After printing the loss, it looks like tensor([0.2667, 0.2735, 0.2687, 0.2584, 0.2701, 0.2697, 0.2571, 0.2658], device='cuda:0', grad_fn=<GatherBackward>),
so do I need to take a mean or sum?
However, even if I take the mean, training still seems problematic: the loss decreases normally, while in the eval stage the output quants are all zero. Any suggestions?
The training log is attached for reference.
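A hedged sketch of the likely cause (an assumption, not a confirmed fix): each nn.DataParallel replica returns its own loss element, so backward() sees a vector and raises "grad can be implicitly created only for scalar outputs". Reducing to a scalar first resolves the error; whether mean or sum is appropriate depends on how each replica's loss is already normalized.

```python
import torch

# First four per-GPU loss values from the printout above, as a stand-in
# for the gathered DataParallel loss.
loss = torch.tensor([0.2667, 0.2735, 0.2687, 0.2584], requires_grad=True)
scalar_loss = loss.mean()  # one value averaged across replicas
scalar_loss.backward()     # valid now: backward() on a scalar output
```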
For me, a beginner in DL:
sh srun.sh configs/sep_vqvae.yaml train [your node name] 1
What does '[your node name]' mean?
Can you give me a more specific command?
Thank you a lot!
Thank you for sharing the code.
Thank you very much.
Hi siyao,
Thanks for your great work. I have a question.
When I train step 3 (Train motion GPT), an error occurs: "AttributeError: 'MCTall' object has no attribute 'training_data'".
I checked cc_motion_gpt.yaml and found "need_not_train_data: True", which prevents _build_train_loader(self) from running. Is that correct, or should I change "need_not_train_data" to "false"?
When training the CrossCondGPT2 using the cc_motion_gpt.yaml
config and loading the AIST++ data using load_data_aist
[line 538] in ./utils/functional.py
, I'm confused by the for loop in line 572.
Because of the 8x downsample rate, the np_music feature currently read is at 7.5 fps, while np_dance is at 60 fps. However, in the for loop, seq_len
is associated with np_music, the step for range(...)
is move (which is 8 in the config yaml), the step for np_music is i/music_sample_rate (music_sample_rate, I think, is also 8), and the step for np_dance is 8. As a result, the music training sample actually steps by 1 frame per iteration of the range(...)
loop. If the music sample steps by 1 frame, we should get len(np_music)
training samples. But since the step in range(...)
is 8, len(np_music) - len(np_music)//8
training samples are left behind.
In a nutshell, I think the split in the for loop doesn't match.
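A simplified sketch of the indexing concern described above (this is not the actual load_data_aist code; the names and values are assumptions taken from the issue text): when `i` steps by move=8 but music is indexed by i // music_sample_rate, the music window start advances only 1 frame per iteration while the dance window advances 8.

```python
move = 8
music_sample_rate = 8
np_music = list(range(75))    # 10 s of music features at 7.5 fps (hypothetical)
np_dance = list(range(600))   # 10 s of dance frames at 60 fps (hypothetical)

# (music window start, dance window start) per loop iteration.
starts = [(i // music_sample_rate, i) for i in range(0, len(np_dance), move)]
print(starts[:3])  # [(0, 0), (1, 8), (2, 16)]
```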
When following the steps outlined in the documentation to prepare the AIST++ dataset, after downloading the annotations, unzipping them into the './aist_plusplus_final/' folder, and running the './prepare_aistpp_data.sh' script, the resulting number of files in the 'data/aistpp_train_wav' directory is 980. However, your paper mentions 952 files intended for training. Could you clarify this discrepancy?
Hello, thank you for your wonderful work!
I have a question about using your work that I want to confirm before proceeding: if I use a pretrained model, can I directly perform the choreographic stage for music in the wild without an evaluation phase?
Hi, thanks for sharing the codebase of the Bailando project. Could you also share the code for the IK process and rotmat so we can transform the outputs into a bvh or fbx file? Right now the solutions I found are not compatible with the output format of the Actor Critic model. The code that was commented out in utils/functional.py also didn't work. I'd really appreciate if you can share some code of how your team did it! Thanks!