
mint's Introduction

AI Choreographer: Music Conditioned 3D Dance Generation with AIST++ [ICCV-2021].

Overview

This package contains the model implementation and training infrastructure of our AI Choreographer.

Get started

Pull the code

git clone https://github.com/liruilong940607/mint --recursive

Note that --recursive is important, as it also clones the submodule (orbit) automatically.

Install dependencies

conda create -n mint python=3.7
conda activate mint
conda install protobuf numpy
pip install tensorflow absl-py tensorflow-datasets librosa

sudo apt-get install libopenexr-dev
pip install --upgrade OpenEXR
pip install tensorflow-graphics tensorflow-graphics-gpu

git clone https://github.com/arogozhnikov/einops /tmp/einops
cd /tmp/einops/ && pip install . -U

git clone https://github.com/google/aistplusplus_api /tmp/aistplusplus_api
cd /tmp/aistplusplus_api && pip install -r requirements.txt && pip install . -U

Note: if you run into numpy version conflicts, try pip install numpy==1.20.

Get the data

See the website

Get the checkpoint

Download the checkpoints from Google Drive here, and put them in the folder ./checkpoints/

Run the code

  1. compile protocols
protoc ./mint/protos/*.proto --python_out=.
  2. preprocess dataset into tfrecord
python tools/preprocessing.py \
    --anno_dir="/mnt/data/aist_plusplus_final/" \
    --audio_dir="/mnt/data/AIST/music/" \
    --split=train
python tools/preprocessing.py \
    --anno_dir="/mnt/data/aist_plusplus_final/" \
    --audio_dir="/mnt/data/AIST/music/" \
    --split=testval
  3. run training
python trainer.py --config_path ./configs/fact_v5_deeper_t10_cm12.config --model_dir ./checkpoints

Note: you might want to reduce the batch_size in the config file if you run into out-of-memory issues.
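As a rough illustration of the kind of edit this note suggests, assuming the .config files are protobuf text files with a batch_size field (the surrounding message name here is a guess; check your actual config for the real layout):

```
# hypothetical excerpt of configs/fact_v5_deeper_t10_cm12.config
# (protobuf text format; only the batch_size field is taken from the note above)
train_dataset {
  batch_size: 16  # lower this value if you hit out-of-memory errors
}
```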

  4. run testing and evaluation
# caching the generated motions (seed included) to `./outputs`
python evaluator.py --config_path ./configs/fact_v5_deeper_t10_cm12.config --model_dir ./checkpoints
# calculate FIDs
python tools/calculate_scores.py
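tools/calculate_scores.py reports FID-style distances between feature distributions of real and generated motion. As a rough sketch of that metric only (not the repo's actual implementation, and feature extraction is out of scope here), the Fréchet distance between Gaussians fitted to two feature sets can be computed like this:

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets,
    each of shape (num_samples, feature_dim)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # sqrtm can return tiny imaginary parts from numerical error; drop them
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_a - mu_b
    return diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean)
```

Two samples from the same distribution score near zero; a distribution shift pushes the score up.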

Citation

@inproceedings{li2021dance,
  title={AI Choreographer: Music Conditioned 3D Dance Generation with AIST++},
  author={Ruilong Li and Shan Yang and David A. Ross and Angjoo Kanazawa},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  year = {2021}
}

mint's People

Contributors

shanyangmie

mint's Issues

protoc error

When running "protoc ./mint/protos/*.proto", it outputs "Missing output directives."

Root translation wrong after 2 seconds

Hi,

Congratulations for such a great work.

We've been trying to convert the generated animations (in the output .npy files) to an fbx, and we are almost there.
Since you are normalizing the root translation, we multiply it again by the scale of our character before using it.
However, this only works for the first 2 seconds of every generated motion, after which the root translation becomes exaggerated and wrong.

Any idea why this could be happening?

Thanks,

about the crossmodal_train.txt

Sorry, I've run into a problem:
where can I get the file crossmodal_train.txt?
The code looks for it under /mnt/data/AIST/music/, but I can't find where to download it.

Evaluation stuck at initialized model.

I0928 11:08:09.961607 139665222619776 controller.py:391] restoring or initializing model...
restoring or initializing model...
I0928 11:08:09.962239 139665222619776 controller.py:397] initialized model.
initialized model.

CUDA, Tensorflow, nvcc versions?

Hello,

which CUDA, TensorFlow, and nvcc versions are you using? I am having problems with TensorFlow: it does not recognize my GPUs, which seems related to version compatibility.

Inference - some output angles seem wrong

Visualizing the output .npy files of evaluator.py according to the README, with the provided checkpoints, it seems like some of the angles (e.g. shoulders) are wrong, maybe flipped. In this clip, green is the original sample from AIST++ and red is MINT inference.

To visualize, I implemented the opposite of the operation described here, then used Blender's SMPL-X addon to visualize. Here's my code:

    from scipy.spatial.transform import Rotation as R

    # trim first 9 entries according to https://github.com/google-research/mint - mint/tools/preprocessing.py +161
    rotations = mint_data[:seq_len, 9:]
    # per-joint flattened 3x3 rotation matrices -> axis-angle per frame/joint
    rotations = rotations.reshape([-1, 3, 3])
    rotations = R.from_matrix(rotations).as_rotvec().reshape([seq_len, (joint_dim - 1) // 9, 3])
    body_pose = rotations[:, :NUM_SMPLX_BODYJOINTS]  # FIXME - not sure about that (trimming last 3 joints from SMPL's 24 to SMPL-X's 21)

How to transfer .npy file to video

Thanks for your fancy work. I ran evaluator.py and the output files are .npy. Could you please give me some suggestions on converting the output to a video?

In calculate_beat_scores.py, what should be the result_files?

Hi. After evaluation, I tried to run calculate_beat_scores.py. However, the default result_files is '/mnt/data/aist_paper_results/*.pkl', which doesn't work. Could anyone tell me how to generate the motions and replace the default result files? Thank you very much!

crossmodal_val.txt

OSError: /mint/data/aist_plusplus_final/splits/crossmodal_val.txt not found.

freezing motion video

@liruilong940607 Hi Ruilong, I ran the evaluation code with the pretrained model you provided on Google Drive, but the resulting video looks like this: the first two seconds of motion come from the seed motion, and then the motion freezes for the remaining seconds. I don't know why. Could you please give me some suggestions?

I have no protos dic

Using "protoc ./mint/protos/*.proto", I got an error: "Missing output directives."

bvh_writer instruction

Hi Ruilong,

Would it be possible to provide instructions for bvh_writer?
I couldn't find the skeleton_csv_filename and joints_to_ignore_csv_filename files.

Best,
Wenjie

No successful evaluation is run

I0930 20:10:24.557036 139652333998720 controller.py:277] eval | step: 214501 | running complete evaluation...
eval | step: 214501 | running complete evaluation...
I0930 20:10:27.645429 139652333998720 controller.py:290] eval | step: 214501 | eval time: 3.1 sec | output: {}
eval | step: 214501 | eval time: 3.1 sec | output: {}

No evaluation is actually conducted...

Visulization with 3D character

Hi, thanks for your fancy work. I'm new to 3D visualization. I'm just curious how you visualize the generated 3D motion with a character from Mixamo. Do you use Blender or something? Could you possibly point me to some helpful websites or similar resources?

Thanks in advance.

Where is the aist_features directory?

I'm trying to run calculate_fid_scores.py and here is the error: the stack of real_features is empty.

It seems that I need to load the real_features from ./data/aist_features/*_kinetic.npy and ./data/aist_features/*_manual.npy, but I can't find them in my repository.

I'd like to know where the aist_features directory is, or how to generate the *_kinetic.npy and *_manual.npy files from the real data?

Loss steadily decreases, but FID_k gradually increases? (TensorFlow & PyTorch)

I reimplemented a version in PyTorch and added a validation loop. The best FID_k was 101 at epoch 21; with further training, FID_k kept growing with large fluctuations, and I don't know why. The loss drops to around 0.0002 and basically stops converging; FID_k reaches over 7000 and FID_g is around 25.
Training the author's original TensorFlow code from scratch, the loss drops to 0.0001 and basically stops converging; FID_k reaches over 700 and FID_g is over 30. I also cannot reproduce the author's released pretrained model at all. @liruilong940607

TF: 2.3
cuda: 10.1

pytorch: 1.9.1
cuda: 10.1

Progress update: the PyTorch reimplementation has now converged and reproduces the author's reported metrics. The fixes were as follows:

1. Match every layer's initialization to the TF version; carefully check each layer's default initializer.
2. The training set has only 952 videos; with multi-GPU training and batch_size 32, each epoch contains only a few iterations. My suggestion: after loading the file list, replicate it 10x or 20x, so each epoch has 10x or 20x as many iterations.
3. Train for enough epochs. With the list replicated 10x or 20x in the iterator, the loss needs to converge to around 0.00011; train for at least 800 epochs, with lr=1e-4 for epochs 0-200, lr=1e-5 for epochs 200-500, and lr=1e-6 for epochs 500-800.

Additionally, I added a 2-layer bidirectional LSTM after the last CrossTransformer and before the final fc. After training, the loss converges to around 0.00011, and FID_k goes as low as 22.3, smaller than the author's reported number. But a smaller metric does not mean the generated dances look good: as with the author's released model, only a few results look decent and the rest are not great.
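The recipe above uses two simple mechanisms: replicating the small training list (point 2) and a staged learning-rate schedule of 1e-4 for epochs 0-200, 1e-5 for 200-500, and 1e-6 for 500-800 (point 3). A minimal framework-free sketch of both:

```python
def replicate_list(video_list, times=10):
    """Point 2: replicate the small training list so each
    epoch contains 10-20x more iterations."""
    return video_list * times

def staged_lr(epoch):
    """Point 3: staged learning-rate schedule,
    1e-4 for epochs 0-200, 1e-5 for 200-500, 1e-6 afterwards."""
    if epoch < 200:
        return 1e-4
    elif epoch < 500:
        return 1e-5
    return 1e-6
```

In a real training loop, staged_lr(epoch) would be assigned to the optimizer's learning rate at the start of each epoch.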

How to export the saved_model correctly?

How to export the saved_model correctly? There are some errors in the exported saved_model. See below.

saved_model_cli show --dir=savedmodel/214501 --all
2021-12-21 22:36:19.753846: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

Defined Functions:

About the eval result

My eval result using the given checkpoint contains only a few seconds of valid dance;
after a few seconds, the result becomes total nonsense.
Is this the expected behavior, or did I do something wrong?

Correct repo to `git clone`

Hi, is this information in the README correct?

git clone https://github.com/liruilong940607/mint --recursive

Perhaps, it's git clone https://github.com/google-research/mint --recursive?

Thanks

The code is very hard to follow; asking for help

In bvh_writer.py:
{"pred_results": {"model_name_pose": joints angle array,
                  "joints_3d": joints pose array}}

What's the difference between the joints angle array and the joints pose array?

BeatAlign

Hi Ruilong,

I was wondering whether the kinematic beats are computed from all body joints?
I cannot find this code.
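For reference, the Beat Alignment Score in the paper matches kinematic beats (local minima of joint velocity magnitude) against music beats. A rough numpy sketch of that idea, not the authors' code (the averaging over all joints, the sigma value, and the exact beat definition are assumptions here):

```python
import numpy as np

def kinematic_beats(joints_3d, fps=60):
    """joints_3d: (num_frames, num_joints, 3). Kinematic beats are taken as
    local minima of the mean joint-velocity magnitude; returns times in seconds."""
    vel = np.linalg.norm(np.diff(joints_3d, axis=0), axis=2).mean(axis=1)
    # a frame is a beat if its velocity is lower than both neighbors
    minima = (vel[1:-1] < vel[:-2]) & (vel[1:-1] < vel[2:])
    return (np.where(minima)[0] + 1) / fps

def beat_align_score(music_beats, kin_beats, sigma=0.1):
    """Mean, over kinematic beats, of a Gaussian of the distance to the
    nearest music beat; 1.0 means every kinematic beat lands on a music beat."""
    kin = np.asarray(kin_beats)
    if kin.size == 0:
        return 0.0
    dists = np.abs(kin[:, None] - np.asarray(music_beats)[None, :]).min(axis=1)
    return float(np.exp(-dists ** 2 / (2 * sigma ** 2)).mean())
```

A motion that pauses exactly on the music beats scores close to 1.0; random pauses score lower.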

Can't find the data_files

data_files: "./data/_tfrecord-train"
Could you tell me where I can download the data file? Thanks a lot.
