moore-animateanyone's People

Contributors

lixunsong, npjd, songtao-liu-mt

moore-animateanyone's Issues

[Huddle together for warmth] Join the AnimateAnyone discussion group~

Thanks to the repo authors for open-sourcing! Please don't delete this; it's purely for technical exchange. Sharing a digital-human AIGC technical discussion group: the group discusses digital-human fundamentals including Animate Anyone on a daily basis, with all kinds of resource sharing, one-click install scripts, and more. Beginner-friendly and active, so we can improve our development and learning efficiency together.
[screenshot: SCR-20240420-odtl]

Can you provide a more powerful model?

Thank you very much for providing your valuable code and model. The current model's test results seem to need further optimization.
In addition, could you open-source the training code? I would be very grateful.

Training code error

Many thanks for releasing the training code.

However, after following the environment setup and data preparation and then running the stage 1 training command, I got the error in the screenshot below. Is there anything wrong?

[screenshot]

Looking forward to your reply. Thank you again !

Is this function ever used?

In src/pipelines/utils.py, the function set_tensor_interpolation_method doesn't look like it is ever used. I searched for the function name globally in VSCode and found that it appears only once, where it is defined.
I then found that the variable tensor_interpolation is only modified inside this function, which means the return value of the get_tensor_interpolation_method function is always None, if I understand correctly.
get_tensor_interpolation_method is used when building the Pose2VideoPipeline, and I'm not sure whether this affects the results.
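For context, a minimal sketch of the pattern described above, reconstructed from the issue text (the real definitions live in src/pipelines/utils.py; the bodies here are illustrative):

    tensor_interpolation = None  # module-level state

    def get_tensor_interpolation_method():
        # Returns whatever was registered; stays None if the setter
        # below is never called anywhere in the codebase.
        return tensor_interpolation

    def set_tensor_interpolation_method(method):
        # Per the issue, no call site for this function exists, so the
        # getter above always returns None.
        global tensor_interpolation
        tensor_interpolation = method

If the pipeline only calls get_tensor_interpolation_method on a code path that never executes, the missing setter call would be harmless; otherwise invoking the returned None would raise a TypeError.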

torch

I have tried all torch versions; which version works?

Safetensor models

Is there an easy way to use safetensor models with the pipeline?
I have a few merges I would like to try.
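Not an official answer, but a minimal sketch of one common approach with diffusers-style models, assuming the merge matches the SD 1.5 UNet architecture and is stored as a diffusers-format state dict (the merge file path is hypothetical):

    from safetensors.torch import load_file
    from diffusers import UNet2DConditionModel

    # Build the UNet from the base config, then swap in merged weights.
    unet = UNet2DConditionModel.from_pretrained(
        "./pretrained_weights/stable-diffusion-v1-5", subfolder="unet"
    )
    state_dict = load_file("./my_merge.safetensors")  # hypothetical merge file
    missing, unexpected = unet.load_state_dict(state_dict, strict=False)
    print(len(missing), "missing /", len(unexpected), "unexpected keys")

Note that single-file SD merges usually use the original LDM key names and bundle the VAE and text encoder, so they generally need key conversion to the diffusers layout first.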

vid2pose - import error

Hello!

Could you tell me why I'm getting an error when launching the vid2pose module? I used the command you provided.
python tools/vid2pose.py --video_path my_path/to_file.mp4

Console log:
Traceback (most recent call last):
  File "/content/Moore-AnimateAnyone/tools/vid2pose.py", line 1, in <module>
    from src.dwpose import DWposeDetector
ModuleNotFoundError: No module named 'src'
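Not an official answer, but this usually means Python cannot see the repository root (the directory containing src/) on its import path. Running the command from the repository root typically fixes it; alternatively, a hedged sketch of a path fix at the top of tools/vid2pose.py:

    import os
    import sys

    # Make the repository root importable so `from src.dwpose import ...`
    # resolves regardless of the directory the script is launched from.
    repo_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    sys.path.insert(0, repo_root)

    from src.dwpose import DWposeDetector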

Guidance for Fine-tuning Moore AnimateAnyone with a Small Dataset

Hello Moore AnimateAnyone team,

I've been exploring your remarkable project and am interested in applying it to a specific domain by fine-tuning the pre-trained model on a small, domain-specific dataset. I would appreciate some guidance on the best practices for fine-tuning the model effectively. My questions are as follows:

  1. Model Weight Initialization: For fine-tuning, is it recommended to initialize the model with the provided pre-trained weights and then continue training on the new dataset? If so, could you provide an example or guidance on loading the pre-trained weights correctly before starting the fine-tuning process? (A tentative sketch follows this list.)

  2. Two-Stage Training Process: The training process for the model is described as two-stage. Should fine-tuning on a new dataset also follow this two-stage approach, or are there any modifications or considerations we should be aware of for fine-tuning?

  3. Data Preparation and Augmentation: For fine-tuning on a small dataset, are there any specific data preparation or augmentation techniques you recommend to prevent overfitting and ensure the model generalizes well to the new domain?

  4. Hyperparameter Adjustments: Are there any specific hyperparameters (e.g., learning rate, batch size) that you suggest tweaking for fine-tuning as opposed to training from scratch?

  5. Evaluation during Fine-tuning: What are the best practices for evaluating the model during the fine-tuning process to ensure that it's adapting well to the new dataset without forgetting the knowledge gained during pre-training?
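Regarding question 1, and pending the authors' confirmation, a minimal sketch of loading the released weights before fine-tuning, assuming the pretrained_weights layout described in the README (file names are illustrative, not verified):

    import torch
    from diffusers import UNet2DConditionModel

    # Base SD 1.5 UNet as the architecture/config source.
    reference_unet = UNet2DConditionModel.from_pretrained(
        "./pretrained_weights/stable-diffusion-v1-5", subfolder="unet"
    )

    # Overlay the released AnimateAnyone weights, then continue training
    # on the new dataset instead of starting from scratch.
    state_dict = torch.load(
        "./pretrained_weights/reference_unet.pth", map_location="cpu"
    )
    reference_unet.load_state_dict(state_dict)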

Any guidance, examples, or additional resources you could provide would be greatly appreciated. Fine-tuning deep learning models can be nuanced, and insights from the creators would be invaluable.

Thank you for your work on this innovative project and for your support to the community.

different transforms in training data preprocessing

I noticed that different transformation operations are used for pose and image.
For pose:

    self.cond_transform = transforms.Compose(
        [
            transforms.RandomResizedCrop(
                self.img_size,
                scale=self.img_scale,
                ratio=self.img_ratio,
                interpolation=transforms.InterpolationMode.BILINEAR,
            ),
            transforms.ToTensor(),  # values in [0, 1]
        ]
    )

For image:

    self.transform = transforms.Compose(
        [
            transforms.RandomResizedCrop(
                self.img_size,
                scale=self.img_scale,
                ratio=self.img_ratio,
                interpolation=transforms.InterpolationMode.BILINEAR,
            ),
            transforms.ToTensor(),               # values in [0, 1]
            transforms.Normalize([0.5], [0.5]),  # maps [0, 1] to [-1, 1]
        ]
    )
Why does pose not require the final normalization step?
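A hedged note on the arithmetic: Normalize([0.5], [0.5]) computes (x - 0.5) / 0.5, mapping ToTensor's [0, 1] output to the [-1, 1] range that the Stable Diffusion VAE expects for its inputs. The pose map is not encoded by the VAE but fed to the pose guider, which (assuming it was trained on raw [0, 1] inputs) does not need the shift. A two-line check of the mapping:

    import torch

    x = torch.tensor([0.0, 0.5, 1.0])  # ToTensor output range
    print((x - 0.5) / 0.5)             # tensor([-1., 0., 1.])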

Unable to load weights from checkpoint file

Every time I run this, I get the following error:

Traceback (most recent call last):
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 109, in load_state_dict
    return torch.load(checkpoint_file, map_location="cpu")
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 1028, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 1246, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 122, in load_state_dict
    raise ValueError(
ValueError: Unable to locate the file ./pretrained_weights/stable-diffusion-v1-5/unet\diffusion_pytorch_model.bin which is necessary to load this pretrained model. Make sure you have saved the model properly.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\routes.py", line 534, in predict
    output = await route_utils.call_process_api(
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1554, in process_api
    result = await self.call_function(
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1192, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 659, in wrapper
    response = f(*args, **kwargs)
  File "C:\AI\AnimateAnyone\Moore-AnimateAnyone\app.py", line 52, in animate
    reference_unet = UNet2DConditionModel.from_pretrained(
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 800, in from_pretrained
    state_dict = load_state_dict(model_file, variant=variant)
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 127, in load_state_dict
    raise OSError(
OSError: Unable to load weights from checkpoint file for './pretrained_weights/stable-diffusion-v1-5/unet\diffusion_pytorch_model.bin' at './pretrained_weights/stable-diffusion-v1-5/unet\diffusion_pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

Any solutions?
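Not from the maintainers, but "EOFError: Ran out of input" from torch.load usually means the .bin file exists but is empty or truncated; with weights fetched via git, the most common cause is a git-lfs pointer file that was never pulled. A quick sanity check, as a sketch:

    import os

    path = (
        "./pretrained_weights/stable-diffusion-v1-5/unet/"
        "diffusion_pytorch_model.bin"
    )
    print(os.path.getsize(path), "bytes")
    # A real SD 1.5 UNet checkpoint is on the order of 3.4 GB; a file of
    # a few hundred bytes is an LFS pointer and must be re-downloaded.

The mixed path separators in the error (unet\diffusion_pytorch_model.bin) come from joining paths on Windows and are usually harmless, but do confirm the file actually exists at that location.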

Facial distortion

[screenshot]

The facial details are all drawn in the pose, which causes facial distortion in the generated images.
The official implementation does not include facial details.

Enhancing the Fidelity of Generated Animations in Moore-AnimateAnyone

Dear Moore-AnimateAnyone Contributors,

I hope this message finds you well. I have been thoroughly exploring the capabilities of the Moore-AnimateAnyone repository and am deeply impressed by the strides made in animating still images with such remarkable results. The demo hosted on HuggingFace Spaces is particularly indicative of the potential this technology holds.

However, upon delving into the examples provided and running my own tests, I have observed certain limitations that I believe, if addressed, could significantly elevate the quality of the animations produced. I would like to propose a few enhancements that could potentially mitigate these issues and refine the overall animation process.

  1. Background Artifacts: The presence of artifacts in animations, especially when the reference image has a clean background, can be quite distracting. Could we consider implementing a more robust background detection and preservation algorithm to maintain the integrity of the original image?

  2. Scale Mismatch: The suboptimal results due to scale mismatch between the reference image and keypoints are noticeable. While the paper suggests preprocessing techniques, their implementation is not yet apparent in the current version. Could we prioritise the integration of these preprocessing techniques to improve the handling of scale variations?

  3. Motion Subtleties: The flickering and jittering in animations with subtle motions or static scenes detract from the fluidity of the animation. Would it be possible to introduce a smoothing mechanism or a motion threshold to ensure that only significant movements are translated into the animation sequence?

I understand that these enhancements may involve considerable research and development efforts, but I believe they could be instrumental in pushing the boundaries of what Moore-AnimateAnyone can achieve. Additionally, these improvements could be pivotal in the deployment of this technology on the MoBi MaLiang AIGC platform, ensuring a more polished and professional output for end-users.

I am keen to follow the progress of this project and am more than willing to contribute to discussions or testing, should you find my feedback of value.

Thank you for your dedication to this innovative project, and I look forward to your thoughts on the potential for these enhancements.

Best regards,
yihong1120

Gradio crashes partway through

But the iterations continue.
Once it completed, the output was just a load of noise. This was length 32, from a 25-frame source video.

20240113T1143.mp4

Resources required for training?

Hello,

Thank you so much for releasing the training code. How much GPU VRAM is required for training? Say one wants to train using a single A100 (40 GB): how long would it take to get very good results?

overfitting phenomenon?

Thanks for your great work. Have you ever encountered overfitting?

ImportError: cannot import name 'CaptionProjection' from 'diffusers.models.embeddings'

Traceback (most recent call last):
  File "Moore-AnimateAnyone\app.py", line 16, in <module>
    from src.models.unet_2d_condition import UNet2DConditionModel
  File "Moore-AnimateAnyone\src\models\unet_2d_condition.py", line 40, in <module>
    from .unet_2d_blocks import (
  File "Moore-AnimateAnyone\src\models\unet_2d_blocks.py", line 15, in <module>
    from .transformer_2d import Transformer2DModel
  File "Moore-AnimateAnyone\src\models\transformer_2d.py", line 7, in <module>
    from diffusers.models.embeddings import CaptionProjection
ImportError: cannot import name 'CaptionProjection' from 'diffusers.models.embeddings' 
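Not an official fix, but this import error usually indicates a diffusers version mismatch: CaptionProjection exists in the release the repository pins (see requirements.txt) and was renamed in later diffusers versions. Installing the pinned version is the straightforward fix; alternatively, a hedged compatibility shim at the top of src/models/transformer_2d.py:

    # Hedged shim: newer diffusers releases renamed CaptionProjection to
    # PixArtAlphaTextProjection; fall back to the new name if the old
    # import fails.
    try:
        from diffusers.models.embeddings import CaptionProjection
    except ImportError:
        from diffusers.models.embeddings import (
            PixArtAlphaTextProjection as CaptionProjection,
        )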

max length 128

Is there any way to make the output video longer than 4 seconds?

Cannot run example scripts. OOM Error

Thank you for your great work. When I directly run the provided command, it gives "RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x1024 and 768x320)".
[screenshot]

Questions about training dataset

Such an open-source effort with amazing results!
I have some questions about the training data. Approximately how much video data was used to train the model?

training error in stage 1

File "Moore-AnimateAnyone/src/models/mutual_self_attention.py", line 180, in hacked_basic_transformer_inner_forward
norm_hidden_states[_uc_mask],
IndexError: The shape of the mask [2] at index 0 does not match the shape of the indexed tensor [3, 9216, 320] at index 0
Steps: 1%|▎ | 249/30000 [06:30<12:57:21, 1.57s/it, lr=1e-5, step_loss=0.107]

Are these flickering results expected?

Very good job! I ran your code in Colab with the anyone-video-2 kpts from your repo and just chose my own reference image, but the results don't look good. Can you check?

20240108-211948_anyone-video-2_784x512_3_0644.mp4
+._anyone-video-2_784x512_3_0728.mp4

vid2pose.py: No module named 'src'

Traceback (most recent call last):
  File "E:\AI\Moore-AnimateAnyone\tools\vid2pose.py", line 1, in <module>
    from src.dwpose import DWposeDetector
ModuleNotFoundError: No module named 'src'

[screenshot]

colab demo

Could someone be so kind as to make a Google Colab to test this? Thanks :)

ReferenceAttentionControl

With unconditional generation during training, should the reference embedding be concatenated to normal_hidden_states?

Question on datasets

Congratulations on achieving such amazing results!!!
Both cartoons and real people animate smoothly, so I have a question about which type of dataset you used during training, like the UBC dataset or a dataset from TikTok?

config.json

OSError: Error no file named config.json found in directory ./pretrained_weights/stable-diffusion-v1-5/.

Weights of stage one

Hello, thanks for your open weights. However, I would like to use some features of the result from the first training stage; will you share those weights?
I am also wondering about the first-stage result's ability to keep identity and generate high-quality images; could you share your experience with this?
