moore-animateanyone's People

Contributors

lixunsong, npjd, songtao-liu-mt

moore-animateanyone's Issues

[Huddle together for warmth] Join the AnimateAnyone discussion group~

Thanks to the repo authors for open-sourcing! Please don't delete this; it's purely for technical exchange. Sharing a digital-human AIGC technical discussion group: the group discusses digital-human fundamentals including Animate Anyone on a daily basis, with all kinds of resource sharing, one-click install scripts, and more. Beginner-friendly and active, so we can improve our development and learning efficiency together.
[screenshot: SCR-20240420-odtl]

Can you provide a more powerful model?

Thank you very much for providing your valuable code and model. The current model's test results seem to need further optimization.
In addition, could you open-source the training code? I would be very grateful.

Training code error

Many thanks for releasing the training code.

However, after following the environment setup and data preparation and then running the stage 1 training command, I got the error in the screenshot below. Is there anything wrong?

[screenshot]

Looking forward to your reply. Thank you again !

Is this function ever used?

In src/pipelines/utils.py, the function set_tensor_interpolation_method doesn't look like it is ever used. I searched for the function name globally in VSCode and found that it appears only once, where it is defined.
I then found that the variable tensor_interpolation is only modified inside this function, which means the return value of the get_tensor_interpolation_method function is always None, if I understand correctly.
get_tensor_interpolation_method is used when building the Pose2VideoPipeline, and I'm not sure whether this affects the results.
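For context, a minimal sketch of the pattern described above, reconstructed from the issue text (the real definitions live in src/pipelines/utils.py; the bodies here are illustrative):

    tensor_interpolation = None  # module-level state

    def get_tensor_interpolation_method():
        # Returns whatever was registered; stays None if the setter
        # below is never called anywhere in the codebase.
        return tensor_interpolation

    def set_tensor_interpolation_method(method):
        # Per the issue, no call site for this function exists, so the
        # getter above always returns None.
        global tensor_interpolation
        tensor_interpolation = method

If the pipeline only calls get_tensor_interpolation_method on a code path that never executes, the missing setter call would be harmless; otherwise invoking the returned None would raise a TypeError.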

torch

I have tried all torch versions; which version works?

Safetensor models

Is there an easy way to use safetensor models with the pipeline?
I have a few merges I would like to try.
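Not an official answer, but a minimal sketch of one common approach with diffusers-style models, assuming the merge matches the SD 1.5 UNet architecture and is stored as a diffusers-format state dict (the merge file path is hypothetical):

    from safetensors.torch import load_file
    from diffusers import UNet2DConditionModel

    # Build the UNet from the base config, then swap in merged weights.
    unet = UNet2DConditionModel.from_pretrained(
        "./pretrained_weights/stable-diffusion-v1-5", subfolder="unet"
    )
    state_dict = load_file("./my_merge.safetensors")  # hypothetical merge file
    missing, unexpected = unet.load_state_dict(state_dict, strict=False)
    print(len(missing), "missing /", len(unexpected), "unexpected keys")

Note that single-file SD merges usually use the original LDM key names and bundle the VAE and text encoder, so they generally need key conversion to the diffusers layout first.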

vid2pose - import error

Hello!

Could you tell me why I'm getting an error when launching the vid2pose module? I used the command you provided.
python tools/vid2pose.py --video_path my_path/to_file.mp4

Console log:
Traceback (most recent call last):
  File "/content/Moore-AnimateAnyone/tools/vid2pose.py", line 1, in <module>
    from src.dwpose import DWposeDetector
ModuleNotFoundError: No module named 'src'
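Not an official answer, but this usually means Python cannot see the repository root (the directory containing src/) on its import path. Running the command from the repository root typically fixes it; alternatively, a hedged sketch of a path fix at the top of tools/vid2pose.py:

    import os
    import sys

    # Make the repository root importable so `from src.dwpose import ...`
    # resolves regardless of the directory the script is launched from.
    repo_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    sys.path.insert(0, repo_root)

    from src.dwpose import DWposeDetector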

Guidance for Fine-tuning Moore AnimateAnyone with a Small Dataset

Hello Moore AnimateAnyone team,

I've been exploring your remarkable project and am interested in applying it to a specific domain by fine-tuning the pre-trained model on a small, domain-specific dataset. I would appreciate some guidance on the best practices for fine-tuning the model effectively. My questions are as follows:

  1. Model Weight Initialization: For fine-tuning, is it recommended to initialize the model with the provided pre-trained weights and then continue training on the new dataset? If so, could you provide an example or guidance on loading the pre-trained weights correctly before starting the fine-tuning process? (A tentative sketch follows this list.)

  2. Two-Stage Training Process: The training process for the model is described as two-stage. Should fine-tuning on a new dataset also follow this two-stage approach, or are there any modifications or considerations we should be aware of for fine-tuning?

  3. Data Preparation and Augmentation: For fine-tuning on a small dataset, are there any specific data preparation or augmentation techniques you recommend to prevent overfitting and ensure the model generalizes well to the new domain?

  4. Hyperparameter Adjustments: Are there any specific hyperparameters (e.g., learning rate, batch size) that you suggest tweaking for fine-tuning as opposed to training from scratch?

  5. Evaluation during Fine-tuning: What are the best practices for evaluating the model during the fine-tuning process to ensure that it's adapting well to the new dataset without forgetting the knowledge gained during pre-training?
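Regarding question 1, and pending the authors' confirmation, a minimal sketch of loading the released weights before fine-tuning, assuming the pretrained_weights layout described in the README (file names are illustrative, not verified):

    import torch
    from diffusers import UNet2DConditionModel

    # Base SD 1.5 UNet as the architecture/config source.
    reference_unet = UNet2DConditionModel.from_pretrained(
        "./pretrained_weights/stable-diffusion-v1-5", subfolder="unet"
    )

    # Overlay the released AnimateAnyone weights, then continue training
    # on the new dataset instead of starting from scratch.
    state_dict = torch.load(
        "./pretrained_weights/reference_unet.pth", map_location="cpu"
    )
    reference_unet.load_state_dict(state_dict)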

Any guidance, examples, or additional resources you could provide would be greatly appreciated. Fine-tuning deep learning models can be nuanced, and insights from the creators would be invaluable.

Thank you for your work on this innovative project and for your support to the community.

different transforms in training data preprocessing

I noticed that different transformation operations are used for pose and image.
For pose:

    self.cond_transform = transforms.Compose(
        [
            transforms.RandomResizedCrop(
                self.img_size,
                scale=self.img_scale,
                ratio=self.img_ratio,
                interpolation=transforms.InterpolationMode.BILINEAR,
            ),
            transforms.ToTensor(),  # values in [0, 1]
        ]
    )

For image:

    self.transform = transforms.Compose(
        [
            transforms.RandomResizedCrop(
                self.img_size,
                scale=self.img_scale,
                ratio=self.img_ratio,
                interpolation=transforms.InterpolationMode.BILINEAR,
            ),
            transforms.ToTensor(),               # values in [0, 1]
            transforms.Normalize([0.5], [0.5]),  # maps [0, 1] to [-1, 1]
        ]
    )
Why does pose not require the final normalization step?
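A hedged note on the arithmetic: Normalize([0.5], [0.5]) computes (x - 0.5) / 0.5, mapping ToTensor's [0, 1] output to the [-1, 1] range that the Stable Diffusion VAE expects for its inputs. The pose map is not encoded by the VAE but fed to the pose guider, which (assuming it was trained on raw [0, 1] inputs) does not need the shift. A two-line check of the mapping:

    import torch

    x = torch.tensor([0.0, 0.5, 1.0])  # ToTensor output range
    print((x - 0.5) / 0.5)             # tensor([-1., 0., 1.])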

Unable to load weights from checkpoint file

Every time I run this, I get the following error:

Traceback (most recent call last):
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 109, in load_state_dict
    return torch.load(checkpoint_file, map_location="cpu")
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 1028, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 1246, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 122, in load_state_dict
    raise ValueError(
ValueError: Unable to locate the file ./pretrained_weights/stable-diffusion-v1-5/unet\diffusion_pytorch_model.bin which is necessary to load this pretrained model. Make sure you have saved the model properly.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\routes.py", line 534, in predict
    output = await route_utils.call_process_api(
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1554, in process_api
    result = await self.call_function(
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1192, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\utils.py", line 659, in wrapper
    response = f(*args, **kwargs)
  File "C:\AI\AnimateAnyone\Moore-AnimateAnyone\app.py", line 52, in animate
    reference_unet = UNet2DConditionModel.from_pretrained(
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 800, in from_pretrained
    state_dict = load_state_dict(model_file, variant=variant)
  File "C:\Users\henso\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 127, in load_state_dict
    raise OSError(
OSError: Unable to load weights from checkpoint file for './pretrained_weights/stable-diffusion-v1-5/unet\diffusion_pytorch_model.bin' at './pretrained_weights/stable-diffusion-v1-5/unet\diffusion_pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

Any solutions?
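Not from the maintainers, but "EOFError: Ran out of input" from torch.load usually means the .bin file exists but is empty or truncated; with weights fetched via git, the most common cause is a git-lfs pointer file that was never pulled. A quick sanity check, as a sketch:

    import os

    path = (
        "./pretrained_weights/stable-diffusion-v1-5/unet/"
        "diffusion_pytorch_model.bin"
    )
    print(os.path.getsize(path), "bytes")
    # A real SD 1.5 UNet checkpoint is on the order of 3.4 GB; a file of
    # a few hundred bytes is an LFS pointer and must be re-downloaded.

The mixed path separators in the error (unet\diffusion_pytorch_model.bin) come from joining paths on Windows and are usually harmless, but do confirm the file actually exists at that location.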

Facial distortion

[screenshot]

The facial details are all drawn in the pose, which causes facial distortion in the generated images.
The official implementation does not include facial details.

Enhancing the Fidelity of Generated Animations in Moore-AnimateAnyone

Dear Moore-AnimateAnyone Contributors,

I hope this message finds you well. I have been thoroughly exploring the capabilities of the Moore-AnimateAnyone repository and am deeply impressed by the strides made in animating still images with such remarkable results. The demo hosted on HuggingFace Spaces is particularly indicative of the potential this technology holds.

However, upon delving into the examples provided and running my own tests, I have observed certain limitations that I believe, if addressed, could significantly elevate the quality of the animations produced. I would like to propose a few enhancements that could potentially mitigate these issues and refine the overall animation process.

  1. Background Artifacts: The presence of artifacts in animations, especially when the reference image has a clean background, can be quite distracting. Could we consider implementing a more robust background detection and preservation algorithm to maintain the integrity of the original image?

  2. Scale Mismatch: The suboptimal results due to scale mismatch between the reference image and keypoints are noticeable. While the paper suggests preprocessing techniques, their implementation is not yet apparent in the current version. Could we prioritise the integration of these preprocessing techniques to improve the handling of scale variations?

  3. Motion Subtleties: The flickering and jittering in animations with subtle motions or static scenes detract from the fluidity of the animation. Would it be possible to introduce a smoothing mechanism or a motion threshold to ensure that only significant movements are translated into the animation sequence?

I understand that these enhancements may involve considerable research and development efforts, but I believe they could be instrumental in pushing the boundaries of what Moore-AnimateAnyone can achieve. Additionally, these improvements could be pivotal in the deployment of this technology on the MoBi MaLiang AIGC platform, ensuring a more polished and professional output for end-users.

I am keen to follow the progress of this project and am more than willing to contribute to discussions or testing, should you find my feedback of value.

Thank you for your dedication to this innovative project, and I look forward to your thoughts on the potential for these enhancements.

Best regards,
yihong1120

Gradio crashes partway through

But the iterations continue.
Once it completed, the output was just a load of noise. This was length 32, from a 25-frame source video.

20240113T1143.mp4

Resources required for training?

Hello,

Thank you so much for releasing the training code. How much GPU VRAM is required for training? Say one wants to train using a single A100 (40 GB): how long would it take to get very good results?

overfitting phenomenon?

Thanks for your great work. Have you ever encountered overfitting?

ImportError: cannot import name 'CaptionProjection' from 'diffusers.models.embeddings'

Traceback (most recent call last):
  File "Moore-AnimateAnyone\app.py", line 16, in <module>
    from src.models.unet_2d_condition import UNet2DConditionModel
  File "Moore-AnimateAnyone\src\models\unet_2d_condition.py", line 40, in <module>
    from .unet_2d_blocks import (
  File "Moore-AnimateAnyone\src\models\unet_2d_blocks.py", line 15, in <module>
    from .transformer_2d import Transformer2DModel
  File "Moore-AnimateAnyone\src\models\transformer_2d.py", line 7, in <module>
    from diffusers.models.embeddings import CaptionProjection
ImportError: cannot import name 'CaptionProjection' from 'diffusers.models.embeddings' 
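Not an official fix, but this import error usually indicates a diffusers version mismatch: CaptionProjection exists in the release the repository pins (see requirements.txt) and was renamed in later diffusers versions. Installing the pinned version is the straightforward fix; alternatively, a hedged compatibility shim at the top of src/models/transformer_2d.py:

    # Hedged shim: newer diffusers releases renamed CaptionProjection to
    # PixArtAlphaTextProjection; fall back to the new name if the old
    # import fails.
    try:
        from diffusers.models.embeddings import CaptionProjection
    except ImportError:
        from diffusers.models.embeddings import (
            PixArtAlphaTextProjection as CaptionProjection,
        )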

max length 128

Is there any way to make the output video longer than 4 seconds?

Cannot run example scripts. OOM Error

Thank you for your great work. When I directly run the provided command, it gives "RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x1024 and 768x320)".
[screenshot]

Questions about training dataset

Such an open-source effort with amazing results!
I have some questions about the training data. Approximately how much video data was used to train the model?

training error in stage 1

File "Moore-AnimateAnyone/src/models/mutual_self_attention.py", line 180, in hacked_basic_transformer_inner_forward
norm_hidden_states[_uc_mask],
IndexError: The shape of the mask [2] at index 0 does not match the shape of the indexed tensor [3, 9216, 320] at index 0
Steps: 1%|▎ | 249/30000 [06:30<12:57:21, 1.57s/it, lr=1e-5, step_loss=0.107]

Are these flickering results expected?

Very good job! I ran your code in Colab with the anyone-video-2 kpts from your repo and just chose my own reference image, but the results don't look good. Can you check?

20240108-211948_anyone-video-2_784x512_3_0644.mp4
+._anyone-video-2_784x512_3_0728.mp4

vid2pose.py: No module named 'src'

Traceback (most recent call last):
  File "E:\AI\Moore-AnimateAnyone\tools\vid2pose.py", line 1, in <module>
    from src.dwpose import DWposeDetector
ModuleNotFoundError: No module named 'src'

[screenshot]

colab demo

Could someone be so kind as to make a Google Colab to test this? Thanks :)

ReferenceAttentionControl

With unconditional generation during training, should the reference embedding be concatenated to normal_hidden_states?

Question on datasets

Congratulations on achieving such amazing results!!!
Both cartoons and real people animate smoothly, so I have a question about which type of dataset you used during training, like the UBC dataset or a dataset from TikTok?

config.json

OSError: Error no file named config.json found in directory ./pretrained_weights/stable-diffusion-v1-5/.

Weights of stage one

Hello, thanks for your open weights. However, I would like to use some features of the result from the first training stage; will you share those weights?
I am also wondering about the first-stage result's ability to keep identity and generate high-quality images; could you share your experience with this?
