zejun-yang / aniportrait

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

License: Apache License 2.0

Python 100.00%

aniportrait's Issues

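Audio-driven inference produces a video of pure noise

Running audio-driven inference on the official demo at float16 precision produces video data that is all NaN. After switching the precision to float32 the video data is no longer NaN, but the video is pure noise. The landmark video is generated correctly.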

It's an exciting node! When will it support ComfyUI?

No Webui

Is there a plan for AniPortrait to have a web UI?
It would be nice if this worked with Gradio, Auto 1111, or Pinokio.

AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?

File "C:\AniPortrait-main\venv\Lib\site-packages\pip_vendor\pyproject_hooks_in_process_in_process.py", line 353, in
main()
File "C:\AniPortrait-main\venv\Lib\site-packages\pip_vendor\pyproject_hooks_in_process_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\AniPortrait-main\venv\Lib\site-packages\pip_vendor\pyproject_hooks_in_process_in_process.py", line 112, in get_requires_for_build_wheel
backend = _build_backend()
^^^^^^^^^^^^^^^^
File "C:\AniPortrait-main\venv\Lib\site-packages\pip_vendor\pyproject_hooks_in_process_in_process.py", line 77, in build_backend
obj = import_module(mod_path)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python312\Lib\importlib_init.py", line 90, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "", line 1387, in _gcd_import
File "", line 1360, in _find_and_load
File "", line 1310, in _find_and_load_unlocked
File "", line 488, in _call_with_frames_removed
File "", line 1387, in _gcd_import
File "", line 1360, in _find_and_load
File "", line 1331, in _find_and_load_unlocked
File "", line 935, in load_unlocked
File "", line 995, in exec_module
File "", line 488, in call_with_frames_removed
File "C:\Users\buwad\AppData\Local\Temp\pip-build-env-aa6vx29p\overlay\Lib\site-packages\setuptools_init.py", line 16, in
import setuptools.version
File "C:\Users\buwad\AppData\Local\Temp\pip-build-env-aa6vx29p\overlay\Lib\site-packages\setuptools\version.py", line 1, in
import pkg_resources
File "C:\Users\buwad\AppData\Local\Temp\pip-build-env-aa6vx29p\overlay\Lib\site-packages\pkg_resources_init.py", line 2172, in
register_finder(pkgutil.ImpImporter, find_on_path)
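
The traceback above runs under C:\Program Files\Python312 and fails where pkg_resources references pkgutil.ImpImporter, an attribute that was removed from the standard library in Python 3.12. A minimal sketch for confirming this on a given interpreter follows; the function name impimporter_status is made up for illustration, and the likely remedy (a Python 3.10 virtual environment, or a newer setuptools) is an assumption rather than a confirmed fix from the maintainers.

```python
import pkgutil
import sys


def impimporter_status() -> str:
    """Report whether this interpreter still provides pkgutil.ImpImporter."""
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    if hasattr(pkgutil, "ImpImporter"):
        return f"Python {version}: pkgutil.ImpImporter is present"
    # Older setuptools/pkg_resources releases fail at import time in this case.
    return f"Python {version}: pkgutil.ImpImporter is missing"


if __name__ == "__main__":
    print(impimporter_status())
```

If the report says the attribute is missing, the build backend pulled in by pip is probably too old for that interpreter.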

Data preparation error

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR I0000 00:00:1711775950.511664 83072 task_runner.cc:85] GPU suport is not available: INTERNAL: ; RET_CHECK failure (mediapipe/gpu/gl_context_egl.cc:77) display != EGL_NO_DISPLAYeglGetDisplay() returned error 0x300c W0000 00:00:1711775950.512722 83072 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default. INFO: Created TensorFlow Lite XNNPACK delegate for CPU. I0000 00:00:1711775950.569749 83072 task_runner.cc:85] GPU suport is not available: INTERNAL: ; RET_CHECK failure (mediapipe/gpu/gl_context_egl.cc:77) display != EGL_NO_DISPLAYeglGetDisplay() returned error 0x300c

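What is going on here? Is there a problem in the underlying implementation of MediaPipe or TensorFlow? I set up the environment following the README, and the three inference modes all run normally. But when I run python -m scripts.preprocess_dataset to process the HDTF dataset, I get the messages shown above. I could not find a similar case or a solution online, so I am asking the authors here. Thank you very much.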

no model audio2pose.pt

ffmpeg required

~/AniPortrait# python -m scripts.pose2vid --config ./configs/prompts/animation.yaml -W 512 -H 512 -L 64
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
/usr/local/lib/python3.10/dist-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.get(instance, owner)()
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1711601336.882610 1144 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1711601336.993011 1252 gl_context.cc:357] GL version: 3.2 (OpenGL ES 3.2 NVIDIA 550.54.15), renderer: Tesla T4/PCIe/SSE2
W0000 00:00:1711601337.025092 1144 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
I0000 00:00:1711601337.098167 1144 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1711601337.154676 1262 gl_context.cc:357] GL version: 3.2 (OpenGL ES 3.2 NVIDIA 550.54.15), renderer: Tesla T4/PCIe/SSE2
pose video has 1794 frames, with 30 fps
/root/AniPortrait/src/pipelines/pipeline_pose2vid_long.py:408: FutureWarning: Accessing config attribute in_channels directly via 'UNet3DConditionModel' object attribute is deprecated. Please access 'in_channels' over 'UNet3DConditionModel's config object instead, e.g. 'unet.config.in_channels'.
num_channels_latents = self.denoising_unet.in_channels
100%|████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [09:03<00:00, 21.74s/it]
100%|████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:10<00:00, 6.07it/s]
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/root/AniPortrait/scripts/pose2vid.py", line 199, in
main()
File "/root/AniPortrait/scripts/pose2vid.py", line 189, in main
ffmpeg.input(pose_video_path).output(audio_output, acodec='copy').run()
File "/usr/local/lib/python3.10/dist-packages/ffmpeg/_run.py", line 313, in run
process = run_async(
File "/usr/local/lib/python3.10/dist-packages/ffmpeg/_run.py", line 284, in run_async
return subprocess.Popen(
File "/usr/lib/python3.10/subprocess.py", line 971, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib/python3.10/subprocess.py", line 1863, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'
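
The FileNotFoundError above is raised by subprocess.Popen, which suggests that the ffmpeg-python package is installed but the ffmpeg executable itself is not on PATH. A minimal sketch of a pre-flight check using only the standard library follows; the helper name find_ffmpeg is hypothetical and is not part of the AniPortrait scripts.

```python
import shutil


def find_ffmpeg(executable: str = "ffmpeg"):
    """Return the full path of the ffmpeg binary, or None if it is not on PATH."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found on PATH; install it before running pose2vid")
    else:
        print(f"ffmpeg found at {path}")
```

Running this before the 9-minute denoising loop avoids discovering the missing binary only at the final muxing step.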

Would you consider training an LCM-LoRA, or making a turbo model?

As the title says.

Is streaming output supported?

lips move too fast

I am trying to use audio2vid to generate audio-driven video, and I found that the lips move too fast, so the result does not look natural.
Is it possible to smooth the lip motion and make the video look more natural?
Thanks for your great work; the generated video otherwise looks good.

00000000_6_512x512_3_0841.mp4

No PositionNet

Hi, I checked and found that diffusers.models.embeddings in version 0.24.0 has no PositionNet inside. Can you suggest how to fix this error?

ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings'

vid2pose error

I installed and ran audio2vid successfully, but running vid2pose raises an error:

# python -m scripts.vid2pose --video_path 2fe655a671ecb61f93d54d6ad2729951.mp4
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1711714173.841578   36856 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
  0%|                                                                                         | 0/1065 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "E:\wsp\ML\AniPortrait\scripts\vid2pose.py", line 42, in <module>
    lmks = face_result['lmks'].astype(np.float32)
TypeError: 'NoneType' object is not subscriptable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\wsp\ML\AniPortrait\env\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "E:\wsp\ML\AniPortrait\env\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\wsp\ML\AniPortrait\scripts\vid2pose.py", line 46, in <module>
    pose_img = kps_results[-1]
IndexError: list index out of range
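
The two tracebacks above appear to be linked: face_result is None on the very first frame (the progress bar is still at 0/1065), and the fallback at line 46 then reads kps_results[-1] from a list that is still empty. A minimal sketch of a loop that skips frames without a detected face instead of indexing into None follows. The names collect_landmarks and fake_detect are hypothetical, and the detector is a stand-in; the real script's detector and data types are not reproduced here.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


def collect_landmarks(
    frames: Iterable[Any],
    detect: Callable[[Any], Optional[dict]],
) -> Tuple[List[Any], List[int]]:
    """Run detect on each frame; keep landmarks, record indices with no face."""
    kept: List[Any] = []
    skipped: List[int] = []
    for index, frame in enumerate(frames):
        result = detect(frame)
        # Guard before subscripting: the detector may return None.
        if result is None or "lmks" not in result:
            skipped.append(index)
            continue
        kept.append(result["lmks"])
    return kept, skipped


def fake_detect(frame: str) -> Optional[dict]:
    """Stand-in detector (assumption): finds a face only in frames named 'face'."""
    return {"lmks": [0.0, 0.0]} if frame == "face" else None


if __name__ == "__main__":
    kept, skipped = collect_landmarks(["blank", "face", "face"], fake_detect)
    print(f"{len(kept)} frames with landmarks, skipped frames: {skipped}")
```

Reporting the skipped indices makes it easy to see whether the input video simply starts without a visible face.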

no file named config.json and xFormers can't load C++/CUDA extensions

Download and configure the model strictly according to the instructions in the readme.
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Device and system information:

System: Windows 11
GPU: RTX 4090 24Gb

nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.76                 Driver Version: 551.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+

Error Message:

python -m scripts.pose2vid --config ./configs/prompts/animation.yaml -W 512 -H 512 -L 64
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.0.1+cu118 with CUDA 1108 (you have 2.2.2+cu121)
    Python  3.10.11 (you have 3.10.6)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
D:\Program Files\Python316\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
Traceback (most recent call last):
  File "D:\Program Files\Python316\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Program Files\Python316\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\code\open\AniPortrait\scripts\pose2vid.py", line 199, in <module>
    main()
  File "D:\code\open\AniPortrait\scripts\pose2vid.py", line 60, in main
    reference_unet = UNet2DConditionModel.from_pretrained(
  File "D:\Program Files\Python316\lib\site-packages\diffusers\models\modeling_utils.py", line 712, in from_pretrained
    config, unused_kwargs, commit_hash = cls.load_config(
  File "D:\Program Files\Python316\lib\site-packages\diffusers\configuration_utils.py", line 365, in load_config
    raise EnvironmentError(
OSError: Error no file named config.json found in directory ./pretrained_model/stable-diffusion-v1-5.

Question about the VFHQ dataset

Fantastic work! I have a question about the dataset. Which version of VFHQ dataset did you use, the raw training set w/o resize or the resized 512*512 version? Thanks.

no file named config.json

Someone has already mentioned this issue before:
#2

Unable to find ./pretrained_model/stable-diffusion-v1-5/config.json

The explanation given is:

While downloading the config files from Hugging Face, the folder name gets prepended to the config file names.
I renamed them and it worked.

However, there is no file similar to config.json in the root directory of the stable-diffusion-v1-5 project: there is neither xxx_config.json, nor config_xxx.json, nor xxx_config_xxx.json.

And there is only one file, model_index.json, in the stable-diffusion-v1-5 root directory, so is model_index.json the config.json in the AniPortrait project?
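
In the Hugging Face diffusers directory layout, model_index.json at the root describes the whole pipeline, while each component subfolder (for example unet or vae) carries its own config.json. The traceback in the earlier issue comes from UNet2DConditionModel.from_pretrained, so the missing file is probably a component config such as unet/config.json rather than a root-level one; that reading assumes the script loads the UNet from a subfolder, which the log does not show directly. A minimal sketch for listing which component configs are actually present follows; find_component_configs is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def find_component_configs(model_dir: str) -> list:
    """List config.json files one level below model_dir, as relative POSIX paths."""
    root = Path(model_dir)
    if not root.is_dir():
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.glob("*/config.json"))


if __name__ == "__main__":
    # Build a throwaway directory that imitates the assumed layout.
    with tempfile.TemporaryDirectory() as tmp:
        for component in ("unet", "vae"):
            sub = Path(tmp) / component
            sub.mkdir()
            (sub / "config.json").write_text("{}")
        (Path(tmp) / "model_index.json").write_text("{}")
        print(find_component_configs(tmp))
```

Pointing the function at ./pretrained_model/stable-diffusion-v1-5 shows at a glance whether the component folders were downloaded with their original file names.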

It does not seem to be running on the GPU?

Ubuntu22.04
4090 24G
python: 3.10.6
cuda: 11.7

The command I ran is: python -m scripts.pose2vid --config ./configs/prompts/animation.yaml -W 512 -H 512. It is very slow, and further down the output there is a line INFO: Created TensorFlow Lite XNNPACK delegate for CPU. What has gone wrong?

(aniPortrait) bob@echo:/data/svr/AniPortrait$ python -m scripts.pose2vid --config ./configs/prompts/animation.yaml -W 512 -H 512
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel: 
 ['conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
/home/bob/anaconda3/envs/aniPortrait/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1711713685.522386  303509 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1711713685.646712  303665 gl_context.cc:357] GL version: 3.2 (OpenGL ES 3.2 NVIDIA 535.54.03), renderer: NVIDIA GeForce RTX 4090/PCIe/SSE2
W0000 00:00:1711713685.647129  303509 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
I0000 00:00:1711713685.679296  303509 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1711713685.858304  303714 gl_context.cc:357] GL version: 3.2 (OpenGL ES 3.2 NVIDIA 535.54.03), renderer: NVIDIA GeForce RTX 4090/PCIe/SSE2
pose video has 1794 frames, with 30 fps
/data/svr/AniPortrait/src/pipelines/pipeline_pose2vid_long.py:408: FutureWarning: Accessing config attribute `in_channels` directly via 'UNet3DConditionModel' object attribute is deprecated. Please access 'in_channels' over 'UNet3DConditionModel's config object instead, e.g. 'unet.config.in_channels'.
  num_channels_latents = self.denoising_unet.in_channels
  4%|██▉                                                                      | 1/25 [01:36<38:32, 96.34s/it]
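
The XNNPACK line in the log above is printed next to face_landmarker_graph.cc, which suggests it comes from the MediaPipe face landmarker rather than from the diffusion model, so on its own it does not show that the denoising loop is running on the CPU. A minimal sketch for checking what PyTorch itself sees follows. The function name describe_torch_device is hypothetical, and the check says nothing about which device the AniPortrait scripts actually select.

```python
import importlib.util


def describe_torch_device() -> str:
    """Summarize whether PyTorch is installed and whether it can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the check also runs without PyTorch

    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "CUDA not available; PyTorch would run on CPU"


if __name__ == "__main__":
    print(describe_torch_device())
```

If CUDA is reported as available, watching nvidia-smi during the run is a further way to confirm that the GPU is in use.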

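video2video: does the source video need to be aligned with ref_image?

Given a reference video and a reference image, generate a new video.
1) This can be done as video2video. Are there any requirements here?
2) Or in two steps: first video2pose to get the pose sequence, then pose2video.

Whichever method is used, there is one question: can the reference video and the reference image be paired arbitrarily, or do they need some kind of alignment, for example the same vertical position?

I took a clip (.mp4) found at random on the internet and used the ref_image provided in the code, but the result was much worse.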

audio2vid: the picture does not move

python -m scripts.audio2vid --config ./configs/prompts/animation_audio.yaml -W 512 -H 512 -L 64

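No errors are reported, and the video of Li Yunlong (lyl) is output normally. However, only the first part of the video, less than 2 s, is animated; after that the picture freezes while the audio continues.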

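Input size requirements for the reference image

When generating a video from a reference image and a face landmark sequence, are there any requirements on the reference image size? If the image is too large, resizing it to 512*512 for inference distorts the face shape.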
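
Resizing a non-square image straight to 512*512 changes its aspect ratio, which by itself can stretch or squash a face. One generic workaround is to crop a centered square first and resize afterwards; a minimal sketch of the crop arithmetic follows. Whether the AniPortrait scripts expect pre-cropped input is not stated in the issue, and center_square_box is a hypothetical helper. The returned tuple uses the (left, upper, right, lower) order that Pillow's Image.crop accepts.

```python
from typing import Tuple


def center_square_box(width: int, height: int) -> Tuple[int, int, int, int]:
    """Return (left, upper, right, lower) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


if __name__ == "__main__":
    # A 1920x1080 frame keeps its full height and loses 420 px on each side.
    print(center_square_box(1920, 1080))
```

A centered crop only helps when the face sits near the middle of the image; an off-center portrait would need a crop placed around the detected face instead.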

Are there hardware requirements? It will not run with 24 GB of VRAM + 32 GB of RAM

As the title says.

The audio-driven video image is flickering

Thank you for your excellent work! The flickering in the demo video driven by audio is quite severe. Is there any solution to optimize it?

How much GPU RAM does it require?

I wonder if you have a rough estimate of required GPU memory. Trying to figure this out before running the code.
Thanks!

Does it work on CPU?

This is off topic, but I started using computers in 2000 and developed optimization methods. From 2013 to 2021 I dedicated myself to beta testing, and I have been retired for a while. CPU support is very important in projects, mainly because code can be optimized better on the CPU, where timings can be measured against the GPU. If it works well on CPU, it works much better on GPU, although the CPU takes more time to do the job. I don't know whether this model has CPU support. If it doesn't, I think it should be added. Nowadays, with artificial intelligence, it is very easy to program.

ffmpeg._run.Error: ffmpeg error: Output file does not contain any stream

python -m scripts.pose2vid --config ./configs/prompts/animation.yaml -W 512 -H 512 -L 64

The run failed:
ffmpeg error
Output file does not contain any stream

nternally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  hidden_states = F.scaled_dot_product_attention(
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [01:33<00:00,  3.75s/it]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:02<00:00, 30.31it/s] 
ffmpeg version 2023-07-19-git-efa6cec759-full_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 12.2.0 (Rev10, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable
-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynt
h --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libaribb24 --enable-libaribcaption --enable-libdav1d --enable-libdavs2 --enable-libuavs3d --enable-libzv
bi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libaom --enable-libjxl --enabl
e-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-liblens
fun --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3
d11va --enable-dxva2 --enable-libvpl --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --e
nable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libcodec2 -
-enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
  libavutil      58. 14.100 / 58. 14.100
  libavcodec     60. 22.100 / 60. 22.100
  libavformat    60. 10.100 / 60. 10.100
  libavdevice    60.  2.101 / 60.  2.101
  libavfilter     9.  8.102 /  9.  8.102
  libswscale      7.  3.100 /  7.  3.100
  libswresample   4. 11.100 /  4. 11.100
  libpostproc    57.  2.100 / 57.  2.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './configs/inference/pose_videos/solo_pose.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf60.16.100
  Duration: 00:00:59.80, start: 0.000000, bitrate: 890 kb/s
  Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 512x512, 887 kb/s, 30 fps, 30 tbr, 15360 tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
Output #0, adts, to 'audio_from_video.aac':
[out#0/adts @ 000001f345c96500] Output file does not contain any stream
Error opening output file audio_from_video.aac.
Error opening output files: Invalid argument
Traceback (most recent call last):
  File "D:\Program Files\Python316\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Program Files\Python316\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\code\open\AniPortrait\scripts\pose2vid.py", line 199, in <module>
    main()
  File "D:\code\open\AniPortrait\scripts\pose2vid.py", line 189, in main
    ffmpeg.input(pose_video_path).output(audio_output, acodec='copy').run()
  File "D:\code\open\AniPortrait\venv\lib\site-packages\ffmpeg\_run.py", line 325, in run
    raise Error('ffmpeg', out, err)
ffmpeg._run.Error: ffmpeg error (see stderr output for detail)
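
In the ffmpeg log above, the input ./configs/inference/pose_videos/solo_pose.mp4 lists only one stream, Stream #0:0 (Video), so copying audio into audio_from_video.aac leaves ffmpeg with nothing to write. A minimal sketch of a check that could run before the audio extraction follows. It operates on the dictionary that ffmpeg-python's ffmpeg.probe returns (ffprobe's JSON output with a "streams" list); has_audio_stream is a hypothetical helper, and the sample dictionary is hand-written to mirror the log rather than captured from a real probe.

```python
def has_audio_stream(probe_result: dict) -> bool:
    """Return True if an ffprobe-style result lists at least one audio stream."""
    streams = probe_result.get("streams", [])
    return any(stream.get("codec_type") == "audio" for stream in streams)


if __name__ == "__main__":
    # Hand-written stand-in for ffmpeg.probe(pose_video_path): video only.
    video_only = {"streams": [{"index": 0, "codec_type": "video", "codec_name": "h264"}]}
    if has_audio_stream(video_only):
        print("audio stream found; extraction can proceed")
    else:
        print("no audio stream; skip audio extraction for this pose video")
```

Skipping the extraction step when no audio stream exists would let the generated video be saved without sound instead of failing at the very end.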

Audio2video any hints on lip sync

I am using the audio2video script with a song, i.e. the audio contains both music and voice. The produced video doesn't have any lip movement. On the face mesh I can see small lip movements, but not on the image. Any hints on improving this?

Could it be because of default pose_temp?

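Error installing the decord library on Mac

What system is everyone using, Linux or Windows? I want to deploy and run this on a Mac, but I was blocked at the very first step, installing the dependencies. The notes say decord only works after downgrading to Python 3.8, but this project uses Python 3.10. 😭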

Face reenactment inference error

CUDA_LAUNCH_BLOCKING=1 python3 -m scripts.vid2vid --config ./configs/prompts/animation_facereenac.yaml -W 512 -H 512
/usr/local/lib/python3.10/dist-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
/usr/local/lib/python3.10/dist-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1711701676.709502 1816 task_runner.cc:85] GPU suport is not available: INTERNAL: ; RET_CHECK failure (mediapipe/gpu/gl_context_egl.cc:84) egl_initializedUnable to initialize EGL
W0000 00:00:1711701676.709832 1816 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
I0000 00:00:1711701676.725945 1816 task_runner.cc:85] GPU suport is not available: INTERNAL: ; RET_CHECK failure (mediapipe/gpu/gl_context_egl.cc:84) egl_initializedUnable to initialize EGL
source video has 1202 frames, with 25 fps
0%| | 0/25 [00:00<?, ?it/s]../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [96,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [97,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [98,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [99,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [100,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [101,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [102,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [103,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [104,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [105,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [106,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [107,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [108,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [109,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [110,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [111,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [112,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [113,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [114,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [115,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [116,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [117,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [118,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [119,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [120,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [121,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [122,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [123,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [124,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [125,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [126,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [132,0,0], thread: [127,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [32,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [33,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [34,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [35,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [36,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [37,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [38,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [39,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [40,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [41,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [42,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [43,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [44,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [45,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [46,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [47,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [48,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [49,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [50,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [51,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [52,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [53,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [54,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [55,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [56,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [57,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [58,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [59,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [60,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [61,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [62,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2676,0,0], thread: [63,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
0%| | 0/25 [00:27<?, ?it/s]
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/workspace/scripts/vid2vid.py", line 233, in
main()
File "/workspace/scripts/vid2vid.py", line 200, in main
video = pipe(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/workspace/src/pipelines/pipeline_pose2vid_long.py", line 532, in call
torch.cat([pose_cond_tensor[:, :, c] for c in context])
File "/workspace/src/pipelines/pipeline_pose2vid_long.py", line 532, in
torch.cat([pose_cond_tensor[:, :, c] for c in context])
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
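
The assertion above fires when an index used on the GPU falls outside the dimension being indexed, and the traceback points at `pose_cond_tensor[:, :, c]`, where `c` comes from `context`. A minimal sketch of a CPU-side check that would report the offending index in readable form before the tensor is indexed. `check_context_indices` and its arguments are hypothetical names for illustration, not part of the repository:

```python
from typing import Iterable, Sequence


def check_context_indices(context: Iterable[Sequence[int]], num_frames: int) -> None:
    """Raise IndexError if any frame index cannot address a dimension of size num_frames."""
    for window_id, window in enumerate(context):
        for idx in window:
            # Same condition as the CUDA assertion: -size <= index < size.
            if not (-num_frames <= idx < num_frames):
                raise IndexError(
                    f"context window {window_id} has index {idx}, "
                    f"but the pose tensor only has {num_frames} frames"
                )


# Example: two windows of frame indices checked against a 16-frame tensor.
check_context_indices([[0, 1, 2, 3], [12, 13, 14, 15]], num_frames=16)
```

Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` before running the script usually makes the Python traceback point at the call that actually failed.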

Inference result is noise

          >  solo_solo_pose1_512x512_3_1438_noaudio.mp4

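We reproduced the problem you reported. It is triggered when our pretrained weights are not loaded correctly, so the model parameters do not match our task. Please re-download our pretrained models and make sure they are in the correct path; see README.md for details. https://huggingface.co/ZJYang/AniPortrait/tree/main

Thank you for the reply. In the video you reproduced, which part of the model parameters was not loaded correctly? I followed the pretrained model directory layout in this repo. My directory is shown below and looks consistent with your README: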

Originally posted by @fredkingdom in #35 (comment)

Excellent, far better than the ugly Alibaba one

Program gets stuck when running inference scripts

Hi~

Environment: Python3.10, cuda11.8 and others follow requirements.txt

IDE: VSCode terminal, SSH to an Ubuntu server

Command: python -m scripts.vid2vid --config ./configs/prompts/animation_facereenac.yaml -W 512 -H 512 -L 64

It gets stuck at:

# AniPortrait/src/utils/face_landmark.py(line 3164)
return cls( # <--
    task_info.generate_graph_config(
        enable_flow_limiting=options.running_mode
        == _RunningMode.LIVE_STREAM
    ),
    options.running_mode,
    packets_callback if options.result_callback else None,
)

# anaconda3/envs/ani/lib/python3.10/site-packages/mediapipe/tasks/python/vision/core/base_vision_task_api.py  (line 70)
self._runner = _TaskRunner.create(graph_config, packet_callback) # <--

Can audio-driven mode generate a standalone video?

python -m scripts.audio2vid --config ./configs/prompts/animation_audio.yaml -W 512 -H 512

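After substituting my own image and audio, the output is three videos stitched together. I want to generate only the final video from the image and audio. Also, how can I generate a higher-resolution video, and which parameters control that? Thanks.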
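
One possible workaround is to crop a single panel out of the stitched output with ffmpeg's `crop` filter. The sketch below only computes the crop rectangle. It assumes the panels sit side by side with a 2-pixel border around each one, which is consistent with the 776x260 frame size that a log elsewhere on this page reports for a 256x256 run with three panels. It also assumes the generated result is the last panel. Both are assumptions, and `panel_crop` and `ffmpeg_crop_filter` are hypothetical helper names:

```python
from typing import Tuple


def panel_crop(frame_w: int, frame_h: int, panels: int, index: int, pad: int = 2) -> Tuple[int, int, int, int]:
    """Return (width, height, x, y) of one panel in a padded horizontal grid."""
    panel_w = (frame_w - pad * (panels + 1)) // panels
    panel_h = frame_h - 2 * pad
    x = pad + index * (panel_w + pad)
    return panel_w, panel_h, x, pad


def ffmpeg_crop_filter(frame_w: int, frame_h: int, panels: int, index: int, pad: int = 2) -> str:
    """Format the rectangle as an ffmpeg crop filter: crop=w:h:x:y."""
    w, h, x, y = panel_crop(frame_w, frame_h, panels, index, pad)
    return f"crop={w}:{h}:{x}:{y}"


# Third panel (index 2) of a 776x260 frame made of three panels.
print(ffmpeg_crop_filter(776, 260, panels=3, index=2))
```

The resulting string can be passed to ffmpeg with `-vf`, for example `ffmpeg -i stitched.mp4 -vf "<filter>" -c:a copy single.mp4`, where the file names are placeholders.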

OSError: Error no file named config.json found in directory ./pretrained_model/stable-diffusion-v1-5.

Hello,

I installed everything as described in the README.

I have not changed the YAML file. Running with the default config produces this error.
There is no config.json file here:
https://huggingface.co/runwayml/stable-diffusion-v1-5

(venv) C:\sd\AniPortrait>python -m scripts.pose2vid --config ./configs/prompts/animation.yaml -W 512 -H 512 -L 64
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\sd\AniPortrait\scripts\pose2vid.py", line 199, in <module>
    main()
  File "C:\sd\AniPortrait\scripts\pose2vid.py", line 60, in main
    reference_unet = UNet2DConditionModel.from_pretrained(
  File "C:\sd\AniPortrait\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 712, in from_pretrained
    config, unused_kwargs, commit_hash = cls.load_config(
  File "C:\sd\AniPortrait\venv\lib\site-packages\diffusers\configuration_utils.py", line 365, in load_config
    raise EnvironmentError(
OSError: Error no file named config.json found in directory ./pretrained_model/stable-diffusion-v1-5.
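
The error means diffusers did not find a `config.json` where it looked under `./pretrained_model/stable-diffusion-v1-5`. A minimal sketch that lists every `config.json` actually present under that directory, so the layout can be compared with the one in the README. `find_config_files` is a hypothetical helper, and which subfolder the script expects is not confirmed here:

```python
from pathlib import Path
from typing import List


def find_config_files(model_dir: str, config_name: str = "config.json") -> List[Path]:
    """List every config file found at or below model_dir (empty if the directory is missing)."""
    root = Path(model_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob(config_name))


found = find_config_files("./pretrained_model/stable-diffusion-v1-5")
if found:
    for path in found:
        print(path)
else:
    print("no config.json found; the directory is missing or incomplete")
```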

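Error when running inference!

Hello, after placing the files according to the pretrain_model directory layout given on GitHub, inference fails with an error. Do I need to download all files with git lfs rather than only the files listed in the directory? The error occurs when loading the model from the stable-diffusion-v1-5 directory. The error output is as follows: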
pretrained_base_model_path:./pretrained_model/stable-diffusion-v1-5
Traceback (most recent call last):
File "/root/miniconda3/envs/aniport/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 109, in load_state_dict
return torch.load(checkpoint_file, map_location="cpu")
File "/root/miniconda3/envs/aniport/lib/python3.10/site-packages/torch/serialization.py", line 815, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/root/miniconda3/envs/aniport/lib/python3.10/site-packages/torch/serialization.py", line 1033, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/miniconda3/envs/aniport/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/envs/aniport/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/root/autodl-tmp/AniPortrait/scripts/pose2vid.py", line 202, in
main()
File "/root/autodl-tmp/AniPortrait/scripts/pose2vid.py", line 62, in main
reference_unet = UNet2DConditionModel.from_pretrained(
File "/root/miniconda3/envs/aniport/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 800, in from_pretrained
state_dict = load_state_dict(model_file, variant=variant)
File "/root/miniconda3/envs/aniport/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 116, in load_state_dict
raise OSError(
OSError: You seem to have cloned a repository without having git-lfs installed. Please install git-lfs and run git lfs install followed by git lfs pull in the folder you cloned.
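
A Git LFS pointer is a small text file that starts with `version https://git-lfs.github.com/spec/v1`, which is consistent with the `invalid load key, 'v'` message above: the loader read a text stub instead of real weights. A minimal sketch that scans a model directory for such stubs. `find_lfs_pointers` is a hypothetical helper and the directory name is taken from the paths on this page:

```python
from pathlib import Path
from typing import List

LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if the file starts with the Git LFS pointer header."""
    with path.open("rb") as fh:
        return fh.read(len(LFS_PREFIX)) == LFS_PREFIX


def find_lfs_pointers(model_dir: str) -> List[Path]:
    """Return every file under model_dir that is still an LFS pointer stub."""
    root = Path(model_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file() and is_lfs_pointer(p))


for stub in find_lfs_pointers("./pretrained_model"):
    print(f"not downloaded yet: {stub}")
```

Any file this reports still needs to be fetched, for example with `git lfs pull` as the error message suggests.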

how to generate pose_temp.npy

Thanks for your great work.
I couldn't find where pose_temp.npy is generated. I want to use my own video to generate the pose.
Could you let me know how to generate pose_temp.npy?
Thanks a lot.

pip install onnxruntime-gpu fails

pip install onnxruntime-gpu

Changes
1.17.1
Release Notes : https://github.com/Microsoft/onnxruntime/releases/tag/v1.17.1
1.17.0
Release Notes : https://github.com/Microsoft/onnxruntime/releases/tag/v1.17.0
1.16.0
Release Notes : https://github.com/Microsoft/onnxruntime/releases/tag/v1.16.0
1.15.0
Release Notes : https://github.com/Microsoft/onnxruntime/releases/tag/v1.15.0

Installation failed for all of the versions above. This is on a Mac with an M3 Max.
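
The `onnxruntime-gpu` package targets NVIDIA CUDA, so a likely explanation is that no build of it exists for macOS. A minimal sketch of choosing the package name by platform, on the assumption that the plain `onnxruntime` package is an acceptable substitute on a Mac. Whether the rest of the pipeline runs without CUDA is not confirmed here, and `onnxruntime_package` is a hypothetical helper:

```python
import platform


def onnxruntime_package(system: str) -> str:
    """Pick a pip package name; assumes 'onnxruntime-gpu' is unavailable on macOS."""
    return "onnxruntime" if system == "Darwin" else "onnxruntime-gpu"


print(f"pip install {onnxruntime_package(platform.system())}")
```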

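Only the first few seconds are generated; the rest fails

The generated result only shows 4 s; nothing after that is displayed.

Below is the terminal output. What is causing this?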
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel: ['conv_norm_out.bias, conv_norm_out.weight, conv_out.bias, conv_out.weight'] .conda/envs/aniportrait/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() return self.fget.__get__(instance, owner)() WARNING: All log messages before absl::InitializeLog() is called are written to STDERR I0000 00:00:1711607888.991068 24604 task_runner.cc:85] GPU suport is not available: INTERNAL: ; RET_CHECK failure (mediapipe/gpu/gl_context_egl.cc:77) display != EGL_NO_DISPLAYeglGetDisplay() returned error 0x300c W0000 00:00:1711607888.991566 24604 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default. INFO: Created TensorFlow Lite XNNPACK delegate for CPU. I0000 00:00:1711607889.027265 24604 task_runner.cc:85] GPU suport is not available: INTERNAL: ; RET_CHECK failure (mediapipe/gpu/gl_context_egl.cc:77) display != EGL_NO_DISPLAYeglGetDisplay() returned error 0x300c pose video has 1794 frames, with 30 fps workspace/AniPortrait/src/pipelines/pipeline_pose2vid_long.py:408: FutureWarning: Accessing config attributein_channelsdirectly via 'UNet3DConditionModel' object attribute is deprecated. Please access 'in_channels' over 'UNet3DConditionModel's config object instead, e.g. 'unet.config.in_channels'. 
num_channels_latents = self.denoising_unet.in_channels 100%|████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:07<00:00, 3.47it/s] 100%|████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 62.69it/s] ffmpeg version n4.2.8-4-ga1b534b Copyright (c) 2000-2022 the FFmpeg developers built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~16.04) configuration: --disable-static --enable-shared --enable-libx264 --enable-gpl libavutil 56. 31.100 / 56. 31.100 libavcodec 58. 54.100 / 58. 54.100 libavformat 58. 29.100 / 58. 29.100 libavdevice 58. 8.100 / 58. 8.100 libavfilter 7. 57.100 / 7. 57.100 libswscale 5. 5.100 / 5. 5.100 libswresample 3. 5.100 / 3. 5.100 libpostproc 55. 5.100 / 55. 5.100 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './configs/inference/pose_videos/solo_pose.mp4': Metadata: major_brand : isom minor_version : 512 compatible_brands: isomiso2avc1mp41 encoder : Lavf58.20.100 Duration: 00:00:59.86, start: 0.000000, bitrate: 1024 kb/s Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 512x512, 887 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default) Metadata: handler_name : VideoHandler Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default) Metadata: handler_name : SoundHandler Output #0, adts, to 'audio_from_video.aac': Metadata: major_brand : isom minor_version : 512 compatible_brands: isomiso2avc1mp41 encoder : Lavf58.29.100 Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default) Metadata: handler_name : SoundHandler Stream mapping: Stream #0:1 -> #0:0 (copy) Press [q] to stop, [?] 
for help size= 960kB time=00:00:59.81 bitrate= 131.5kbits/s speed=5.48e+03x video:0kB audio:943kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.869552% ffmpeg version n4.2.8-4-ga1b534b Copyright (c) 2000-2022 the FFmpeg developers built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~16.04) configuration: --disable-static --enable-shared --enable-libx264 --enable-gpl libavutil 56. 31.100 / 56. 31.100 libavcodec 58. 54.100 / 58. 54.100 libavformat 58. 29.100 / 58. 29.100 libavdevice 58. 8.100 / 58. 8.100 libavfilter 7. 57.100 / 7. 57.100 libswscale 5. 5.100 / 5. 5.100 libswresample 3. 5.100 / 3. 5.100 libpostproc 55. 5.100 / 55. 5.100 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'output/20240328/1438--seed_42-256x256/solo_solo_pose_256x256_3_1438_noaudio.mp4': Metadata: major_brand : isom minor_version : 512 compatible_brands: isomiso2avc1mp41 encoder : Lavf60.3.100 Duration: 00:00:00.53, start: 0.000000, bitrate: 511 kb/s Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 776x260, 496 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default) Metadata: handler_name : VideoHandler [aac @ 0x17e6780] Estimating duration from bitrate, this may be inaccurate Input #1, aac, from 'audio_from_video.aac': Duration: 00:01:04.14, bitrate: 122 kb/s Stream #1:0: Audio: aac (LC), 44100 Hz, stereo, fltp, 122 kb/s Stream mapping: Stream #0:0 -> #0:0 (copy) Stream #1:0 -> #0:1 (aac (native) -> aac (native)) Press [q] to stop, [?] 
for help Output #0, mp4, to 'output/20240328/1438--seed_42-256x256/solo_solo_pose_256x256_3_1438.mp4': Metadata: major_brand : isom minor_version : 512 compatible_brands: isomiso2avc1mp41 encoder : Lavf58.29.100 Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 776x260, q=2-31, 496 kb/s, 30 fps, 30 tbr, 15360 tbn, 15360 tbc (default) Metadata: handler_name : VideoHandler Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s Metadata: encoder : Lavc58.54.100 aac frame= 16 fps= 15 q=-1.0 Lsize= 987kB time=00:00:59.86 bitrate= 135.1kbits/s speed=57.2x video:32kB audio:943kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.217777% [aac @ 0x18190c0] Qavg: 546.818

solo_solo_pose_512x512_3_1231.mp4
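
The log above reports 16 generated frames at 30 fps (a video stream of 0.53 s) being muxed with 59.86 s of audio, so the picture ends long before the sound does. A small sketch of the arithmetic involved. Treating the `-L` flag seen in other commands on this page as the frame count is an assumption, and the helper names are hypothetical:

```python
import math


def clip_seconds(num_frames: int, fps: float) -> float:
    """Duration of a clip with the given frame count and frame rate."""
    return num_frames / fps


def frames_needed(seconds: float, fps: float) -> int:
    """Smallest frame count that covers the given duration."""
    return math.ceil(seconds * fps)


print(round(clip_seconds(16, 30), 2))  # generated clip in the log: 16 frames at 30 fps
print(frames_needed(59.86, 30))        # frames needed to cover the 59.86 s source
```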

Support for LCM-LoRA + AnimateDiff-Lightning?

Thanks for the model and pipelines, superb work!

As the technique is based on SD1.5 + AnimateDiff, I was wondering if it would be possible to create a super fast version based on the LCM-LoRA for SD1.5 plus Animate Diff Lightning (https://huggingface.co/ByteDance/AnimateDiff-Lightning)

Why is there 鹿火?

11111

model loading problem when running pose2vid.py

The following error occurs when loading the pre-trained model:

Traceback (most recent call last):
File "/ssd-sata1/anaconda3/envs/Ani/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/ssd-sata1/anaconda3/envs/Ani/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/ssd-sata1/AniPortrait/scripts/pose2vid.py", line 199, in
main()
File "/ssd-sata1/AniPortrait/scripts/pose2vid.py", line 89, in main
torch.load(config.denoising_unet_path, map_location="cpu"),
File "/ssd-sata1/anaconda3/envs/Ani/lib/python3.9/site-packages/torch/serialization.py", line 809, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/ssd-sata1/anaconda3/envs/Ani/lib/python3.9/site-packages/torch/serialization.py", line 1172, in _load
result = unpickler.load()
File "/ssd-sata1/anaconda3/envs/Ani/lib/python3.9/site-packages/torch/serialization.py", line 1142, in persistent_load
typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
File "/ssd-sata1/anaconda3/envs/Ani/lib/python3.9/site-packages/torch/serialization.py", line 1112, in load_tensor
storage = zip_file.get_storage_from_record(name, numel, torch.UntypedStorage)._typed_storage()._untyped_storage
RuntimeError: PytorchStreamReader failed reading file data/2: invalid header or archive is corrupted
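
The traceback goes through the zip-based loader, and `invalid header or archive is corrupted` usually points to a truncated or damaged download. A minimal sketch that checks whether a checkpoint is a readable zip archive and computes its SHA-256 for comparison with the hash shown on the download page. The helper names are hypothetical, and a passing check does not prove the weights are the right ones:

```python
import hashlib
import zipfile


def checkpoint_looks_intact(path: str) -> bool:
    """True if the file is a zip archive whose members all pass their CRC checks."""
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the first corrupt member name, or None if all are fine.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the check fails or the hash differs from the published one, downloading the file again is the usual fix.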

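Thanks; EMO is now redundant

In the AI race, Alibaba has done nothing. It just casually takes other people's open-source work, improves it, and then releases it closed-source. It is a cancer on the open-source community: it contributes nothing and only disgusts people. Compared with Tencent, it is miles behind. I even wonder whether Alibaba was handed from the grandfather to the grandson; its whole way of doing things is that of a grandson.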

UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.

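The last step fails at runtime. I'm on Windows with an AMD graphics card and driver.
The error at this step says that no NVIDIA driver was found.
I looked it up, and AMD GPUs do not support CUDA.

Am I out of luck?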
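
A quick way to confirm what PyTorch can see on a given machine is to ask it directly. A minimal sketch that reports `cuda` only when PyTorch is installed and finds a usable CUDA device, and `cpu` otherwise. Whether the AniPortrait scripts can run on CPU is not confirmed here, and `pick_device` is a hypothetical helper:

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch is installed and sees a CUDA device, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```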

WARNING[XFORMERS]:

Following the given steps, the last step fails with:

WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.0.1+cu118 with CUDA 1108 (you have 2.2.2+cu121)
Python 3.10.11 (you have 3.10.14)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details

System: Windows
Solution I found: TheLastBen/fast-stable-diffusion#2615

After applying that solution it still does not run. Does anyone know why?
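
The xFormers warning states which PyTorch and Python versions the installed binary was built for and which ones are actually present. A minimal sketch that extracts those pairs from the warning text so the mismatch is easy to see. `parse_xformers_warning` is a hypothetical helper written against the message format quoted above:

```python
import re
from typing import Dict, Tuple

WARNING_TEXT = """\
PyTorch 2.0.1+cu118 with CUDA 1108 (you have 2.2.2+cu121)
Python 3.10.11 (you have 3.10.14)
"""

# One match per line: component name, version it was built for, version installed.
PATTERN = re.compile(r"^(PyTorch|Python)\s+(\S+).*?\(you have ([^)]+)\)", re.MULTILINE)


def parse_xformers_warning(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each component to (built_for, installed) as reported by the warning."""
    return {name: (built, installed) for name, built, installed in PATTERN.findall(text)}


for name, (built, installed) in parse_xformers_warning(WARNING_TEXT).items():
    print(f"{name}: built for {built}, installed {installed}")
```

Installing an xFormers build that matches the installed PyTorch, or a PyTorch that matches the xFormers build, is what the warning asks for. Note that the traceback below ends in a different error, the missing `config.json`.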

# python -m scripts.audio2vid --config ./configs/prompts/animation_audio.yaml -W 512 -H 512
Some weights of the model checkpoint at ./pretrained_model/wav2vec2-base-960h were not used when initializing Wav2Vec2Model: ['lm_head.bias', 'lm_head.weight']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at ./pretrained_model/wav2vec2-base-960h and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "E:\wsp\ML\AniPortrait\env\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "E:\wsp\ML\AniPortrait\env\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\wsp\ML\AniPortrait\scripts\audio2vid.py", line 225, in <module>
    main()
  File "E:\wsp\ML\AniPortrait\scripts\audio2vid.py", line 70, in main
    reference_unet = UNet2DConditionModel.from_pretrained(
  File "E:\wsp\ML\AniPortrait\env\lib\site-packages\diffusers\models\modeling_utils.py", line 712, in from_pretrained
    config, unused_kwargs, commit_hash = cls.load_config(
  File "E:\wsp\ML\AniPortrait\env\lib\site-packages\diffusers\configuration_utils.py", line 365, in load_config
    raise EnvironmentError(
OSError: Error no file named config.json found in directory ./pretrained_model/stable-diffusion-v1-5.