
opentalker / sadtalker


[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Home Page: https://sadtalker.github.io/

License: Other

Python 97.13% Shell 1.48% Jupyter Notebook 1.34% Batchfile 0.05%
audio-driven-talking-face cvpr2023 talking-head deep-fake deep-fakes image-animation talking-face talking-face-generation talking-heads

sadtalker's Introduction

Open In Colab | Hugging Face Spaces | sd webui-colab | Replicate | Discord

Wenxuan Zhang *,1,2, Xiaodong Cun *,2, Xuan Wang 3, Yong Zhang 2, Xi Shen 2, Yu Guo 1, Ying Shan 2, Fei Wang 1

1 Xi'an Jiaotong University   2 Tencent AI Lab   3 Ant Group  

CVPR 2023


TL;DR: single portrait image 🙎‍♂️ + audio 🎤 = talking head video 🎞.


Highlights

  • The license has been updated to Apache 2.0, and we've removed the non-commercial restriction

  • SadTalker has now officially been integrated into Discord, where you can use it for free by sending files. You can also generate high-quality videos from text prompts. Join: Discord

  • We've published a stable-diffusion-webui extension. Check out more details here. Demo Video

  • Full image mode is now available! More details...

Comparison: still + enhancer in v0.0.1 (still_e_n.mp4) vs. still + enhancer in v0.0.2 (full_body_2.bus_chinese_enhanced.mp4); input image by @bagbag1815.
  • Several new modes (Still, reference, and resize modes) are now available!

  • We're happy to see more community demos on bilibili, YouTube and X (#sadtalker).

Changelog

The previous changelog can be found here.

  • [2023.06.12]: Added more new features in WebUI extension, see the discussion here.

  • [2023.06.05]: Released a new 512x512px (beta) face model. Fixed some bugs and improved performance.

  • [2023.04.15]: Added a WebUI Colab notebook by @camenduru: sd webui-colab

  • [2023.04.12]: Added a more detailed WebUI installation document and fixed a problem when reinstalling.

  • [2023.04.12]: Fixed WebUI security issues caused by third-party packages, and optimized the output path in sd-webui-extension.

  • [2023.04.08]: In v0.0.2, we added a logo watermark to the generated video to prevent abuse. This watermark has since been removed in a later release.

  • [2023.04.08]: In v0.0.2, we added features for full image animation and a link to download checkpoints from Baidu. We also optimized the enhancer logic.

To-Do

We're tracking new updates in issue #280.

Troubleshooting

If you have any problems, please read our FAQs before opening an issue.

1. Installation.

Community tutorials: 中文Windows教程 (Chinese Windows tutorial) | 日本語コース (Japanese tutorial).

Linux/Unix

  1. Install Anaconda, Python and git.

  2. Create the environment and install the requirements.

git clone https://github.com/OpenTalker/SadTalker.git
cd SadTalker
conda create -n sadtalker python=3.8
conda activate sadtalker
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
conda install ffmpeg
pip install -r requirements.txt

### Coqui TTS is optional for gradio demo.
### pip install TTS
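
As a quick sanity check (our suggestion, not part of the official steps), you can verify that the CUDA build of PyTorch is the one that got installed:

import torch

# Expect something like "1.12.1+cu113 True" on a CUDA machine;
# "False" means the CPU-only wheel was picked up instead.
print(torch.__version__, torch.cuda.is_available())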

Windows

A video tutorial in Chinese is available here. You can also follow these instructions:

  1. Install Python 3.8 and check "Add Python to PATH".
  2. Install git manually, or with Scoop: scoop install git.
  3. Install ffmpeg, following this tutorial or with Scoop: scoop install ffmpeg.
  4. Download the SadTalker repository by running git clone https://github.com/Winfredy/SadTalker.git.
  5. Download the checkpoints and gfpgan models from the downloads section.
  6. Run start.bat from Windows Explorer as a normal (non-administrator) user; a Gradio-powered WebUI demo will start.

macOS

A tutorial on installing SadTalker on macOS can be found here.

Docker, WSL, etc

Please check out additional tutorials here.

2. Download Models

You can run the following script on Linux/macOS to automatically download all the models:

bash scripts/download_models.sh

We also provide an offline patch (gfpgan/), so no models will be downloaded at generation time.

Pre-Trained Models

GFPGAN Offline Patch

Model Details

What each model does:

New version
Model                                              Description
checkpoints/mapping_00229-model.pth.tar            Pre-trained MappingNet in SadTalker.
checkpoints/mapping_00109-model.pth.tar            Pre-trained MappingNet in SadTalker.
checkpoints/SadTalker_V0.0.2_256.safetensors       Packaged SadTalker checkpoints of the old version (256-resolution face render).
checkpoints/SadTalker_V0.0.2_512.safetensors       Packaged SadTalker checkpoints of the old version (512-resolution face render).
gfpgan/weights                                     Face detection and enhancement models used in facexlib and gfpgan.

Old version
Model                                              Description
checkpoints/auido2exp_00300-model.pth              Pre-trained ExpNet in SadTalker.
checkpoints/auido2pose_00140-model.pth             Pre-trained PoseVAE in SadTalker.
checkpoints/mapping_00229-model.pth.tar            Pre-trained MappingNet in SadTalker.
checkpoints/mapping_00109-model.pth.tar            Pre-trained MappingNet in SadTalker.
checkpoints/facevid2vid_00189-model.pth.tar        Pre-trained face-vid2vid model from the unofficial reproduction of face-vid2vid.
checkpoints/epoch_20.pth                           Pre-trained 3DMM extractor from Deep3DFaceReconstruction.
checkpoints/wav2lip.pth                            Highly accurate lip-sync model from Wav2lip.
checkpoints/shape_predictor_68_face_landmarks.dat  Face landmark model used in dlib.
checkpoints/BFM                                    3DMM library files.
checkpoints/hub                                    Face detection models used in face-alignment.
gfpgan/weights                                     Face detection and enhancement models used in facexlib and gfpgan.

The final folder layout will look like this:

[folder-structure screenshot in the original README]

3. Quick Start

Please read our documentation on best practices and configuration tips.

WebUI Demos

Online Demo: HuggingFace | SDWebUI-Colab | Colab

Local WebUI extension: Please refer to WebUI docs.

Local gradio demo (recommended): A Gradio instance similar to our Hugging Face demo can be run locally:

## You need to install TTS (https://github.com/coqui-ai/TTS) manually via `pip install TTS` beforehand.
python app_sadtalker.py

You can also start it more easily:

  • Windows: just double-click webui.bat; the requirements will be installed automatically.
  • Linux/macOS: run bash webui.sh to start the WebUI.

CLI usage

Animating a portrait image with the default config:
python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --enhancer gfpgan 

The results will be saved in results/$SOME_TIMESTAMP/*.mp4.

Full body/image Generation:

Use --still to generate a natural full-body video. You can add --enhancer to improve the quality of the generated video.

python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --result_dir <a folder to store results> \
                    --still \
                    --preprocess full \
                    --enhancer gfpgan 

More examples, configurations, and tips can be found in the >>> best practice documents <<<.

Citation

If you find our work useful in your research, please consider citing:

@article{zhang2022sadtalker,
  title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
  author={Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei},
  journal={arXiv preprint arXiv:2211.12194},
  year={2022}
}

Acknowledgements

Facerender code borrows heavily from zhanglonghao's reproduction of face-vid2vid and from PIRender. We thank the authors for sharing their wonderful code. In the training process, we also used models from Deep3DFaceReconstruction and Wav2lip. We thank them for their wonderful work.

We also use a number of third-party libraries; these, along with community extensions and related works, are linked from the repository.

Disclaimer

This is not an official product of Tencent.

1. Please carefully read and comply with the open-source license applicable to this code before using it. 
2. Please carefully read and comply with the intellectual property declaration applicable to this code before using it.
3. This open-source code runs completely offline and does not collect any personal information or other data. If you use this code to provide services to end-users and collect related data, please take necessary compliance measures according to applicable laws and regulations (such as publishing privacy policies, adopting necessary data security strategies, etc.). If the collected data involves personal information, user consent must be obtained (if applicable). Any legal liabilities arising from this are unrelated to Tencent.
4. Without Tencent's written permission, you are not authorized to use the names or logos legally owned by Tencent, such as "Tencent." Otherwise, you may be liable for legal responsibilities.
5. This open-source code does not have the ability to directly provide services to end-users. If you need to use this code for further model training or demos, as part of your product to provide services to end-users, or for similar use, please comply with applicable laws and regulations for your product or service. Any legal liabilities arising from this are unrelated to Tencent.
6. It is prohibited to use this open-source code for activities that harm the legitimate rights and interests of others (including but not limited to fraud, deception, infringement of others' portrait rights, reputation rights, etc.), or other behaviors that violate applicable laws and regulations or go against social ethics and good customs (including providing incorrect or false information, spreading pornographic, terrorist, and violent information, etc.). Otherwise, you may be liable for legal responsibilities.

LOGO: color and font suggestions by ChatGPT; logo font: Montserrat Alternates.

All copyrights of the demo images and audio belong to community users or were generated with Stable Diffusion. Feel free to contact us if you would like us to remove them.

sadtalker's People

Contributors

andchir, bhavybansal24, chenxwh, drv-agwl, eltociear, fakerybakery, johndpope, johnephillips, kelvinf97, monk-after-90s, ribasoka0, teamclouday, thegenerativegeneration, vantang, vinthony, winfredy, zqq-judy


sadtalker's Issues

About custom audio input

I want to generate speech from text by calling a domestic (Chinese) TTS service; for now I'm testing with Biaobei's. After feeding the generated audio in, I get the error below.
The error log is as follows:
Traceback (most recent call last):
  File "inference.py", line 133, in <module>
    main(args)
  File "inference.py", line 72, in main
    coeff_path = audio_to_coeff.generate(batch, save_dir, pose_style)
  File "E:\PycharmProjects\SadTalker_Git\src\test_audio2coeff.py", line 74, in generate
    results_dict_pose = self.audio2pose_model.test(batch)
  File "E:\PycharmProjects\SadTalker_Git\src\audio2pose_models\audio2pose.py", line 85, in test
    batch = self.netG.test(batch)
  File "E:\PycharmProjects\SadTalker_Git\src\audio2pose_models\cvae.py", line 49, in test
    return self.decoder(batch)
  File "E:\Anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\PycharmProjects\SadTalker_Git\src\audio2pose_models\cvae.py", line 139, in forward
    x_out = self.MLP(x_in)  # bs layer_sizes[-1]
  File "E:\Anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\Anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\container.py", line 139, in forward
    input = module(input)
  File "E:\Anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\Anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x220 and 262x128)
I went through the README carefully and found no notes on audio file requirements. Thanks a lot for your help!
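
The shape mismatch (1x220 vs. 262x128) looks like the same one reported for short audio clips later in this section. A workaround that is sometimes suggested, assuming the model expects a plain 16 kHz mono WAV (an assumption, not a documented requirement):

from pydub import AudioSegment

# Re-encode the TTS output as 16 kHz mono WAV before passing it to inference.py.
audio = AudioSegment.from_file("tts_output.wav")  # hypothetical input file
audio = audio.set_frame_rate(16000).set_channels(1)
audio.export("tts_output_16k.wav", format="wav")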

error with VideoCapture

I followed the instructions to install and generate a video, but the following error occurred. How can I resolve this issue? The video itself is output and complete.

./checkpoints\epoch_20.pth
./checkpoints\auido2pose_00140-model.pth
./checkpoints\auido2exp_00300-model.pth
./checkpoints\facevid2vid_00189-model.pth.tar
./checkpoints\mapping_00229-model.pth.tar
[ERROR:...] global cap_ffmpeg_impl.hpp:1223 open Could not find decoder for codec_id=61
[ERROR:...] global cap_ffmpeg_impl.hpp:1272 open VIDEOIO/FFMPEG: Failed to initialize VideoCapture

pydub.exceptions.CouldntDecodeError: Decoding failed. ffmpeg returned error code: 1

I generated a 2D face from a single image; the audio file size is 11499622 bytes.

Python 3.8.16

pip list:

Package            Version
------------------ ------------
audioread          3.0.0
blurhash           1.1.4
boost              0.1
certifi            2022.12.7
charset-normalizer 2.1.1
cmake              3.26.0
decorator          5.1.1
dlib               19.24.0
face-alignment     1.3.5
ffmpy              0.3.0
greenlet           2.0.2
idna               3.4
imageio            2.19.3
imageio-ffmpeg     0.4.7
joblib             1.1.0
kornia             0.6.8
librosa            0.6.0
llvmlite           0.31.0
Mastodon.py        1.8.0
networkx           3.0
numba              0.48.0
numpy              1.23.4
opencv-python      4.7.0.72
packaging          23.0
Pillow             9.3.0
pip                23.0.1
pydub              0.25.1
python-dateutil    2.8.2
python-magic       0.4.27
PyWavelets         1.4.1
PyYAML             6.0
requests           2.28.1
resampy            0.3.1
scikit-image       0.19.3
scikit-learn       1.1.3
scipy              1.5.3
setuptools         65.6.3
six                1.16.0
SQLAlchemy         2.0.6
threadpoolctl      3.1.0
tifffile           2023.3.15
torch              1.12.1+cu113
torchaudio         0.12.1+cu113
torchvision        0.13.1+cu113
tqdm               4.65.0
typing_extensions  4.4.0
urllib3            1.26.13
wheel              0.38.4
yacs               0.1.8

errors:

Traceback (most recent call last):
  File "inference.py", line 98, in <module>
    main(args)
  File "inference.py", line 75, in main
    animate_from_coeff.generate(data, save_dir)
  File "/home/ubuntu/SadTalker/facerender/animate.py", line 154, in generate
    sound = AudioSegment.from_mp3(audio_path)
  File "/home/ubuntu/anaconda3/envs/sadtalker/lib/python3.8/site-packages/pydub/audio_segment.py", line 796, in from_mp3
    return cls.from_file(file, 'mp3', parameters=parameters)
  File "/home/ubuntu/anaconda3/envs/sadtalker/lib/python3.8/site-packages/pydub/audio_segment.py", line 773, in from_file
    raise CouldntDecodeError(
pydub.exceptions.CouldntDecodeError: Decoding failed. ffmpeg returned error code: 1

Output from ffmpeg/avlib:

ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 7.3.0 (crosstool-NG 1.23.0.449-a04d0)
  configuration: --prefix=/home/ubuntu/anaconda3/envs/sadtalker --cc=/tmp/build/80754af9/ffmpeg_1587154242452/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --enable-avresample --enable-gmp --enable-hardcoded-tables --enable-libfreetype --enable-libvpx --enable-pthreads --enable-libopus --enable-postproc --enable-pic --enable-pthreads --enable-shared --enable-static --enable-version3 --enable-zlib --enable-libmp3lame --disable-nonfree --enable-gpl --enable-gnutls --disable-openssl --enable-libopenh264 --enable-libx264
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
[mp3float @ 0x55dce8ccfc80] Header missing
    Last message repeated 162 times
[mp3 @ 0x55dce8cc5fc0] decoding for stream 0 failed
[mp3 @ 0x55dce8cc5fc0] Could not find codec parameters for stream 0 (Audio: mp3 (mp3float), 0 channels, fltp): unspecified frame size
Consider increasing the value for the 'analyzeduration' and 'probesize' options
Input #0, mp3, from './face-zxc.wav':
  Duration: N/A, start: 0.000000, bitrate: N/A
    Stream #0:0: Audio: mp3, 0 channels, fltp
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s32le (native))

How to solve this problem?
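
Judging from the ffmpeg log above, face-zxc.wav appears to contain MP3 data behind a .wav extension, which would trip up both pydub and OpenCV's ffmpeg backend. A sketch that probes the real codec and re-exports a genuine PCM WAV (the fix is our assumption, not one confirmed in this thread):

from pydub import AudioSegment
from pydub.utils import mediainfo

# Probe the actual container/codec via ffprobe.
info = mediainfo("./face-zxc.wav")
print(info.get("codec_name"), info.get("sample_rate"))

# Re-export as a real PCM WAV so downstream decoders stop guessing.
AudioSegment.from_file("./face-zxc.wav").export("./face-zxc-pcm.wav", format="wav")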

Colab notebook running issue

There is a problem:

No module found: gfpgan.
How can I clone gfpgan in the notebook? Can you please explain?

Also: no such directory examples/results.
I have created the directory as needed, but it still shows the same error.

TypeError: zoom() got an unexpected keyword argument 'grid_mode'

python inference.py --driven_audio japanese.wav --source_image art_1.png --result_dir .
checkpoints/epoch_20.pth
checkpoints/auido2pose_00140-model.pth
checkpoints/auido2exp_00300-model.pth
checkpoints/facevid2vid_00189-model.pth.tar
checkpoints/mapping_00229-model.pth.tar
landmark Det:: 100%|██████████████████████████████| 1/1 [00:04<00:00,  4.67s/it]
 3DMM Extraction In Video:: 100%|█████████████████| 1/1 [00:00<00:00,  6.51it/s]
Traceback (most recent call last):
  File "inference.py", line 98, in <module>
    main(args)
  File "inference.py", line 73, in main
    data = get_facerender_data(coeff_path, crop_pic_path, first_coeff_path, audio_path, 
  File "/media/user/home/04_MachineLearning/10_talkingheads/SadTalker/generate_facerender_batch.py", line 20, in get_facerender_data
    source_image = transform.resize(source_image, (256, 256, 3))
  File "/media/user/home/04_MachineLearning/10_talkingheads/SadTalker/.venv/lib/python3.8/site-packages/skimage/transform/_warps.py", line 186, in resize
    out = ndi.zoom(image, zoom_factors, order=order, mode=ndi_mode,
TypeError: zoom() got an unexpected keyword argument 'grid_mode'
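
scipy.ndimage.zoom only accepts a grid_mode argument from SciPy 1.6.0 onward, and the traceback shows scikit-image's resize forwarding it, so this TypeError suggests an older SciPy in the environment. A quick check (upgrading SciPy is our suggested fix, not one confirmed in this thread):

import scipy

# skimage's transform.resize forwards grid_mode to scipy.ndimage.zoom,
# which only accepts it from SciPy >= 1.6.0.
print(scipy.__version__)  # if this is < 1.6, try: pip install "scipy>=1.6"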

Different style

Is it possible to adapt this to cartoon/anime faces, or is it only for human faces?

Can video input be supported?

It's great to see such an excellent domestic framework. Could video input be supported by tuning parameters, or is there a follow-up plan for it?

A possible bug

Symptom: while using the tool, I prefixed the image path with .\ to indicate that the image should be read from a folder under the current directory, but the file format was not detected correctly.

Cause: in the preprocessing file src\utils\preprocess.py, the code takes the second element of the list produced by splitting the path to determine the file format.
If the path contains . or .., this check goes wrong and the code falls into the video-reading branch.
For example, given the image path .\imgSrc\1.png, the logic at line 66 extracts the "extension" \imgSrc\1, which is none of the three supported formats.

Fix: would changing the split index from 1 to -1 be better? (A sketch follows below.)
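
A sketch of that fix using os.path.splitext, which sidesteps the indexing problem entirely (illustrative only; the real check lives in src\utils\preprocess.py):

import os

path = r".\imgSrc\1.png"
ext = os.path.splitext(path)[1].lower()  # '.png', even when the path contains '.' or '..'
print(ext in ('.jpg', '.jpeg', '.png'))  # True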

Keep getting this error

Traceback (most recent call last):
  File "inference.py", line 9, in <module>
    from generate_batch import get_data
  File "C:\Users\chlyw\Desktop\SadTalker\generate_batch.py", line 5, in <module>
    import librosa
  File "C:\Users\chlyw\.conda\envs\sadtalker\lib\site-packages\librosa\__init__.py", line 12, in <module>
    from . import core
  File "C:\Users\chlyw\.conda\envs\sadtalker\lib\site-packages\librosa\core\__init__.py", line 102, in <module>
    from .time_frequency import *  # pylint: disable=wildcard-import
  File "C:\Users\chlyw\.conda\envs\sadtalker\lib\site-packages\librosa\core\time_frequency.py", line 10, in <module>
    from ..util.exceptions import ParameterError
  File "C:\Users\chlyw\.conda\envs\sadtalker\lib\site-packages\librosa\util\__init__.py", line 67, in <module>
    from .utils import *  # pylint: disable=wildcard-import
  File "C:\Users\chlyw\.conda\envs\sadtalker\lib\site-packages\librosa\util\utils.py", line 111, in <module>
    def valid_audio(y, mono=True):
  File "C:\Users\chlyw\.conda\envs\sadtalker\lib\site-packages\librosa\cache.py", line 49, in wrapper
    if self.cachedir is not None and self.level >= level:
AttributeError: 'CacheManager' object has no attribute 'cachedir'
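
A commonly reported cause of this error is an old librosa (0.6.x, as seen in the pip list of an earlier issue) combined with a newer joblib, whose Memory class no longer exposes cachedir. A hedged check; the usual workarounds (pinning an older joblib or upgrading librosa) are community suggestions, not fixes confirmed in this thread:

import joblib

# librosa 0.6.x reads Memory.cachedir, which newer joblib releases renamed/removed.
print(joblib.__version__)

try:
    import librosa  # crashes with the CacheManager error when the versions mismatch
    print(librosa.__version__)
except AttributeError as e:
    print("librosa/joblib mismatch:", e)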

About training code

Great job! We want to train the network on our own datasets. Could you please tell us when you will release the training code for Audio2exp and Audio2Pose?

RuntimeError: Unable to open checkpoints/shape_predictor_68_face_landmarks.dat

Thanks for your great work!
Following the process, I have put all the checkpoint files in the checkpoints path, but got this error:

python3 inference.py --driven_audio assets/nhk.wav --source_image assets/guren3.png --result_dir assets
checkpoints/epoch_20.pth

Traceback (most recent call last):
  File "inference.py", line 98, in <module>
    main(args)
  File "inference.py", line 48, in main
    preprocess_model = CropAndExtract(path_of_lm_croper, path_of_net_recon_model, dir_of_BFM_fitting, device)
  File "/SadTalker/preprocess.py", line 45, in __init__
    self.croper = Croper(path_of_lm_croper)
  File "/SadTalker/croper.py", line 38, in __init__
    self.predictor = dlib.shape_predictor(path_of_lm)
RuntimeError: Unable to open checkpoints/shape_predictor_68_face_landmarks.dat
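
This error usually just means the file is missing or truncated. A quick check; the expected size is approximate and our assumption (the dlib model is often distributed as a .bz2 archive that must be decompressed first):

import os

p = "checkpoints/shape_predictor_68_face_landmarks.dat"
# The decompressed dlib model is roughly 100 MB; a missing or tiny file means the
# download failed or the .bz2 archive was never extracted.
print(os.path.exists(p), os.path.getsize(p) if os.path.exists(p) else 0)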

GPU usage Settings

Hi, great work!
My computer has a 16 GB GPU, but at most only 6 GB gets used. I want it to generate results faster; where is this value set? Looking forward to your reply. Thank you~
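
For what it's worth, inference.py exposes a --batch_size flag (it appears in other logs in this section, e.g. --batch_size 6); raising it should increase GPU utilization and throughput, though how memory scales with it is not documented here:

python inference.py --driven_audio <audio.wav> --source_image <picture.png> --batch_size 8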

Suggestion: consider adding CodeFormer

configure environment error

numpy 1.23.5 requires Python >= 3.8, but as you described, you install python=3.7. Did numpy actually install successfully for you?

About expression

As the name suggests, are SadTalker's expressions always sad? If we want the output with a joyful or any other expression instead, how can we implement that?

Also, does the library work on videos: given a video plus audio, will it redo the lip sync while keeping the video's original expressions and everything else the same?

Dataset question

Did the authors train the model on a Mandarin-speech dataset?

How can I increase the output resolution by tuning parameters?

How should I adjust the parameters to increase the resolution of the output? Besides using super-resolution, I tried modifying reshape_depth and reshape_channel under generator_params in facerender.yaml, but it raised an error. Is there any other way?

cannot unpack non-iterable NoneType object

This is an excellent repo. Thank you for sharing. I am having the following issue with some image files. I tried saving as PNG and JPG, and also saving from Photoshop without the ICC profile, with no success. What could be the reason?

examples/source_image/yetinew3.png
./checkpoints/epoch_20.pth
./checkpoints/auido2pose_00140-model.pth
./checkpoints/auido2exp_00300-model.pth
./checkpoints/facevid2vid_00189-model.pth.tar
./checkpoints/mapping_00229-model.pth.tar
libpng warning: iCCP: known incorrect sRGB profile
Traceback (most recent call last):
  File "inference.py", line 132, in <module>
    main(args)
  File "inference.py", line 64, in main
    first_coeff_path, crop_pic_path, original_size =  preprocess_model.generate(pic_path, first_frame_dir, args.preprocess)
  File "/content/SadTalker/src/utils/preprocess.py", line 85, in generate
    x_full_frames, crop, quad = self.croper.crop(x_full_frames, xsize=pic_size)
TypeError: cannot unpack non-iterable NoneType object
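
A guess at the cause (not confirmed in this thread): the cropper returns None when no face is detected in the source image, and unpacking that None raises exactly this TypeError. A quick dlib-based sanity check on the input:

import cv2
import dlib

img = cv2.imread("examples/source_image/yetinew3.png")
detector = dlib.get_frontal_face_detector()
faces = detector(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), 1)
# 0 detections here suggests SadTalker's cropper will also find nothing and return None.
print(len(faces))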

audio sampling rate

Hello, I want to ask about the audio sampling rate. In the examples you provided there are 16 kHz, 24 kHz, and 48 kHz files. What sampling rate should I convert my audio to? Also, could you tell me how to build the dataset?

Running --enhancer gfpgan on Windows 10 fails; what is the problem?

landmark Det:: 100%|█████████████████████████████████████████████████████████████████████| 1/1 [00:13<00:00, 13.25s/it]
3DMM Extraction In Video:: 100%|████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.19it/s]
Face Renderer:: 100%|██████████████████████████████████████████████████████████████████| 68/68 [00:14<00:00, 4.80it/s]
Traceback (most recent call last):
  File "inference.py", line 109, in <module>
    main(args)
  File "inference.py", line 78, in main
    animate_from_coeff.generate(data, save_dir, enhancer=args.enhancer)
  File "E:\ai\SadTalker\facerender\animate.py", line 149, in generate
    enhanced_images = face_enhancer(result, method=enhancer)
  File "E:\ai\SadTalker\utils\face_enhancer.py", line 32, in enhancer
    restorer = GFPGANer(
  File "C:\ProgramData\Anaconda3\envs\sadtalker\lib\site-packages\gfpgan\utils.py", line 79, in __init__
    self.face_helper = FaceRestoreHelper(
  File "C:\ProgramData\Anaconda3\envs\sadtalker\lib\site-packages\facexlib\utils\face_restoration_helper.py", line 103, in __init__
    self.face_parse = init_parsing_model(model_name='parsenet', device=self.device, model_rootpath=model_rootpath)
  File "C:\ProgramData\Anaconda3\envs\sadtalker\lib\site-packages\facexlib\parsing\__init__.py", line 20, in init_parsing_model
    load_net = torch.load(model_path, map_location=lambda storage, loc: storage)
  File "C:\ProgramData\Anaconda3\envs\sadtalker\lib\site-packages\torch\serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "C:\ProgramData\Anaconda3\envs\sadtalker\lib\site-packages\torch\serialization.py", line 938, in _legacy_load
    typed_storage._storage._set_from_file(
RuntimeError: unexpected EOF, expected 381073 more bytes. The file might be corrupted.
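
"unexpected EOF ... The file might be corrupted" points to a partially downloaded weight file. A hedged cleanup sketch; the exact filename under gfpgan/weights is our assumption, and facexlib should fetch it again on the next run:

import os

# Assumption: the truncated file is facexlib's parsing model; adjust the name if yours differs.
p = os.path.join("gfpgan", "weights", "parsing_parsenet.pth")
if os.path.exists(p):
    os.remove(p)  # it will be re-downloaded on the next --enhancer gfpgan run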

Ubuntu: "Building llvmlite requires LLVM 7.0+. Be sure to set LLVM_CONFIG to the right executable path."

[pip "Requirement already satisfied" output trimmed]
Building wheels for collected packages: llvmlite
Building wheel for llvmlite (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /home/oem/miniconda3/envs/ldm2/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-mi8t38g9/llvmlite_989da752628c422a8009813d14822a81/setup.py'"'"'; file='"'"'/tmp/pip-install-mi8t38g9/llvmlite_989da752628c422a8009813d14822a81/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-8kcoad95
cwd: /tmp/pip-install-mi8t38g9/llvmlite_989da752628c422a8009813d14822a81/
Complete output (16 lines):
running bdist_wheel
/home/oem/miniconda3/envs/ldm2/bin/python /tmp/pip-install-mi8t38g9/llvmlite_989da752628c422a8009813d14822a81/ffi/build.py
LLVM version... llvm-config: /home/oem/miniconda3/lib/libtinfo.so.6: no version information available (required by llvm-config)
14.0.0

Traceback (most recent call last):
  File "/tmp/pip-install-mi8t38g9/llvmlite_989da752628c422a8009813d14822a81/ffi/build.py", line 168, in <module>
    main()
  File "/tmp/pip-install-mi8t38g9/llvmlite_989da752628c422a8009813d14822a81/ffi/build.py", line 158, in main
    main_posix('linux', '.so')
  File "/tmp/pip-install-mi8t38g9/llvmlite_989da752628c422a8009813d14822a81/ffi/build.py", line 120, in main_posix
    raise RuntimeError(msg)
RuntimeError: Building llvmlite requires LLVM 7.0+ Be sure to set LLVM_CONFIG to the right executable path.
Read the documentation at http://llvmlite.pydata.org/ for more information about building llvmlite.

error: command '/home/oem/miniconda3/envs/ldm2/bin/python' failed with exit code 1

ERROR: Failed building wheel for llvmlite
Running setup.py clean for llvmlite
Failed to build llvmlite
Installing collected packages: llvmlite, scipy, numba, joblib, imageio, scikit-image, resampy, yacs, trimesh, librosa, kornia, imageio-ffmpeg, face-alignment, dlib-bin
Attempting uninstall: llvmlite
Found existing installation: llvmlite 0.39.1
Uninstalling llvmlite-0.39.1:
Successfully uninstalled llvmlite-0.39.1
Running setup.py install for llvmlite ... error
ERROR: Command errored out with exit status 1:
command: /home/oem/miniconda3/envs/ldm2/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-mi8t38g9/llvmlite_989da752628c422a8009813d14822a81/setup.py'"'"'; file='"'"'/tmp/pip-install-mi8t38g9/llvmlite_989da752628c422a8009813d14822a81/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-ochopfnf/install-record.txt --single-version-externally-managed --compile --install-headers /home/oem/miniconda3/envs/ldm2/include/python3.9/llvmlite
cwd: /tmp/pip-install-mi8t38g9/llvmlite_989da752628c422a8009813d14822a81/
Complete output (21 lines):
running install
/home/oem/miniconda3/envs/ldm2/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running build
got version from file /tmp/pip-install-mi8t38g9/llvmlite_989da752628c422a8009813d14822a81/llvmlite/_version.py {'version': '0.31.0', 'full': 'fe7d985f6421d87f613bd414479d29d912771562'}
running build_ext
/home/oem/miniconda3/envs/ldm2/bin/python /tmp/pip-install-mi8t38g9/llvmlite_989da752628c422a8009813d14822a81/ffi/build.py
LLVM version... llvm-config: /home/oem/miniconda3/lib/libtinfo.so.6: no version information available (required by llvm-config)
14.0.0

Traceback (most recent call last):
  File "/tmp/pip-install-mi8t38g9/llvmlite_989da752628c422a8009813d14822a81/ffi/build.py", line 168, in <module>
    main()
  File "/tmp/pip-install-mi8t38g9/llvmlite_989da752628c422a8009813d14822a81/ffi/build.py", line 158, in main
    main_posix('linux', '.so')
  File "/tmp/pip-install-mi8t38g9/llvmlite_989da752628c422a8009813d14822a81/ffi/build.py", line 120, in main_posix
    raise RuntimeError(msg)
RuntimeError: Building llvmlite requires LLVM 7.0+ Be sure to set LLVM_CONFIG to the right executable path.
Read the documentation at http://llvmlite.pydata.org/ for more information about building llvmlite.

error: command '/home/oem/miniconda3/envs/ldm2/bin/python' failed with exit code 1
----------------------------------------

Rolling back uninstall of llvmlite
Moving to /home/oem/miniconda3/envs/ldm2/lib/python3.9/site-packages/llvmlite
from /home/oem/miniconda3/envs/ldm2/lib/python3.9/site-packages/~lvmlite
Moving to /home/oem/miniconda3/envs/ldm2/lib/python3.9/site-packages/llvmlite-0.39.1-py3.9.egg-info
from /home/oem/miniconda3/envs/ldm2/lib/python3.9/site-packages/~lvmlite-0.39.1-py3.9.egg-info
ERROR: Command errored out with exit status 1: /home/oem/miniconda3/envs/ldm2/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-mi8t38g9/llvmlite_989da752628c422a8009813d14822a81/setup.py'"'"'; file='"'"'/tmp/pip-install-mi8t38g9/llvmlite_989da752628c422a8009813d14822a81/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-ochopfnf/install-record.txt --single-version-externally-managed --compile --install-headers /home/oem/miniconda3/envs/ldm2/include/python3.9/llvmlite Check the logs for full command output.

UPDATE

sudo apt-get install libllvm-14-ocaml-dev libllvm14 llvm-14 llvm-14-dev llvm-14-doc llvm-14-examples llvm-14-runtime

and I updated .zshrc:

export LD_LIBRARY_PATH=$HOME/.miniconda3/envs/ldm2/lib:$LD_LIBRARY_PATH
export LLVM_CONFIG=/usr/bin/llvm-config-14

but no joy.

Running setup.py clean for llvmlite
Failed to build llvmlite
Installing collected packages: llvmlite, scipy, numba, joblib, imageio, scikit-image, resampy, yacs, trimesh, librosa, kornia, imageio-ffmpeg, face-alignment, dlib-bin
Attempting uninstall: llvmlite
Found existing installation: llvmlite 0.39.1
Uninstalling llvmlite-0.39.1:
Successfully uninstalled llvmlite-0.39.1
Running setup.py install for llvmlite ... error
ERROR: Command errored out with exit status 1:
command: /home/oem/miniconda3/envs/ldm2/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-9jv2q5m2/llvmlite_ddde0f598ad9469b92939673cd0b4f4e/setup.py'"'"'; file='"'"'/tmp/pip-install-9jv2q5m2/llvmlite_ddde0f598ad9469b92939673cd0b4f4e/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-ubj4g4b4/install-record.txt --single-version-externally-managed --compile --install-headers /home/oem/miniconda3/envs/ldm2/include/python3.9/llvmlite
cwd: /tmp/pip-install-9jv2q5m2/llvmlite_ddde0f598ad9469b92939673cd0b4f4e/
Complete output (20 lines):
running install
/home/oem/miniconda3/envs/ldm2/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running build
got version from file /tmp/pip-install-9jv2q5m2/llvmlite_ddde0f598ad9469b92939673cd0b4f4e/llvmlite/_version.py {'version': '0.31.0', 'full': 'fe7d985f6421d87f613bd414479d29d912771562'}
running build_ext
/home/oem/miniconda3/envs/ldm2/bin/python /tmp/pip-install-9jv2q5m2/llvmlite_ddde0f598ad9469b92939673cd0b4f4e/ffi/build.py
LLVM version... 14.0.0

Traceback (most recent call last):
  File "/tmp/pip-install-9jv2q5m2/llvmlite_ddde0f598ad9469b92939673cd0b4f4e/ffi/build.py", line 168, in <module>
    main()
  File "/tmp/pip-install-9jv2q5m2/llvmlite_ddde0f598ad9469b92939673cd0b4f4e/ffi/build.py", line 158, in main
    main_posix('linux', '.so')
  File "/tmp/pip-install-9jv2q5m2/llvmlite_ddde0f598ad9469b92939673cd0b4f4e/ffi/build.py", line 120, in main_posix
    raise RuntimeError(msg)
RuntimeError: Building llvmlite requires LLVM 7.0+ Be sure to set LLVM_CONFIG to the right executable path.
Read the documentation at http://llvmlite.pydata.org/ for more information about building llvmlite.

error: command '/home/oem/miniconda3/envs/ldm2/bin/python' failed with exit code 1
----------------------------------------

Rolling back uninstall of llvmlite
Moving to /home/oem/miniconda3/envs/ldm2/lib/python3.9/site-packages/llvmlite
from /home/oem/miniconda3/envs/ldm2/lib/python3.9/site-packages/~lvmlite
Moving to /home/oem/miniconda3/envs/ldm2/lib/python3.9/site-packages/llvmlite-0.39.1-py3.9.egg-info
from /home/oem/miniconda3/envs/ldm2/lib/python3.9/site-packages/~lvmlite-0.39.1-py3.9.egg-info
ERROR: Command errored out with exit status 1: /home/oem/miniconda3/envs/ldm2/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-9jv2q5m2/llvmlite_ddde0f598ad9469b92939673cd0b4f4e/setup.py'"'"'; file='"'"'/tmp/pip-install-9jv2q5m2/llvmlite_ddde0f598ad9469b92939673cd0b4f4e/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-ubj4g4b4/install-record.txt --single-version-externally-managed --compile --install-headers /home/oem/miniconda3/envs/ldm2/include/python3.9/llvmlite Check the logs for full command output.

pip install llvmlite (unpinned) succeeds.
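
A plausible explanation, judging from the paths in the log: the env (ldm2) is Python 3.9, while the pinned llvmlite 0.31.0 predates Python 3.9 and ships no prebuilt wheel for it, so pip falls back to a source build that rejects LLVM 14. Recreating the environment with Python 3.8, as in the installation section above, should avoid the source build entirely:

conda create -n sadtalker python=3.8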

Docker GPU configuration problem

docker run -it --gpus all nvidia/cuda:11.4.0-base-ubuntu20.04 nvidia-smi
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
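
The libnvidia-ml.so.1 error usually means the NVIDIA Container Toolkit is not set up on the host (our reading; not confirmed in this thread). On Ubuntu, a common fix is:

sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker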

pytorch throws mat mismatch when using short audio as input

python inference.py --driven_audio ./speech.wav --source_image face.png --batch_size 6 --result_dir ./examples/results
checkpoints\epoch_20.pth
checkpoints\auido2pose_00140-model.pth
checkpoints\auido2exp_00300-model.pth
checkpoints\facevid2vid_00189-model.pth.tar
checkpoints\mapping_00229-model.pth.tar
landmark Det:: 100%|█████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00, 4.67s/it]
3DMM Extraction In Video:: 100%|████████████████████████████████████████████████████████| 1/1 [00:04<00:00, 4.86s/it]
Traceback (most recent call last):
  File "inference.py", line 99, in <module>
    main(args)
  File "inference.py", line 71, in main
    coeff_path = audio_to_coeff.generate(batch, save_dir, pose_style)
  File "D:\Workspace\projects\sadtalker\SadTalker\test_audio2coeff.py", line 75, in generate
    results_dict_pose = self.audio2pose_model.test(batch)
  File "D:\Workspace\projects\sadtalker\SadTalker\audio2pose_models\audio2pose.py", line 86, in test
    batch = self.netG.test(batch)
  File "D:\Workspace\projects\sadtalker\SadTalker\audio2pose_models\cvae.py", line 49, in test
    return self.decoder(batch)
  File "D:\Software\Anaconda\anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Workspace\projects\sadtalker\SadTalker\audio2pose_models\cvae.py", line 139, in forward
    x_out = self.MLP(x_in)  # bs layer_sizes[-1]
  File "D:\Software\Anaconda\anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Software\Anaconda\anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\container.py", line 139, in forward
    input = module(input)
  File "D:\Software\Anaconda\anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Software\Anaconda\anaconda3\envs\sadtalker\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x196 and 262x128)

Below is my audio file:
speech.zip
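
Since the error only appears with short audio, padding the clip with silence before inference is a workaround sometimes suggested (the minimum length below is a guessed value, not one confirmed by the authors):

from pydub import AudioSegment

audio = AudioSegment.from_wav("speech.wav")
min_len_ms = 2000  # assumption: pad to at least ~2 seconds
if len(audio) < min_len_ms:
    audio += AudioSegment.silent(duration=min_len_ms - len(audio))
audio.export("speech_padded.wav", format="wav")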

About Pose

Thanks for your great work! I found that the generated head keeps moving around even when I set every frame's pose to be the same as the first frame's pose. Is it possible to remove these head movements?

Eye motion issue

Hi, thanks for sharing such great work.

I've noticed that the eyes in the generated videos usually cannot close completely, which looks slightly unnatural. Is there a way to improve this? Thanks!

Exception while decoding audio when rendering with the CPU version

Full stack trace:
Traceback (most recent call last):
  File "E:\learn\sadtalker\inference.py", line 141, in <module>
    main(args)
  File "E:\learn\sadtalker\inference.py", line 53, in main
    audio_to_coeff = Audio2Coeff(audio2pose_checkpoint, audio2pose_yaml_path,
  File "E:\learn\sadtalker\src\test_audio2coeff.py", line 35, in __init__
    self.audio2pose_model = Audio2Pose(cfg_pose, wav2lip_checkpoint, device=device)
  File "E:\learn\sadtalker\src\audio2pose_models\audio2pose.py", line 15, in __init__
    self.audio_encoder = AudioEncoder(wav2lip_checkpoint, device=device)
  File "E:\learn\sadtalker\src\audio2pose_models\audio_encoder.py", line 45, in __init__
    wav2lip_state_dict = torch.load(wav2lip_checkpoint)['state_dict']
  File "E:\venv\sadtalker\lib\site-packages\torch\serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "E:\venv\sadtalker\lib\site-packages\torch\serialization.py", line 930, in _legacy_load
    result = unpickler.load()
  File "E:\venv\sadtalker\lib\site-packages\torch\serialization.py", line 876, in persistent_load
    wrap_storage=restore_location(obj, location),
  File "E:\venv\sadtalker\lib\site-packages\torch\serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "E:\venv\sadtalker\lib\site-packages\torch\serialization.py", line 152, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "E:\venv\sadtalker\lib\site-packages\torch\serialization.py", line 136, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

Fix:
1. Add a device argument to the AudioEncoder constructor and pass it in from the caller.
2. Change torch.load(wav2lip_checkpoint)['state_dict'] to wav2lip_state_dict = torch.load(wav2lip_checkpoint, map_location=torch.device(device))['state_dict'].

As an aside, ending a # comment with a trailing \ prevents PyCharm's debugger from stepping to the next line.

I don't know where to put "w/ still mode"

Thanks! Great product, I am running on RTX 3090.

I noticed that you have added a version called "w/ still mode", which supposedly suppresses facial movements, and I want to try it immediately, but I couldn't find the corresponding command in the README.

I don't know whether "w/ still mode" relates to --expression_scale or to the --enhancer section, and there is no explanation of how to enable it.
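
For reference, the CLI section above shows that still mode is simply the --still flag on inference.py, independent of --expression_scale and --enhancer:

python inference.py --driven_audio <audio.wav> --source_image <picture.png> --still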

real time talking head

Hi, great work! Do you think this can work in real-time interactive scenarios, such as live, interactive conversations?

'CacheManager' object has no attribute 'cachedir'

Hitting this error when running inference:

(.venv) (base) ➜  SadTalker git:(main) ✗ python inference.py --driver_audio japanese.wav --source_image art_1.png --result_dir .
Traceback (most recent call last):
  File "inference.py", line 9, in <module>
    from generate_batch import get_data
  File "/media/user/home/04_MachineLearning/10_talkingheads/SadTalker/generate_batch.py", line 5, in <module>
    import librosa    
  File "/media/user/home/04_MachineLearning/10_talkingheads/SadTalker/.venv/lib/python3.8/site-packages/librosa/__init__.py", line 12, in <module>
    from . import core
  File "/media/user/home/04_MachineLearning/10_talkingheads/SadTalker/.venv/lib/python3.8/site-packages/librosa/core/__init__.py", line 102, in <module>
    from .time_frequency import *  # pylint: disable=wildcard-import
  File "/media/user/home/04_MachineLearning/10_talkingheads/SadTalker/.venv/lib/python3.8/site-packages/librosa/core/time_frequency.py", line 10, in <module>
    from ..util.exceptions import ParameterError
  File "/media/user/home/04_MachineLearning/10_talkingheads/SadTalker/.venv/lib/python3.8/site-packages/librosa/util/__init__.py", line 67, in <module>
    from .utils import *  # pylint: disable=wildcard-import
  File "/media/user/home/04_MachineLearning/10_talkingheads/SadTalker/.venv/lib/python3.8/site-packages/librosa/util/utils.py", line 111, in <module>
    def valid_audio(y, mono=True):
  File "/media/user/home/04_MachineLearning/10_talkingheads/SadTalker/.venv/lib/python3.8/site-packages/librosa/cache.py", line 49, in wrapper
    if self.cachedir is not None and self.level >= level:
AttributeError: 'CacheManager' object has no attribute 'cachedir'
