inferencer / lipsick
🤢 LipSick: Fast, High Quality, Low Resource Lipsync Tool 🤮
License: The Unlicense
Thank you always for your amazing work! :)
I have a few questions, so I wanted to talk to you about them.
Hey there,
great project! I just managed to install it on my GCP instance. I ran inference on my first video and wonder if it's possible to train on my own specific video (~4 minutes long) to get more realistic results for that specific video/setting, like in GeneFacePlusPlus?
Thanks in advance!
I've tried it multiple times with different videos at 25 fps and different resolutions, but it never works. It either gives me the error raise ValueError("No faces found in the frame.") or an Errno 13 Permission denied, even though I have full permissions on the folders and have even changed them. I've installed all the requirements, so what am I doing wrong?
Would you please add complete training code, scripts, and guidelines?
Looking for feedback from users
I'm running Windows 10 with CUDA 11.6, using 50% of my dedicated 4 GB VRAM, and it's running at about 4 fps.
Another user with twice the compute power was maxing out VRAM but running slower with the same setup.
Is everyone else doing OK?
Hi, thanks for your great work.
The degree of mouth opening has become smaller compared to the original video. Maybe consider using few-shot instead of zero-shot? Will the training code be open-sourced? Could a few epochs of training be used to fit more corner cases?
Thanks for working on this potentially amazing project. I tried the tool through Google Colab and got the error in the title. Are there any particular requirements for the video resolution, quality, etc.?
After the original frame is passed through the model the skin tone appears to be a little darker.
I tried cloning the Hugging Face Space and calling it with the Python Gradio client, but I always get a 403 Forbidden error for the URL:
https://[username]-lipsick.hf.space/file=/home/user/app/asserts/inference_result/assets2FmanTalking_LIPSICK.mp4
This is my code. I'm using a write key. When I swap the model out for other models, it can run successfully, but I just can't get this model to work. Anyone else facing the same issue?
import os

from dotenv import load_dotenv
from gradio_client import Client, file

load_dotenv()  # load HF_TOKEN from the .env file
HF_TOKEN = os.getenv("HF_TOKEN")
client = Client("[username]/LipSick", hf_token=HF_TOKEN)
result = client.predict(
    source_video=file(source_video_path),
    driving_audio=file(source_audio_path),
    api_name="/predict",
)
I was wondering if it would be possible to remove the beard before sending the frames to the model, then paste it back onto the output video.
Hi @Inferencer
Hello @Inferencer.
This is a fantastic project - it's fast, and the lip movement seems to be accurate to the audio.
Right now, I'm experiencing some color mismatching in the frame box. I'm guessing it's using a rectangular landmark crop around the mouth for each frame, or something similar? Is there a way to programmatically mask just the frame-by-frame modified part and edit only that part of the source video (e.g. mouth + chin + lower face)?
I'm guessing the color mismatching could have something to do with the current model not having seen enough training data to generalize? Interested in hearing any thoughts. I know that for LipSync3D they built a lighting-normalization mechanism to make it work in any lighting, with less dependency on the model's ability to learn to render the frame under that lighting (https://www.youtube.com/watch?v=L1StbX9OznY). Perhaps this is something useful we could leverage?
I may have compute available to help improve some of the models in the near future; where would it be most useful?
I just pulled from main and started the gradio server, then uploaded video/audio and pressed the button. Here's an example I've produced:
https://github.com/Inferencer/LipSick/assets/17946756/1b721adb-6459-4448-a086-dfedeb5aed84
Just thought I would share an example of the upcoming, much-needed face mask / box removal feature. Bear in mind that custom frames are not used in this video, so the skin quality is a bit off. It runs super fast, but unfortunately achieving this effect requires tracking the lipsynced face, which is going to slow things down; we can't reuse the tracking data from the original video because chins move while speaking. Right now I'm just adding features, but I will make everything more suitable for people offering this tool as SaaS, so data is reused, etc. I expect this will be implemented next week, and I will close this issue when it is.
If anybody can figure out why the Colab is not working, I would appreciate it. I recently updated the Colab files but ran into some errors with GPU usage. I have tried a couple of different TensorFlow versions, but I fear the issue could be related to an upgrade to CUDA 12.2 with no fallback support, so it's attempting to use an AMD GPU rather than the NVIDIA GPU on the T4. That reasoning is based on this post and the error logs: googlecolab/colabtools#4214
I am trying to get the most realistic lip-syncing; I have tried different models, but I feel this is achievable with DINet.
I have integrated GFP-GAN and GPEN into the code so that it enhances the full frames after lip-syncing (I will share that code soon), and I am sharing the output.
The other output is from a paid third-party service. Any more suggestions on how to get better lipsync?
Also, would it be possible to connect via Telegram, email, or another platform, @Inferencer? I have been working on this for a while and have some ideas I would like to discuss with you.
extracting frames from video: /tmp/gradio/2026283a8dead122868293158b9fa514f5073269/SaveIG.mp4
Traceback (most recent call last):
  File "/home/user/app/inference.py", line 91, in <module>
    ds_feature = DSModel.compute_audio_feature(opt.driving_audio_path)
  File "/home/user/app/utils/deep_speech.py", line 43, in compute_audio_feature
    audio_sample_rate, audio = wavfile.read(audio_path)
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/scipy/io/wavfile.py", line 650, in read
    file_size, is_big_endian = _read_riff_chunk(fid)
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/scipy/io/wavfile.py", line 521, in _read_riff_chunk
    raise ValueError(f"File format {repr(str1)} not understood. Only "
ValueError: File format b'\x00\x00\x00\x1c' not understood. Only 'RIFF' and 'RIFX' supported.
An error occurred: Command '['python', 'inference.py', '--source_video_path', '/tmp/gradio/2026283a8dead122868293158b9fa514f5073269/SaveIG.mp4', '--driving_audio_path', '/tmp/gradio/4a324a4a5cdf98a616bdb6c5b89d119ffc82e93d/New Recording 14.wav', '--pretrained_lipsick_path', './asserts/pretrained_lipsick.pth', '--deepspeech_model_path', './asserts/output_graph.pb', '--res_video_dir', './asserts/inference_result']' returned non-zero exit status 1.
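The `b'\x00\x00\x00\x1c'` header bytes suggest the uploaded file is not actually a RIFF WAV (it looks like an MP4/M4A container renamed to `.wav`), and `scipy.io.wavfile` only understands RIFF/RIFX containers. A hedged sketch of a pre-check that re-encodes anything else with ffmpeg; the 16 kHz mono parameters are an assumption about what the DeepSpeech feature extractor expects:

```python
# Hedged sketch: scipy.io.wavfile raises exactly this ValueError when the
# file's magic bytes are not b'RIFF'/b'RIFX'. Check the header and re-encode
# with ffmpeg (must be on PATH) before handing the file to inference.py.
import subprocess
from pathlib import Path

def ensure_riff_wav(audio_path: str) -> str:
    """Return a path to a genuine RIFF WAV, converting via ffmpeg if needed."""
    with open(audio_path, "rb") as f:
        magic = f.read(4)
    if magic in (b"RIFF", b"RIFX"):
        return audio_path  # already a WAV that scipy can read
    fixed = str(Path(audio_path).with_suffix("")) + "_fixed.wav"
    # -ar 16000 -ac 1: assumed sample rate/channels for DeepSpeech-style features
    subprocess.run(
        ["ffmpeg", "-y", "-i", audio_path, "-ar", "16000", "-ac", "1", fixed],
        check=True,
    )
    return fixed
```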
The lip-syncing effect is still acceptable, but the result of applying the processed frames back onto the original face is not very good. Is it possible to optimize this further?
Does it support HuBERT? How are the results for Chinese?
Hi there,
I'm running into an issue when processing my video. Everything starts fine and it begins to process, until I am hit with an "access denied" (Errno 13). I'm running Visual Studio Code as administrator, as well as Python as administrator.
I am using a .WAV file for the audio, and mp4 for the video.
I have also attached a screenshot of my folder structure which I believe to be ok. Any idea? Thank you so much.
(base) PS C:\Users\madis\LipSick> conda activate LipSick
(LipSick) PS C:\Users\madis\LipSick> python app.py
Running on local URL: http://127.0.0.1:7860
To create a public link, set share=True in launch().
Extracting frames from video
Hang tight! We're processing your audio, which might take a little while depending on its length.
Tracking Face
Traceback (most recent call last):
  File "C:\Users\madis\LipSick\inference.py", line 97, in <module>
    video_landmark_data = np.array([load_landmark_dlib(frame) for frame in video_frame_path_list])
  File "C:\Users\madis\LipSick\inference.py", line 97, in <listcomp>
    video_landmark_data = np.array([load_landmark_dlib(frame) for frame in video_frame_path_list])
  File "C:\Users\madis\LipSick\inference.py", line 57, in load_landmark_dlib
    raise ValueError("No faces found in the image.")
ValueError: No faces found in the image.
An error occurred: Command '['python', 'inference.py', '--source_video_path', 'C:\Users\madis\AppData\Local\Temp\gradio\7d6c826546791df8eb196725e6e44b5a3ba2b80e\UPLOADMP4.mp4', '--driving_audio_path', 'C:\Users\madis\AppData\Local\Temp\gradio\c33c344af7ad688d606d5f6b1c58cb495cf9795a\WAV.wav', '--pretrained_lipsick_path', './asserts/pretrained_lipsick.pth', '--deepspeech_model_path', './asserts/output_graph.pb', '--res_video_dir', './asserts/inference_result', '--custom_reference_frames', '5,5,5,5,5', '--custom_crop_radius', '0', '--auto_mask']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio\queueing.py", line 527, in process_events
    response = await route_utils.call_process_api(
  File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio\route_utils.py", line 261, in call_process_api
    output = await app.get_blocks().process_api(
  File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio\blocks.py", line 1795, in process_api
    data = await self.postprocess_data(fn_index, result["prediction"], state)
  File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio\blocks.py", line 1625, in postprocess_data
    outputs_cached = await processing_utils.async_move_files_to_cache(
  File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio\processing_utils.py", line 410, in async_move_files_to_cache
    return await client_utils.async_traverse(
  File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio_client\utils.py", line 1006, in async_traverse
    new_obj[key] = await async_traverse(value, func, is_root)
  File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio_client\utils.py", line 1002, in async_traverse
    return await func(json_obj)
  File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio\processing_utils.py", line 384, in _move_to_cache
    temp_file_path = await block.async_move_resource_to_block_cache(
  File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio\blocks.py", line 275, in async_move_resource_to_block_cache
    temp_file_path = processing_utils.save_file_to_cache(
  File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio\processing_utils.py", line 188, in save_file_to_cache
    temp_dir = hash_file(file_path)
  File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio\processing_utils.py", line 120, in hash_file
    with open(file_path, "rb") as f:
PermissionError: [Errno 13] Permission denied: 'C:\Users\madis\LipSick'
MediaPipe is very fast, you could try it; the frame_landmark indices just need adjusting.
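If MediaPipe were swapped in, its Face Mesh returns 468 landmarks while the dlib loader returns 68, so the indices would indeed need remapping before the rest of the pipeline could use them. A minimal sketch; the index pairs below are commonly cited mappings and should be verified against your MediaPipe version:

```python
# Hedged sketch: remap a few MediaPipe Face Mesh indices (out of 468) to
# dlib's 68-point scheme. These pairs are commonly cited but unverified here.
MP_TO_DLIB = {
    61: 48,   # left mouth corner
    291: 54,  # right mouth corner
    152: 8,   # chin tip
}

def remap_landmarks(mp_points):
    """Pick a dlib-indexed subset from a 468-point MediaPipe landmark list.

    mp_points: sequence of 468 (x, y) tuples. Returns {dlib_index: (x, y)}.
    """
    return {dlib_i: mp_points[mp_i] for mp_i, dlib_i in MP_TO_DLIB.items()}
```

The full pipeline would need the complete 68-entry table, but the shape of the conversion is the same.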
Really love the work you are doing with this repo. This is by far the best OS tool for lip sync.
When I run this on my Hugging Face Space, this is what the error in the logs show:
2024-06-14 16:57:38.538093: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-14 16:57:38.682309: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0
2024-06-14 16:57:38.689257: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cv2/../../lib64:
2024-06-14 16:57:38.689283: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-06-14 16:57:38.722847: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-14 16:57:39.492782: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cv2/../../lib64:
2024-06-14 16:57:39.492930: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cv2/../../lib64:
2024-06-14 16:57:39.492947: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
extracting frames from video: /tmp/gradio/e0f0f22fd57f2316890eab0ee07fcdf2fa100e1f/whistleblower.mp4
Traceback (most recent call last):
  File "/home/user/app/inference.py", line 88, in <module>
    video_size = extract_frames_from_video(opt.source_video_path, video_frame_dir)
  File "/home/user/app/inference.py", line 51, in extract_frames_from_video
    cv2.imwrite(result_path, frame)
cv2.error: OpenCV(4.10.0) /io/opencv/modules/imgcodecs/src/loadsave.cpp:798: error: (-215:Assertion failed) !_img.empty() in function 'imwrite'
An error occurred: Command '['python', 'inference.py', '--source_video_path', '/tmp/gradio/e0f0f22fd57f2316890eab0ee07fcdf2fa100e1f/whistleblower.mp4', '--driving_audio_path', '/tmp/gradio/7c956915b2484e2b1309eedd152dbda3a56faab1/elevenlabs.wav', '--pretrained_lipsick_path', './asserts/pretrained_lipsick.pth', '--deepspeech_model_path', './asserts/output_graph.pb', '--res_video_dir', './asserts/inference_result']' returned non-zero exit status 1.