
LipSick's Introduction

LipSickV2

[Secondary GIF]

LipSick's People

Contributors

gnome101, inferencer


LipSick's Issues

Could you check if I'm implementing it correctly?

Thank you always for your amazing work! :)

I have a few questions, so I wanted to talk to you about them.

  1. The mouth seems to have a low resolution. I wonder if this is something I need to train to improve.
  2. When small and large faces appear alternately in the video, it looks like the video is glitching. I wonder if I'm doing something wrong here.
  3. I have attached a video, and in this particular video, it can't seem to detect the face. Could you check it out for me?
    video_and_audio.zip

Train with own Video

Hey there,

great project! I just managed to install it on my GCP instance. I ran inference on my first video and wonder if it's possible to train with my own specific video (~4 minutes long) to get more realistic results for that specific video/setting, like in GeneFacePlusPlus?

Thanks in advance!

LipSick has never worked for me, not even a single time

I've tried it multiple times with different videos at 25 fps and different resolutions; it never works and gives me either the error raise ValueError("No faces found in the frame.") or an Errno 13 Permission denied error. I have full permissions on the folders and have even changed them. I've installed all the requirements, so what am I doing wrong?
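
A quick way to test whether dlib can see a face in the video at all (the tracebacks in other issues show inference.py uses dlib for landmarks): a minimal sketch, assuming dlib and opencv-python are installed; the function name and video path are illustrative.

import cv2
import dlib

# dlib's stock frontal face detector; an empty result from it is what makes
# inference.py raise "No faces found"
detector = dlib.get_frontal_face_detector()

def count_face_frames(video_path, max_frames=50):
    cap = cv2.VideoCapture(video_path)
    found = total = 0
    while total < max_frames:
        ret, frame = cap.read()
        if not ret:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if detector(gray):  # empty rectangle list means no face in this frame
            found += 1
        total += 1
    cap.release()
    print(f"faces detected in {found}/{total} frames")

count_face_frames("my_video.mp4")

If this reports zero faces, the input itself (small faces, side profiles, heavy compression) is likely the problem rather than the install.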

Video is not being created

[console screenshots]

I'm getting zero errors and everything seems to work fine, but no video output is produced.
Any ideas on how to fix the problem?

GPU usage feedback

Looking for feedback from users

I'm running Windows 10 with CUDA 11.6, using 50% of my dedicated 4 GB of VRAM, and it's running at about 4 fps.

Another user with twice the compute power was maxing out VRAM but running slower with the same setup.

Is everyone else doing OK?
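
To make reports here comparable, a quick diagnostic sketch (assuming a CUDA build of PyTorch) that prints which GPU is visible and how much VRAM is in use:

import torch

if torch.cuda.is_available():
    dev = torch.cuda.current_device()
    props = torch.cuda.get_device_properties(dev)
    print(f"GPU: {props.name}, total VRAM: {props.total_memory / 1024**3:.1f} GB")
    # allocated = tensors currently live; reserved = what the caching allocator holds
    print(f"allocated: {torch.cuda.memory_allocated(dev) / 1024**3:.2f} GB")
    print(f"reserved:  {torch.cuda.memory_reserved(dev) / 1024**3:.2f} GB")
else:
    print("CUDA not available; running on CPU")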

Effect problem

Hi, thanks for your great work.
The degree of mouth opening has become smaller compared to the original video. Maybe consider using few-shot instead of zero-shot? Will the training code be open-sourced? Could training for a few epochs fix more of the corner cases?

No faces found on the image error

Thanks for working on this potentially amazing project. I tried the tool through Google Colab and got the error in the title. Are there any particular requirements for the video resolution, quality, etc.?

Hugging face space Client error '403 Forbidden' for url

I tried cloning the Hugging Face Space and calling it with the Python Gradio client, but I always get the error 403 Forbidden for the URL
https://[username]-lipsick.hf.space/file=/home/user/app/asserts/inference_result/assets2FmanTalking_LIPSICK.mp4'
This is my code. I'm using a write key. When I swap the model out for other models, it runs successfully, but I just can't get this model to work. Is anyone else facing the same issue?

import os

from dotenv import load_dotenv
from gradio_client import Client, file

load_dotenv()  # read HF_TOKEN from a local .env file
HF_TOKEN = os.getenv("HF_TOKEN")

client = Client("[username]/LipSick", hf_token=HF_TOKEN)
result = client.predict(
    source_video=file(source_video_path),
    driving_audio=file(source_audio_path),
    api_name="/predict",
)
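
For reference, a hypothetical workaround I'm considering, assuming the 403 comes from the client fetching the result file of a private Space without credentials: request the /file= URL directly with the same token. result_url stands in for the URL from the error message.

import requests

resp = requests.get(result_url, headers={"Authorization": f"Bearer {HF_TOKEN}"})
resp.raise_for_status()
with open("manTalking_LIPSICK.mp4", "wb") as f:
    f.write(resp.content)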

Skin color around mouth / Ability to generalize?

Hello @Inferencer.

This is a fantastic project - it's fast, and the lip movement seems to be accurate to the audio.

Right now, I'm experiencing some color mismatching in the frame box. I'm guessing it's using the rectangular landmark box around the mouth for each frame, or something similar? Is there a way to programmatically mask just the frame-by-frame modified part and edit only that part of the source video (e.g. mouth + chin + lower face)?

I'm guessing the color mismatching could have something to do with the current model's training not having seen enough data to generalize? I'm interested in hearing any thoughts. I know that for LipSync3D, they built a lighting normalization mechanism to make it work in any lighting, with less dependency on the model's ability to learn to render the frame with the right lighting (https://www.youtube.com/watch?v=L1StbX9OznY). Perhaps this is something useful we could leverage?
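
For context, one common approach to the masking question (not necessarily what LipSick does internally) is to composite only the modified region back onto the original frame with a feathered mask, which hides both the hard box edge and small color steps. A sketch with illustrative names, assuming same-sized original and generated frames:

import cv2
import numpy as np

def paste_with_feather(original, generated, box, feather=15):
    # box = (x, y, w, h) of the modified mouth/chin region in pixel coordinates
    x, y, w, h = box
    mask = np.zeros(original.shape[:2], dtype=np.float32)
    mask[y:y + h, x:x + w] = 1.0
    # feather the mask edges so the transition into original skin is gradual
    mask = cv2.GaussianBlur(mask, (0, 0), feather)[..., None]
    blended = generated.astype(np.float32) * mask + original.astype(np.float32) * (1.0 - mask)
    return blended.astype(np.uint8)

OpenCV's cv2.seamlessClone (Poisson blending) is another option that also corrects color differences, at some per-frame cost.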

I might have compute available to help improve some of the models in the near future. Where would this be the most useful?

I just pulled from main and started the gradio server, then uploaded video/audio and pressed the button. Here's an example I've produced:
https://github.com/Inferencer/LipSick/assets/17946756/1b721adb-6459-4448-a086-dfedeb5aed84

Example of upcoming box removal feature

Just thought I would share an example of the upcoming, much-needed face mask / box removal feature. Bear in mind that custom frames are not used on this video, so the skin quality is a bit off. It runs super fast, but unfortunately, to achieve this effect the lip-synced face has to be tracked, which is going to slow things down; we can't reuse the tracking data from the original video because chins move while speaking. Right now I'm just adding features, but I will make everything more suitable for people who are offering this tool for SaaS, so data is reused etc. (see the caching sketch after the video link below). I assume this will be implemented next week, and I will close this issue when it is.

https://youtu.be/x3Ufuef1nEE
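
On the "data is reused" point, an illustrative sketch of what caching could look like: key the tracking output on a hash of the source file, so repeat jobs on the same video skip re-tracking. compute_landmarks is a stand-in for whatever tracker the pipeline actually uses.

import hashlib
import os
import numpy as np

def cached_landmarks(video_path, cache_dir="landmark_cache"):
    os.makedirs(cache_dir, exist_ok=True)
    with open(video_path, "rb") as f:
        key = hashlib.sha256(f.read()).hexdigest()
    cache_path = os.path.join(cache_dir, f"{key}.npy")
    if os.path.exists(cache_path):
        return np.load(cache_path)             # cache hit: skip tracking entirely
    landmarks = compute_landmarks(video_path)  # hypothetical tracking call
    np.save(cache_path, landmarks)
    return landmarks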

Colab not working

If anybody can figure out why the Colab is not working, I would appreciate it. I recently updated the Colab files but ran into some errors with the GPU usage. I have tried a couple of different TensorFlow versions, but I fear the issue could be related to the upgrade to CUDA 12.2 with no fallback support, so it's attempting to use an AMD GPU rather than the NVIDIA GPU on the T4? That's my reasoning based on this post and the error logs: googlecolab/colabtools#4214
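
If anyone wants to help debug, these checks in a Colab cell show whether the runtime's TensorFlow build can see the T4 at all; an empty GPU list with a GPU runtime attached would support the CUDA-mismatch theory:

import tensorflow as tf

print(tf.__version__)
print(tf.test.is_built_with_cuda())            # False means a CPU-only wheel
print(tf.config.list_physical_devices("GPU"))  # empty list means no usable GPU

Running !nvidia-smi in a separate cell confirms which driver and CUDA version the VM itself exposes.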

Need impeccable lip sync; have tried GFP-GAN and GPEN-GAN post-processing on each frame

I am trying to get the most realistic lip-syncing; I have tried different models but feel this is achievable with DINet.
I have integrated GFP-GAN and GPEN-GAN into the code so that they enhance the full frames after lip-syncing (I will share that code soon), and I am sharing the output.
The other output is from a third-party service, which is paid. Any more suggestions on how to get better lip sync?
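
For anyone who wants to try the same thing before I share my code, the per-frame GFP-GAN pass looks roughly like this, assuming the gfpgan pip package and a downloaded GFPGANv1.4.pth checkpoint; the file paths are illustrative:

import cv2
from gfpgan import GFPGANer

restorer = GFPGANer(model_path="GFPGANv1.4.pth", upscale=1,
                    arch="clean", channel_multiplier=2, bg_upsampler=None)

def enhance_frame(frame_bgr):
    # paste_back=True returns the full restored frame, not just the face crop
    _, _, restored = restorer.enhance(frame_bgr, has_aligned=False,
                                      only_center_face=False, paste_back=True)
    return restored

frame = cv2.imread("lipsynced_frame.png")
cv2.imwrite("enhanced_frame.png", enhance_frame(frame))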

Also, will it be possible to connect via telegram, email or any other platform @Inferencer , I have been working on this for a while and have some ideas I would like to discuss with you.

Can't get it working with 30 fps

extracting frames from video: /tmp/gradio/2026283a8dead122868293158b9fa514f5073269/SaveIG.mp4
Traceback (most recent call last):
  File "/home/user/app/inference.py", line 91, in <module>
    ds_feature = DSModel.compute_audio_feature(opt.driving_audio_path)
  File "/home/user/app/utils/deep_speech.py", line 43, in compute_audio_feature
    audio_sample_rate, audio = wavfile.read(audio_path)
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/scipy/io/wavfile.py", line 650, in read
    file_size, is_big_endian = _read_riff_chunk(fid)
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/scipy/io/wavfile.py", line 521, in _read_riff_chunk
    raise ValueError(f"File format {repr(str1)} not understood. Only "
ValueError: File format b'\x00\x00\x00\x1c' not understood. Only 'RIFF' and 'RIFX' supported.
An error occurred: Command '['python', 'inference.py', '--source_video_path', '/tmp/gradio/2026283a8dead122868293158b9fa514f5073269/SaveIG.mp4', '--driving_audio_path', '/tmp/gradio/4a324a4a5cdf98a616bdb6c5b89d119ffc82e93d/New Recording 14.wav', '--pretrained_lipsick_path', './asserts/pretrained_lipsick.pth', '--deepspeech_model_path', './asserts/output_graph.pb', '--res_video_dir', './asserts/inference_result']' returned non-zero exit status 1.
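
Despite the title, the failure here is in the audio step, before any frame-rate handling: scipy.io.wavfile only reads RIFF PCM files, and the b'\x00\x00\x00\x1c' header suggests this "wav" is actually another container (it is consistent with an MP4/M4A ftyp box, e.g. a phone recording renamed to .wav). Re-encoding to plain PCM WAV before inference will likely fix it; a sketch with ffmpeg, assuming it is on PATH:

import subprocess

# 16 kHz mono 16-bit PCM is a safe target for DeepSpeech-style feature extractors
subprocess.run(
    ["ffmpeg", "-y", "-i", "New Recording 14.wav",
     "-ac", "1", "-ar", "16000", "-c:a", "pcm_s16le", "fixed.wav"],
    check=True,
)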

Will the code continue to be optimized?

The lip-syncing effect is still acceptable, but the result of applying the processed frame images back onto the original face is not very good. Is it possible to continue optimizing it?

V_LIPSICK.mp4

Face misalignment

Does anyone know what the problem is here? The face seems to be very misaligned, causing the weird edges. Thanks

access denied

Hi there,

I'm running into an issue when processing my video. Everything starts fine and it begins to process, until I am hit with an "access denied" Errno 13 error. I'm running Visual Studio Code as an administrator, as well as Python as an administrator.

I am using a .wav file for the audio and an .mp4 for the video.

I have also attached a screenshot of my folder structure, which I believe to be OK. Any ideas? Thank you so much.

(base) PS C:\Users\madis\LipSick> conda activate LipSick
(LipSick) PS C:\Users\madis\LipSick> python app.py
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
IMPORTANT: You are using gradio version 4.26.0, however version 4.29.0 is available, please upgrade.

Extracting frames from video
Hang tight! We're processing your audio, which might take a little while depending on its length.
Tracking Face
Traceback (most recent call last):
File "C:\Users\madis\LipSick\inference.py", line 97, in
video_landmark_data = np.array([load_landmark_dlib(frame) for frame in video_frame_path_list])
File "C:\Users\madis\LipSick\inference.py", line 97, in
video_landmark_data = np.array([load_landmark_dlib(frame) for frame in video_frame_path_list])
File "C:\Users\madis\LipSick\inference.py", line 57, in load_landmark_dlib
raise ValueError("No faces found in the image.")
ValueError: No faces found in the image.
An error occurred: Command '['python', 'inference.py', '--source_video_path', 'C:\Users\madis\AppData\Local\Temp\gradio\7d6c826546791df8eb196725e6e44b5a3ba2b80e\UPLOADMP4.mp4', '--driving_audio_path', 'C:\Users\madis\AppData\Local\Temp\gradio\c33c344af7ad688d606d5f6b1c58cb495cf9795a\WAV.wav', '--pretrained_lipsick_path', './asserts/pretrained_lipsick.pth', '--deepspeech_model_path', './asserts/output_graph.pb', '--res_video_dir', './asserts/inference_result', '--custom_reference_frames', '5,5,5,5,5', '--custom_crop_radius', '0', '--auto_mask']' returned non-zero exit status 1.
Traceback (most recent call last):
File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio\queueing.py", line 527, in process_events
response = await route_utils.call_process_api(
File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio\route_utils.py", line 261, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio\blocks.py", line 1795, in process_api
data = await self.postprocess_data(fn_index, result["prediction"], state)
File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio\blocks.py", line 1625, in postprocess_data
outputs_cached = await processing_utils.async_move_files_to_cache(
File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio\processing_utils.py", line 410, in async_move_files_to_cache
return await client_utils.async_traverse(
File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio_client\utils.py", line 1006, in async_traverse
new_obj[key] = await async_traverse(value, func, is_root)
File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio_client\utils.py", line 1002, in async_traverse
return await func(json_obj)
File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio\processing_utils.py", line 384, in _move_to_cache
temp_file_path = await block.async_move_resource_to_block_cache(
File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio\blocks.py", line 275, in async_move_resource_to_block_cache
temp_file_path = processing_utils.save_file_to_cache(
File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio\processing_utils.py", line 188, in save_file_to_cache
temp_dir = hash_file(file_path)
File "C:\Users\madis\miniconda3\envs\LipSick\lib\site-packages\gradio\processing_utils.py", line 120, in hash_file
with open(file_path, "rb") as f:
PermissionError: [Errno 13] Permission denied: 'C:\Users\madis\LipSick'
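
Note that the final PermissionError is raised while opening 'C:\Users\madis\LipSick', which is a directory, not a file: inference failed earlier with "No faces found in the image", and Gradio then appears to have been handed the project folder path as the "result" to cache. So the Errno 13 looks like a symptom, and the face-detection failure is the real problem. A hypothetical guard for app.py that would surface this more clearly:

import os

def check_output(path):
    # fail loudly if inference produced no video, instead of letting Gradio
    # try to hash a directory path (the source of the Errno 13 above)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"inference produced no output video at {path}")
    return path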

Hugging Face Space Not Working With T4 Medium or T4 Small

Really love the work you are doing with this repo. This is by far the best open-source tool for lip sync.

When I run this on my Hugging Face Space, this is what the error in the logs show:

2024-06-14 16:57:38.538093: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-14 16:57:38.682309: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-06-14 16:57:38.689257: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cv2/../../lib64:
2024-06-14 16:57:38.689283: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-06-14 16:57:38.722847: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-14 16:57:39.492782: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cv2/../../lib64:
2024-06-14 16:57:39.492930: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cv2/../../lib64:
2024-06-14 16:57:39.492947: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
extracting frames from video: /tmp/gradio/e0f0f22fd57f2316890eab0ee07fcdf2fa100e1f/whistleblower.mp4
Traceback (most recent call last):
File "/home/user/app/inference.py", line 88, in
video_size = extract_frames_from_video(opt.source_video_path, video_frame_dir)
File "/home/user/app/inference.py", line 51, in extract_frames_from_video
cv2.imwrite(result_path, frame)
cv2.error: OpenCV(4.10.0) /io/opencv/modules/imgcodecs/src/loadsave.cpp:798: error: (-215:Assertion failed) !_img.empty() in function 'imwrite'

An error occurred: Command '['python', 'inference.py', '--source_video_path', '/tmp/gradio/e0f0f22fd57f2316890eab0ee07fcdf2fa100e1f/whistleblower.mp4', '--driving_audio_path', '/tmp/gradio/7c956915b2484e2b1309eedd152dbda3a56faab1/elevenlabs.wav', '--pretrained_lipsick_path', './asserts/pretrained_lipsick.pth', '--deepspeech_model_path', './asserts/output_graph.pb', '--res_video_dir', './asserts/inference_result']' returned non-zero exit status 1.
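
The OpenCV assertion means cv2.imwrite received an empty image, which typically happens when an extraction loop keeps calling cap.read() past the end of the stream (or hits a frame the codec cannot decode) without checking the return flag. A defensive sketch of the extraction step; the names mirror the traceback, but this is not the repo's actual code:

import os
import cv2

def extract_frames(video_path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ret, frame = cap.read()
        if not ret or frame is None:  # end of stream or decode failure
            break
        cv2.imwrite(os.path.join(out_dir, f"{idx:06d}.jpg"), frame)
        idx += 1
    cap.release()
    return idx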
