karanvivekbhargava / obamanet
ObamaNet : Photo-realistic lip-sync from audio (Unofficial port)
License: MIT License
Hi,
I see that the mouth hasn't been mapped well by pix2pix and is pretty low res.
Cheers
Hello there! I am not able to find vid2wav.py and tools/process.py here. Is there a master branch somewhere else?
The tools are found in the pix2pix repository.
In the readme the command for converting videos to images is as follows:
ffmpeg -i 00001.mp4 -r 1/5 -vf scale=-1:720 images/00001-$filename%05d.bmp
This will only extract an image every 5 seconds (per -r 1/5). Is this how it should be done, and if so, why? Is it arbitrary? I see in processing.py that only 1467 images from 20 videos were used to generate the image_kp_raw files, which suggests this is indeed how it was done.
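For reference, here is a minimal sketch of driving the same extraction from Python; the paths are placeholders, and the 1/5 rate (one frame every 5 seconds) simply mirrors the README command rather than anything verified against processing.py:

```python
import subprocess
from pathlib import Path

def extract_frames(video_path: str, out_dir: str, rate: str = "1/5") -> None:
    """Extract frames from a video with ffmpeg.

    rate="1/5" asks ffmpeg for one frame every 5 seconds;
    rate="30" would instead sample 30 frames per second.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    stem = Path(video_path).stem  # e.g. "00001"
    subprocess.run(
        [
            "ffmpeg", "-i", video_path,
            "-r", rate,                     # output frame rate
            "-vf", "scale=-1:720",          # resize to 720p height, keep aspect
            f"{out_dir}/{stem}-%05d.bmp",   # numbered frames, as in the README
        ],
        check=True,
    )

extract_frames("videos/00001.mp4", "images")
```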
I am using this model on my own videos. pix2pix.py training has completed, which generates the .data, .index and .meta model files, but I cannot run train.py for the Keras models because the .pkl files are missing.
Kindly help me generate the data/audio_kp/audio_kp1467_mel.pickle, data/pca/pkp1467.pickle and data/pca/pca1467.pickle files used in train.py.
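For what it's worth, here is a minimal sketch of how such PCA pickles could be produced with scikit-learn; the input array, its shape, and the component count are assumptions based on the 68-keypoint pipeline, not the repo's exact code:

```python
import pickle
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical input: one row of flattened keypoint coordinates per frame.
keypoints = np.load("data/keypoints_raw.npy")  # shape (n_frames, n_coords), assumed

pca = PCA(n_components=8)           # small basis for the keypoint space
reduced = pca.fit_transform(keypoints)

# Dump both the fitted PCA model and the projected keypoints, mirroring
# the pca1467.pickle / pkp1467.pickle pair that train.py expects.
with open("data/pca/pca1467.pickle", "wb") as f:
    pickle.dump(pca, f)
with open("data/pca/pkp1467.pickle", "wb") as f:
    pickle.dump(reduced, f)
```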
I am interested in your project and have read your paper. In the paper, you process the keypoints to be independent of face location, face size, and in-plane and out-of-plane face rotation: for example, you mean-normalize the 68 keypoints, project them onto a horizontal axis, and divide them by the norm of the 68 vectors. This processing is important, but the explanation in the paper is brief. Could you explain the process, including the formulas and methods, in detail?
Thanks a lot
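A rough sketch of one plausible reading of that normalization (mean-centering, rotating the eye line to horizontal, and scaling by the overall norm); this is an interpretation of the paper's description, not the authors' verified formulation:

```python
import numpy as np

def normalize_keypoints(kp: np.ndarray) -> np.ndarray:
    """Normalize 68 dlib keypoints, shape (68, 2).

    1. Mean-center to remove face location.
    2. Rotate so the eye-to-eye line is horizontal (in-plane rotation).
    3. Divide by the Frobenius norm to remove face size.
    """
    kp = kp - kp.mean(axis=0)                     # 1. translation invariance

    # dlib indices 36 and 45 are the outer eye corners (assumed convention).
    dx, dy = kp[45] - kp[36]
    theta = -np.arctan2(dy, dx)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    kp = kp @ rot.T                               # 2. in-plane rotation

    return kp / np.linalg.norm(kp)                # 3. scale invariance
```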
Is there a way to improve the result at https://github.com/karanvivekbhargava/obamanet/blob/master/results/key2im.gif so the mouth area is high resolution like the rest of the picture?
Hello, I'm trying to retrain the network from scratch and I want to use processing.py from the data folder downloaded from the link.
However, at line 205:
numOfFiles = 1467 # First 20 videos
I want to know how this number was calculated, in order to create a correct pickle file for train.py to train with.
Thank you!
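One plausible way to reproduce that count from your own extracted frames; the glob pattern and directory here are assumptions based on the README's naming scheme:

```python
import glob

# Count frames extracted from the first 20 videos (00001..00020),
# assuming the README's images/<video>-<frame>.bmp naming.
num_files = sum(
    len(glob.glob(f"images/{i:05d}-*.bmp")) for i in range(1, 21)
)
print(num_files)  # the original data presumably yields 1467 here
```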
python3: can't open file 'tools/process.py': [Errno 2] No such file or directory
And what is dir c supposed to contain?
How did you get a2key_data/images and kp_test.pickle?
I want to create the images and kp_test.pickle from my own dataset.
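A minimal sketch of how such a keypoint pickle could be built with dlib; the predictor file and output layout are assumptions, and the repo may store keypoints in a different structure:

```python
import glob
import pickle
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# Standard 68-landmark model, downloadable from dlib.net (assumed here).
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

keypoints = {}
for path in sorted(glob.glob("a2key_data/images/*.bmp")):
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        continue  # skip frames where no face is detected
    shape = predictor(gray, faces[0])
    keypoints[path] = [(p.x, p.y) for p in shape.parts()]

with open("kp_test.pickle", "wb") as f:
    pickle.dump(keypoints, f)
```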
I am able to generate the lip sync points for my input. How do I remove the black box and lip animation to obtain the original face with lip sync?
Copy the patched images into folder a and the cropped images to folder b
==> Could you release the code for obtaining the cropped images?
Thank you!
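In case it helps, a rough sketch of cropping a face region from detected landmarks; this is purely illustrative, as the repo's actual crop parameters are unknown:

```python
import numpy as np

def crop_around_keypoints(img, kp, margin=20):
    """Crop a rectangle around landmark points with a pixel margin.

    kp: array of (x, y) landmark coordinates, e.g. the 68 dlib points.
    """
    kp = np.asarray(kp)
    x0, y0 = kp.min(axis=0) - margin
    x1, y1 = kp.max(axis=0) + margin
    h, w = img.shape[:2]
    x0, y0 = max(int(x0), 0), max(int(y0), 0)
    x1, y1 = min(int(x1), w), min(int(y1), h)
    return img[y0:y1, x0:x1]
```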
After all the processing, my training has started, but only on the CPU; I want to train it on multiple GPUs. Pull requests for this would be appreciated.
Regards.
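A minimal sketch of one way to get multi-GPU training in TensorFlow 2 / Keras with MirroredStrategy; the model here is a placeholder with assumed shapes, not what train.py actually builds, and the repo's older Keras version may instead need keras.utils.multi_gpu_model:

```python
import tensorflow as tf

# MirroredStrategy replicates the model across all visible GPUs and
# averages gradients; build and compile the model inside its scope.
strategy = tf.distribute.MirroredStrategy()
print("Devices in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(64, input_shape=(100, 26)),  # placeholder shapes
        tf.keras.layers.Dense(16),
    ])
    model.compile(optimizer="adam", loss="mse")

# model.fit(...) then distributes batches across the GPUs automatically.
```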
Hi @karanvivekbhargava -- thanks for a really great implementation of the ObamaNet framework; this has been a real joy to work with. I'm wondering if you or any others have run into any snafus with training the audio-to-keypoint LSTM implemented in train.py. After training it for about 50 epochs, the LSTM predicts the same "open mouth" keypoint vectors for every audio timestep, like so:
Some more details if they're useful:
- I am using the logfbank feature representation of the audio.
- The keypoints (dlib-extracted and normalized, etc.) match up perfectly to the original images ... so it's not an issue with the keypoint extraction/dumping code.
- The a2kp model works moderately well (at least it does not predict the same static output).
- The audio is split into per-utterance files (kia.wav, `00002-007.wav`, etc.).
TL;DR: Does this seem like it might just be an issue of not training for enough epochs? Or might it be a bug, i.e. related to the audio timesteps not being correctly broken up in predict.py, or the PCA upsampling not working well for predicted keypoints?
Would appreciate any quick insights or hunches anyone might have on this or if folks could just verify that they've gotten replicable results simply from using the exact code in this repo. Thanks so much!
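For context, a minimal sketch of the logfbank extraction mentioned above, using the python_speech_features package; the window/step values and the clip name are assumptions, not necessarily what this repo uses:

```python
import scipy.io.wavfile as wav
from python_speech_features import logfbank

rate, signal = wav.read("audio/00001-000.wav")  # hypothetical clip name

# 26 log filterbank energies per 25 ms window, hopped every 10 ms.
features = logfbank(signal, samplerate=rate, winlen=0.025,
                    winstep=0.01, nfilt=26)
print(features.shape)  # (n_timesteps, 26), fed to the LSTM one window at a time
```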
Any help or hints would be greatly appreciated.
Friends working on the same research can add v64053493 to join a group chat and exchange ideas.
I'm probably just doing something wrong but if somebody could help me that would be greatly appreciated.
The work is amazing, but when I try to test it, something confuses me:
As described in the paper, the keypoints are normalized to be invariant to face location and to in-plane and out-of-plane face rotation.
But when I test, I find that in the test dataset the keypoints rotate with the face and are consecutive across frames, whereas, as described above, the keypoints generated from the audio are invariant to face location and to in-plane and out-of-plane rotation.
Therefore, I want to know how the keypoints are kept consistent with faces of different poses and sizes?
Thanks a lot!
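One plausible answer is that the normalization is inverted at test time using the target frame's own face parameters. A rough sketch of that idea, complementary to the normalization sketch above; again an interpretation, not the repo's verified code:

```python
import numpy as np

def denormalize_keypoints(kp_norm, center, scale, theta):
    """Map normalized keypoints back into a target video frame.

    center, scale, theta are measured from the target face in that frame,
    so the predicted mouth follows the face's location, size, and rotation.
    """
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return (kp_norm @ rot.T) * scale + center
```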
Will this work on another person's photo, not Obama's?
I read the paper, and the author says he used Char2wav to train the audio so that Obama's voice could be used to speak other words, but I didn't find the part that trains text2voice on Obama's voice. Could you help me find that?
I trained on my own data. When I used a single sample, the LSTM got good results, but when I used 50 samples, the results were poor. Sorry to bother you.
How can I get the video dataset and text transcripts?
How do I create the pickle files, i.e. audio_kp, video_kp and pca? Are these automatically generated, or do we have to create them ourselves? If they have to be created, then how?
Would the resolution be better if pix2pixHD were adopted? Just a suggestion; thanks for the great implementation of the paper!
Hi~
Thanks for your code, but I found that the code is incomplete. When will the full code be released? And where can I find the c/train data?
/obamanet/data/obama_addresses.txt and the entire data folder seem to be missing.