
obamanet's People

Contributors

karanvivekbhargava

obamanet's Issues

Model is not learning.

Hi, when I try to train the model, the validation performance does not improve at all, as can be seen from the TensorBoard graphs, and the training accuracy never reaches good values either. I used the proposed data. What should I do? Thanks.
[Screenshot: TensorBoard training and validation curves]

Consider using pix2pixHD?

Hi,

I see that the mouth hasn't been mapped well by pix2pix and is pretty low res.

  • Why are your results different from the paper? (Paper seems to have higher res)
  • Have you played around with pix2pixHD for higher resolution lips?

Cheers

vid2wav.py and tool folder

Hello there! I am not able to find vid2wav.py and tool/process.py here. Is there a master branch somewhere else?

Tools found in Pix2pix repository

Framerate to extract images from video?

In the README, the command for converting videos to images is as follows:

ffmpeg -i 00001.mp4 -r 1/5 -vf scale=-1:720 images/00001-$filename%05d.bmp

This will only extract one image every 5 seconds (because of -r 1/5). Is this how it should be done, and if so, why, or is the rate arbitrary? In processing.py, numOfFiles indicates that only 1467 images from 20 videos were used to generate the image_kp_raw files, which suggests this is indeed how it was done.
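
For anyone tweaking this: below is a small Python wrapper (my own sketch, not from this repo) around the same ffmpeg call, so the frame rate can be changed in one place; the output naming pattern is my approximation of the README's.

```python
import subprocess
from pathlib import Path

def extract_frames(video_path, out_dir, fps='1/5', height=720):
    """Extract frames from a video with ffmpeg.

    fps='1/5' reproduces the README's -r 1/5 (one frame every 5 seconds);
    pass fps='25' or '30' to extract at a typical video frame rate instead.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(video_path).stem
    pattern = str(out_dir / f'{stem}-%05d.bmp')  # e.g. images/00001-00001.bmp
    subprocess.run([
        'ffmpeg', '-i', str(video_path),
        '-r', fps,                      # output frame rate
        '-vf', f'scale=-1:{height}',    # keep aspect ratio, 720 px tall
        pattern,
    ], check=True)

extract_frames('00001.mp4', 'images', fps='1/5')
```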

How to create .pkl files for your own data set.

I am using this model on my own videos. The pix2pix.py training has completed, which generates the .data, .index and .meta model files, but I cannot run train.py for the Keras models because the .pkl files are missing.

Kindly help me generate the data/audio_kp/audio_kp1467_mel.pickle, data/pca/pkp1467.pickle and data/pca/pca1467.pickle files used in train.py.
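
Not the author, but as a rough sketch of what those three pickles plausibly contain: an audio-feature dict, PCA-reduced keypoints, and the fitted PCA object. The shapes, dict layout, and component count below are assumptions, not verified against the repo.

```python
import os
import pickle
import numpy as np
from sklearn.decomposition import PCA

# Placeholder inputs -- in practice these come from your own audio and
# keypoint extraction pipeline. Assumed (unverified) shapes: per-video
# (num_frames, 136) flattened 68-point landmarks, per-video audio features.
video_kp = {'00001': np.random.rand(100, 136)}
audio_kp = {'00001': np.random.rand(100, 26)}

pca = PCA(n_components=8)  # the component count here is a guess
pca.fit(np.vstack(list(video_kp.values())))
pkp = {vid: pca.transform(kp) for vid, kp in video_kp.items()}

os.makedirs('data/audio_kp', exist_ok=True)
os.makedirs('data/pca', exist_ok=True)
with open('data/audio_kp/audio_kp1467_mel.pickle', 'wb') as f:
    pickle.dump(audio_kp, f)
with open('data/pca/pkp1467.pickle', 'wb') as f:
    pickle.dump(pkp, f)
with open('data/pca/pca1467.pickle', 'wb') as f:
    pickle.dump(pca, f)
```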

how to normalize the keypoints in this paper?

I am interested in your project and have read your paper. In the paper, you process the keypoints to be independent of face location, face size, and in-plane and out-of-plane face rotation: for example, you mean-normalize the 68 keypoints, project them onto a horizontal axis, and divide them by the norm of the 68 vectors. This processing is important, but its explanation in the paper is brief. Could you explain the process in detail, including the formulas and methods?

Thanks a lot
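
While waiting for the author: here is my reading of that passage as code, a minimal numpy sketch. The choice of anchor points for the rotation step is my interpretation, not a confirmed formula from the paper.

```python
import numpy as np

def normalize_keypoints(kp):
    """Normalize a (68, 2) array of facial landmarks.

    My interpretation of the paper's description:
      1. subtract the mean point (removes face location),
      2. rotate so the eye-to-eye line is horizontal (removes
         in-plane rotation),
      3. divide by the norm of all 68 vectors (removes face size).
    """
    kp = kp - kp.mean(axis=0)                 # 1. mean-normalization

    # 2. Rotate so the outer eye corners (dlib points 36 and 45) lie on
    # the horizontal axis -- the anchor points are an assumption.
    dx, dy = kp[45] - kp[36]
    theta = np.arctan2(dy, dx)
    c, s = np.cos(-theta), np.sin(-theta)
    kp = kp @ np.array([[c, -s], [s, c]]).T   # rotate all points by -theta

    return kp / np.linalg.norm(kp)            # 3. scale normalization

print(normalize_keypoints(np.random.rand(68, 2)))
```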

The number of files in 'data/processing.py'

Hello, I'm trying to retrain the network from scratch, and I want to use processing.py from the data folder downloaded from the link.

However, at line 205:
numOfFiles = 1467 # First 20 videos
I want to know how this number was calculated, so that I can create a correct pickle file for train.py to train with.
Thank you!
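
My guess (unverified) is that 1467 is just the total number of frames extracted from the first 20 videos, so for your own data you could recompute it instead of hard-coding it. A sketch, where the glob pattern assumes the README's ffmpeg output naming:

```python
from glob import glob

# Count the frames extracted from the first 20 videos; the '{i:05d}-*.bmp'
# pattern assumes the README's ffmpeg output naming.
num_of_files = sum(len(glob(f'images/{i:05d}-*.bmp')) for i in range(1, 21))
print(num_of_files)  # would replace the hard-coded 1467 for your own data
```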

files missing

python3: can't open file 'tools/process.py': [Errno 2] No such file or directory
Also, what is dir c supposed to contain?

How to train with GPU?

After all the processing, my training starts, but on the CPU. I want to train it on multiple GPUs; any help would be appreciated.

Regards.
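
Not a maintainer, but since train.py uses Keras, one common route in Keras 2.x is to wrap the model with multi_gpu_model before compiling. A minimal sketch with dummy data; the architecture and shapes are illustrative, not the repo's actual model:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.utils import multi_gpu_model

# Dummy stand-ins for the repo's real audio features / keypoint targets.
X_train = np.random.rand(256, 50, 26)  # (samples, timesteps, audio dims)
y_train = np.random.rand(256, 8)       # (samples, PCA keypoint dims)

model = Sequential([
    LSTM(60, input_shape=(50, 26)),    # illustrative architecture only
    Dense(8),
])

# Replicate the model across 2 GPUs (Keras 2.x API; later TF/Keras
# versions use tf.distribute.MirroredStrategy instead).
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(loss='mean_squared_error', optimizer='adam')
parallel_model.fit(X_train, y_train, epochs=5, batch_size=32)
```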

After training, audio-to-kp LSTM predicting identical "open-mouth" keypoints

Hi @karanvivekbhargava -- thanks for a really great implementation of the ObamaNet framework; this has been a real joy to work with. I'm wondering if you or any others have run into snafus with training the audio-to-keypoint LSTM implemented in train.py. After training it for about 50 epochs, the LSTM only predicts the same "open mouth" keypoint vectors for every audio timestep, like so:

[Screenshot: identical open-mouth keypoints predicted for every timestep]

Some more details if they're useful:

  • I've dumped the images using a frame rate of 20, as opposed to 25 or 30.
  • I'm using the default logfbank feature representation of the audio.
  • The processed training keypoints (extracted using dlib and normalized, etc.) match up perfectly with the original images, so it's not an issue with the keypoint extraction/dumping code.
  • The default pre-trained a2kp model works moderately well (at least it does not predict the same static output).
  • I'm using 50 address videos (as opposed to, I think, 20 in this repo), so I don't think it's an issue of insufficient data.
  • I'm using the default look-back and time-delay parameters from this repo. I train for about 50 epochs with a batch size of 1.
  • Neither testing loss nor validation loss appears to decrease at all across epochs.
  • I'm using the audios from the demo data included in this repo (e.g. kia.wav, 00002-007.wav, etc.).

TL;DR: Does this seem like it might just be a matter of not training for enough epochs? Or might it be a bug, e.g. the audio timesteps not being broken up correctly in predict.py, or the PCA upsampling not working well for predicted keypoints?

Would appreciate any quick insights or hunches anyone might have on this or if folks could just verify that they've gotten replicable results simply from using the exact code in this repo. Thanks so much!
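
One quick check I'd suggest (my own diagnostic, not from the repo): measure how much the predicted keypoints actually vary across timesteps; a collapsed model shows near-zero variance on every dimension.

```python
import numpy as np

def prediction_variance(pred_kp):
    """Per-dimension std of predictions across timesteps.

    pred_kp: (timesteps, dims) array of model outputs (PCA space or raw).
    Values near zero everywhere mean the LSTM emits the same pose
    for every audio frame.
    """
    return pred_kp.std(axis=0)

# Example with fake predictions; replace with your model's output.
pred = np.tile(np.random.rand(8), (200, 1))  # a deliberately collapsed output
print(prediction_variance(pred))             # ~all zeros -> collapse
```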

How to keep the mouth area consistent with the face?

The work is amazing, but when I try to test it, something confuses me:
As described in the paper, the keypoints are normalized to be invariant to face location and to in-plane and out-of-plane face rotation.

But when I test, I find that in the test dataset the keypoints rotate with the face and are continuous across frames, whereas, as described above, the keypoints generated from audio are invariant to face location and to in-plane and out-of-plane rotation.

Therefore, I want to know: how are the keypoints kept consistent with faces of different poses and sizes?

Thanks a lot!
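
For what it's worth, my understanding is that the pose-invariant keypoints get mapped back through the target frame's own transform: estimate location, scale, and in-plane rotation from the target face's landmarks, then invert the normalization. A rough sketch of that idea (my interpretation only, not code from the repo):

```python
import numpy as np

def denormalize_keypoints(kp_norm, target_kp):
    """Map normalized keypoints back onto a target face.

    kp_norm:   (68, 2) pose-invariant keypoints predicted from audio.
    target_kp: (68, 2) landmarks of the target frame, used only to
               estimate location, scale, and in-plane rotation.
    """
    center = target_kp.mean(axis=0)
    centered = target_kp - center
    scale = np.linalg.norm(centered)

    # In-plane rotation from the target's outer eye corners (36, 45);
    # the anchor points are an assumption.
    dx, dy = centered[45] - centered[36]
    theta = np.arctan2(dy, dx)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])

    return (kp_norm @ R.T) * scale + center  # rotate, rescale, translate

print(denormalize_keypoints(np.random.rand(68, 2) * 0.01,
                            np.random.rand(68, 2) * 100))
```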

Where is Char2wav in your implementation?

I read the paper, and the author says he used Char2wav to train a text-to-speech model so that Obama's voice can speak other words, but I couldn't find the part that trains text2voice on Obama's voice. Could you help me find it?

Pickle files generation

How do I create the pickle files, i.e. the audio_kp, video_kp and pca pickles? Are they generated automatically, or do we have to create them ourselves? If we have to create them, how?

pix2pixHD

Would the resolution be better if pix2pixHD were adopted? Just a suggestion; thanks for the great implementation of the paper!

Missing Files

/obamanet/data/obama_addresses.txt and the entire data folder seem to be missing.
