
visemenet_tensorflow's People

Contributors

yzhou359

visemenet_tensorflow's Issues

Different versions of JALI rig

Hi Yang,

I have tried the JALI rig you provided, but it didn't work. Could you specify how the variables correspond between the different rig versions? Thank you so much!

Meaning of the parameters

Hi!

I'm trying to understand how these parameters map to a mouth pose, for instance: 'JALI.translateX', 'JALI.translateY', 'AAA', 'Eh', 'AHH', 'OHH', 'UUU', 'IEE', 'RRR', 'WWW', 'SSS', 'FFF', 'TTH', 'MBP', 'SSH', 'Schwa', 'GK', 'LNTD', 'COARTIC.LNTD', 'COARTIC.GK', 'COARTIC.MMM', 'COARTIC.FFF', 'COARTIC.WA_PEDAL', 'COARTIC.YA_PEDAL'.

Is there any documentation about it?

Thank you in advance!
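For what it's worth, here is a minimal sketch of how one might inspect the per-frame activations to get a feel for these channels. The file name and the whitespace-separated layout (one row per frame, one column per parameter, in the order listed above) are assumptions about the files written to data/output_viseme/, not something confirmed by the repository.

import numpy as np

# Hypothetical column order, matching the parameter list above: two JALI
# jaw/lip controls followed by 22 viseme/co-articulation activations.
PARAMS = ['JALI.translateX', 'JALI.translateY', 'AAA', 'Eh', 'AHH', 'OHH',
          'UUU', 'IEE', 'RRR', 'WWW', 'SSS', 'FFF', 'TTH', 'MBP', 'SSH',
          'Schwa', 'GK', 'LNTD', 'COARTIC.LNTD', 'COARTIC.GK',
          'COARTIC.MMM', 'COARTIC.FFF', 'COARTIC.WA_PEDAL',
          'COARTIC.YA_PEDAL']

values = np.loadtxt('data/output_viseme/visemenet_intro.txt')  # (frames, 24)
for frame, row in enumerate(values[:10]):
    # Report the strongest viseme channel per frame (columns 2 onward).
    k = int(np.argmax(row[2:]))
    print(f'frame {frame:4d}: {PARAMS[2 + k]} = {row[2 + k]:.2f}')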

Question about pretraining the first two stages

Hi! Thanks for sharing your code. I just have a simple question about the pretraining of the first two stages.

For example, the audio input in the code has shape (1484, 65), and the phoneme-group and landmark outputs seem to be (1484, 21) and (1484, 76). If so, how do you make the audio frames correspond one-to-one with the other streams? The audio actually spans hundreds of video frames, and each video frame corresponds to one set of landmarks. Do you duplicate video frames to match the length of the audio?
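On the frame-alignment point, a common approach (an assumption here, not something confirmed from this repository) is to upsample the per-video-frame annotations to the audio frame rate by nearest-neighbor duplication, so every audio frame gets the label of the video frame it falls inside:

import numpy as np

# Hypothetical shapes from the question: 1484 audio feature frames versus
# some smaller number of 25 fps video frames carrying the landmarks.
audio_feats = np.zeros((1484, 65))
landmarks = np.zeros((371, 76))   # 371 is an illustrative video frame count

# Map each audio frame index to its nearest video frame, then duplicate
# rows so both streams share the same time axis.
idx = np.minimum(
    np.arange(len(audio_feats)) * len(landmarks) // len(audio_feats),
    len(landmarks) - 1,
)
landmarks_upsampled = landmarks[idx]  # shape: (1484, 76)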

Also, why is the phoneme-group output (?, 21)? In the figure in the paper, the last dimension of that output seems to be 20.

Thank you!

JALI rig annotation

Hi Yang,

Thanks for sharing your work on this topic. I'm wondering how to construct a JALI-compatible face rig and how to annotate it for training, as in your project. We want to try it on our own character.

Thanks in advance.

Training from scratch

Hello,
Thank you for providing the code for your paper! I really appreciate it!
I would like to train VisemeNet from scratch, meaning I would have a directory of wav files as input. Any advice on how I should use train_visemenet.py?
Currently it takes a single test file as an argument, and I cannot get the script running for training.

Any chance to get the training code?

Hi yzhou359,
I am an animator and a beginner in AI technology, and I'm very interested in the VisemeNet_tensorflow project. I'm trying to train VisemeNet and the JALI rig on my own speech animation data. Is there any chance to get the training code to do this?

NotFoundError

Hi, thanks for sharing! I am currently trying to run your code on my machine. When I run "python main_test.py", I get some errors. The output is below; is there something wrong with my settings?

~/miniconda3/bin/VisemeNet_tensorflow$ python main_test.py
Warning: dir data/csv/visemenet_intro/ already exist! Continue program...
Warning: dir data/csv/visemenet_intro/test/ already exist! Continue program...

==================== Processing file data/test_audio/visemenet_intro.wav ====================
FPS: 25
WARNING:root:frame length (1103) is greater than FFT size (512), frame will be truncated. Increase NFFT to avoid.
WARNING:root:frame length (1103) is greater than FFT size (512), frame will be truncated. Increase NFFT to avoid.
WARNING:root:frame length (1103) is greater than FFT size (512), frame will be truncated. Increase NFFT to avoid.
Load #Clip 0/1, wav (1484, 65)
Save test - wav file as shape of (1484, 24)
WARNING:tensorflow:From /home/wangqianyun/miniconda3/bin/VisemeNet_tensorflow/src/model.py:210: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

WARNING:tensorflow:From /home/wangqianyun/miniconda3/bin/VisemeNet_tensorflow/src/model.py:210: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

2019-03-24 13:31:42.031331: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-03-24 13:31:42.095617: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-24 13:31:42.096053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:02:00.0
totalMemory: 11.92GiB freeMemory: 6.68GiB
2019-03-24 13:31:42.152892: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-24 13:31:42.153334: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 1 with properties:
name: Tesla K40c major: 3 minor: 5 memoryClockRate(GHz): 0.745
pciBusID: 0000:01:00.0
totalMemory: 11.17GiB freeMemory: 11.08GiB
2019-03-24 13:31:42.153356: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0, 1
2019-03-24 13:31:42.783123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-24 13:31:42.783155: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0 1
2019-03-24 13:31:42.783161: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N N
2019-03-24 13:31:42.783165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 1: N N
2019-03-24 13:31:42.783471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6445 MB memory) -> physical GPU (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0, compute capability: 5.2)
2019-03-24 13:31:42.783899: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10741 MB memory) -> physical GPU (device: 1, name: Tesla K40c, pci bus id: 0000:01:00.0, compute capability: 3.5)
Warning: dir data/output_viseme/ already exist! Continue program...
2019-03-24 13:31:43.193457: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key net1_rnn/multi_rnn_cell/cell_0/lstm_cell/bias not found in checkpoint
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key net1_rnn/multi_rnn_cell/cell_0/lstm_cell/bias not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
[[Node: save/RestoreV2/_301 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_306_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main_test.py", line 14, in
test(model_name='pretrain_biwi', test_audio_name=test_audio_name[:-4])
File "/home/wangqianyun/miniconda3/bin/VisemeNet_tensorflow/src/train_visemenet.py", line 33, in test
saver.restore(sess, OLD_CHECKPOINT_FILE)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1802, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key net1_rnn/multi_rnn_cell/cell_0/lstm_cell/bias not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
[[Node: save/RestoreV2/_301 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_306_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Caused by op 'save/RestoreV2', defined at:
File "main_test.py", line 14, in
test(model_name='pretrain_biwi', test_audio_name=test_audio_name[:-4])
File "/home/wangqianyun/miniconda3/bin/VisemeNet_tensorflow/src/train_visemenet.py", line 25, in test
saver = tf.train.Saver(max_to_keep=max_to_keep)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1338, in init
self.build()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1347, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1384, in _build
build_save=build_save, build_restore=build_restore)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 835, in _build_internal
restore_sequentially, reshape)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 472, in _AddRestoreOps
restore_sequentially)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 886, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1463, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1718, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

NotFoundError (see above for traceback): Key net1_rnn/multi_rnn_cell/cell_0/lstm_cell/bias not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
[[Node: save/RestoreV2/_301 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_306_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
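For anyone hitting this, a minimal TF 1.x sketch for diagnosing the mismatch is to list the variable names stored in the checkpoint and compare them with what the graph expects. The checkpoint path below is an assumption; substitute whatever path main_test.py restores from.

import tensorflow as tf

# List every variable name (and shape) stored in the checkpoint.
reader = tf.train.NewCheckpointReader('data/ckpt/pretrain_biwi/pretrain_biwi')
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)

# If 'net1_rnn/multi_rnn_cell/cell_0/lstm_cell/bias' is absent here but a
# similarly named key exists (e.g. a 'basic_lstm_cell' variant), the graph
# was built under a different TensorFlow version than the one that saved
# the checkpoint, and the LSTM variable scopes no longer line up.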

Can this be used with Epic Metahuman?

I'm new to this area, but I'm wondering whether the JALI visemes can be used with MetaHuman models, or whether I should modify the MetaHuman rig to make it work?

Custom visemes

Hi @yzhou359, thank you so much for sharing your work. I was trying to generate visemes with TTS, and this repo is the only resource I found on GitHub related to viseme generation. However, I think its output is usable only in specific software. I was wondering if you could help me understand the generated visemes and how I can adapt them to my use case.
I will really appreciate your comment.

How to normalize the audio features?

Hi, I am working on a similar speech animation project. I am currently using the architecture and audio features from this paper for the phoneme prediction part, but I am a bit confused about the feature normalization approach in the code. For the 65-dim audio feature input, your code seems to normalize with the mean and std of the whole training set. However, this may not be a good approach in my case, because my training set is not big enough and may have a different distribution from the test set. After a bit of googling, I tried feature-wise normalization instead: normalizing the MFB, MFCC and SSC features by the per-clip mean and std. I don't know whether this works; do you have any suggestions for the feature normalization process?
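For reference, a minimal sketch of the two schemes being compared, in NumPy. Here feats is a (frames, 65) feature matrix for one clip, and train_mean/train_std stand in for statistics precomputed over the whole training set.

import numpy as np

def normalize_global(feats, train_mean, train_std):
    # Corpus-level statistics shared across all clips.
    return (feats - train_mean) / (train_std + 1e-8)

def normalize_per_clip(feats):
    # Feature-wise normalization with this clip's own statistics,
    # in the spirit of cepstral mean-variance normalization (CMVN).
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)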

Using default MFCC feature extraction parameters will truncate frames

Hi, yzhou,

When trying to reproduce the test, I found that the MFCC features are not correctly extracted with the default mfcc parameters. The number of samples in each frame is 25 ms × 44.1 kHz ≈ 1103 samples, which is greater than nfft=512, so each frame gets truncated.

Did you downsample the wav file before sending it to the MFCC feature extractor, or did you set nfft=2048 to avoid this?

Thanks.
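Either workaround avoids the truncation warning. A minimal sketch of both, assuming the features come from python_speech_features (the library whose warning text matches the log above):

import scipy.io.wavfile as wav
import scipy.signal
from python_speech_features import mfcc

rate, sig = wav.read('data/test_audio/visemenet_intro.wav')  # e.g. 44.1 kHz

# Option 1: raise the FFT size so a 25 ms window at 44.1 kHz
# (0.025 * 44100 = 1102.5, i.e. 1103 samples) fits without truncation.
feat = mfcc(sig, samplerate=rate, winlen=0.025, winstep=0.01, nfft=2048)

# Option 2: resample to 16 kHz first, where a 25 ms window is only
# 400 samples and the default nfft=512 is sufficient.
sig16 = scipy.signal.resample_poly(sig, up=16000, down=rate)
feat16 = mfcc(sig16, samplerate=16000)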
