
lipnet's Introduction

LipNet: End-to-End Sentence-level Lipreading

Keras implementation of the method described in the paper 'LipNet: End-to-End Sentence-level Lipreading' by Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, and Nando de Freitas (https://arxiv.org/abs/1611.01599).

LipNet performing prediction (subtitle alignment only for visualization)

Results

Scenario                   Epoch   CER      WER      BLEU
Unseen speakers [C]        N/A     N/A      N/A      N/A
Unseen speakers            178     6.19%    14.19%   88.21%
Overlapped speakers [C]    N/A     N/A      N/A      N/A
Overlapped speakers        368     1.56%    3.38%    96.93%

Notes:

  • [C] means using curriculum learning.
  • N/A means the training is either still in progress or has not been performed yet.
  • Contributions sharing results obtained with this model are highly appreciated :)

Dependencies

  • Keras 2.0+
  • Tensorflow 1.0+
  • PIP (for package installation)

Plus several other libraries listed in setup.py.

Usage

To use the model, first you need to clone the repository:

git clone https://github.com/rizkiarm/LipNet

Then you can install the package:

cd LipNet/
pip install -e .

Note: if you don't want to use CUDA, edit setup.py and change tensorflow-gpu to tensorflow.
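
For reference, the change is a one-word swap in the dependency list. A minimal sketch of what the relevant part of setup.py looks like after the edit (illustrative only; the real file declares more dependencies):

    # setup.py (fragment, illustrative) -- swap 'tensorflow-gpu' for 'tensorflow' when no CUDA GPU is available
    from setuptools import setup, find_packages

    setup(
        name='lipnet',
        packages=find_packages(),
        install_requires=[
            'keras>=2.0.0',
            'tensorflow',  # was 'tensorflow-gpu'
            # ... the other libraries listed in the actual setup.py
        ],
    )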

You're done!

Here are some ideas for what you can do next:

  • Modify the package and make some improvements to it.
  • Train the model using predefined training scenarios.
  • Make your own training scenarios.
  • Use pre-trained weights to do lipreading.
  • Go crazy and experiment on other datasets by changing some hyperparameters or modifying the model.

Dataset

This model uses the GRID corpus (http://spandh.dcs.shef.ac.uk/gridcorpus/).

Pre-trained weights

For those of you who are having difficulties in training the model (or just want to see the end results), you can download and use the weights provided here: https://github.com/rizkiarm/LipNet/tree/master/evaluation/models.

More detail on saving and loading weights can be found in the Keras FAQ.
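
In Keras terms, using the provided checkpoint boils down to calling load_weights on a model with a matching architecture. A rough sketch, assuming the package exposes a LipNet class the way evaluation/predict.py uses it (the import path, constructor arguments, and .model attribute are assumptions taken from this repo's scripts; see predict.py for the exact, working code):

    # Illustrative sketch -- see evaluation/predict.py for the authoritative version.
    from lipnet.model2 import LipNet  # assumed import, mirroring the evaluation scripts

    lipnet = LipNet(img_c=3, img_w=100, img_h=50, frames_n=75,
                    absolute_max_string_len=32, output_size=28)
    lipnet.model.load_weights('evaluation/models/overlapped-weights368.h5')  # standard Keras weight loading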

Training

There are five different training scenarios that are (going to be) available:

Prerequisites

  1. Download all videos (normal) and aligns from the GRID Corpus website.
  2. Extract all the videos and aligns.
  3. Create a datasets folder in each training scenario folder.
  4. Create an align folder inside the datasets folder.
  5. Every current train.py expects the videos to be 100x50px mouth-crop image frames. You can change this by adding vtype = "face" and face_predictor_path (which can be found in evaluation/models) to the instantiation of the generator inside train.py; see the sketch after this list.
  6. Alternatively, extract the mouth-crop images using scripts/extract_mouth_batch.py (usage can be found inside the script).
  7. Create a symlink from each training/*/datasets/align to your align folder.
  8. You can change the training parameters by modifying train.py inside its respective scenario.
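
The sketch below illustrates the change mentioned in step 5. It is only a sketch: the generator class name and keyword arguments shown here are taken from this repo's train.py scripts and issue reports, so check your scenario's train.py for the exact call:

    # Sketch for step 5: feed full-face frames and let the generator crop the mouth on the fly.
    # Keep every other argument of the original generator call unchanged.
    lip_gen = BasicGenerator(dataset_path=DATASET_DIR,
                             vtype='face',                        # inputs are face videos, not 100x50 mouth crops
                             face_predictor_path=PREDICTOR_PATH,  # path to the dlib 68-landmark predictor in your checkout
                             minibatch_size=minibatch_size,
                             img_c=img_c, img_w=img_w, img_h=img_h, frames_n=frames_n)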

Random split (Unmaintained)

Create a symlink from training/random_split/datasets/video to your video dataset folder (which contains the s* directories).

Train the model using the following command:

./train random_split [GPUs (optional)]

Note: You can change the validation split by modifying the val_split argument inside train.py.

Unseen speakers

Create the following folders:

  • training/unseen_speakers/datasets/train
  • training/unseen_speakers/datasets/val

Then, create symlinks from training/unseen_speakers/datasets/[train|val]/s* to your selection of s* folders inside the video dataset folder.

The paper used s1, s2, s20, and s22 for evaluation and the remainder for training.
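
As an illustration of that layout, the snippet below symlinks the paper's split. It is a sketch, not part of the repo; it assumes your extracted GRID videos live under ~/GRID/video/ with one s* folder per speaker, and that you run it from the repository root:

    import os

    GRID_VIDEO_DIR = os.path.expanduser('~/GRID/video')   # assumed location of the extracted GRID videos
    DATASET_DIR = 'training/unseen_speakers/datasets'
    VAL_SPEAKERS = {'s1', 's2', 's20', 's22'}              # evaluation speakers used in the paper

    for speaker in sorted(os.listdir(GRID_VIDEO_DIR)):
        if not speaker.startswith('s'):
            continue
        split = 'val' if speaker in VAL_SPEAKERS else 'train'
        dst = os.path.join(DATASET_DIR, split, speaker)
        if not os.path.exists(dst):
            # link each speaker folder into train/ or val/
            os.symlink(os.path.abspath(os.path.join(GRID_VIDEO_DIR, speaker)), dst)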

Train the model using the following command:

./train unseen_speakers [GPUs (optional)]

Unseen speakers with curriculum learning

Set up the data the same way as for unseen speakers.

Note: You can change the curriculum by modifying the curriculum_rules method inside train.py; a rough sketch is shown below.

./train unseen_speakers_curriculum [GPUs (optional)]
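
For orientation, a curriculum rule could look roughly like the sketch below. The field names mirror what the training log prints each epoch (sentence_length, flip_probability, jitter_probability); the exact signature and return type of curriculum_rules are defined in the scenario's train.py, so treat this purely as an illustration of easing from short clips to full sentences:

    # Illustrative only -- match the signature and return type actually used in your train.py.
    def curriculum_rules(epoch):
        if epoch < 10:   # start on single-word clips
            return {'sentence_length': 1, 'flip_probability': 0.5, 'jitter_probability': 0.05}
        if epoch < 20:   # then two-word clips
            return {'sentence_length': 2, 'flip_probability': 0.5, 'jitter_probability': 0.05}
        # finally full sentences (-1 means "use the whole utterance")
        return {'sentence_length': -1, 'flip_probability': 0.5, 'jitter_probability': 0.05}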

Overlapped Speakers

Run the preparation script:

python prepare.py [Path to video dataset] [Path to align dataset] [Number of samples]

Notes:

  • [Path to video dataset] should be a folder with the structure: /s{i}/[video]
  • [Path to align dataset] should be a folder with the structure: /[align].align
  • [Number of samples] should be less than or equal to the smallest number of videos in any s{i} folder, i.e. min(len(ls '/s{i}/*')); see the check below.
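
The check in the last note can be done with a few lines of Python (a sketch; GRID_VIDEO_DIR is an assumed path to your video dataset):

    import glob
    import os

    GRID_VIDEO_DIR = os.path.expanduser('~/GRID/video')   # assumed path to the video dataset
    counts = {s: len(glob.glob(os.path.join(GRID_VIDEO_DIR, s, '*')))
              for s in os.listdir(GRID_VIDEO_DIR) if s.startswith('s')}
    # [Number of samples] must not exceed the smallest per-speaker video count
    print('largest valid [Number of samples]:', min(counts.values()))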

Then run training for each speaker:

python training/overlapped_speakers/train.py s{i}

Overlapped Speakers with curriculum learning

Copy prepare.py from the overlapped_speakers folder to the overlapped_speakers_curriculum folder, then run it as described above for overlapped speakers.

Then run training for each speaker:

python training/overlapped_speakers_curriculum/train.py s{i}

Note: As always, you can change the curriculum by modifying the curriculum_rules method inside train.py.

Evaluation

To evaluate and visualize the trained model on a single video / image frames, you can execute the following command:

./predict [path to weight] [path to video]

Example:

./predict evaluation/models/overlapped-weights368.h5 evaluation/samples/id2_vcd_swwp2s.mpg

Work in Progress

This is a work in progress, so errors are to be expected. If you find implementation errors, please report them by submitting an issue or making a PR. Thanks!

Some todos:

  • Use Stanford-CTC Tensorflow CTC beam search
  • Auto spelling correction
  • Overlapped speakers (and its curriculum) training
  • Integrate language model for beam search
  • RGB normalization over the dataset.
  • Validate CTC implementation in training.
  • Proper documentation
  • Unit tests
  • (Maybe) better curriculum learning.
  • (Maybe) some proper scripts to do dataset stuff.

License

MIT License

lipnet's People

Contributors

michiyosony, rizkiarm


lipnet's Issues

ambiguous training validation loss

Hello, I'm running the model but I'm getting a bad loss. In the sixth epoch I got these scores:
[Epoch 6] Out of 256 samples: [CER: 25.898 - 0.910] [WER: 7.492 - 1.457] [BLEU: 0.314 - 0.314]
Are these scores meaningful, or is the model overfitting? And what is the BLEU score? Is it the accuracy used in speech recognition?

Python 3.x: Compilation error (video files not found)

Dear rizkiarm,

I have compiled the code after converting it to Python 3.x. The compilation succeeds, but a "video file not found" error is being reported. A screenshot is attached for reference. Please help resolve this problem. PyCharm is being used as the IDE.

[screenshot: lipneterror]

Evaluation Protocol of The Results

Thank you for your work !
Based on your code, I got something like this in stats.csv.

Epoch,Samples,Mean CER,Mean CER (Norm),Mean WER,Mean WER (Norm),Mean BLEU,Mean BLEU (Norm)                                                                                                         
0,256,19.73047,0.79531,5.81250,0.96875,0.34846,0.34846
1,256,17.49609,0.70680,5.70312,0.95052,0.40687,0.40687                                                                                                                            

It seems that the CER is calculated based on only 256 samples.
I wrote a simple script based on your functions and model to check the results on all unseen speakers (1, 2, 20, 22), 3971 samples in total, and I can only get about 0.12 CER.

Can you provide results evaluated on all unseen-speaker examples? Thank you.

Prerequisite Issues

Hi, could you please tell us the environment in which you developed this repo?

While I am running python training/overlapped_speakers/train.py s1, an error occurs with the message: ImportError: cannot import name imresize

discussion!

Thanks for your work! I am a postgraduate student interested in lipreading!
Based on your code, I have achieved a rather good result, as follows:

[Epoch 66] Out of 256 samples: [CER: 0.711 - 0.029] [WER: 0.469 - 0.078] [BLEU: 0.937 - 0.937].
This result is on unseen_speakers.
I want to know how many epochs you trained for in the unseen_speakers scenario.
The model you released is 368weights.h5. I assume that is the model reported in the results, trained for 368 epochs on overlapped speakers; is that right?

Thanks a lot !

training issues

@rizkiarm Could you please upload your "overlapped_speakers" folder somewhere (with data in it) or provide some further insight into how to prepare data from the grid dataset to train the model on? I keep getting errors relating to trying to train the model on an empty dataset.

Minimum number of samples

For the minimum number of samples, there is an issue with the data type: the INT type cannot read the minimum number.

[import error] ./predict evaluation/models/weights04.h5 evaluation/samples/id2_vcd_swwp2s.mpg

I am unable to get a smooth execution of the code while running the following line:

./predict evaluation/models/weights04.h5 evaluation/samples/id2_vcd_swwp2s.mpg

Errors:
Traceback (most recent call last):
File "/mnt/c/Users/bismil/Documents/Python Scripts/LipNet-master/evaluation/predict.py", line 1, in
from lipnet.lipreading.videos import Video
ImportError: No module named lipnet.lipreading.videos

Lipreading in Wild dataset

Hi @rizkiarm

Not sure if this is the right place to ask this query.
Have you tried this model on the 'Lip Reading in the Wild' dataset (Joon Son Chung and Zisserman, ACCV'16)?

Thanks.

ValueError: Dimension 0 in both shapes must be equal occurs when using predict method on various images

For some images in the GRID dataset, when using the ./predict method and any weight file, the code gives me an error. One such file is "lrarzn.mpg", which is in the s1 directory of the GRID dataset; however, there are many more files that trigger this error.

ValueError: Dimension 0 in both shapes must be equal, but are 38016 and 1728 for 'Assign_18' (op: 'Assign') with input shapes: [38016,768], [1728,768].

The "Weights" folder is a folder i created in the LipNet root directory for the sake of convenience.

[screenshot: 2017-08-23 19-29-25]

I also encountered this error when processing files. The "custom_evaluation" method is a way for me to evaluate pictures in bulk using the method in predict.py, to make evaluation easier. It should not affect the actual mechanisms of the code in any way.
[screenshot: 2017-08-23 20-19-42]

Preprocessing videos: paper vs this implementation

I'm reading through the LipNet paper and trying to determine whether we're doing the preprocessing described there.

The things that stood out to me were:

  1. "we train on both the regular and the horizontally mirrored image sequence"
    The implementation seems a bit different--see the last question here.

  2. "We augment the sentence-level training data with video clips of individual words as additional training instances. These instances have a decay rate of 0.925"
    This looks like it isn't currently in place in the non-curriculum training, since 'sentence_length' is -1 in unseen_speakers/train.py and overlapped_speakers/train.py, but by modifying this value we have the capability to train on different sentence lengths (though currently each epoch would have sentences of all the same length?)

  3. "To encourage resilience to varying motion speeds by deletion and duplication of frames, this is performed with a per-frame probability of 0.05"
    This looks done, with the deletion/duplication code in in videos.temporal_jitter()!

  4. "We standardize the RGB channels over the whole training set to have zero mean and unit variance"
    I found the line X_data = np.array(X_data).astype(np.float32) / 255 # Normalize image data to [0,1], TODO: mean normalization over training data in generators.py, sounds like it needs to be done :)

Is this an accurate description of the state of the project?

Error Loading video

Have any of you experienced this problem? I was compiling training data for s1, s2 and s3 for random_split using the command ./train random_split, but I am getting responses saying "error loading video".
[screenshot: capture]

I recently added the following line to RandomSplitGenerator in the train.py file to solve the size issue:
"vtype = "face", face_predictor_path='home\souheil\LipNet\common\predictors\shape_predictor_68_face_landmarks.dat',"

However, I am still experiencing errors. Have any of you experienced this problem?

Problems reproducing Unseen speakers results

Thanks a lot for the great job you've done on this project!

I'm having some difficulties reproducing the results you've got on the unseen speakers.

As you mentioned in the Readme, you've reached the following results:

Scenario          Epoch   CER      WER      BLEU
Unseen speakers   178     6.19%    14.19%   88.21%

I'm running on Ubuntu 16.04 - GPU Nvidia 1080TI

I didn't change the code!

I used 28775 videos for training, and 3966 videos for validation (speakers: 1, 2, 20, 22)

but I only got the following results:

Epoch   Samples   Mean CER   Mean CER (Norm)   Mean WER   Mean WER (Norm)   Mean BLEU   Mean BLEU (Norm)
178     256       5.36328    0.22138           1.94531    0.32422           0.6903      0.6903
324     256       4.97656    0.20456           1.61328    0.26888           0.71737     0.71737

  1. Does the 14.19% stand for the Mean WER (Norm)?
  2. Are the results you've posted from running on 256 validation examples, or on all 3966 validation videos for the saved model from epoch 178?
  3. Any ideas what I'm doing wrong and why I am not able to reach the same results as you have?

Thank you!

Error when running ./train unseen_speakers after scripts/extract_mouth_batch.py

Using all available GPUs.
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally

Enumerating dataset list from disk...
Found 5 videos for training.
Found 5 videos for validation.


Layer (type) Output Shape Param #

the_input (InputLayer) (None, 75, 100, 50, 3) 0


zero1 (ZeroPadding3D) (None, 77, 104, 54, 3) 0


conv1 (Conv3D) (None, 75, 50, 25, 32) 7232


batc1 (BatchNormalization) (None, 75, 50, 25, 32) 128


actv1 (Activation) (None, 75, 50, 25, 32) 0


spatial_dropout3d_1 (Spatial (None, 75, 50, 25, 32) 0


max1 (MaxPooling3D) (None, 75, 25, 12, 32) 0


zero2 (ZeroPadding3D) (None, 77, 29, 16, 32) 0


conv2 (Conv3D) (None, 75, 25, 12, 64) 153664


batc2 (BatchNormalization) (None, 75, 25, 12, 64) 256


actv2 (Activation) (None, 75, 25, 12, 64) 0


spatial_dropout3d_2 (Spatial (None, 75, 25, 12, 64) 0


max2 (MaxPooling3D) (None, 75, 12, 6, 64) 0


zero3 (ZeroPadding3D) (None, 77, 14, 8, 64) 0


conv3 (Conv3D) (None, 75, 12, 6, 96) 165984


batc3 (BatchNormalization) (None, 75, 12, 6, 96) 384


actv3 (Activation) (None, 75, 12, 6, 96) 0


spatial_dropout3d_3 (Spatial (None, 75, 12, 6, 96) 0


max3 (MaxPooling3D) (None, 75, 6, 3, 96) 0


time_distributed_1 (TimeDist (None, 75, 1728) 0


bidirectional_1 (Bidirection (None, 75, 512) 3048960


bidirectional_2 (Bidirection (None, 75, 512) 1181184


dense1 (Dense) (None, 75, 28) 14364


softmax (Activation) (None, 75, 28) 0

Total params: 4,572,156.0
Trainable params: 4,571,772.0
Non-trainable params: 384.0


W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.645
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.25GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
Epoch 0: Curriculum(train: True, sentence_length: -1, flip_probability: 0.5, jitter_probability: 0.05)
Epoch 1/5000
Epoch 0: Curriculum(train: True, sentence_length: -1, flip_probability: 0.5, jitter_probability: 0.05)
W tensorflow/core/framework/op_kernel.cc:993] Invalid argument: slice index 0 of dimension 0 out of bounds.
W tensorflow/core/framework/op_kernel.cc:993] Invalid argument: slice index 0 of dimension 0 out of bounds.
[[Node: ctc/scan/strided_slice = StridedSlice[Index=DT_INT32, T=DT_INT32, begin_mask=0, ellipsis_mask=0, end_mask=0, new_axis_mask=0, shrink_axis_mask=1, _device="/job:localhost/replica:0/task:0/gpu:0"](ctc/scan/Shape, ctc/scan/strided_slice/stack, ctc/scan/strided_slice/stack_1, ctc/scan/strided_slice/stack_2)]]
W tensorflow/core/framework/op_kernel.cc:993] Invalid argument: slice index 0 of dimension 0 out of bounds.
[[Node: ctc/scan/strided_slice = StridedSlice[Index=DT_INT32, T=DT_INT32, begin_mask=0, ellipsis_mask=0, end_mask=0, new_axis_mask=0, shrink_axis_mask=1, _device="/job:localhost/replica:0/task:0/gpu:0"](ctc/scan/Shape, ctc/scan/strided_slice/stack, ctc/scan/strided_slice/stack_1, ctc/scan/strided_slice/stack_2)]]
Traceback (most recent call last):
File "/home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/training/unseen_speakers/train.py", line 77, in
train(run_name, 0, 5000, 3, 100, 50, 75, 32, 1)
File "/home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/training/unseen_speakers/train.py", line 72, in train
pickle_safe=True)
File "/home/deepakgupta1313/anaconda3/envs/py27/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 88, in wrapper
return func(*args, **kwargs)
File "/home/deepakgupta1313/anaconda3/envs/py27/lib/python2.7/site-packages/keras/engine/training.py", line 1876, in fit_generator
class_weight=class_weight)
File "/home/deepakgupta1313/anaconda3/envs/py27/lib/python2.7/site-packages/keras/engine/training.py", line 1620, in train_on_batch
outputs = self.train_function(ins)
File "/home/deepakgupta1313/anaconda3/envs/py27/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2073, in call
feed_dict=feed_dict)
File "/home/deepakgupta1313/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/home/deepakgupta1313/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/home/deepakgupta1313/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/home/deepakgupta1313/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: slice index 0 of dimension 0 out of bounds.
[[Node: ctc/scan/strided_slice = StridedSlice[Index=DT_INT32, T=DT_INT32, begin_mask=0, ellipsis_mask=0, end_mask=0, new_axis_mask=0, shrink_axis_mask=1, _device="/job:localhost/replica:0/task:0/gpu:0"](ctc/scan/Shape, ctc/scan/strided_slice/stack, ctc/scan/strided_slice/stack_1, ctc/scan/strided_slice/stack_2)]]
[[Node: batc2/moments/sufficient_statistics/Gather/_155 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_7459_batc2/moments/sufficient_statistics/Gather", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]]

Caused by op u'ctc/scan/strided_slice', defined at:
File "/home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/training/unseen_speakers/train.py", line 77, in
train(run_name, 0, 5000, 3, 100, 50, 75, 32, 1)
File "/home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/training/unseen_speakers/train.py", line 40, in train
absolute_max_string_len=absolute_max_string_len, output_size=lip_gen.get_output_size())
File "/home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/lipnet/model2.py", line 21, in init
self.build()
File "/home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/lipnet/model2.py", line 66, in build
self.loss_out = CTC('ctc', [self.y_pred, self.labels, self.input_length, self.label_length])
File "/home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/lipnet/core/layers.py", line 7, in CTC
return Lambda(ctc_lambda_func, output_shape=(1,), name=name)(args)
File "/home/deepakgupta1313/anaconda3/envs/py27/lib/python2.7/site-packages/keras/engine/topology.py", line 554, in call
output = self.call(inputs, **kwargs)
File "/home/deepakgupta1313/anaconda3/envs/py27/lib/python2.7/site-packages/keras/layers/core.py", line 659, in call
return self.function(inputs, **arguments)
File "/home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/lipnet/core/loss.py", line 11, in ctc_lambda_func
return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
File "/home/deepakgupta1313/anaconda3/envs/py27/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 3258, in ctc_batch_cost
sparse_labels = tf.to_int32(ctc_label_dense_to_sparse(y_true, label_length))
File "/home/deepakgupta1313/anaconda3/envs/py27/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 3222, in ctc_label_dense_to_sparse
initializer=init, parallel_iterations=1)
File "/home/deepakgupta1313/.local/lib/python2.7/site-packages/tensorflow/python/ops/functional_ops.py", line 524, in scan
n = array_ops.shape(elems_flat[0])[0]
File "/home/deepakgupta1313/.local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 495, in _SliceHelper
name=name)
File "/home/deepakgupta1313/.local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 653, in strided_slice
shrink_axis_mask=shrink_axis_mask)
File "/home/deepakgupta1313/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3688, in strided_slice
shrink_axis_mask=shrink_axis_mask, name=name)
File "/home/deepakgupta1313/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/home/deepakgupta1313/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/deepakgupta1313/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in init
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): slice index 0 of dimension 0 out of bounds.
[[Node: ctc/scan/strided_slice = StridedSlice[Index=DT_INT32, T=DT_INT32, begin_mask=0, ellipsis_mask=0, end_mask=0, new_axis_mask=0, shrink_axis_mask=1, _device="/job:localhost/replica:0/task:0/gpu:0"](ctc/scan/Shape, ctc/scan/strided_slice/stack, ctc/scan/strided_slice/stack_1, ctc/scan/strided_slice/stack_2)]]
[[Node: batc2/moments/sufficient_statistics/Gather/_155 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_7459_batc2/moments/sufficient_statistics/Gather", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]]

ValueError: When using a generator for validation data, you must specify a value for `validation_steps`.

Hi, thanks for open-sourcing LipNet. I'm still new to ML and still learning.
I'm trying to train on 10 videos (from GRID) for now (just for testing) using the "unseen_speakers" scenario, and I'm getting an error about validation steps.
I also enabled vtype='face', face_predictor_path=FACE_PREDICTOR_PATH.

Below is the complete log

Using all available GPUs.
Using TensorFlow backend.

Enumerating dataset list from disk...
Found 10 videos for training.
Found 10 videos for validation.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
the_input (InputLayer)       (None, 75, 100, 50, 3)    0
_________________________________________________________________
zero1 (ZeroPadding3D)        (None, 77, 104, 54, 3)    0
_________________________________________________________________
conv1 (Conv3D)               (None, 75, 50, 25, 32)    7232
_________________________________________________________________
batc1 (BatchNormalization)   (None, 75, 50, 25, 32)    128
_________________________________________________________________
actv1 (Activation)           (None, 75, 50, 25, 32)    0
_________________________________________________________________
spatial_dropout3d_1 (Spatial (None, 75, 50, 25, 32)    0
_________________________________________________________________
max1 (MaxPooling3D)          (None, 75, 25, 12, 32)    0
_________________________________________________________________
zero2 (ZeroPadding3D)        (None, 77, 29, 16, 32)    0
_________________________________________________________________
conv2 (Conv3D)               (None, 75, 25, 12, 64)    153664
_________________________________________________________________
batc2 (BatchNormalization)   (None, 75, 25, 12, 64)    256
_________________________________________________________________
actv2 (Activation)           (None, 75, 25, 12, 64)    0
_________________________________________________________________
spatial_dropout3d_2 (Spatial (None, 75, 25, 12, 64)    0
_________________________________________________________________
max2 (MaxPooling3D)          (None, 75, 12, 6, 64)     0
_________________________________________________________________
zero3 (ZeroPadding3D)        (None, 77, 14, 8, 64)     0
_________________________________________________________________
conv3 (Conv3D)               (None, 75, 12, 6, 96)     165984
_________________________________________________________________
batc3 (BatchNormalization)   (None, 75, 12, 6, 96)     384
_________________________________________________________________
actv3 (Activation)           (None, 75, 12, 6, 96)     0
_________________________________________________________________
spatial_dropout3d_3 (Spatial (None, 75, 12, 6, 96)     0
_________________________________________________________________
max3 (MaxPooling3D)          (None, 75, 6, 3, 96)      0
_________________________________________________________________
time_distributed_1 (TimeDist (None, 75, 1728)          0
_________________________________________________________________
bidirectional_1 (Bidirection (None, 75, 512)           3048960
_________________________________________________________________
bidirectional_2 (Bidirection (None, 75, 512)           1181184
_________________________________________________________________
dense1 (Dense)               (None, 75, 28)            14364
_________________________________________________________________
softmax (Activation)         (None, 75, 28)            0
=================================================================
Total params: 4,572,156.0
Trainable params: 4,571,772.0
Non-trainable params: 384.0
_________________________________________________________________
Traceback (most recent call last):
  File "/Users/rad182/Documents/rizkiarm-LipNet/training/unseen_speakers/train.py", line 78, in <module>
    train(run_name, 0, 5000, 3, 100, 50, 75, 32, 50)
  File "/Users/rad182/Documents/rizkiarm-LipNet/training/unseen_speakers/train.py", line 74, in train
    pickle_safe=True)
  File "/usr/local/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/keras/engine/training.py", line 1783, in fit_generator
    raise ValueError('When using a generator for validation data, '
ValueError: When using a generator for validation data, you must specify a value for `validation_steps`.

Video shape issues and Intel MKL FATAL ERROR: Cannot load libmkl_core.so.

I am running :
./train unseen_speakers

  1. When I don't do anything regarding vface and face_predictor path.
    I get the following error :
    Using all available GPUs.
    Using TensorFlow backend.
    I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
    I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
    I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
    I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
    I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally

Enumerating dataset list from disk...
Video /home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/training/unseen_speakers/datasets/train/s1/pbwc8n.mpg has incorrect shape (75, 360, 288, 3), must be (75, 100, 50, 3)
Video /home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/training/unseen_speakers/datasets/train/s1/bwwn8p.mpg has incorrect shape (75, 360, 288, 3), must be (75, 100, 50, 3)
Video /home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/training/unseen_speakers/datasets/train/s1/swio2p.mpg has incorrect shape (75, 360, 288, 3), must be (75, 100, 50, 3)
Video /home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/training/unseen_speakers/datasets/train/s1/lgifzn.mpg has incorrect shape (75, 360, 288, 3), must be (75, 100, 50, 3)
Video /home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/training/unseen_speakers/datasets/train/s1/srih2p.mpg has incorrect shape (75, 360, 288, 3), must be (75, 100, 50, 3)
Video /home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/training/unseen_speakers/datasets/train/s1/brbm6n.mpg has incorrect shape (75, 360, 288, 3), must be (75, 100, 50, 3)
Video /home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/training/unseen_speakers/datasets/train/s1/lbid3s.mpg has incorrect shape (75, 360, 288, 3), must be (75, 100, 50, 3)
Video /home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/training/unseen_speakers/datasets/train/s1/bwim6p.mpg has incorrect shape (75, 360, 288, 3), must be (75, 100, 50, 3)
Video /home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/training/unseen_speakers/datasets/train/s1/bbil5a.mpg has incorrect shape (75, 360, 288, 3), must be (75, 100, 50, 3)
Video /home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/training/unseen_speakers/datasets/train/s1/swbp1a.mpg has incorrect shape (75, 360, 288, 3), must be (75, 100, 50, 3)
Video /home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/training/unseen_speakers/datasets/train/s1/lbiq3a.mpg has incorrect shape (75, 360, 288, 3), must be (75, 100, 50, 3)
Video /home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/training/unseen_speakers/datasets/train/s1/bbil4p.mpg has incorrect shape (75, 360, 288, 3), must be (75, 100, 50, 3)
Video /home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/training/unseen_speakers/datasets/train/s1/prwq4p.mpg has incorrect shape (75, 360, 288, 3), must be (75, 100, 50, 3)

  2. When I make the following changes in train.py, please tell me if the change is correct:
    lip_gen = BasicGenerator(dataset_path=DATASET_DIR, vtype = "face", face_predictor_path="/home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/common/predictors/shape_predictor_68_face_landmarks.dat", minibatch_size=minibatch_size, img_c=img_c, img_w=img_w, img_h=img_h, frames_n=frames_n,

I get the following error:
Using all available GPUs.
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally

Enumerating dataset list from disk...
Intel MKL FATAL ERROR: Cannot load libmkl_core.so.
Any help is appreciated.

About the 'Ran out of memory' problem

I ran the script ./train unseen_speakers
and trained it with train(run_name, 0, 1000, 3, 100, 50, 75, 32, 4) (1000 epochs and four videos),
but after running for 145 epochs I got this problem:

I tensorflow/core/common_runtime/bfc_allocator.cc:696] 19 Chunks of size 786432 totalling 14.25MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 12 Chunks of size 1572864 totalling 18.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1720320 totalling 1.64MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 2073600 totalling 1.98MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 2365440 totalling 2.26MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 13 Chunks of size 5308416 totalling 65.81MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 5529600 totalling 5.27MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 6099712 totalling 5.82MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 8294400 totalling 23.73MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 11520000 totalling 10.99MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 13630464 totalling 13.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 14745600 totalling 14.06MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 17508608 totalling 16.70MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 20756736 totalling 19.79MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 5 Chunks of size 23040000 totalling 109.86MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 36476416 totalling 34.79MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 5 Chunks of size 48000000 totalling 228.88MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 1.38GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 1504051200
InUse: 1478468352
MaxInUse: 1504051200
NumAllocs: 91657676
MaxAllocSize: 509946624

W tensorflow/core/common_runtime/bfc_allocator.cc:274] ****************************************************************************************************
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 7.91MiB. See logs for memory state.

I use a GeForce GTX 960; is this problem related to my GPU's memory?
Has anyone else run into this problem?
Or can anyone tell me which GPU you use,
how many epochs you run, and how many videos you train on?
Thanks very much!

about grid dataset

Hello, I want to run the code with my own dataset, but I don't understand the GRID dataset's align text:
0 14000 sil
14000 19750 bin
19750 25000 blue
25000 30250 by
30250 38750 s
38750 49000 five
49000 61000 please
61000 74500 sil
What do the numbers before each word mean, and how can I convert my data to train the code? Thank you @rizkiarm @michiyosony

Expecting a directory, but getting a video in unseen speakers

Look at the last few lines of the error.

File "/home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/lipnet/lipreading/generators.py", line 209, in next_train ret = self.get_batch(cur_train_index, self.minibatch_size, train=True) File "/home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/lipnet/lipreading/generators.py", line 147, in get_batch video = Video().from_frames(path) File "/home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/lipnet/lipreading/videos.py", line 114, in from_frames frames_path = sorted([os.path.join(path, x) for x in os.listdir(path)]) OSError: [Errno 20] Not a directory: '/home/deepakgupta1313/Desktop/Deepak/Programs/Github/LipNet/training/unseen_speakers/datasets/train/s1/bbal6n.mpg'

Are there any pre-trained models?

Please help.
My computer is too old, so I could not do any training.
Are there any pre-trained models you can help me with?
Even if the accuracy is bad, I need it for my graduation project,
just to prove the concept.
Thanks a lot.

Need help on setup

I'm installing on macOS Sierra 10.12.4, using Python 3.6.
Pip version: pip 9.0.1 from /usr/local/lib/python3.6/site-packages (python 3.6)

When I run pip3 install -e . it shows me this error. Does anyone know how to fix it?

Build using cmake ...
Scanning dependencies of target dlib
[ 0%] Building CXX object dlib_build/CMakeFiles/dlib.dir/base64/base64_kernel_1.cpp.o
[ 1%] Building CXX object dlib_build/CMakeFiles/dlib.dir/bigint/bigint_kernel_1.cpp.o
[ 2%] Building CXX object dlib_build/CMakeFiles/dlib.dir/bigint/bigint_kernel_2.cpp.o
.....
[ 91%] Building CXX object CMakeFiles/dlib_.dir/src/other.cpp.o
/private/var/folders/c6/vl7hxs354zjfz4qdgmn258900000gn/T/pip-build-ja5_ssrg/dlib/tools/python/src/other.cpp:56:1: error: reference to 'list' is ambiguous
list _max_cost_assignment (
^
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/list:777:29: note: candidate found by name lookup is 'std::__1::list'
class _LIBCPP_TYPE_VIS_ONLY list
^
/usr/local/include/boost/python/list.hpp:57:7: note: candidate found by name lookup is 'boost::python::list'
class list : public detail::list_base
^
/private/var/folders/c6/vl7hxs354zjfz4qdgmn258900000gn/T/pip-build-ja5_ssrg/dlib/tools/python/src/other.cpp:72:11: error: reference to 'list' is ambiguous
const list& assignment
^
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/list:777:29: note: candidate found by name lookup is 'std::__1::list'
class _LIBCPP_TYPE_VIS_ONLY list
^
/usr/local/include/boost/python/list.hpp:57:7: note: candidate found by name lookup is 'boost::python::list'
class list : public detail::list_base
^
2 errors generated.
make[2]: *** [CMakeFiles/dlib_.dir/src/other.cpp.o] Error 1
make[1]: *** [CMakeFiles/dlib_.dir/all] Error 2
make: *** [all] Error 2
error: cmake build failed!

a question about the net...

code in model2.py, line 52:
self.resh1 = TimeDistributed(Flatten())(self.maxp3)

Could you please tell me the input shape and output shape, i.e. the shapes of self.maxp3 and self.resh1?
Thank you~

ValueError: output of generator should be a tuple `(x, y, sample_weight)` or `(x, y)`. Found: None

Hi, I'm using the GRID s1 sample videos to train (unseen_speakers) and I'm getting this error.
Any idea?

Using all available GPUs.
Using TensorFlow backend.

Enumerating dataset list from disk...
Found 10 videos for training.
Found 10 videos for validation.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
the_input (InputLayer)       (None, 75, 100, 50, 3)    0
_________________________________________________________________
zero1 (ZeroPadding3D)        (None, 77, 104, 54, 3)    0
_________________________________________________________________
conv1 (Conv3D)               (None, 75, 50, 25, 32)    7232
_________________________________________________________________
batc1 (BatchNormalization)   (None, 75, 50, 25, 32)    128
_________________________________________________________________
actv1 (Activation)           (None, 75, 50, 25, 32)    0
_________________________________________________________________
spatial_dropout3d_1 (Spatial (None, 75, 50, 25, 32)    0
_________________________________________________________________
max1 (MaxPooling3D)          (None, 75, 25, 12, 32)    0
_________________________________________________________________
zero2 (ZeroPadding3D)        (None, 77, 29, 16, 32)    0
_________________________________________________________________
conv2 (Conv3D)               (None, 75, 25, 12, 64)    153664
_________________________________________________________________
batc2 (BatchNormalization)   (None, 75, 25, 12, 64)    256
_________________________________________________________________
actv2 (Activation)           (None, 75, 25, 12, 64)    0
_________________________________________________________________
spatial_dropout3d_2 (Spatial (None, 75, 25, 12, 64)    0
_________________________________________________________________
max2 (MaxPooling3D)          (None, 75, 12, 6, 64)     0
_________________________________________________________________
zero3 (ZeroPadding3D)        (None, 77, 14, 8, 64)     0
_________________________________________________________________
conv3 (Conv3D)               (None, 75, 12, 6, 96)     165984
_________________________________________________________________
batc3 (BatchNormalization)   (None, 75, 12, 6, 96)     384
_________________________________________________________________
actv3 (Activation)           (None, 75, 12, 6, 96)     0
_________________________________________________________________
spatial_dropout3d_3 (Spatial (None, 75, 12, 6, 96)     0
_________________________________________________________________
max3 (MaxPooling3D)          (None, 75, 6, 3, 96)      0
_________________________________________________________________
time_distributed_1 (TimeDist (None, 75, 1728)          0
_________________________________________________________________
bidirectional_1 (Bidirection (None, 75, 512)           3048960
_________________________________________________________________
bidirectional_2 (Bidirection (None, 75, 512)           1181184
_________________________________________________________________
dense1 (Dense)               (None, 75, 28)            14364
_________________________________________________________________
softmax (Activation)         (None, 75, 28)            0
=================================================================
Total params: 4,572,156.0
Trainable params: 4,571,772.0
Non-trainable params: 384.0
_________________________________________________________________
nextVal [<lipnet.helpers.threadsafe.threadsafe_iter instance at 0x1155eecf8>]
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Epoch 0: Curriculum(train: True, sentence_length: -1, flip_probability: 0.5, jitter_probability: 0.05)
Process Process-1:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/site-packages/keras/engine/training.py", line 606, in data_generator_task
Epoch 1/5000
Traceback (most recent call last):
  File "/Users/rad182/Documents/rizkiarm-LipNet/training/unseen_speakers/train.py", line 78, in <module>
    train(run_name, 0, 5000, 3, 100, 50, 75, 32, 10)
  File "/Users/rad182/Documents/rizkiarm-LipNet/training/unseen_speakers/train.py", line 74, in train
    pickle_safe=True)
  File "/usr/local/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/keras/engine/training.py", line 1851, in fit_generator
    str(generator_output))
ValueError: output of generator should be a tuple `(x, y, sample_weight)` or `(x, y)`. Found: None

Training error for overlapped speakers (exit status 1 error)

I tried running the training step in the form python prepare.py [Path to video dataset] [Path to align dataset] [Number of samples]. Below is the code I ran (screenshot below):
[screenshot: lipnet]

These are the parameter I used in my code:
[Path to video dataset] - C:\Users\SFeng\Miniconda2\envs\py3k\LipNet\training\overlapped_speakers\datasets\video
[Path to align dataset] - C:\Users\SFeng\Miniconda2\envs\py3k\LipNet\training\overlapped_speakers\datasets\align
[Number of samples] - 3
I have been using datasets s1, s2 and s3; below are screenshots of what my align and video directories look like:
[screenshots: align and video directories]

Is there any problem with the way I have set up my directories?

[Question] using saliency.py

@rizkiarm Thanks for open-sourcing this implementation! It looks very interesting.

I've been trying out the pre-trained model in /evaluation and have successfully used predict.py. When I try to run saliency.py, however, I get this error:

Traceback (most recent call last):
  File "saliency.py", line 9, in <module>
    from vis.visualization import visualize_saliency
ImportError: No module named vis.visualization

I've been looking for a package named vis on the internet with no success. Can you clarify what this dependency is and where to find it?

Input shape error when training random_split

I keep getting this message when training random_split:
ValueError: Error when checking input: expected the_input to have shape (None, 75, 100, 50, 3) but got array with shape (50, 75, 360, 288, 3)
[screenshot: incorrect-shape]

This is despite only running training on videos that have the correct shape. How can I fix this?

Videos seen by model each epoch

My (very potentially incorrect) understanding of an "epoch" is a set of iterations over which the model is exposed to each item in the training set one time.

In trying to understand the system better, I created a very small training set composed of

s1/
    s1lbax4n
    s1swwp2s
    s1pwij3p
    s1bbaf2n
s2/
    s2lbax4n
    s2swwp2s
    s2pwij3p
    s2bbaf2n

and the corresponding .align files.

I modified unseen_speakers/train.py to train using the line

train(run_name, 0, 1, 3, 100, 50, 75, 32, 2)

so training would run for 1 epoch on a batch size of 2.

My output looks like this:

epoch is: 0
Epoch 0: Curriculum(train: True, sentence_length: -1, flip_probability: 0.5, jitter_probability: 0.05)
Train [0,1] 0:2
Epoch 1/1
epoch is: 0
Epoch 0: Curriculum(train: True, sentence_length: -1, flip_probability: 0.5, jitter_probability: 0.05)
Train [0,1] 2:4
In Curriculum.apply: NOT flipping video s2/s2swwp2s
In Curriculum.apply: NOT flipping video s1/s1lbax4n
In Curriculum.apply: NOT flipping video s1/s1swwp2s
In Curriculum.apply: flipping video s1/s1bbaf2n
Train [0,0] 4:6
Train [0,0] 6:8
In Curriculum.apply: flipping video s1/s1pwij3p
In Curriculum.apply: NOT flipping video s2/s2bbaf2n
In Curriculum.apply: NOT flipping video s2/s2lbax4n
In Curriculum.apply: NOT flipping video s2/s2pwij3p
Train [0,0] 0:2
Train [0,0] 2:4
In Curriculum.apply: flipping video s1/s1lbax4n
In Curriculum.apply: NOT flipping video s2/s2swwp2s
In Curriculum.apply: NOT flipping video s1/s1swwp2s
In Curriculum.apply: NOT flipping video s1/s1bbaf2n
Train [0,0] 4:6
Train [0,0] 6:8
In Curriculum.apply: NOT flipping video s1/s1pwij3p
In Curriculum.apply: flipping video s2/s2bbaf2n
In Curriculum.apply: flipping video s2/s2lbax4n
In Curriculum.apply: flipping video s2/s2pwij3p
1/4 [======>.......................] - ETA: 255s - loss: 191.3861Train [0,0] 0:2
In Curriculum.apply: flipping video s2/s2swwp2s
In Curriculum.apply: NOT flipping video s1/s1swwp2s

2/4 [==============>...............] - ETA: 168s - loss: 183.9747Train [0,0] 2:4
In Curriculum.apply: flipping video s1/s1lbax4n
In Curriculum.apply: flipping video s1/s1bbaf2n

3/4 [=====================>........] - ETA: 83s - loss: 180.0006 Train [0,0] 4:6
In Curriculum.apply: flipping video s1/s1pwij3p
In Curriculum.apply: NOT flipping video s2/s2lbax4n
epoch is: 0
Epoch 0: Curriculum(train: False, sentence_length: -1, flip_probability: 0.5, jitter_probability: 0.05)
epoch is: 0
Epoch 0: Curriculum(train: False, sentence_length: -1, flip_probability: 0.5, jitter_probability: 0.05)
epoch is: 0
Epoch 0: Curriculum(train: False, sentence_length: -1, flip_probability: 0.5, jitter_probability: 0.05)


[Epoch 0] Out of 256 samples: [CER: 30.250 - 1.440] [WER: 6.000 - 1.000] [BLEU: 0.325 - 0.325]

/Users/michiyosony/tensorflow/lib/python2.7/site-packages/nltk/translate/bleu_score.py:472: UserWarning: 
Corpus/Sentence contains 0 counts of 2-gram overlaps.
BLEU scores might be undesirable; use SmoothingFunction().
  warnings.warn(_msg)

4/4 [==============================] - 1326s - loss: 173.2259 - val_loss: 145.0103

Process finished with exit code 0

Why does it appear that the model is exposed to 22 videos during the first epoch? From the paper, I would have expected 16 (the 8 training videos + 8 horizontally flipped training videos).

The 16 original videos loaded can be seen (organized) here (asterisks added):

In Curriculum.apply: flipping video s1/s1bbaf2n
In Curriculum.apply: flipping video s1/s1pwij3p
In Curriculum.apply: flipping video s1/s1lbax4n
**In Curriculum.apply: NOT flipping video s1/s1swwp2s**
**In Curriculum.apply: NOT flipping video s1/s1swwp2s**
In Curriculum.apply: NOT flipping video s1/s1bbaf2n
In Curriculum.apply: NOT flipping video s1/s1pwij3p
In Curriculum.apply: NOT flipping video s1/s1lbax4n

In Curriculum.apply: flipping video s2/s2bbaf2n
In Curriculum.apply: flipping video s2/s2lbax4n
In Curriculum.apply: flipping video s2/s2pwij3p
**In Curriculum.apply: NOT flipping video s2/s2swwp2s**
**In Curriculum.apply: NOT flipping video s2/s2swwp2s**
In Curriculum.apply: NOT flipping video s2/s2bbaf2n
In Curriculum.apply: NOT flipping video s2/s2lbax4n
In Curriculum.apply: NOT flipping video s2/s2pwij3p

In Curriculum.py I can see that each video has a 50% chance of being flipped horizontally. This looks like a slightly different implementation of "...we train on both the regular and the horizontally mirrored image sequence." (LipNet). Is there a motivation for leaving it to chance whether both a video and its mirror will be included (as opposed to the same video twice, as seen in the asterisked examples above)?

Issue with the preprocessing

Hi,
I attempted to use your code on the GRID dataset. In my case, I had processed the frames to be a tighter fit to the face and saw far worse performance from the method. I traced it back to the way that you cut the mouth regions. It seems to me that you want to pad the mouth by 38% of its length (19% on each side). However, the way you have done this is, I believe, wrong, since you take the x coordinates of the mouth edges and multiply them by 0.81 and 1.19 respectively and then calculate the width in order to normalize. This way is dependent on the location of the mouth.

For example, if the mouth edges are at x_left = 100 and x_right = 200, then you calculate the width with padding to be 200 x 1.19 - 100 x 0.81 = 157. Let's assume we have the same mouth at a different position in the image, x_left = 1100 and x_right = 1200; then we calculate 1200 x 1.19 - 1100 x 0.81 = 537, which is drastically different even though the mouth actually has the same size. What you actually want is to find the width = x_right - x_left, then take 19% of it (i.e. 0.19 x width) and add and subtract that to the edges respectively.

In your case the mouths are around x = 200 to 400, which leads to taking quite a lot of padding. If you don't want this, you might have to retrain with the new way of cropping the mouth. Also, anyone who has a different cropping of the GRID database will not be able to use your code out of the box.
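
A minimal sketch of the padding scheme described above, assuming x_left and x_right are the x-coordinates of the detected mouth corners (illustrative, not the repo's code):

    def padded_mouth_bounds(x_left, x_right, pad_fraction=0.19):
        width = x_right - x_left
        pad = pad_fraction * width          # padding proportional to mouth width, not to its position
        return x_left - pad, x_right + pad

    # The same mouth width now yields the same padded width regardless of position:
    print(padded_mouth_bounds(100, 200))    # (81.0, 219.0)    -> padded width 138
    print(padded_mouth_bounds(1100, 1200))  # (1081.0, 1219.0) -> padded width 138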

problem when transcribing a new video

Thank you for your great work! My question: when I tried a new video and wanted to generate the transcription from the lip movements, I only got unrelated words such as [bin/green/c/nine/soon...]. Are there any tricks for correctly running the model on an arbitrary video? Thanks in advance.

User warning

I am having the following issue:
Update your fit_generator call to the Keras 2 API
What should I do about it?
I know I am asking a lot, but I am working on a graduation project about lip reading,
so please, if anyone could help me.
Thanks a lot

Error while training overlapped speakers (question updated)

Can anyone help me with that?

This is the output while I am training overlapped speakers:


Loading dataset list from cache...
Found 950 videos for training.
Found 50 videos for validation.

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
the_input (InputLayer)       (None, 75, 360, 288, 3)   0         
_________________________________________________________________
zero1 (ZeroPadding3D)        (None, 77, 364, 292, 3)   0         
_________________________________________________________________
conv1 (Conv3D)               (None, 75, 180, 144, 32)  7232      
_________________________________________________________________
batc1 (BatchNormalization)   (None, 75, 180, 144, 32)  128       
_________________________________________________________________
actv1 (Activation)           (None, 75, 180, 144, 32)  0         
_________________________________________________________________
spatial_dropout3d_1 (Spatial (None, 75, 180, 144, 32)  0         
_________________________________________________________________
max1 (MaxPooling3D)          (None, 75, 90, 72, 32)    0         
_________________________________________________________________
zero2 (ZeroPadding3D)        (None, 77, 94, 76, 32)    0         
_________________________________________________________________
conv2 (Conv3D)               (None, 75, 90, 72, 64)    153664    
_________________________________________________________________
batc2 (BatchNormalization)   (None, 75, 90, 72, 64)    256       
_________________________________________________________________
actv2 (Activation)           (None, 75, 90, 72, 64)    0         
_________________________________________________________________
spatial_dropout3d_2 (Spatial (None, 75, 90, 72, 64)    0         
_________________________________________________________________
max2 (MaxPooling3D)          (None, 75, 45, 36, 64)    0         
_________________________________________________________________
zero3 (ZeroPadding3D)        (None, 77, 47, 38, 64)    0         
_________________________________________________________________
conv3 (Conv3D)               (None, 75, 45, 36, 96)    165984    
_________________________________________________________________
batc3 (BatchNormalization)   (None, 75, 45, 36, 96)    384       
_________________________________________________________________
actv3 (Activation)           (None, 75, 45, 36, 96)    0         
_________________________________________________________________
spatial_dropout3d_3 (Spatial (None, 75, 45, 36, 96)    0         
_________________________________________________________________
max3 (MaxPooling3D)          (None, 75, 22, 18, 96)    0         
_________________________________________________________________
time_distributed_1 (TimeDist (None, 75, 38016)         0         
_________________________________________________________________
bidirectional_1 (Bidirection (None, 75, 512)           58787328  
_________________________________________________________________
bidirectional_2 (Bidirection (None, 75, 512)           1181184   
_________________________________________________________________
dense1 (Dense)               (None, 75, 28)            14364     
_________________________________________________________________
softmax (Activation)         (None, 75, 28)            0         
=================================================================
Total params: 60,310,524.0
Trainable params: 60,310,140.0
Non-trainable params: 384.0
_________________________________________________________________
Traceback (most recent call last):
  File "/home/yurzho/anaconda3/envs/lipnet/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
SystemError: NULL result without error in PyObject_Call
Traceback (most recent call last):
  File "/home/yurzho/anaconda3/envs/lipnet/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
SystemError: NULL result without error in PyObject_Call
Process Process-1:
Process Process-2:
Traceback (most recent call last):
Traceback (most recent call last):
  File "/home/yurzho/anaconda3/envs/lipnet/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
  File "/home/yurzho/anaconda3/envs/lipnet/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
    self.run()
  File "/home/yurzho/anaconda3/envs/lipnet/lib/python2.7/multiprocessing/process.py", line 114, in run
  File "/home/yurzho/anaconda3/envs/lipnet/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
    self._target(*self._args, **self._kwargs)
  File "/home/yurzho/anaconda3/envs/lipnet/lib/python2.7/site-packages/keras/engine/training.py", line 607, in data_generator_task
  File "/home/yurzho/anaconda3/envs/lipnet/lib/python2.7/site-packages/keras/engine/training.py", line 607, in data_generator_task
    self.queue.put(generator_output)
  File "/home/yurzho/anaconda3/envs/lipnet/lib/python2.7/multiprocessing/queues.py", line 101, in put
    if not self._sem.acquire(block, timeout):
KeyboardInterrupt
Epoch 0: Curriculum(train: True, sentence_length: -1, flip_probability: 0.5, jitter_probability: 0.05)
    self.queue.put(generator_output)
  File "/home/yurzho/anaconda3/envs/lipnet/lib/python2.7/multiprocessing/queues.py", line 101, in put
    if not self._sem.acquire(block, timeout):
KeyboardInterrupt
Epoch 0: Curriculum(train: True, sentence_length: -1, flip_probability: 0.5, jitter_probability: 0.05)
Epoch 1/5000
Traceback (most recent call last):
  File "training/overlapped_speakers/train.py", line 79, in <module>
    train(run_name, speaker, 0, 5000, 3, 360, 288, 75, 32, 50)
  File "training/overlapped_speakers/train.py", line 74, in train
    pickle_safe=True)
  File "/home/yurzho/anaconda3/envs/lipnet/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/yurzho/anaconda3/envs/lipnet/lib/python2.7/site-packages/keras/engine/training.py", line 1845, in fit_generator
    time.sleep(wait_time)
KeyboardInterrupt

OSError: [Errno 20] Not a directory:

I ran the training script for random_split, but every time I run the script I get an error telling me that the video is "not a directory":
/home/souheil/LipNet/training/random_split/train.py:69: UserWarning: Update your fit_generator call to the Keras 2 API: fit_generator(initial_epoch=0, verbose=1, generator=<lipnet.he..., workers=2, validation_data=<lipnet.he..., steps_per_epoch=7, epochs=20, callbacks=[<keras.ca..., max_queue_size=5, validation_steps=1, use_multiprocessing=True)
pickle_safe=True)
/usr/local/lib/python2.7/dist-packages/keras/engine/training.py:2023: UserWarning: Using a generator with `use_multiprocessing=True` and multiple workers may duplicate your data. Please consider using the `keras.utils.Sequence` class.
  UserWarning('Using a generator with `use_multiprocessing=True`'
2018-01-18 14:32:30.901083: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
Epoch 0: Curriculum(train: True, sentence_length: -1, flip_probability: 0.5, jitter_probability: 0.05)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/keras/utils/data_utils.py", line 635, in data_generator_task
    generator_output = next(self._generator)
  File "/home/souheil/LipNet/lipnet/helpers/threadsafe.py", line 16, in next
    return self.it.next()
  File "/home/souheil/LipNet/lipnet/lipreading/generators.py", line 206, in next_train
    ret = self.get_batch(cur_train_index, self.minibatch_size, train=True)
  File "/home/souheil/LipNet/lipnet/lipreading/generators.py", line 148, in get_batch
    video = Video().from_frames(path)
  File "/home/souheil/LipNet/lipnet/lipreading/videos.py", line 114, in from_frames
    frames_path = sorted([os.path.join(path, x) for x in os.listdir(path)])
OSError: [Errno 20] Not a directory: '/home/souheil/LipNet/training/random_split/datasets/video/s1/swwv8p.mpg'
Epoch 1/20
Traceback (most recent call last):
  File "/home/souheil/LipNet/training/random_split/train.py", line 73, in <module>
    train(run_name, 0, 20, 3, 100, 50, 75, 32, 50)
  File "/home/souheil/LipNet/training/random_split/train.py", line 69, in train
    pickle_safe=True)
  File "/usr/local/lib/python2.7/dist-packages/keras/legacy/interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 2115, in fit_generator
    generator_output = next(output_generator)
  File "/usr/local/lib/python2.7/dist-packages/keras/utils/data_utils.py", line 735, in get
    six.reraise(value.__class__, value, value.__traceback__)
  File "<string>", line 3, in reraise
OSError: [Errno 20] Not a directory: '/home/souheil/LipNet/training/random_split/datasets/video/s1/swwv8p.mpg'


Has anyone else had this issue, and if so, how can it be resolved?
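For what it's worth, the traceback shows Video().from_frames() calling os.listdir() on .../s1/swwv8p.mpg, so the generator expects each sample path to be a directory of extracted frame images rather than a raw .mpg file. A small diagnostic sketch, assuming the random_split layout; the script itself is illustrative, not part of the repository:

```python
import os

# Placeholder path: the video folder used by the random_split scenario.
VIDEO_DIR = "training/random_split/datasets/video"

# The generator loads each sample with Video().from_frames(path), which calls
# os.listdir(path), so every sample must be a directory of frame images.
for speaker in sorted(os.listdir(VIDEO_DIR)):
    speaker_dir = os.path.join(VIDEO_DIR, speaker)
    if not os.path.isdir(speaker_dir):
        continue
    for sample in sorted(os.listdir(speaker_dir)):
        sample_path = os.path.join(speaker_dir, sample)
        if not os.path.isdir(sample_path):
            print("not a frame directory: %s" % sample_path)
```

Entries flagged this way are raw video files and need to be converted into folders of mouth-crop frames (scripts/extract_mouth_batch.py is provided for that) before training.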

Getting Loading dataset list from cache... Found 0 videos for training. Found 0 videos for validation.

Hi,

Can anyone help me? I am getting the following error while training the model with train.py:

Loading dataset list from cache...
Found 0 videos for training.
Found 0 videos for validation.


_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
the_input (InputLayer)       (None, 75, 100, 50, 3)    0
_________________________________________________________________
zero1 (ZeroPadding3D)        (None, 77, 104, 54, 3)    0
_________________________________________________________________
conv1 (Conv3D)               (None, 75, 50, 25, 32)    7232
_________________________________________________________________
batc1 (BatchNormalization)   (None, 75, 50, 25, 32)    128
_________________________________________________________________
actv1 (Activation)           (None, 75, 50, 25, 32)    0
_________________________________________________________________
spatial_dropout3d_1 (Spatial (None, 75, 50, 25, 32)    0
_________________________________________________________________
max1 (MaxPooling3D)          (None, 75, 25, 12, 32)    0
_________________________________________________________________
zero2 (ZeroPadding3D)        (None, 77, 29, 16, 32)    0
_________________________________________________________________
conv2 (Conv3D)               (None, 75, 25, 12, 64)    153664
_________________________________________________________________
batc2 (BatchNormalization)   (None, 75, 25, 12, 64)    256
_________________________________________________________________
actv2 (Activation)           (None, 75, 25, 12, 64)    0
_________________________________________________________________
spatial_dropout3d_2 (Spatial (None, 75, 25, 12, 64)    0
_________________________________________________________________
max2 (MaxPooling3D)          (None, 75, 12, 6, 64)     0
_________________________________________________________________
zero3 (ZeroPadding3D)        (None, 77, 14, 8, 64)     0
_________________________________________________________________
conv3 (Conv3D)               (None, 75, 12, 6, 96)     165984
_________________________________________________________________
batc3 (BatchNormalization)   (None, 75, 12, 6, 96)     384
_________________________________________________________________
actv3 (Activation)           (None, 75, 12, 6, 96)     0
_________________________________________________________________
spatial_dropout3d_3 (Spatial (None, 75, 12, 6, 96)     0
_________________________________________________________________
max3 (MaxPooling3D)          (None, 75, 6, 3, 96)      0
_________________________________________________________________
time_distributed_1 (TimeDist (None, 75, 1728)          0
_________________________________________________________________
bidirectional_1 (Bidirection (None, 75, 512)           3048960
_________________________________________________________________
bidirectional_2 (Bidirection (None, 75, 512)           1181184
_________________________________________________________________
dense1 (Dense)               (None, 75, 28)            14364
_________________________________________________________________
softmax (Activation)         (None, 75, 28)            0
=================================================================
Total params: 4,572,156
Trainable params: 4,571,772
Non-trainable params: 384
_________________________________________________________________


train.py:69: UserWarning: Update your fit_generator call to the Keras 2 API: fit_generator(generator=<lipnet.he..., steps_per_epoch=0.0, epochs=20, validation_data=<lipnet.he..., validation_steps=0.0, callbacks=[<keras.ca..., initial_epoch=0, verbose=1, workers=2, use_multiprocessing=False, max_queue_size=5)
pickle_safe=False)
Traceback (most recent call last):
  File "train.py", line 73, in <module>
    train(run_name, 0, 20, 3, 100, 50, 75, 32, 50)
  File "train.py", line 69, in train
    pickle_safe=False)
  File "C:\Users\karan.kumar.panda\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\karan.kumar.panda\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\training.py", line 2107, in fit_generator
    raise ValueError('validation_steps=None is only valid for a'
ValueError: validation_steps=None is only valid for a generator based on the keras.utils.Sequence class. Please specify validation_steps or use the keras.utils.Sequence class.
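One possible reading of this log: the root problem is that no videos were found ("Found 0 videos"), so the step counts shown in the warning (steps_per_epoch=0.0, validation_steps=0.0) are unusable by fit_generator. Before debugging the Keras call itself, it may help to confirm the datasets folders are actually populated. A small sketch, where the path is a placeholder for whichever training scenario you use:

```python
import os

# Placeholder: point this at the datasets folder of the scenario you train.
DATASET_DIR = "training/unseen_speakers/datasets"

for split in ("train", "val"):
    split_dir = os.path.join(DATASET_DIR, split)
    if not os.path.isdir(split_dir):
        print("missing folder: %s" % split_dir)
        continue
    speakers = sorted(os.listdir(split_dir))
    n_samples = sum(len(os.listdir(os.path.join(split_dir, s)))
                    for s in speakers
                    if os.path.isdir(os.path.join(split_dir, s)))
    print("%s: %d speaker folders, %d samples" % (split, len(speakers), n_samples))
```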

Error in training using random_split

Epoch 1/20
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\bismil\AppData\Local\Programs\Python\Python35\lib\threading.py", line 914, in _bootstrap_inner
    self.run()
  File "C:\Users\bismil\AppData\Local\Programs\Python\Python35\lib\threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\bismil\AppData\Local\Programs\Python\Python35\lib\site-packages\keras\engine\training.py", line 606, in data_generator_task
    generator_output = next(self._generator)
TypeError: 'threadsafe_iter' object is not an iterator

Traceback (most recent call last):
  File "train.py", line 72, in <module>
    train(run_name, 0, 20, 3, 100, 50, 75, 32, 50)
  File "train.py", line 68, in train
    pickle_safe=False)
  File "C:\Users\bismil\AppData\Local\Programs\Python\Python35\lib\site-packages\keras\legacy\interfaces.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\bismil\AppData\Local\Programs\Python\Python35\lib\site-packages\keras\engine\training.py", line 1851, in fit_generator
    str(generator_output))
ValueError: output of generator should be a tuple (x, y, sample_weight) or (x, y). Found: None
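The TypeError above is what you see when an iterator wrapper only defines next() (the Python 2 iterator protocol) but the code is run under Python 3, which looks for __next__() instead. A minimal sketch of a wrapper that works under both, assuming the class in lipnet/helpers/threadsafe.py currently implements only next():

```python
import threading

class threadsafe_iter(object):
    """Wrap an iterator so that calls to next() are serialized by a lock."""

    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):          # Python 3 iterator protocol
        with self.lock:
            return next(self.it)

    next = __next__              # keep Python 2 compatibility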

ValueError

ValueError: When using a generator for validation data, you must specify a value for validation_steps.
Found 0 videos for training.
Found 0 videos for validation.
There seems to be something wrong either with `def get_video_frames(self, path): videogen = skvideo.io.vreader(path)` or with the video format. Please give me some advice, thank you!
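If the suspicion is that the videos themselves cannot be decoded, one quick test is to read a single file directly with scikit-video, using the same call as get_video_frames; the path below is a placeholder:

```python
import skvideo.io

# Placeholder: any one of your GRID video files.
path = "path/to/your/video.mpg"

# vreader yields frames lazily; if ffmpeg is missing or the file is corrupt,
# the failure happens here, independently of the rest of the LipNet code.
frames = list(skvideo.io.vreader(path))
print("decoded %d frames" % len(frames))
```

If this decodes a sensible number of frames, the videos are fine and the "Found 0 videos" message points instead to an empty or mislinked datasets folder.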

Print parentheses error

I'm running the training script for overlapped speakers s1 and I keep getting this error:

C:\Users\SFeng\Miniconda2\envs\py3k\LipNet>python training\overlapped_speakers\train.py s1
Using TensorFlow backend.
Traceback (most recent call last):
  File "training\overlapped_speakers\train.py", line 3, in <module>
    from lipnet.lipreading.generators import BasicGenerator
  File "c:\users\sfeng\miniconda2\envs\py3k\lipnet\lipnet\lipreading\generators.py", line 93
    print "Error loading video: "+video_path
                                            ^
SyntaxError: Missing parentheses in call to 'print'

How do I resolve this?
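That SyntaxError means Python 2-style print statements are being run under a Python 3 interpreter (the environment in the traceback is named py3k). Either run the code with a Python 2.7 environment, or make the offending line Python 3-compatible; a minimal local edit (not an official patch) would be:

```python
# In lipnet/lipreading/generators.py (around line 93), replace the Python 2
# print statement with a function call so it parses under Python 3:
#     print "Error loading video: " + video_path
print("Error loading video: " + video_path)
```

Note that other Python 2-only constructs may surface elsewhere in the codebase once this one is fixed.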
