
Speech-to-Text-WaveNet : End-to-end sentence level English speech recognition based on DeepMind's WaveNet and tensorflow

License: Apache License 2.0

Python 94.52% Dockerfile 5.48%

speech-to-text-wavenet's Introduction

Speech-to-Text-WaveNet : End-to-end sentence level English speech recognition using DeepMind's WaveNet

A tensorflow implementation of speech recognition based on DeepMind's WaveNet: A Generative Model for Raw Audio. (Hereafter the Paper)

Although ibab and tomlepaine have already implemented WaveNet with tensorflow, they did not implement speech recognition. That's why we decided to implement it ourselves.

Some of DeepMind's recent papers are tricky to reproduce. The Paper also omitted specific implementation details, so we had to fill in the gaps in our own way.

Here are a few important notes.

First, while the Paper used the TIMIT dataset for the speech recognition experiment, we used the free VCTK dataset.

Second, the Paper added a mean-pooling layer after the dilated convolution layer for down-sampling. We extracted MFCC from wav files and removed the final mean-pooling layer because the original setting was impossible to run on our TitanX GPU.

Third, since the TIMIT dataset has phoneme labels, the Paper trained the model with two loss terms: phoneme classification and next-phoneme prediction. We instead used a single CTC loss because VCTK provides sentence-level labels. As a result, we used only dilated conv1d layers without any causal conv1d layers.
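As a rough illustration of why a single CTC loss works with sentence-level labels, the sketch below applies the standard CTC collapse rule (merge repeated labels, then drop blanks) to a sequence of per-frame labels. The blank index and the toy vocabulary are assumptions made for the example; they are not taken from this repository's code, and the repository's own decoder is a beam search rather than this greedy rule.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def ctc_collapse(frame_labels, blank=BLANK):
    """Collapse per-frame labels: merge repeated labels, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


# Toy vocabulary for illustration only, not the project's real character table.
vocab = {1: "h", 2: "e", 3: "l", 4: "o"}

# Eleven frames; the blank between the two runs of 3 keeps the double "l".
frames = [1, 1, 0, 2, 2, 3, 0, 3, 3, 4, 0]
decoded = "".join(vocab[i] for i in ctc_collapse(frames))
print(decoded)  # hello
```

Because many frame sequences collapse to the same sentence, the loss can sum over all of them and no frame-level alignment is needed in the training data.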

Finally, due to time constraints, we did not run quantitative analyses such as BLEU score, nor did we add post-processing with a language model.

The final architecture is shown in the following figure.

(Some images are cropped from [WaveNet: A Generative Model for Raw Audio](https://arxiv.org/abs/1609.03499) and [Neural Machine Translation in Linear Time](https://arxiv.org/abs/1610.10099))

Version

Current Version : 0.0.0.2

Dependencies ( VERSION MUST BE MATCHED EXACTLY! )

tensorflow == 1.0.0
sugartensor == 1.0.0.2
pandas >= 0.19.2
librosa == 0.5.0
scikits.audiolab==0.11.0
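Since the versions have to match, it can help to compare the output of pip freeze against the pins before running anything. The following is a minimal sketch of such a check; the helper names (parse_freeze, check_pins) are hypothetical and not part of this repository, and the version comparison only looks at the numeric parts of a version string.

```python
# Pins copied from the list above: "==" is an exact match, ">=" a minimum.
PINS = {
    "tensorflow": ("==", "1.0.0"),
    "sugartensor": ("==", "1.0.0.2"),
    "pandas": (">=", "0.19.2"),
    "librosa": ("==", "0.5.0"),
    "scikits.audiolab": ("==", "0.11.0"),
}


def as_tuple(version):
    """Turn '1.0.0.2' into (1, 0, 0, 2); non-numeric parts are skipped."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def parse_freeze(text):
    """Map package name to version from 'name==version' lines of pip freeze."""
    installed = {}
    for line in text.splitlines():
        if "==" in line:
            name, version = line.strip().split("==", 1)
            installed[name.lower()] = version
    return installed


def check_pins(pins, installed):
    """Return (package, problem) pairs for every pin that is not satisfied."""
    problems = []
    for name, (op, wanted) in sorted(pins.items()):
        found = installed.get(name)
        if found is None:
            problems.append((name, "not installed"))
        elif op == "==" and as_tuple(found) != as_tuple(wanted):
            problems.append((name, "found %s, need ==%s" % (found, wanted)))
        elif op == ">=" and as_tuple(found) < as_tuple(wanted):
            problems.append((name, "found %s, need >=%s" % (found, wanted)))
    return problems


# Example input; in practice, paste or pipe in the real output of pip freeze.
sample = "tensorflow==1.0.0\nsugartensor==0.0.2.3\npandas==0.20.1\nlibrosa==0.5.0\n"
for name, problem in check_pins(PINS, parse_freeze(sample)):
    print("%s: %s" % (name, problem))
```

The check works on plain text, so it runs under whichever Python interpreter is at hand, independent of the environment being inspected.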

If you have problems with the librosa library, try installing ffmpeg with the following commands (Ubuntu 14.04).


sudo add-apt-repository ppa:mc3man/trusty-media
sudo apt-get update
sudo apt-get dist-upgrade -y
sudo apt-get -y install ffmpeg

Dataset

We used the VCTK, LibriSpeech, and TEDLIUM release 2 corpora. The training set built from these three corpora contains 240,612 sentences. The validation and test sets are built from LibriSpeech and TEDLIUM only, because VCTK does not provide validation or test splits. After downloading each corpus, extract it into the 'asset/data/VCTK-Corpus', 'asset/data/LibriSpeech', or 'asset/data/TEDLIUM_release2' directory, respectively.
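Before pre-processing, a quick check that the three directories exist can save a failed run. This is a small sketch using the directory names given above; the function name missing_corpora is hypothetical.

```python
import os

# Directory names as listed above, relative to the repository root.
EXPECTED_DIRS = [
    "asset/data/VCTK-Corpus",
    "asset/data/LibriSpeech",
    "asset/data/TEDLIUM_release2",
]


def missing_corpora(root="."):
    """Return the expected corpus directories that are not present under root."""
    return [d for d in EXPECTED_DIRS if not os.path.isdir(os.path.join(root, d))]


for directory in missing_corpora():
    print("missing: %s" % directory)
```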

Audio was augmented using the scheme from Tom Ko et al.'s paper. (Thanks to @migvel for the pointer.)

Pre-processing dataset

The TEDLIUM release 2 dataset provides audio in the SPH format, so we need to convert it to a format the librosa library can handle. Run the following command in the 'asset/data' directory to convert the SPH files to WAV format.


find -type f -name '*.sph' | awk '{printf "sox -t sph %s -b 16 -t wav %s\n", $0, $0".wav" }' | bash

If you don't have sox installed, please install it first.


sudo apt-get install sox
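The SPH conversion can also be sketched in Python, which avoids generating shell text and therefore copes with file names that contain spaces. The sketch below uses the same sox flags as the find/awk pipeline above; the function names are hypothetical, and convert_all assumes sox is on the PATH.

```python
import os
import subprocess


def sox_commands(root):
    """Build one sox argument list per .sph file found under root."""
    commands = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if filename.endswith(".sph"):
                src = os.path.join(dirpath, filename)
                # Same flags as the shell pipeline: 16-bit WAV written to <name>.sph.wav
                commands.append(
                    ["sox", "-t", "sph", src, "-b", "16", "-t", "wav", src + ".wav"]
                )
    return commands


def convert_all(root):
    """Run every conversion; raises CalledProcessError if sox fails on a file."""
    for command in sox_commands(root):
        subprocess.check_call(command)
```

Calling convert_all("asset/data") would then convert every .sph file below that directory.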

We found that the main bottleneck during training is disk read time, so we decided to pre-process all audio data into MFCC feature files, which are much smaller. We also highly recommend using an SSD instead of a hard drive.
Run the following command in the console to pre-process the whole dataset.


python preprocess.py

Training the network

Execute


python train.py                            # use all available GPUs
# or
CUDA_VISIBLE_DEVICES=0,1 python train.py   # use only GPUs 0 and 1

to train the network. The resulting ckpt files and log files are written to the 'asset/train' directory. Launch tensorboard --logdir asset/train/log to monitor the training process.
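The GPU restriction can also be applied from inside a Python script instead of on the command line. This is a minimal sketch; the helper name restrict_gpus is hypothetical, and the variable has to be set before TensorFlow initializes CUDA, so in practice before importing it.

```python
import os


def restrict_gpus(device_ids):
    """Expose only the given GPU indices to CUDA for this process."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Equivalent to prefixing the command with CUDA_VISIBLE_DEVICES=0,1
restrict_gpus([0, 1])

# Import TensorFlow (or sugartensor) only after this point.
```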

We trained this model on 3 Nvidia 1080 Pascal GPUs for 40 hours, up to 50 epochs, and picked the epoch with the minimum validation loss. In our case, that was epoch 40. If you run into an out-of-memory error, reduce batch_size in train.py from 16 to 4.

The CTC losses at each epoch are shown in the following table:

epoch	train set	valid set	test set
20	79.541500	73.645237	83.607269
30	72.884180	69.738348	80.145867
40	69.948266	66.834316	77.316114
50	69.127240	67.639895	77.866674
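The epoch selection described above (pick the epoch with the lowest validation loss) can be reproduced from the table in a few lines. The dictionary below simply restates the reported numbers.

```python
# CTC losses per epoch, copied from the table above.
losses = {
    20: {"train": 79.541500, "valid": 73.645237, "test": 83.607269},
    30: {"train": 72.884180, "valid": 69.738348, "test": 80.145867},
    40: {"train": 69.948266, "valid": 66.834316, "test": 77.316114},
    50: {"train": 69.127240, "valid": 67.639895, "test": 77.866674},
}

# Select on the validation set only; the test loss is reported, not used for selection.
best_epoch = min(losses, key=lambda epoch: losses[epoch]["valid"])
print(best_epoch, losses[best_epoch]["test"])  # 40 77.316114
```

Note that the training loss is still falling at epoch 50 while the validation loss has started to rise, which is why the later checkpoint is not the one chosen.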

Testing the network

After training has finished, you can check the CTC loss on the validation or test set with the following command.


python test.py --set train|valid|test --frac 1.0(0.01~1.0)

The frac option is useful if you want to evaluate only a fraction of the dataset for a quick check.

Transforming speech wave file to English text

Execute


python recognize.py --file <wave_file path>

to transform a speech wave file into an English sentence. The result will be printed on the console.

For example, try the following command.


python recognize.py --file asset/data/LibriSpeech/test-clean/1089/134686/1089-134686-0000.flac
python recognize.py --file asset/data/LibriSpeech/test-clean/1089/134686/1089-134686-0001.flac
python recognize.py --file asset/data/LibriSpeech/test-clean/1089/134686/1089-134686-0002.flac
python recognize.py --file asset/data/LibriSpeech/test-clean/1089/134686/1089-134686-0003.flac
python recognize.py --file asset/data/LibriSpeech/test-clean/1089/134686/1089-134686-0004.flac

The result will be as follows:


he hoped there would be stoo for dinner turnips and charrats and bruzed patatos and fat mutton pieces to be ladled out in th thick peppered flower fatan sauce
stuffid into you his belly counsiled him
after early night fall the yetl lampse woich light hop here and there on the squalled quarter of the browfles
o berty and he god in your mind
numbrt tan fresh nalli is waiting on nou cold nit husband

The ground truth is as follows:


HE HOPED THERE WOULD BE STEW FOR DINNER TURNIPS AND CARROTS AND BRUISED POTATOES AND FAT MUTTON PIECES TO BE LADLED OUT IN THICK PEPPERED FLOUR FATTENED SAUCE
STUFF IT INTO YOU HIS BELLY COUNSELLED HIM
AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
HELLO BERTIE ANY GOOD IN YOUR MIND
NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND

As mentioned earlier, there is no language model, so the output has no capitalization or punctuation, and some words are misspelled.
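No quantitative evaluation was run for this project, but the gap between output and ground truth can be measured with a word error rate (WER). The sketch below computes WER as the word-level edit distance divided by the number of reference words, using the second sentence pair above as input. The function names are illustrative, and the result describes this one sentence only, not the model's overall accuracy.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length (must be non-empty)."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    return edit_distance(ref_words, hyp_words) / float(len(ref_words))


reference = "STUFF IT INTO YOU HIS BELLY COUNSELLED HIM"
hypothesis = "stuffid into you his belly counsiled him"
wer = word_error_rate(reference, hypothesis)
print(wer)  # 3 word edits over 8 reference words = 0.375
```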

Pre-trained models

You can transform a speech wave file to English text with the pre-trained model on the VCTK corpus. Extract the following zip file to the 'asset/train/' directory.

Docker support

See docker README.md.

Future works

Language Model
Polyglot(Multi-lingual) Model

We think that the CTC beam decoder should be replaced with a practical language model,
and that a polyglot speech recognition model would be a good candidate for future work.

Other resources

Namju's other repositories

Citation

If you find this code useful please cite us in your work:


Kim and Park. Speech-to-Text-WaveNet. 2016. GitHub repository. https://github.com/buriburisuri/.

Authors

Namju Kim at KakaoBrain Corp.

Kyubyong Park at KakaoBrain Corp.


speech-to-text-wavenet's Issues

recognize.py Working!

@buriburisuri
If Dependencies are installed as described then it works!
It seems to break when newer dependencies are installed.
Used: pip freeze to see which version installed.
If a newer version is installed then uninstall: sudo pip uninstall sugartensor
Then install correct version: sudo pip install sugartensor==0.0.1.9
Tensorflow version: sudo pip show tensorflow
To install correct version of tensorflow:
sudo pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.11.0-cp27-none-linux_x86_64.whl

I created this with all the files needed for recognize.py
https://github.com/EN10/STT

Error: speaker_info.txt missing

Hi, I'm a beginner to Machine Learning concepts and I'm just trying to run the code. I have downloaded your pre-trained model (Extracted it to 'asset/train/ckpt/' as suggested) and I'm trying to run 'recognize.py' but I keep getting this error. I have not downloaded the VCTK dataset as I thought since I already have the pre-trained model, I don't need to download the dataset and train the model. Is my way of thinking correct or do I need to definitely download the VCTK dataset , extract it in 'asset/data/' and then run 'recognize.py'?

Thank you.

The training is so slow....

After more than 24 hours, only 7 epochs have finished with the default settings, and GPU usage is quite low. My GPU is an Nvidia M40.
@buriburisuri How did you finish 20 epochs on the Titan GPU in 30 hours? Are there any special settings?

recognize error

when i run recognize.py i got this error. please help!

Traceback (most recent call last):
File "recognize.py", line 80, in <module>
y = tf.sparse_to_dense(decoded[0].indices, decoded[0].get_shape(), decoded[0].values) + 1
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/ops/sparse_ops.py", line 554, in sparse_to_dense
name=name)
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/ops/gen_sparse_ops.py", line 1032, in _sparse_to_dense
validate_indices=validate_indices, name=name)
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/op_def_library.py", line 504, in apply_op
values, as_ref=input_arg.is_ref).dtype.name
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/ops.py", line 702, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/constant_op.py", line 128, in _tensor_shape_tensor_conversion_function
"Cannot convert a partially known TensorShape to a Tensor: %s" % s)
ValueError: Cannot convert a partially known TensorShape to a Tensor: (?, ?)

No module named tensorboard.plugins

Any thoughts?

[ec2-user@ip-172-31-43-155 speech-to-text-wavenet]$ python recognize.py --file ./test.wav
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so.7.5 locally
Traceback (most recent call last):
  File "recognize.py", line 2, in <module>
    import sugartensor as tf
  File "/usr/lib/python2.7/dist-packages/sugartensor/__init__.py", line 7, in <module>
    from .sg_train import *
  File "/usr/lib/python2.7/dist-packages/sugartensor/sg_train.py", line 9, in <module>
    from tensorflow.contrib.tensorboard.plugins import projector
ImportError: No module named tensorboard.plugins

TypeError: __init__() got an unexpected keyword argument 'shape'

Hi,
I was trying to run train.py, but got the following error:

Traceback (most recent call last):
File "train.py", line 79, in <module>
loss = logit.sg_ctc(target=y, seq_len=seq_len)
File "/usr/local/lib/python2.7/dist-packages/sugartensor/sg_main.py", line 151, in wrapper
out = func(tensor, tf.sg_opt(kwargs))
File "/usr/local/lib/python2.7/dist-packages/sugartensor/sg_loss.py", line 225, in sg_ctc
out = tf.nn.ctc_loss(tensor, opt.target.sg_to_sparse(), opt.seq_len, time_major=False)
File "/usr/local/lib/python2.7/dist-packages/sugartensor/sg_main.py", line 151, in wrapper
out = func(tensor, tf.sg_opt(kwargs))
File "/usr/local/lib/python2.7/dist-packages/sugartensor/sg_transform.py", line 256, in sg_to_sparse
shape=tf.shape(tensor).sg_cast(dtype=tf.int64))
TypeError: __init__() got an unexpected keyword argument 'shape'

I'm using Ubuntu 14.04LTS
GeForce GTX 970/PCIe/SSE2
tensorflow 0.12.1
sugartensor 0.0.2.3

Any help will be very appreciated!

why the training time is much longer using Tesla K80?

Thanks for sharing the code.
I trained the WaveNet model on the VCTK dataset with a Tesla K80. It turned out that I need to wait around 50 hours for training to complete. Since the documentation says "We've trained this model on a single Titan X GPU during 30 hours until 20 epoch", I wonder whether I am missing some important tricks for training the network. Can you give me some suggestions? Thank you.

missing requirements.txt

A common practice for managing dependencies in python projects is to declare them in a requirements.txt file.

running only recognize.py with a wave file gives speaker-info.txt does not exist error

python recognize.py --file arctic_a0047.wav

File "recognize.py", line 27, in <module>
data = VCTK(vocabulary_loading=True)
File "/home/justdial/speech-to-text-wavenet/data.py", line 45, in __init__
labels, wave_files = self._load_corpus(data_path)
File "/home/justdial/speech-to-text-wavenet/data.py", line 79, in _load_corpus
index_col=False, delim_whitespace=True)
File "/home/justdial/wavenet-speech/lib/python2.7/site-packages/pandas/io/parsers.py", line 646, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/justdial/wavenet-speech/lib/python2.7/site-packages/pandas/io/parsers.py", line 389, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/home/justdial/wavenet-speech/lib/python2.7/site-packages/pandas/io/parsers.py", line 730, in __init__
self._make_engine(self.engine)
File "/home/justdial/wavenet-speech/lib/python2.7/site-packages/pandas/io/parsers.py", line 923, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/home/justdial/wavenet-speech/lib/python2.7/site-packages/pandas/io/parsers.py", line 1390, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4184)
File "pandas/parser.pyx", line 667, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:8449)
IOError: File asset/data/speaker-info.txt does not exist

Training on own data _InvalidArgumentError

I am trying to train with my own data, but I ran into a problem.
I did not change any of the training parameters in the train.py script.

At first, training starts fine.
INFO:tensorflow:0216:14:00:33.290:sg_train.py:306] Training started from epoch[001]-step[0].
Mtrain: 0%| | 0/1706 [00:00<?, ?b/s]^Mtrain: 0%| ................
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
Mtrain: 0%| | 2/1706 [00:08<2:31:47, 5.34s/b]^Mtrain: 0%| .................
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
Mtrain: 1%|2 | 17/1706 [00:45<1:02:16, 2.21s/b]^Mtrain: 1%|2 ..........................
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 655 to 720
Mtrain: 3%|7 | 52/1706 [02:19<1:20:40, 2.93s/b]^Mtrain: 3%|7 ..........................

and then the problem shows up.
tensorflow.python.framework.errors_impl.InvalidArgumentError: Not enough time for target transition sequence (required: 76, available: 66), skipping data instance in batch: 4

How can I correct this??
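The message suggests that, for one utterance in the batch, the label sequence needs more time steps than the network output provides. Under the usual CTC feasibility condition, an utterance needs at least as many frames as it has labels, plus one extra frame for every pair of identical adjacent labels (a blank has to separate them). The following is a sketch of a pre-filter based on that condition; the function names and the (labels, n_frames) pair format are assumptions for the example, not part of this repository.

```python
def ctc_min_frames(labels):
    """Smallest number of frames that can emit `labels` under CTC."""
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats


def keep_feasible(examples):
    """Keep only (labels, n_frames) pairs that CTC can align."""
    return [(labels, n) for labels, n in examples if n >= ctc_min_frames(labels)]


# "hello" as label indices: the double "l" costs one extra frame.
print(ctc_min_frames([8, 5, 12, 12, 15]))  # 6

examples = [([8, 5, 12, 12, 15], 6), ([8, 5, 12, 12, 15], 5)]
print(len(keep_feasible(examples)))  # 1
```

Here n_frames should be the sequence length the loss actually sees, which can be shorter than the raw feature length if the network downsamples in time. Dropping or re-segmenting the offending utterances before training is one possible way around the error.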

Checkpoint restore error

When I try to run:
python recognize.py --file test.wav

I get an error with the checkpoint:
NotFoundError (see above for traceback): Tensor name "aconv1d_24/W" not found in checkpoint files asset/train/ckpt/model-020-45480

I have downloaded the pre-trained model zip and extracted it to asset/train/ckpt/

I'm not sure what the problem is.

Tensor name "lyr-aconv1d_1/mean" not found in checkpoint files asset/train/ckpt\model-020-45480

I'm attempting to run python recognize.py --file asset/data/wav48/p225/p225_003.wav on:

Windows 7
Python 3.5.2
TensorFlow 0.12.1
sugartensor 0.0.2.3
pandas 0.19.2
librosa 0.4.3
tqdm 4.10.0-9175881

When I try to convert the speech in p225_003.wav into text, I receive the following traceback:

(tensorflow) D:\home\josephwinston\src\Remote\GIT\speech-to-text-wavenet>python recognize.py --file asset/data/wav48/p225/p225_003.wav
INFO:tensorflow:0109:15:54:36.033:data.py:60] VCTK corpus loaded.(total data=36395, total batch=2274)
Traceback (most recent call last):
  File "C:\Users\hb55683\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _do_call
    return fn(*args)
  File "C:\Users\hb55683\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1003, in _run_fn
    status, run_metadata)
  File "C:\Users\hb55683\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\hb55683\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: Tensor name "lyr-aconv1d_1/mean" not found in checkpoint files asset/train/ckpt\model-020-45480
         [[Node: save/RestoreV2_5 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_5/tensor_names, save/RestoreV2_5/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "recognize.py", line 102, in <module>
    saver.restore(sess, tf.train.latest_checkpoint('asset/train/ckpt'))
  File "C:\Users\hb55683\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 1388, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "C:\Users\hb55683\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 766, in run
    run_metadata_ptr)
  File "C:\Users\hb55683\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "C:\Users\hb55683\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "C:\Users\hb55683\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Tensor name "lyr-aconv1d_1/mean" not found in checkpoint files asset/train/ckpt\model-020-45480
         [[Node: save/RestoreV2_5 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_5/tensor_names, save/RestoreV2_5/shape_and_slices)]]

Caused by op 'save/RestoreV2_5', defined at:
  File "recognize.py", line 101, in <module>
    saver = tf.train.Saver()
  File "C:\Users\hb55683\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 1000, in __init__
    self.build()
  File "C:\Users\hb55683\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 1030, in build
    restore_sequentially=self._restore_sequentially)
  File "C:\Users\hb55683\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 624, in build
    restore_sequentially, reshape)
  File "C:\Users\hb55683\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 361, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "C:\Users\hb55683\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\training\saver.py", line 200, in restore_op
    [spec.tensor.dtype])[0])
  File "C:\Users\hb55683\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 441, in restore_v2
    dtypes=dtypes, name=name)
  File "C:\Users\hb55683\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "C:\Users\hb55683\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "C:\Users\hb55683\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

NotFoundError (see above for traceback): Tensor name "lyr-aconv1d_1/mean" not found in checkpoint files asset/train/ckpt\model-020-45480
         [[Node: save/RestoreV2_5 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_5/tensor_names, save/RestoreV2_5/shape_and_slices)]]

The file exists:

ls -l asset/train/ckpt/model-020-45480
-rw-rw-r-- 1 HB55683 Domain Users 29919277 Nov 25 11:16 asset/train/ckpt/model-020-45480

The pre-trained models are from <https://drive.google.com/open?id=0B3ILZKxzcrUyVWwtT25FemZEZ1k>

Running grep on asset/train/ckpt/model-020-45480 shows that lyr-aconv1d_1 is not in the pickle.

Could there perhaps be an issue with the saved models since they are not being correctly restored?

ValueError: Shape must be rank 1 but is rank 0 for 'CTCBeamSearchDecoder' (op: 'CTCBeamSearchDecoder') with input shapes: [?,1,28], [].

When I run the file recognize.py the following errors appear:

python recognize.py --file asset/data/wav48/p225/p225_003.wav
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
INFO:tensorflow:0225:23:55:48.653:data.py:41] VCTK vocabulary loaded.
Traceback (most recent call last):
File "recognize.py", line 77, in <module>
decoded, _ = tf.nn.ctc_beam_search_decoder(logit.sg_transpose(perm=[1, 0, 2]), seq_len, merge_repeated=False)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/ctc_ops.py", line 258, in ctc_beam_search_decoder
merge_repeated=merge_repeated))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_ctc_ops.py", line 66, in _ctc_beam_search_decoder
merge_repeated=merge_repeated, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2397, in create_op
set_shapes_for_outputs(ret)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1757, in set_shapes_for_outputs
shapes = shape_func(op)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1707, in call_with_requiring
return call_cpp_shape_fn(op, require_shape_fn=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 610, in call_cpp_shape_fn
debug_python_shape_fn, require_shape_fn)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 675, in _call_cpp_shape_fn_impl
raise ValueError(err.message)
ValueError: Shape must be rank 1 but is rank 0 for 'CTCBeamSearchDecoder' (op: 'CTCBeamSearchDecoder') with input shapes: [?,1,28], [].

ValueError in CTCLoss

Ubuntu 14.

python train.py results in :

INFO:tensorflow:0228:18:35:26.811:data.py:73] VCTK corpus loaded.(total data=36395, total batch=2274)                         
Traceback (most recent call last):                            
  File "train.py", line 79, in <module>                               
    loss = logit.sg_ctc(target=y, seq_len=seq_len)
  File "/home/gvoysey/.local/share/virtualenvs/speech-to-text-wavenet/local/lib/python2.7/site-packages/sugartensor/sg_main.py", line 151, in wrapper                         
    out = func(tensor, tf.sg_opt(kwargs))       
  File "/home/gvoysey/.local/share/virtualenvs/speech-to-text-wavenet/local/lib/python2.7/site-packages/sugartensor/sg_loss.py", line 226, in sg_ctc                          
    ctc_merge_repeated=opt.merge, time_major=False)                                           
  File "/home/gvoysey/.local/share/virtualenvs/speech-to-text-wavenet/local/lib/python2.7/site-packages/tensorflow/python/ops/ctc_ops.py", line 145, in ctc_loss              
    ctc_merge_repeated=ctc_merge_repeated)    
  File "/home/gvoysey/.local/share/virtualenvs/speech-to-text-wavenet/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_ctc_ops.py", line 164, in _ctc_loss         
    name=name)        
  File "/home/gvoysey/.local/share/virtualenvs/speech-to-text-wavenet/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op 
    op_def=op_def)    
  File "/home/gvoysey/.local/share/virtualenvs/speech-to-text-wavenet/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2397, in create_op          
    set_shapes_for_outputs(ret)               
  File "/home/gvoysey/.local/share/virtualenvs/speech-to-text-wavenet/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1757, in set_shapes_for_outputs   
    shapes = shape_func(op)                       
  File "/home/gvoysey/.local/share/virtualenvs/speech-to-text-wavenet/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1707, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)                                               
  File "/home/gvoysey/.local/share/virtualenvs/speech-to-text-wavenet/local/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 610, in call_cpp_shape_fn       
    debug_python_shape_fn, require_shape_fn)  
  File "/home/gvoysey/.local/share/virtualenvs/speech-to-text-wavenet/local/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 675, in _call_cpp_shape_fn_impl 
    raise ValueError(err.message)             
ValueError: Shape must be rank 1 but is rank 0 for 'CTCLoss' (op: 'CTCLoss') with input shapes: [?,16,28], [?,2], [?], [].

Any ideas?

tensorflow version 1.0.0
sugartensor version 1.0.0.1

train.py ERROR

After running python3 train.py, I get the following error:

Traceback (most recent call last):
File "train.py", line 3, in <module>
from data import VCTK
File "/home/user/Desktop/Project/speech-to-text-wavenet-master/data.py", line 142
print str_
^
SyntaxError: Missing parentheses in call to 'print'

Please follow up on this.

Train.py error

Ubuntu 16.04.1, TensorFlow 0.11.head, CUDA 8.0, cuDNN 5.1.5

gpu@gpu:~/speech-to-text-wavenet$ python -c 'import tensorflow as tf; print(tf.__version__)'
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so.5.1.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so.8.0 locally
0.11.head

ERROR:
gpu@gpu:~/speech-to-text-wavenet$ python train.py
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so.5.1.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so.8.0 locally
INFO:tensorflow:1129:15:23:18.141:data.py:60] VCTK corpus loaded.(total data=36395, total batch=2274)
Traceback (most recent call last):
File "train.py", line 70, in <module>
z, s = res_block(z, size=7, rate=r)
File "train.py", line 52, in res_block
conv_gate = tensor.sg_aconv1d(size=size, rate=rate, act='sigmoid', bn=True)
File "/home/gpu/.local/lib/python2.7/site-packages/sugartensor/sg_main.py", line 149, in wrapper
out = func(tensor, opt)
File "/home/gpu/.local/lib/python2.7/site-packages/sugartensor/sg_layer.py", line 91, in sg_aconv1d
w = init.he_uniform('W', (1, opt.size, opt.in_dim, opt.dim))
File "/home/gpu/.local/lib/python2.7/site-packages/sugartensor/sg_initializer.py", line 31, in he_uniform
return uniform(name, shape, s, dtype)
File "/home/gpu/.local/lib/python2.7/site-packages/sugartensor/sg_initializer.py", line 20, in uniform
initializer=tf.random_uniform_initializer(minval=-scale, maxval=scale))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1024, in get_variable
custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 850, in get_variable
custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 346, in get_variable
validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 331, in _true_getter
caching_device=caching_device, validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 632, in _get_single_variable
name, "".join(traceback.format_list(tb))))
ValueError: Variable layers/aconv1d_1/W already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

File "/home/gpu/.local/lib/python2.7/site-packages/sugartensor/sg_initializer.py", line 20, in uniform
initializer=tf.random_uniform_initializer(minval=-scale, maxval=scale))
File "/home/gpu/.local/lib/python2.7/site-packages/sugartensor/sg_initializer.py", line 31, in he_uniform
return uniform(name, shape, s, dtype)
File "/home/gpu/.local/lib/python2.7/site-packages/sugartensor/sg_layer.py", line 91, in sg_aconv1d
w = init.he_uniform('W', (1, opt.size, opt.in_dim, opt.dim))

Train should download VCTK corpus if it does not already exist

Train.py error - ValueError: Shape must be rank 1 but is rank 0 for 'CTCLoss'

Exception during training. Can you help? I did not modify the code; I am just trying to get it running.

INFO:tensorflow:0308:21:40:05.892:data.py:72] VCTK corpus loaded.(total data=36395, total batch=9098)
Traceback (most recent call last):
File "C:\WinPython-64bit-3.5.2.3\python-3.5.2.amd64\lib\site-packages\tensorflow\python\framework\common_shapes.py", line 671, in _call_cpp_shape_fn_impl
input_tensors_as_shapes, status)
File "C:\WinPython-64bit-3.5.2.3\python-3.5.2.amd64\lib\contextlib.py", line 66, in __exit__
next(self.gen)
File "C:\WinPython-64bit-3.5.2.3\python-3.5.2.amd64\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 1 but is rank 0 for 'CTCLoss' (op: 'CTCLoss') with input shapes: [?,4,37], [?,2], [?], [].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:/Users/msi/Source/Repos/speech-to-text-wavenet/train.py", line 60, in
loss = logit.sg_ctc(target=y, seq_len=seq_len)
File "C:\WinPython-64bit-3.5.2.3\python-3.5.2.amd64\lib\site-packages\sugartensor\sg_main.py", line 151, in wrapper
out = func(tensor, tf.sg_opt(kwargs))
File "C:\WinPython-64bit-3.5.2.3\python-3.5.2.amd64\lib\site-packages\sugartensor\sg_loss.py", line 226, in sg_ctc
ctc_merge_repeated=opt.merge, time_major=False)
File "C:\WinPython-64bit-3.5.2.3\python-3.5.2.amd64\lib\site-packages\tensorflow\python\ops\ctc_ops.py", line 145, in ctc_loss
ctc_merge_repeated=ctc_merge_repeated)
File "C:\WinPython-64bit-3.5.2.3\python-3.5.2.amd64\lib\site-packages\tensorflow\python\ops\gen_ctc_ops.py", line 164, in _ctc_loss
name=name)
File "C:\WinPython-64bit-3.5.2.3\python-3.5.2.amd64\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 763, in apply_op
op_def=op_def)
File "C:\WinPython-64bit-3.5.2.3\python-3.5.2.amd64\lib\site-packages\tensorflow\python\framework\ops.py", line 2329, in create_op
set_shapes_for_outputs(ret)
File "C:\WinPython-64bit-3.5.2.3\python-3.5.2.amd64\lib\site-packages\tensorflow\python\framework\ops.py", line 1717, in set_shapes_for_outputs
shapes = shape_func(op)
File "C:\WinPython-64bit-3.5.2.3\python-3.5.2.amd64\lib\site-packages\tensorflow\python\framework\ops.py", line 1667, in call_with_requiring
return call_cpp_shape_fn(op, require_shape_fn=True)
File "C:\WinPython-64bit-3.5.2.3\python-3.5.2.amd64\lib\site-packages\tensorflow\python\framework\common_shapes.py", line 610, in call_cpp_shape_fn
debug_python_shape_fn, require_shape_fn)
File "C:\WinPython-64bit-3.5.2.3\python-3.5.2.amd64\lib\site-packages\tensorflow\python\framework\common_shapes.py", line 676, in _call_cpp_shape_fn_impl
raise ValueError(err.message)
ValueError: Shape must be rank 1 but is rank 0 for 'CTCLoss' (op: 'CTCLoss') with input shapes: [?,4,37], [?,2], [?], [].

Thanks! (just a quick appreciation letter)

You guys are wizards! It is fantastic that you managed to implement a paper where half the info was missing. Thanks for sharing!

ImportError: No module named 'sg_util'

I've installed (as far as I know) the Python 3 version of all your dependencies, including Tensorflow.

When running recognize.py I get:

colin@yumi ~/c/h/e/speech-to-text-wavenet> python recognize.py --file carlin_disappointed.wav 
Traceback (most recent call last):
  File "recognize.py", line 2, in <module>
    import sugartensor as tf
  File "/usr/lib/python3.5/site-packages/sugartensor/__init__.py", line 5, in <module>
    from sg_util import sg_opt
ImportError: No module named 'sg_util'

sugartensor is definitely installed. Thoughts?

Would this algorithm work for other languages like Mandarin?

Would this algorithm work for other languages, such as Mandarin? What should the approach be? Thanks.

What's the separation of train set/test set?

I have looked over the structure of the data set and read your code very carefully, but I don't think there is an explicit separation into train and test sets. Is it that you didn't split the data set because the CTC loss does not need such a split? It looks a little odd.

Upgrade to sugartensor 0.0.2.0

Thanks for making this project!

I found that I needed to use sugartensor version 0.0.1.9.

Using 0.0.2.0, I get the following error when running recognize.py:

INFO:tensorflow:1202:21:10:24.229:data.py:60] VCTK corpus loaded.(total data=36395, total batch=2274)
Traceback (most recent call last):
  File "recognize.py", line 62, in <module>
    z = x.sg_conv1d(size=1, dim=num_dim, act='tanh', bn=True)
  File "/Users/-------/anaconda3/envs/py27/lib/python2.7/site-packages/sugartensor/sg_main.py", line 147, in wrapper
    for t in tf.global_variables():
AttributeError: 'module' object has no attribute 'global_variables'

Error running recognize.py after training (and with pretrained models)

Note: Using tensorflow v0.12.1

I downloaded the dataset as instructed in the README into /data and was able to run train.py successfully through one full epoch (it took about 8 hours without a GPU, with the batch size decreased to 4). Then, around the time the second epoch was beginning, I attempted to test the results so far by calling:

python recognize.py --file asset/data/wav48/p225/p225_001.wav

which I had expected to work, since it's running on one of the training examples. But I got the following trace:

INFO:tensorflow:0120:10:08:54.924:data.py:41] VCTK vocabulary loaded.
Traceback (most recent call last):
  File "recognize.py", line 102, in <module>
    saver.restore(sess, tf.train.latest_checkpoint('asset/train/ckpt'))
  File "/Users/jeremy/Documents/Other/personal/learning/venv/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1388, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/Users/jeremy/Documents/Other/personal/learning/venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/Users/jeremy/Documents/Other/personal/learning/venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/Users/jeremy/Documents/Other/personal/learning/venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/Users/jeremy/Documents/Other/personal/learning/venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Unable to get element from the feed as bytes.

Then, thinking that perhaps my checkpoint was corrupted, I tried downloading the pretrained models as described in the README, but this produced the same error. In a StackOverflow post I noticed that a missing checkpoint file could cause this problem, but I have a file asset/train/checkpoint present, both in the pretrained and live training cases.

Please let me know if you know of anything I can do to get this working!

Training on own data -- ValueError

Since I couldn't find any specific notes on training on your own corpus, I assumed it'd maintain the same structure as VCTK (asset/data/) with text description in (asset/data/txt/) and audio files to be trained in (asset/data/wav48/).
Despite doing that I end up getting the following error trace:
Traceback (most recent call last):
  File "train.py", line 26, in <module>
    data = VCTK(batch_size=batch_size)
  File "/Users/anshpatel/Downloads/speech-to-text-wavenet-master/data.py", line 45, in __init__
    labels, wave_files = self._load_corpus(data_path)
  File "/Users/anshpatel/Downloads/speech-to-text-wavenet-master/data.py", line 111, in _load_corpus
    self.max_len = np.max([len(s) for s in sents])
  File "/Users/anshpatel/anaconda/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 2252, in amax
    out=out, **kwargs)
  File "/Users/anshpatel/anaconda/lib/python2.7/site-packages/numpy/core/_methods.py", line 26, in _amax
    return umr_maximum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation maximum which has no identity

Any idea on what is causing such an error?
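
The last line of the trace says that `np.max` received an empty list, which suggests that no sentences were collected from the data directory. A minimal sketch of a guard that would report this more directly, assuming a list of label sequences named `sents` as in the traceback; the helper `max_sentence_length` is hypothetical and is not code from the repository:

```python
def max_sentence_length(sents, data_path):
    """Return the longest label length, failing loudly on an empty corpus.

    `sents` mirrors the variable seen in the traceback; `data_path` is only
    used for the error message. The helper itself is hypothetical.
    """
    if not sents:
        raise ValueError(
            "No transcripts were loaded from %r; check the layout under "
            "txt/ and wav48/." % (data_path,)
        )
    # max() over a non-empty list is safe, unlike a reduction over nothing.
    return max(len(s) for s in sents)
```

With a check like this, a mismatch between the expected and actual folder layout would show up as a readable message instead of a NumPy reduction error.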

recognize.py --> AttributeError: 'SparseTensor' object has no attribute 'shape'

Hi,
I was trying the performance of the pre-trained model.
When I run python recognize.py --file asset/data/wav48/p225/p225_003.wav
I got the following error:

Traceback (most recent call last):
File "recognize.py", line 80, in
y = tf.sparse_to_dense(decoded[0].indices, decoded[0].shape, decoded[0].values) + 1
AttributeError: 'SparseTensor' object has no attribute 'shape'

But I am sure that SparseTensor has the attribute 'shape'.
Did I miss something?
Any help will be appreciated!
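
For context, TensorFlow 1.0 renamed the `shape` attribute of `SparseTensor` to `dense_shape`, so which name exists depends on the installed version. A minimal sketch of a version-tolerant lookup, using a hypothetical helper `sparse_dense_shape` and stand-in classes instead of real TensorFlow objects so that it runs on its own:

```python
def sparse_dense_shape(sparse):
    """Return the dense shape of a SparseTensor-like object.

    Tries the newer attribute name first, then the older one.
    The helper name is hypothetical, not part of TensorFlow or sugartensor.
    """
    for attr in ("dense_shape", "shape"):
        if hasattr(sparse, attr):
            return getattr(sparse, attr)
    raise AttributeError("object has neither 'dense_shape' nor 'shape'")


class OldStyle(object):  # stand-in for a SparseTensor that exposes .shape
    shape = (1, 12)


class NewStyle(object):  # stand-in for a SparseTensor that exposes .dense_shape
    dense_shape = (1, 12)


old_result = sparse_dense_shape(OldStyle())
new_result = sparse_dense_shape(NewStyle())
```

In recognize.py the idea would be to pass `decoded[0]` to such a helper instead of reading `decoded[0].shape` directly; whether that alone is enough for a given TensorFlow version is untested here.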

Not Recognized New Speech

Finally, I trained the model until the loss was under 9.
I got a wav file, "I'm Student", from Google Translate.
The result is "ahimrstuond".
It looks like a phonetic transcription.
Shouldn't the output be in word units instead of character units?

Numpy missing in requirements.txt

steps to reproduce:
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt

Collecting resampy>=0.1.0 (from librosa>=0.4.3->-r requirements.txt (line 3))
Downloading resampy-0.1.4.tar.gz (442kB)
100% |████████████████████████████████| 450kB 2.0MB/s
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/0v/h83f7t897wnfzkh127v1y0qh3f10dr/T/pip-build-yPf939/resampy/setup.py", line 6, in <module>
import numpy as np
ImportError: No module named numpy

implement a validation loss

Thanks for sharing the implementation. I tested it and it works well on the training set, but it seems to overfit and generalizes poorly to the test set. Have you observed the same?
I then tried to add a check of the validation loss or validation WER, but I got stuck on logit.sg_reuse, because the network is defined through x, the residual blocks, and so on. There are quite limited resources online about sugartensor. Could you point me to an example of how to build a validation module? Thanks!
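
As a starting point for the validation WER part of this request, word error rate can be computed from a reference transcript and a recognized transcript with a word-level edit distance. A minimal pure-Python sketch; the function name `word_error_rate` is hypothetical, and this does not address how to reuse the network graph in sugartensor:

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / float(len(ref))
```

Averaging this value over a held-out set of utterances gives a validation WER; how the held-out set is chosen is left open here.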

No such file or directory: 'asset/train/VCTK_vocabulary.npy'

On MacOS

Traceback (most recent call last):
  File "train.py", line 26, in <module>
    data = VCTK(batch_size=batch_size)
  File "/Users/husband/Documents/Work/speech-to-text-wavenet/data.py", line 45, in __init__
    labels, wave_files = self._load_corpus(data_path)
  File "/Users/husband/Documents/Work/speech-to-text-wavenet/data.py", line 116, in _load_corpus
    np.save(vocabulary_file, self.index2byte)
  File "/usr/local/lib/python2.7/site-packages/numpy/lib/npyio.py", line 477, in save
    fid = open(file, "wb")
IOError: [Errno 2] No such file or directory: 'asset/train/VCTK_vocabulary.npy'

This can be solved by:
mkdir -p asset/train && touch asset/train/VCTK_vocabulary.npy
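
The same fix can be sketched in Python by creating the parent directory before the file is written. This is only an illustration with a hypothetical helper `ensure_parent_dir`; it is not how data.py currently behaves, and the `exist_ok` argument requires Python 3:

```python
import os


def ensure_parent_dir(file_path):
    """Create the directory that will contain file_path, if it is missing."""
    parent = os.path.dirname(file_path)
    if parent:
        # exist_ok avoids an error when the directory is already there.
        os.makedirs(parent, exist_ok=True)
    return parent
```

Calling `ensure_parent_dir('asset/train/VCTK_vocabulary.npy')` before the vocabulary is saved would make the manual step unnecessary.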

Which features to implement now?

I am thinking of adding these features now.

Docker images
- to resolve Python, TensorFlow and sugartensor version conflicts
- to help people who just want to test; I want to include the VCTK corpus and pre-trained weights
Data augmentation
- to resolve the overfitting problem
Quantitative analysis

Please reply with the features you think are important!

Train.py error

ERROR: sugartensor-0.0.1.9/sugartensor/sg_queue.py

I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so.4 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so.7.5 locally
Traceback (most recent call last):
File "train.py", line 26, in <module>
data = VCTK(batch_size=batch_size)
File "/mnt/NFS2/zouwei/work/wavenet/speech-to-text-wavenet-master/data.py", line 43, in __init__
capacity=128, num_threads=32)
File "/mnt/NFS2/zouwei/tools/env/relays/sugartensor-0.0.1.9/sugartensor/sg_queue.py", line 49, in wrapper
runner = FuncQueueRunner(enqueue_func, queue, [enqueue_op] * opt.num_threads)
File "/mnt/NFS2/zouwei/tools/env/relays/sugartensor-0.0.1.9/sugartensor/sg_queue.py", line 69, in __init__
queue_closed_exception_types, queue_runner_def)
TypeError: __init__() takes at most 6 arguments (7 given)

error with training own data

Hi
When I trained on my own data with a 16 kHz sample rate, I got the following error:

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [165] rhs shape= [28]

my voca_size = data.voca_size = 165

How can I fix it?

Many thanks

Is sampling rate of audio important?

I have tested with audio files different than VCTK. The results were gibberish. Can it be the sampling rate difference?

Or the trained model just memorized the VCTK samples?
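
One way to rule the sampling rate in or out is to read it from the file header and compare it with the rate of the training audio. A minimal sketch using Python's standard `wave` module; the helper name `wav_sample_rate` is hypothetical, and it only handles uncompressed PCM WAV files:

```python
import wave


def wav_sample_rate(path):
    """Return the sampling rate in Hz stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()
```

If the rates differ, resampling the new audio to the training rate is a reasonable first experiment, though a mismatch may not be the only cause of poor results.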

Error: got an unexpected keyword argument 'shape'

When I run the train.py the following errors appear:

root@7b5e7d552258:~/speech-to-text-wavenet# python train.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
INFO:tensorflow:0225:23:49:51.252:data.py:73] VCTK corpus loaded.(total data=36395, total batch=2274)
Traceback (most recent call last):
File "train.py", line 79, in
loss = logit.sg_ctc(target=y, seq_len=seq_len)
File "/usr/local/lib/python2.7/dist-packages/sugartensor/sg_main.py", line 151, in wrapper
out = func(tensor, tf.sg_opt(kwargs))
File "/usr/local/lib/python2.7/dist-packages/sugartensor/sg_loss.py", line 225, in sg_ctc
out = tf.nn.ctc_loss(tensor, opt.target.sg_to_sparse(), opt.seq_len, time_major=False)
File "/usr/local/lib/python2.7/dist-packages/sugartensor/sg_main.py", line 151, in wrapper
out = func(tensor, tf.sg_opt(kwargs))
File "/usr/local/lib/python2.7/dist-packages/sugartensor/sg_transform.py", line 277, in sg_to_sparse
shape=tf.shape(tensor).sg_cast(dtype=tf.int64))
TypeError: __init__() got an unexpected keyword argument 'shape'

Missing tensor in pre-trained model

I'm trying to use the pre-trained model provided in the README. When I run recognize.py, it throws the following error:

Traceback (most recent call last):
  File "recognize.py", line 103, in <module>
    saver.restore(sess, tf.train.latest_checkpoint('asset/train/ckpt'))
  File "/Library/Python/2.7/lib/python/site-packages/tensorflow/python/training/saver.py", line 1388, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/Library/Python/2.7/lib/python/site-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/Library/Python/2.7/lib/python/site-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/Library/Python/2.7/lib/python/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/Library/Python/2.7/lib/python/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
NotFoundError: Tensor name "lyr-conv1d_5/mean" not found in checkpoint files asset/train/ckpt/model-020-45480
	 [[Node: save/RestoreV2_217 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_217/tensor_names, save/RestoreV2_217/shape_and_slices)]]

Caused by op u'save/RestoreV2_217', defined at:
  File "recognize.py", line 102, in <module>
    saver = tf.train.Saver()
  File "/Library/Python/2.7/lib/python/site-packages/tensorflow/python/training/saver.py", line 1000, in __init__
    self.build()
  File "/Library/Python/2.7/lib/python/site-packages/tensorflow/python/training/saver.py", line 1030, in build
    restore_sequentially=self._restore_sequentially)
  File "/Library/Python/2.7/lib/python/site-packages/tensorflow/python/training/saver.py", line 624, in build
    restore_sequentially, reshape)
  File "/Library/Python/2.7/lib/python/site-packages/tensorflow/python/training/saver.py", line 361, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/Library/Python/2.7/lib/python/site-packages/tensorflow/python/training/saver.py", line 200, in restore_op
    [spec.tensor.dtype])[0])
  File "/Library/Python/2.7/lib/python/site-packages/tensorflow/python/ops/gen_io_ops.py", line 441, in restore_v2
    dtypes=dtypes, name=name)
  File "/Library/Python/2.7/lib/python/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/Library/Python/2.7/lib/python/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/Library/Python/2.7/lib/python/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

NotFoundError (see above for traceback): Tensor name "lyr-conv1d_5/mean" not found in checkpoint files asset/train/ckpt/model-020-45480
	 [[Node: save/RestoreV2_217 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_217/tensor_names, save/RestoreV2_217/shape_and_slices)]]

I'm using tensorflow-0.12.1
Any help will be much appreciated

Can your project be used in production, or is it only a demo?

How to run without Tkinter dependency?

I'm running it on EC2 instance.

Problem with sugartensor 1.0.0

I am grateful for your project and for sugartensor.
Recently, I noticed that speech-to-text-wavenet doesn't work with sugartensor 1.0.0.

In train.py, line 35,
seq_len = tf.not_equal(x.sg_sum(dims=2), 0.).sg_int().sg_sum(dims=1)
needs to be changed to the following:
seq_len = tf.not_equal(x.sg_sum(axis=2), 0.).sg_int().sg_sum(axis=1)
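
For readers unfamiliar with this line, it appears to count the non-zero frames of each utterance in a zero-padded batch, which gives the sequence length passed to the CTC loss. A pure-Python sketch of the same computation on nested lists; the function name `sequence_lengths` is hypothetical, and the real code operates on tensors:

```python
def sequence_lengths(batch):
    """Count frames whose feature sum is non-zero, per utterance.

    batch is a list of utterances, each a list of frames, each a list of
    feature values; padding frames are assumed to be all zeros.
    """
    lengths = []
    for utterance in batch:
        # Mirrors: sum over axis=2, compare with 0, cast to int, sum over axis=1.
        lengths.append(sum(int(sum(frame) != 0.0) for frame in utterance))
    return lengths
```

Note that a real frame whose features happen to sum to exactly zero would be counted as padding by this rule.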

Thank you again for your project.

TypeError: reduce_sum() got an unexpected keyword argument 'axis'

root@ubuntu:/home/user/Desktop/speech-to-text-wavenet-master# python train.py
INFO:tensorflow:0303:01:36:26.310:data.py:73] VCTK corpus loaded.(total data=36395, total batch=2274)
Traceback (most recent call last):
File "train.py", line 35, in
seq_len = tf.not_equal(x.sg_sum(dims=2), 0.).sg_int().sg_sum(dims=1)
File "/usr/local/lib/python2.7/dist-packages/sugartensor/sg_main.py", line 151, in wrapper
out = func(tensor, tf.sg_opt(kwargs))
File "/usr/local/lib/python2.7/dist-packages/sugartensor/sg_transform.py", line 300, in sg_sum
return tf.reduce_sum(tensor, axis=opt.axis, keep_dims=opt.keep_dims, name=opt.name)
TypeError: reduce_sum() got an unexpected keyword argument 'axis'

What does this error actually mean, and what exactly do I need to do to get rid of it and run python train.py without any errors?

TypeError: __init__() got an unexpected keyword argument 'dense_shape'

When I run train.py with Python 3.5 and TensorFlow 0.12.1 on Windows:

runfile('E:/speech-to-text-wavenet-master/train.py', wdir='E:/speech-to-text-wavenet-master')
Reloaded modules: data
INFO:tensorflow:0310:14:10:48.482:data.py:73] VCTK corpus loaded.(total data=36395, total batch=9098)
Traceback (most recent call last):

File "", line 1, in
runfile('E:/speech-to-text-wavenet-master/train.py', wdir='E:/speech-to-text-wavenet-master')

File "d:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
execfile(filename, namespace)

File "d:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "E:/speech-to-text-wavenet-master/train.py", line 79, in
loss = logit.sg_ctc(target=y, seq_len=seq_len)

File "d:\Anaconda3\lib\site-packages\sugartensor-1.0.0.1-py3.5.egg\sugartensor\sg_main.py", line 151, in wrapper
out = func(tensor, tf.sg_opt(kwargs))

File "d:\Anaconda3\lib\site-packages\sugartensor-1.0.0.1-py3.5.egg\sugartensor\sg_loss.py", line 225, in sg_ctc
out = tf.nn.ctc_loss(opt.target.sg_to_sparse(), tensor, opt.seq_len,

File "d:\Anaconda3\lib\site-packages\sugartensor-1.0.0.1-py3.5.egg\sugartensor\sg_main.py", line 151, in wrapper
out = func(tensor, tf.sg_opt(kwargs))

File "d:\Anaconda3\lib\site-packages\sugartensor-1.0.0.1-py3.5.egg\sugartensor\sg_transform.py", line 277, in sg_to_sparse
dense_shape=tf.shape(tensor).sg_cast(dtype=tf.int64))

TypeError: __init__() got an unexpected keyword argument 'dense_shape'

TypeError: __init__() takes at most 6 arguments (7 given)

Hi,
This might be a small issue, but I cannot get past it. This is the error log (while executing python recognize.py --file p225_001.wav):

Traceback (most recent call last):
File "recognize.py", line 27, in <module>
data = VCTK()
File "/Users/rs/Documents/in/Work/Development/speech-to-text-wavenet-master/data.py", line 54, in __init__
capacity=128, num_threads=32)
File "//anaconda/lib/python2.7/site-packages/sugartensor/sg_queue.py", line 49, in wrapper
runner = FuncQueueRunner(enqueue_func, queue, [enqueue_op] * opt.num_threads)
File "//anaconda/lib/python2.7/site-packages/sugartensor/sg_queue.py", line 69, in __init__
queue_closed_exception_types, queue_runner_def)
TypeError: __init__() takes at most 6 arguments (7 given)

Would be great if you can help me out!

train.py Error

Hello, after downloading VCTK corpus and moving it to asset/data, I tried to run train.py. I got the following output:

INFO:tensorflow:0307:21:29:31.864:data.py:73] VCTK corpus loaded.(total data=36395, total batch=9098)
Traceback (most recent call last):
File "train.py", line 82, in
tf.sg_train(log_interval=30, lr=0.0001, loss=loss, ep_size=data.num_batch, max_ep=20, early_stop=False)
File "/home/cc/.local/lib/python2.7/site-packages/sugartensor/sg_train.py", line 34, in sg_train
train_func(**opt)
File "/home/cc/.local/lib/python2.7/site-packages/sugartensor/sg_train.py", line 143, in wrapper
saver.restore(sess, last_file)
File "/home/cc/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1345, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/cc/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 717, in run
run_metadata_ptr)
File "/home/cc/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 915, in _run
feed_dict_string, options, run_metadata)
File "/home/cc/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _do_run
target_list, options, run_metadata)
File "/home/cc/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 985, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [28] rhs shape= [1]
[[Node: save/Assign_394 = Assign[T=DT_FLOAT, _class=["loc:@layers/conv1d_18/b"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](layers/conv1d_18/b/MaxProp, save/restore_slice_394)]]

Caused by op u'save/Assign_394', defined at:
File "train.py", line 82, in
tf.sg_train(log_interval=30, lr=0.0001, loss=loss, ep_size=data.num_batch, max_ep=20, early_stop=False)
File "/home/cc/.local/lib/python2.7/site-packages/sugartensor/sg_train.py", line 34, in sg_train
train_func(**opt)
File "/home/cc/.local/lib/python2.7/site-packages/sugartensor/sg_train.py", line 116, in wrapper
keep_checkpoint_every_n_hours=opt.keep_interval)
File "/home/cc/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 986, in __init__
self.build()
File "/home/cc/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1015, in build
restore_sequentially=self._restore_sequentially)
File "/home/cc/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 620, in build
restore_sequentially, reshape)
File "/home/cc/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 369, in _AddRestoreOps
assign_ops.append(saveable.restore(tensors, shapes))
File "/home/cc/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 212, in restore
self.op.get_shape().is_fully_defined())
File "/home/cc/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 45, in assign
use_locking=use_locking, name=name)
File "/home/cc/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 749, in apply_op
op_def=op_def)
File "/home/cc/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2380, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/cc/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1298, in __init__
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [28] rhs shape= [1]
[[Node: save/Assign_394 = Assign[T=DT_FLOAT, _class=["loc:@layers/conv1d_18/b"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](layers/conv1d_18/b/MaxProp, save/restore_slice_394)]]

Current versions:

tensorflow - 0.11.0 sugartensor - 0.0.1.9 librosa - 0.5.0 pandas - 0.19.2

Is this a common error that I am making? I will continue to explore this, but thought I would post in case there was a known answer. Thanks.

Train.py Error

I tried to run both train.py and recognize.py, but I get the same error, shown below. Note that I have already placed the data or the pre-trained model in the right folder. Judging by the error, my Python installation might be the problem. Please help me fix it!
/usr/lib/pymodules/python2.7/matplotlib/rcsetup.py:378: UserWarning: tk.pythoninspect is obsolete, and has no effect
warnings.warn("tk.pythoninspect is obsolete, and has no effect")
Traceback (most recent call last):
File "train.py", line 3, in
from data import VCTK
File "/home/phong/speech-to-text-wavenet-master/data.py", line 5, in
import librosa
File "/usr/local/lib/python2.7/dist-packages/librosa/__init__.py", line 18, in <module>
from . import display
File "/usr/local/lib/python2.7/dist-packages/librosa/display.py", line 30, in
_matplotlibrc = copy.deepcopy(mpl.rcParams)
File "/usr/lib/python2.7/copy.py", line 190, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/usr/lib/python2.7/copy.py", line 358, in _reconstruct
y[key] = value
File "/usr/lib/pymodules/python2.7/matplotlib/__init__.py", line 808, in __setitem__
cval = self.validate[key](val)
File "/usr/lib/pymodules/python2.7/matplotlib/rcsetup.py", line 95, in validate_bool_maybe_none
raise ValueError('Could not convert "%s" to boolean' % b)
ValueError: Could not convert "None" to boolean

Tensorflow versions

Change the TensorFlow requirement from 0.11 to 0.12 (rc0); otherwise sugartensor, which requires this version, reports an error:

Traceback (most recent call last):
  File "recognize.py", line 62, in <module>
    z = x.sg_conv1d(size=1, dim=num_dim, act='tanh', bn=True)
  File "/usr/local/lib/python2.7/dist-packages/sugartensor/sg_main.py", line 147, in wrapper
    for t in tf.global_variables():

when trying to use recognize.py
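
The traceback points at `tf.global_variables`, which is only available from TensorFlow 0.12 onward. A minimal sketch of an early version check that a script could run before building the graph; the helper `version_at_least` and its parsing rules are assumptions for illustration only:

```python
import re


def version_at_least(version, minimum):
    """Compare the leading major.minor numbers of two version strings.

    Suffixes such as 'rc0' are ignored because only the numeric prefix
    of each of the first two components is read.
    """
    def major_minor(text):
        parts = []
        for piece in text.split(".")[:2]:
            match = re.match(r"\d+", piece)
            parts.append(int(match.group()) if match else 0)
        while len(parts) < 2:
            parts.append(0)
        return tuple(parts)

    return major_minor(version) >= major_minor(minimum)
```

Passing the installed TensorFlow version string as `version` and `'0.12'` as `minimum` would let a script stop with a clear message instead of failing inside sugartensor.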

Can I train it to use other languages?

If so, what would the steps be?

Error training/recognizing on ubuntu

ValueError: Variable layers/aconv1d_1/W already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

  File "/home/seanfitz/.virtualenvs/speech-to-text-wavenet/local/lib/python2.7/site-packages/sugartensor/sg_initializer.py", line 20, in uniform
    initializer=tf.random_uniform_initializer(minval=-scale, maxval=scale))
  File "/home/seanfitz/.virtualenvs/speech-to-text-wavenet/local/lib/python2.7/site-packages/sugartensor/sg_initializer.py", line 31, in he_uniform
    return uniform(name, shape, s, dtype)
  File "/home/seanfitz/.virtualenvs/speech-to-text-wavenet/local/lib/python2.7/site-packages/sugartensor/sg_layer.py", line 91, in sg_aconv1d
    w = init.he_uniform('W', (1, opt.size, opt.in_dim, opt.dim))

bulky dependency on Tkinter

Probably coming in through matplotlib, but it would be nice for this not to be a requirement. Speech recognition should be a relatively headless exercise, and Tkinter does not cleanly install into virtualenvs at this time.
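
If the Tkinter requirement does come from matplotlib's default backend, selecting the non-interactive `Agg` backend before matplotlib is first imported is a common workaround on headless machines. A minimal sketch using the `MPLBACKEND` environment variable, which matplotlib reads at import time; whether this is sufficient for this project's dependency chain is an assumption that has not been verified here:

```python
import os

# Must run before anything imports matplotlib (librosa may do so indirectly).
# setdefault keeps any backend the user has already chosen.
os.environ.setdefault("MPLBACKEND", "Agg")

selected_backend = os.environ["MPLBACKEND"]
```

The same effect can be had by exporting `MPLBACKEND=Agg` in the shell before running train.py or recognize.py.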

Can't find the assets/data directory

Could you help me find the assets/data directory where the corpus should go, after installing all the dependencies?

Does this code support multiple GPU cards?

Dear All,

Can this code use multiple GPUs at the same time? I have eight GPUs in my server, but it seems to use only one GPU (although it occupies all of them). If I want to take advantage of all my GPUs, how should I modify the code (where should I start)?

Thanks for your help and have a nice day!

Best Regards,
yuanfu