tomlepaine / fast-wavenet
Speedy Wavenet generation using dynamic programming :zap:
License: GNU General Public License v3.0
Using OS X 10.11.6, Python 2.7, tensorflow 0.9.0, the line
model = Model(num_time_samples=num_time_samples,
num_channels=num_channels,
gpu_fraction=gpu_fraction)
in the "demo" code produces the following error and fails:
Traceback (most recent call last):
File "/fast-wavenet/demo.py", line 17, in <module>
gpu_fraction=gpu_fraction)
File "/fast-wavenet/wavenet/models.py", line 47, in __init__
outputs, targets)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 454, in sparse_softmax_cross_entropy_with_logits
logits, labels, name=name)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1450, in _sparse_softmax_cross_entropy_with_logits
features=features, labels=labels, name=name)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 704, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2262, in create_op
set_shapes_for_outputs(ret)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1702, in set_shapes_for_outputs
shapes = shape_func(op)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 462, in _SparseSoftmaxCrossEntropyWithLogitsShape
input_shape = logits_shape.with_rank(2)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/tensor_shape.py", line 641, in with_rank
raise ValueError("Shape %s must have rank %d" % (self, rank))
ValueError: Shape (?, 35315, 256) must have rank 2
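One possible workaround, offered as a sketch rather than a confirmed fix: the traceback shows that this TensorFlow version only accepts rank-2 logits in sparse_softmax_cross_entropy_with_logits, while the model passes a (batch, time, classes) tensor. Merging the batch and time axes before computing the loss (in TensorFlow, something like `tf.reshape(outputs, [-1, num_classes])` and `tf.reshape(targets, [-1])`) should give the shapes the op expects. The plain-Python helper below, `flatten_time`, is hypothetical and only illustrates the shape change on nested lists.

```python
def flatten_time(logits, targets):
    """Merge the batch and time axes.

    logits:  nested list shaped (batch, time, classes)
    targets: nested list shaped (batch, time)
    Returns (batch*time, classes) logits and (batch*time,) targets.
    """
    flat_logits = [frame for example in logits for frame in example]
    flat_targets = [label for example in targets for label in example]
    return flat_logits, flat_targets


# Toy shapes: batch=2, time=3, classes=4 (the error above shows 256 classes).
logits = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]
targets = [[1, 2, 3], [0, 1, 2]]
flat_logits, flat_targets = flatten_time(logits, targets)
print(len(flat_logits), len(flat_logits[0]), len(flat_targets))
```

After flattening, every row of the logits lines up with exactly one target label, which is the rank-2 / rank-1 pairing the error message asks for.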
Hi, I'm having an "out of memory" issue while running the demo.
Snippet:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:04:00.0
Total memory: 11.91GiB
Free memory: 11.67GiB
(full log below)
I have tried to lower the model parameters, but nothing seems to work. Do you have any advice?
Why does the demo take so much GPU memory?
Thanks a lot,
Daniele
Full log:
python demo.py
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:119] Couldn't open CUDA library libcudnn.so. LD_LIBRARY_PATH:
I tensorflow/stream_executor/cuda/cuda_dnn.cc:3459] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:04:00.0
Total memory: 11.91GiB
Free memory: 11.67GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x48c4140
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: GeForce GTX 980
major: 5 minor: 2 memoryClockRate (GHz) 1.342
pciBusID 0000:0a:00.0
Total memory: 3.94GiB
Free memory: 487.88MiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x48c0320
E tensorflow/core/common_runtime/direct_session.cc:135] Internal: failed initializing StreamExecutor for CUDA device ordinal 2: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY; total memory reported: 18446744073648275456
Traceback (most recent call last):
File "demo.py", line 16, in <module>
gpu_fraction=gpu_fraction)
File "/home/daniele/fast-wavenet-master/wavenet/models.py", line 54, in __init__
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
File "/home/daniele/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1186, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/home/daniele/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 551, in __init__
self._session = tf_session.TF_NewDeprecatedSession(opts, status)
File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/home/daniele/.local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.
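A hedged suggestion based on the log: the failure happens while TensorFlow initializes the visible CUDA devices, and device 1 (the GTX 980) reports only 487.88MiB free, so the error may come from a GPU that is already in use rather than from the demo's own memory needs. Hiding the other cards with the standard CUDA_VISIBLE_DEVICES environment variable is one thing to try. Picking device 0 here is an assumption taken from the numbering in the log, and the variable has to be set before TensorFlow is imported.

```python
import os

# Expose only one GPU to this process. "0" is an assumption based on the
# device numbering in the log above; CUDA's own ordering can differ.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Import TensorFlow (and wavenet.models) only after this point, so that the
# CUDA runtime never sees the other devices.
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

The same effect can be had by setting the variable in the shell before launching demo.py.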
Yes, I'm kind of new to TF and still... training, so bear with me for my lame questions.
I'm experimenting with the demo. It trained and generated correctly with the very short audio sample provided with the code, but then I wanted to try something different. I ran the demo on a short (about 20 seconds) sample from a well-known Beethoven symphony and then generated 300000 samples. Something strange happened: only the first half second is fine; the rest of the generated sound is extremely noisy and barely recognizable.
In the code, I changed only the path of the input audio and the duration of the generated audio.
What am I doing wrong? Thank you for your patience in reading my post (and answering, if possible!)
Hi, I'm pretty new to machine learning, so maybe this is a silly question, but I was wondering how I would train this network using more than just one sample? Because it looks like the make_batch function only loads a single file. Also, I'm not sure if this is related, but would I need to train a separate network for each class of generator? Or is there a way to label the training data? Any help and tutoring is much appreciated!
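A minimal sketch of one way to build training data from several recordings, assuming each recording has already been loaded and quantized into a list of integer class labels (how make_batch does this for a single file is not shown here). The helper `make_windows` and its parameters are hypothetical. It cuts every recording into fixed-length windows and pairs each window with the same window shifted by one step, which is the usual next-sample prediction setup.

```python
def make_windows(recordings, window):
    """Cut each recording into (inputs, targets) pairs of length `window`.

    targets[i] is the sample that follows inputs[i], so every window needs
    window + 1 consecutive samples. Leftover samples at the end are dropped.
    """
    pairs = []
    for samples in recordings:
        for start in range(0, len(samples) - window, window):
            chunk = samples[start:start + window + 1]
            pairs.append((chunk[:-1], chunk[1:]))
    return pairs


# Two toy "recordings" of already-quantized samples.
recordings = [list(range(10)), list(range(100, 107))]
for inputs, targets in make_windows(recordings, window=4):
    print(inputs, targets)
```

Each pair could then be fed to training in turn, presumably with the model built so that num_time_samples equals the window length.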
Ran your demo, and it worked fine. The generator can reproduce the training sample. Now I want to generate some new sounds, e.g., by changing the initial conditions. I tried changing the input sample to the generator, but that didn't change anything---it still reproduces the training excerpt. Any ideas? I'm wondering if the model has been overfit?
I use Google Colab (python 3).
I cloned the repo
!git clone https://github.com/tomlepaine/fast-wavenet.git
%cd fast-wavenet
Then I ran this:
from time import time
from wavenet.utils import make_batch
from wavenet.models import Model, Generator
from IPython.display import Audio
%matplotlib inline
And I got this error:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-6-4dd95abb40c7> in <module>()
2
3 from wavenet.utils import make_batch
----> 4 from wavenet.models import Model, Generator
5
6 from IPython.display import Audio
/content/fast-wavenet/wavenet/models.py in <module>()
2 import numpy as np
3 import tensorflow as tf
----> 4 from layers import (_causal_linear, _output_linear, conv1d,
5 dilated_conv1d)
6
ModuleNotFoundError: No module named 'layers'
---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.
To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------
Did I miss something?
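A likely cause, stated with some caution: Python 3 no longer resolves `from layers import ...` relative to the importing file's package (implicit relative imports were removed), so wavenet/models.py has to name the package explicitly, for example `from wavenet.layers import ...` or `from .layers import ...`. The sketch below reproduces the layout with a throwaway package called `demo_pkg` (a made-up name) and shows that the explicit relative form imports cleanly.

```python
import importlib
import os
import sys
import tempfile

# Build a tiny package on disk: demo_pkg/layers.py and demo_pkg/models.py.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "demo_pkg")
os.mkdir(pkg)
files = {
    "__init__.py": "",
    "layers.py": "def conv1d():\n    return 'conv1d called'\n",
    # Explicit relative import: this form works on Python 3.
    "models.py": "from .layers import conv1d\n",
}
for name, source in files.items():
    with open(os.path.join(pkg, name), "w") as handle:
        handle.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()
models = importlib.import_module("demo_pkg.models")
print(models.conv1d())
```

Replacing the line in the generated models.py with `from layers import conv1d` would bring back a ModuleNotFoundError like the one above when run under Python 3.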
When running the demo using TensorFlow 0.10, Python 3.5 (Anaconda), commit 20485a2, I get the following:
TypeError Traceback (most recent call last)
in ()
----> 1 generator = Generator(model)
/home/denis/fast-wavenet/wavenet/models.py in __init__(self, model, batch_size, input_size)
99 count += 1
100
--> 101 outputs = _output_linear(h)
102
103 out_ops = [tf.argmax(tf.nn.softmax(outputs), 1)]
/home/denis/fast-wavenet/wavenet/layers.py in _output_linear(h, name)
170
171 def _output_linear(h, name=''):
--> 172 with tf.variable_scope(name, reuse=True):
173 w = tf.get_variable('w')[0, :, :]
174 b = tf.get_variable('b')
/home/denis/anaconda3/lib/python3.5/contextlib.py in __enter__(self)
57 def __enter__(self):
58 try:
---> 59 return next(self.gen)
60 except StopIteration:
61 raise RuntimeError("generator didn't yield") from None
/home/denis/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py in variable_scope(name_or_scope, default_name, values, initializer, regularizer, caching_device, partitioner, custom_getter, reuse, dtype)
1350 """
1351 if default_name is None and not name_or_scope:
-> 1352 raise TypeError("If default_name is None then name_or_scope is required")
1353 if values is None:
1354 values = []
TypeError: If default_name is None then name_or_scope is required
I used Google colab (python 3 GPU)
How to reproduce the error:
!git clone https://github.com/tomlepaine/fast-wavenet.git
%cd fast-wavenet
!mkdir wavenet/assets
!cp assets/voice.wav wavenet/assets/voice.wav
%cd wavenet
from time import time
from utils import make_batch
from models import Model, Generator
from IPython.display import Audio
%matplotlib inline
inputs, targets = make_batch('assets/voice.wav')
num_time_samples = inputs.shape[1]
num_channels = 1
gpu_fraction = 1.0
model = Model(num_time_samples=num_time_samples,
num_channels=num_channels,
gpu_fraction=gpu_fraction)
Audio(inputs.reshape(inputs.shape[1]), rate=44100)
And you will see this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-11-afc93d99fff0> in <module>()
2 model = Model(num_time_samples=num_time_samples,
3 num_channels=num_channels,
----> 4 gpu_fraction=gpu_fraction)
5
6 Audio(inputs.reshape(inputs.shape[1]), rate=44100)
/content/fast-wavenet/wavenet/models.py in __init__(self, num_time_samples, num_channels, num_classes, num_blocks, num_layers, num_hidden, gpu_fraction)
34 rate = 2**i
35 name = 'b{}-l{}'.format(b, i)
---> 36 h = dilated_conv1d(h, num_hidden, rate=rate, name=name)
37 hs.append(h)
38
/content/fast-wavenet/wavenet/layers.py in dilated_conv1d(inputs, out_channels, filter_width, rate, padding, name, gain, activation)
136 with tf.variable_scope(name):
137 _, width, _ = inputs.get_shape().as_list()
--> 138 inputs_ = time_to_batch(inputs, rate=rate)
139 outputs_ = conv1d(inputs_,
140 out_channels=out_channels,
/content/fast-wavenet/wavenet/layers.py in time_to_batch(inputs, rate)
24 padded = tf.pad(inputs, [[0, 0], [pad_left, 0], [0, 0]])
25 transposed = tf.transpose(padded, perm)
---> 26 reshaped = tf.reshape(transposed, shape)
27 outputs = tf.transpose(reshaped, perm)
28 return outputs
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_array_ops.py in reshape(tensor, shape, name)
6480 if _ctx is None or not _ctx._eager_context.is_eager:
6481 _, _, _op = _op_def_lib._apply_op_helper(
-> 6482 "Reshape", tensor=tensor, shape=shape, name=name)
6483 _result = _op.outputs[:]
6484 _inputs_flat = _op.inputs
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py in _apply_op_helper(self, op_type_name, name, **keywords)
607 _SatisfiesTypeConstraint(base_type,
608 _Attr(op_def, input_arg.type_attr),
--> 609 param_name=input_name)
610 attrs[input_arg.type_attr] = attr_value
611 inferred_from[input_arg.type_attr] = input_name
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py in _SatisfiesTypeConstraint(dtype, attr_def, param_name)
58 "allowed values: %s" %
59 (param_name, dtypes.as_dtype(dtype).name,
---> 60 ", ".join(dtypes.as_dtype(x).name for x in allowed_list)))
61
62
TypeError: Value passed to parameter 'shape' has DataType float32 not in list of allowed values: int32, int64
What did I miss? How can I make it work in Google Colab, preferably on the LJSpeech dataset?
There is a problem when running the demo:
Even converting the value to an integer with int() in layers.py does not solve the problem.
I don't know whether the problem lies in Python and TensorFlow themselves, since some issues have been raised about version incompatibilities.
I use Python 3.5.3 and TensorFlow 1.1.0.
Can anyone help, please?
As is known,
For a downsized model (4000 Hz vs. 16000 Hz sampling rate, 16 filters vs. 256, 2 stacks vs. ??):
A Tesla K80 needs about 4 minutes to generate one second of audio.
A recent MacBook Pro needs about 15 minutes. DeepMind has reported that generating one second of audio with their model takes about 90 minutes.
Just to compare against Google's implementation.
Which models run on consumer resources? Can fast wavenet do anything wavenet can do? Besides generating outputs much faster and needing less memory, how much memory does it use? Is this repo based on the tensorflow-wavenet repo? Also, is fast wavenet in any way a downgrade (output-wise, e.g. vocal synthesis) from wavenet? Thanks!
More questions about how to use coming soon to a GitHub thread near you!
Hey, would really love to try this out. Here are a couple things I find when trying to run the demo...
(This is with tensorflow 0.10.0 on Mac OS X El Capitan, 10.11.6.)
Traceback (most recent call last):
File "demo.py", line 15, in <module>
from wavenet.models import Model, Generator
File "/Users/myusername/exercises/neural/fast-wavenet/wavenet/models.py", line 4, in <module>
from layers import (_causal_linear, _output_linear, conv1d,
ImportError: No module named 'layers'
If you change line 4 of models.py so it reads "from wavenet.layers" instead of just "from layers", then this error goes away. That's easy.
The next error, I don't know how to fix...
Traceback (most recent call last):
File "demo.py", line 32, in <module>
gpu_fraction=gpu_fraction)
File "/Users/myusername/exercises/neural/fast-wavenet/wavenet/models.py", line 36, in __init__
h = dilated_conv1d(h, num_hidden, rate=rate, name=name)
File "/Users/myusername/exercises/neural/fast-wavenet/wavenet/layers.py", line 138, in dilated_conv1d
inputs_ = time_to_batch(inputs, rate=rate)
File "/Users/myusername/exercises/neural/fast-wavenet/wavenet/layers.py", line 26, in time_to_batch
reshaped = tf.reshape(transposed, shape)
File "/Users/myusername/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1383, in reshape
name=name)
File "/Users/myusername/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/op_def_library.py", line 455, in apply_op
as_ref=input_arg.is_ref)
File "/Users/myusername/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 620, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/Users/myusername/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/constant_op.py", line 179, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/Users/myusername/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/constant_op.py", line 162, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape))
File "/Users/myusername/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 353, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/Users/myusername/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 290, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got 35316.0 of type 'float' instead.
How do we fix that? I can see that right before the call to reshape,
transposed = Tensor("b0-l0/transpose:0", shape=(35316, ?, 1), dtype=float32)
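The traceback suggests that one element of `shape` is produced by a true division: under Python 3, `/` between two integers returns a float (hence `35316.0`), whereas Python 2 returned an integer. A minimal sketch of the arithmetic follows, assuming the first dimension is the padded width divided by the dilation rate. The function `reshape_dims` and its padding rule are illustrative guesses, not the repository's code. Floor division (`//`) keeps the result an integer on both Python versions.

```python
def reshape_dims(width, rate, num_channels):
    """Return an all-integer target shape for a time-to-batch style reshape."""
    # Pad the width up to the next multiple of `rate` (illustrative rule).
    padded_width = rate * ((width + rate - 1) // rate)
    # `padded_width / rate` would be a float on Python 3; `//` stays an int.
    return (padded_width // rate, -1, num_channels)


shape = reshape_dims(width=35316, rate=1, num_channels=1)
print(shape, all(isinstance(dim, int) for dim in shape))
```

Wrapping only the final value in int() also works, as long as every float that reaches tf.reshape is converted; a single remaining float dimension is enough to trigger the same TypeError.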
Without training, I can run generator.run directly. With dilation_num = 4, generation takes 19 s; with dilation_num = 8, 29 s; with dilation_num = 14, 43 s. So the cost does not seem to scale linearly?
from time import time
from wavenet.utils import make_batch
from wavenet.models import Model, Generator
#from IPython.display import Audio
inputs, targets = make_batch('assets/voice.wav')
num_time_samples = inputs.shape[1]
num_channels = 1
gpu_fraction = 1.0
model = Model(num_time_samples=num_time_samples,
num_channels=num_channels,
gpu_fraction=gpu_fraction)
#Audio(inputs.reshape(inputs.shape[1]), rate=44100)
tic = time()
#model.train(inputs, targets)
toc = time()
print('Training took {} seconds.'.format(toc-tic))
generator = Generator(model)
# Get first sample of input
input_ = inputs[:, 0:1, 0]
tic = time()
predictions = generator.run(input_, 32000)
toc = time()
print('Generating took {} seconds.'.format(toc-tic))
#Audio(predictions, rate=44100)
layers=4
$python test.py
WARNING:tensorflow:From /home/xianning.lu/sing/tf-no-mkl/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py:553: calling conv1d (from tensorflow.python.ops.nn_ops) with data_format=NHWC is deprecated and will be removed in a future version.
Instructions for updating:
`NHWC` for data_format is deprecated, use `NWC` instead
2018-11-09 18:17:04.603404: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
WARNING:tensorflow:From /home/xianning.lu/sing/tf-no-mkl/lib/python2.7/site-packages/tensorflow/python/util/tf_should_use.py:189: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
Training took 0.0 seconds.
Make Generator.
Generating took 19.5070331097 seconds.
layers=8
$python test.py
WARNING:tensorflow:From /home/xianning.lu/sing/tf-no-mkl/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py:553: calling conv1d (from tensorflow.python.ops.nn_ops) with data_format=NHWC is deprecated and will be removed in a future version.
Instructions for updating:
`NHWC` for data_format is deprecated, use `NWC` instead
2018-11-09 18:17:51.435269: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
WARNING:tensorflow:From /home/xianning.lu/sing/tf-no-mkl/lib/python2.7/site-packages/tensorflow/python/util/tf_should_use.py:189: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
Training took 0.0 seconds.
Make Generator.
Generating took 29.0180389881 seconds.
layers=14
$python test.py
WARNING:tensorflow:From /home/xianning.lu/sing/tf-no-mkl/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py:553: calling conv1d (from tensorflow.python.ops.nn_ops) with data_format=NHWC is deprecated and will be removed in a future version.
Instructions for updating:
`NHWC` for data_format is deprecated, use `NWC` instead
2018-11-09 18:15:56.114761: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
WARNING:tensorflow:From /home/xianning.lu/sing/tf-no-mkl/lib/python2.7/site-packages/tensorflow/python/util/tf_should_use.py:189: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
Training took 0.0 seconds.
Make Generator.
Generating took 43.2759749889 seconds.
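For what it is worth, the three timings above are close to a straight line once a fixed start-up cost is allowed for. The sketch below fits time = slope * layers + intercept by ordinary least squares to the numbers from the logs. With only three measurements this is suggestive rather than conclusive.

```python
layers = [4, 8, 14]
seconds = [19.5070331097, 29.0180389881, 43.2759749889]

n = len(layers)
mean_x = sum(layers) / float(n)
mean_y = sum(seconds) / float(n)

# Ordinary least-squares slope and intercept.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(layers, seconds))
         / sum((x - mean_x) ** 2 for x in layers))
intercept = mean_y - slope * mean_x

print('about {:.2f} s per layer plus {:.2f} s of fixed cost'.format(slope, intercept))
for x, y in zip(layers, seconds):
    # Measured time next to the time predicted by the fitted line.
    print(x, y, round(slope * x + intercept, 2))
```

The times are not proportional to the layer count (doubling the layers does not double the time), but the increase per added layer is roughly constant, which is what a per-layer cost plus a constant overhead would look like.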
This is what I get on step 2. I tried working with the fixes that people suggested on the other posts, but have still not had any luck. Any suggestions would be greatly appreciated!
ValueError Traceback (most recent call last)
in ()
6 model = Model(num_time_samples=num_time_samples,
7 num_channels=num_channels,
----> 8 gpu_fraction=gpu_fraction)
9
10 Audio(inputs.reshape(inputs.shape[1]), rate=44100)
/root/shared/fast-wavenet-master/wavenet/models.py in __init__(self, num_time_samples, num_channels, num_classes, num_blocks, num_layers, num_hidden, gpu_fraction)
34 for i in range(num_layers):
35 rate = 2**i
---> 36 name = 'b{}-l{}'.format(b, i)
37 h = dilated_conv1d(h, num_hidden, rate=rate, name=name)
38 hs.append(h)
/root/shared/fast-wavenet-master/wavenet/layers.py in dilated_conv1d(inputs, out_channels, filter_width, rate, padding, name, gain, activation)
142 padding=padding,
143 gain=gain,
--> 144 activation=activation)
145 _, conv_out_width, _ = outputs.get_shape().as_list()
146 new_width = conv_out_width * rate
/root/shared/fast-wavenet-master/wavenet/layers.py in conv1d(inputs, out_channels, filter_width, stride, padding, data_format, gain, activation, bias)
89 w = tf.get_variable(name='w',
90 shape=(filter_width, in_channels, out_channels),
---> 91 initializer=w_init)
92
93 outputs = tf.nn.conv1d(inputs,
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.pyc in get_variable(name, shape, dtype, initializer, regularizer, trainable, collections, caching_device, partitioner, validate_shape, custom_getter)
986 collections=collections, caching_device=caching_device,
987 partitioner=partitioner, validate_shape=validate_shape,
--> 988 custom_getter=custom_getter)
989 get_variable_or_local_docstring = (
990 """%s
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.pyc in get_variable(self, var_store, name, shape, dtype, initializer, regularizer, trainable, collections, caching_device, partitioner, validate_shape, custom_getter)
888 collections=collections, caching_device=caching_device,
889 partitioner=partitioner, validate_shape=validate_shape,
--> 890 custom_getter=custom_getter)
891
892 def _get_partitioned_variable(self,
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.pyc in get_variable(self, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, custom_getter)
346 reuse=reuse, trainable=trainable, collections=collections,
347 caching_device=caching_device, partitioner=partitioner,
--> 348 validate_shape=validate_shape)
349
350 def _get_partitioned_variable(
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.pyc in _true_getter(name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape)
331 initializer=initializer, regularizer=regularizer, reuse=reuse,
332 trainable=trainable, collections=collections,
--> 333 caching_device=caching_device, validate_shape=validate_shape)
334
335 if custom_getter is not None:
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.pyc in _get_single_variable(self, name, shape, dtype, initializer, regularizer, partition_info, reuse, trainable, collections, caching_device, validate_shape)
637 " Did you mean to set reuse=True in VarScope? "
638 "Originally defined at:\n\n%s" % (
--> 639 name, "".join(traceback.format_list(tb))))
640 found_var = self._vars[name]
641 if not shape.is_compatible_with(found_var.get_shape()):
ValueError: Variable b0-l0/w already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:
File "wavenet/layers.py", line 91, in conv1d
initializer=w_init)
File "wavenet/layers.py", line 144, in dilated_conv1d
activation=activation)
File "wavenet/models.py", line 36, in __init__
name = 'b{}-l{}'.format(b, i)
In prediction, the input is a single number. If I want to use this code for TTS, I have no idea how to start.
In the new paper, Google uses filter width = 3 to increase the receptive field.
How, then, could we do inference with filter width 3?
My idea is to use two queues. Because the dilation still doubles at each layer, the first queue stores the first half of the intermediate values and the second queue stores the second half.
The output of the first queue is then enqueued into the second queue.
For example:
current_state = q.dequeue()
push = q.enqueue([current_layer])
init_ops.append(init)
push_ops.append(push)
pre_state = None
if self.filter_width == 3:
    q2 = tf.FIFOQueue(
        1,
        dtypes=tf.float32,
        shapes=(self.batch_size, self.quantization_channels))
    init2 = q2.enqueue_many(tf.zeros((1, self.batch_size, self.quantization_channels)))
    pre_state = q2.dequeue()
    push2 = q2.enqueue([current_state])
    init_ops2.append(init2)
    push_ops2.append(push2)
if self.filter_width == 2:
    current_layer = self._generator_causal_layer(
        current_layer, current_state)
if self.filter_width == 3:
    current_layer = self._generator_causal_layer(
        current_layer, current_state, pre_state)
...
with tf.name_scope('dilated_stack'):
    for layer_index, dilation in enumerate(self.dilations):
        with tf.name_scope('layer{}'.format(layer_index)):
            q = tf.FIFOQueue(
                dilation,
                dtypes=tf.float32,
                shapes=(self.batch_size, self.residual_channels))
            init = q.enqueue_many(
                tf.zeros((dilation, self.batch_size,
                          self.residual_channels)))
            current_state = q.dequeue()
            push = q.enqueue([current_layer])
            init_ops.append(init)
            push_ops.append(push)
            pre_state = None
            if self.filter_width == 3:
                q2 = tf.FIFOQueue(
                    dilation,
                    dtypes=tf.float32,
                    shapes=(self.batch_size, self.residual_channels))
                init2 = q2.enqueue_many(tf.zeros((dilation, self.batch_size, self.residual_channels)))
                pre_state = q2.dequeue()
                push2 = q2.enqueue([current_state])
                init_ops2.append(init2)
                push_ops2.append(push2)
            output, current_layer = self._generator_dilation_layer(
                current_layer, current_state, layer_index, dilation,
                global_condition_batch, local_condition, pre_state)
            outputs.append(output)
Does that make sense?
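The chained-queue idea can be checked on a toy case. The sketch below is a plain-Python model, not TensorFlow code: it uses scalar samples, a single layer, no gating or residual paths, and the function names `direct_conv` and `queued_conv` are made up for illustration. It compares a causal width-3 dilated convolution computed directly against the same convolution computed one sample at a time with two FIFO queues of length `dilation`, where the output of the first queue is pushed into the second.

```python
from collections import deque


def direct_conv(x, w, dilation):
    """Reference: causal width-3 dilated convolution with zero history."""
    def past(t):
        return x[t] if t >= 0 else 0.0
    return [w[0] * past(t - 2 * dilation) + w[1] * past(t - dilation) + w[2] * x[t]
            for t in range(len(x))]


def queued_conv(x, w, dilation):
    """Same result, one sample at a time, using two chained FIFO queues."""
    q1 = deque([0.0] * dilation)  # holds x[t-d] ... x[t-1]
    q2 = deque([0.0] * dilation)  # holds x[t-2d] ... x[t-d-1]
    out = []
    for current in x:
        state = q1.popleft()      # x[t - d]
        pre_state = q2.popleft()  # x[t - 2d]
        q1.append(current)
        q2.append(state)          # output of the first queue feeds the second
        out.append(w[0] * pre_state + w[1] * state + w[2] * current)
    return out


x = [float(v) for v in range(1, 13)]
w = [0.5, -1.0, 2.0]
for d in (1, 2, 4):
    assert queued_conv(x, w, d) == direct_conv(x, w, d)
print(queued_conv(x, w, 2)[:5])
```

In this toy setting the two queues behave like a single queue of length 2 * dilation with an extra tap in the middle, so the approach looks sound as long as both queues are initialized with zeros and pushed on every step.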
Hello,
I would like to run it without a GPU, so I tried 'sess = tf.Session(config=tf.ConfigProto(device_count={'gpu':0}))'. The training part is successful. However, when I start to generate, it does not seem to generate anything and exits.
Is it possible to run without a GPU?
Hi,
Can we save the model?
How can it be used afterwards if it only generates the training phrase?
I recommend uploading the voice samples to a site such as SoundCloud in the US or Ximalaya in China, so that we can listen to the results of your demo online.