Comments (30)
@myagues I know that. BTW, in my case fixed shape is somehow still faster than bucketing. And when I said I used bucket sequence before, I meant tf.data.experimental.bucket_by_sequence_length :D
from tensorflowtts.
‘Filling up shuffle buffer (this may take a while)’
This means the data loader is computing and caching the dataset. Once it has finished, training runs without any further preprocessing.
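To illustrate why nothing happens until the buffer is full, here is a minimal pure-Python sketch of a buffered shuffle. It is not TensorFlow's implementation; `shuffle_with_buffer` and `counting_source` are hypothetical names used only for illustration. The point is that the first element is only emitted after `buffer_size` elements have been loaded, which with a buffer of 9500 items (as in the log) can take a while if each element is expensive to load.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def shuffle_with_buffer(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Yield items in shuffled order using a bounded buffer (illustrative only)."""
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) < buffer_size:
            continue  # still "filling up shuffle buffer": nothing is emitted yet
        # Buffer is full: swap a random element to the end and emit it.
        idx = rng.randrange(len(buffer))
        buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
        yield buffer.pop()
    # Source exhausted: drain whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


def counting_source(n: int, read_log: List[int]) -> Iterator[int]:
    """Toy stand-in for a dataset whose elements are expensive to load."""
    for i in range(n):
        read_log.append(i)  # record that element i has been loaded
        yield i


read_log: List[int] = []
stream = shuffle_with_buffer(counting_source(10, read_log), buffer_size=8)
first = next(stream)
reads_before_first_output = len(read_log)
rest = list(stream)
print("elements read before the first output:", reads_before_first_output)
```

If the loaded elements are cached, this cost is paid once and later epochs only reshuffle data that is already in memory.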
from tensorflowtts.
Is it normal to load data for half an hour?
from tensorflowtts.
@rgzn-aiyun I will improve it tonight so that it takes around 5 minutes :D
from tensorflowtts.
Looking forward to the latest version. I will keep testing!
from tensorflowtts.
@rgzn-aiyun dathudeptrai@4add642. Please check whether it works :)
from tensorflowtts.
@rgzn-aiyun reopen if it doesn't work.
from tensorflowtts.
It only takes 5 seconds to load data, which is great.
from tensorflowtts.
Please help me check whether the output is the same as with the old code :)
from tensorflowtts.
Layer (type)                   Output Shape   Param #
encoder (TFTacotronEncoder)    multiple       8218624
decoder_cell (TFTacotronDeco   multiple       18246402
post_net (TFTacotronPostnet)   multiple       5460480
residual_projection (Dense)    multiple       41040
Total params: 31,966,546
Trainable params: 31,956,306
Non-trainable params: 10,240
[train]: 0% 0/200000 [00:00<?, ?it/s]2020-06-09 02:13:17.096090: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 2533 of 9500
2020-06-09 02:13:27.096872: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 5071 of 9500
2020-06-09 02:13:37.098078: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 7625 of 9500
2020-06-09 02:13:44.514868: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:233] Shuffle buffer filled.
2020-06-09 02:13:52.435945: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:563] function_optimizer failed: Invalid argument: Node 'tacotron2/StatefulPartitionedCall/encoder/bilstm/forward_lstm/StatefulPartitionedCall_Func/tacotron2/StatefulPartitionedCall/output/_325': Connecting to invalid output 29 of source node tacotron2/StatefulPartitionedCall/encoder/bilstm/forward_lstm/StatefulPartitionedCall which has 29 outputs.
2020-06-09 02:13:52.674893: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:563] shape_optimizer failed: Out of range: src_output = 29, but num_outputs is only 29
2020-06-09 02:13:52.873193: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:563] layout failed: Out of range: src_output = 29, but num_outputs is only 29
[train]: 0% 100/200000 [15:01<471:01:16, 8.48s/it]2020-06-09 02:28:08,977 (train_tacotron2:253) INFO: (Step: 100) train_stop_token_loss = 0.3202.
2020-06-09 02:28:08,978 (train_tacotron2:253) INFO: (Step: 100) train_mel_loss_before = 0.4082.
2020-06-09 02:28:08,979 (train_tacotron2:253) INFO: (Step: 100) train_mel_loss_after = 0.8663.
2020-06-09 02:28:08,979 (train_tacotron2:253) INFO: (Step: 100) train_guided_attention_loss = 0.0041.
[train]: 0% 196/200000 [28:37<471:17:11, 8.49s/it]
The test seems fine, but training is too slow: fewer than 200 steps in half an hour?
from tensorflowtts.
@rgzn-aiyun Tacotron-2 trains slowly because it is a sequence-to-sequence model. My 2080 Ti runs at about 4s/it, and that is normal. What is your machine?
from tensorflowtts.
Tesla P100. It shouldn't be this slow: fewer than 400 steps in an hour.
from tensorflowtts.
@rgzn-aiyun what are your max char length and max mel length? See https://github.com/dathudeptrai/TensorflowTTS/issues/19#issuecomment-636548991 for reference.
from tensorflowtts.
max_char_length: 290
max_mel_length: 1300
from tensorflowtts.
@rgzn-aiyun that's too long. You may need to remove some samples to get a smaller max_char and max_mel. On LJSpeech, the max char length is 170 and the max mel length is 800.
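A minimal sketch of that filtering step, assuming you already have the character and mel lengths per utterance. `Sample` and `split_by_length` are hypothetical names (not part of TensorflowTTS), the default thresholds are the LJSpeech figures quoted above, and the sample data is made up for illustration.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    utt_id: str
    char_length: int  # number of input characters
    mel_length: int   # number of mel frames


def split_by_length(
    samples: List[Sample],
    max_char_length: int = 170,  # LJSpeech figure quoted in this thread
    max_mel_length: int = 800,   # LJSpeech figure quoted in this thread
) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into (kept, too_long) according to both length limits."""
    kept, too_long = [], []
    for sample in samples:
        fits = (sample.char_length <= max_char_length
                and sample.mel_length <= max_mel_length)
        (kept if fits else too_long).append(sample)
    return kept, too_long


# Toy data, invented for illustration only.
samples = [
    Sample("utt_0001", char_length=120, mel_length=640),
    Sample("utt_0002", char_length=290, mel_length=1300),
    Sample("utt_0003", char_length=150, mel_length=820),
]
kept, too_long = split_by_length(samples)
print("kept:", [s.utt_id for s in kept])
print("too long:", [s.utt_id for s in too_long])
```

The `too_long` list can then be excluded from training or moved to the validation set.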
from tensorflowtts.
You can try setting os.environ["TF_GPU_THREAD_MODE"] = "gpu_private" in the global scope of train_tacotron2.py. Sometimes it helps to speed up the process a bit, and I don't know of any downsides.
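A minimal sketch of where that line could go. Placing it before the TensorFlow import is an assumption made to be safe, so that the variable is already set when the GPU device is created.

```python
import os

# Set the GPU thread mode before TensorFlow is imported (assumption: the
# variable is read when the GPU device is initialised, so earlier is safer).
os.environ["TF_GPU_THREAD_MODE"] = "gpu_private"

# The TensorFlow import would follow here, for example:
# import tensorflow as tf

print("TF_GPU_THREAD_MODE =", os.environ["TF_GPU_THREAD_MODE"])
```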
from tensorflowtts.
TF_GPU_THREAD_MODE: Whether and how the GPU device uses its own threadpool. Possible values:
global: GPU uses threads shared with CPU in the main compute thread-pool. This is currently the default.
gpu_private: GPU uses threads dedicated to this device.
gpu_shared: All GPUs share a dedicated thread pool.
I will test. For me, the average speed is about 400 steps per hour. That is only about 8800 steps in 24 hours. If 200k training steps are required, it will take too long!
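For reference, the arithmetic behind these estimates can be checked with a few lines of Python. The step times are the ones reported in this thread (8.48s/it in the P100 log, about 4s/it on the 2080 Ti); the helper names are hypothetical.

```python
def steps_per_hour(seconds_per_step: float) -> float:
    """Number of training steps completed in one hour."""
    return 3600.0 / seconds_per_step


def training_hours(total_steps: int, seconds_per_step: float) -> float:
    """Wall-clock hours needed for total_steps at a constant step time."""
    return total_steps * seconds_per_step / 3600.0


for label, sec_per_step in [("P100 log", 8.48), ("2080 Ti", 4.0)]:
    for total_steps in (50_000, 100_000, 200_000):
        print(
            f"{label}: {sec_per_step:.2f} s/it, "
            f"{steps_per_hour(sec_per_step):.0f} steps/hour, "
            f"{total_steps} steps take {training_hours(total_steps, sec_per_step):.0f} h"
        )
```

At 8.48s/it, 200k steps come out at roughly 471 hours, which matches the remaining-time estimate shown in the progress bar.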
from tensorflowtts.
100k is enough. If you fine-tune from my pretrained model, I think 50k is OK. You should remove some samples to get a smaller max char length and max mel length; your lengths are too long. You can move the long sentences to the valid folder and train on the short sentences.
from tensorflowtts.
I made some further changes to the Tacotron 2 data reading in my fork. It uses tf.data.experimental.bucket_by_sequence_length, which groups mel spectrograms of similar length into the same batches. This means that when training with variable shapes, less padding is needed because most mels in a batch have similar lengths (in my case, buckets of 50), which makes training faster.
I don't know whether this causes training problems (if you have few mels of a given length, those will be grouped together every epoch), so I did a short run with variable shapes (bucketed, variable size per batch) and fixed shapes (constant maximum length in every batch) to see whether the differences were meaningful. It seems that variable shape gets a higher training loss but a very similar validation loss, and it is faster than fixed shape.
from tensorflowtts.
@myagues I used bucketing before too. But as you can see in the Tacotron notes, I said that fixed-shape training is 2x faster than dynamic shape, and I can't understand why. So I removed bucketing :(. BTW, what is your TensorFlow version? It would be great if you created a pull request :D
from tensorflowtts.
I will test dynamic shape, because fixed shape always goes wrong shortly after the model is saved early in training.
from tensorflowtts.
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/executor.py", line 67, in wait
pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle)
tensorflow.python.framework.errors_impl.DataLossError: Attempted to pad to a smaller size than the input element.
[train]: 0% 2/200000 [00:50<1389:46:04, 25.02s/it]
Can't seem to run?
from tensorflowtts.
@rgzn-aiyun are you running with reduction_factor=1 and n_mels=80 in the config? That is the only thing that comes to mind with this error, since n_mels is hard-coded and I have not tested a reduction factor other than 1.
@dathudeptrai before sending the pull request I want to test that it works correctly with different configurations. Your version has bucketing too, in tacotron_dataset.py#L121-L137, but if you have is_shuffle=True then it won't have any effect.
from tensorflowtts.
Oh, ok, I thought you meant the groupby.
Well, I don't know. I suppose fixed shape enables some optimizations that cannot be used with variable shape. For me, fixed shape has a constant step time of 3.76s/it (batch_size=16), and variable shape ranges between 2.7 and 4s/it at most, but is generally closer to 3s/it than 4s/it.
from tensorflowtts.
@myagues what are your versions of tf and tf_addons? I use batch_size 32; I will check batch_size 16. There are still some issues with this implementation: I don't know why I can't apply mixed precision to Tacotron :)). I didn't see anything wrong in my implementation :(
from tensorflowtts.
The same versions as in setup.py: TF v2.2.0 and TF-addons v0.9.1.
I run with fp32 and have not tried fp16 yet. My card does not have tensor cores, so I don't think it would bring much benefit in my case. I will try it later and see if I can find any errors.
from tensorflowtts.
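Calculate the maximum value of char_lengths:
nums = sorted(char_lengths)  # sorted copy, so char_lengths itself is not reordered
max_len = nums[-1]  # named max_len/min_len to avoid shadowing the built-ins max and min
min_len = nums[0]
print("Maximum:", max_len)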
from tensorflowtts.
Yes, the default configuration file is used.
from tensorflowtts.
[train]: 0% 0/200000 [00:00<?, ?it/s]2020-06-11 09:50:22.090650: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 3244 of 9500
2020-06-11 09:50:32.089778: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 6514 of 9500
2020-06-11 09:50:41.202193: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:233] Shuffle buffer filled.
2020-06-11 09:50:46.995056: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:563] function_optimizer failed: Invalid argument: Node 'tacotron2/StatefulPartitionedCall/encoder/bilstm/forward_lstm/StatefulPartitionedCall_Func/tacotron2/StatefulPartitionedCall/output/_325': Connecting to invalid output 29 of source node tacotron2/StatefulPartitionedCall/encoder/bilstm/forward_lstm/StatefulPartitionedCall which has 29 outputs.
2020-06-11 09:50:47.193020: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:563] shape_optimizer failed: Out of range: src_output = 29, but num_outputs is only 29
2020-06-11 09:50:47.336215: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:563] layout failed: Out of range: src_output = 29, but num_outputs is only 29
[train]: 0% 50/200000 [06:13<374:38:00, 6.75s/it]2020-06-11 09:57:20,083 (base_trainer:144) INFO: Successfully saved checkpoint @ 50 steps.
[train]: 0% 86/200000 [11:10<375:00:07, 6.75s/it]Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py", line 1986, in execution_mode
yield
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 655, in _next_internal
output_shapes=self._flat_output_shapes)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2363, in iterator_get_next
_ops.raise_from_not_ok_status(e, name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 6653, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.DataLossError: Attempted to pad to a smaller size than the input element. [Op:IteratorGetNext]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train_tacotron2.py", line 507, in <module>
main()
File "train_tacotron2.py", line 500, in main
resume=args.resume)
File "train_tacotron2.py", line 343, in fit
self.run()
File "/ai/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 72, in run
self._train_epoch()
File "/ai/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 92, in _train_epoch
for train_steps_per_epoch, batch in enumerate(self.train_data_loader, 1):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 631, in __next__
return self.next()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 670, in next
return self._next_internal()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py", line 661, in _next_internal
return structure.from_compatible_tensor_list(self._element_spec, ret)
File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py", line 1989, in execution_mode
executor_new.wait()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/executor.py", line 67, in wait
pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle)
tensorflow.python.framework.errors_impl.DataLossError: Attempted to pad to a smaller size than the input element.
[train]: 0% 86/200000 [11:14<435:36:28, 7.84s/it]
This error is happening again? It doesn't seem to be a problem with raw-feats or norm-feats.
from tensorflowtts.
So it begins the training process, but at some point it encounters an item that cannot be padded because it is larger than the padded size being applied.
I don't really know what could cause this, since all unknown dimensions should be padded to the largest size in the batch.
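One way to narrow this down, sketched under the assumption that some dimension is padded to a fixed target length (for example a configured maximum) rather than to the batch maximum: scan the dataset lengths offline and list every item that exceeds that target. `find_oversized` and the sample lengths below are hypothetical and only illustrate the check; this is not a confirmed diagnosis of the error above.

```python
from typing import Dict, List, Tuple


def find_oversized(lengths_by_id: Dict[str, int], pad_to: int) -> List[Tuple[str, int]]:
    """Return (utt_id, length) pairs whose length exceeds the padding target."""
    return sorted(
        (utt_id, length)
        for utt_id, length in lengths_by_id.items()
        if length > pad_to
    )


# Toy mel lengths, invented for illustration only.
mel_lengths = {"utt_0001": 640, "utt_0002": 1300, "utt_0003": 820}

oversized = find_oversized(mel_lengths, pad_to=800)
for utt_id, length in oversized:
    print(f"{utt_id}: length {length} exceeds pad target 800")
```

If this list is non-empty for the padding target actually in use, those items are candidates for the element that cannot be padded.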
from tensorflowtts.