tensorspeech / tensorflowtts

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supported languages include English, French, Korean, Chinese, and German; easy to adapt to other languages)

Home Page: https://tensorspeech.github.io/TensorFlowTTS/

License: Apache License 2.0

Python 100.00%

speech-synthesis text-to-speech tensorflow2 melgan fastspeech real-time tts vocoder multi-speaker-tts fastspeech2

tensorflowtts's Issues

pqmf synthesis filter

Hi, why does the synthesis filter still use h_analysis here?

    # [subbands, 1, taps + 1] == [filter_width, in_channels, out_channels]
    analysis_filter = np.expand_dims(h_analysis, 1)
    analysis_filter = np.transpose(analysis_filter, (2, 1, 0))
    synthesis_filter = np.expand_dims(h_analysis, 0)
    synthesis_filter = np.transpose(synthesis_filter, (2, 1, 0))
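
For context on the question, here is a minimal pure-Python sketch of the cosine-modulated PQMF design found in several open-source vocoder implementations. It assumes that design applies here. The Hann-windowed sinc prototype and the names `prototype_filter` and `pqmf_filters` are stand-ins chosen for illustration, not code from this repository. It shows that, for a symmetric prototype, each synthesis filter equals the time-reversed analysis filter. Whether reusing h_analysis in the snippet above is intentional depends on how the transposed convolution applies the kernel, which this sketch does not settle.

```python
import math


def prototype_filter(taps=62, cutoff=0.15):
    """Symmetric low-pass prototype: Hann-windowed sinc (illustrative stand-in)."""
    coeffs = []
    for n in range(taps + 1):
        m = n - taps / 2
        sinc = cutoff if m == 0 else math.sin(math.pi * cutoff * m) / (math.pi * m)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / taps)
        coeffs.append(sinc * window)
    return coeffs


def pqmf_filters(subbands=4, taps=62, cutoff=0.15):
    """Return (analysis, synthesis) filter banks, one list of taps + 1 values per band."""
    proto = prototype_filter(taps, cutoff)
    analysis, synthesis = [], []
    for k in range(subbands):
        freq = (2 * k + 1) * math.pi / (2 * subbands)
        phase = (-1) ** k * math.pi / 4
        # The two banks differ only in the sign of the phase term.
        analysis.append(
            [2 * p * math.cos(freq * (n - taps / 2) + phase) for n, p in enumerate(proto)]
        )
        synthesis.append(
            [2 * p * math.cos(freq * (n - taps / 2) - phase) for n, p in enumerate(proto)]
        )
    return analysis, synthesis


analysis, synthesis = pqmf_filters()
# Largest gap between each synthesis filter and the reversed analysis filter.
max_gap = max(
    abs(a - s)
    for band_a, band_s in zip(analysis, synthesis)
    for a, s in zip(reversed(band_a), band_s)
)
print("max |reversed(analysis) - synthesis| =", max_gap)
```

If the framework's transposed convolution effectively flips the kernel in time, passing the analysis coefficients could yield the synthesis response; that is an assumption to verify against the actual op, not a confirmed explanation.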

Can this MB-MelGAN take the mel output of Tacotron 2 or FastSpeech directly? And what is the RTF for CPU inference?

Will it support Chinese?

Will it support Chinese training and synthesis?

An error was encountered during data preprocessing

[root@localhost TensorflowTTS]# tensorflow-tts-normalize --rootdir ./dump --outdir ./dump --stats ./dump/stats.npy --config preprocess/ljspeech_preprocess.yaml
2020-06-10 11:07:41.220189: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-06-10 11:07:41.472070: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-06-10 11:07:41.472150: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: localhost.localdomain
2020-06-10 11:07:41.472174: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: localhost.localdomain
2020-06-10 11:07:41.472313: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 418.39.0
2020-06-10 11:07:41.472366: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 418.39.0
2020-06-10 11:07:41.472382: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 418.39.0
2020-06-10 11:07:41.473387: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-06-10 11:07:41.489617: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2100000000 Hz
2020-06-10 11:07:41.492554: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55abc9916ef0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-10 11:07:41.492595: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
0it [00:00, ?it/s]
Traceback (most recent call last):
File "/usr/local/anaconda3/envs/ysj/bin/tensorflow-tts-normalize", line 8, in
sys.exit(main())
File "/usr/local/anaconda3/envs/ysj/lib/python3.7/site-packages/tensorflow_tts/bin/normalize.py", line 107, in main
mel = scaler.transform(mel)
File "/usr/local/anaconda3/envs/ysj/lib/python3.7/site-packages/sklearn/preprocessing/_data.py", line 794, in transform
force_all_finite='allow-nan')
File "/usr/local/anaconda3/envs/ysj/lib/python3.7/site-packages/sklearn/base.py", line 436, in _validate_data
self._check_n_features(X, reset=reset)
File "/usr/local/anaconda3/envs/ysj/lib/python3.7/site-packages/sklearn/base.py", line 373, in _check_n_features
"The reset parameter is False but there is no "
RuntimeError: The reset parameter is False but there is no n_features_in_ attribute. Is this estimator fitted?

Running path problem

[root@localhost TensorflowTTS]# CUDA_VISIBLE_DEVICES=0 python examples/tacotron2/train_tacotron2.py --train-dir ./dump/train/ --dev-dir ./dump/valid/ --outdir ./examples/tacotron2/exp/train.tacotron2.v1/ --config ./examples/tacotron2/conf/tacotron2.v1.yaml --use-norm 1 --mix_precision 0 --resume ""
Traceback (most recent call last):
File "examples/tacotron2/train_tacotron2.py", line 31, in
from examples.tacotron2.tacotron_dataset import CharactorMelDataset
ModuleNotFoundError: No module named 'examples'

Why do I get this error when running from the root directory?
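
A likely explanation, offered as a sketch rather than a confirmed diagnosis: when Python runs `python path/to/script.py`, the first entry of `sys.path` is the script's own directory, not the current working directory, so a top-level package next to the working directory is not importable. The example below reproduces this with a throwaway project (the `examples/demo` layout and file names are hypothetical) and shows that adding the root to `PYTHONPATH` makes the import succeed.

```python
import os
import subprocess
import sys
import tempfile


def build_demo_project(root):
    """Create root/examples/demo/{dataset.py,train.py} (hypothetical layout)."""
    pkg = os.path.join(root, "examples", "demo")
    os.makedirs(pkg)
    for folder in (os.path.join(root, "examples"), pkg):
        open(os.path.join(folder, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "dataset.py"), "w") as f:
        f.write("NAME = 'demo dataset'\n")
    with open(os.path.join(pkg, "train.py"), "w") as f:
        f.write("from examples.demo.dataset import NAME\nprint(NAME)\n")


def run_train(root, pythonpath=None):
    """Run the script from the project root, optionally with PYTHONPATH set."""
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)
    if pythonpath is not None:
        env["PYTHONPATH"] = pythonpath
    return subprocess.run(
        [sys.executable, os.path.join("examples", "demo", "train.py")],
        cwd=root,
        env=env,
        capture_output=True,
        text=True,
    )


with tempfile.TemporaryDirectory() as root:
    build_demo_project(root)
    plain = run_train(root)                   # sys.path[0] is examples/demo
    fixed = run_train(root, pythonpath=root)  # project root is importable

print("without PYTHONPATH:", plain.returncode)
print("with PYTHONPATH:", fixed.returncode, fixed.stdout.strip())
```

On a shell, the equivalent workaround would be prefixing the command with `PYTHONPATH=.`; whether the project intends this or expects an installed package is not stated in this thread.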

MelNet & other dataset than ljspeech

Hello,
thank you for this excellent and very intuitive implementation!

I am not familiar with TTS research, so my questions can probably be quite naive. :)

Do you also plan to implement MelNet? The audio results provided in the paper overview are quite impressive.
Is there any chance that in the long run you will train models with datasets other than LJSpeech? This implementation is great, but the commercial applications (Google, Microsoft) have models trained with much better datasets.

P.S.: As I said, I am not familiar with TTS research, but there are plenty of excellent readers on LibriVox, all of them in the public domain. I have done plenty of audio-text matching tasks with the aeneas library in the past. Would it be enough to emulate the structure of the LJSpeech dataset? With the quantity and length of samples on LibriVox I could probably create a dataset with 30-50 hours of single-speaker samples...

Once again thanks for your great work!
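
As a rough illustration of what "emulating the structure of LJSpeech" could mean: the public LJSpeech release is a `wavs/` folder plus a pipe-delimited `metadata.csv` with three fields per line (clip ID, transcription, normalized transcription). The sketch below writes such a file with the standard library only. The helper name `write_ljspeech_metadata` and the clip IDs are hypothetical, and whether this layout alone is sufficient for this repository's preprocessing is not confirmed here.

```python
import os
import tempfile


def write_ljspeech_metadata(root, clips):
    """Write root/metadata.csv in LJSpeech style.

    clips: iterable of (clip_id, raw_text, normalized_text); the audio for
    each clip is expected at root/wavs/<clip_id>.wav.
    """
    os.makedirs(os.path.join(root, "wavs"), exist_ok=True)
    path = os.path.join(root, "metadata.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        for clip_id, raw_text, normalized_text in clips:
            for field in (clip_id, raw_text, normalized_text):
                # The format has no quoting, so the delimiter must not appear.
                if "|" in field or "\n" in field:
                    raise ValueError(f"field contains '|' or a newline: {field!r}")
            f.write("|".join((clip_id, raw_text, normalized_text)) + "\n")
    return path


clips = [
    ("book01-0001", "Chapter 1.", "Chapter one."),
    ("book01-0002", "It was 5 o'clock.", "It was five o'clock."),
]
with tempfile.TemporaryDirectory() as root:
    metadata_path = write_ljspeech_metadata(root, clips)
    with open(metadata_path, encoding="utf-8") as f:
        written_lines = f.read().splitlines()

print(written_lines)
```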

melgan residual stack config

Hi, in the original MelGAN paper, the kernel size of the second conv1d layer is 3, but I see that in this MelGAN residual config it's 1. Is that a better config?

I want to create an organization repo and need to enroll more members :D

After some time as an open-source project, I see a lot left to do, such as supporting more models (flow/Glow, GAN, TensorRT, ...), but one or two people may not be enough for that. I am thinking about creating an organization repo and enrolling more members to develop with us. In the future, I want to change the repo name to TensorSpeech and support speech-related problems such as voice conversion and speech recognition. Does anyone want to join? :D

Data enhancement leads to increased training time

The first thing to note is that tensorflow originally had a very useful library for data augmentation.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

However, in the official 2.0 release, this library and the corresponding training method model.fit_generator() had problems, and training time increased 3-4 times. The answer from the TensorFlow team is that this is indeed a bug (issue #33177).

And they decided to abandon this method instead of repairing it. The solution given is to directly use model.fit() to receive the data generated by ImageDataGenerator, but this creates an additional problem, the program prompts ‘Filling up shuffle buffer (this may take a while)’ before each epoch.

This is also a considerable time overhead. On my machine it takes 10 s, and an extra 10 s per epoch is unacceptable. Is there a better way to deal with it?

Double Decoder Consistency for Tacotron 2

Hi, I found this interesting post while regularly checking up on erogol's blog. Any chance of having it here? A Tacotron 2 that is practically immune to alignment problems sounds very state of the art.
For reference, Mozilla/TTS has it.

Fine-tuning Multi-Band MelGAN yields pure noise as soon as discriminator starts training

As the title says, I started fine-tuning the published mb-melgan v1 model and set the discriminator to start at 10k steps. As soon as it started, all the sample audios became pure noise.
I can confirm it's not the dataset since I could train MelGAN-STFT. If it helps, I'm using mixed precision for the generator but not the discriminator. I also had this problem with kan-bayashi's mb-melgan v2.

TypeError: Input 'filter' of 'Conv2D' Op has type float32 that does not match type int32 of argument 'input'.

I am training on a CPU machine to make sure everything is okay before moving to a GPU machine.
I get to 500 steps (the first evaluation) and hit the following type error, shown in the attached
log.txt

How can I figure out which feature is the issue?

Help me learn Melgan.

MelGAN training does not work.
It says there is no voice file.
Please help.

tensorflow-tts-preprocess: assert len(mel) == len(f0) == len(energy) AssertionError

My dataset which always worked properly gave me this error when running the preprocessing step.

2020-06-19 20:34:52.303453: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
[Preprocessing]:   0% 6/3431 [00:02<24:16,  2.35it/s]
[Preprocessing]:   0% 13/3431 [00:07<32:39,  1.74it/s]
[Preprocessing]:   0% 6/3431 [00:10<1:35:38,  1.68s/it]
[Preprocessing]:   2% 64/3431 [00:20<18:08,  3.09it/s]
[Preprocessing]:   1% 49/3431 [00:25<29:30,  1.91it/s]
[Preprocessing]:   2% 73/3431 [00:53<40:54,  1.37it/s]
[Preprocessing]:   3% 106/3431 [00:59<31:02,  1.78it/s]
[Preprocessing]:   0% 13/3431 [01:03<4:39:15,  4.90s/it]
[Preprocessing]:   2% 59/3431 [01:12<1:09:08,  1.23s/it]
[Preprocessing]:   6% 215/3431 [01:12<18:09,  2.95it/s]
[Preprocessing]:   6% 215/3431 [01:13<18:12,  2.94it/s]
[Preprocessing]:   0% 12/3431 [01:16<6:05:32,  6.41s/it]
[Preprocessing]:   1% 20/3431 [01:25<4:03:18,  4.28s/it]
[Preprocessing]:   2% 76/3431 [01:35<1:10:16,  1.26s/it]
[Preprocessing]:   5% 178/3431 [01:49<33:13,  1.63it/s]
[Preprocessing]:   6% 209/3431 [01:56<06:16,  8.55it/s]multiprocess.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/usr/local/lib/python3.6/dist-packages/pathos/helpers/mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/bin/preprocess.py", line 217, in save_to_file
    assert len(mel) == len(f0) == len(energy)
AssertionError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/tensorflow-tts-preprocess", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/bin/preprocess.py", line 250, in main
[Preprocessing]:   6% 209/3431 [01:56<29:58,  1.79it/s]
    p.map(save_to_file, range(len(processor.items)))
  File "/usr/local/lib/python3.6/dist-packages/pathos/multiprocessing.py", line 137, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 644, in get
    raise self._value
AssertionError
[Preprocessing]:   0% 0/3431 [01:56<?, ?it/s]

When I run the normalization step it only does 1314 iterations instead of 3431 as it should. In addition, when trying to train I get this.

<ipython-input-12-8616bad8c9dc> in dotrain(inargs, ptpath, maxsteps)
    371         energy_stat=args.energy_stat,
    372         mel_length_threshold=mel_length_threshold,
--> 373         return_utt_id=False
    374     ).create(
    375         is_shuffle=config["is_shuffle"],

/content/TensorflowTTS/ttsexamples/fastspeech2/fastspeech2_dataset.py in __init__(self, root_dir, charactor_query, mel_query, duration_query, f0_query, energy_query, f0_stat, energy_stat, max_f0_embeddings, max_energy_embeddings, charactor_load_fn, mel_load_fn, duration_load_fn, f0_load_fn, energy_load_fn, mel_length_threshold, return_utt_id)
    101             mel_files = [mel_files[idx] for idx in idxs]
    102             charactor_files = [charactor_files[idx] for idx in idxs]
--> 103             duration_files = [duration_files[idx] for idx in idxs]
    104             mel_lengths = [mel_lengths[idx] for idx in idxs]
    105             f0_files = [f0_files[idx] for idx in idxs]

/content/TensorflowTTS/ttsexamples/fastspeech2/fastspeech2_dataset.py in <listcomp>(.0)
    101             mel_files = [mel_files[idx] for idx in idxs]
    102             charactor_files = [charactor_files[idx] for idx in idxs]
--> 103             duration_files = [duration_files[idx] for idx in idxs]
    104             mel_lengths = [mel_lengths[idx] for idx in idxs]
    105             f0_files = [f0_files[idx] for idx in idxs]

IndexError: list index out of range

I have no idea why this happens. My dataset is formatted exactly the same (like LJSpeech) as it was back when it was working.
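
One way to narrow this down, sketched under assumptions: the IndexError is consistent with the per-feature file lists having different lengths, which could happen if preprocessing stopped partway (as the AssertionError above suggests) and left some utterances without every feature file. The helper below groups file names by utterance ID and reports which features are missing. Only the `-durations.npy` suffix appears in this issue list; the other suffixes and the function names are hypothetical placeholders to adapt to the real dump folder.

```python
from collections import defaultdict

# Hypothetical suffixes; replace with the ones actually present in the dump.
FEATURE_SUFFIXES = ("-mel.npy", "-durations.npy", "-f0.npy", "-energy.npy")


def group_by_utterance(filenames, suffixes=FEATURE_SUFFIXES):
    """Map utterance ID -> set of feature suffixes found for it."""
    found = defaultdict(set)
    for name in filenames:
        for suffix in suffixes:
            if name.endswith(suffix):
                found[name[: -len(suffix)]].add(suffix)
                break
    return found


def incomplete_utterances(filenames, suffixes=FEATURE_SUFFIXES):
    """Return {utterance ID: sorted list of missing suffixes}."""
    required = set(suffixes)
    return {
        utt_id: sorted(required - have)
        for utt_id, have in group_by_utterance(filenames, suffixes).items()
        if have != required
    }


files = [
    "utt001-mel.npy", "utt001-durations.npy", "utt001-f0.npy", "utt001-energy.npy",
    "utt002-mel.npy", "utt002-f0.npy", "utt002-energy.npy",
]
missing = incomplete_utterances(files)
print(missing)
```

In practice the `files` list would come from something like `os.listdir` over each feature folder; any utterance reported here would make index-based filtering across the lists go out of range.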

Enabling mixed_precision for training

Hi, I would like to turn on mixed precision for training, so I set "1" in the command below, but mixed_precision is still False afterwards. Am I missing something?

2020-06-25 06:58:32,264 (train_tacotron2:413) INFO: mixed_precision = False

CUDA_VISIBLE_DEVICES=0 python3 examples_models/tacotron2/train_tacotron2.py \
  --train-dir ./dump_lisa/train/ \
  --dev-dir ./dump_lisa/valid/ \
  --outdir ./examples_models/tacotron2/exp_lisa/train.tacotron2.v2_mixed_precision/ \
  --config ./examples_models/tacotron2/conf/tacotron2.v1.yaml \
  --use-norm 1
  --mixed_precision 1 \
  --resume ""
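
One thing worth checking, assuming the command was run exactly as pasted: the `--use-norm 1` line has no trailing backslash, so a POSIX shell would end the command there and treat the remaining lines as a separate command. The sketch below mimics that line-joining rule on a shortened, hypothetical version of the command to show where the split falls; `logical_lines` is an illustrative helper, not part of any tool.

```python
import shlex


def logical_lines(script):
    """Join lines ending in a backslash, as a POSIX shell does, then tokenize."""
    commands, current = [], ""
    for line in script.splitlines():
        if line.endswith("\\"):
            current += line[:-1]  # continuation: drop the backslash, keep going
        else:
            current += line
            if current.strip():
                commands.append(shlex.split(current))
            current = ""
    if current.strip():
        commands.append(shlex.split(current))
    return commands


# Shortened stand-in for the pasted command; note the missing "\" after "1".
script = (
    "python3 train_tacotron2.py \\\n"
    "  --config tacotron2.v1.yaml \\\n"
    "  --use-norm 1\n"
    "  --mixed_precision 1 \\\n"
    '  --resume ""\n'
)
commands = logical_lines(script)
for number, argv in enumerate(commands, start=1):
    print(number, argv)
```

With the backslash restored after `--use-norm 1`, the same helper yields a single command that includes `--mixed_precision 1`.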

how to generate -durations.npy

There is no documentation on how to generate -durations.npy.

SqueezeWave Implementation

Hi,
Would it be possible to add a TF2 implementation of the SqueezeWave vocoder to this system? It is really fast and promising. I'm working on this myself, but I'm not well versed in TF2 yet. I struggled quite a bit trying to train the authors' PyTorch implementation on my custom dataset, even though it has almost the same characteristics as LJSpeech and is twice the size. I believe TF2 is more suitable for post-training optimization and deployment.
Original Repo: https://github.com/tianrengao/SqueezeWave

Error following Tacotron2 tutorial

First, great project. Thanks a ton for maintaining it. I'm learning a lot going through the code.

I tried following the Tacotron2 tutorial -- downloaded the dataset, ran the preprocessing steps, and tried training. I ran into this error:

tensorflow.python.framework.errors_impl.InvalidArgumentError:  [_Derived_]  Trying to access element 156 in a list with 156 elements.
         [[{{node while_21/body/_1/TensorArrayV2Read_1/TensorListGetItem}}]]
         [[tacotron2/StatefulPartitionedCall/encoder/bilstm/forward_lstm/StatefulPartitionedCall]] [Op:__inference__one_step_tacotron2_406287]

Function call stack:
_one_step_tacotron2 -> _one_step_tacotron2 -> _one_step_tacotron2

Is this an error you've seen before? Unless I've made a dumb mistake, I imagine anyone might experience this error since I'm following the tutorial. Any pointers on how I can debug this? I don't mind trying to solve it myself but I'm fairly new to Tensorflow2.

The full log is below:

$ CUDA_VISIBLE_DEVICES=0 python examples/tacotron2/train_tacotron2.py \
>   --train-dir /media/usb0/tts/ljspeech_dump/train/ \
>   --dev-dir /media/usb0/tts/ljspeech_dump/valid/ \
>   --outdir ./examples/tacotron2/exp/train.tacotron2.v1/ \
>   --config ./examples/tacotron2/conf/tacotron2.v1.yaml \
>   --use-norm 1 \
>   --mixed_precision 0 \
>   --resume ""

2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: hop_size = 256
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: format = npy
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: tacotron2_params = {'embedding_hidden_size': 512, 'initializer_range': 0.02, 'embedding_dropout_prob': 0.1, 'n_speakers': 1, 'n_conv_encoder': 5, 'encoder_conv_filters': 512, 'encoder_conv_kernel_sizes': 5, 'encoder_conv_activation': 'relu', 'encoder_conv_dropout_rate': 0.5, 'encoder_lstm_units': 256, 'n_prenet_layers': 2, 'prenet_units': 256, 'prenet_activation': 'relu', 'prenet_dropout_rate': 0.5, 'n_lstm_decoder': 1, 'reduction_factor': 1, 'decoder_lstm_units': 1024, 'attention_dim': 128, 'attention_filters': 32, 'attention_kernel': 31, 'n_mels': 80, 'n_conv_postnet': 5, 'postnet_conv_filters': 512, 'postnet_conv_kernel_sizes': 5, 'postnet_dropout_rate': 0.1, 'attention_type': 'lsa'}
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: batch_size = 32
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: remove_short_samples = True
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: allow_cache = True
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: mel_length_threshold = 32
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: is_shuffle = True
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: use_fixed_shapes = True
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: optimizer_params = {'initial_learning_rate': 0.001, 'end_learning_rate': 1e-05, 'decay_steps': 150000, 'warmup_proportion': 0.02, 'weight_decay': 0.001}
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: train_max_steps = 200000
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: save_interval_steps = 5000
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: eval_interval_steps = 500
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: log_interval_steps = 100
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: start_schedule_teacher_forcing = 200001
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: start_ratio_value = 0.5
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: schedule_decay_steps = 50000
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: end_ratio_value = 0.0
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: num_save_intermediate_results = 1
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: train_dir = /media/usb0/tts/ljspeech_dump/train/
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: dev_dir = /media/usb0/tts/ljspeech_dump/valid/
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: use_norm = True
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: outdir = ./examples/tacotron2/exp/train.tacotron2.v1/
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: config = ./examples/tacotron2/conf/tacotron2.v1.yaml
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: resume =
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: verbose = 1
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: mixed_precision = False
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: version = 0.6.1
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: max_mel_length = 871
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: max_char_length = 188
2020-06-27 17:02:49.799192: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-06-27 17:02:49.827610: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-27 17:02:49.827904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2080 computeCapability: 7.5
coreClock: 1.815GHz coreCount: 46 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.23GiB/s
2020-06-27 17:02:49.828043: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-06-27 17:02:49.828913: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-06-27 17:02:49.829804: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-06-27 17:02:49.829957: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-06-27 17:02:49.830830: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-06-27 17:02:49.831252: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-06-27 17:02:49.831346: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2020-06-27 17:02:49.831352: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-06-27 17:02:49.831534: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-06-27 17:02:49.835216: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 4200000000 Hz
2020-06-27 17:02:49.835388: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f5318000b60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-27 17:02:49.835400: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-06-27 17:02:49.836162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-27 17:02:49.836171: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]
Model: "tacotron2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
encoder (TFTacotronEncoder)  multiple                  8218624
_________________________________________________________________
decoder_cell (TFTacotronDeco multiple                  18246402
_________________________________________________________________
post_net (TFTacotronPostnet) multiple                  5460480
_________________________________________________________________
residual_projection (Dense)  multiple                  41040
=================================================================
Total params: 31,966,546
Trainable params: 31,956,306
Non-trainable params: 10,240
_________________________________________________________________
[train]:   0%|                                                                             | 0/200000 [00:00<?, ?it/s]2020-06-27 17:03:08.327047: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 4125 of 12445
2020-06-27 17:03:18.326677: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 8227 of 12445
2020-06-27 17:03:28.325737: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 12285 of 12445
2020-06-27 17:03:28.724872: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:233] Shuffle buffer filled.
Traceback (most recent call last):
  File "examples/tacotron2/train_tacotron2.py", line 513, in <module>
    main()
  File "examples/tacotron2/train_tacotron2.py", line 503, in main
    trainer.fit(train_dataset,
  File "examples/tacotron2/train_tacotron2.py", line 343, in fit
    self.run()
  File "/home/caleb/repos/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 72, in run
    self._train_epoch()
  File "/home/caleb/repos/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 94, in _train_epoch
    self._train_step(batch)
  File "examples/tacotron2/train_tacotron2.py", line 116, in _train_step
    self._one_step_tacotron2(charactor, char_length, mel, mel_length, guided_attention)
  File "/home/caleb/repos/TensorflowTTS/venv/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/home/caleb/repos/TensorflowTTS/venv/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 644, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/caleb/repos/TensorflowTTS/venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/caleb/repos/TensorflowTTS/venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1661, in _filtered_call
    return self._call_flat(
  File "/home/caleb/repos/TensorflowTTS/venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1745, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/home/caleb/repos/TensorflowTTS/venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 593, in call
    outputs = execute.execute(
  File "/home/caleb/repos/TensorflowTTS/venv/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError:  [_Derived_]  Trying to access element 156 in a list with 156 elements.
         [[{{node while_21/body/_1/TensorArrayV2Read_1/TensorListGetItem}}]]
         [[tacotron2/StatefulPartitionedCall/encoder/bilstm/forward_lstm/StatefulPartitionedCall]] [Op:__inference__one_step_tacotron2_406287]

Function call stack:
_one_step_tacotron2 -> _one_step_tacotron2 -> _one_step_tacotron2

[train]:   0%|                                                                             | 0/200000 [00:36<?, ?it/s]

Can't train MelGAN-STFT discriminator

I have been training MelGAN-STFT by finetuning it on the LJSpeech model. When it gets to discriminator_train_start_steps, it stops and tells me to restart. When I restart with the discriminator on less than the steps of the latest checkpoint (210k vs 220k), I get this:

ValueError: in user code:

    <ipython-input-13-8bd0e2e9cdea>:110 _one_step_generator  *
        p_hat = self.discriminator(y_hat)
    /content/TensorflowTTS/tensorflow_tts/models/melgan.py:441 call  *
        outs += [f(x)]
    /content/TensorflowTTS/tensorflow_tts/models/melgan.py:386 call  *
        x = f(x)
    /content/TensorflowTTS/tensorflow_tts/utils/group_conv.py:283 call  *
        self._convolution_op = nn_ops.Convolution(
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py:1063 __init__  **
        filter_shape[num_spatial_dims]))

    ValueError: number of input channels does not match corresponding dimension of filter, 16 != 4

Also, all my predictions have heavy metallic noise. I'm assuming this is due to lack of discriminator training.
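
One possible reading of `16 != 4`, offered as an assumption rather than a diagnosis: a grouped 1-D convolution stores its kernel as [filter_width, in_channels // groups, out_channels], so a 16-channel input with 4 groups has a kernel whose channel dimension is 4. If that kernel reaches a code path that treats it as an ordinary convolution (groups = 1), the check compares 16 against 4 and fails. The sketch below only reproduces that arithmetic; the kernel width, channel counts and function names are illustrative.

```python
def grouped_filter_shape(kernel_size, in_channels, out_channels, groups):
    """Kernel shape of a grouped 1-D conv: [width, in_channels // groups, out_channels]."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    return (kernel_size, in_channels // groups, out_channels)


def channel_mismatch(input_channels, filter_shape, groups=1):
    """Return None if the shapes agree, else a message like the one in the log."""
    if input_channels == filter_shape[1] * groups:
        return None
    return f"{input_channels} != {filter_shape[1]}"


# Illustrative numbers: 16 input channels, 64 output channels, 4 groups.
kernel = grouped_filter_shape(41, 16, 64, groups=4)
as_plain_conv = channel_mismatch(16, kernel, groups=1)
as_grouped_conv = channel_mismatch(16, kernel, groups=4)
print(kernel, as_plain_conv, as_grouped_conv)
```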

Inference time is longer than the PyTorch version of FastSpeech. Were any additional layers added?

Does this Tacotron2 support phonetic training?

Training with phonemes instead of raw text could improve the performance of the models. If it does, will there be a phonetic pretrained model like ESPNet's?

fast speech mel normalization

When computing the mean and scale for the mel-spectrogram before normalization, are they computed over the whole dataset but from only the first frame of each mel?

mel = mel[0].numpy()
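
For comparison, here is a small standard-library sketch of what per-bin statistics over every frame of every utterance look like, accumulated one utterance at a time. It is not the repository's code; `RunningStats` is a hypothetical name. Note that `mel[0]` may simply drop a leading batch dimension of size 1 (leaving a [frames, n_mels] array) rather than select the first frame, but this thread does not confirm which.

```python
import math


class RunningStats:
    """Accumulate per-bin mean and standard deviation across utterances."""

    def __init__(self, n_bins):
        self.count = 0
        self.total = [0.0] * n_bins
        self.total_sq = [0.0] * n_bins

    def partial_fit(self, frames):
        """frames: list of [n_bins] lists, i.e. one utterance of shape [T, n_bins]."""
        for frame in frames:
            self.count += 1
            for i, value in enumerate(frame):
                self.total[i] += value
                self.total_sq[i] += value * value

    def mean(self):
        return [t / self.count for t in self.total]

    def scale(self):
        # Population standard deviation (divide by N), per bin.
        return [
            math.sqrt(max(sq / self.count - (t / self.count) ** 2, 0.0))
            for t, sq in zip(self.total, self.total_sq)
        ]


stats = RunningStats(n_bins=2)
stats.partial_fit([[1.0, 10.0], [3.0, 10.0]])  # utterance with two frames
stats.partial_fit([[5.0, 40.0]])               # utterance with one frame
print(stats.mean(), stats.scale())
```

Every frame contributes to the statistics here, so the result has one mean and one scale per mel bin regardless of utterance length.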

Difference btw decode_tacotron2.py vs tacotron2_inference.ipynb

The procedure for running Tacotron 2 inference is not clear. It seems to me that running only "tacotron2_inference.ipynb" is sufficient. What is "decode_tacotron2.py" for? Running it gives me the errors below.

2020-07-01 01:59:24.756772: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 299 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:08:00.0, compute capability: 6.1)
2020-07-01 01:59:43.496259: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-07-01 01:59:43.723366: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-07-01 01:59:44.157435: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-07-01 01:59:44.167314: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-07-01 01:59:44.172894: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-07-01 01:59:44.175197: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "examples_models/tacotron2/decode_tacotron2.py", line 136, in <module>
    main()
  File "examples_models/tacotron2/decode_tacotron2.py", line 103, in main
    tacotron2._build()  # build model to be able load_weights.
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/models/tacotron2.py", line 677, in _build
    self(input_ids, input_lengths, speaker_ids, mel_outputs, mel_lengths, 10, training=True)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 968, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 644, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1665, in _filtered_call
    self.captured_inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1746, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 598, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[node encoder/conv_batch_norm/tf_tacotron_conv_batch_norm/conv_._0/conv1d (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_tts/models/tacotron2.py:86) ]]
         [[decoder/while/body/_1/decoder_cell/assert_positive/assert_less/Assert/AssertGuard/pivot_f/_265/_47]]
  (1) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[node encoder/conv_batch_norm/tf_tacotron_conv_batch_norm/conv_._0/conv1d (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_tts/models/tacotron2.py:86) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_call_8385]

Function call stack:
call -> call

Fine-tuning does not work

Hi,
I am trying to fine-tune the LJSpeech pretrained Tacotron 2 model on a custom English voice dataset. I made the changes described for rebuilding with a new embedding layer, but the model fails to build the second time. Since my dataset is also English, just with a different voice, I used the same vocab_size.
Here is the fine-tuning code, as changed from the training code:
`pretrained_config = Tacotron2Config(**config["tacotron2_params"])

tacotron2 = TFTacotron2(pretrained_config, training=True, name='tacotron2')

tacotron2._build()

tacotron2.summary()

tacotron2.load_weights(path)

pretrained_config.vocab_size = len(symbols)

new_embedding_layers = TFTacotronEmbeddings(pretrained_config, name='embeddings')

tacotron2.encoder.embeddings = new_embedding_layers

# re-build model

tacotron2._build() #BREAKS HERE

tacotron2.summary()`

Error:
`Total params: 31,966,546
Trainable params: 31,956,306
Non-trainable params: 10,240

2020-06-22 10:26:33.976375: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at resource_variable_ops.cc:637 : Not found: Resource localhost/encoder/embeddings/character_embeddings/weight_147/N10tensorflow3VarE does not exist.
Traceback (most recent call last):
File "examples/tacotron2/finetune_tacotron.py", line 518, in
main()
File "examples/tacotron2/finetune_tacotron.py", line 471, in main
tacotron2._build()
File "/anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow_tts/models/tacotron2.py", line 677, in _build
self(input_ids, input_lengths, speaker_ids, mel_outputs, mel_lengths, 10, training=True)
File "/anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 968, in call
outputs = self.call(cast_inputs, *args, **kwargs)
File "/anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 580, in call
result = self._call(*args, **kwds)
File "/anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 611, in _call
return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
File "/anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2420, in call
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1665, in _filtered_call
self.captured_inputs)
File "/anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1746, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 598, in call
ctx=ctx)
File "/anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.FailedPreconditionError: 2 root error(s) found.
(0) Failed precondition: Error while reading resource variable encoder/embeddings/LayerNorm/beta_164 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/encoder/embeddings/LayerNorm/beta_164/N10tensorflow3VarE does not exist.
[[node encoder/embeddings/LayerNorm/batchnorm/ReadVariableOp (defined at /anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow_tts/models/tacotron2.py:153) ]]
[[decoder/while/body/_1/decoder_cell/assert_positive/assert_less/Assert/AssertGuard/pivot_f/_289/_53]]
(1) Failed precondition: Error while reading resource variable encoder/embeddings/LayerNorm/beta_164 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/encoder/embeddings/LayerNorm/beta_164/N10tensorflow3VarE does not exist.
[[node encoder/embeddings/LayerNorm/batchnorm/ReadVariableOp (defined at /anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow_tts/models/tacotron2.py:153) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_call_8492]

Function call stack:
call -> call`

Since my end goal is just to change the voice, I initially tried to continue training from the pretrained checkpoint on my dataset, because this worked for me with NVIDIA's PyTorch Tacotron2 implementation. But here the model's voice doesn't change, even though the quality keeps improving. That is why I decided to redo the embedding layer for fine-tuning.
Should I train it from scratch?

Tacotron2 end2end sample

Could you provide an end-to-end sample showing how to use the Tacotron2 pretrained model? I tried to read the source code but couldn't figure out what to feed to each parameter.

Thanks!

Error when loading the pretrained LJSpeech model to train another language

When I tried to load the pretrained model (model_65000.h5), following the tutorial in examples/tacotron2/README.md, to train on another language, I got this error:

tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable encoder/embeddings/LayerNorm/beta_164 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/encoder/embeddings/LayerNorm/beta_164/N10tensorflow3VarE does not exist.

And when I used another pretrained model (model_40000.h5), I got a different error:

(0) Failed precondition: Error while reading resource variable encoder/embeddings/LayerNorm/gamma_157 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/encoder/embeddings/LayerNorm/gamma_157/N10tensorflow3VarE does not exist.
[[node encoder/embeddings/LayerNorm/batchnorm/mul/ReadVariableOp (defined at /data/linhld6/Project/TensorflowTTS-master/tensorflow_tts/models/tacotron2.py:163) ]]
[[decoder/while/LoopCond/_71/_86]]
(1) Failed precondition: Error while reading resource variable encoder/embeddings/LayerNorm/gamma_157 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/encoder/embeddings/LayerNorm/gamma_157/N10tensorflow3VarE does not exist.

Can anyone help me fix this error? Thanks!

Error when loading data to train the Tacotron2 model

When I load my own dataset to train the Tacotron2 model, I get this error:

tensorflow.python.framework.errors_impl.DataLossError: Attempted to pad to a smaller size than the input element. [Op:IteratorGetNext]

Can anyone help me fix this error?

Saving entire model

Hello, I tried to save the entire Tacotron-2 model, instead of just the weights, as an h5 file. However, I am getting the following error:

NotImplementedError: Saving the model to HDF5 format requires the model to be a Functional model or a Sequential model. It does not work for subclassed models, because such models are defined via the body of a Python method, which isn't safely serializable. Consider saving to the Tensorflow SavedModel format (by setting save_format="tf") or using save_weights.

I used the following code to successfully load the weights:

tacotron2 = TFTacotron2(config=fs_config, name="tacotron2", training=False)
tacotron2._build()  
tacotron2.load_weights("tacotron2.h5")

Then I tried to call tacotron2.save("full_tacotron2.h5") and got the aforementioned error. Should I modify trainers/base_trainer.py as follows and retrain, or is there another way to save the entire model as an h5 file?

def save_checkpoint(self):
      """Save checkpoint."""
      self.ckpt.steps.assign(self.steps)
      self.ckpt.epochs.assign(self.epochs)
      self.ckp_manager.save(checkpoint_number=self.steps)
      self.model.save_weights(self.saved_path + 'model-{}.h5'.format(self.steps))
      self.model.save_model(self.saved_path + 'model-total{}.h5'.format(self.steps))

Some questions

Are the mel outputs generated compatible with kan-bayashi's ParallelWaveGAN?
There's a FastSpeech synthesis example, but not Tacotron2. How to generate speech with the Tacotron2 pretrained model and MelGAN-STFT?

pretrain model of melgan

Hi, I tried to run MelGAN with your pretrained model, but I found there is no ckpt data, only the discriminator's and generator's h5 files.
I read train_melgan.py and found that it takes a --resume parameter, but I think that needs a ckpt file to restore the steps and epochs.
Could you upload the ckpt files along with the pretrained model, or is there another way to do it?

Thank you for your awesome repo. It is really well constructed and I can't wait to see its performance.

melgan missing weightnorm.

Maybe we can add it using
https://www.tensorflow.org/addons/tutorials/layers_weightnormalization

Extract duration from tacotron2 model

This is a question rather than a bug report. Following the FastSpeech2 training tutorial, we have to extract durations from the alignment of the Tacotron2 model (the get_duration_from_alignment function in extract_duration.py).
I just want to know what exactly the term "duration" means. Can anyone help me with the definition?
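
For readers with the same question: in FastSpeech-style models, a character's duration is usually taken to mean the number of mel-spectrogram frames assigned to that character. The sketch below shows one common way to derive it from an attention alignment: each decoder frame is assigned to the character with the largest attention weight, and the frames per character are counted. The function name `durations_from_alignment` and the toy matrix are illustrative assumptions, not the code in extract_duration.py.

```python
def durations_from_alignment(alignment):
    """Count mel frames per character.

    alignment: list of rows, one per decoder (mel) frame; each row holds
    the attention weights over the input characters.
    """
    num_chars = len(alignment[0])
    durations = [0] * num_chars
    for frame_weights in alignment:
        # Assign this frame to the character it attends to most strongly.
        best_char = max(range(num_chars), key=lambda i: frame_weights[i])
        durations[best_char] += 1
    return durations


# Toy alignment (assumed values): 5 mel frames over 3 characters.
toy_alignment = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
    [0.0, 0.1, 0.9],
]
durations = durations_from_alignment(toy_alignment)
print(durations)  # [2, 1, 2]
```

By construction the durations sum to the number of mel frames, which is what lets a length regulator expand character-level features to frame level.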

FastSpeech 2

Hello, thank you for this project. I'm aware of two different implementations of FastSpeech. Is there any plan to support the recent FastSpeech 2 architecture?

Thank you very much.

Tacotron2 teacher forcing

@dathudeptrai The training sampler uses teacher forcing (ground-truth mel-spectrogram frames) during the whole training period. Did you run into the exposure bias problem? Is the audio quality always good at inference time?

fastspeech inference duration_outputs

When running FastSpeech in inference mode, why is an exp op applied to duration_outputs from the duration predictor?

duration_outputs = tf.math.exp(duration_outputs) - 1.0
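
One likely reason, stated here as an assumption rather than something confirmed from the training code: if the duration predictor is trained on log(d + 1) targets instead of raw frame counts, inference has to apply the inverse, exp(x) - 1, to get back to frames. A pure-Python sketch of that round trip (the helper names are hypothetical):

```python
import math


def to_log_domain(duration_frames):
    # Assumed training target: log(d + 1), which is defined even when d == 0.
    return math.log(duration_frames + 1.0)


def from_log_domain(predicted):
    # Inverse used at inference: exp(x) - 1, rounded to whole frames
    # and clamped so a slightly negative prediction cannot yield -1 frames.
    frames = math.exp(predicted) - 1.0
    return max(0, round(frames))


true_durations = [0, 3, 7, 12]
log_targets = [to_log_domain(d) for d in true_durations]
recovered = [from_log_domain(x) for x in log_targets]
print(recovered)
```

Working in the log domain compresses the range of the targets, so a few very long characters do not dominate a squared-error loss.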

Deployment of frozen models with tf-addons

Hi,
When freezing the graph and deploying on a target device, is it necessary to have TensorFlow Addons installed on the device, or can the plain TensorFlow 2.x package run inference from the frozen graph? Sorry, but I have never used TensorFlow Addons before and don't know much about it.

Error

I ran into an error when running

tensorflow-tts-preprocess --rootdir ./datasets/ --outdir ./dump/ --conf preprocess/ljspeech_preprocess.yaml

The error is

Traceback (most recent call last):
  File "/home/trunk/miniconda3/envs/tts_env/lib/python3.8/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/trunk/miniconda3/envs/tts_env/lib/python3.8/site-packages/multiprocess/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/trunk/miniconda3/envs/tts_env/lib/python3.8/site-packages/pathos/helpers/mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
  File "/home/trunk/miniconda3/envs/tts_env/lib/python3.8/site-packages/tensorflow_tts/bin/preprocess.py", line 215, in save_to_file
    np.save(os.path.join(args.outdir, subdir, "wavs", f"{utt_id}-wave.npy"),
  File "<__array_function__ internals>", line 5, in save
  File "/home/trunk/miniconda3/envs/tts_env/lib/python3.8/site-packages/numpy/lib/npyio.py", line 541, in save
    fid = open(file, "wb")
FileNotFoundError: [Errno 2] No such file or directory: './dump/train/wavs/LJ001-0001-wave.npy'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/trunk/miniconda3/envs/tts_env/bin/tensorflow-tts-preprocess", line 8, in <module>
    sys.exit(main())
  File "/home/trunk/miniconda3/envs/tts_env/lib/python3.8/site-packages/tensorflow_tts/bin/preprocess.py", line 228, in main
    p.map(save_to_file, range(len(processor.items)))
  File "/home/trunk/miniconda3/envs/tts_env/lib/python3.8/site-packages/pathos/multiprocessing.py", line 137, in map
[Preprocessing]:   0%|                                | 0/13100 [00:04<?, ?it/s]
    return _pool.map(star(f), zip(*args)) # chunksize
  File "/home/trunk/miniconda3/envs/tts_env/lib/python3.8/site-packages/multiprocess/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/trunk/miniconda3/envs/tts_env/lib/python3.8/site-packages/multiprocess/pool.py", line 768, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: './dump/train/wavs/LJ001-0001-wave.npy'

I am running Manjaro Linux 20.3 XFCE.
My folder structure is

datasets
│   ├── LJSpeech-1.1--
│   ├── metadata.csv
│   ├── README
│   └── wavs
├── dump
│   ├── train_utt_ids.npy
│   └── valid_utt_ids.npy
├── examples
│   ├── fastspeech
│   ├── melgan
│   ├── melgan.stft
│   ├── multiband_melgan
│   └── tacotron2
├── google9b8578adaee731be.html
├── LICENSE
├── notebooks
│   └── tacotron2_inference.ipynb
├── preprocess
│   └── ljspeech_preprocess.yaml
├── README.md
├── setup.cfg
├── setup.py
├── tensorflow_tts
│   ├── bin
│   ├── configs
│   ├── datasets
│   ├── init.py
│   ├── losses
│   ├── models
│   ├── optimizers
│   ├── processor
│   ├── trainers
│   └── utils
├── test
│   ├── test_fastspeech.py
│   ├── test_mb_melgan.py
│   ├── test_melgan_layers.py
│   ├── test_melgan.py
│   └── test_tacotron2.py
└── tts
├── bin
├── include
├── lib
├── lib64 -> lib
└── pyvenv.cfg
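
The traceback suggests that `./dump/train/wavs/` did not exist when `np.save` tried to write into it; the tree above shows `dump` containing only the two `*_utt_ids.npy` files. A minimal sketch of the usual remedy, creating the output subdirectories up front, follows. The subdirectory names `raw-feats` and `ids` are assumptions (only `wavs` appears in the traceback), and `prepare_output_dirs` is a hypothetical helper, not part of tensorflow_tts.

```python
import os
import tempfile


def prepare_output_dirs(outdir, subsets=("train", "valid"),
                        kinds=("wavs", "raw-feats", "ids")):
    """Create outdir/<subset>/<kind> for every combination and return the paths."""
    created = []
    for subset in subsets:
        for kind in kinds:
            path = os.path.join(outdir, subset, kind)
            # exist_ok=True makes the call safe to repeat and safe when
            # several worker processes race to create the same directory.
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created


# Demo in a throwaway directory instead of a real ./dump folder.
demo_root = tempfile.mkdtemp()
created_dirs = prepare_output_dirs(os.path.join(demo_root, "dump"))
print(len(created_dirs))
```

Running this before the multiprocessing pool starts means no worker can hit a missing directory.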

config length of the melspectrogram input

When training the Tacotron2 model on my own data, some mel-spectrograms have more than 2000 timesteps, so I always get this error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [16,2000,80] vs. [16,2023,80]

I don't know where to set the timestep limit to a higher value to fit my data.
Thanks
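
The shapes in the error suggest that something caps the sequence at 2000 frames while one batch contains a 2023-frame utterance. Where that limit is configured is not shown here, so the sketch below takes the other route and drops over-long utterances before batching. `MAX_MEL_FRAMES`, `filter_by_length` and the toy lengths are illustrative assumptions.

```python
MAX_MEL_FRAMES = 2000  # assumed limit, taken from the shape in the error message


def filter_by_length(utt_lengths, max_frames=MAX_MEL_FRAMES):
    """Split {utt_id: n_mel_frames} into utterances to keep and to drop."""
    kept, dropped = {}, {}
    for utt_id, n_frames in utt_lengths.items():
        target = kept if n_frames <= max_frames else dropped
        target[utt_id] = n_frames
    return kept, dropped


# Hypothetical utterance lengths in mel frames.
toy_lengths = {"utt_a": 812, "utt_b": 2023, "utt_c": 2000}
kept, dropped = filter_by_length(toy_lengths)
print(sorted(kept), sorted(dropped))
```

Dropping a handful of very long utterances also tends to reduce memory use per batch; splitting them into shorter segments is the alternative when the data is scarce.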

griffin-lim implementation for quick sanity check of tacotron output

It'd be helpful if there is some griffin-lim code to check if tacotron training is OK before training vocoders like MelGAN.

tensorflow-tts-normalize: "UnboundLocalError: local variable 'subdir' referenced before assignment"

I've formatted my dataset like the LJSpeech one in the README so I can skip writing a dataloader for finetuning.
This is my directory

And this is my metadata.csv. I've made it fileid|transcription|transcription because in ljspeech.py there was text = parts[2] which was giving me index out of range errors with just fileid|trans

And this is a small portion of os.listdir("wavs")

file0816.wav
file0039.wav
file2292.wav
file2433.wav
file0794.wav
file1314.wav
file2486.wav
file0695.wav
file2564.wav

All the preprocessing steps run fine until the normalization one:

Traceback (most recent call last):
  File "/usr/local/bin/tensorflow-tts-normalize", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/bin/normalize.py", line 115, in main
    np.save(os.path.join(args.outdir, subdir, "norm-feats", f"{utt_id}-norm-feats.npy"),
UnboundLocalError: local variable 'subdir' referenced before assignment

Am I doing something wrong?
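
An `UnboundLocalError` of this kind typically means `subdir` is assigned only inside conditional branches and none of them matched, for example because an utterance ID is in neither the train nor the validation ID list. That is a guess about normalize.py, not something the traceback confirms. A self-contained sketch of the pattern and a defensive variant (all names hypothetical):

```python
def pick_subdir_buggy(utt_id, train_ids, valid_ids):
    if utt_id in train_ids:
        subdir = "train"
    elif utt_id in valid_ids:
        subdir = "valid"
    # If neither branch ran, subdir was never bound and this line raises.
    return subdir


def pick_subdir_safe(utt_id, train_ids, valid_ids):
    if utt_id in train_ids:
        return "train"
    if utt_id in valid_ids:
        return "valid"
    # Fail with a message that names the offending utterance.
    raise ValueError(f"{utt_id!r} is in neither the train nor the valid ID list")


train_ids, valid_ids = {"file0001"}, {"file0002"}
try:
    pick_subdir_buggy("file0816", train_ids, valid_ids)
    buggy_outcome = "ok"
except UnboundLocalError:
    buggy_outcome = "UnboundLocalError"
print(buggy_outcome)
```

If this is the cause, comparing the IDs stored in train_utt_ids.npy and valid_utt_ids.npy against the feature file names should show which utterances are unmatched.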

converting fastspeech model to tflite

Can we convert the FastSpeech model to TFLite?

Optimizing further with fake-quantization-aware training and pruning

Thanks for this great work.
I have trained a Tacotron2 model following your repository.
Now I am trying to convert the model to an int8-quantized model with TensorFlow Lite.
But I encountered some errors when I use
converter = tf.lite.TFLiteConverter.from_saved_model("./test_saved")
or
converter = tf.lite.TFLiteConverter.from_keras_model(tacotron2).
Do you have any advice on optimizing further with fake-quantization-aware training and pruning?
Thank you very much.

Optimizer error at the start of training

Hi,
I notice this error at the start of every training run, after the data is loaded, although training then starts and progresses as usual. I am having a hard time getting the model to converge, which is when I started to wonder whether the optimizer has an issue. Is this normal?

FastSpeech2 implementation

Hi, I quickly implemented FastSpeech2 to check the contribution of the F0 and energy embeddings. See PR #45 for details.

Problems creating Tensorflow-based Dataloader

Hi, I ran into the following problem when using it. Do you know how to solve it?
tensorflow-tts-preprocess --rootdir ./datasets/ --outdir ./dump/ --conf preprocess/ljspeech_preprocess.yaml
[Preprocessing]: 0% 0/10000 [00:00<?, ?it/s]
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 44, in mapstar
return list(map(*args))
File "/usr/local/lib/python3.6/dist-packages/pathos/helpers/mp_helper.py", line 15, in
func = lambda args: f(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/bin/preprocess.py", line 165, in save_to_file
f"{utt_id} seems to have a different sampling rate."
AssertionError: 000001 seems to have a different sampling rate.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/bin/tensorflow-tts-preprocess", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/bin/preprocess.py", line 228, in main
p.map(save_to_file, range(len(processor.items)))
File "/usr/local/lib/python3.6/dist-packages/pathos/multiprocessing.py", line 137, in map
return _pool.map(star(f), zip(*args)) # chunksize
File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 644, in get
raise self._value
AssertionError: 000001 seems to have a different sampling rate.
[Preprocessing]: 0% 0/10000 [00:03<?, ?it/s]
[Preprocessing]: 0% 0/10000 [00:03<?, ?it/s]
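
The assertion indicates that the sampling rate of `000001` differs from the one the preprocessing config expects. A quick way to audit a dataset before preprocessing is to read each file's header with the standard-library `wave` module, as sketched below. The expected rate of 22050 Hz is an assumption (it is the LJSpeech rate); substitute the value from your own config. `find_rate_mismatches` and `write_silence` are hypothetical helpers, and `wave` only handles uncompressed PCM files.

```python
import os
import tempfile
import wave


def write_silence(path, sample_rate, n_frames=100):
    """Write a short mono 16-bit silent WAV, used only for the demo."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)


def find_rate_mismatches(wav_dir, expected_rate):
    """Return {filename: actual_rate} for WAV files not at expected_rate."""
    mismatches = {}
    for name in sorted(os.listdir(wav_dir)):
        if not name.lower().endswith(".wav"):
            continue
        with wave.open(os.path.join(wav_dir, name), "rb") as wav:
            rate = wav.getframerate()
        if rate != expected_rate:
            mismatches[name] = rate
    return mismatches


# Demo: one file at 16 kHz, one at the assumed target rate.
demo_dir = tempfile.mkdtemp()
write_silence(os.path.join(demo_dir, "000001.wav"), 16000)
write_silence(os.path.join(demo_dir, "000002.wav"), 22050)
mismatches = find_rate_mismatches(demo_dir, expected_rate=22050)
print(mismatches)
```

Files reported here either need resampling or the sampling rate in the preprocessing config needs to match the dataset.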

decode_tacotron2.py Problem

Hi~
After training Tacotron2 on my data, the following problem occurs during decoding.

Does anyone know how to solve it?

CUDA_VISIBLE_DEVICES=3 python ./decode_tacotron2.py   --rootdir ./dump/valid/   --outdir ./prediction/tacotron2-75k/ 
--checkpoint ./examples/tacotron2/exp/train.tacotron2.v1/checkpoints/model-75000.h5 
--config ./examples/tacotron2/conf/tacotron2.v1.yaml   --batch-size 32

2020-06-30 13:31:07.511973: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-30 13:31:07.512004: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0
2020-06-30 13:31:07.512024: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N
2020-06-30 13:31:07.516529: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21397 MB memory) -> physical GPU (device: 0, name: Tesla P40, pci bus id: 0000:08:00.0, compute capability: 6.1)
2020-06-30 13:31:28.713565: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-06-30 13:31:28.964408: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
[Decoding]: 0it [00:00, ?it/s]2020-06-30 13:31:34.449436: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
[Decoding]: 0it [00:03, ?it/s]
Traceback (most recent call last):
  File "./decode_tacotron2.py", line 135, in <module>
    main()
  File "./decode_tacotron2.py", line 116, in main
    speaker_ids=tf.zeros(shape=[tf.shape(charactor)[0]]),
  File "/data/nlp/ihkim/miniconda2/envs/data_TensorflowTTS/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/data/nlp/ihkim/miniconda2/envs/data_TensorflowTTS/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 648, in _call
    *args, **kwds)
  File "/data/nlp/ihkim/miniconda2/envs/data_TensorflowTTS/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2238, in canonicalize_function_inputs
    self._flat_input_signature)
  File "/data/nlp/ihkim/miniconda2/envs/data_TensorflowTTS/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2305, in _convert_inputs_to_signature
    format_error_message(inputs, input_signature))
ValueError: Python inputs incompatible with input_signature:
  inputs: (
    tf.Tensor(
[[ 0 13 27 ...  0  0  0]
 [ 0 20 27 ...  0  0  0]
 [ 0  8 26 ...  0  0  0]
 ...
 [ 0  2 29 ...  0  0  0]
 [ 0 11 34 ...  0  0  0]
 [ 0  9 21 ...  0  0  0]], shape=(32, 134), dtype=int32),
    tf.Tensor(
[ 29  25  17  26  23  52  11  21  22  37 102  76  54  45 130  59  39  36
  20  47  31  43  45 134 101  53  80  73 112 114  16 114], shape=(32,), dtype=int32),
    tf.Tensor(
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0.], shape=(32,), dtype=float32))
  input_signature: (
    TensorSpec(shape=(None, None), dtype=tf.int32, name=None),
    TensorSpec(shape=(None,), dtype=tf.int32, name=None),
    TensorSpec(shape=(None,), dtype=tf.int32, name=None))

python3 examples_models/tacotron2/train_tacotron2.py --train-dir ./dump/train/ --dev-dir ./dump/valid/ --outdir ./examples_models/tacotron2/exp/train.tacotron2.v2_mixed_precision/ --config ./examples_models/tacotron2/conf/tacotron2.v1.yaml --use-norm 1 --mixed_precision 0 --resume "./examples_models/tacotron2/exp/train.tacotron2.v1/checkpoints/model-95000.h5"

But it failed, as shown in the logs below.

2020-06-30 13:26:51.312649: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open ./examples_models/tacotron2/exp/train.tacotron2.v1/checkpoints/model-95000.h5: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 95, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern))
RuntimeError: Unable to open table file ./examples_models/tacotron2/exp/train.tacotron2.v1/checkpoints/model-95000.h5: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "examples_models/tacotron2/train_tacotron2.py", line 511, in <module>
    main()
  File "examples_models/tacotron2/train_tacotron2.py", line 504, in main
    resume=args.resume)
  File "examples_models/tacotron2/train_tacotron2.py", line 345, in fit
    self.load_checkpoint(resume)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/trainers/base_trainer.py", line 343, in load_checkpoint
    self.ckpt.restore(pretrained_path)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/util.py", line 2009, in restore
    status = self._saver.restore(save_path=save_path)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/util.py", line 1260, in restore
    reader = py_checkpoint_reader.NewCheckpointReader(save_path)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 99, in NewCheckpointReader
    error_translator(e)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 44, in error_translator
    raise errors_impl.DataLossError(None, None, error_message)
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file ./examples_models/tacotron2/exp/train.tacotron2.v1/checkpoints/model-95000.h5: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

Am I missing something or did I run in a wrong way?
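
The "not an sstable (bad magic number)" message suggests that `--resume` ends up in `self.ckpt.restore`, which expects a TensorFlow checkpoint, while `model-95000.h5` is an HDF5 weights file. One way to tell the two apart before restoring is to look at the first eight bytes, since HDF5 files normally start with a fixed signature. The helper name `is_hdf5_file` and the demo file names are hypothetical.

```python
import os
import tempfile

# Standard 8-byte signature at the start of an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def is_hdf5_file(path):
    """Return True if the file begins with the HDF5 signature."""
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# Demo with two stand-in files; neither is a real model.
demo_dir = tempfile.mkdtemp()
fake_h5 = os.path.join(demo_dir, "model-95000.h5")
fake_ckpt = os.path.join(demo_dir, "ckpt-95000.index")
with open(fake_h5, "wb") as f:
    f.write(HDF5_SIGNATURE + b"payload")
with open(fake_ckpt, "wb") as f:
    f.write(b"not an hdf5 file")
print(is_hdf5_file(fake_h5), is_hdf5_file(fake_ckpt))
```

If the file is HDF5, `model.load_weights` is the matching loader; `--resume` would then presumably need to point at the checkpoint written by the trainer's checkpoint manager instead.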

Does it support multi-speaker?

Seems like the papers you implemented don't have multi-speaker support.
