tensorspeech / tensorflowtts

:stuck_out_tongue_closed_eyes: TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German; easy to adapt to other languages)

Home Page: https://tensorspeech.github.io/TensorFlowTTS/

License: Apache License 2.0

Python 100.00%
speech-synthesis text-to-speech tensorflow2 melgan fastspeech real-time tts vocoder multi-speaker-tts fastspeech2

tensorflowtts's Introduction

😋 TensorFlowTTS


Real-Time State-of-the-art Speech Synthesis for Tensorflow 2

🤪 TensorFlowTTS provides real-time state-of-the-art speech synthesis architectures such as Tacotron-2, MelGAN, Multi-band MelGAN, FastSpeech, and FastSpeech2, based on TensorFlow 2. With TensorFlow 2 we can speed up training and inference, optimize further using fake-quantization-aware training and pruning, and make TTS models run faster than real-time and deployable on mobile devices or embedded systems.

What's new

Features

  • High performance on speech synthesis.
  • Ability to fine-tune on other languages.
  • Fast, scalable, and reliable.
  • Suitable for deployment.
  • Easy to implement new models based on the abstract classes.
  • Mixed precision to speed up training where supported.
  • Single/multi-GPU gradient accumulation.
  • Both single- and multi-GPU support in the base trainer class.
  • TFLite conversion for all supported models.
  • Android example.
  • Many languages supported (currently Chinese, Korean, English, French, and German).
  • C++ inference.
  • Weight conversion for some models from PyTorch to TensorFlow to speed up development.

Requirements

This repository is tested on Ubuntu 18.04 with:

  • Python 3.7+
  • CUDA 10.1
  • cuDNN 7.6.5
  • TensorFlow 2.2/2.3/2.4/2.5/2.6
  • TensorFlow Addons >= 0.10.0

Other TensorFlow versions should work but have not been tested yet. This repo aims to track the latest stable TensorFlow version. We recommend installing TensorFlow 2.6.0 for training, especially if you want to use multi-GPU.
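For example, to pin the recommended versions (one possible combination based on the list above, not an official lockfile):

$ pip install tensorflow==2.6.0 "tensorflow-addons>=0.10.0"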

Installation

With pip

$ pip install TensorFlowTTS

From source

Examples are included in the repository but are not shipped with the framework. Therefore, to run the latest version of the examples, you need to install from source as shown below.

$ git clone https://github.com/TensorSpeech/TensorFlowTTS.git
$ cd TensorFlowTTS
$ pip install .

If you want to upgrade the repository and its dependencies:

$ git pull
$ pip install --upgrade .

Supported Model architectures

TensorFlowTTS currently provides the following architectures:

  1. MelGAN released with the paper MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis by Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brebisson, Yoshua Bengio, Aaron Courville.
  2. Tacotron-2 released with the paper Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions by Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu.
  3. FastSpeech released with the paper FastSpeech: Fast, Robust, and Controllable Text to Speech by Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu.
  4. Multi-band MelGAN released with the paper Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech by Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie.
  5. FastSpeech2 released with the paper FastSpeech 2: Fast and High-Quality End-to-End Text to Speech by Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu.
  6. Parallel WaveGAN released with the paper Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram by Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim.
  7. HiFi-GAN released with the paper HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis by Jungil Kong, Jaehyeon Kim, Jaekyoung Bae.

We are also implementing some techniques to improve quality and convergence speed from the following papers:

  1. Guided Attention Loss released with the paper Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention by Hideyuki Tachibana, Katsuya Uenoyama, Shunsuke Aihara.

Audio Samples

Here are audio samples on the validation set: tacotron-2, fastspeech, melgan, melgan.stft, fastspeech2, multiband_melgan.

Tutorial End-to-End

Prepare Dataset

Prepare a dataset in the following format:

|- [NAME_DATASET]/
|   |- metadata.csv
|   |- wavs/
|       |- file1.wav
|       |- ...

Where metadata.csv has the following format: id|transcription. This is an LJSpeech-like format; you can skip these preprocessing steps if your dataset is in another format.

Note that NAME_DATASET should be one of [ljspeech/kss/baker/libritts/thorsten/synpaflex], for example.
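For example, a minimal metadata.csv matching the tree above could look like this (the ids and transcriptions are illustrative; ids must match the wav filenames without the extension):

file1|Hello world, this is the first utterance.
file2|And this is the second utterance.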

Preprocessing

The preprocessing has two steps:

  1. Preprocess audio features
    • Convert characters to IDs
    • Compute mel spectrograms
    • Normalize mel spectrograms to [-1, 1] range
    • Split the dataset into train and validation
    • Compute the mean and standard deviation of multiple features from the training split
  2. Standardize mel spectrogram based on computed statistics

To reproduce the steps above:

tensorflow-tts-preprocess --rootdir ./[ljspeech/kss/baker/libritts/thorsten/synpaflex] --outdir ./dump_[ljspeech/kss/baker/libritts/thorsten/synpaflex] --config preprocess/[ljspeech/kss/baker/thorsten/synpaflex]_preprocess.yaml --dataset [ljspeech/kss/baker/libritts/thorsten/synpaflex]
tensorflow-tts-normalize --rootdir ./dump_[ljspeech/kss/baker/libritts/thorsten/synpaflex] --outdir ./dump_[ljspeech/kss/baker/libritts/thorsten/synpaflex] --config preprocess/[ljspeech/kss/baker/libritts/thorsten/synpaflex]_preprocess.yaml --dataset [ljspeech/kss/baker/libritts/thorsten/synpaflex]

Right now we only support ljspeech, kss, baker, libritts, thorsten, and synpaflex for the dataset argument. We intend to support more datasets in the future.

Note: To run libritts preprocessing, please first read the instructions in examples/fastspeech2_libritts. The dataset needs to be reformatted before running preprocessing.

Note: To run synpaflex preprocessing, please first run the notebook notebooks/prepare_synpaflex.ipynb. The dataset needs to be reformatted before running preprocessing.

After preprocessing, the structure of the project folder should be:

|- [NAME_DATASET]/
|   |- metadata.csv
|   |- wavs/
|       |- file1.wav
|       |- ...
|- dump_[ljspeech/kss/baker/libritts/thorsten]/
|   |- train/
|       |- ids/
|           |- LJ001-0001-ids.npy
|           |- ...
|       |- raw-feats/
|           |- LJ001-0001-raw-feats.npy
|           |- ...
|       |- raw-f0/
|           |- LJ001-0001-raw-f0.npy
|           |- ...
|       |- raw-energies/
|           |- LJ001-0001-raw-energy.npy
|           |- ...
|       |- norm-feats/
|           |- LJ001-0001-norm-feats.npy
|           |- ...
|       |- wavs/
|           |- LJ001-0001-wave.npy
|           |- ...
|   |- valid/
|       |- ids/
|           |- LJ001-0009-ids.npy
|           |- ...
|       |- raw-feats/
|           |- LJ001-0009-raw-feats.npy
|           |- ...
|       |- raw-f0/
|           |- LJ001-0009-raw-f0.npy
|           |- ...
|       |- raw-energies/
|           |- LJ001-0009-raw-energy.npy
|           |- ...
|       |- norm-feats/
|           |- LJ001-0009-norm-feats.npy
|           |- ...
|       |- wavs/
|           |- LJ001-0009-wave.npy
|           |- ...
|   |- stats.npy
|   |- stats_f0.npy
|   |- stats_energy.npy
|   |- train_utt_ids.npy
|   |- valid_utt_ids.npy
|- examples/
|   |- melgan/
|   |- fastspeech/
|   |- tacotron2/
|   ...
  • stats.npy contains the mean and std of the training split's mel spectrograms
  • stats_energy.npy contains the mean and std of the energy values from the training split
  • stats_f0.npy contains the mean and std of the F0 values from the training split
  • train_utt_ids.npy / valid_utt_ids.npy contain the training and validation utterance IDs, respectively

We use a suffix (ids, raw-feats, raw-energy, raw-f0, norm-feats, and wave) for each input type.
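For a sense of how these statistics are used, here is a rough sketch of the standardization applied by tensorflow-tts-normalize (the [mean, std] layout of stats.npy and the exact paths are assumptions for illustration):

import numpy as np

# Hypothetical sketch: z-score a raw mel spectrogram with training statistics.
# Assumes stats.npy stores two rows, [mean, std], each of shape [num_mels].
mean, std = np.load("dump_ljspeech/stats.npy")
raw_mel = np.load("dump_ljspeech/train/raw-feats/LJ001-0001-raw-feats.npy")
norm_mel = (raw_mel - mean) / std  # per-dimension standardization
np.save("dump_ljspeech/train/norm-feats/LJ001-0001-norm-feats.npy", norm_mel)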

IMPORTANT NOTES:

  • This preprocessing step is based on ESPnet, so you can combine all models here with other models from the ESPnet repository.
  • Regardless of how your dataset is formatted, the final structure of the dump folder SHOULD follow the structure above to be usable with the training scripts, or you can modify the scripts yourself 😄.

Training models

To learn how to train models from scratch or fine-tune them on other datasets/languages, please see the details in the examples directory.

Abstract Class Explanation

Abstract DataLoader: TensorFlow-based dataset

A detailed implementation of the abstract dataset class is in tensorflow_tts/dataset/abstract_dataset. There are some functions you need to override and understand (a minimal sketch follows this list):

  1. get_args: returns the arguments for the generator function, normally utt_ids.
  2. generator: takes the inputs produced by get_args and yields inputs for the model. Note that every generator returns a dictionary whose keys exactly match the model's parameter names, because base_trainer uses model(**batch) for the forward step.
  3. get_output_dtypes: returns the dtypes of each element yielded by the generator function.
  4. get_len_dataset: returns the length of the dataset, normally len(utt_ids).
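Below is a hedged sketch of such a subclass; the import path, the file layout, and the dictionary keys (input_ids, mel_gts) are illustrative assumptions, not the repo's exact code:

import numpy as np
import tensorflow as tf

from tensorflow_tts.datasets.abstract_dataset import AbstractDataset  # import path assumed

class CharMelDataset(AbstractDataset):
    """Hypothetical dataset yielding character ids and normalized mels."""

    def __init__(self, root_dir, utt_ids):
        self.root_dir = root_dir
        self.utt_ids = utt_ids

    def get_args(self):
        # Arguments handed to generator(); normally the utterance ids.
        return [self.utt_ids]

    def generator(self, utt_ids):
        for utt_id in utt_ids:
            # tf.data passes strings as bytes; decode before formatting paths.
            utt_id = utt_id.decode("utf-8") if isinstance(utt_id, bytes) else utt_id
            ids = np.load(f"{self.root_dir}/ids/{utt_id}-ids.npy")
            mel = np.load(f"{self.root_dir}/norm-feats/{utt_id}-norm-feats.npy")
            # Keys must exactly match the model's parameter names, because
            # base_trainer calls model(**batch) in its forward step.
            yield {"input_ids": ids, "mel_gts": mel}

    def get_output_dtypes(self):
        # dtype of each element yielded by generator().
        return {"input_ids": tf.int32, "mel_gts": tf.float32}

    def get_len_dataset(self):
        return len(self.utt_ids)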

IMPORTANT NOTES:

  • The pipeline for creating a dataset should be: cache -> shuffle -> map_fn -> get_batch -> prefetch, as in the sketch below.
  • If you shuffle before cache, the dataset will not re-shuffle when you iterate over it again.
  • You should apply map_fn so that every element returned by the generator has the same length before batching and feeding it into the model.
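A hedged tf.data sketch of that ordering, reusing the dataset sketch above (the create() name and arguments here are illustrative, not the repo's exact implementation):

import tensorflow as tf

def create(dataset, map_fn, batch_size):
    ds = tf.data.Dataset.from_generator(
        dataset.generator,
        output_types=dataset.get_output_dtypes(),
        args=dataset.get_args(),
    )
    ds = ds.cache()                             # cache first ...
    ds = ds.shuffle(dataset.get_len_dataset())  # ... so each epoch re-shuffles
    ds = ds.map(map_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds = ds.padded_batch(batch_size)            # get_batch: pad to a common length
    return ds.prefetch(tf.data.experimental.AUTOTUNE)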

Some examples that use this abstract_dataset are tacotron_dataset.py, fastspeech_dataset.py, melgan_dataset.py, and fastspeech2_dataset.py.

Abstract Trainer Class

A detailed implementation of base_trainer is in tensorflow_tts/trainers/base_trainer.py. It includes Seq2SeqBasedTrainer and GanBasedTrainer, which inherit from BasedTrainer. All trainers support both single and multi GPU. There are some functions you MUST override when implementing a new trainer (a hedged sketch follows this list):

  • compile: defines the models, optimizer, and losses.
  • generate_and_save_intermediate_result: saves intermediate results such as alignment plots, generated audio, and mel-spectrogram plots.
  • compute_per_example_losses: computes the per-example losses for the model; note that every element of the loss MUST have shape [batch_size].
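A hedged sketch of the three required overrides (the base-class signatures and the two-value return of compute_per_example_losses are assumptions based on the description above):

import tensorflow as tf

from tensorflow_tts.trainers.base_trainer import Seq2SeqBasedTrainer  # import path assumed

class MyTrainer(Seq2SeqBasedTrainer):
    def compile(self, model, optimizer):
        # Define the model, optimizer, and any loss objects used later.
        super().compile(model=model, optimizer=optimizer)

    def compute_per_example_losses(self, batch, outputs):
        # Every loss element MUST keep shape [batch_size]: reduce over the
        # time/feature axes only, never over the batch axis.
        mel_loss = tf.reduce_mean(tf.abs(batch["mel_gts"] - outputs), axis=[1, 2])
        return mel_loss, {"mel_loss": mel_loss}

    def generate_and_save_intermediate_result(self, batch):
        # e.g. plot alignments, save generated audio, plot mel spectrograms.
        pass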

All models in this repo are trained with GanBasedTrainer (see train_melgan.py, train_melgan_stft.py, train_multiband_melgan.py) or Seq2SeqBasedTrainer (see train_tacotron2.py, train_fastspeech.py).

End-to-End Examples

You can learn how to run inference for each model from the notebooks, or see a colab (for English), colab (for Korean), colab (for Chinese), colab (for French), or colab (for German). Here is example code for end-to-end inference with FastSpeech2 and Multi-band MelGAN. We uploaded all our pretrained models to the Hugging Face Hub.

import numpy as np
import soundfile as sf
import yaml

import tensorflow as tf

from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoProcessor

# initialize fastspeech2 model.
fastspeech2 = TFAutoModel.from_pretrained("tensorspeech/tts-fastspeech2-ljspeech-en")


# initialize mb_melgan model
mb_melgan = TFAutoModel.from_pretrained("tensorspeech/tts-mb_melgan-ljspeech-en")


# inference
processor = AutoProcessor.from_pretrained("tensorspeech/tts-fastspeech2-ljspeech-en")

input_ids = processor.text_to_sequence("Recent research at Harvard has shown meditating for as little as 8 weeks, can actually increase the grey matter in the parts of the brain responsible for emotional regulation, and learning.")
# fastspeech2 inference
mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
    speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
)

# melgan inference
audio_before = mb_melgan.inference(mel_before)[0, :, 0]
audio_after = mb_melgan.inference(mel_after)[0, :, 0]

# save to file
sf.write('./audio_before.wav', audio_before, 22050, "PCM_16")
sf.write('./audio_after.wav', audio_after, 22050, "PCM_16")

Contact

License

All models here are licensed under the Apache License 2.0.

Acknowledgement

We want to thank Tomoki Hayashi for the many discussions about MelGAN, Multi-band MelGAN, FastSpeech, and Tacotron. This framework is based on his great open-source ParallelWaveGAN project.

tensorflowtts's People

Contributors

ak391, anon-artist, azraelkuan, cclauss, crux153, dathudeptrai, erogol, ga642381, hertz-pj, jaeyoo, kewlbear, luan78zaoha, machineko, mapledxf, mariamsu, megapanchamz, mokkemeguru, monatis, myagues, neonbohdan, oscarvanl, patrickvonplaten, samuel-lunii, sayakpaul, sjincho, theamdara, trfnhle, tts-nlp, tulasiram58827, zdisket


tensorflowtts's Issues

Saving entire model

Hello, I tried to save the entire Tacotron-2 model, instead of just the weights, as an h5 file. However, I am getting the following error:

NotImplementedError: Saving the model to HDF5 format requires the model to be a Functional model or a Sequential model. It does not work for subclassed models, because such models are defined via the body of a Python method, which isn't safely serializable. Consider saving to the Tensorflow SavedModel format (by setting save_format="tf") or using save_weights.

I used the following code to successfully load the weights:

tacotron2 = TFTacotron2(config=fs_config, name="tacotron2", training=False)
tacotron2._build()  
tacotron2.load_weights("tacotron2.h5") 

Then I tried to call tacotron2.save("full_tacotron2.h5") and got the aforementioned error. Should I modify trainers/base_trainer.py as follows and re-train, or is there another way to save the entire model as an h5 file?

def save_checkpoint(self):
    """Save checkpoint."""
    self.ckpt.steps.assign(self.steps)
    self.ckpt.epochs.assign(self.epochs)
    self.ckp_manager.save(checkpoint_number=self.steps)
    self.model.save_weights(self.saved_path + 'model-{}.h5'.format(self.steps))
    self.model.save_model(self.saved_path + 'model-total{}.h5'.format(self.steps))
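As the NotImplementedError above suggests, one alternative is exporting the subclassed model in the TensorFlow SavedModel format instead of HDF5. A minimal sketch, assuming the tacotron2 object from the snippet above (whether TFTacotron2 serializes cleanly this way is not verified here):

import tensorflow as tf

# SavedModel supports subclassed models, unlike HDF5 (per the error above).
tacotron2.save("full_tacotron2_savedmodel", save_format="tf")

# Reload later without reconstructing the Python class.
reloaded = tf.keras.models.load_model("full_tacotron2_savedmodel")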

fast speech mel normalization

When computing the mean and scaler for the mel spectrograms before normalization, are they computed from the whole dataset, and does the line below use only the first frame of the mel?

mel = mel[0].numpy()

tensorflow-tts-preprocess: assert len(mel) == len(f0) == len(energy) AssertionError

My dataset, which always worked properly before, gave me this error when running the preprocessing step.

2020-06-19 20:34:52.303453: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
[Preprocessing]:   0% 6/3431 [00:02<24:16,  2.35it/s]
[Preprocessing]:   0% 13/3431 [00:07<32:39,  1.74it/s]
[Preprocessing]:   0% 6/3431 [00:10<1:35:38,  1.68s/it]
[Preprocessing]:   2% 64/3431 [00:20<18:08,  3.09it/s]
[Preprocessing]:   1% 49/3431 [00:25<29:30,  1.91it/s]
[Preprocessing]:   2% 73/3431 [00:53<40:54,  1.37it/s]
[Preprocessing]:   3% 106/3431 [00:59<31:02,  1.78it/s]
[Preprocessing]:   0% 13/3431 [01:03<4:39:15,  4.90s/it]
[Preprocessing]:   2% 59/3431 [01:12<1:09:08,  1.23s/it]
[Preprocessing]:   6% 215/3431 [01:12<18:09,  2.95it/s]
[Preprocessing]:   6% 215/3431 [01:13<18:12,  2.94it/s]
[Preprocessing]:   0% 12/3431 [01:16<6:05:32,  6.41s/it]
[Preprocessing]:   1% 20/3431 [01:25<4:03:18,  4.28s/it]
[Preprocessing]:   2% 76/3431 [01:35<1:10:16,  1.26s/it]
[Preprocessing]:   5% 178/3431 [01:49<33:13,  1.63it/s]
[Preprocessing]:   6% 209/3431 [01:56<06:16,  8.55it/s]multiprocess.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/usr/local/lib/python3.6/dist-packages/pathos/helpers/mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/bin/preprocess.py", line 217, in save_to_file
    assert len(mel) == len(f0) == len(energy)
AssertionError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/tensorflow-tts-preprocess", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/bin/preprocess.py", line 250, in main
[Preprocessing]:   6% 209/3431 [01:56<29:58,  1.79it/s]
    p.map(save_to_file, range(len(processor.items)))
  File "/usr/local/lib/python3.6/dist-packages/pathos/multiprocessing.py", line 137, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 644, in get
    raise self._value
AssertionError
[Preprocessing]:   0% 0/3431 [01:56<?, ?it/s]

When I run the normalization step, it only does 1314 iterations instead of the expected 3431. In addition, when trying to train, I get this:

<ipython-input-12-8616bad8c9dc> in dotrain(inargs, ptpath, maxsteps)
    371         energy_stat=args.energy_stat,
    372         mel_length_threshold=mel_length_threshold,
--> 373         return_utt_id=False
    374     ).create(
    375         is_shuffle=config["is_shuffle"],

/content/TensorflowTTS/ttsexamples/fastspeech2/fastspeech2_dataset.py in __init__(self, root_dir, charactor_query, mel_query, duration_query, f0_query, energy_query, f0_stat, energy_stat, max_f0_embeddings, max_energy_embeddings, charactor_load_fn, mel_load_fn, duration_load_fn, f0_load_fn, energy_load_fn, mel_length_threshold, return_utt_id)
    101             mel_files = [mel_files[idx] for idx in idxs]
    102             charactor_files = [charactor_files[idx] for idx in idxs]
--> 103             duration_files = [duration_files[idx] for idx in idxs]
    104             mel_lengths = [mel_lengths[idx] for idx in idxs]
    105             f0_files = [f0_files[idx] for idx in idxs]

/content/TensorflowTTS/ttsexamples/fastspeech2/fastspeech2_dataset.py in <listcomp>(.0)
    101             mel_files = [mel_files[idx] for idx in idxs]
    102             charactor_files = [charactor_files[idx] for idx in idxs]
--> 103             duration_files = [duration_files[idx] for idx in idxs]
    104             mel_lengths = [mel_lengths[idx] for idx in idxs]
    105             f0_files = [f0_files[idx] for idx in idxs]

IndexError: list index out of range

I have no idea why this happens. My dataset is formatted exactly the same (like LJSpeech) as it was back when it was working.

Error when loading the LJSpeech pretrained model to train another language

When I tried to load a pretrained model (model_65000.h5) following the tutorial in examples/tacotron2/README.md to train on another language, I got an error like this:

tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable encoder/embeddings/LayerNorm/beta_164 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/encoder/embeddings/LayerNorm/beta_164/N10tensorflow3VarE does not exist.

And when I used another pretrained model (model_40000.h5), I got a different error:

(0) Failed precondition: Error while reading resource variable encoder/embeddings/LayerNorm/gamma_157 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/encoder/embeddings/LayerNorm/gamma_157/N10tensorflow3VarE does not exist.
[[node encoder/embeddings/LayerNorm/batchnorm/mul/ReadVariableOp (defined at /data/linhld6/Project/TensorflowTTS-master/tensorflow_tts/models/tacotron2.py:163) ]]
[[decoder/while/LoopCond/_71/_86]]
(1) Failed precondition: Error while reading resource variable encoder/embeddings/LayerNorm/gamma_157 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/encoder/embeddings/LayerNorm/gamma_157/N10tensorflow3VarE does not exist.

Can anyone help me fix this error? Thanks!

SqueezeWave Implementation

Hi,
Will it be possible to add a TF2 implementation of the SqueezeWave vocoder to this system? The performance is really fast and promising. I'm working on it myself, but I'm not well versed in TF2 yet. I had quite a struggle trying to train the authors' PyTorch implementation with my custom dataset, even though it has almost the same characteristics as LJSpeech but double the size. I believe TF2 is more suitable for post-training optimization and deployment.
Original Repo: https://github.com/tianrengao/SqueezeWave

FastSpeech 2

Hello, thank you for this project. I'm aware of two different implementations of FastSpeech; are there any plans to support the recent FastSpeech 2 architecture?

Thank you very much.

Error when loading data to train a Tacotron2 model

When I load my own dataset to train a Tacotron2 model, I get this error:

tensorflow.python.framework.errors_impl.DataLossError: Attempted to pad to a smaller size than the input element. [Op:IteratorGetNext]

Can anyone help me fix this error?

Running path problem

[root@localhost TensorflowTTS]# CUDA_VISIBLE_DEVICES=0 python examples/tacotron2/train_tacotron2.py --train-dir ./dump/train/ --dev-dir ./dump/valid/ --outdir ./examples/tacotron2/exp/train.tacotron2.v1/ --config ./examples/tacotron2/conf/tacotron2.v1.yaml --use-norm 1 --mix_precision 0 --resume ""
Traceback (most recent call last):
File "examples/tacotron2/train_tacotron2.py", line 31, in
from examples.tacotron2.tacotron_dataset import CharactorMelDataset
ModuleNotFoundError: No module named 'examples'

Why do I get this error when running from the root directory?

Config for the length of the mel-spectrogram input

When training a Tacotron2 model with my own data, some mel spectrograms have more than 2000 timesteps, so I always get this error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [16,2000,80] vs. [16,2023,80]

I don't know where to set the timestep to a higher value to fit my data.
Thanks

Optimizing further by using fake-quantization-aware training and pruning

Thanks for this great job.
I have trained a Tacotron2 model with your repository.
Now I am trying to convert the model to an int8-quantized model with TensorFlow Lite,
but I encountered some errors when I use
converter = tf.lite.TFLiteConverter.from_saved_model("./test_saved")
or
converter = tf.lite.TFLiteConverter.from_keras_model(tacotron2).
Do you have any advice about further optimization using fake-quantization-aware training and pruning?
Thank you very much.

Enabling mixed_precision for training

Hi, I would like to turn on the mixed_precision setting for training, so I set "1" in the command below, but mixed_precision is still False afterwards. Am I missing something?

2020-06-25 06:58:32,264 (train_tacotron2:413) INFO: mixed_precision = False
CUDA_VISIBLE_DEVICES=0 python3 examples_models/tacotron2/train_tacotron2.py \
  --train-dir ./dump_lisa/train/ \
  --dev-dir ./dump_lisa/valid/ \
  --outdir ./examples_models/tacotron2/exp_lisa/train.tacotron2.v2_mixed_precision/ \
  --config ./examples_models/tacotron2/conf/tacotron2.v1.yaml \
  --use-norm 1
  --mixed_precision 1 \
  --resume ""

How to continue training from a checkpoint

Hi, my run stopped unexpectedly, so I need to continue it.

Below is my command.

python3 examples_models/tacotron2/train_tacotron2.py --train-dir ./dump/train/ --dev-dir ./dump/valid/ --outdir ./examples_models/tacotron2/exp/train.tacotron2.v2_mixed_precision/ --config ./examples_models/tacotron2/conf/tacotron2.v1.yaml --use-norm 1 --mixed_precision 0 --resume "./examples_models/tacotron2/exp/train.tacotron2.v1/checkpoints/model-95000.h5"

But it failed, showing the logs below.

2020-06-30 13:26:51.312649: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open ./examples_models/tacotron2/exp/train.tacotron2.v1/checkpoints/model-95000.h5: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line
95, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern))
RuntimeError: Unable to open table file ./examples_models/tacotron2/exp/train.tacotron2.v1/checkpoints/model-95000.h5: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "examples_models/tacotron2/train_tacotron2.py", line 511, in <module>
    main()
  File "examples_models/tacotron2/train_tacotron2.py", line 504, in main
    resume=args.resume)
  File "examples_models/tacotron2/train_tacotron2.py", line 345, in fit
    self.load_checkpoint(resume)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/trainers/base_trainer.py", line 343, in load_checkpoint
    self.ckpt.restore(pretrained_path)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/util.py", line 2009, in restore
    status = self._saver.restore(save_path=save_path)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/util.py", line 1260, in restore
    reader = py_checkpoint_reader.NewCheckpointReader(save_path)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line
99, in NewCheckpointReader
    error_translator(e)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line
44, in error_translator
    raise errors_impl.DataLossError(None, None, error_message)
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file ./examples_models/tacotron2/exp/train.tacotron2.v1/checkpoints/model-95000.h5: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

Am I missing something, or did I run it the wrong way?

fastspeech inference duration_outputs

When running FastSpeech in inference mode, why is an exp op applied to duration_outputs from the duration predictor?

duration_outputs = tf.math.exp(duration_outputs) - 1.0

Optimizer error at the start of training

Hi,
I notice this error at the start of training, every time after the data loads, although training starts and progresses as usual. I'm having a hard time getting the model to converge, which made me wonder whether the optimizer has an issue. Is this normal?

(screenshot of the optimizer error)

Tacotron2 teacher forcing

@dathudeptrai The training sampler uses teacher forcing (ground-truth mel-spectrogram frames) during the whole training period. Did you encounter the exposure bias problem? Is the audio quality always good at inference time?

Pretrained model of MelGAN

Hi, I tried to run MelGAN with your pretrained model, but I found there is no ckpt data, only the discriminator's and generator's h5 files.
I read train_melgan.py and found it needs a --resume parameter, but I think that needs a ckpt file to get the steps and epochs.
Could you upload the ckpt files along with the pre-trained model, or is there another way to do it?

Thank you for your awesome repo. It is really well constructed and I can't wait to see its performance.

Problems creating Tensorflow-based Dataloader

Hi, I encountered the following problem when using it. Do you know how to solve it?
tensorflow-tts-preprocess --rootdir ./datasets/ --outdir ./dump/ --conf preprocess/ljspeech_preprocess.yaml
[Preprocessing]: 0% 0/10000 [00:00<?, ?it/s]
[Preprocessing]: 0% 0/10000 [00:01<?, ?it/s]
[Preprocessing]: 0% 0/10000 [00:01<?, ?it/s]
[Preprocessing]: 0% 0/10000 [00:01<?, ?it/s]
[Preprocessing]: 0% 0/10000 [00:01<?, ?it/s]
[Preprocessing]: 0% 0/10000 [00:01<?, ?it/s]
[Preprocessing]: 0% 0/10000 [00:01<?, ?it/s]
[Preprocessing]: 0% 0/10000 [00:02<?, ?it/s]
[Preprocessing]: 0% 0/10000 [00:02<?, ?it/s]
[Preprocessing]: 0% 0/10000 [00:02<?, ?it/s]
[Preprocessing]: 0% 0/10000 [00:02<?, ?it/s]
[Preprocessing]: 0% 0/10000 [00:02<?, ?it/s]
[Preprocessing]: 0% 0/10000 [00:02<?, ?it/s]
[Preprocessing]: 0% 0/10000 [00:03<?, ?it/s]
[Preprocessing]: 0% 0/10000 [00:03<?, ?it/s]
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 44, in mapstar
return list(map(*args))
File "/usr/local/lib/python3.6/dist-packages/pathos/helpers/mp_helper.py", line 15, in
func = lambda args: f(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/bin/preprocess.py", line 165, in save_to_file
f"{utt_id} seems to have a different sampling rate."
AssertionError: 000001 seems to have a different sampling rate.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/bin/tensorflow-tts-preprocess", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/bin/preprocess.py", line 228, in main
p.map(save_to_file, range(len(processor.items)))
File "/usr/local/lib/python3.6/dist-packages/pathos/multiprocessing.py", line 137, in map
return _pool.map(star(f), zip(*args)) # chunksize
File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 644, in get
raise self._value
AssertionError: 000001 seems to have a different sampling rate.
[Preprocessing]: 0% 0/10000 [00:03<?, ?it/s]
[Preprocessing]: 0% 0/10000 [00:03<?, ?it/s]

Double Decoder Consistency for Tacotron 2

Hi, I found this interesting post while regularly checking up on erogol's blog. Any chance of having it here? Having a Tacotron 2 that is practically immune to alignment problems sounds very state of the art.
For reference, Mozilla/TTS has it.

decode_tacotron2.py Problem

Hi~
After training Tacotron2 with my data, the following problem occurs during decoding.

Does anyone know the answer to this problem?

CUDA_VISIBLE_DEVICES=3 python ./decode_tacotron2.py \
  --rootdir ./dump/valid/ \
  --outdir ./prediction/tacotron2-75k/ \
  --checkpoint ./examples/tacotron2/exp/train.tacotron2.v1/checkpoints/model-75000.h5 \
  --config ./examples/tacotron2/conf/tacotron2.v1.yaml \
  --batch-size 32
2020-06-30 13:31:07.511973: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-30 13:31:07.512004: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0
2020-06-30 13:31:07.512024: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N
2020-06-30 13:31:07.516529: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21397 MB memory) -> physical GPU (device: 0, name: Tesla P40, pci bus id: 0000:08:00.0, compute capability: 6.1)
2020-06-30 13:31:28.713565: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-06-30 13:31:28.964408: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
[Decoding]: 0it [00:00, ?it/s]2020-06-30 13:31:34.449436: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
[Decoding]: 0it [00:03, ?it/s]
Traceback (most recent call last):
  File "./decode_tacotron2.py", line 135, in <module>
    main()
  File "./decode_tacotron2.py", line 116, in main
    speaker_ids=tf.zeros(shape=[tf.shape(charactor)[0]]),
  File "/data/nlp/ihkim/miniconda2/envs/data_TensorflowTTS/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/data/nlp/ihkim/miniconda2/envs/data_TensorflowTTS/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 648, in _call
    *args, **kwds)
  File "/data/nlp/ihkim/miniconda2/envs/data_TensorflowTTS/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2238, in canonicalize_function_inputs
    self._flat_input_signature)
  File "/data/nlp/ihkim/miniconda2/envs/data_TensorflowTTS/lib/python3.6/site-
packages/tensorflow/python/eager/function.py", line 2305, in _convert_inputs_to_signature
    format_error_message(inputs, input_signature))
ValueError: Python inputs incompatible with input_signature:
  inputs: (
    tf.Tensor(
[[ 0 13 27 ...  0  0  0]
 [ 0 20 27 ...  0  0  0]
 [ 0  8 26 ...  0  0  0]
 ...
 [ 0  2 29 ...  0  0  0]
 [ 0 11 34 ...  0  0  0]
 [ 0  9 21 ...  0  0  0]], shape=(32, 134), dtype=int32),
    tf.Tensor(
[ 29  25  17  26  23  52  11  21  22  37 102  76  54  45 130  59  39  36
  20  47  31  43  45 134 101  53  80  73 112 114  16 114], shape=(32,), dtype=int32),
    tf.Tensor(
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0.], shape=(32,), dtype=float32))
  input_signature: (
    TensorSpec(shape=(None, None), dtype=tf.int32, name=None),
    TensorSpec(shape=(None,), dtype=tf.int32, name=None),
    TensorSpec(shape=(None,), dtype=tf.int32, name=None))

Deployment of frozen models with tf-addons

Hi,
For freezing the graph and deploying on a target device, is it necessary to have TensorFlow Addons installed on the device? Or can the plain tensorflow 2.x package perform inference from the frozen graph? Sorry, but I have never used TensorFlow Addons before and don't have much knowledge about it.

Data augmentation leads to increased training time

The first thing to note is that TensorFlow originally had a very useful library for data augmentation:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

However, in the 2.0 official release this library and the corresponding training method model.fit_generator() had problems, and training time increased by 3-4x. The answer given by the official TensorFlow staff is that this is indeed a bug (issue #33177), and they decided to abandon this method instead of repairing it. The suggested solution is to use model.fit() directly to receive the data generated by ImageDataGenerator, but this creates an additional problem: the program prompts 'Filling up shuffle buffer (this may take a while)' before each epoch.

This is also a considerable extra time overhead. For me, my machine needs 10s, and 10s per epoch is unacceptable. Is there any better way to deal with it?

Difference between decode_tacotron2.py and tacotron2_inference.ipynb

The procedure for running Tacotron2 inference is not clear. It appears to me that running only tacotron2_inference.ipynb is enough. What is decode_tacotron2.py for? Even running this code gives me errors, as shown below.

2020-07-01 01:59:24.756772: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 299 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:08:00.0, compute capability: 6.1)
2020-07-01 01:59:43.496259: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-07-01 01:59:43.723366: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-07-01 01:59:44.157435: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-07-01 01:59:44.167314: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-07-01 01:59:44.172894: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-07-01 01:59:44.175197: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "examples_models/tacotron2/decode_tacotron2.py", line 136, in <module>
    main()
  File "examples_models/tacotron2/decode_tacotron2.py", line 103, in main
    tacotron2._build()  # build model to be able load_weights.
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/models/tacotron2.py", line 677, in _build
    self(input_ids, input_lengths, speaker_ids, mel_outputs, mel_lengths, 10, training=True)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 968, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 644, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1665, in _filtered_call
    self.captured_inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1746, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 598, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[node encoder/conv_batch_norm/tf_tacotron_conv_batch_norm/conv_._0/conv1d (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_tts/models/tacotron2.py:86) ]]
         [[decoder/while/body/_1/decoder_cell/assert_positive/assert_less/Assert/AssertGuard/pivot_f/_265/_47]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[node encoder/conv_batch_norm/tf_tacotron_conv_batch_norm/conv_._0/conv1d (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_tts/models/tacotron2.py:86) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_call_8385]

Function call stack:
call -> call

pqmf synthesis filter

Hi, why does the synthesis filter still use h_analysis here?

# [subbands, 1, taps + 1] == [filter_width, in_channels, out_channels]
analysis_filter = np.expand_dims(h_analysis, 1)
analysis_filter = np.transpose(analysis_filter, (2, 1, 0))
synthesis_filter = np.expand_dims(h_analysis, 0)
synthesis_filter = np.transpose(synthesis_filter, (2, 1, 0))

melgan residual stack config

Hi, in the original MelGAN paper the kernel size of the second conv1d layer is 3, but I see that in this MelGAN residual config it's 1. Is that a better config?

Error following Tacotron2 tutorial

First, great project. Thanks a ton for maintaining it. I'm learning a lot going through the code.

I tried following the Tacotron2 tutorial -- downloaded the dataset, ran the preprocessing steps, and tried training. I ran into this error:

tensorflow.python.framework.errors_impl.InvalidArgumentError:  [_Derived_]  Trying to access element 156 in a list with 156 elements.
         [[{{node while_21/body/_1/TensorArrayV2Read_1/TensorListGetItem}}]]
         [[tacotron2/StatefulPartitionedCall/encoder/bilstm/forward_lstm/StatefulPartitionedCall]] [Op:__inference__one_step_tacotron2_406287]

Function call stack:
_one_step_tacotron2 -> _one_step_tacotron2 -> _one_step_tacotron2

Is this an error you've seen before? Unless I've made a dumb mistake, I imagine anyone following the tutorial might hit this error. Any pointers on how I can debug it? I don't mind trying to solve it myself, but I'm fairly new to TensorFlow 2.

The full log is below:

$ CUDA_VISIBLE_DEVICES=0 python examples/tacotron2/train_tacotron2.py \
>   --train-dir /media/usb0/tts/ljspeech_dump/train/ \
>   --dev-dir /media/usb0/tts/ljspeech_dump/valid/ \
>   --outdir ./examples/tacotron2/exp/train.tacotron2.v1/ \
>   --config ./examples/tacotron2/conf/tacotron2.v1.yaml \
>   --use-norm 1 \
>   --mixed_precision 0 \
>   --resume ""

2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: hop_size = 256
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: format = npy
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: tacotron2_params = {'embedding_hidden_size': 512, 'initializer_range': 0.02, 'embedding_dropout_prob': 0.1, 'n_speakers': 1, 'n_conv_encoder': 5, 'encoder_conv_filters': 512, 'encoder_conv_kernel_sizes': 5, 'encoder_conv_activation': 'relu', 'encoder_conv_dropout_rate': 0.5, 'encoder_lstm_units': 256, 'n_prenet_layers': 2, 'prenet_units': 256, 'prenet_activation': 'relu', 'prenet_dropout_rate': 0.5, 'n_lstm_decoder': 1, 'reduction_factor': 1, 'decoder_lstm_units': 1024, 'attention_dim': 128, 'attention_filters': 32, 'attention_kernel': 31, 'n_mels': 80, 'n_conv_postnet': 5, 'postnet_conv_filters': 512, 'postnet_conv_kernel_sizes': 5, 'postnet_dropout_rate': 0.1, 'attention_type': 'lsa'}
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: batch_size = 32
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: remove_short_samples = True
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: allow_cache = True
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: mel_length_threshold = 32
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: is_shuffle = True
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: use_fixed_shapes = True
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: optimizer_params = {'initial_learning_rate': 0.001, 'end_learning_rate': 1e-05, 'decay_steps': 150000, 'warmup_proportion': 0.02, 'weight_decay': 0.001}
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: train_max_steps = 200000
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: save_interval_steps = 5000
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: eval_interval_steps = 500
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: log_interval_steps = 100
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: start_schedule_teacher_forcing = 200001
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: start_ratio_value = 0.5
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: schedule_decay_steps = 50000
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: end_ratio_value = 0.0
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: num_save_intermediate_results = 1
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: train_dir = /media/usb0/tts/ljspeech_dump/train/
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: dev_dir = /media/usb0/tts/ljspeech_dump/valid/
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: use_norm = True
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: outdir = ./examples/tacotron2/exp/train.tacotron2.v1/
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: config = ./examples/tacotron2/conf/tacotron2.v1.yaml
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: resume =
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: verbose = 1
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: mixed_precision = False
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: version = 0.6.1
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: max_mel_length = 871
2020-06-27 17:02:49,796 (train_tacotron2:440) INFO: max_char_length = 188
2020-06-27 17:02:49.799192: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-06-27 17:02:49.827610: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-27 17:02:49.827904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2080 computeCapability: 7.5
coreClock: 1.815GHz coreCount: 46 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.23GiB/s
2020-06-27 17:02:49.828043: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-06-27 17:02:49.828913: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-06-27 17:02:49.829804: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-06-27 17:02:49.829957: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-06-27 17:02:49.830830: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-06-27 17:02:49.831252: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-06-27 17:02:49.831346: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2020-06-27 17:02:49.831352: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-06-27 17:02:49.831534: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-06-27 17:02:49.835216: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 4200000000 Hz
2020-06-27 17:02:49.835388: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f5318000b60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-27 17:02:49.835400: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-06-27 17:02:49.836162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-27 17:02:49.836171: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]
Model: "tacotron2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
encoder (TFTacotronEncoder)  multiple                  8218624
_________________________________________________________________
decoder_cell (TFTacotronDeco multiple                  18246402
_________________________________________________________________
post_net (TFTacotronPostnet) multiple                  5460480
_________________________________________________________________
residual_projection (Dense)  multiple                  41040
=================================================================
Total params: 31,966,546
Trainable params: 31,956,306
Non-trainable params: 10,240
_________________________________________________________________
[train]:   0%|                                                                             | 0/200000 [00:00<?, ?it/s]2020-06-27 17:03:08.327047: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 4125 of 12445
2020-06-27 17:03:18.326677: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 8227 of 12445
2020-06-27 17:03:28.325737: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 12285 of 12445
2020-06-27 17:03:28.724872: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:233] Shuffle buffer filled.
Traceback (most recent call last):
  File "examples/tacotron2/train_tacotron2.py", line 513, in <module>
    main()
  File "examples/tacotron2/train_tacotron2.py", line 503, in main
    trainer.fit(train_dataset,
  File "examples/tacotron2/train_tacotron2.py", line 343, in fit
    self.run()
  File "/home/caleb/repos/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 72, in run
    self._train_epoch()
  File "/home/caleb/repos/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 94, in _train_epoch
    self._train_step(batch)
  File "examples/tacotron2/train_tacotron2.py", line 116, in _train_step
    self._one_step_tacotron2(charactor, char_length, mel, mel_length, guided_attention)
  File "/home/caleb/repos/TensorflowTTS/venv/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/home/caleb/repos/TensorflowTTS/venv/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 644, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/caleb/repos/TensorflowTTS/venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/caleb/repos/TensorflowTTS/venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1661, in _filtered_call
    return self._call_flat(
  File "/home/caleb/repos/TensorflowTTS/venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1745, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/home/caleb/repos/TensorflowTTS/venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 593, in call
    outputs = execute.execute(
  File "/home/caleb/repos/TensorflowTTS/venv/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError:  [_Derived_]  Trying to access element 156 in a list with 156 elements.
         [[{{node while_21/body/_1/TensorArrayV2Read_1/TensorListGetItem}}]]
         [[tacotron2/StatefulPartitionedCall/encoder/bilstm/forward_lstm/StatefulPartitionedCall]] [Op:__inference__one_step_tacotron2_406287]

Function call stack:
_one_step_tacotron2 -> _one_step_tacotron2 -> _one_step_tacotron2

[train]:   0%|                                                                             | 0/200000 [00:36<?, ?it/s]

FastSpeech2 implementation

Hi, I just quickly implemented FastSpeech2 to check the contribution of the F0 and energy embeddings. See PR #45 for details.

Can't train MelGAN-STFT discriminator

I have been training MelGAN-STFT by fine-tuning the LJSpeech model. When it gets to discriminator_train_start_steps, it stops and tells me to restart. When I restart with the discriminator enabled at fewer steps than the latest checkpoint (210k vs 220k), I get this:

ValueError: in user code:

    <ipython-input-13-8bd0e2e9cdea>:110 _one_step_generator  *
        p_hat = self.discriminator(y_hat)
    /content/TensorflowTTS/tensorflow_tts/models/melgan.py:441 call  *
        outs += [f(x)]
    /content/TensorflowTTS/tensorflow_tts/models/melgan.py:386 call  *
        x = f(x)
    /content/TensorflowTTS/tensorflow_tts/utils/group_conv.py:283 call  *
        self._convolution_op = nn_ops.Convolution(
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py:1063 __init__  **
        filter_shape[num_spatial_dims]))

    ValueError: number of input channels does not match corresponding dimension of filter, 16 != 4

Also, all my predictions have heavy metallic noise. I'm assuming this is due to the lack of discriminator training.

MelNet & other dataset than ljspeech

Hello,
thank you for this excellent and very intuitive implementation!

I am not familiar with TTS research, so my questions may be quite naive. :)

  1. Do you also plan to implement MelNet? The audio results provided in the paper overview are quite impressive.
  2. Is there any chance that in the long run you will train models with datasets other than LJSpeech? This implementation is great, but the commercial applications (Google, Microsoft) have models trained on much better datasets.

P.S.: As I said, I am not familiar with TTS research, but there are plenty of excellent readers on LibriVox, all of them in the public domain. I have done plenty of audio-text matching tasks with the aeneas library in the past. Would it be enough to emulate the structure of the LJSpeech dataset? With the quantity and length of samples on LibriVox, I could probably create a dataset with 30-50 single-speaker hours of samples...

Once again, thanks for your great work!

Extract duration from tacotron2 model

This is a question rather than a bug report. Following the tutorial for training FastSpeech2, we have to extract durations from the alignments of a Tacotron-2 model (the get_duration_from_alignment function in extract_duration.py).
I just want to know what exactly the term "duration" means here. Can anyone help me figure out this definition?
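
For context: in this pipeline, a token's "duration" is the number of mel-spectrogram frames that the Tacotron-2 attention assigns to that input token, so the durations over a sentence sum to the total frame count. A minimal sketch of the idea (hypothetical, not the repo's exact get_duration_from_alignment):

import numpy as np

def durations_from_alignment(alignment: np.ndarray) -> np.ndarray:
    """alignment: [num_decoder_steps, num_input_tokens] attention weights."""
    # Assign each decoder step (one mel frame) to its most-attended token;
    # a token's duration is how many frames were assigned to it.
    best_token_per_frame = alignment.argmax(axis=1)  # [num_decoder_steps]
    return np.bincount(best_token_per_frame, minlength=alignment.shape[1])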

Fine-tuning does not work

Hi,
I am trying to fine-tune the LJSpeech pretrained Tacotron-2 model on a custom English voice dataset. I made the changes described for rebuilding with a new embedding layer, but the model fails to build the second time. Since my dataset is also English, just with a different voice, I used the same vocab_size.
This is the change made in the fine-tuning code with respect to the training code:
pretrained_config = Tacotron2Config(**config["tacotron2_params"])
tacotron2 = TFTacotron2(pretrained_config, training=True, name='tacotron2')
tacotron2._build()
tacotron2.summary()
tacotron2.load_weights(path)

pretrained_config.vocab_size = len(symbols)
new_embedding_layers = TFTacotronEmbeddings(pretrained_config, name='embeddings')
tacotron2.encoder.embeddings = new_embedding_layers

# re-build model
tacotron2._build()  # BREAKS HERE
tacotron2.summary()

Error:
Total params: 31,966,546
Trainable params: 31,956,306
Non-trainable params: 10,240

2020-06-22 10:26:33.976375: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at resource_variable_ops.cc:637 : Not found: Resource localhost/encoder/embeddings/character_embeddings/weight_147/N10tensorflow3VarE does not exist.
Traceback (most recent call last):
  File "examples/tacotron2/finetune_tacotron.py", line 518, in <module>
    main()
  File "examples/tacotron2/finetune_tacotron.py", line 471, in main
    tacotron2._build()
  File "/anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow_tts/models/tacotron2.py", line 677, in _build
    self(input_ids, input_lengths, speaker_ids, mel_outputs, mel_lengths, 10, training=True)
  File "/anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 968, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 611, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "/anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1665, in _filtered_call
    self.captured_inputs)
  File "/anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1746, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 598, in call
    ctx=ctx)
  File "/anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.FailedPreconditionError: 2 root error(s) found.
  (0) Failed precondition: Error while reading resource variable encoder/embeddings/LayerNorm/beta_164 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/encoder/embeddings/LayerNorm/beta_164/N10tensorflow3VarE does not exist.
         [[node encoder/embeddings/LayerNorm/batchnorm/ReadVariableOp (defined at /anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow_tts/models/tacotron2.py:153) ]]
         [[decoder/while/body/_1/decoder_cell/assert_positive/assert_less/Assert/AssertGuard/pivot_f/_289/_53]]
  (1) Failed precondition: Error while reading resource variable encoder/embeddings/LayerNorm/beta_164 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/encoder/embeddings/LayerNorm/beta_164/N10tensorflow3VarE does not exist.
         [[node encoder/embeddings/LayerNorm/batchnorm/ReadVariableOp (defined at /anaconda/envs/py37_tensorflow/lib/python3.7/site-packages/tensorflow_tts/models/tacotron2.py:153) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_call_8492]

Function call stack:
call -> call

Since my end goal is just to change the voice, I initially tried to continue training from the pretrained checkpoint with my dataset, since this worked for me with NVIDIA's PyTorch Tacotron-2 implementation. But here the model's voice doesn't change even as quality improves, which is why I decided to redo the embedding-layer fine-tuning.
Should I train it from scratch?
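
One hypothetical workaround (not the repo's official recipe): rather than swapping the embedding layer inside an already-traced model, build a fresh model with the new vocab_size and copy over every pretrained weight whose name and shape match. This reuses the variable names from the snippet above:

new_config = Tacotron2Config(**config["tacotron2_params"])
new_config.vocab_size = len(symbols)  # size of the new vocabulary

new_tacotron2 = TFTacotron2(new_config, training=True, name="tacotron2")
new_tacotron2._build()

pretrained = {w.name: w for w in tacotron2.weights}
for w in new_tacotron2.weights:
    source = pretrained.get(w.name)
    if source is not None and source.shape == w.shape:
        w.assign(source)  # mismatched shapes (the embedding) keep their fresh init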

I want to create an organization repo and enroll more members :D

After some time as an open-source project, I see a lot of things to do, such as supporting more model families (flow/glow, GAN), TensorRT, ... but 1 or 2 people may not be enough for all of that. I am thinking about creating an organization repo and enrolling more members to develop with us. In the future, I want to change the repo name to TensorSpeech and support other speech-related problems like voice conversion and speech recognition. Does anyone want to join? :D

Error

I ran into an error when running

tensorflow-tts-preprocess --rootdir ./datasets/ --outdir ./dump/ --conf preprocess/ljspeech_preprocess.yaml

The error is

Traceback (most recent call last):
  File "/home/trunk/miniconda3/envs/tts_env/lib/python3.8/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/trunk/miniconda3/envs/tts_env/lib/python3.8/site-packages/multiprocess/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/trunk/miniconda3/envs/tts_env/lib/python3.8/site-packages/pathos/helpers/mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
  File "/home/trunk/miniconda3/envs/tts_env/lib/python3.8/site-packages/tensorflow_tts/bin/preprocess.py", line 215, in save_to_file
    np.save(os.path.join(args.outdir, subdir, "wavs", f"{utt_id}-wave.npy"),
  File "<__array_function__ internals>", line 5, in save
  File "/home/trunk/miniconda3/envs/tts_env/lib/python3.8/site-packages/numpy/lib/npyio.py", line 541, in save
    fid = open(file, "wb")
FileNotFoundError: [Errno 2] No such file or directory: './dump/train/wavs/LJ001-0001-wave.npy'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/trunk/miniconda3/envs/tts_env/bin/tensorflow-tts-preprocess", line 8, in <module>
    sys.exit(main())
  File "/home/trunk/miniconda3/envs/tts_env/lib/python3.8/site-packages/tensorflow_tts/bin/preprocess.py", line 228, in main
    p.map(save_to_file, range(len(processor.items)))
  File "/home/trunk/miniconda3/envs/tts_env/lib/python3.8/site-packages/pathos/multiprocessing.py", line 137, in map
[Preprocessing]:   0%|                                | 0/13100 [00:04<?, ?it/s]
    return _pool.map(star(f), zip(*args)) # chunksize
  File "/home/trunk/miniconda3/envs/tts_env/lib/python3.8/site-packages/multiprocess/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/trunk/miniconda3/envs/tts_env/lib/python3.8/site-packages/multiprocess/pool.py", line 768, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: './dump/train/wavs/LJ001-0001-wave.npy'

I am running Manjaro Linux 20.3 XFCE.
My folder structure is

β”œβ”€β”€ datasets
β”‚Β Β  β”œβ”€β”€ LJSpeech-1.1
β”‚Β Β  β”œβ”€β”€ metadata.csv
β”‚Β Β  β”œβ”€β”€ README
β”‚Β Β  └── wavs
β”œβ”€β”€ dump
β”‚Β Β  β”œβ”€β”€ train_utt_ids.npy
β”‚Β Β  └── valid_utt_ids.npy
β”œβ”€β”€ examples
β”‚Β Β  β”œβ”€β”€ fastspeech
β”‚Β Β  β”œβ”€β”€ melgan
β”‚Β Β  β”œβ”€β”€ melgan.stft
β”‚Β Β  β”œβ”€β”€ multiband_melgan
β”‚Β Β  └── tacotron2
β”œβ”€β”€ google9b8578adaee731be.html
β”œβ”€β”€ LICENSE
β”œβ”€β”€ notebooks
β”‚Β Β  └── tacotron2_inference.ipynb
β”œβ”€β”€ preprocess
β”‚Β Β  └── ljspeech_preprocess.yaml
β”œβ”€β”€ README.md
β”œβ”€β”€ setup.cfg
β”œβ”€β”€ setup.py
β”œβ”€β”€ tensorflow_tts
β”‚Β Β  β”œβ”€β”€ bin
β”‚Β Β  β”œβ”€β”€ configs
β”‚Β Β  β”œβ”€β”€ datasets
β”‚Β Β  β”œβ”€β”€ __init__.py
β”‚Β Β  β”œβ”€β”€ losses
β”‚Β Β  β”œβ”€β”€ models
β”‚Β Β  β”œβ”€β”€ optimizers
β”‚Β Β  β”œβ”€β”€ processor
β”‚Β Β  β”œβ”€β”€ trainers
β”‚Β Β  └── utils
β”œβ”€β”€ test
β”‚Β Β  β”œβ”€β”€ test_fastspeech.py
β”‚Β Β  β”œβ”€β”€ test_mb_melgan.py
β”‚Β Β  β”œβ”€β”€ test_melgan_layers.py
β”‚Β Β  β”œβ”€β”€ test_melgan.py
β”‚Β Β  └── test_tacotron2.py
└── tts
β”œβ”€β”€ bin
β”œβ”€β”€ include
β”œβ”€β”€ lib
β”œβ”€β”€ lib64 -> lib
└── pyvenv.cfg
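
Note that dump/ contains only the two utt-id files; the train/wavs/ directory from the error path was never created. One possible workaround, assuming the preprocessor expects its output folders to already exist, is to create them up front. Only "wavs" is confirmed by the error message; the other names are guesses and should be checked against tensorflow_tts/bin/preprocess.py:

import os

for split in ("train", "valid"):
    for sub in ("wavs", "raw-feats", "ids"):  # "raw-feats" and "ids" are assumptions
        os.makedirs(os.path.join("./dump", split, sub), exist_ok=True)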

An error was encountered during data preprocessing

[root@localhost TensorflowTTS]# tensorflow-tts-normalize --rootdir ./dump --outdir ./dump --stats ./dump/stats.npy --config preprocess/ljspeech_preprocess.yaml
2020-06-10 11:07:41.220189: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-06-10 11:07:41.472070: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-06-10 11:07:41.472150: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: localhost.localdomain
2020-06-10 11:07:41.472174: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: localhost.localdomain
2020-06-10 11:07:41.472313: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 418.39.0
2020-06-10 11:07:41.472366: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 418.39.0
2020-06-10 11:07:41.472382: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 418.39.0
2020-06-10 11:07:41.473387: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-06-10 11:07:41.489617: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2100000000 Hz
2020-06-10 11:07:41.492554: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55abc9916ef0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-10 11:07:41.492595: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/ysj/bin/tensorflow-tts-normalize", line 8, in <module>
    sys.exit(main())
  File "/usr/local/anaconda3/envs/ysj/lib/python3.7/site-packages/tensorflow_tts/bin/normalize.py", line 107, in main
    mel = scaler.transform(mel)
  File "/usr/local/anaconda3/envs/ysj/lib/python3.7/site-packages/sklearn/preprocessing/_data.py", line 794, in transform
    force_all_finite='allow-nan')
  File "/usr/local/anaconda3/envs/ysj/lib/python3.7/site-packages/sklearn/base.py", line 436, in _validate_data
    self._check_n_features(X, reset=reset)
  File "/usr/local/anaconda3/envs/ysj/lib/python3.7/site-packages/sklearn/base.py", line 373, in _check_n_features
    "The reset parameter is False but there is no "
RuntimeError: The reset parameter is False but there is no n_features_in_
attribute. Is this estimator fitted?
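
This usually means the StandardScaler in normalize.py was restored from stats.npy without the attribute that newer scikit-learn versions check inside transform(). A hedged sketch of the typical fix, assuming stats.npy stores the per-mel-bin mean in row 0 and the scale in row 1, as the preprocessing step saves them:

import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
stats = np.load("./dump/stats.npy")
scaler.mean_, scaler.scale_ = stats[0], stats[1]
# Newer scikit-learn validates this attribute inside transform(); setting it
# explicitly lets the restored scaler pass the "is this estimator fitted?" check.
scaler.n_features_in_ = scaler.mean_.shape[0]

mel = np.random.randn(120, scaler.n_features_in_).astype(np.float32)  # dummy mel
mel_norm = scaler.transform(mel)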

Tacotron2 end2end sample

Could you provide an end-to-end sample showing how to use the pretrained Tacotron-2 model? I tried to read the source code but couldn't figure out what to feed to each parameter.

Thanks!
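
Until an official notebook lands, here is a minimal sketch of end-to-end inference based on the repo's conventions; the pretrained-model names follow the inference helpers that later versions of the repo expose and may differ from your local checkpoints:

import tensorflow as tf
from tensorflow_tts.inference import AutoProcessor, TFAutoModel

# Model names are illustrative; see the README for the current list.
processor = AutoProcessor.from_pretrained("tensorspeech/tts-tacotron2-ljspeech-en")
tacotron2 = TFAutoModel.from_pretrained("tensorspeech/tts-tacotron2-ljspeech-en")
melgan = TFAutoModel.from_pretrained("tensorspeech/tts-melgan-ljspeech-en")

input_ids = processor.text_to_sequence("Hello, this is a test.")
decoder_output, mel_outputs, stop_token_prediction, alignment_history = tacotron2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    input_lengths=tf.convert_to_tensor([len(input_ids)], tf.int32),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
)
audio = melgan.inference(mel_outputs)[0, :, 0]  # waveform samples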

tensorflow-tts-normalize: "UnboundLocalError: local variable 'subdir' referenced before assignment"

I've formatted my dataset like the LJSpeech one in the README so I can skip writing a dataloader for fine-tuning.
This is my directory:
[screenshot]
And this is my metadata.csv. I made it fileid|transcription|transcription because ljspeech.py contains text = parts[2], which was giving me index-out-of-range errors with just fileid|transcription:
[screenshot]
And this is a small portion of os.listdir("wavs"):

file0816.wav
file0039.wav
file2292.wav
file2433.wav
file0794.wav
file1314.wav
file2486.wav
file0695.wav
file2564.wav

All the preprocessing steps run fine until the normalization one:

Traceback (most recent call last):
  File "/usr/local/bin/tensorflow-tts-normalize", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/bin/normalize.py", line 115, in main
    np.save(os.path.join(args.outdir, subdir, "norm-feats", f"{utt_id}-norm-feats.npy"),
UnboundLocalError: local variable 'subdir' referenced before assignment

Am I doing something wrong?
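
An illustrative reconstruction of the failing pattern in normalize.py (not the exact repo code): subdir is only assigned when the utterance id appears in one of the two id lists, so an id in neither list, e.g. from a stale train_utt_ids.npy/valid_utt_ids.npy left over from an earlier run, leaves it unbound:

def get_subdir(utt_id, train_utt_ids, valid_utt_ids):
    # subdir is assigned only inside these branches...
    if utt_id in train_utt_ids:
        subdir = "train"
    elif utt_id in valid_utt_ids:
        subdir = "valid"
    # ...so an utt_id in neither list raises UnboundLocalError here.
    return subdir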

Some questions

  1. Are the mel outputs generated compatible with kan-bayashi's ParallelWaveGAN?
  2. There's a FastSpeech synthesis example, but not a Tacotron-2 one. How can I generate speech with the pretrained Tacotron-2 model and MelGAN-STFT?
