
flowtron's Introduction

Flowtron

Flowtron: an Autoregressive Flow-based Network for Text-to-Mel-spectrogram Synthesis

Rafael Valle, Kevin Shih, Ryan Prenger and Bryan Catanzaro

In our recent paper we propose Flowtron: an autoregressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer. Flowtron borrows insights from Autoregressive Flows and revamps Tacotron in order to provide high-quality and expressive mel-spectrogram synthesis. Flowtron is optimized by maximizing the likelihood of the training data, which makes training simple and stable. Flowtron learns an invertible mapping of data to a latent space that can be manipulated to control many aspects of speech synthesis (pitch, tone, speech rate, cadence, accent).

Our mean opinion scores (MOS) show that Flowtron matches state-of-the-art TTS models in terms of speech quality. In addition, we provide results on control of speech variation, interpolation between samples and style transfer between speakers seen and unseen during training.

Visit our website for audio samples.

Pre-requisites

  1. NVIDIA GPU + CUDA + cuDNN

Setup

  1. Clone this repo: git clone https://github.com/NVIDIA/flowtron.git
  2. CD into this repo: cd flowtron
  3. Initialize submodules: git submodule update --init; cd tacotron2; git submodule update --init
  4. Install PyTorch
  5. Install python requirements or build docker image
    • Install python requirements: pip install -r requirements.txt

Training from scratch

  1. Update the filelists inside the filelists folder to point to your data
  2. Train using the attention prior and the alignment loss (CTC loss) until the attention looks good: python train.py -c config.json -p train_config.output_directory=outdir data_config.use_attn_prior=1
  3. Once the alignments have stabilized, resume training without the attention prior: python train.py -c config.json -p train_config.output_directory=outdir data_config.use_attn_prior=0 train_config.checkpoint_path=model_niters
  4. (OPTIONAL) If the gate layer is overfitting once training is done, train just the gate layer from scratch: python train.py -c config.json -p train_config.output_directory=outdir train_config.checkpoint_path=model_niters data_config.use_attn_prior=0 train_config.ignore_layers='["flows.1.ar_step.gate_layer.linear_layer.weight","flows.1.ar_step.gate_layer.linear_layer.bias"]' train_config.finetune_layers='["flows.1.ar_step.gate_layer.linear_layer.weight","flows.1.ar_step.gate_layer.linear_layer.bias"]'
  5. (OPTIONAL) tensorboard --logdir=outdir/logdir

Training using a pre-trained model

Training using a pre-trained model can lead to faster convergence. Dataset-dependent layers can be ignored.

  1. Download our published Flowtron LJS, Flowtron LibriTTS or Flowtron LibriTTS2K model
  2. python train.py -c config.json -p train_config.ignore_layers=["speaker_embedding.weight"] train_config.checkpoint_path="models/flowtron_ljs.pt"

Fine-tuning for few-shot speech synthesis

  1. Download our published Flowtron LibriTTS2K model
  2. python train.py -c config.json -p train_config.finetune_layers=["speaker_embedding.weight"] train_config.checkpoint_path="models/flowtron_libritts2k.pt"

Multi-GPU (distributed) and Automatic Mixed Precision Training (AMP)

  1. python -m torch.distributed.launch --use_env --nproc_per_node=NUM_GPUS_YOU_HAVE train.py -c config.json -p train_config.output_directory=outdir train_config.fp16=true

Inference demo

Disable the attention prior and run inference:

  1. python inference.py -c config.json -f models/flowtron_ljs.pt -w models/waveglow_256channels_v4.pt -t "It is well known that deep generative models have a rich latent space!" -i 0

Related repos

WaveGlow: a faster-than-real-time flow-based generative network for speech synthesis

Acknowledgements

This implementation uses code from the following repos: Keith Ito, Prem Seetharaman and Liyuan Liu as described in our code.

flowtron's People

Contributors

artemg, ilya16, rafaelvalle, shutovilya, wietsedv


flowtron's Issues

Cannot change speaker for interpolation

Hello,

I am trying to interpolate between two speakers. I am using the model pretrained on LibriTTS.

I have read the issue "How is interpolation between speakers performed?" #33 but I still cannot manage to make it work.

Here are the steps I have followed:

  • gate_threshold = 1 (as mentioned in #33)
  • set `dummy_speaker_embedding = True` in config.json, since the paper states "For the experiment without speaker embeddings we interpolate between Sally and Helen using the phrase 'We are testing this model.'"
  • I have removed the seeds torch.manual_seed(seed) and torch.cuda.manual_seed(seed) from inference.py
  • z_1 ∼ N(0, 0.5) (as in paper)
  • z_2 ∼ N(0, 0.5) (as in paper)
  • interpolation
  • reset gate_threshold = 0.5
  • model.infer
  • waveglow.infer

But even when sampling z_1 and z_2 multiple times (and z_1 and z_2 do have different values), after generating the spectrogram with the pretrained Flowtron and the audio with the pretrained WaveGlow, the speaker sounds the same; only the audio quality seems to vary.
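
For reference, here is a minimal sketch of the procedure above (sigma, the residual shape and the model.infer signature follow inference.py; everything else, including n_frames, is an assumption):

import torch

# Assumes `model` (Flowtron), `speaker_vecs` and the encoded `text` are
# prepared exactly as in inference.py.
sigma, n_frames = 0.5, 400
z_1 = torch.cuda.FloatTensor(1, 80, n_frames).normal_() * sigma  # z_1 ~ N(0, 0.5)
z_2 = torch.cuda.FloatTensor(1, 80, n_frames).normal_() * sigma  # z_2 ~ N(0, 0.5)

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    z = (1 - alpha) * z_1 + alpha * z_2  # linear interpolation in latent space
    mels, attentions = model.infer(z, speaker_vecs, text, gate_threshold=0.5)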

  • Could you tell me which of the above steps I have done wrong or if I have forgotten any steps?
  • Once I have found z_1 and z_2 that I want to interpolate, do I have to reset gate_threshold = 0.5 before interpolation?
  • Why did we have to set gate_threshold = 1 in the first place when looking for z_1 and z_2?

Thanks

Batch size?

As stated in the paper, 8 GPUs were used for training the models. Since the batch size in the config is set to 1, this means the effective batch size for each gradient step is 8, right? So when training on a single V100 GPU, is it recommended to use a batch size of 8 (via gradient accumulation, since it does not fit in memory)?
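
For concreteness, here is a hypothetical gradient-accumulation loop; none of these names come from train.py, it only illustrates approximating a batch of 8 with batch_size=1:

# `model`, `criterion`, `optimizer` and `train_loader` stand in for the
# objects train.py builds; accumulate 8 single-sample gradients per step.
accum_steps = 8
optimizer.zero_grad()
for i, batch in enumerate(train_loader):
    loss = criterion(model(*batch)) / accum_steps  # scale so gradients match a batch of 8
    loss.backward()
    if (i + 1) % accum_steps == 0:  # step once per 8 accumulated samples
        optimizer.step()
        optimizer.zero_grad()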

RuntimeError: CUDA error: device-side assert triggered

I'm using "train-clean-100" from LibriTTS to train it.
I also changed the sampling rate to 24000.

❯ python train.py -c config.json -p train_config.output_directory=outdir
train_config.output_directory=outdir
output_directory=outdir
{'train_config': {'output_directory': 'outdir', 'epochs': 10000000, 'learning_rate': 0.0001, 'weight_decay': 1e-06, 'sigma': 1.0, 'iters_per_checkpoint': 5000, 'batch_size': 1, 'seed': 1234, 'checkpoint_path': '', 'ignore_layers': [], 'include_layers': ['speaker', 'encoder', 'embedding'], 'warmstart_checkpoint_path': '', 'with_tensorboard': True, 'fp16_run': False}, 'data_config': {'training_files': 'filelists/libritts_train_clean_100_audiopath_text_sid_shorterthan10s_atleast5min_train_filelist.txt', 'validation_files': 'filelists/libritts_train_clean_100_audiopath_text_sid_atleast5min_val_filelist.txt', 'text_cleaners': ['flowtron_cleaners'], 'p_arpabet': 0.5, 'cmudict_path': 'data/cmudict_dictionary', 'sampling_rate': 24000, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'mel_fmin': 0.0, 'mel_fmax': 8000.0, 'max_wav_value': 32768.0}, 'dist_config': {'dist_backend': 'nccl', 'dist_url': 'tcp://localhost:54321'}, 'model_config': {'n_speakers': 1, 'n_speaker_dim': 128, 'n_text': 185, 'n_text_dim': 512, 'n_flows': 2, 'n_mel_channels': 80, 'n_attn_channels': 640, 'n_hidden': 1024, 'n_lstm_layers': 2, 'mel_encoder_n_hidden': 512, 'n_components': 0, 'mean_scale': 0.0, 'fixed_gaussian': True, 'dummy_speaker_embedding': False, 'use_gate_layer': True}}

got rank 0 and world size 1 ...
Flowtron(
  (speaker_embedding): Embedding(1, 128)
  (embedding): Embedding(185, 512)
  (flows): ModuleList(
    (0): AR_Step(
      (conv): Conv1d(1024, 160, kernel_size=(1,), stride=(1,))
      (lstm): LSTM(1664, 1024, num_layers=2)
      (attention_lstm): LSTM(80, 1024)
      (attention_layer): Attention(
        (softmax): Softmax(dim=2)
        (query): LinearNorm(
          (linear_layer): Linear(in_features=1024, out_features=640, bias=False)
        )
        (key): LinearNorm(
          (linear_layer): Linear(in_features=640, out_features=640, bias=False)
        )
        (value): LinearNorm(
          (linear_layer): Linear(in_features=640, out_features=640, bias=False)
        )
        (v): LinearNorm(
          (linear_layer): Linear(in_features=640, out_features=1, bias=False)
        )
      )
      (dense_layer): DenseLayer(
        (layers): ModuleList(
          (0): LinearNorm(
            (linear_layer): Linear(in_features=1024, out_features=1024, bias=True)
          )
          (1): LinearNorm(
            (linear_layer): Linear(in_features=1024, out_features=1024, bias=True)
          )
        )
      )
    )
    (1): AR_Back_Step(
      (ar_step): AR_Step(
        (conv): Conv1d(1024, 160, kernel_size=(1,), stride=(1,))
        (lstm): LSTM(1664, 1024, num_layers=2)
        (attention_lstm): LSTM(80, 1024)
        (attention_layer): Attention(
          (softmax): Softmax(dim=2)
          (query): LinearNorm(
            (linear_layer): Linear(in_features=1024, out_features=640, bias=False)
          )
          (key): LinearNorm(
            (linear_layer): Linear(in_features=640, out_features=640, bias=False)
          )
          (value): LinearNorm(
            (linear_layer): Linear(in_features=640, out_features=640, bias=False)
          )
          (v): LinearNorm(
            (linear_layer): Linear(in_features=640, out_features=1, bias=False)
          )
        )
        (dense_layer): DenseLayer(
          (layers): ModuleList(
            (0): LinearNorm(
              (linear_layer): Linear(in_features=1024, out_features=1024, bias=True)
            )
            (1): LinearNorm(
              (linear_layer): Linear(in_features=1024, out_features=1024, bias=True)
            )
          )
        )
        (gate_layer): LinearNorm(
          (linear_layer): Linear(in_features=1664, out_features=1, bias=True)
        )
      )
    )
  )
  (encoder): Encoder(
    (convolutions): ModuleList(
      (0): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      )
      (1): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      )
      (2): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      )
    )
    (lstm): LSTM(512, 256, batch_first=True, bidirectional=True)
  )
)
Number of speakers : 123
output directory outdir
Epoch: 0
C:\AI_Research_Project\flowtron\data.py:40: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ..\torch\csrc\utils\tensor_numpy.cpp:141.)
  return torch.from_numpy(data).float(), sampling_rate
C:/cb/pytorch_1000000000000/work/aten/src/THC/THCTensorIndex.cu:218: block: [0,0,0], thread: [0,0,0] Assertion srcIndex < srcSelectDimSize failed.
[the same assertion is repeated for threads [1,0,0] through [127,0,0]; omitted here]
Traceback (most recent call last):
  File "train.py", line 300, in
    train(n_gpus, rank, **train_config)
  File "train.py", line 225, in train
    mel, speaker_vecs, text, in_lens, out_lens)
  File "C:\Users\vladc\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\AI_Research_Project\flowtron\flowtron.py", line 577, in forward
    text = self.encoder(text, in_lens)
  File "C:\Users\vladc\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\AI_Research_Project\flowtron\flowtron.py", line 322, in forward
    x = F.dropout(F.relu(conv(x)), 0.5, self.training)
  File "C:\Users\vladc\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\vladc\anaconda3\lib\site-packages\torch\nn\modules\container.py", line 117, in forward
    input = module(input)
  File "C:\Users\vladc\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\vladc\anaconda3\lib\site-packages\torch\nn\modules\instancenorm.py", line 57, in forward
    self.training or not self.track_running_stats, self.momentum, self.eps)
  File "C:\Users\vladc\anaconda3\lib\site-packages\torch\nn\functional.py", line 2038, in instance_norm
    use_input_stats, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA error: device-side assert triggered
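
A general debugging note for this class of failure (standard CUDA practice, not specific to this repo): device-side asserts surface asynchronously, so the Python traceback above points at a later op than the one that actually failed. Forcing synchronous launches localizes the real call site:

import os

# Must be set before the first CUDA call; with synchronous launches the
# device-side assert is raised at the op that triggered it (here most
# likely an embedding/index lookup with an out-of-range index).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"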

Style transfer COLAB

I made a notebook for style transfer with Flowtron (see #9); you can try the Colab here.

You can clone/fork this one here if you have local resources.

The improvement compared to the gist I made available earlier is that it takes advantage of GPU acceleration. The code is also somewhat easier for beginners to adapt and run.

I hope this proves useful for your research/work. If you find it useful, please star the repo linked above.
I am also open to any discussion about more robust style transfer.

Can n_frames_per_step only be 1?

I tried to change n_frames_per_step from 1 to 4 and found that the shape of log_s no longer matches the shape of mel, so the computation "mel * log_s + bias" cannot run.
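
A toy illustration of the mismatch being described (the exact shapes and the affine form are assumptions for illustration, not taken from flowtron.py):

import torch

# Hypothetical shapes: if the decoder emits one (log_s, bias) pair per step
# while each step now covers 4 mel frames, the affine transform cannot broadcast.
n_mel_channels, n_frames, n_frames_per_step = 80, 100, 4
mel = torch.randn(n_mel_channels, n_frames)                         # (80, 100)
log_s = torch.randn(n_mel_channels, n_frames // n_frames_per_step)  # (80, 25)
bias = torch.randn_like(log_s)

try:
    z = mel * log_s + bias
except RuntimeError as e:
    print(e)  # sizes 100 and 25 do not match at dimension 1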

Replication

Thanks @rafaelvalle for making the code and the pre-trained models available. Also thanks karkirowle for your code on style transfer in issue #9. I am preparing a presentation on Flowtron and wanted to replicate some of the audio files. Since the Sally model is not publicly available, I used the pre-trained LJS model, which also means the results cannot be replicated exactly. During the replication, I came across the following questions/issues:

  • In the paper it says "During inference we used sigma = 0.7" (p. 4), however in inference.py and in karkirowle's gist it is set to 0.8. Which value should I use for replication?
  • In total I performed three experiments (listen to them here and view the code here)
    • Experiment 1: the modification speech variation worked well with sigma.
    • Experiment 2: I noticed that leaving out the final '.' (dot) at the end of the sentence drastically changes the prosody and sometimes even the words. For example, in the dogs example the word 'door' is pronounced differently. In the well_known example breathing sounds suddenly occur, but more strikingly, "latent space" changes to "latence". @rafaelvalle, do you know why this is the case?
    • Experiment 3: replicates transferring the 'surprised' style of a speaker. I used the same fragment as presented on the demo page (by the way, it is not speaker 03 as stated there: speaker 03 is male, while the emotional prior on the demo site is a female speaker). Is it true you only used a single audio file as a prior? I tried to replicate it both with and without time averaging. Without time averaging, the produced speech is fuzzy (see surprised_humans_transfer_without_time_avg.wav). Applying time averaging makes the speech comprehensible, but in my opinion does not really lead to a style transfer. We can observe that the transferred speech has a longer duration than the baseline, but the spectrogram does not resemble the reference at all (e.g. compare the pitch excursions in the reference signal with the pitch contours in both the baseline and the transferred fragment in the screenshots below). @rafaelvalle, do you know why these results are so different using the LJS model instead of the Sally model?

Surprised prior: [spectrogram screenshot]

Baseline (random prior): [spectrogram screenshot]

Transferred: [spectrogram screenshot]

Thanks in advance!

P.S.: For other users who are using the script: you always need to set the seed again before each model.infer() call, otherwise it will not use that seed!
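
A minimal sketch of that re-seeding pattern (model, speaker_vecs, text, sigma and n_frames are assumed to be set up as in inference.py):

import torch

seed = 1234
for trial in range(3):
    # Re-seed before every draw: sampling advances the RNG state, so seeding
    # once at the top of the script only makes the first infer() reproducible.
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    residual = torch.cuda.FloatTensor(1, 80, n_frames).normal_() * sigma
    mels, attentions = model.infer(residual, speaker_vecs, text)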

Steps to replicate pretrained models on LibriTTS

First of all, thank you for the amazing paper and for releasing the code.

I have read the instructions and all the issues, but I can't find a single place with the steps that would allow me to faithfully replicate the training of the models you shared, in particular the Flowtron LibriTTS model.

Would it be possible to provide a detailed step by step guide to do that?
Something that would include exactly:

  • Your OS environment
  • CUDA libraries
  • seeds used for training
  • exactly how many steps the model was trained for, for each flow training.
  • Anything else that would make my training match exactly your training.

I am a big fan of easy reproducibility :)

Thanks again.

Inference problem (non-english dataset)

Hello, thanks to your help I was able to get good alignments in tensorboard:

[tensorboard screenshots: alignment plots]

But when I try inference with:
python inference.py -c config.json -f flowtron_model -w waveglow_model -t "text in portuguese." -i 1
I get bad results, like the following alignments:

[alignment plots: sid1_sigma0.5_attnlayer1, sid1_sigma0.5_attnlayer0]

It almost always produces the maximum number of frames, even if I lower the gate threshold to 0.1. The gate looks like this in tensorboard:
[tensorboard screenshot: gate outputs]

As I see it, the model repeats parts of the text to fill the maximum number of frames. Do you have any tips on how I might fix inference?

Unconditioned Flowtron

Hi,

Did anybody try to train the Flowtron flow architecture in an unconditioned manner, for density estimation for example? If so, any hints and tips you could share?

Thanks!

Accent transfer?

How was accent transfer done (the Queen's accent in the extra samples)?
It's a cool idea, but I didn't see anything about it in the paper 🗡️

Deploying model after training

Hello, I'm kind of a beginner and I was wondering if it is possible to deploy the model to run on CPU only after training it. Any help is much appreciated. Thank you!
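
It should be possible in principle. A hedged sketch of a CPU-only load (the "state_dict" key and the Flowtron constructor follow inference.py; model_config is the dictionary parsed from config.json):

import torch
from flowtron import Flowtron  # assumes the repo root is on PYTHONPATH

# map_location="cpu" remaps the checkpoint's CUDA tensors onto the CPU.
state_dict = torch.load("models/flowtron_ljs.pt", map_location="cpu")["state_dict"]
model = Flowtron(**model_config)
model.load_state_dict(state_dict)
model.eval()  # inference then runs on CPU, just slowly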

Bad attention weights

Hello

I am trying to train Flowtron on LJSpeech
Unfortunately after 24 hours of training, the attention weights still bad
Server configuration: 4 instances with 8xV100

image

image

Do you have any ideas?

Synthesis by batches

Hi! Thank you for the great work!
I am wondering if there is a way to use flowtron to synthesize mel-spectrograms (or WAV files, eventually) in batches.
Currently I can only feed one text sequence and speaker id at a time, which is quite time-consuming.

The addition of an empty dimension to text and speaker_vecs inputs at lines 65-66 of inference.py makes me think the model supports batch inputs, at least for training.

But when I feed bigger tensors to it (end-padding shorter text sequences with 0 so that I can stack them into a single tensor), I get this error:

(102 is my sequence length)

Traceback (most recent call last):
  File "inference.py", line 141, in
    args.id, args.n_frames, args.sigma, args.gate, args.seed)
  File "inference.py", line 82, in infer
    residual, speaker_vecs, text, gate_threshold=gate_threshold)
  File "/home/user/dev/flowtron/flowtron.py", line 612, in infer
    [text, speaker_vecs.expand(text.size(0), -1, -1)], 2)
RuntimeError: The expanded size of the tensor (102) must match the existing size (2) at non-singleton dimension 0. Target sizes: [102, -1, -1]. Tensor sizes: [2, 1, 128]
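
For what it's worth, a sketch of the padding described above (pad_sequence is standard PyTorch; whether flowtron's infer() accepts a batch dimension like this is exactly the open question):

import torch
from torch.nn.utils.rnn import pad_sequence

# Dummy encoded text sequences of different lengths, end-padded with 0
# so they stack into a single (batch, max_len) tensor.
texts = [torch.randint(1, 185, (n,)) for n in (102, 87, 64)]
batch = pad_sequence(texts, batch_first=True, padding_value=0)  # shape (3, 102)
in_lens = torch.tensor([t.size(0) for t in texts])  # true lengths before padding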

KeyError: 'iteration' when training using a pre-trained model

Hi,

I am getting this error whenever I try this step to train my model using the pre-trained Flowtron LJS model that you published.

$ python train.py -c config.json -p train_config.ignore_layers=["speaker_embedding.weight"] train_config.checkpoint_path="models/flowtron_ljs.pt"
/home/lim/.anaconda3/envs/flowtron/lib/python3.6/site-packages/librosa/util/decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.
Import of 'jit' requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
  from numba.decorators import jit as optional_jit
train_config.ignore_layers=[speaker_embedding.weight]
ignore_layers=[speaker_embedding.weight]
train_config.checkpoint_path=models/flowtron_ljs.pt
checkpoint_path=models/flowtron_ljs.pt
{'train_config': {'output_directory': 'outdir', 'epochs': 10000000, 'learning_rate': 0.0001, 'weight_decay': 1e-06, 'sigma': 1.0, 'iters_per_checkpoint': 5000, 'batch_size': 1, 'seed': 1234, 'checkpoint_path': 'models/flowtron_ljs.pt', 'ignore_layers': '[speaker_embedding.weight]', 'include_layers': ['speaker', 'encoder', 'embedding'], 'warmstart_checkpoint_path': '', 'with_tensorboard': True, 'fp16_run': False}, 'data_config': {'training_files': 'filelists/ljs_audiopaths_text_sid_train_filelist.txt', 'validation_files': 'filelists/ljs_audiopaths_text_sid_val_filelist.txt', 'text_cleaners': ['flowtron_cleaners'], 'p_arpabet': 0.5, 'cmudict_path': 'data/cmudict_dictionary', 'sampling_rate': 22050, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'mel_fmin': 0.0, 'mel_fmax': 8000.0, 'max_wav_value': 32768.0}, 'dist_config': {'dist_backend': 'nccl', 'dist_url': 'tcp://localhost:54321'}, 'model_config': {'n_speakers': 1, 'n_speaker_dim': 128, 'n_text': 185, 'n_text_dim': 512, 'n_flows': 2, 'n_mel_channels': 80, 'n_attn_channels': 640, 'n_hidden': 1024, 'n_lstm_layers': 2, 'mel_encoder_n_hidden': 512, 'n_components': 0, 'mean_scale': 0.0, 'fixed_gaussian': True, 'dummy_speaker_embedding': False, 'use_gate_layer': True}}
> got rank 0 and world size 1 ...
Traceback (most recent call last):
  File "train.py", line 300, in <module>
    train(n_gpus, rank, **train_config)
  File "train.py", line 190, in train
    optimizer, ignore_layers)
  File "train.py", line 107, in load_checkpoint
    iteration = checkpoint_dict['iteration']
KeyError: 'iteration'
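
One hedged workaround, assuming the published model simply carries no optimizer bookkeeping (the key names are guesses from the traceback, and the config's train_config.warmstart_checkpoint_path may be the intended route for published models):

import torch

# Tolerate checkpoints that store only weights; default the missing counter.
checkpoint_dict = torch.load("models/flowtron_ljs.pt", map_location="cpu")
iteration = checkpoint_dict.get("iteration", 0)  # avoids the KeyError above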

Of course I downloaded flowtron_ljs.pt into the models directory and followed your setup guide.

$ git clone https://github.com/NVIDIA/flowtron.git
$ cd flowtron
$ git submodule update --init; cd tacotron2; git submodule update --init

$ conda create -n flowtron python=3.6
$ conda activate flowtron
$ conda install pytorch
$ pip install -r requirements.txt
$ pip install numpy~=1.15.0   # because numba requires "numpy>=1.15"

As a result the list of my anaconda packages is:

$ conda list
# packages in environment at /home/lim/.anaconda3/envs/flowtron:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_pytorch_select           0.2                       gpu_0  
audioread                 2.1.8                    pypi_0    pypi
blas                      1.0                         mkl  
ca-certificates           2020.1.1                      0  
certifi                   2020.4.5.1               py36_0  
cffi                      1.14.0           py36he30daa8_1  
cudatoolkit               10.1.243             h6bb024c_0  
cudnn                     7.6.5                cuda10.1_0  
cycler                    0.10.0                   pypi_0    pypi
decorator                 4.4.2                    pypi_0    pypi
inflect                   0.2.5                    pypi_0    pypi
intel-openmp              2020.1                      217  
joblib                    0.15.1                   pypi_0    pypi
ld_impl_linux-64          2.33.1               h53a641e_7  
libedit                   3.1.20181209         hc058e9b_0  
libffi                    3.3                  he6710b0_1  
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
librosa                   0.6.0                    pypi_0    pypi
libstdcxx-ng              9.1.0                hdf63c60_0  
llvmlite                  0.32.1                   pypi_0    pypi
matplotlib                2.1.0                    pypi_0    pypi
mkl                       2020.1                      217  
mkl-service               2.3.0            py36he904b0f_0  
mkl_fft                   1.0.15           py36ha843d7b_0  
mkl_random                1.1.0            py36hd6b4f25_0  
ncurses                   6.2                  he6710b0_1  
ninja                     1.9.0            py36hfd86e86_0  
numba                     0.49.1                   pypi_0    pypi
numpy                     1.15.4                   pypi_0    pypi
openssl                   1.1.1g               h7b6447c_0  
pillow                    7.1.2                    pypi_0    pypi
pip                       20.0.2                   py36_3  
protobuf                  3.12.1                   pypi_0    pypi
pycparser                 2.20                       py_0  
pyparsing                 2.4.7                    pypi_0    pypi
python                    3.6.10               h7579374_2  
python-dateutil           2.8.1                    pypi_0    pypi
pytorch                   1.4.0           cuda101py36h02f0884_0  
pytz                      2020.1                   pypi_0    pypi
readline                  8.0                  h7b6447c_0  
resampy                   0.2.2                    pypi_0    pypi
scikit-learn              0.23.1                   pypi_0    pypi
scipy                     1.0.0                    pypi_0    pypi
setuptools                46.4.0                   py36_0  
six                       1.14.0                   py36_0  
sqlite                    3.31.1               h62c20be_1  
tensorboardx              2.0                      pypi_0    pypi
threadpoolctl             2.0.0                    pypi_0    pypi
tk                        8.6.8                hbc83047_0  
unidecode                 1.0.22                   pypi_0    pypi
wheel                     0.34.2                   py36_0  
xz                        5.2.5                h7b6447c_0  
zlib                      1.2.11               h7b6447c_3

Is there something I missed? Thank you.

How does one cut up a longer text so it fits into the available frames?

When running inference.py, texts that do not fit into n_frames are getting cropped from the end, so parts of the beginning are lost. It also looks like the duration of the spoken text depends on the speaker id (when using libriTTS). Increasing n_frames to get longer outputs seems to be limited by GPU memory, so it looks like one has to split the text into sentences, but I am wondering if there is any method to estimate how many frames a given string will require?

So far my best guess is to home in by trial and error, using the returned attentions:

attention = torch.cat(attentions[0]).cpu().numpy()
# if the last frame's attention peak is not at text position 0,
# assume the text did not finish within the available frames
if attention[-1].argmax() > 0:
    print("text does not fit into available frames")

KeyError: "state_dict" for every waveglow model I've tried

Hi :)

I've successfully trained a flowtron model and I'm now trying to use it for inference with inference.py. I've tried a few different waveglow versions (v5 in the waveglow repo, v3 from the NVIDIA NGC catalog), both with and without running convert_model.py on them, and I always get the same error:

Traceback (most recent call last):
  File "inference.py", line 132, in
    args.id, args.n_frames, args.sigma, args.gate, args.seed)
  File "inference.py", line 55, in infer
    state_dict = torch.load(flowtron_path, map_location='cpu')["state_dict"]
KeyError: 'state_dict'

Clearly I'm doing something wrong - I was hoping you might be able to help me out.

Thanks :)

EDIT: I think the problem has nothing to do with waveglow at all; instead it is (possibly) that I can't use a flowtron training checkpoint for inference. Is this the case?
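
If that is the case, a hedged loading sketch that copes with both layouts (the "model" key is a guess about how train.py saves checkpoints; inspect your own file to verify):

import torch

checkpoint = torch.load("outdir/model_100000", map_location="cpu")
print(checkpoint.keys())  # see what the file actually contains

if "state_dict" in checkpoint:   # published-model layout
    state_dict = checkpoint["state_dict"]
elif "model" in checkpoint:      # assumed training-checkpoint layout
    obj = checkpoint["model"]
    state_dict = obj.state_dict() if hasattr(obj, "state_dict") else obj
else:
    state_dict = checkpoint      # bare state dict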

training issues

Hi, I'm trying to train the model with LibriTTS, but I get this error. Can you help me?

/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [64,0,0] Assertion srcIndex < srcSelectDimSize failed.
[the same assertion is repeated for threads [65,0,0] through [101,0,0]; truncated here]
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [102,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [103,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [104,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [105,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [106,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [107,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [108,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [109,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [110,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [111,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [112,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [113,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [114,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [115,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [116,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [117,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [118,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [119,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [120,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [121,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [122,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [123,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [124,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [125,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [126,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [127,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [0,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [1,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [2,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [3,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [4,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [5,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [6,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [7,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [8,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [9,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [10,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [11,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [12,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [13,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [14,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [15,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [16,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [17,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [18,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [19,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [20,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [21,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [22,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [23,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [24,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [25,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [26,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [27,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [28,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [29,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [30,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [31,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [32,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [33,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [34,0,0] Assertion srcIndex < srcSelectDimSize failed.
/tmp/pip-req-build-8yht7tdu/aten/src/THC/THCTensorIndex.cu:307: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [35,0,0] Assertion srcIndex < srcSelectDimSize failed.

Traceback (most recent call last):
  File "train.py", line 300, in <module>
    train(n_gpus, rank, **train_config)
  File "train.py", line 224, in train
    z, log_s_list, gate_pred, attn, mean, log_var, prob = model(
  File "/home/developer/anaconda3/envs/flow/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/datadisk/vhos/wdfang/flowtron/flowtron.py", line 577, in forward
    text = self.encoder(text, in_lens)
  File "/home/developer/anaconda3/envs/flow/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/datadisk/vhos/wdfang/flowtron/flowtron.py", line 322, in forward
    x = F.dropout(F.relu(conv(x)), 0.5, self.training)
  File "/home/developer/anaconda3/envs/flow/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/developer/anaconda3/envs/flow/lib/python3.8/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/developer/anaconda3/envs/flow/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/datadisk/vhos/wdfang/flowtron/flowtron.py", line 120, in forward
    conv_signal = self.conv(signal)
  File "/home/developer/anaconda3/envs/flow/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/developer/anaconda3/envs/flow/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 201, in forward
    return F.conv1d(input, self.weight, self.bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR

Repetitive generation with punctuation

Hi,

Thanks for open-sourcing the code and the models. I noticed that the model performs poorly with punctuation in the text. Specifically, it repeats the text before the punctuation multiple times before proceeding further. Here is a sample -

Audio with punctuation

python inference.py -c config.json -f ~/model_snapshots/flowtron_ljs.pt -w ~/model_snapshots/waveglow.pt -t "Hey, I am just trying you out!" -i 0 -n 1000 -s 0.5

Audio without punctuation

python inference.py -c config.json -f ~/model_snapshots/flowtron_ljs.pt -w ~/model_snapshots/waveglow.pt -t "Hey I am just trying you out!" -i 0 -n 1000 -s 0.5

I would expect punctuation to give the appropriate prosody to a TTS system, and hence I would like to use it. Any suggestions on how to make it work?

How to obtain models/waveglow_256channels_v4.pt for the inference demo?

I'm trying to locate the appropriate waveglow_256channels_v4.pt for the Flowtron inference demo. I googled and found a colab notebook and tried using it:

"%%bash
wget -N -q https://raw.githubusercontent.com/yhgon/colab_utils/master/gfile.py
python gfile.py -u 'https://drive.google.com/open?id=1ZesPPyRRKloltRIuRnGZ2LIUEuMSVjkI' -f 'mellotron_libritts.pt'
python gfile.py -u 'https://drive.google.com/open?id=1Rm5rV5XaWWiUbIpg5385l5sh68z2bVOE' -f 'waveglow_256channels_v4.pt'"


Results:

python inference.py -c config.json -f models/flowtron_ljs.pt -w models/waveglow_256channels_v4.pt -t "It is well know that deep generative models have a deep latent space!" -i 0


Loaded checkpoint 'models/flowtron_ljs.pt')
Number of speakers : 1
Traceback (most recent call last):
  File "inference.py", line 122, in <module>
    args.n_frames, args.sigma, args.seed)
  File "inference.py", line 80, in infer
    audio = waveglow.infer(mels.half(), sigma=0.8).float()
  File "tacotron2/waveglow\glow.py", line 276, in infer
    output = self.WN[k]((audio_0, spect))
  File "F:\Users\Erik.DESKTOP-E5E1V83\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "tacotron2/waveglow\glow.py", line 161, in forward
    self.cond_layers[i],
  File "F:\Users\Erik.DESKTOP-E5E1V83\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 594, in __getattr__
    type(self).__name__, name))
AttributeError: 'WN' object has no attribute 'cond_layers'

Speaker id argument

There is a Speaker id argument in inference.py : parser.add_argument('-i', '--id', help='Speaker id', type=int).

Whenever I try to change it to something other than 0, I get the following error:

Traceback (most recent call last):
  File "inference.py", line 122, in <module>
    args.n_frames, args.sigma, args.seed)
  File "inference.py", line 63, in infer
    speaker_vecs = trainset.get_speaker_id(speaker_id).cuda()
  File "/data/code/flowtron/data.py", line 83, in get_speaker_id
    return torch.LongTensor([self.speaker_ids[int(speaker_id)]])
KeyError: 2
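
A plausible reading of the traceback (a hypothetical reconstruction, not the repo's exact code): the dataset builds its speaker lookup only from the speaker ids that occur in the training filelist, so any -i value absent from that filelist raises a KeyError. A minimal sketch:

# Hypothetical sketch of the lookup behind data.py's get_speaker_id.
with open("filelists/ljs_audiopaths_text_sid_train_filelist.txt") as f:
    raw_ids = sorted({int(line.strip().split("|")[2]) for line in f})
speaker_ids = {sid: idx for idx, sid in enumerate(raw_ids)}
# The LJS filelist contains only speaker 0, hence the KeyError: 2 above.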

WaveGlow v4 model

The inference demo instructions mention waveglow_256channels_v4.pt, but the latest published WaveGlow model on https://ngc.nvidia.com/catalog/models/nvidia:waveglow_ljs_256channels is waveglow_256channels_ljs_v3.pt.

Will v4 be published, and are there any relevant differences compared to v3 with respect to Flowtron? It seems that the WaveGlow source code has changed in the meantime, since I get these warnings:

not relevant

These warnings are gone when I use PyTorch 1.0 instead of 1.5, as is stated in both the WaveGlow and Tacotron2 repos.

Why doesn't the autoregressive flow need to be inverted?

Hi, thanks for your useful code. I wonder why, in the affine transform, we don't need to invert the autoregressive network in the backward computation. Do we reuse the autoregressive RNN in the backward pass the same way as in the forward pass?
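
For intuition, here is a minimal sketch (hypothetical helper names and shapes, not the repo's actual code) of why an affine autoregressive step never requires inverting the network itself: in the training direction every frame of x is observed, so the conditioning context is available for all timesteps at once; in the generation direction the same network is simply run sequentially, and only the trivially invertible per-frame affine map is inverted.

import torch

def forward_pass(x, ar_net):
    # Density estimation: ar_net sees the full sequence, and log_s[t], b[t]
    # depend only on frames x[:t]; the -log_s terms give the log-determinant.
    log_s, b = ar_net(x)
    z = (x - b) * torch.exp(-log_s)
    return z, log_s

def inverse_pass(z, ar_net_step):
    # Generation: x[t] is needed to condition step t+1, so the same network
    # is run step by step; only the affine map is inverted per frame, as
    # x = z * exp(log_s) + b. The RNN itself is reused, never inverted.
    xs, state = [], None
    for t in range(z.size(0)):
        (log_s_t, b_t), state = ar_net_step(xs[-1] if xs else None, state)
        xs.append(z[t] * torch.exp(log_s_t) + b_t)
    return torch.stack(xs)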

Training on a single-speaker (male) Hindi dataset - unable to attend (flow=1)

Continuing the conversation from this somewhat related issue - #39 - but opening a new issue, since my model is unable to attend even with 1 flow.

My issue is also somewhat similar to #41 - I have now trained my model for 790,000 steps. Validation loss seems to have hit a minimum at around 360k steps, at which point attention was biased; further training made attention vanish and validation loss slowly increase.

Attached below are attention plots for steps=215k, 360k, 790k; and the validation loss.
attn215k
Attn360k
attn790k
valLoss

I am wondering how to proceed. The options I'm considering are:

  1. increase flow=2 and warmstart with checkpoint 790k.
  2. increase flow=2 and warmstart with checkpoint 360k. I deleted that checkpoint to save some disk space and am now regretting it - I'll have to restart training and get to 360k steps again.
  3. Train a new tacotron2 model using my hindi dataset and warmstart flowtron using that.

@rafaelvalle Would love some advice at this point.

Also attaching some inference files below. The speech is senseless though.
sid0_sigma0 5_attnlayer0
sid0_sigma0.5.wav.zip

Training issues

Hi, I tried to train the model with only LJ data and with only my own data, with fp16 and with fp32, with 1 GPU and with 3 GPUs, but everywhere I get this:
Screenshot 2020-05-17 at 19 46 32
The loss is always NaN.
When I start from your pretrained checkpoint, the code returns this:
Screenshot 2020-05-17 at 19 52 44
I solved that by changing load_checkpoint, but the loss is still NaN.

Do you have any idea what I am doing wrong?

Attention is a flat line

Training flowtron on custom data in the following format:

/path_to/voice1/filename1.wav|Random text with punctuation.|0
/path_to/voice1/filename2.wav|Random text with punctuation.|1

Loss:
Screenshot 2020-05-21 at 21 09 04

attention:

Screenshot 2020-05-21 at 21 04 52

I am using default hparams.

The resulting audio is just random rambling with both voices.

I successfully trained a tacotron2 model on the same voices, but the transcripts were phonemized using festival / g2p prior to training.

The data I'm using is clean and high quality, over 10GB in size.

The only changes I've made to the repo code are commenting out dataformats='HWC' in the logger script and changing .byte() to .bool() in flowtron.py to make it work with my setup.

Training distributed on 2x2080ti for ~600k iterations

Can't attend in the second flow (non-English dataset)

Hi there, thanks for your excellent work and for sharing.
I'm training a non-English model from scratch, 2 speakers, both > 10h. I used the encoder and embedding layers from tacotron2, and started from n_flow=1, batch_size=1.
After 1000k steps (~25 epochs), the model can produce reasonably good audio. The loss curve and attention plot are shown below.
image
image

So I added an extra flow and warm-started from the 1-flow model. However, after 500k steps, there is only chaos in the second attention layer. The plots are shown below.
image image

In your view, should I continue the 2-flow training, or did I start it too early?

Docker support

Hey!

Docker support is mentioned in README.md, but I cannot find the Dockerfile. Could you please share this file?

Thank you

Inverted 2nd-flow alignment

I trained the 1st flow until I had decent alignment
image
then trained the second flow as described in previous issues, and after a while the alignment of the second flow came to look like this
image

which is strange to me, because I expected the same picture as in the 1st flow. Nevertheless, the sound is good. So could anybody explain to me why the alignment looks like this?

Model training too long without alignment.

Hi, I appreciate your great work with flowtron and loved the paper. I have gone through all the issues and the paper, but still have some problems getting a proprietary female voice to have good alignment. I will first list some assumptions I have, so somebody can correct me if something is wrong. (This list can also be useful to someone starting out.)

How I understand the training process should be done:

  • It's better to use a pretrained model for warmstart, e.g. flowtron_ljs.pt or a tacotron2 checkpoint, with the config.json include_layers set to ['encoder','embedding']... also don't set checkpoint_path but warmstart_checkpoint_path to the path of the model instead
  • the first training cycle should be with config.json flow=1, adding +1 for each further cycle
  • for the beginning cycles it's better to have fewer speakers and more data; later you can specialize for more speakers and less data
  • a pretrained model's dataset-dependent layers can be ignored, so set train_config.ignore_layers=["speaker_embedding.weight"]
  • one should train a model until the loss plateaus/overfits/attention looks good, then stop training, take the checkpoint, and put its path in warmstart_checkpoint_path (or just checkpoint_path?) with include_layers set to '' (nothing)
  • some suggest lowering the learning rate down to 9e-5
  • be sure to fix a few bugs: the one with byte to bool, and the one where model=warmstart(...) misses the include_layers argument and so defaults to None, which is bad.
  • when training, set sigma to 1; when inferencing, set sigma to 0.5
  • when inferencing, set gate_threshold to 0.5
  • don't forget to set config.json n_speakers to match the number of speakers in your dataset
  • don't forget to format your dataset the right way... the text file should include lines like 'path_to_wav|sentence|speaker_num' and the audio files should be 22050 Hz mono 16-bit

What I am not sure about:

  • can the second flow be trained on the same data? If so, what's the advantage of doing more flows if the first produces good alignment?
  • in how many iterations can I expect the model to produce good alignment?
  • if the loss plateaus for a long time and the alignment is not good, what should I do?
  • is validation loss representative of the alignment? And what validation loss did you achieve?
  • if you are trying to get good results on a single speaker only, is it better to use tacotron2 instead, or does flowtron add some stability with more speakers? (Please answer this one)

Now specific to my problem:

I have trained two separate models that differ in text preprocessing but are trained on the same data.

Model A

The first model (let's call it A) was warm-started on your flowtron_ljs.pt with 3 datasets (one of them is ljspeech and 2 are proprietary, about 40000 sentences all combined). I will list the config.json and the run command below. It trained up to 1,300,000 iterations over 5 days on 4x1080Ti, and it produces no alignment.

config.json:
{
  "train_config": {
    "output_directory": "outdir",
    "epochs": 10000000,
    "learning_rate": 1e-4,
    "weight_decay": 1e-6,
    "sigma": 1.0,
    "iters_per_checkpoint": 5000,
    "batch_size": 1,
    "seed": 1234,
    "checkpoint_path": "",
    "ignore_layers": [],
    "include_layers": ["encoder", "embedding"],
    "warmstart_checkpoint_path": "",
    "with_tensorboard": true,
    "fp16_run": false
  },
  "data_config": {
    "training_files": "data/processed/combined/dataset.train",
    "validation_files": "data/processed/combined/dataset.test",
    "text_cleaners": ["flowtron_cleaners"],
    "p_arpabet": 0.5,
    "cmudict_path": "data/cmudict_dictionary",
    "sampling_rate": 22050,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "mel_fmin": 0.0,
    "mel_fmax": 8000.0,
    "max_wav_value": 32768.0
  },
  "dist_config": {
    "dist_backend": "nccl",
    "dist_url": "tcp://localhost:54321"
  },
  "model_config": {
    "n_speakers": 3,
    "n_speaker_dim": 128,
    "n_text": 185,
    "n_text_dim": 512,
    "n_flows": 1,
    "n_mel_channels": 80,
    "n_attn_channels": 640,
    "n_hidden": 1024,
    "n_lstm_layers": 2,
    "mel_encoder_n_hidden": 512,
    "n_components": 0,
    "mean_scale": 0.0,
    "fixed_gaussian": true,
    "dummy_speaker_embedding": false,
    "use_gate_layer": true
  }
}

command:
python -m torch.distributed.launch --use_env --nproc_per_node=4 train.py -c config.json -p train_config.output_directory=outdir train_config.ignore_layers=["speaker_embedding.weight"] train_config.warm_checkpoint_path="models/flowtron_ljs.pt" train_config.fp16=true

graphs (disclaimer: all alignments are shown for a single proprietary speaker):
image
image

Model B

The second model (let's call it B) was warm-started on our tacotron2 checkpoint with 3 datasets that were phonemized with my custom preprocessor, so I turned off the preprocessing inside flowtron (one of them is ljspeech and 2 are proprietary, about 40000 sentences all combined). I will list the config.json and the run command below. It trained up to 1,300,000 iterations over 5 days on 4x1080Ti, and it produced slightly better alignment, but it's still not there.

{
  "train_config": {
    "output_directory": "outdir",
    "epochs": 10000000,
    "learning_rate": 1e-4,
    "weight_decay": 1e-6,
    "sigma": 1.0,
    "iters_per_checkpoint": 5000,
    "batch_size": 1,
    "seed": 1234,
    "checkpoint_path": "",
    "ignore_layers": [],
    "include_layers": ["encoder", "embedding"],
    "warmstart_checkpoint_path": "",
    "with_tensorboard": true,
    "fp16_run": false
  },
  "data_config": {
    "training_files": "data/processed/combined/dataset.train",
    "validation_files": "data/processed/combined/dataset.test",
    "text_cleaners": ["flowtron_cleaners"],
    "p_arpabet": 0.5,
    "cmudict_path": "data/cmudict_dictionary",
    "sampling_rate": 22050,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "mel_fmin": 0.0,
    "mel_fmax": 8000.0,
    "max_wav_value": 32768.0
  },
  "dist_config": {
    "dist_backend": "nccl",
    "dist_url": "tcp://localhost:54321"
  },
  "model_config": {
    "n_speakers": 3,
    "n_speaker_dim": 128,
    "n_text": 185,
    "n_text_dim": 512,
    "n_flows": 1,
    "n_mel_channels": 80,
    "n_attn_channels": 640,
    "n_hidden": 1024,
    "n_lstm_layers": 2,
    "mel_encoder_n_hidden": 512,
    "n_components": 0,
    "mean_scale": 0.0,
    "fixed_gaussian": true,
    "dummy_speaker_embedding": false,
    "use_gate_layer": true
  }
}

command:
python -m torch.distributed.launch --use_env --nproc_per_node=4 train.py -c config.json -p train_config.output_directory=outdir train_config.ignore_layers=["speaker_embedding.weight"] train_config.warm_checkpoint_path="models/checkpoint_70000_fav1" train_config.fp16=true

graphs (disclaimer: all alignments are shown for a single proprietary speaker):
image
image

Is something wrong, given that it takes 5 days of training to get this? Should I stop and continue from a checkpoint with flow=2?

How to prepare a dataset for Flowtron transfer learning

Hi there!

I was wondering what steps I could take to prepare a dataset for transfer learning with Flowtron. I noticed there were instructions to update the filelists within the filelists folder, but I was having difficulty understanding the filelist format and how the model pipeline reads in the data. Would love to learn more!
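
For reference, the filelists seen elsewhere in these issues use one utterance per line, with | separating the wav path, the transcript, and the speaker id, and the audio is described as 22050 Hz mono 16-bit:

/path_to/voice1/filename1.wav|Transcript of the first utterance.|0
/path_to/voice1/filename2.wav|Transcript of the second utterance.|0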

Thanks,
Adam

How can I reproduce the performance of flowtron_ljs.pt?

How can I reproduce the performance of flowtron_ljs.pt? Did you train flowtron_ljs.pt with the default configuration config.json? I found the given configuration a bit confusing due to the following oddities:

  1. 'train_config: epochs' is set to 10,000,000 (too many)
  2. 'train_config: batch_size' is set to 1 (too small)
  3. 'model_config: n_flows' is set to 2 (unlike the paper, where 3 is proposed)
  4. 'model_config: n_components' is set to 0 (shouldn't the lower bound for this param be 1?)

NaN loss

I'm trying to train flowtron on LibriSpeech, doing everything as it is in the repository, but I always get this result:
10232: nan
WARNING:root:NaN or Inf found in input tensor.
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/cuda/LegacyDefinitions.cpp:19: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead.
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/cuda/LegacyDefinitions.cpp:19: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead.
Is this okay? If it's not, what am I doing wrong? (If the path to the data samples were wrong, there would have been an error, and there is no error; also, none of the input values contain NaN or Inf.)

Higher number of frames inference issue

When doing inference on the provided model with longer texts, setting a higher number of frames causes some strange effects, like looping the same words several times, while the last ~6 seconds are correct.

python3 inference.py -c config.json -f flowtron_ljs.pt -w waveglow_256channels_universal_v5.pt -t "Invertible models like Flowtron can be easier to train, because they can learn the distribution of the real-world training data directly. As a result, the flow-based approach to text-to-spectrogram generation provides more realism and more expressivity than current state-of-the-art speech synthesis models. Flowtron achieves this by giving users control over non-textual characteristics, enabling them to make a monotonic speaker sound expressive." -i 0 -n 2000

attnlayer0:
sid0_sigma0 5_attnlayer0
attnlayer1:
sid0_sigma0 5_attnlayer1

WARNING:root:NaN or Inf found in input tensor.

GPU: 1060 6Gb

❯ python train.py -c config.json -p train_config.output_directory=outdir
train_config.output_directory=outdir
output_directory=outdir
{'train_config': {'output_directory': 'outdir', 'epochs': 10000000, 'learning_rate': 0.0001, 'weight_decay': 1e-06, 'sigma': 1.0, 'iters_per_checkpoint': 5000, 'batch_size': 1, 'seed': 1234, 'checkpoint_path': '', 'ignore_layers': [], 'include_layers': ['speaker', 'encoder', 'embedding'], 'warmstart_checkpoint_path': '', 'with_tensorboard': True, 'fp16_run': False}, 'data_config': {'training_files': 'filelists/ljs_audiopaths_text_sid_train_filelist.txt', 'validation_files': 'filelists/ljs_audiopaths_text_sid_val_filelist.txt', 'text_cleaners': ['flowtron_cleaners'], 'p_arpabet': 0.5, 'cmudict_path': 'data/cmudict_dictionary', 'sampling_rate': 22050, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'mel_fmin': 0.0, 'mel_fmax': 8000.0, 'max_wav_value': 32768.0}, 'dist_config': {'dist_backend': 'nccl', 'dist_url': 'tcp://localhost:54321'}, 'model_config': {'n_speakers': 1, 'n_speaker_dim': 128, 'n_text': 185, 'n_text_dim': 512, 'n_flows': 2, 'n_mel_channels': 80, 'n_attn_channels': 640, 'n_hidden': 1024, 'n_lstm_layers': 2, 'mel_encoder_n_hidden': 512, 'n_components': 0, 'mean_scale': 0.0, 'fixed_gaussian': True, 'dummy_speaker_embedding': False, 'use_gate_layer': True}}

got rank 0 and world size 1 ...
Flowtron(
  (speaker_embedding): Embedding(1, 128)
  (embedding): Embedding(185, 512)
  (flows): ModuleList(
    (0): AR_Step(
      (conv): Conv1d(1024, 160, kernel_size=(1,), stride=(1,))
      (lstm): LSTM(1664, 1024, num_layers=2)
      (attention_lstm): LSTM(80, 1024)
      (attention_layer): Attention(
        (softmax): Softmax(dim=2)
        (query): LinearNorm(
          (linear_layer): Linear(in_features=1024, out_features=640, bias=False)
        )
        (key): LinearNorm(
          (linear_layer): Linear(in_features=640, out_features=640, bias=False)
        )
        (value): LinearNorm(
          (linear_layer): Linear(in_features=640, out_features=640, bias=False)
        )
        (v): LinearNorm(
          (linear_layer): Linear(in_features=640, out_features=1, bias=False)
        )
      )
      (dense_layer): DenseLayer(
        (layers): ModuleList(
          (0): LinearNorm(
            (linear_layer): Linear(in_features=1024, out_features=1024, bias=True)
          )
          (1): LinearNorm(
            (linear_layer): Linear(in_features=1024, out_features=1024, bias=True)
          )
        )
      )
    )
    (1): AR_Back_Step(
      (ar_step): AR_Step(
        (conv): Conv1d(1024, 160, kernel_size=(1,), stride=(1,))
        (lstm): LSTM(1664, 1024, num_layers=2)
        (attention_lstm): LSTM(80, 1024)
        (attention_layer): Attention(
          (softmax): Softmax(dim=2)
          (query): LinearNorm(
            (linear_layer): Linear(in_features=1024, out_features=640, bias=False)
          )
          (key): LinearNorm(
            (linear_layer): Linear(in_features=640, out_features=640, bias=False)
          )
          (value): LinearNorm(
            (linear_layer): Linear(in_features=640, out_features=640, bias=False)
          )
          (v): LinearNorm(
            (linear_layer): Linear(in_features=640, out_features=1, bias=False)
          )
        )
        (dense_layer): DenseLayer(
          (layers): ModuleList(
            (0): LinearNorm(
              (linear_layer): Linear(in_features=1024, out_features=1024, bias=True)
            )
            (1): LinearNorm(
              (linear_layer): Linear(in_features=1024, out_features=1024, bias=True)
            )
          )
        )
        (gate_layer): LinearNorm(
          (linear_layer): Linear(in_features=1664, out_features=1, bias=True)
        )
      )
    )
  )
  (encoder): Encoder(
    (convolutions): ModuleList(
      (0): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      )
      (1): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      )
      (2): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      )
    )
    (lstm): LSTM(512, 256, batch_first=True, bidirectional=True)
  )
)
Number of speakers : 1
output directory outdir
Epoch: 0
C:\AI_Research_Project\flowtron\data.py:40: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ..\torch\csrc\utils\tensor_numpy.cpp:141.)
return torch.from_numpy(data).float(), sampling_rate
C:\AI_Research_Project\flowtron\flowtron.py:373: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (Triggered internally at ..\aten\src\ATen\native\cuda\LegacyDefinitions.cpp:19.)
self.score_mask_value)
0: nan
WARNING:root:NaN or Inf found in input tensor.
C:\AI_Research_Project\flowtron\data.py:40: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ..\torch\csrc\utils\tensor_numpy.cpp:141.)
return torch.from_numpy(data).float(), sampling_rate
Mean None
LogVar None
Prob None
Validation loss 0: nan
WARNING:root:NaN or Inf found in input tensor.
Saving model and optimizer state at iteration 0 to outdir/model_0
1: nan
WARNING:root:NaN or Inf found in input tensor.
2: nan

And that error keeps on going
EDIT: It just finished doing its thing

After 319: nan

319: nan
WARNING:root:NaN or Inf found in input tensor.
Traceback (most recent call last):
  File "train.py", line 300, in <module>
    train(n_gpus, rank, **train_config)
  File "train.py", line 238, in train
    loss.backward()
  File "C:\Users\vladc\anaconda3\lib\site-packages\torch\tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\vladc\anaconda3\lib\site-packages\torch\autograd\__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: transform: failed to synchronize: cudaErrorLaunchFailure: unspecified launch failure

NaN While Training

Hi :)

I'm trying to get flowtron working, but I've been having this persistent issue where I get NaN errors:

*** Stack Trace ***
/pytorch/aten/src/ATen/native/cuda/LegacyDefinitions.cpp:19: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead.
... (this repeats many times) ...
/pytorch/aten/src/ATen/native/cuda/LegacyDefinitions.cpp:19: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead.
Mean None
LogVar None
Prob None
Validation loss 0: nan
WARNING:root:NaN or Inf found in input tensor.
*** End Stack Trace ***

I've had this error on both my personal Windows machine and a cloud Linux machine. I'm wondering whether there is some sort of PyTorch version compatibility issue or something?

I'd be very grateful for any assistance that could be provided!

Can Flowtron be used for a small dataset (only a few minutes)?

As far as I know, the LibriTTS dataset has some speakers with only several minutes of data. Although speakers with <5 min of data were filtered out, as the paper mentions, there should still be some speakers lacking data.
So it has been shown that a small dataset can be learned well, as long as it is trained together with plenty of other data. But what if I only have a small dataset as well as a pre-trained LibriTTS model?
I tried to fine-tune a 5-minute dataset on the LibriTTS model, but couldn't get any reasonable result.
Is there any effective method to fine-tune or adapt such small datasets from a pre-trained model?
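
One option suggested by the fine-tuning discussion around this repo is to update only the dataset-dependent layers. A minimal sketch (assuming the parameter naming seen elsewhere in this document, e.g. speaker_embedding.weight; a sketch, not a verified recipe):

import torch

# Freeze everything except the speaker embedding, then train as usual.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("speaker_embedding")

optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)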

Back step problem

I can't understand: why are you iterating from 1 in the forward method of AR_Back_Step?

def forward(self, mel, text, mask, out_lens):
        mel = torch.flip(mel, (0, ))
        # backwards flow, send padded zeros back to end
        for k in range(1, mel.size(1)):
            mel[:, k] = mel[:, k].roll(out_lens[k].item(), dims=0)

        mel, log_s, gates, attn = self.ar_step(mel, text, mask, out_lens)

        # move padded zeros back to beginning
        for k in range(1, mel.size(1)):
            mel[:, k] = mel[:, k].roll(-out_lens[k].item(), dims=0)

        return torch.flip(mel, (0, )), log_s, gates, attn

How is interpolation between speakers performed?

I have success adjusting sigma values and now reading about the interpolation between speakers: "First, we perform inference by sampling z ∼ N (0, 0.5) until we find two z values, zh and zs, that produce mel-spectrograms with Helen’s and Sally’s voice respectively. We then generate samples by performing inference while linearly interpolating between zh and zs."

How would I go about doing this?
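
A minimal sketch of the interpolation itself, reusing the variables from inference.py (model, speaker_vecs, text, n_frames); zh and zs below are placeholders for the two latent residuals you have already found to produce the two voices:

import torch

# Stand-ins: in practice these are z values saved from successful samples.
zh = torch.randn(1, 80, n_frames).cuda() * 0.5
zs = torch.randn(1, 80, n_frames).cuda() * 0.5

for alpha in torch.linspace(0, 1, steps=5):
    z = (1 - alpha) * zh + alpha * zs          # linear interpolation
    mels, attentions = model.infer(z, speaker_vecs, text)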

Style transfer

The bits related to the style transfer experiments are unclear to me. Compared to Mellotron/Tacotron 2, the prosodic control is now represented by the latent space instead of the GST, but I'm missing how you can project an utterance to the latent space, i.e., Section 4.4, especially Section 4.4.4 in the paper.

My guess is the following, and please confirm if that's right:

  • getting z from the style utterance ("prior evidence"): you call flowtron.forward(mel, speaker_vecs, text, in_lens, out_lens) with the style utterance's transcription. What's the correct way to assign speaker_vecs then, if it is an unseen speaker?
  • running flowtron.inference() using that z (residual) from the code.

I tried using the style speaker's id and I found that the style is very nicely represented but the spoken text is gibberish.

I put the style example (angry.wav) and the synthesised example here. The utterance to synth: "How are you today?"

Here is what I changed in inference.py (sorry, my padding solution is criminal):

with torch.no_grad():
        if utterance is None:
            residual = torch.cuda.FloatTensor(1, 80, n_frames).normal_() * sigma
        else:

            utt_text = "Dogs are sitting by the door!"
            utt_text = trainset.get_text(utt_text).cuda()
            utt_text = utt_text[None]

            # loading mel spectra, in_lens, out_lens?
            audio, _ = load_wav_to_torch(utterance)
            mel = trainset.get_mel(audio).to(device="cuda")

        # You need to pad this because of the permute
            mel = mel[None]
            out_lens = torch.LongTensor(1).to(device="cuda")

            out_lens[0] = mel.size(2)
            in_lens = torch.LongTensor([utt_text.shape[1]]).to(device="cuda")
            residual, _, _, _, _, _, _ = model.forward(mel, speaker_vecs, utt_text, in_lens, out_lens)

        residual = residual.permute(1,2,0)
        # TODO: This is a horrible solution to pad once if needed
        if n_frames > residual.shape[2]:
            pad_len = n_frames - residual.shape[2]
            residual = torch.cat((residual,residual[:,:,:pad_len]),axis=2)
        else:
            residual = residual[:,:,:n_frames]

        mels, attentions = model.infer(residual, speaker_vecs, text)


Training issue with Male voice

I am trying to train a flowtron model with a male voice. After training for about 270,000 steps, the audio generated is very random; not a single word is generated properly, and it's not even learning attention. Earlier I tried with the LJ Speech dataset; after about 170,000 steps of training, the audio samples were not so bad. The pronunciation was not up to the mark, but I could understand what was being said.
I am attaching the attention plots here.
I have the same amount of data as LJ speech.
sid0_sigma0 5_attnlayer1
sid0_sigma0 5_attnlayer0

What are attention_weights_1 and attention_weights_2?

I have had success using NVIDIA's tacotron2 repo and understanding alignment with that, but what are these two other alignments shown on tensorboard? I see that attention_weights_0 is being updated, but the other two attentions are still at step 0. Thanks.

Warmstart using Tacotron 2

I want to train Flowtron on LJSpeech using a Tacotron 2 checkpoint. I get the following error:

Traceback (most recent call last):
  File "train.py", line 303, in <module>
    train(n_gpus, rank, **train_config)
  File "train.py", line 186, in train
    model = warmstart(warmstart_checkpoint_path, model, include_layers)
  File "train.py", line 100, in warmstart
    model.load_state_dict(model_dict)
  File "/home/dazenkov/dazenkov/lib/python3.6/site-packages/torch/nn/modules/module.py", line 830, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Flowtron:
        size mismatch for embedding.weight: copying a param with shape torch.Size([148, 512]) from checkpoint, the shape in current model is torch.Size([185, 512]).

Of course, I have commented out those lines.
My config:

"train_config": {
        "output_directory": "/raid/flowtron/flow_1/flowtron_ljs",
        "epochs": 1000,
        "learning_rate": 1e-4,
        "weight_decay": 1e-6,
        "sigma": 1.0,
        "iters_per_checkpoint": 5000,
        "batch_size": 2,
        "seed": 1234,
        "checkpoint_path": "",
	"ignore_layers": [],
        "include_layers": ["encoder", "embedding"],
        "warmstart_checkpoint_path": "/raid/flowtron/tacotron2/tacotron2_statedict.pt",
	"with_tensorboard": true,
        "fp16_run": false
    }
...
"model_config": {
        "n_speakers": 1,
        "n_speaker_dim": 128,
        "n_text": 185,
        "n_text_dim": 512,
        "n_flows": 1,
        "n_mel_channels": 80,
        "n_attn_channels": 640,
        "n_hidden": 1024,
        "n_lstm_layers": 2,
        "mel_encoder_n_hidden": 512,
        "n_components": 0,
        "mean_scale": 0.0,
        "fixed_gaussian": true,
        "dummy_speaker_embedding": false,
        "use_gate_layer": true
    } 

I downloaded tacotron2_statedict.pt from this link (from the Tacotron 2 repo).
Why is the embedding size different? Or have I misunderstood how to use the tacotron2 state dict? Thank you!
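
In case it helps, here is a minimal workaround sketch (my assumption, not an official fix from the repo): inside train.py's warmstart(), drop any checkpoint tensors whose shapes don't match the current model before calling load_state_dict, so a mismatched embedding.weight is simply trained from scratch:

# after building model_dict from the warmstart checkpoint in warmstart():
current = model.state_dict()
# keep only tensors whose shape matches the current model, so e.g.
# embedding.weight (148 vs. 185 symbols) is skipped rather than copied
model_dict = {k: v for k, v in model_dict.items()
              if k in current and v.shape == current[k].shape}
current.update(model_dict)
model.load_state_dict(current)

The mismatch itself comes from the text symbol set: judging by the traceback, the Tacotron 2 checkpoint was trained with 148 text symbols, while this config sets n_text to 185, so the two embedding tables cannot line up.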

Changing speed of speech

Is there a way to sample z such that the speed of speech is changed, i.e., to make it faster or slower?

I tried different sigma values, but that seems to control variability in speech and didn't really change the speed.

The paper didn't talk about it either. Any thoughts on this?
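
One thing that might be worth trying, borrowing the "evidence" idea from the style transfer discussion above (this is speculation on my part, not something the paper confirms for speech rate): run forward() on a few utterances spoken at the desired rate, average their residuals, and use that mean as the z for inference. The names evidence_batches and z_mean are hypothetical:

# collect residuals from utterances spoken at the target rate
zs = []
with torch.no_grad():
    for mel, txt, in_lens, out_lens in evidence_batches:  # hypothetical iterable
        z, _, _, _, _, _, _ = model.forward(mel, speaker_vecs, txt, in_lens, out_lens)
        zs.append(z.permute(1, 2, 0).mean(dim=2))  # average over time -> (1, 80)

z_mean = torch.stack(zs).mean(dim=0)  # (1, 80) mean latent over the evidence
residual = z_mean[:, :, None].expand(1, 80, n_frames).contiguous()
mels, attentions = model.infer(residual, speaker_vecs, text)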

Unintelligible speech - inference on pre-trained models

I am trying to synthesise audio starting from the available pre-trained models:

python3 inference.py -c config.json -f models/flowtron_ljs.pt -w models/waveglow_256channels_universal_v4.pt -t "Hey hello there" -o output_synth/ -i 0

but the output is not intelligible:

https://drive.google.com/file/d/1bWpbnMoRF5lm5RYwxZNj8bomiY_WF3mA/view?usp=sharing

The alignment also looks off:

[attention plots: sid0_sigma0.5_attnlayer0, sid0_sigma0.5_attnlayer1]

I tried with both the LJS and LibriTTS models.

Any idea why this happens?

Thanks!

Difference between Flowtron and the hierarchical generative GM-VAE by Google

Hi guys,
First, thanks for the great work.
My background is in computer vision, and I am not really familiar with deep learning on sequential data or the details of Tacotron.
My main question after reading both papers is: what is the major difference between the two models?
Can I have some hints on that? Thanks.
Regards,
Justin Tian
