
tacotron2's People

Contributors

cobr123, dependabot[bot], grzegorz-k-karch, jybaek, rafaelvalle, raulpuric, sih4sing5hong5, taras-sereda, yoks


tacotron2's Issues

Can't Figure Out How to Take Mel-Spectrogram from nvidia/tacotron2 to nv-wavenet/pytorch (I'm new to Pytorch)

I got this error when I ran Tacotron 2's inference.py (converted from the .ipynb notebook) and passed the resulting mel-spectrogram to nv-wavenet/pytorch/inference.py:

RuntimeError: Expected 4-dimensional weight for 4-dimensional input [1, 1000, 513, 1], but got weight of size [80, 80, 1024] instead

I don't think I'm doing it right.
In tacotron2/inference.py, how do I use this?

import torch
from audio_processing import dynamic_range_decompression

mel = torch.load(spec_from_mel)  # Or mel_outputs_postnet?
mel = dynamic_range_decompression(mel)  # undo Tacotron 2's log compression
mel = mel.cpu().numpy()
mel = mel.transpose()

And, how do I save that to a .pt tensor for nv-wavenet/pytorch/inference.py?
I'm doing this (and failing with the error above when I run the .pt file through nv-wavenet/pytorch/inference.py):

filename = '/opt/AITTSv3/nv-wavenet/pytorch/newdemopts/x.pt'
mel = torch.tensor(mel)
torch.save(mel, filename)  # torch.save returns None, so don't reassign mel

Please help. Thanks!
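A minimal sketch of the save step, under two assumptions: that nv-wavenet's pytorch/inference.py expects a single (n_mel_channels, T) float tensor written with torch.save, and that the vocoder was trained on mels in the same log-compressed domain Tacotron 2 outputs, so no dynamic_range_decompression or transpose is applied. The helper name save_mel_for_nv_wavenet is hypothetical; the shape check is there because a (T, 513) or transposed array would fail later inside the vocoder's convolution.

```python
import os
import tempfile

import torch


def save_mel_for_nv_wavenet(mel_outputs_postnet, path, n_mel_channels=80):
    """Hypothetical helper: save a Tacotron 2 mel as an (n_mel_channels, T) tensor.

    Assumes the vocoder expects the same log-compressed mel that Tacotron 2
    outputs, so no dynamic_range_decompression or transpose is applied.
    """
    mel = mel_outputs_postnet.detach().float().cpu()
    if mel.dim() == 3:
        # (B, n_mel_channels, T) -> (n_mel_channels, T); only valid for B == 1
        mel = mel.squeeze(0)
    if mel.dim() != 2 or mel.shape[0] != n_mel_channels:
        # Catches e.g. a transposed (T, n_mel_channels) array or a linear
        # spectrogram with 513 bins instead of 80 mel channels.
        raise ValueError(
            f"expected shape ({n_mel_channels}, T), got {tuple(mel.shape)}")
    torch.save(mel, path)  # torch.save returns None; return the tensor instead
    return mel


fake_postnet = torch.randn(1, 80, 123)  # stand-in for mel_outputs_postnet
out_path = os.path.join(tempfile.mkdtemp(), "x.pt")
save_mel_for_nv_wavenet(fake_postnet, out_path)
print(torch.load(out_path).shape)  # torch.Size([80, 123])
```

If the vocoder was instead trained on a different mel representation, the saved tensor will have the right shape but still sound wrong.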

'Parameter' object has no attribute '_execution_engine' when running with multi-GPUs

An error occurred when running the repo on 2 GPUs of a single node with this command line:

CUDA_VISIBLE_DEVICES=2,3 python -m multiproc train.py --n_gpus=2 --rank=0 --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True,fp16_run=True

The important info printed at the beginning is below:

['train.py', '--n_gpus=2', '--rank=0', '--output_directory=outdir', '--log_directory=logdir', '--hparams=distributed_run=True,fp16_run=True', '--n_gpus=2', '--group_name=group_2018_06_19-083304', '--rank=0']
['train.py', '--n_gpus=2', '--rank=0', '--output_directory=outdir', '--log_directory=logdir', '--hparams=distributed_run=True,fp16_run=True', '--n_gpus=2', '--group_name=group_2018_06_19-083304', '--rank=1']
FP16 Run: True
Dynamic Loss Scaling: True
Distributed Run: True
cuDNN Enabled: True
cuDNN Benchmark: False
Initializing distributed
Done initializing distributed

Then an error occurred:

Traceback (most recent call last):
  File "train.py", line 291, in <module>
    args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
  File "train.py", line 222, in train
    optimizer.backward(loss)
  File "/home/mlwoo/repo/tacotron2/fp16_optimizer.py", line 362, in backward
    self.loss_scaler.backward(loss.float())
  File "/home/mlwoo/repo/tacotron2/loss_scaler.py", line 80, in backward
    scaled_loss.backward()
  File "/home/mlwoo/dev-bin/anaconda3/envs/wavenet/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/mlwoo/dev-bin/anaconda3/envs/wavenet/lib/python3.6/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/home/mlwoo/repo/tacotron2/distributed.py", line 95, in allreduce_hook
    param._execution_engine.queue_callback(allreduce_params)
AttributeError: 'Parameter' object has no attribute '_execution_engine'

The corresponding dpt file was also not generated in the folder, although the log showed that initialization was done.

I also tried to run the repo with the commandline:

CUDA_VISIBLE_DEVICES=2,3 python train.py --n_gpus=2 --rank=0 --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True,fp16_run=True

Then it hung at initialization, as mentioned in issue #38. The content of my dpt file is different from the one in issue #38:

group_name# 0 32861 6 10.128.2.14 172.17.0.1 10.128.1.14 fe80::ae1f:6bff:fe27:b358 fe80::526b:4bff:fe2f:21c8 fe80::1363:7bed:662d:ea3e

Could you help me get out of this dilemma?
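For context on the AttributeError: the hook in the traceback calls param._execution_engine.queue_callback, and with PyTorch 0.4 the Parameter object apparently no longer carries that attribute. A minimal sketch of the same "run all-reduce once after backward finishes" pattern, assuming the engine is reached through torch.autograd.Variable._execution_engine instead. This is a private attribute that exists in current PyTorch but may change; attach_allreduce_hooks and allreduce_fn are hypothetical names, and a real allreduce_fn would call torch.distributed.

```python
import torch
from torch.autograd import Variable


def attach_allreduce_hooks(model, allreduce_fn):
    """Hypothetical helper: call allreduce_fn(grads) once per backward pass."""
    state = {"queued": False}

    def final_callback():
        # Runs after the whole backward pass, when every .grad is populated.
        state["queued"] = False
        allreduce_fn([p.grad for p in model.parameters() if p.grad is not None])

    def grad_hook(grad):
        # The first gradient computed in a backward pass queues the callback.
        if not state["queued"]:
            state["queued"] = True
            # Private API, accessed via Variable rather than via the Parameter.
            Variable._execution_engine.queue_callback(final_callback)
        return grad

    for p in model.parameters():
        if p.requires_grad:
            p.register_hook(grad_hook)


model = torch.nn.Linear(3, 1)
calls = []
attach_allreduce_hooks(model, lambda grads: calls.append(len(grads)))
model(torch.randn(4, 3)).sum().backward()
print(calls)  # [2]: weight and bias gradients, delivered in one callback
```

Queuing a single callback, rather than reducing inside each hook, is what lets all gradients be bucketed into one all-reduce.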

Training data for wavenet_vocoder

I trained this model on my own dataset and want to train wavenet_vocoder next. But I am confused about what the training data for wavenet_vocoder should be: mel-spectrograms generated by this model, or mel-spectrograms computed from the audio with an STFT?
Besides, what do you mean by "When performing Mel-Spectrogram to Audio synthesis with a WaveNet model, make sure Tacotron 2 and WaveNet were trained on the same mel-spectrogram representation"?
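One way to read the README's advice: the vocoder's conditioning features must be computed with the same STFT and mel-filterbank settings, and the same amplitude compression (Tacotron 2 uses natural-log compression, undone by dynamic_range_decompression), as the mels Tacotron 2 predicts. A minimal sketch of a consistency check. The key names follow Tacotron 2's hparams; the values are those defaults as we understand them, and the vocoder values are hypothetical and must be translated from the vocoder's own config into the same key names.

```python
MEL_KEYS = ("sampling_rate", "filter_length", "hop_length", "win_length",
            "n_mel_channels", "mel_fmin", "mel_fmax")


def mel_config_mismatches(taco, vocoder, keys=MEL_KEYS):
    """Return {key: (tacotron_value, vocoder_value)} for every differing key."""
    return {k: (taco.get(k), vocoder.get(k))
            for k in keys if taco.get(k) != vocoder.get(k)}


tacotron_hparams = dict(sampling_rate=22050, filter_length=1024, hop_length=256,
                        win_length=1024, n_mel_channels=80,
                        mel_fmin=0.0, mel_fmax=8000.0)
# Hypothetical vocoder settings, already translated to the same key names.
vocoder_hparams = dict(tacotron_hparams, mel_fmin=125.0, mel_fmax=7600.0)

print(mel_config_mismatches(tacotron_hparams, vocoder_hparams))
# {'mel_fmin': (0.0, 125.0), 'mel_fmax': (8000.0, 7600.0)}
```

An empty result is necessary but not sufficient: the log/dB compression and any normalization must match too.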

Training speed too slow

My training speed is about 5 s/iteration. Is that normal on a single GPU? I tried training with 4 GPUs, but the speed does not seem to change much.

I noticed the load_mel_from_disk option, which may be a key factor in training speed. Can I use the preprocess.py from https://github.com/r9y9/wavenet_vocoder/ to generate the mels to be loaded?

Also, I multiplied the learning rate by 0.01 after 150,000 steps; however, the validation loss does not seem to decrease. The audio generated by the Griffin-Lim (GL) vocoder isn't as clear as demo.wav.
(Attached: tacotron_audio.wav.zip)
In fact, the audio quality has not improved since 150,000 steps.

The images are shown below. Any guidance or suggestions would be appreciated.
[Images: training loss, validation loss, grad norm, alignment, predicted mel (mel_predicted), target mel (mel_target)]
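The learning-rate change described above (multiply by 0.01 after 150,000 steps) is a single-step decay. A minimal sketch, assuming a base rate of 1e-3 (a placeholder; use your hparams value). In PyTorch the same schedule can be expressed with torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150000], gamma=0.01), stepped once per iteration.

```python
def step_decay_lr(step, base_lr=1e-3, decay_step=150_000, factor=0.01):
    """Learning rate at `step` for a single multiplicative decay."""
    return base_lr * factor if step >= decay_step else base_lr


for step in (0, 149_999, 150_000, 200_000):
    print(step, step_decay_lr(step))
```

A 100x drop is large; if validation loss plateaus right after it, the lower rate may simply be too small to make further progress.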

Windows Support?

Can I install this on Windows, since most of these commands are for Linux? If so, what would I need to do to install it?

Hard to train on one small GPU

I wrote this issue in response to this one.

The issue is that I am using an 8 GB GPU (NVIDIA GeForce GTX 980M), so I've reduced the batch size to 24 with the LJ Speech dataset.
After 4 days (I am now at 31k iterations), I do not see any improvement in the attention alignment. The loss has been stuck between 0.5 and 0.7 almost since the start of training.

At this point, do I have to wait longer, or do you consider it a failure?

Here are my curves and alignment plot at this point:
[Images: training logs, alignment plot]

half-precision support on single GPU?

Thanks for this repo. I have a Pascal-based card (GTX 1080), and I switched training to 16-bit floating point (FP16) to see if it improves training times. I set fp16_run to True, but I get an error:

Epoch: 0

Traceback (most recent call last):
  File "train.py", line 285, in <module>
    args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
  File "train.py", line 210, in train
    y_pred = model(x)
  File "/tacotron2-master/model.py", line 80, in forward
    alignment.data.masked_fill_(mask, self.score_mask_value)
RuntimeError: value cannot be converted to type Half without overflow: -inf

Since I only have one card, I left distributed_run set to False.

Running the repo with the default floating-point precision works fine. As others have remarked, training on a single GPU is rather slow: on average it takes about 5.5 seconds per iteration with the default single-GPU settings on LJ Speech.
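The traceback shows masked_fill_ being given score_mask_value = -inf, which PyTorch 0.4 refused to convert to Half. A common workaround, sketched below under the assumption that the mask marks padded encoder steps with True, is to fill with the most negative finite value of the tensor's dtype (torch.finfo(dtype).min, which is -65504 for float16) and compute the softmax in float32. masked_attention_weights is a hypothetical function, not the repo's code.

```python
import torch


def masked_attention_weights(energies, mask):
    """Softmax over encoder steps, ignoring positions where mask is True."""
    # A finite fill value is representable in every float dtype, including
    # float16, unlike -inf in older PyTorch releases.
    fill_value = torch.finfo(energies.dtype).min
    energies = energies.masked_fill(mask, fill_value)
    # Compute the softmax in float32 for stability, then cast back.
    return torch.softmax(energies.float(), dim=1).to(energies.dtype)


energies = torch.zeros(1, 4, dtype=torch.float16)
mask = torch.tensor([[False, False, True, True]])  # last two steps are padding
print(masked_attention_weights(energies, mask))
```

After subtracting the row maximum inside softmax, exp(-65504) underflows to exactly 0 in float32, so masked steps get zero weight just as with -inf.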

About the pretrained wavenet_vocoder?

I downloaded the latest code and trained for 118 epochs, then downloaded the pre-trained WaveNet vocoder (https://github.com/r9y9/wavenet_vocoder/) trained on LJSpeech, but I could not synthesize audio that sounds like a human voice. Could you share which pre-trained WaveNet vocoder you used to synthesize "demo.wav"? Thank you very much!

Memory Requirement

Hi!
Thanks for the code! I tried to run train.py but got a memory error. I have a 1080 Ti with 11 GB. What hardware do you use? Do you know what the memory requirements are?

Thank you!

Got 'Incomplete wav chunk' exception through scipy.io.wavfile.read

I got an "Incomplete wav chunk" exception from scipy.io.wavfile.read, whereas loading the same file with librosa.core.load works. I suggest using the latter in utils.py instead; then we would not need max_wav_value to normalize the wav data.

/home/leoma/miniconda3/envs/tensorflow/lib/python3.6/site-packages/scipy/io/wavfile.py:273: WavFileWarning: Chunk (non-data) not understood, skipping it.
  WavFileWarning)
Traceback (most recent call last):
  File "train.py", line 285, in <module>
    args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
  File "train.py", line 203, in train
    for i, batch in enumerate(train_loader):
  File "/home/leoma/miniconda3/envs/tensorflow/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 286, in __next__
    return self._process_next_batch(batch)
  File "/home/leoma/miniconda3/envs/tensorflow/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
ValueError: Traceback (most recent call last):
  File "/home/leoma/miniconda3/envs/tensorflow/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/leoma/miniconda3/envs/tensorflow/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/leoma/tacotron2_nvidia/data_utils.py", line 51, in __getitem__
    return self.get_mel_text_pair(self.audiopaths_and_text[index])
  File "/home/leoma/tacotron2_nvidia/data_utils.py", line 34, in get_mel_text_pair
    mel = self.get_mel(audiopath)
  File "/home/leoma/tacotron2_nvidia/data_utils.py", line 38, in get_mel
    audio = load_wav_to_torch(filename, self.sampling_rate)
  File "/home/leoma/tacotron2_nvidia/utils.py", line 16, in load_wav_to_torch
    sampling_rate, data = read(full_path)
  File "/home/leoma/miniconda3/envs/tensorflow/lib/python3.6/site-packages/scipy/io/wavfile.py", line 248, in read
    raise ValueError("Incomplete wav chunk.")
ValueError: Incomplete wav chunk.
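Before switching loaders, it may help to find which files are malformed. A minimal, stdlib-only sketch that walks the RIFF chunk list of a WAV file and reports any chunk whose declared size runs past the end of the file, which is the kind of truncation scipy's reader reports as "Incomplete wav chunk". It does not reproduce scipy's exact checks, and find_truncated_chunks is a hypothetical name.

```python
import os
import struct
import tempfile
import wave


def find_truncated_chunks(path):
    """Return (chunk_id, declared_size, available_bytes) for truncated chunks."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError(f"{path}: not a RIFF/WAVE file")
    problems = []
    pos = 12  # skip the 'RIFF' <size> 'WAVE' header
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4].decode("latin-1")
        (size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        available = len(data) - (pos + 8)
        if size > available:
            problems.append((chunk_id, size, available))
            break  # later chunk boundaries cannot be trusted
        pos += 8 + size + (size & 1)  # chunk bodies are padded to even length
    return problems


# Demo: write a valid 100-frame, 16-bit mono file, then a truncated copy.
tmp = tempfile.mkdtemp()
good = os.path.join(tmp, "good.wav")
with wave.open(good, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(22050)
    w.writeframes(b"\x00\x00" * 100)
bad = os.path.join(tmp, "bad.wav")
with open(good, "rb") as src, open(bad, "wb") as dst:
    dst.write(src.read()[:-50])

print(find_truncated_chunks(good))  # []
print(find_truncated_chunks(bad))   # [('data', 200, 150)]
```

Running it over every path in the training filelist lists the files to re-export or drop.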

inference.ipynb doesn't work

Hi,

With PyTorch 0.4 and Python 3.5, inference.ipynb fails on the 5th cell:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-021867fe58d0> in <module>()
      2 model = load_model(hparams)
      3 model.load_state_dict(torch.load(checkpoint_path)['state_dict'])
----> 4 model = model.module
      5 _ = model.eval()

/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py in __getattr__(self, name)
    530                 return modules[name]
    531         raise AttributeError("'{}' object has no attribute '{}'".format(
--> 532             type(self).__name__, name))
    533 
    534     def __setattr__(self, name, value):

AttributeError: 'Tacotron2' object has no attribute 'module'

And then (if model = model.module commented out) on 8th cell:

~/tacotron2/model.py in parse_decoder_outputs(self, mel_outputs, gate_outputs, alignments)
    327         alignments = torch.stack(alignments).transpose(0, 1)
    328         # (T_out, B) -> (B, T_out)
--> 329         gate_outputs = torch.stack(gate_outputs).transpose(0, 1)
    330         gate_outputs = gate_outputs.contiguous()
    331         # (T_out, B, n_mel_channels) -> (B, T_out, n_mel_channels)

RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 1)

Thanks!
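The second error is consistent with a PyTorch 0.4 change: squeeze() on a (1, 1) tensor now yields a 0-dim tensor, so with a batch of one the stacked gate outputs are 1-D and transpose(0, 1) has no dimension 1. A small reproduction, assuming each decoder step emits a gate logit of shape (B, 1); stack_gate_outputs mirrors only the stack/transpose lines in the traceback, not the full parse_decoder_outputs. For the first error, the model returned by load_model is apparently not wrapped in DataParallel, so `model = getattr(model, "module", model)` would unwrap it only when needed.

```python
import torch


def stack_gate_outputs(gate_outputs):
    # (T_out, B) -> (B, T_out), as in the traceback above
    return torch.stack(gate_outputs).transpose(0, 1).contiguous()


# Five decoder steps, each emitting a gate logit of shape (B, 1) with B == 1.
steps = [torch.zeros(1, 1) for _ in range(5)]

# squeeze() drops *every* size-1 dimension, so each step becomes 0-dim,
# the stack is 1-D, and transpose(0, 1) fails.
squeeze_all_failed = False
try:
    stack_gate_outputs([g.squeeze() for g in steps])
except (RuntimeError, IndexError):
    squeeze_all_failed = True

# squeeze(1) drops only the gate dimension and keeps the batch dimension.
gates = stack_gate_outputs([g.squeeze(1) for g in steps])
print(squeeze_all_failed, gates.shape)  # True torch.Size([1, 5])
```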

Possible GPU Memory Leak - Mem Usage Keeps on Increasing with Every Iteration

I'm on Ubuntu 16.04, CUDA 9.1, cuDNN 7.1, one P6000 GPU (compute capability 6.1) with 16 GB of VRAM, Python 3.6, and PyTorch 0.4; all other required libraries are installed and working properly.
I tried batch sizes 4, 8, 12, 16, 24, 44, and 48.
I always get an out-of-memory error after some time.

Training starts, and GPU VRAM usage keeps increasing with every iteration.
I tried correcting a couple of lines in layers.py and another line in train.py, following the deprecation warnings.
I still get out-of-memory errors.

I tried disabling cuDNN in hparams after reading about memory leaks with PyTorch and cuDNN.
I still get out-of-memory errors.

I have a private dataset that mimics the exact attributes, properties, and structure of LJSpeech.
I tried using all 480K WAV training files, 20K WAV test files, and 5K WAV validation files.
I also tried using 12.5K WAV training files, 1K WAV test files, and 500 WAV validation files (same as LJSpeech).
I still get out-of-memory errors.

What could be the problem?
I would greatly appreciate some help with this.
Thanks in advance!
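One frequent cause of memory that grows every iteration in PyTorch code (not confirmed as the cause here) is keeping the loss tensor itself in a running total or a logging list, which accumulates autograd history across iterations; the PyTorch FAQ recommends converting to a Python number first. A minimal sketch of that pattern, with a placeholder model and random data standing in for Tacotron 2 and its batches.

```python
import torch

model = torch.nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

running_loss = 0.0
history = []
for step in range(5):
    x, y = torch.randn(8, 10), torch.randn(8, 1)  # placeholder batch
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # .item() returns a plain float, so no autograd history is kept.
    # `running_loss += loss` would accumulate history across iterations.
    value = loss.item()
    running_loss += value
    history.append(value)

print(running_loss / len(history))
```

If memory still grows with this pattern, the cause lies elsewhere, for example in very long utterances in the dataset.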

Failed to interpret wav file as pickle.

I am getting the following error when I run the train script.
I don't see any preprocessing step in the README to convert WAV files to pickle files. Any leads?

Traceback (most recent call last):
  File "train.py", line 288, in <module>
    args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
  File "train.py", line 206, in train
    for i, batch in enumerate(train_loader):
  File "/nfs_home/dvooturi/venvs/tacotron/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 286, in __next__
    return self._process_next_batch(batch)
  File "/nfs_home/dvooturi/venvs/tacotron/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
OSError: Traceback (most recent call last):
  File "/nfs_home/dvooturi/venvs/tacotron/lib/python3.6/site-packages/numpy/lib/npyio.py", line 428, in load
    return pickle.load(fid, **pickle_kwargs)
_pickle.UnpicklingError: unpickling stack underflow

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/nfs_home/dvooturi/venvs/tacotron/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/nfs_home/dvooturi/venvs/tacotron/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/nfs_home/dvooturi/projects/tts/tacotron2/data_utils.py", line 60, in __getitem__
    return self.get_mel_text_pair(self.audiopaths_and_text[index])
  File "/nfs_home/dvooturi/projects/tts/tacotron2/data_utils.py", line 36, in get_mel_text_pair
    mel = self.get_mel(audiopath)
  File "/nfs_home/dvooturi/projects/tts/tacotron2/data_utils.py", line 48, in get_mel
    melspec = torch.from_numpy(np.load(filename))
  File "/nfs_home/dvooturi/venvs/tacotron/lib/python3.6/site-packages/numpy/lib/npyio.py", line 431, in load
    "Failed to interpret file %s as a pickle" % repr(file))
OSError: Failed to interpret file 'LJSpeech-1.1/wavs/LJ038-0104.wav' as a pickle
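The traceback shows that, on this code path, data_utils.py passes each filelist path straight to np.load, so with load_mel_from_disk enabled the filelist must point at precomputed mel arrays saved with np.save (presumably as (n_mel_channels, T) arrays), not at .wav files. Alternatively, leaving load_mel_from_disk=False computes mels from the audio on the fly. A minimal sketch of rewriting a "path|text" filelist to .npy paths; to_mel_filelist and the mels/ directory are hypothetical, and the arrays themselves must already exist.

```python
from pathlib import PurePosixPath


def to_mel_filelist(lines, mel_dir=None):
    """Rewrite 'audio.wav|text' lines to 'audio.npy|text' (hypothetical helper)."""
    out = []
    for line in lines:
        path, text = line.rstrip("\n").split("|", 1)
        mel_path = PurePosixPath(path).with_suffix(".npy")
        if mel_dir is not None:
            mel_path = PurePosixPath(mel_dir) / mel_path.name
        out.append(f"{mel_path}|{text}")
    return out


lines = ["LJSpeech-1.1/wavs/LJ038-0104.wav|Example transcript.\n"]
print(to_mel_filelist(lines))
# ['LJSpeech-1.1/wavs/LJ038-0104.npy|Example transcript.']
print(to_mel_filelist(lines, mel_dir="mels"))
# ['mels/LJ038-0104.npy|Example transcript.']
```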

Problem with the Mel Spectrogram Representation

Request from issue #24:
After training the model (70,000 iterations, val loss: 0.46, 0.34 <= loss <= 0.49) and converting the resulting mel spectrogram to feed it to r9y9's WaveNet vocoder, the voice sounds like it has the flu.

  • Text: 'This is an example of text to speech synthesis after 9 days training. This may sound awful, but it is a start.'
  • Mel Spec File
  • Audio File
  • Conversion process before feeding to r9y9's WaveNet vocoder:
mel = torch.load(conditional_path)
mel = dynamic_range_decompression(mel)
mel = mel.cpu().numpy()
mel = mel.transpose()

mel = audio._amp_to_db(mel) - hparams.ref_level_db
if not hparams.allow_clipping_in_normalization:
    assert mel.max() <= 0 and mel.min() - hparams.min_level_db >= 0
mel = audio._normalize(mel)

[Image: mel spectrogram]
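Part of the mismatch may lie in the compression step. Tacotron 2's mels are natural logs of clamped magnitudes, whereas the conversion code above targets normalized decibels. Since 20·log10(e^x) = (20 / ln 10)·x ≈ 8.686·x, the conversion can be done directly in the log domain. A minimal numpy sketch, assuming ref_level_db = 20, min_level_db = -100 (placeholders; use the vocoder's hparams) and a normalization of clip((dB - min_level_db) / -min_level_db, 0, 1). Rescaling cannot fix differences in sampling rate, FFT/hop sizes, or mel filterbank range between the two models.

```python
import numpy as np


def log_mel_to_normalized_db(log_mel, ref_level_db=20.0, min_level_db=-100.0):
    """Natural-log mel magnitudes -> dB -> [0, 1], under assumed hparams."""
    db = (20.0 / np.log(10.0)) * np.asarray(log_mel) - ref_level_db
    # Values below min_level_db clip to 0; values at or above 0 dB clip to 1.
    return np.clip((db - min_level_db) / -min_level_db, 0.0, 1.0)


log_mel = np.log(np.array([[1e-5, 1.0, 10.0]]))  # magnitudes 1e-5, 1, 10
print(log_mel_to_normalized_db(log_mel))
```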

compile error!

Hi, when I run "sudo pip install -r requirements.txt", I get the error "Could not find a version that satisfies the requirement torch==0.2.0.post3". I successfully installed PyTorch with Anaconda, using a command like "sudo ./bin/conda install pytorch torchvision -c pytorch". My Python version in Anaconda is 2.7. How can I resolve the error?

train.py fails to train

Hi @rafaelvalle , thanks for the great implementation!

But I can't start training as described in your README (with pytorch-0.4.0):

$ python train.py -o outdir -l logdir
FP16 Run: False
Dynamic Loss Scaling True
Distributed Run: False
cuDNN Enabled: True
cuDNN Benchmark: False
/root/tacotron2/layers.py:35: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  self.conv.weight, gain=torch.nn.init.calculate_gain(w_init_gain))
/root/tacotron2/layers.py:15: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  gain=torch.nn.init.calculate_gain(w_init_gain))
Epoch: 0
Traceback (most recent call last):
  File "train.py", line 272, in <module>
    args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
  File "train.py", line 197, in train
    y_pred = model(x)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 110, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 121, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/scatter_gather.py", line 36, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim) if inputs else []
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/scatter_gather.py", line 29, in scatter
    return scatter_map(inputs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/scatter_gather.py", line 16, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/scatter_gather.py", line 16, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/scatter_gather.py", line 14, in scatter_map
    return Scatter.apply(target_gpus, None, dim, obj)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/_functions.py", line 74, in forward
    outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
  File "/usr/local/lib/python3.5/dist-packages/torch/cuda/comm.py", line 143, in scatter
    chunks = tensor.chunk(len(devices), dim)
RuntimeError: chunk expects at least a 1-dimensional tensor

Thanks!
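"chunk expects at least a 1-dimensional tensor" means DataParallel tried to split a 0-dim tensor across GPUs. One plausible culprit (an assumption, not verified against this traceback) is a scalar in the model input, such as a maximum length computed with torch.max, which PyTorch 0.4 returns as a 0-dim tensor instead of a number. Converting it with .item() avoids the problem, since DataParallel's scatter copies non-tensor inputs unchanged to each replica. The variable names are illustrative.

```python
import torch

input_lengths = torch.tensor([7, 5, 3])

max_len = torch.max(input_lengths)
print(max_len.dim())  # 0: a 0-dim tensor, which cannot be chunked across GPUs

max_len = torch.max(input_lengths).item()
print(type(max_len).__name__, max_len)  # int 7
```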

Audio examples?

Very cool work! However, it would be ideal to also provide examples of input text and output audio from a trained system, alongside held-out examples from the database. This would give an impression of the results the code can generate with the LJSpeech data, and is standard practice in the text-to-speech field.

Aside from synthesising held-out sentences from LJSpeech, Google's speech examples for Tacotron 2 provide another set of challenging text prompts to generate.

Are there any plans to do this? Or are synthesised speech examples already available somewhere?

How is teacher forcing being annealed?

Hi guys,
Some really great work here!

I'm trying to understand how teacher forcing (tf) is applied in this model compared to the Tacotron 2 paper by Google.

As I understand it, the paper applies tf = 1.0 for 50k steps before annealing it (I would imagine all the way to 0).

Whereas in the code here, tf is applied all the time during training:

decoder_input = decoder_inputs[len(mel_outputs) - 1]

And from some of the discussions on this repo, the Tacotron 2 model does seem to generate(?) good samples when trained with full teacher forcing.

I'm quite confused how this could work unless some form of tf annealing is being used.

@rafaelvalle could you please confirm whether I'm missing something?

Thanks!
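For comparison, the line quoted above always feeds the ground-truth previous frame. A minimal sketch of the annealed alternative the question describes: full teacher forcing for the first 50k steps, then a linear decay in the probability of feeding the ground-truth frame instead of the model's own prediction. The end step (100k), the final ratio, and the function names are hypothetical choices, not taken from the paper or this repo.

```python
import random


def teacher_forcing_ratio(step, start=50_000, end=100_000, final=0.0):
    """Probability of feeding the ground-truth frame at a training step."""
    if step <= start:
        return 1.0
    if step >= end:
        return final
    frac = (step - start) / (end - start)
    return 1.0 + frac * (final - 1.0)  # linear from 1.0 down to `final`


def choose_decoder_input(ground_truth_prev, predicted_prev, ratio, rng=random):
    """Pick the previous frame to condition the next decoder step on."""
    return ground_truth_prev if rng.random() < ratio else predicted_prev


for step in (0, 50_000, 75_000, 100_000):
    print(step, teacher_forcing_ratio(step))
```

In the decoder loop, choose_decoder_input would replace the fixed decoder_inputs[len(mel_outputs) - 1] lookup, with predicted_prev taken from the last entry of mel_outputs.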

Applying dropout at inference time

Hello,

Fantastic implementation, and thank you for open-sourcing it!

I've noticed that your implementation applies dropout at inference time to several different layers, but to my understanding, dropout should only be applied to the decoder prenet layers at inference time. I was wondering whether there is a reason behind the current implementation.

Thanks!
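The distinction at stake fits in a few lines: nn.Dropout follows the module's training flag, so model.eval() disables it, whereas F.dropout(x, p, training=True) stays active regardless. The class below is a hypothetical illustration of the two behaviors, not the repo's Prenet; check each layer in model.py to see which form it uses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DropoutDemo(nn.Module):
    """Hypothetical module contrasting the two dropout styles."""

    def __init__(self, p=0.5, always_on=True):
        super().__init__()
        self.p = p
        self.always_on = always_on
        self.dropout = nn.Dropout(p)

    def forward(self, x):
        if self.always_on:
            # training=True ignores model.eval(): dropout stays active.
            return F.dropout(x, p=self.p, training=True)
        # nn.Dropout follows self.training, so model.eval() turns it off.
        return self.dropout(x)


torch.manual_seed(0)
x = torch.ones(1000)
eval_off = DropoutDemo(always_on=False).eval()
eval_on = DropoutDemo(always_on=True).eval()
print(torch.equal(eval_off(x), x))    # True: dropout disabled in eval mode
print(bool((eval_on(x) == 0).any()))  # True: units are still being dropped
```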

CUDA Runtime Error: Out of Memory

I finally got all the errors resolved, but then this new one came up:
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58

Here is the full log:
~/tacotron2$ python3 train.py --output_directory=outdir --log_directory=logdir
FP16 Run: False
Dynamic Loss Scaling: True
Distributed Run: False
cuDNN Enabled: True
cuDNN Benchmark: False
/home/mrbreadwater/tacotron2/layers.py:35: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  self.conv.weight, gain=torch.nn.init.calculate_gain(w_init_gain))
/home/mrbreadwater/tacotron2/layers.py:15: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  gain=torch.nn.init.calculate_gain(w_init_gain))
Epoch: 0
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train.py", line 291, in <module>
    args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
  File "train.py", line 216, in train
    y_pred = model(x)
  File "/home/mrbreadwater/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mrbreadwater/tacotron2/model.py", line 510, in forward
    encoder_outputs, targets, memory_lengths=input_lengths)
  File "/home/mrbreadwater/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mrbreadwater/tacotron2/model.py", line 403, in forward
    decoder_input)
  File "/home/mrbreadwater/tacotron2/model.py", line 363, in decode
    attention_weights_cat, self.mask)
  File "/home/mrbreadwater/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mrbreadwater/tacotron2/model.py", line 77, in forward
    attention_hidden_state, processed_memory, attention_weights_cat)
  File "/home/mrbreadwater/tacotron2/model.py", line 60, in get_alignment_energies
    processed_query + processed_attention_weights + processed_memory))
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58

I'm using a GTX 1050 Ti. Anything I can do to fix it?

EDIT: I'm running Ubuntu 18.04

How to control the style or intonation of the synthesized speech?

Hi, I found that the synthesized speech differs each time I run inference; the style and intonation vary from run to run. How can I control the style or intonation of the synthesized speech?

Another question:
How do I choose the best model from a series of checkpoints? I found that more training iterations do not necessarily give better synthesis quality.
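On the first question: if dropout stays active at inference (as the "Applying dropout at inference time" issue describes), every run samples a different dropout mask, which would explain run-to-run variation. Re-seeding PyTorch's random number generator before each synthesis makes the masks, and hence the output, repeatable. A minimal sketch with a stand-in for the model's inference call; synthesize is hypothetical. This gives repeatability, not control over style.

```python
import torch
import torch.nn.functional as F


def synthesize(x):
    """Stand-in for model.inference(): dropout that is active at inference."""
    return F.dropout(x, p=0.5, training=True)


x = torch.ones(16)

torch.manual_seed(1234)
first = synthesize(x)
torch.manual_seed(1234)
second = synthesize(x)
print(torch.equal(first, second))  # True: same seed, same dropout mask
```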

RuntimeError: torch/csrc/autograd/variable.cpp:138: get_grad_fn: Assertion `output_nr_ == 0` failed.

Hello,

Centos 7
Python 3.6
CUDA 9

I used conda to install PyTorch for this configuration.

I had to remove the version specifications for numpy and torch in your requirements.txt, as I got the error message logged in issue #5.

Now,
numpy = 1.14.3
torch = 0.4.0
tensorflow = 1.6.0

After the requirements installation, when I run train.py, I get the following error.

Please help:


python train.py --output_directory=./outdir --log_directory=./logdir
/home/tacotron2/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
FP16 Run: False
Dynamic Loss Scaling True
Distributed Run: False
cuDNN Enabled: True
cuDNN Benchmark: False
/home/tacotron2/tacotron2/layers.py:35: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  self.conv.weight, gain=torch.nn.init.calculate_gain(w_init_gain))
/home/tacotron2/tacotron2/layers.py:15: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  gain=torch.nn.init.calculate_gain(w_init_gain))
Epoch: 0
Traceback (most recent call last):
  File "train.py", line 285, in <module>
    args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
  File "train.py", line 210, in train
    y_pred = model(x)
  File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
    raise output
  File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
    output = module(*input, **kwargs)
  File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tacotron2/tacotron2/model.py", line 506, in forward
    encoder_outputs = self.encoder(embedded_inputs, input_lengths)
  File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tacotron2/tacotron2/model.py", line 188, in forward
    outputs, _ = self.lstm(x)
  File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 192, in forward
    output, hidden = func(input, self.all_weights, hx, batch_sizes)
  File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 323, in forward
    return func(input, *fargs, **fkwargs)
  File "/home/tacotron2/anaconda3/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 287, in forward
    dropout_ts)
RuntimeError: torch/csrc/autograd/variable.cpp:115: get_grad_fn: Assertion `output_nr_ == 0` failed.

inference failed

When evaluating the model, I get an error:

RuntimeError Traceback (most recent call last)
in ()
5 except:
6 pass
----> 7 model.load_state_dict({k.replace('module.',''):v for k,v in torch.load(checkpoint_path)['state_dict'].items()})
8 _ = model.eval()

/home/btows/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
719 if len(error_msgs) > 0:
720 raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 721 self.__class__.__name__, "\n\t".join(error_msgs)))
722
723 def parameters(self):

RuntimeError: Error(s) in loading state_dict for Tacotron2:
Unexpected key(s) in state_dict: "encoder.convolutions.0.1.num_batches_tracked", "encoder.convolutions.1.1.num_batches_tracked", "encoder.convolutions.2.1.num_batches_tracked", "postnet.convolutions.0.1.num_batches_tracked", "postnet.convolutions.1.1.num_batches_tracked", "postnet.convolutions.2.1.num_batches_tracked", "postnet.convolutions.3.1.num_batches_tracked", "postnet.convolutions.4.1.num_batches_tracked".
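The unexpected num_batches_tracked keys are BatchNorm buffers that newer PyTorch versions save and older ones do not define, which suggests the checkpoint was written by a newer PyTorch than the one loading it. Upgrading PyTorch is the cleaner fix. As a workaround, the sketch below strips a leading "module." prefix and drops keys the current model does not define, reporting them. It strips the prefix only at the start of each key, unlike str.replace in the cell above, which would also rewrite a name containing "submodule.". load_compatible_state_dict is a hypothetical helper.

```python
import torch.nn as nn


def load_compatible_state_dict(model, state_dict, prefix="module."):
    """Load state_dict into model, skipping keys the model doesn't define."""
    cleaned = {(k[len(prefix):] if k.startswith(prefix) else k): v
               for k, v in state_dict.items()}
    expected = set(model.state_dict())
    dropped = sorted(k for k in cleaned if k not in expected)
    # strict=True still reports any keys the model needs but the file lacks.
    model.load_state_dict({k: v for k, v in cleaned.items() if k in expected})
    return dropped


# Demo: a checkpoint saved from a DataParallel-wrapped model plus one extra key.
src = nn.Sequential(nn.Conv1d(2, 2, 1), nn.BatchNorm1d(2))
checkpoint = {"module." + k: v for k, v in src.state_dict().items()}
checkpoint["module.1.extra_buffer"] = src[1].running_mean.clone()

dst = nn.Sequential(nn.Conv1d(2, 2, 1), nn.BatchNorm1d(2))
print(load_compatible_state_dict(dst, checkpoint))  # ['1.extra_buffer']
```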

Incomprehensible Audio generated by GL-vocoder at 169K Iterations

Hi, I have trained a Tacotron 2 model for 169K iterations with batch size 8 on a 1080 Ti GPU, using LJ Speech as training data. But the audio synthesized from this model with the GL vocoder is incomprehensible. The text to be synthesized is "All the wardsmen alike were more or less irresponsible."
[Image: 165k]

The synthesized waveform is incomprehensible.
(Attached: 165k.zip)

Distributed training error

When I train the model with
"python -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True,fp16_run=True"

I got this error:
Traceback (most recent call last):
  File "/export/gpudata/deli/env/envs/py36/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/export/gpudata/deli/env/envs/py36/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/export/docker/JXQ-23-46-49.h.chinabank.com.cn/deli/tacotron2/multiproc.py", line 15, in <module>
    stdout = None if i == 0 else open("logs/{}_GPU_{}.log".format(job_id, i), "w")
FileNotFoundError: [Errno 2] No such file or directory: 'logs/2018_07_10-102521_GPU_1.log'

This error occurs when running multiproc.py, and the training process hangs after showing:
Initializing distributed.

I checked the multiproc.py code and found that this file should be created by the code. Is something going wrong?

Any help would be appreciated.
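The FileNotFoundError comes from open() in write mode: it creates the file but not missing parent directories, so it fails if logs/ does not exist in the working directory. A minimal sketch of the fix, creating the directory first with os.makedirs(..., exist_ok=True). open_worker_log is a hypothetical helper whose filename format mirrors the path in the error message. Running `mkdir logs` before launching multiproc.py has the same effect.

```python
import os
import tempfile


def open_worker_log(job_id, rank, log_dir="logs"):
    """Return a writable log file for worker `rank`; rank 0 keeps stdout."""
    if rank == 0:
        return None
    os.makedirs(log_dir, exist_ok=True)  # open() will not create directories
    return open(os.path.join(log_dir, "{}_GPU_{}.log".format(job_id, rank)), "w")


base = tempfile.mkdtemp()
log_dir = os.path.join(base, "logs")  # does not exist yet
f = open_worker_log("2018_07_10-102521", 1, log_dir=log_dir)
f.close()
print(os.listdir(log_dir))  # ['2018_07_10-102521_GPU_1.log']
```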

ImportError: numpy.core.multiarray failed to import

python train.py --output_directory=outdir --log_directory=logdir
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
Traceback (most recent call last):
  File "train.py", line 15, in <module>
    from model import Tacotron2
  File "/home/adm101/tacotron2/model.py", line 5, in <module>
    from layers import ConvNorm, LinearNorm
  File "/home/adm101/tacotron2/layers.py", line 2, in <module>
    from librosa.filters import mel as librosa_mel_fn
  File "/anaconda/envs/py35/lib/python3.5/site-packages/librosa/__init__.py", line 12, in <module>
    from . import core
  File "/anaconda/envs/py35/lib/python3.5/site-packages/librosa/core/__init__.py", line 102, in <module>
    from .time_frequency import * # pylint: disable=wildcard-import
  File "/anaconda/envs/py35/lib/python3.5/site-packages/librosa/core/time_frequency.py", line 10, in <module>
    from ..util.exceptions import ParameterError
  File "/anaconda/envs/py35/lib/python3.5/site-packages/librosa/util/__init__.py", line 70, in <module>
    from . import decorators
  File "/anaconda/envs/py35/lib/python3.5/site-packages/librosa/util/decorators.py", line 9, in <module>
    from numba.decorators import jit as optional_jit
  File "/anaconda/envs/py35/lib/python3.5/site-packages/numba/__init__.py", line 12, in <module>
    from .special import typeof, prange
  File "/anaconda/envs/py35/lib/python3.5/site-packages/numba/special.py", line 4, in <module>
    from .parfor import prange
  File "/anaconda/envs/py35/lib/python3.5/site-packages/numba/parfor.py", line 23, in <module>
    from numba import array_analysis, postproc, typeinfer
  File "/anaconda/envs/py35/lib/python3.5/site-packages/numba/array_analysis.py", line 26, in <module>
    from numba.extending import intrinsic
  File "/anaconda/envs/py35/lib/python3.5/site-packages/numba/extending.py", line 15, in <module>
    from .pythonapi import box, unbox, reflect, NativeValue
  File "/anaconda/envs/py35/lib/python3.5/site-packages/numba/pythonapi.py", line 14, in <module>
    from numba import types, utils, cgutils, lowering, _helperlib
ImportError: numpy.core.multiarray failed to import
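The `RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb` line suggests that a compiled extension in this import chain (numba, pulled in by librosa) was built against a newer NumPy C API than the NumPy that is installed, so upgrading NumPy is the usual direction to try. Below is a minimal stdlib sketch for checking an installed version string against a minimum; the helper names are hypothetical, and the threshold `"1.14"` used in the usage note is an illustrative assumption, not a value confirmed by this report.

```python
def parse_version(version):
    """Turn a version string such as '1.13.3' into a tuple of ints (1, 13, 3).

    Non-numeric suffixes (e.g. '1.15.0rc1') are cut at the first non-digit
    character of each component; parsing stops at a component with no digits.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is >= the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so (1, 14) compares equal to (1, 14, 0).
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

For example, after `import numpy`, `meets_minimum(numpy.__version__, "1.14")` returning False would point towards `pip install --upgrade numpy`.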

Can't launch: RuntimeError

When I run train.py after preparing the data, it fails. Is this an OOM (out-of-memory) issue?

File "train.py", line 79, in load_model
model = Tacotron2(hparams).cuda()
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/THCTensorRandom.cu:25
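Because the error is raised in `load_model` while the model is being moved to the GPU, before any batch is processed, it may indicate (this report alone does not confirm it) that the selected GPU is already occupied by another process; checking `nvidia-smi` and pinning a free GPU with `CUDA_VISIBLE_DEVICES` is worth trying, and lowering `batch_size` through the `--hparams` override (used elsewhere in these issues, e.g. `--hparams=batch_size=4`) helps once training itself starts. A stdlib sketch that builds such a launch command; `hparams_arg` and `train_command` are hypothetical helpers:

```python
import os


def hparams_arg(overrides):
    """Build a '--hparams=k1=v1,k2=v2' argument from a dict of overrides."""
    return "--hparams=" + ",".join(f"{k}={v}" for k, v in overrides.items())


def train_command(gpu_ids, overrides, output_dir="outdir", log_dir="logdir"):
    """Return (argv, env) for launching train.py on specific GPUs.

    gpu_ids: GPU indices to expose, e.g. [1] to avoid a busy GPU 0
    overrides: hyperparameter overrides, e.g. {"batch_size": 24}
    """
    env = dict(os.environ)
    # Only the listed GPUs are visible to the training process.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    argv = [
        "python", "train.py",
        f"--output_directory={output_dir}",
        f"--log_directory={log_dir}",
        hparams_arg(overrides),
    ]
    return argv, env
```

The result can be passed straight to `subprocess.run(argv, env=env)`.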

How to generate wave or mel-spectrum by tacotron2?

Hi, I have successfully trained some Tacotron 2 models on the LJ Speech dataset. Now I want to synthesize waveforms or generate mel-spectrogram features. I can't use the Jupyter notebook because I work through a remote terminal. So my questions are:

  1. How do I synthesize a waveform and save it to a given file on the remote server?
  2. How do I generate the mel-spectrogram feature and save it to a given file on the remote server?

Any help would be greatly appreciated! Thanks in advance!
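For question 1, once a waveform is available as floating-point samples in [-1, 1] (for example the Griffin-Lim output), it can be written to disk from a plain script. A minimal stdlib sketch using the `wave` module; it assumes mono audio at the 22,050 Hz rate of LJ Speech, and `save_wav` is a hypothetical helper. For question 2, calling `np.save(path, mel)` on the mel array, as another issue in this list does, is enough.

```python
import struct
import wave


def save_wav(path, samples, sampling_rate=22050):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file.

    Values outside [-1, 1] are clipped before conversion to int16.
    """
    frames = bytearray()
    for x in samples:
        x = max(-1.0, min(1.0, float(x)))
        # Scale to the int16 range; 32767 keeps +1.0 from overflowing.
        frames += struct.pack("<h", int(round(x * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 16-bit samples
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(frames))
```

With a NumPy array `audio`, `save_wav("out.wav", audio.tolist())` works; the per-sample loop is slow for long clips but keeps the sketch dependency-free.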

Impossible to change n_frames_per_step

First of all I would like to thank you for this incredible and clean repository.

I am currently working on a school project and I need to perform TTS. The problem is that I only have an 8 GB GPU (NVIDIA GeForce GTX 980M) and I can't afford a cloud instance or a dedicated AI machine. So, to make the training work, I had to bring the batch size down to 24. After reading the paper and Rayhan's implementation, I also tried setting n_frames_per_step to 5 and ran into some issues.

Here are some of the issues I have:

1. n_frames_per_step can't be changed:
If I try to set n_frames_per_step to any value other than 1 (here 5), I get this error.

Epoch: 0
Traceback (most recent call last):
  File "train.py", line 288, in <module>
    args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
  File "train.py", line 213, in train
    y_pred = model(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/yliess/Storage/Projects/tacotron2-master/model.py", line 508, in forward
    encoder_outputs, targets, memory_lengths=input_lengths)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/yliess/Storage/Projects/tacotron2-master/model.py", line 395, in forward
    decoder_inputs = self.parse_decoder_inputs(decoder_inputs)
  File "/media/yliess/Storage/Projects/tacotron2-master/model.py", line 307, in parse_decoder_inputs
    int(decoder_inputs.size(1)/self.n_frames_per_step), -1)
RuntimeError: invalid argument 2: View size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Call .contiguous() before .view(). at /pytorch/aten/src/THC/generic/THCTensor.c:276
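The error message points at the first fix: `parse_decoder_inputs` appears to transpose the tensor before calling `.view()`, and a transposed tensor is not contiguous, so calling `.contiguous()` before `.view()` (or using `.reshape()` in newer PyTorch versions) avoids the error. A second requirement is that the number of mel frames be divisible by `n_frames_per_step`, otherwise the frames cannot be grouped into decoder steps. A stdlib sketch of that padding-and-grouping logic on a list of frames; `pad_frames` and `group_frames` are hypothetical helpers, and padding with zero frames is an assumption:

```python
def pad_frames(frames, r, pad_value=0.0):
    """Pad a list of mel frames so its length is a multiple of r.

    frames: list of frames, each a list of n_mel_channels floats
    r: frames predicted per decoder step (n_frames_per_step)
    """
    n_channels = len(frames[0])
    remainder = len(frames) % r
    if remainder:
        frames = frames + [[pad_value] * n_channels for _ in range(r - remainder)]
    return frames


def group_frames(frames, r):
    """Concatenate every r consecutive frames into one decoder-step vector.

    Mirrors a (T, n_mel) -> (T // r, r * n_mel) row-major reshape.
    """
    if len(frames) % r:
        raise ValueError("frame count must be a multiple of r; pad first")
    return [sum(frames[i:i + r], []) for i in range(0, len(frames), r)]
```

With r = 5, a 7-frame utterance is padded to 10 frames and becomes 2 decoder steps of 5 * n_mel_channels values each.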

2. I have trained on the LJ Speech dataset for about 4 days, and after some iterations the loss got stuck between 0.5 and 0.7 and does not go any lower:

I am currently at about 27,500 iterations and the loss is stuck. I also don't see any improvement in the attention alignment. How long do you think it will take?

Train loss 28001 0.573519 Grad Norm 1.283919 5.50s/it
Train loss 28002 0.616863 Grad Norm 2.458358 5.09s/it
Train loss 28003 0.562185 Grad Norm 1.225074 5.42s/it
Train loss 28004 0.551899 Grad Norm 1.262375 5.45s/it
Train loss 28005 0.545977 Grad Norm 0.926398 5.34s/it
Train loss 28006 0.611665 Grad Norm 0.943002 5.68s/it
Train loss 28007 0.599455 Grad Norm 0.933666 5.64s/it
Train loss 28008 0.632478 Grad Norm 0.780330 5.38s/it
Train loss 28009 0.648359 Grad Norm 0.839933 5.14s/it
Train loss 28010 0.605244 Grad Norm 1.070807 5.34s/it
Train loss 28011 0.582034 Grad Norm 0.945326 5.23s/it

This is the training loss from step 8k to step 28k:
[Figure: loss]
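When judging whether the loss has really plateaued, averages over a window are easier to read than single iterations, since the per-step values above fluctuate by several hundredths. A stdlib sketch that parses lines in the `Train loss <iteration> <loss> Grad Norm <norm> <time>s/it` format shown above and averages the loss over non-overlapping windows; the format is inferred from these log lines only, and the function names are hypothetical:

```python
import re

# Matches lines such as "Train loss 28001 0.573519 Grad Norm 1.283919 5.50s/it"
LINE_RE = re.compile(
    r"Train loss (\d+) ([\d.]+) Grad Norm ([\d.]+) ([\d.]+)s/it"
)


def parse_log(lines):
    """Return (iteration, loss, grad_norm) tuples for the lines that match."""
    records = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            records.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
    return records


def window_means(losses, window):
    """Mean loss over consecutive, non-overlapping windows of `window` steps."""
    return [
        sum(losses[i:i + window]) / window
        for i in range(0, len(losses) - window + 1, window)
    ]
```

For example, `window_means([loss for _, loss, _ in parse_log(open("train.log"))], 1000)` gives one mean per 1,000 iterations (the log file name is an assumption).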

Not sure how to generate mels to be used with a pre-trained wavenet vocoder.

I converted the inference.ipynb code into an inference.py script so that it runs on the GPU and saves the plots, the audio (Griffin-Lim) and the mel-spectrogram.
The plots and audio are good, but when I feed the generated mel into the pre-trained wavenet_vocoder (https://github.com/r9y9/wavenet_vocoder), the generated audio is not reasonable.
I wonder whether this way of saving the mel-spectrogram is wrong.

mel_outputs, mel_outputs_postnet, _, alignments = model.inference(sequence)
np.save("test_mel.npy", mel_outputs.data.cpu().numpy()[0].T, allow_pickle=False)
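Before handing the array to another vocoder it is worth checking its orientation and value range. `mel_outputs.data.cpu().numpy()[0]` has shape `(n_mel_channels, T)`, so `.T` yields `(T, n_mel_channels)`; and, assuming this repository's default `dynamic_range_compression` (natural log of magnitudes clamped at 1e-5), the values are log-amplitudes that can go down to about ln(1e-5) ≈ -11.5 rather than lying in [0, 1]. The snippet also saves `mel_outputs` (before the postnet) rather than `mel_outputs_postnet`. A stdlib sketch that summarises a mel given as a list of frames; `describe_mel` is a hypothetical helper:

```python
import math


def describe_mel(frames):
    """Summarise a mel-spectrogram stored as a list of frames (T x n_mels).

    Returns the frame count, channel count and value range, plus a flag
    telling whether every value already lies in [0, 1].
    """
    values = [v for frame in frames for v in frame]
    lo, hi = min(values), max(values)
    return {
        "n_frames": len(frames),
        "n_channels": len(frames[0]),
        "min": lo,
        "max": hi,
        "in_unit_range": 0.0 <= lo and hi <= 1.0,
    }


# Floor of the log-compressed mel, ASSUMING log(clamp(x, min=1e-5)).
LOG_MEL_FLOOR = math.log(1e-5)  # about -11.51
```

Usage: `describe_mel(np.load("test_mel.npy").tolist())`; a minimum near `LOG_MEL_FLOOR` and `in_unit_range` being False indicate the array still needs rescaling for a vocoder trained on [0, 1] features.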

How to use r9y9's pretrained wavenet vocoder to synthesize speech?

Hi, I use r9y9's pre-trained WaveNet vocoder to synthesize speech, but I get noisy audio (20180510_mixture_lj_checkpoint_step000320000_ema.wav.zip). The audio generated with my Griffin-Lim (GL) vocoder is normal (test.wav.zip).

I use the following code to save the mel-spectrum generated by tacotron2:
np.save("./works/test_mel.npy", mel_outputs_postnet.data.cpu().numpy()[0].T, allow_pickle=False)

And I use the following code to convert the mel-spectrum generated by tacotron2 into one matched with r9y9's pre-trained wavenet vocoder:
import numpy as np
import torch
from layers import TacotronSTFT  # module from the Tacotron 2 repository

# STFT/mel settings intended to match r9y9's pre-trained WaveNet
filter_length = 1024
hop_length = 256
win_length = 1024
sampling_rate = 22050
mel_fmin = 125
mel_fmax = 7600

taco_stft_other = TacotronSTFT(
    filter_length, hop_length, win_length,
    sampling_rate=sampling_rate, mel_fmin=mel_fmin, mel_fmax=mel_fmax)

# Project from Spectrogram to r9y9's WaveNet Mel-Spectrogram
# (spec_from_mel: linear-magnitude spectrogram, defined earlier; not shown)
mel_minmax = taco_stft_other.spectral_normalize(
    torch.matmul(taco_stft_other.mel_basis, spec_from_mel))

np.save("./works/test_mel_wnet.npy", mel_minmax.data.cpu().numpy()[0].T, allow_pickle=False)

And the code below is used to synthesize speech:
python synthesis.py --preset=./pre-trained_models/20180510_mixture_lj_checkpoint_step000320000_ema.json --conditional=/qwork1/hlw74/ttswork/tacotron2/works/test_mel_wnet.1.npy ./pre-trained_models/20180510_mixture_lj_checkpoint_step000320000_ema.pth ./works
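The snippet above ends with `spectral_normalize`, which (assuming this repository's default `dynamic_range_compression`) produces natural-log amplitudes, whereas r9y9's vocoder, according to the normalization function quoted in another issue in this list, expects `clip((S - min_level_db) / -min_level_db, 0, 1)` with `S` in dB relative to `ref_level_db`. A minimal stdlib sketch of that rescaling; the defaults `min_level_db=-100` and `ref_level_db=20` are assumptions about r9y9's hyperparameters and should be checked against the preset JSON, and other mismatches (STFT scaling, filterbank) are not handled:

```python
import math

# 20 * log10(e): converts a natural-log amplitude to decibels.
DB_PER_NEPER = 20.0 / math.log(10.0)


def log_mel_to_unit_range(frames, min_level_db=-100.0, ref_level_db=20.0):
    """Rescale natural-log mel amplitudes to r9y9-style [0, 1] values.

    frames: list of frames (T x n_mels) of log-amplitudes, e.g. the output
        of spectral_normalize (ASSUMED to be log(clamp(x, 1e-5))).
    min_level_db, ref_level_db: ASSUMED r9y9 defaults; check the preset.
    """
    out = []
    for frame in frames:
        row = []
        for y in frame:
            s_db = DB_PER_NEPER * y - ref_level_db  # amplitude dB minus reference
            norm = (s_db - min_level_db) / -min_level_db
            row.append(min(1.0, max(0.0, norm)))  # clip to [0, 1]
        out.append(row)
    return out
```

With these defaults, an amplitude of 1 (log value 0) maps to 0.8 and the 1e-5 floor maps to 0.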

Incomprehensible Audio After 70K Iterations - What Could I be Doing Wrong?

I tested with batch size 48 but got the same result after 26K iterations.
I continued with batch size 8 up to 70K iterations, but still got the same result.
I'm using a private augmented American English dataset (500+ hours) with the same audio properties as the LJSpeech WAVs (22,050 Hz, 16-bit, max length 10092).
Below are the saved GL-synthesized WAV and figure for the text "I am testing text to speech synthesis of this Tacotron2 model that has been trained for seventy thousand iterations." ...

https://soundcloud.com/user-345563487/70k-iters-nvidia-t2-gl/s-6XaCo
[Figure: 70kiters_t2gl]

Zoneout

Hi, great repo and many thanks for putting your work up here for everyone.

One thing I wanted to mention: I had a quick look at your code and it doesn't look like there is zoneout regularisation on the LSTM decoders. Do you plan to add it, or did I perhaps miss it when reading the code?
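For reference, zoneout regularises a recurrent layer by randomly keeping each hidden unit's previous value instead of its update during training, and by using the expected mixture of the two at inference. A framework-free sketch of that update rule on plain Python lists; `zoneout` is a hypothetical helper, and wiring it into the decoder LSTMs (hidden and cell states) is not shown:

```python
import random


def zoneout(prev, new, rate, training, rng=random):
    """Apply zoneout to one recurrent state vector.

    prev: state from the previous time step (list of floats)
    new: freshly computed state for this time step
    rate: probability of keeping each unit's previous value
    training: if True sample a random mask; otherwise use the expectation
    rng: source of randomness (anything with a .random() method)
    """
    if training:
        # Each unit independently keeps its old value with probability `rate`.
        return [p if rng.random() < rate else n for p, n in zip(prev, new)]
    # At inference, blend deterministically with the same expected weights.
    return [rate * p + (1.0 - rate) * n for p, n in zip(prev, new)]
```

Passing a seeded `random.Random(0)` as `rng` makes the training-time mask reproducible.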

Hangs when operating in multi-gpu mode.

I got this to train on a single GTX 1080 Ti, but when I try to run it on a machine with 4 × GTX 1080 Ti,

it never returns from:

torch.distributed.init_process_group(
    backend=hparams.dist_backend, init_method=hparams.dist_url,
    world_size=n_gpus, rank=rank, group_name=group_name)

I've tried a few different backends (nccl, tcp, gloo). I installed NCCL from NVIDIA.

These are the parameters in the function:

init method: file://distributed.dpt
backend: nccl
rank: 0
group: group_2018_06_12-204429
n_gpus: 4

Is it correct to use distributed_run=True even if the gpus are in the same machine?

I'm using CUDA 8.0 on Ubuntu 16.04.
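Since the init method here is `file://distributed.dpt`, one thing worth ruling out (an assumption, not a confirmed diagnosis for this report) is a rendezvous file left over from an earlier, interrupted run, which the processes may then wait on. A stdlib sketch that removes such a file before launching; `clear_file_rendezvous` is a hypothetical helper and should only be run while no training job is active:

```python
import os
from urllib.parse import urlparse


def clear_file_rendezvous(init_method):
    """Delete the file behind a 'file://...' init method if it exists.

    Returns the path that was removed, or None if nothing was removed
    (non-file init methods such as 'tcp://...' are ignored).
    """
    parsed = urlparse(init_method)
    if parsed.scheme != "file":
        return None
    # 'file://distributed.dpt' parses with the name in netloc, not path.
    path = parsed.netloc + parsed.path
    if path and os.path.exists(path):
        os.remove(path)
        return path
    return None
```

For example, `clear_file_rendezvous("file://distributed.dpt")` removes `distributed.dpt` from the current directory if it is present.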

Some questions about the implementation

Hi, thank you for sharing the great work.
I have some questions about the implementation.

The first question is about padding in data preparation.
https://github.com/NVIDIA/tacotron2/blob/master/data_utils.py#L103-107
The values of gate_padded[i] will all be 0 when the i-th length is the maximum length.
Is that correct?
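The concern can be made concrete: if the stop (gate) target is set to 1 only at indices `>= length`, a sequence whose length equals the padded maximum gets no positive gate target at all, whereas marking from the last real frame onward (`>= length - 1`) always yields at least one. A stdlib sketch of both variants; `gate_targets` is a hypothetical helper, not the repository's code:

```python
def gate_targets(lengths, max_len, include_last_frame=True):
    """Build stop-token targets for a batch of mel lengths.

    include_last_frame=True marks the last real frame and all padding as 1;
    False marks only the padding, so a max-length sequence gets no 1 at all.
    """
    targets = []
    for length in lengths:
        start = length - 1 if include_last_frame else length
        targets.append([1.0 if t >= start else 0.0 for t in range(max_len)])
    return targets
```

For lengths `[4, 2]` padded to 4 frames, the padding-only variant gives the first sequence all zeros, which is the situation described above.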

The second question is about the network architecture of the decoder.
https://github.com/NVIDIA/tacotron2/blob/master/model.py#L341
The LSTM structure looks different from the original.
Is there any specific reason?

Thank you in advance!

scaling Mel Spectrogram output for Wavenet Vocoder

Hello,

First of all thanks for the nice Tacotron 2 implementation.

I'm trying to use the trained Tacotron 2 outputs as inputs to r9y9's WaveNet vocoder. However, his pre-trained WaveNet works on mel-spectrograms scaled to [0, 1].

What is the output range of this Tacotron 2 implementation? I'm having a hard time finding it out so that I can rescale accordingly.

For reference, this is r9y9's normalization function, which he applies to the mel-spectrogram before training and which scales it to between 0 and 1:

def _normalize(S):
    return np.clip((S - hparams.min_level_db) / -hparams.min_level_db, 0, 1)
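Assuming this repository's default `dynamic_range_compression` (natural log of the mel amplitude x, clamped at 1e-5, compression factor 1), the output y has a floor of about -11.51 but no fixed upper bound, since that depends on the STFT magnitudes. The mapping onto the normalization above can be written as follows, where `ref_level_db` and `min_level_db` are r9y9's hyperparameters and their values must be taken from his preset:

```latex
\begin{aligned}
y &= \ln\bigl(\max(x,\,10^{-5})\bigr) \;\ge\; \ln 10^{-5} \approx -11.51 \\
S &= \frac{20}{\ln 10}\,y \;-\; \mathrm{ref\_level\_db} \\
\hat{S} &= \operatorname{clip}\!\left(\frac{S - \mathrm{min\_level\_db}}{-\mathrm{min\_level\_db}},\; 0,\; 1\right)
\end{aligned}
```

With the commonly used values ref_level_db = 20 and min_level_db = -100 (an assumption to verify), the floor maps to S = -120 dB and is clipped to 0, while y = 0 (amplitude 1) maps to 0.8.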

Tensorflow dependency

Hi!
Is TensorFlow used for anything other than the hyperparameter object and TensorBoard?

Thanks!

ModuleNotFoundError: No module named 'PIL'

Hi, when I run the command "/usr/bin/python train.py --output_directory=works --log_directory=works/log --hparams=batch_size=4", I get the error "ModuleNotFoundError: No module named 'PIL'". The full log is as follows:
FP16 Run: False
Dynamic Loss Scaling: True
Distributed Run: False
cuDNN Enabled: True
cuDNN Benchmark: False
/qwork1/hlw74/ttswork/tacotron2/layers.py:35: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
self.conv.weight, gain=torch.nn.init.calculate_gain(w_init_gain))
/qwork1/hlw74/ttswork/tacotron2/layers.py:15: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
gain=torch.nn.init.calculate_gain(w_init_gain))
Epoch: 0
train.py:219: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
if hparams.distributed_run else loss.data[0]
train.py:227: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
model.parameters(), hparams.grad_clip_thresh)
Train loss 0 50.199665 Grad Norm 23.629131 3.03s/it
train.py:144: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
if distributed_run else loss.data[0]
Validation loss 0: 39.167839
Traceback (most recent call last):
File "train.py", line 291, in <module>
args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
File "train.py", line 250, in train
reduced_val_loss, model, y, y_pred, iteration)
File "/qwork1/hlw74/ttswork/tacotron2/logger.py", line 34, in log_validation
iteration)
File "/usr/local/python3/lib/python3.6/site-packages/tensorboardX/writer.py", line 338, in add_image
self.file_writer.add_summary(image(tag, img_tensor), global_step)
File "/usr/local/python3/lib/python3.6/site-packages/tensorboardX/summary.py", line 158, in image
image = make_image(tensor)
File "/usr/local/python3/lib/python3.6/site-packages/tensorboardX/summary.py", line 164, in make_image
from PIL import Image
ModuleNotFoundError: No module named 'PIL'
*** Error in `/usr/bin/python': double free or corruption (!prev): 0x00000000025cf470 ***
Aborted (core dumped)
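The `PIL` module is provided by the Pillow package (`pip install Pillow`), which tensorboardX needs for `add_image`, as the traceback shows. A stdlib sketch that reports missing top-level modules before a long training run starts, so the failure does not surface only at the first validation step; `missing_modules` is a hypothetical helper and the module list is up to the user:

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, `missing_modules(["PIL", "tensorboardX", "librosa"])` returns `["PIL"]` in the environment described above.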

Distributed training hangs after "Initializing distributed" message

Hello everyone

I'm trying to run Tacotron2 training in distributed mode, but when I run the command it hangs and never reaches the training phase. No errors are shown.

The command I use:

CUDA_VISIBLE_DEVICES=0,1 python3 -m multiproc train.py --output_directory=outdir --log_directory=logdir -c=outdir/checkpoint_86000  --hparams=batch_size=36,distributed_run=True

What I tried so far:

  1. Switch distributed.py file to this one: https://github.com/yhgon/tacotron2/blob/master/distributed.py
  2. Use apex.parallel.multiproc instead of multiproc from the repo. In this case I get this error:
Traceback (most recent call last):
  File "/home/soul/anaconda3/lib/python3.6/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/home/soul/anaconda3/lib/python3.6/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/home/soul/anaconda3/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/__init__.py", line 2, in <module>
    from . import reparameterization
  File "/home/soul/anaconda3/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/reparameterization/__init__.py", line 1, in <module>
    from .weight_norm import WeightNorm
  File "/home/soul/anaconda3/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/reparameterization/weight_norm.py", line 3, in <module>
    from ..fp16_utils import Fused_Weight_Norm
  File "/home/soul/anaconda3/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/fp16_utils/__init__.py", line 13, in <module>
    from .fused_weight_norm import Fused_Weight_Norm
  File "/home/soul/anaconda3/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/fp16_utils/fused_weight_norm.py", line 4, in <module>
    import apex_C
ImportError: /home/soul/anaconda3/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZTVNSt7__cxx1115basic_stringbufIcSt11char_traitsIcESaIcEEE
  3. Follow the advice from this thread (no luck): https://devtalk.nvidia.com/default/topic/1036195/container-pytorch/imagenet-hang-on-dgx-1-when-using-multiple-gpus-/
  4. Reboot the server

My set-up:
Ubuntu 16.04.1
PyTorch 0.4
CUDA 9.0 with cuDNN

Any ideas would be greatly appreciated.

P.S. As far as I can tell, the code hangs on this line from train.py:
torch.distributed.init_process_group()
but I'm not familiar enough with PyTorch to debug it further.
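Since the hang produces no error, it can help to make it fail loudly. A stdlib sketch that runs a blocking call (such as the process-group initialization) in a daemon thread and reports whether it finished within a time limit; `finishes_within` is a hypothetical helper and any limit you pick is arbitrary. Separately, the apex traceback above ends in an `undefined symbol` error mentioning `__cxx11`, which suggests apex was compiled against a different C++ ABI or PyTorch build than the one installed.

```python
import threading


def finishes_within(fn, seconds):
    """Run fn() in a daemon thread; return True if it returns within `seconds`.

    An exception raised by fn() is re-raised in the caller. If the call is
    still running after the timeout, it is left running in the background
    and False is returned.
    """
    errors = []

    def target():
        try:
            fn()
        except BaseException as exc:  # forward any failure to the caller
            errors.append(exc)

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(seconds)
    if worker.is_alive():
        return False
    if errors:
        raise errors[0]
    return True
```

Wrapping the `init_process_group` call from train.py (with its original arguments) in a lambda and printing the rank when `finishes_within` returns False shows which processes never got past the rendezvous.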
