
Comments (23)

rafaelvalle commented on May 23, 2024

Yes, train it by progressively adding steps of flow as the model learns to attend on each step of flow.
Start with 1 step of flow, train it until it learns attention, use this model to warm-start a model with 2 steps of flow, and so on...

from flowtron.
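A minimal sketch of the warm-start step in this schedule, assuming a Flowtron-style checkpoint that stores a `state_dict` entry; the function name and the shape-matching heuristic are illustrative, not part of the Flowtron API:

```python
import torch

def warmstart_from_fewer_flows(model, checkpoint_path):
    """Initialize a model (e.g. n_flows=2) from a checkpoint trained
    with fewer steps of flow (e.g. n_flows=1).

    Only parameters whose names and shapes match are copied; the new
    flow step keeps its fresh random initialization.
    """
    pretrained = torch.load(checkpoint_path, map_location="cpu")["state_dict"]
    current = model.state_dict()
    matched = {k: v for k, v in pretrained.items()
               if k in current and v.shape == current[k].shape}
    current.update(matched)
    model.load_state_dict(current)
    return model
```

Train the warm-started model until the new flow step also learns attention, then repeat the procedure for 3 steps of flow, and so on.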

dmitrii-obukhov commented on May 23, 2024

@rafaelvalle Thanks for your reply

I ran a new training run with model_config.n_flows=1, but after 16 hours the attention weights still look bad:

[attention weights plot]

In one of the threads I read that good alignment is produced in less than 24 hours.

So, what could be wrong?


rafaelvalle commented on May 23, 2024

Can you share your tensorboard plots?


dmitrii-obukhov commented on May 23, 2024

Yes

[tensorboard loss plots]


rafaelvalle commented on May 23, 2024

Does it have good attention around 60k iterations?


dmitrii-obukhov commented on May 23, 2024

No. Attention on all iterations looks the same


rafaelvalle commented on May 23, 2024

Make sure you trim silences from the beginning and end of your audio files


dmitrii-obukhov commented on May 23, 2024

I use LJSpeech dataset for training. Any instructions on how to trim them?

Could the problem be that I use distributed training?

Also, I set the flag fp16_run=true


adrianastan commented on May 23, 2024

Make sure you trim silences from the beginning and end of your audio files

Should there be no silence at all in the beginning and end, or should there be at least, let's say 0.1 seconds of silence?


adrianastan commented on May 23, 2024

I use LJSpeech dataset for training. Any instructions on how to trim them?

The simplest way would be to use librosa.effects.trim()


rafaelvalle commented on May 23, 2024

There should be no silence at all at the beginning and end of each audio file. sox and librosa.effects.trim can be used to trim silences from the beginning and end.


kurbobo commented on May 23, 2024

I have a similar problem.

I use LJSpeech dataset for training. Any instructions on how to trim them?

Could the problem be that I use distributed training?

Also, I set the flag fp16_run=true

Have you solved this problem?
Also, I tried to predict LPC features instead of mel-spectrograms, but I always get a picture like this. Does anybody know what the problem is?
[attention plot]


dmitrii-obukhov commented on May 23, 2024

The problem remained unresolved.
I tried to trim silences from the beginning and end of the audio files with librosa.effects.trim(), but the picture remains the same.


rafaelvalle commented on May 23, 2024

@kurbobo does the attention map always look like that? You might have to change the mask from byte to bool:
https://github.com/NVIDIA/flowtron/blob/master/flowtron.py#L33

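The suggested fix, sketched here assuming `get_mask_from_lengths` follows the usual Tacotron-style pattern (the exact code is at the linked line): change the mask dtype from byte to bool, which is what newer PyTorch versions expect in masked_fill and attention masking:

```python
import torch

def get_mask_from_lengths(lengths):
    """Boolean mask: True for valid positions, False for padding.

    Older code ended with .byte(); newer PyTorch expects .bool()
    wherever the mask feeds masked_fill or attention masking.
    """
    max_len = torch.max(lengths).item()
    ids = torch.arange(max_len, device=lengths.device)
    mask = (ids < lengths.unsqueeze(1)).bool()
    return mask
```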

rafaelvalle commented on May 23, 2024

@adrianastan There should be no silence at the beginning or at the end of an audio file.


zjFFFFFF commented on May 23, 2024

@rafaelvalle Can you tell me what "no silence" means exactly? If I use librosa.effects.trim(), what should top_db be set to?
For my dataset, if I set top_db to 20, some speech also gets cut off; if I set it a little higher, some audio files still have silence at the beginning.


dmitrii-obukhov commented on May 23, 2024

@kurbobo The problem was solved when I used the encoder and embedding layers from a pretrained model.

@zjFFFFFF In my case top_db = 30 works well enough

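The workaround of reusing encoder and embedding layers from a pretrained model can be sketched as a partial state-dict load; the `encoder.` and `embedding.` prefixes are assumptions about the parameter naming, not verified against a Flowtron checkpoint:

```python
import torch

def load_encoder_and_embedding(model, pretrained_path,
                               prefixes=("encoder.", "embedding.")):
    """Copy only encoder/embedding weights from a pretrained checkpoint."""
    pretrained = torch.load(pretrained_path, map_location="cpu")["state_dict"]
    subset = {k: v for k, v in pretrained.items()
              if k.startswith(prefixes)}
    # strict=False leaves every other parameter at its fresh initialization
    model.load_state_dict(subset, strict=False)
    return model
```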

zjFFFFFF commented on May 23, 2024

@DLeos

In fact, I got the same plot as you (training from scratch). But the validation loss does not seem to reflect the listening results: between iterations 800,000 and 950,000 the model can generate acceptable audio (at iteration 1,000,000 I can't get acceptable results). So you can try different checkpoints one by one.


kurbobo commented on May 23, 2024

@kurbob does the attention map always look like that? You might have to change from byte to bool
https://github.com/NVIDIA/flowtron/blob/master/flowtron.py#L33
@rafaelvalle
No, it does not always appear, and I had already fixed the byte/bool problem before training, but it still sometimes happens.
I have one more question: am I right that Flowtron in this repo converts every sentence to an ARPAbet transcription and then trains to map the sequence of ARPAbet symbols to a sequence of spectrogram frames?



Liujingxiu23 commented on May 23, 2024

@kurbobo @rafaelvalle I tried mels: train with n_flows=1 first, then use that model to warm-start the n_flows=2 model. Both alignments are right and the synthesized wavs are good. But with the LPC parameters used by the LPCNet vocoder, everything seems fine when n_flows=1 (the loss is good and the alignment is right); however, when I train n_flows=2 warm-started from the trained n_flows=1 model, the second alignment fails and the loss just oscillates without descending.


rafaelvalle commented on May 23, 2024

@Liujingxiu23 please share training and validation losses and attention plots for the 1-step-of-flow model and the 2-steps-of-flow model.


rafaelvalle commented on May 23, 2024

Did you warm-start the 2-flow model with the 1-flow model from a checkpoint around 200k iterations?

