
Comments (23)

rafaelvalle commented on May 23, 2024

Yes, train it by progressively adding steps of flow as the model learns to attend on each step of flow.
Start with 1 step of flow, train it until it learns attention, use this model to warm-start a model with 2 steps of flow, and so on...

from flowtron.
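A minimal sketch of the warm-start step in this schedule, assuming a Flowtron-style checkpoint that stores a `state_dict` entry; the function name and the shape-matching heuristic are illustrative, not part of the Flowtron API:

```python
import torch

def warmstart_from_fewer_flows(model, checkpoint_path):
    """Initialize a model (e.g. n_flows=2) from a checkpoint trained
    with fewer steps of flow (e.g. n_flows=1).

    Only parameters whose names and shapes match are copied; the new
    flow step keeps its fresh random initialization.
    """
    pretrained = torch.load(checkpoint_path, map_location="cpu")["state_dict"]
    current = model.state_dict()
    matched = {k: v for k, v in pretrained.items()
               if k in current and v.shape == current[k].shape}
    current.update(matched)
    model.load_state_dict(current)
    return model
```

Train the warm-started model until the new flow step also learns attention, then repeat the procedure for 3 steps of flow, and so on.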

dmitrii-obukhov commented on May 23, 2024

@rafaelvalle Thanks for your reply

I ran a new training run with model_config.n_flows=1, but after 16 hours the attention weights still look bad:

[attention weights plot]

In one of the threads I read that good alignment is produced in less than 24 hours.

So, what could be wrong?


rafaelvalle commented on May 23, 2024

Can you share your tensorboard plots?


dmitrii-obukhov commented on May 23, 2024

Yes

[tensorboard loss plots]


rafaelvalle commented on May 23, 2024

Does it have good attention around 60k iterations?


dmitrii-obukhov commented on May 23, 2024

No. Attention on all iterations looks the same


rafaelvalle commented on May 23, 2024

Make sure you trim silences from the beginning and end of your audio files


dmitrii-obukhov commented on May 23, 2024

I use LJSpeech dataset for training. Any instructions on how to trim them?

Could the problem be that I use distributed training?

Also, I set the flag fp16_run=true


adrianastan commented on May 23, 2024

Make sure you trim silences from the beginning and end of your audio files

Should there be no silence at all in the beginning and end, or should there be at least, let's say 0.1 seconds of silence?


adrianastan commented on May 23, 2024

I use LJSpeech dataset for training. Any instructions on how to trim them?

The simplest way would be to use librosa.effects.trim()


rafaelvalle commented on May 23, 2024

There should be no silence at all at the beginning and end of each audio file. sox and librosa.effects.trim can be used to trim silences from the beginning and end.


kurbobo commented on May 23, 2024

I have a similar problem.

I use LJSpeech dataset for training. Any instructions on how to trim them?

Could the problem be that I use distributed training?

Also, I set the flag fp16_run=true

Have you solved this problem?
Also, I tried to predict LPC features instead of mel-spectrograms, but I always get a picture like this. Does anybody know what the problem is?
[attention plot]


dmitrii-obukhov commented on May 23, 2024

The problem remained unresolved.
I tried to trim silences from the beginning and end of the audio files with librosa.effects.trim(), but the picture remains the same.


rafaelvalle commented on May 23, 2024

@kurbobo does the attention map always look like that? You might have to change the mask from byte to bool:
https://github.com/NVIDIA/flowtron/blob/master/flowtron.py#L33

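The suggested fix, sketched here assuming `get_mask_from_lengths` follows the usual Tacotron-style pattern (the exact code is at the linked line): change the mask dtype from byte to bool, which is what newer PyTorch versions expect in masked_fill and attention masking:

```python
import torch

def get_mask_from_lengths(lengths):
    """Boolean mask: True for valid positions, False for padding.

    Older code ended with .byte(); newer PyTorch expects .bool()
    wherever the mask feeds masked_fill or attention masking.
    """
    max_len = torch.max(lengths).item()
    ids = torch.arange(max_len, device=lengths.device)
    mask = (ids < lengths.unsqueeze(1)).bool()
    return mask
```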

rafaelvalle commented on May 23, 2024

@adrianastan There should be no silence at the beginning or at the end of an audio file.


zjFFFFFF commented on May 23, 2024

@rafaelvalle Can you tell me what "no silence" means exactly? If I use librosa.effects.trim(), what should top_db be set to?
For my dataset, if I set top_db to 20, some speech also gets cut off; if I set it a little higher, some audio files still have silence at the beginning.


dmitrii-obukhov commented on May 23, 2024

@kurbobo The problem was solved when I used the encoder and embedding layers from a pretrained model.

@zjFFFFFF In my case top_db = 30 works well enough

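The workaround of reusing encoder and embedding layers from a pretrained model can be sketched as a partial state-dict load; the `encoder.` and `embedding.` prefixes are assumptions about the parameter naming, not verified against a Flowtron checkpoint:

```python
import torch

def load_encoder_and_embedding(model, pretrained_path,
                               prefixes=("encoder.", "embedding.")):
    """Copy only encoder/embedding weights from a pretrained checkpoint."""
    pretrained = torch.load(pretrained_path, map_location="cpu")["state_dict"]
    subset = {k: v for k, v in pretrained.items()
              if k.startswith(prefixes)}
    # strict=False leaves every other parameter at its fresh initialization
    model.load_state_dict(subset, strict=False)
    return model
```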

zjFFFFFF commented on May 23, 2024

@DLeos

In fact, I got the same plot as you (training from scratch). But the validation loss does not seem to reflect the listening results: between iterations 800,000 and 950,000 the model can generate acceptable audio (at iteration 1,000,000 I can't get acceptable results). So you can try different checkpoints one by one.


kurbobo commented on May 23, 2024

@kurbob does the attention map always look like that? You might have to change from byte to bool
https://github.com/NVIDIA/flowtron/blob/master/flowtron.py#L33
@rafaelvalle
No, it does not always appear, and I had already fixed the byte/bool problem before training, but it still sometimes happens.
I have one more question: am I right that Flowtron in this repo converts every sentence to an ARPAbet transcription and then trains to map the sequence of ARPAbet symbols to a sequence of spectrogram frames?



Liujingxiu23 commented on May 23, 2024

@kurbobo @rafaelvalle I tried mels: train with n_flows=1 first, then use that model to warm-start the n_flows=2 model. Both alignments are right and the synthesized wavs are good. But with the LPC parameters used by the LPCNet vocoder, everything seems fine when n_flows=1 (the loss is good and the alignment is right); however, when I train n_flows=2 warm-started from the trained n_flows=1 model, the second alignment fails and the loss just oscillates without descending.


rafaelvalle commented on May 23, 2024

@Liujingxiu23 please share training and validation losses and attention plots for the 1-step-of-flow model and the 2-steps-of-flow model.


rafaelvalle commented on May 23, 2024

Did you warm-start the 2-flow model with the 1-flow model from a checkpoint around 200k iterations?

