Comments (23)
Yes, train it by progressively adding steps of flow as the model learns to attend at each step.
Start with 1 step of flow, train it until it learns attention, use that model to warm-start a model with 2 steps of flow, and so on...
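A minimal sketch of that warm-start idea, using plain dicts to stand in for PyTorch state_dicts (the key names here are illustrative, not Flowtron's actual parameter names): weights present in the 1-flow checkpoint overwrite the new model's random init, while the newly added flow keeps its fresh initialization.

```python
# Sketch: warm-start a 2-flow model from a 1-flow checkpoint.
# Plain dicts stand in for state_dicts; key names are illustrative.

def warm_start(new_state, checkpoint_state):
    """Overwrite entries of new_state with matching checkpoint entries."""
    merged = dict(new_state)
    for key, value in checkpoint_state.items():
        if key in merged:
            merged[key] = value
    return merged

# 1-flow checkpoint: shared text encoder plus flow 0
ckpt = {"encoder.lstm.w": "trained", "flows.0.conv.w": "trained"}

# Fresh 2-flow model: flow 1 has no counterpart in the checkpoint
fresh = {"encoder.lstm.w": "random", "flows.0.conv.w": "random",
         "flows.1.conv.w": "random"}

state = warm_start(fresh, ckpt)
```

With real PyTorch models the same effect is usually achieved by loading the checkpoint with `strict=False`, so missing keys (the new flow) are simply skipped.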
from flowtron.
@rafaelvalle Thanks for your reply
I ran a new training run with model_config.n_flows=1, but after 16 hours the attention weights look bad again.
In one of the threads I read that good alignment is produced in less than 24 hours.
So, what could be wrong?
Can you share your tensorboard plots?
Yes
Does it have good attention around 60k iterations?
No. The attention looks the same at all iterations.
Make sure you trim silences from the beginning and end of your audio files
I use the LJSpeech dataset for training. Any instructions on how to trim the files?
Could the problem be that I use distributed training?
Also, I set the flag fp16_run=true
> Make sure you trim silences from the beginning and end of your audio files
Should there be no silence at all at the beginning and end, or should there be at least, say, 0.1 seconds of silence?
> I use LJSpeech dataset for training. Any instructions on how to trim them?
The simplest way would be to use librosa.effects.trim()
There should be no silence at all at the beginning and end of each audio file.
sox and librosa.effects.trim can both be used to trim silences from the beginning and end.
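For illustration, here is a rough pure-NumPy version of the dB-threshold trimming that librosa.effects.trim performs. This is a sketch of the idea, not librosa's exact algorithm; for real preprocessing, prefer `librosa.effects.trim(y, top_db=...)` or sox.

```python
import numpy as np

def trim_silence(y, top_db=30.0, frame_length=2048, hop_length=512):
    """Trim leading/trailing audio quieter than (peak - top_db) dB.

    A rough re-implementation of the idea behind librosa.effects.trim.
    """
    # Frame-wise RMS energy
    n_frames = max(1, 1 + (len(y) - frame_length) // hop_length)
    rms = np.array([
        np.sqrt(np.mean(y[i * hop_length:i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])
    # Energy in dB relative to the loudest frame
    db = 20.0 * np.log10(np.maximum(rms, 1e-10) / max(rms.max(), 1e-10))
    loud = np.flatnonzero(db > -top_db)
    if len(loud) == 0:
        return y  # nothing above threshold; leave untouched
    start = loud[0] * hop_length
    end = min(len(y), loud[-1] * hop_length + frame_length)
    return y[start:end]
```

The `top_db` knob works the same way as librosa's: larger values are more conservative (keep more low-level audio), smaller values trim more aggressively.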
I have a similar problem.
> I use LJSpeech dataset for training. Any instructions on how to trim them?
> Could the problem be that I use distributed training?
> Also, I set the flag fp16_run=true
Have you solved this problem?
Also, I tried to predict an LPC spectrogram instead of a mel spectrogram, but I always got a picture like this. Does anybody know what the problem is?
The problem remains unresolved.
I tried trimming silences from the beginning and end of the audio files with librosa.effects.trim(), but the picture stays the same.
@kurbob does the attention map always look like that? You might have to change from byte to bool
https://github.com/NVIDIA/flowtron/blob/master/flowtron.py#L33
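The suggested change concerns the padding mask built at the linked line: recent PyTorch versions require a bool mask for `masked_fill`, while older code often produced a uint8 ("byte") mask. A minimal sketch of the pattern (the function name, shapes, and usage here are illustrative, not the repo's exact code):

```python
import torch

def get_mask_from_lengths(lengths):
    """Boolean padding mask; True marks positions past each sequence's end.

    Older code returned a uint8 mask via .byte(); recent PyTorch requires
    .bool() for masked_fill, hence the suggested byte -> bool change.
    """
    max_len = int(lengths.max())
    ids = torch.arange(max_len, device=lengths.device)
    return ids[None, :] >= lengths[:, None]   # dtype: torch.bool

scores = torch.zeros(2, 4)                    # toy attention scores
mask = get_mask_from_lengths(torch.tensor([4, 2]))
masked = scores.masked_fill(mask, float("-inf"))
```

If the existing code ends with `.byte()`, changing that call to `.bool()` (or dropping it, since comparisons already return bool tensors on modern PyTorch) is the usual fix.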
@adrianastan There should be no silence at the beginning or at the end of an audio file.
@rafaelvalle Can you tell me what "no silence" means in practice? If I use librosa.effects.trim(), what should top_db be set to?
For my dataset, if I set top_db to 20, some speech also gets cut off; setting it a little higher, some audio files still seem to have silence at the beginning.
@kurbobo The problem was solved when I used the encoder and embedding layers from a pretrained model.
@zjFFFFFF In my case top_db=30 works well enough.
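The partial warm-start described above (reusing only the encoder and embedding layers) can be sketched with plain dicts standing in for state_dicts; the key prefixes are illustrative, not Flowtron's exact parameter names.

```python
# Sketch: copy only encoder/embedding weights from a pretrained
# checkpoint, leaving everything else (e.g. the flows) at random init.

def load_pretrained_subset(model_state, ckpt_state,
                           prefixes=("encoder.", "embedding.")):
    out = dict(model_state)
    for key, value in ckpt_state.items():
        if key in out and key.startswith(prefixes):
            out[key] = value
    return out

ckpt = {"embedding.weight": "pretrained",
        "encoder.conv.0.weight": "pretrained",
        "flows.0.attn.weight": "pretrained"}
model = {"embedding.weight": "random",
         "encoder.conv.0.weight": "random",
         "flows.0.attn.weight": "random"}

warm_state = load_pretrained_subset(model, ckpt)
```

With a real model, the filtered dict would then be passed to `model.load_state_dict(..., strict=False)` so the untouched parameters keep their initialization.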
In fact, I got the same plot as you (training from scratch), but the validation loss does not seem to track the actual results. Between iterations 800,000 and 950,000 the model can generate acceptable audio (at 1,000,000 iterations I can't get acceptable results), so you can try different checkpoints one by one.
> @kurbob does the attention map always look like that? You might have to change from byte to bool
> https://github.com/NVIDIA/flowtron/blob/master/flowtron.py#L33
@rafaelvalle
No, it doesn't always appear, and I had already fixed the byte/bool problem before training, but the problem still happens sometimes.
I have one more question: am I right that Flowtron in this repo converts every sentence to an ARPAbet transcription and then trains to map the sequence of ARPAbet symbols to a sequence of spectrogram frames?
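Roughly, yes: the text pipeline can look words up in a pronouncing dictionary (CMUDict) and substitute ARPAbet phonemes, typically with some probability, and the model learns to align that symbol sequence with mel-spectrogram frames. A toy sketch of the lookup step; the two-entry dictionary, the `p_arpabet` name, and the brace markup here are illustrative stand-ins, not the repo's exact code:

```python
import random

# Tiny stand-in for CMUDict; the real pipeline loads the full dictionary
# and substitutes ARPAbet with some probability (often called p_arpabet).
CMU = {"hello": "HH AH0 L OW1", "world": "W ER1 L D"}

def to_arpabet(text, p_arpabet=1.0, rng=random.Random(0)):
    out = []
    for word in text.lower().split():
        entry = CMU.get(word)
        if entry is not None and rng.random() < p_arpabet:
            out.append("{" + entry + "}")   # braces mark a phoneme span
        else:
            out.append(word)                # fall back to graphemes
    return " ".join(out)

print(to_arpabet("hello world"))
```

Words missing from the dictionary fall back to character (grapheme) input, so the model sees a mix of phonemes and graphemes.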
@kurbobo @rafaelvalle I tried mels: I trained with n_flows=1 first and then used that model to warm-start the n_flows=2 model; both alignments are correct and the synthesized wavs are good. But with the LPC parameters used by the LPCNet vocoder, everything seems fine when n_flows=1 (the loss is good and the alignment is right); however, when I train n_flows=2 warm-started from the trained n_flows=1 model, the second flow's alignment fails and the loss just oscillates without descending.
@Liujingxiu23 please share the training and validation losses and the attention plots for the 1-step and 2-step flow models.
Did you warm-start the 2-flow model from a 1-flow checkpoint around 200k iterations?