Hi, Did anybody try to train the Flowtron flow architecture in an un

nvidia commented on May 24, 2024

Unconditioned Flowtron

from flowtron.

Comments (6)

rafaelvalle commented on May 24, 2024

Train a model with 1 step of flow first.
Then use this model to warm-start a model with 2 steps of flow.
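A minimal sketch of the warm-start step described above, assuming parameters are stored in a name-to-values mapping (in PyTorch this would roughly correspond to a model's `state_dict`). The `warm_start` helper and the parameter names are hypothetical and only illustrate the idea: weights of the first flow step are copied over, and the new second step keeps its fresh initialization.

```python
import copy


def warm_start(new_params, pretrained_params):
    """Copy pretrained values into new_params wherever name and size match.

    Hypothetical helper: both arguments map parameter names to lists of floats.
    Returns the merged parameters and the sorted names that were copied.
    """
    merged = copy.deepcopy(new_params)  # leave the caller's dict untouched
    copied = []
    for name, value in pretrained_params.items():
        if name in merged and len(merged[name]) == len(value):
            merged[name] = list(value)
            copied.append(name)
    return merged, sorted(copied)


# Parameters of a trained model with 1 step of flow (made-up values).
one_flow = {"flows.0.weight": [0.5, -0.2], "flows.0.bias": [0.1]}

# Freshly initialized model with 2 steps of flow.
two_flow = {
    "flows.0.weight": [0.0, 0.0],
    "flows.0.bias": [0.0],
    "flows.1.weight": [0.0, 0.0],
    "flows.1.bias": [0.0],
}

merged, copied = warm_start(two_flow, one_flow)
print(copied)  # names taken from the 1-step model
```

Only `flows.0.*` is taken from the pretrained model here; `flows.1.*` has no counterpart and is trained from its initial values.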

adrianastan commented on May 24, 2024

Hi,

Thanks for your reply. I did start by training a 1-flow model on the LibriSpeech train-clean-100 data, using a modified, unconditioned version of Flowtron. I then used the trained model to warm-start a 2-flow architecture. However, at inference there is nothing but noise: https://drive.google.com/file/d/1V7sX3Ma3RFBo6lNSCUxSsNjP3Y_HmAZo/view?usp=sharing.

I was expecting at least some babble noise.

Any hints on when it is a good point to start training the second flow? Should I train longer? Should I lower the learning rate?
Below are the loss curves for the 1st flow:

Thanks!

rafaelvalle commented on May 24, 2024

The validation loss for your model with 1 step of flow is starting to plateau.
Use this model to warm-start a model with 2 steps of flow. I assume the validation loss will go down.
Alternatively, you can try the same experiment on LJS.
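One rough way to decide that the validation loss "is starting to plateau" is to compare the mean of the most recent evaluations with the mean of the evaluations just before them. The sketch below is only an illustration: the `has_plateaued` helper, the window size, and the 1% threshold are assumptions, not values taken from Flowtron.

```python
def has_plateaued(val_losses, window=3, min_rel_improvement=0.01):
    """Return True if the last `window` losses barely improved on the previous `window`.

    Hypothetical heuristic: improvement is measured between the two window
    means, relative to the magnitude of the earlier mean.
    """
    if len(val_losses) < 2 * window:
        return False  # not enough history to judge
    previous = sum(val_losses[-2 * window:-window]) / window
    recent = sum(val_losses[-window:]) / window
    return (previous - recent) < min_rel_improvement * abs(previous)


still_improving = has_plateaued([1.0, 0.8, 0.6, 0.5, 0.45, 0.4])
flat = has_plateaued([2.0, 2.0, 2.0, 1.99, 1.99, 1.99])
too_short = has_plateaued([1.0, 0.9])
```

A check like this only says the curve is flat; whether that is a good moment to add the next step of flow still has to be judged from the loss curves and the generated audio.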

adrianastan commented on May 24, 2024

I warm-started a 2-flow model from the 1-flow weights and continued training. The training and validation losses are shown below:

Still no speech-like output at inference.
https://drive.google.com/file/d/19OC2cSfPgfvrS0mrRx73bkLLKp0yt0v8/view?usp=sharing

I also started training a subsequent 3-flow model:

The output is as follows:

https://drive.google.com/file/d/1F7lXcEqx5_gqMDog4KgyahfKDGx7-175/view?usp=sharing

So I assume that this architecture might not be complex enough to estimate a multispeaker latent space. I will try to do the same thing on LJSpeech -- perhaps the conditions are simpler.

Thanks!

rafaelvalle commented on May 24, 2024

@adrianastan if you trained a model with speaker embeddings, what happens if you do this:
flowtron.infer(flowtron.forward(audio, speaker), other_speaker)
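To illustrate what this round trip tests, here is a toy sketch with a speaker-conditioned affine flow. The `ToyFlow` class, its `forward` and `infer` methods, and the per-speaker parameters are hypothetical stand-ins, not Flowtron's actual API or signatures. Encoding with one speaker and decoding with the same speaker reconstructs the input, while decoding with another speaker changes the output.

```python
class ToyFlow:
    """Toy invertible affine flow conditioned on a speaker id (hypothetical)."""

    def __init__(self, speaker_params):
        # speaker_params maps speaker id -> (shift, scale); scale must be non-zero
        self.speaker_params = speaker_params

    def forward(self, x, speaker):
        # Data -> latent: remove the speaker-specific shift and scale.
        shift, scale = self.speaker_params[speaker]
        return [(v - shift) / scale for v in x]

    def infer(self, z, speaker):
        # Latent -> data: apply the speaker-specific scale and shift.
        shift, scale = self.speaker_params[speaker]
        return [v * scale + shift for v in z]


flow = ToyFlow({"a": (1.0, 2.0), "b": (-1.0, 0.5)})
audio = [1.0, 3.0]

z = flow.forward(audio, "a")   # latent obtained with speaker "a"
same = flow.infer(z, "a")      # decoded with the same speaker
other = flow.infer(z, "b")     # decoded with a different speaker
```

In this toy setup the latent `z` carries the content and the speaker argument determines how it is rendered, which is the behaviour the experiment above probes.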

adrianastan commented on May 24, 2024

I did not use speaker embeddings, just a multispeaker dataset. I removed all conditionings of the flow.
