
Comments (9)

keonlee9420 commented on May 25, 2024

Hi @GuangChen2016, thanks for your interest. Yes, I ran into the same issue and I'm still figuring out what's going on. My guess is that it comes from the feature matching loss, since its value becomes extremely low as training goes on, which is not normal for GAN training. Of course, some configurations or model architectures may cause the issue too. I'll update the project if I manage to fix it, but it would be even better if you could contribute on your side! Thanks.
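
For context, a feature matching loss is commonly computed as the mean absolute difference between the discriminator's intermediate feature maps for real and generated audio, summed over layers. Below is a minimal sketch of that idea, not the code from this repo: the function name and the plain-list data layout are assumptions made for illustration. By this definition, a value near zero means the discriminator's activations for real and generated audio are nearly identical.

```python
def feature_matching_loss(real_feats, fake_feats):
    """Sum over layers of the mean absolute difference between feature maps.

    real_feats / fake_feats: list of layers, each a flat list of floats.
    These are hypothetical stand-ins for the discriminator's intermediate
    activations on real and generated audio.
    """
    total = 0.0
    for real_layer, fake_layer in zip(real_feats, fake_feats):
        diffs = [abs(r - f) for r, f in zip(real_layer, fake_layer)]
        total += sum(diffs) / len(diffs)
    return total


# Toy activations for a two-layer discriminator (made-up numbers).
real = [[1.0, 2.0], [0.5, 0.5, 0.5]]
fake = [[1.5, 2.0], [0.5, 0.0, 0.5]]
loss = feature_matching_loss(real, fake)
identical = feature_matching_loss(real, real)  # 0.0 when features match exactly
```

Logging this value per layer rather than only as a total may make it easier to see where it collapses.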

from comprehensive-e2e-tts.

BridgetteSong commented on May 25, 2024

@keonlee9420 In your code, the GAN's input is always the predicted mels, which I think makes training unstable. I suggest that in the initial training stage we feed it the true mels and compute the loss between mel_true and mel_predict, and then switch to the predicted mels after some steps. Maybe this will help.
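
The schedule described above could be sketched roughly as follows. This is only an illustration under assumptions: `pick_vocoder_input`, `mel_l1_loss`, and the `warmup_steps` value are hypothetical names and numbers, not taken from the repo, and plain lists stand in for mel tensors.

```python
def pick_vocoder_input(step, mel_true, mel_pred, warmup_steps=10_000):
    """Return the ground-truth mel during warm-up, the predicted mel afterwards.

    warmup_steps is a hypothetical value; the thread does not specify one.
    """
    if step < warmup_steps:
        return mel_true
    return mel_pred


def mel_l1_loss(mel_true, mel_pred):
    """Mean absolute error between two mels given as flat lists of floats."""
    return sum(abs(t - p) for t, p in zip(mel_true, mel_pred)) / len(mel_true)


mel_true = [0.0, 1.0, 2.0, 3.0]
mel_pred = [0.5, 1.0, 1.5, 3.0]

# Early in training the vocoder sees the true mel; later it sees the prediction.
early_input = pick_vocoder_input(100, mel_true, mel_pred)
late_input = pick_vocoder_input(20_000, mel_true, mel_pred)

# The acoustic model is still trained on the mel loss in both phases.
aux_loss = mel_l1_loss(mel_true, mel_pred)
```

A hard switch is the simplest choice here; mixing true and predicted mels with a probability that changes over time would be another option.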


mayfool commented on May 25, 2024

I tried training the generator first for 50k steps, but it didn't work. I'll try training the way you suggested. Hopefully it solves the problem.


BridgetteSong commented on May 25, 2024

In my recent experiments, I found that what I suggested doesn't solve the problem. During training, the vocoder's mel_loss is very large; I think feeding the acoustic model's outputs into the vocoder makes training harder. So I have now added a normalizing flow, the same as in VITS, and I get amazing results.
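
For readers new to the idea, a normalizing flow is built from invertible layers. The toy coupling layer below only illustrates that invertibility. It is an assumption-laden sketch, not the VITS flow and not the change described above: `toy_shift` is a hypothetical stand-in for a learned network, and plain lists stand in for tensors.

```python
def toy_shift(half):
    # Hypothetical stand-in for a learned network that predicts a shift
    # from the untouched half of the input.
    return [0.5 * x + 1.0 for x in half]


def coupling_forward(x):
    """Keep the first half as-is and shift the second half (even length assumed)."""
    mid = len(x) // 2
    x_a, x_b = x[:mid], x[mid:]
    shift = toy_shift(x_a)
    return x_a + [b + s for b, s in zip(x_b, shift)]


def coupling_inverse(y):
    """Undo coupling_forward exactly by recomputing the same shift."""
    mid = len(y) // 2
    y_a, y_b = y[:mid], y[mid:]
    shift = toy_shift(y_a)
    return y_a + [b - s for b, s in zip(y_b, shift)]


x = [1.0, 2.0, 3.0, 4.0]
y = coupling_forward(x)
x_back = coupling_inverse(y)  # recovers x
```

Because the first half passes through unchanged, the inverse can recompute the same shift, which is what makes the layer invertible regardless of how complex the shift network is.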


mayfool commented on May 25, 2024

@BridgetteSong Did you add only the normalizing flow, like a postnet, or did you also add a posterior encoder as in VITS?


skyler14 commented on May 25, 2024

@BridgetteSong Can you post your checkpoint so we can see what the amazing results look like?


keonlee9420 commented on May 25, 2024

Hey guys, thank you all for your great efforts and discussion.

I've been working on that issue and finally got it working! I'm currently building a new open-source TTS project for general use, which is much improved and much easier to use. I'll share it soon, including what @BridgetteSong suggested. Please stay tuned!


skyler14 commented on May 25, 2024

@keonlee9420 I was wondering if you have any general advice for RADTTS. It looks like you implemented it in your code base, but driving it, even with well-trained models, has been a daunting task.


15755841658 commented on May 25, 2024

@keonlee9420 How did you solve this problem? I'm getting the same synthesis result.

