Hi, thanks for your nice work. I used your code on my own dataset and the synthesiz

severe metallic sound about comprehensive-e2e-tts HOT 9 OPEN

keonlee9420 commented on May 25, 2024

severe metallic sound

from comprehensive-e2e-tts.

Comments (9)

keonlee9420 commented on May 25, 2024

Hi @GuangChen2016, thanks for your interest. Yes, I ran into the same issue and I'm still figuring out what's going on. In my opinion, it may come from the feature matching loss, since its value becomes extremely low over time, which is not normal for GAN training. Of course, some configurations or model architectures may cause the issue too. I'll update the project if I can resolve it, but it would be great if you could contribute on your side! Thanks.
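
For readers following along, here is a minimal sketch of what a feature matching loss typically computes in GAN vocoders: the L1 distance between the discriminator's intermediate features for real and generated audio, summed over layers. This is not the repository's code; the function name and the plain-list feature maps are assumptions chosen so the example runs without any dependencies.

```python
from typing import Sequence


def feature_matching_loss(
    real_feats: Sequence[Sequence[float]],
    fake_feats: Sequence[Sequence[float]],
) -> float:
    """Sum over layers of the mean absolute difference between feature maps.

    Each element of `real_feats` / `fake_feats` is one (flattened) feature map
    taken from a discriminator layer. Hypothetical helper for illustration.
    """
    if len(real_feats) != len(fake_feats):
        raise ValueError("real and fake must have the same number of layers")
    total = 0.0
    for real, fake in zip(real_feats, fake_feats):
        if len(real) != len(fake):
            raise ValueError("feature maps must have the same size")
        # Mean L1 distance for this layer.
        total += sum(abs(r - f) for r, f in zip(real, fake)) / len(real)
    return total


# Toy features: the first layer differs in one element, the second is identical.
real = [[1.0, 2.0], [0.5, 0.5, 0.5]]
fake = [[1.0, 1.0], [0.5, 0.5, 0.5]]
loss = feature_matching_loss(real, fake)
```

Because the loss only measures agreement in the discriminator's feature space, it may be worth logging it next to the mel and adversarial losses rather than reading it on its own.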

BridgetteSong commented on May 25, 2024

@keonlee9420 In your code, the GAN's input is always the predicted mels, which I think will cause unstable training. I think that in the initial training stage we should use the ground-truth mels and compute the loss between mel_true and mel_predict, and then switch to the predicted mels after some steps. Maybe this will help.
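
The schedule suggested here could be sketched roughly as follows. This is only an illustration of the idea, not code from the repository: `pick_vocoder_input`, `mel_l1_loss`, and the `warmup_steps` value are hypothetical, and the mels are plain lists so the example runs on its own.

```python
from typing import List

Mel = List[float]  # a flattened mel-spectrogram, for illustration only


def pick_vocoder_input(step: int, mel_true: Mel, mel_pred: Mel, warmup_steps: int) -> Mel:
    """Feed ground-truth mels to the vocoder during warm-up, predicted mels afterwards."""
    return mel_true if step < warmup_steps else mel_pred


def mel_l1_loss(mel_true: Mel, mel_pred: Mel) -> float:
    """Mean absolute error between ground-truth and predicted mels."""
    if len(mel_true) != len(mel_pred):
        raise ValueError("mels must have the same size")
    return sum(abs(t - p) for t, p in zip(mel_true, mel_pred)) / len(mel_true)


mel_true = [0.0, 1.0, 2.0, 3.0]
mel_pred = [0.5, 1.0, 2.5, 3.0]
warmup_steps = 1000  # assumed value; the thread does not give a number

early_input = pick_vocoder_input(10, mel_true, mel_pred, warmup_steps)
late_input = pick_vocoder_input(5000, mel_true, mel_pred, warmup_steps)
acoustic_loss = mel_l1_loss(mel_true, mel_pred)
```

The mel loss between `mel_true` and `mel_pred` is computed at every step in this sketch, so the acoustic model keeps learning while the vocoder sees clean inputs.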

mayfool commented on May 25, 2024

I tried training the generator first for 50k steps, but it didn't work. I will try training as you suggested. Hopefully it solves the problem.

BridgetteSong commented on May 25, 2024

In my recent experiments, I found that what I suggested doesn't solve the problem. During training, I found that the vocoder's mel_loss is very large; I think using the acoustic model's outputs as the vocoder's inputs makes training harder. So I have now added a normalizing flow, the same as in VITS, and I get amazing results.
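
For context, the flow in VITS is built from volume-preserving (shift-only) coupling layers. The sketch below shows that building block in plain Python; it is not the code used for the results reported here, and `toy_shift` is a made-up stand-in for the learned network that would predict the shift.

```python
from typing import Callable, List

Vector = List[float]


def coupling_forward(x: Vector, shift_fn: Callable[[Vector], Vector]) -> Vector:
    """Keep the first half unchanged and shift the second half by shift_fn(first half)."""
    half = len(x) // 2
    x0, x1 = x[:half], x[half:]
    m = shift_fn(x0)
    return x0 + [a + b for a, b in zip(x1, m)]


def coupling_inverse(y: Vector, shift_fn: Callable[[Vector], Vector]) -> Vector:
    """Exact inverse: the first half is unchanged, so the same shift can be recomputed."""
    half = len(y) // 2
    y0, y1 = y[:half], y[half:]
    m = shift_fn(y0)
    return y0 + [a - b for a, b in zip(y1, m)]


def toy_shift(x0: Vector) -> Vector:
    # Hypothetical stand-in for a learned network; any function of x0 works.
    return [2.0 * v + 1.0 for v in x0]


x = [1.0, 2.0, 3.0, 4.0]  # even length so the two halves match
y = coupling_forward(x, toy_shift)
x_back = coupling_inverse(y, toy_shift)
```

Since only a shift is applied, the Jacobian determinant is 1, and stacking such layers with the halves swapped between them keeps the whole transform invertible.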

mayfool commented on May 25, 2024

Did you only add the normalizing flow, like a postnet? Or did you also add a posterior encoder, as in VITS?

skyler14 commented on May 25, 2024

Can you post your checkpoint so we can see what amazing results look like?

keonlee9420 commented on May 25, 2024

Hey guys, thank you all for your great efforts and discussion.

I've been working on that issue and finally got it working! Currently, I'm building a new general-purpose open-source TTS project, which is much improved and much easier to use, and I will share it soon, including what @BridgetteSong suggested. Please stay tuned!

skyler14 commented on May 25, 2024

I was wondering if you had any general advice for RADTTS. It seems you implemented it in your code base, but driving it, even with well-trained models, has been a daunting task.

15755841658 commented on May 25, 2024

@keonlee9420 How did you solve this problem? I'm getting the same synthesis result.
