Comments (9)
Hi @GuangChen2016 , thanks for your attention. Yes, I met the same issue and I'm still figuring out what's going on. In my opinion, it may be from feature matching loss since it shows extremely lower value as time goes, which is not that normal under GAN training. Of course some configurations or model architectures may cause the issue, too. I'd update the project if I could handle it, but It would be better if you can contribute by your side! Thanks.
from comprehensive-e2e-tts.
Hi @GuangChen2016 , thanks for your attention. Yes, I met the same issue and I'm still figuring out what's going on. In my opinion, it may be from feature matching loss since it shows extremely lower value as time goes, which is not that normal under GAN training. Of course some configurations or model architectures may cause the issue, too. I'd update the project if I could handle it, but It would be better if you can contribute by your side! Thanks.
@keonlee9420 In your code, GAN's inputs is always predicted mels, which I think will cause unstable training. I think in initial training stage, we use true mels and compute loss between mel_true and mel_predict, and after some steps, we use predicted mels. Maybe this can help some.
from comprehensive-e2e-tts.
Hi @GuangChen2016 , thanks for your attention. Yes, I met the same issue and I'm still figuring out what's going on. In my opinion, it may be from feature matching loss since it shows extremely lower value as time goes, which is not that normal under GAN training. Of course some configurations or model architectures may cause the issue, too. I'd update the project if I could handle it, but It would be better if you can contribute by your side! Thanks.
@keonlee9420 In your code, GAN's inputs is always predicted mels, which I think will cause unstable training. I think in initial training stage, we use true mels and compute loss between mel_true and mel_predict, and after some steps, we use predicted mels. Maybe this can help some.
I tried to train generator first for 50k steps, but it didn't work.I will try to train as you said.Hope it can solve this problem.
from comprehensive-e2e-tts.
Hi @GuangChen2016 , thanks for your attention. Yes, I met the same issue and I'm still figuring out what's going on. In my opinion, it may be from feature matching loss since it shows extremely lower value as time goes, which is not that normal under GAN training. Of course some configurations or model architectures may cause the issue, too. I'd update the project if I could handle it, but It would be better if you can contribute by your side! Thanks.
@keonlee9420 In your code, GAN's inputs is always predicted mels, which I think will cause unstable training. I think in initial training stage, we use true mels and compute loss between mel_true and mel_predict, and after some steps, we use predicted mels. Maybe this can help some.
I tried to train generator first for 50k steps, but it didn't work.I will try to train as you said.Hope it can solve this problem.
in my recent experiment, I found what I said can't solve the problem. In the training, I found mel_loss of vocoder is very big, I think use acoustic model outputs as inputs of vocoder will increase the difficulty of training. So now I add a Normalized Flow with the same as VITS, I get amazing results.
from comprehensive-e2e-tts.
Hi @GuangChen2016 , thanks for your attention. Yes, I met the same issue and I'm still figuring out what's going on. In my opinion, it may be from feature matching loss since it shows extremely lower value as time goes, which is not that normal under GAN training. Of course some configurations or model architectures may cause the issue, too. I'd update the project if I could handle it, but It would be better if you can contribute by your side! Thanks.
@keonlee9420 In your code, GAN's inputs is always predicted mels, which I think will cause unstable training. I think in initial training stage, we use true mels and compute loss between mel_true and mel_predict, and after some steps, we use predicted mels. Maybe this can help some.
I tried to train generator first for 50k steps, but it didn't work.I will try to train as you said.Hope it can solve this problem.
in my recent experiment, I found what I said can't solve the problem. In the training, I found mel_loss of vocoder is very big, I think use acoustic model outputs as inputs of vocoder will increase the difficulty of training. So now I add a Normalized Flow with the same as VITS, I get amazing results.
Only add Normalized Flow like postnet? Or also add posterior encoder as vits?
from comprehensive-e2e-tts.
Hi @GuangChen2016 , thanks for your attention. Yes, I met the same issue and I'm still figuring out what's going on. In my opinion, it may be from feature matching loss since it shows extremely lower value as time goes, which is not that normal under GAN training. Of course some configurations or model architectures may cause the issue, too. I'd update the project if I could handle it, but It would be better if you can contribute by your side! Thanks.
@keonlee9420 In your code, GAN's inputs is always predicted mels, which I think will cause unstable training. I think in initial training stage, we use true mels and compute loss between mel_true and mel_predict, and after some steps, we use predicted mels. Maybe this can help some.
I tried to train generator first for 50k steps, but it didn't work.I will try to train as you said.Hope it can solve this problem.
in my recent experiment, I found what I said can't solve the problem. In the training, I found mel_loss of vocoder is very big, I think use acoustic model outputs as inputs of vocoder will increase the difficulty of training. So now I add a Normalized Flow with the same as VITS, I get amazing results.
Can you post your checkpoint so we can see what amazing results look like?
from comprehensive-e2e-tts.
Hey guys, thank you all for your great efforts and discussion.
I've been resolving that issue, and finally make it work! Currently, I'm building a new open-source tts project for the general purpose, which is improved a lot and much easier to use, and I will share it soon including what @BridgetteSong suggested as well. Please stay tuned!
from comprehensive-e2e-tts.
Hey guys, thank you all for your great efforts and discussion.
I've been resolving that issue, and finally make it work! Currently, I'm building a new open-source tts project for the general purpose, which is improved a lot and much easier to use, and I will share it soon including what @BridgetteSong suggested as well. Please stay tuned!
I was wondering if you had some general advice for radtts, it seems you implemented that into your code base but driving something even with well-trained models has been a daunting task.
from comprehensive-e2e-tts.
@keonlee9420 How to solve this problem? I encountered the same synthesis result.
from comprehensive-e2e-tts.
Related Issues (5)
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from comprehensive-e2e-tts.