Comments (12)
Clearer now, thanks again.
from tensorflowtts.
@superhg2012 Audio quality is good after 60k training steps. Before that, the model is biased toward the ground-truth mel, so the quality is poor at inference.
If training continues to 200k steps, what about the inferred audio quality? Will it be better?
@superhg2012 I trained to 120k and saw that the validation loss had stopped decreasing, so I stopped there. I think the quality is very good at 120k now, don't you think so?
> @superhg2012 I trained to 120k and saw that the validation loss had stopped decreasing, so I stopped there. I think the quality is very good at 120k now, don't you think so?

I trained Tacotron without a window at inference, and the audio quality is satisfying. I want to know whether a window or monotonic constraint during inference helps improve audio quality, or whether it is only useful for FastSpeech alignment.
@superhg2012 The window constraint is used in case the alignment explodes when inferring very long sentences. But my model can somehow infer samples of > 3000 decoder steps ^^. I find the model without the window constraint is better. For training FastSpeech, you can decode Tacotron without the alignment constraint and use Tacotron's output mel spectrogram for FastSpeech training. That is my FastSpeech V3, a significant improvement over FastSpeech V1 (window constraint + teacher forcing + ground-truth mel).
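
For readers unfamiliar with the window constraint mentioned above, here is a minimal, dependency-free sketch of the general idea: at each decoder step, attention energies outside a window around the previously attended input position are masked out before the softmax. This is an illustration only, not the TensorFlowTTS implementation; the function name `windowed_attention` and the parameters `back` and `ahead` are hypothetical.

```python
import math


def windowed_attention(energies, prev_pos, back=1, ahead=3):
    """Softmax over attention energies, restricted to a window.

    Hypothetical helper: only positions in [prev_pos - back, prev_pos + ahead)
    may receive attention; everything else is masked to -inf.
    """
    lo = max(0, prev_pos - back)
    hi = min(len(energies), prev_pos + ahead)
    masked = [e if lo <= i < hi else -math.inf for i, e in enumerate(energies)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(e - peak) for e in masked]
    total = sum(exps)
    return [x / total for x in exps]


# Position 4 has a large energy, but it lies outside the window around
# prev_pos=1, so it cannot pull the attention far ahead.
weights = windowed_attention([0.0, 0.0, 0.0, 0.0, 5.0, 0.0], prev_pos=1, back=1, ahead=2)
print([round(w, 3) for w in weights])  # [0.333, 0.333, 0.333, 0.0, 0.0, 0.0]
```

The mask keeps the alignment from jumping on long inputs, but it also forbids any attention outside the window, which may be one reason an already stable model does better without it.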
Do you mean FastSpeech V3 is trained with the predicted mel-spectrogram and alignments from Tacotron2 instead of the ground-truth mel-spectrogram?
@superhg2012 Yes. Alignments come from Tacotron-2 at 120K without the window masking trick, and the predicted mel is used for FastSpeech training. You can listen to the audio samples on the validation set; this is a significant improvement.
Thank you. Were your alignments generated the same way as the mel-spectrograms, or were they generated in GTA mode?
No teacher forcing, no window masking; save the durations and mels at the same time. You need to modify the code a bit :d
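
As a rough illustration of what "saving durations" from an alignment can look like, the sketch below counts, for each input token, how many decoder frames attend to it most strongly. It is a common way to turn an attention matrix into durations, shown here in plain Python; it is an assumption about the general approach, not the code used in this repository, and `durations_from_alignment` is a hypothetical name.

```python
from collections import Counter


def durations_from_alignment(alignment, num_tokens):
    """Count decoder frames per input token.

    alignment: list of decoder frames, each a list of attention
    weights over the input tokens (hypothetical layout).
    """
    # For each decoder frame, pick the token with the highest weight.
    best_token = [
        max(range(num_tokens), key=lambda t: frame[t]) for frame in alignment
    ]
    counts = Counter(best_token)
    return [counts.get(t, 0) for t in range(num_tokens)]


# Toy alignment: 5 decoder frames over 3 input tokens.
alignment = [
    [0.9, 0.1, 0.0],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.0, 0.1, 0.9],
]
durations = durations_from_alignment(alignment, num_tokens=3)
print(durations)  # [2, 1, 2]

# Because each frame is assigned to exactly one token, the durations
# sum to the number of mel frames produced in the same decoding pass.
assert sum(durations) == len(alignment)
```

Saving the durations and the mel from the same decoding pass guarantees this length match by construction.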
All right, thank you! There are many differences from other open implementations of FastSpeech like Fastspeech, whose alignments are generated with teacher forcing. Also, the FastSpeech2 authors pointed out that using the predicted mel-spectrogram from the teacher model (TransformerTTS) loses some information compared with the ground-truth one, since the quality of audio synthesized from generated mel-spectrograms is usually worse than that from ground-truth ones. So, puzzling... But I will try your idea.
@superhg2012 You can compare my results with other implementations to decide :))). There is nothing puzzling here. In FastSpeech 1 they use alignments extracted from Tacotron2, so they use the predicted mel from Tacotron to train FastSpeech, which makes sense. In FastSpeech 2, they use durations extracted from the ground-truth mel by Montreal Forced Aligner, so they use the ground-truth mel to train :)). So the durations and the mels should come from the same source :))).
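
The "same source" point can be made concrete with a small sketch, assuming a FastSpeech-style length regulator that repeats each input token by its duration. The expanded sequence must have exactly as many frames as the training mel target, which holds when both come from the same source and generally fails otherwise. The helper names and the toy frame counts below are hypothetical.

```python
def expand_by_durations(tokens, durations):
    """Repeat each token by its duration (length-regulator style)."""
    if len(tokens) != len(durations):
        raise ValueError("need exactly one duration per token")
    return [tok for tok, d in zip(tokens, durations) for _ in range(d)]


def same_source(durations, mel):
    """True when the durations account for every frame of the mel target."""
    return sum(durations) == len(mel)


tokens = ["h", "e", "y"]
durations = [2, 1, 2]  # e.g. taken from the teacher's alignment

expanded = expand_by_durations(tokens, durations)
print(expanded)  # ['h', 'h', 'e', 'y', 'y']

# Toy stand-ins: one placeholder entry per mel frame.
predicted_mel = [None] * 5     # mel decoded in the same pass as the durations
ground_truth_mel = [None] * 6  # recorded audio usually has a different length

print(same_source(durations, predicted_mel))     # True
print(same_source(durations, ground_truth_mel))  # False
```

With a mismatch like the second case, the target would have to be padded or truncated, so the frame-level loss would compare misaligned frames.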