Hey, thanks for releasing the code. I came across this after reading the paper. I just

[Q] Training all the components together. (styletts issue, closed)

yl4579 commented on September 15, 2024


Comments (1)

yl4579 commented on September 15, 2024

The "w/o augmentation" setting in our ablation study is pretty much the same as training all components together. We didn't specifically test joint training, but we believe there should be no difference between the two, because in most TTS systems the decoder does not depend on the gradient of the variance predictor (there is a no-grad operation after the text encoder output). Once the decoder converges, the predictor should converge as well, just as in stage 2 of training, but we are not sure exactly what will happen.
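To make the no-grad point concrete, here is a toy sketch with hand-derived gradients. It is not StyleTTS code: the scalar "encoder", "decoder", and "predictor" weights and the function name `grads` are assumptions made purely for illustration. It shows that, with a stop-gradient on the encoder output, the predictor's loss leaves the encoder gradient untouched.

```python
# Toy scalar model with hand-derived gradients (hypothetical, not StyleTTS code):
#   h        = w_enc * x        shared text-encoder output
#   mel_hat  = w_dec * h        decoder branch
#   dur_hat  = w_pred * sg(h)   predictor branch; sg = stop-gradient (no-grad)

def grads(x, mel, dur, w_enc, w_dec, w_pred, stop_grad=True):
    """Gradients of (mel_hat - mel)^2 + (dur_hat - dur)^2
    with respect to (w_enc, w_dec, w_pred)."""
    h = w_enc * x
    mel_err = w_dec * h - mel
    dur_err = w_pred * h - dur

    g_dec = 2 * mel_err * h
    g_pred = 2 * dur_err * h
    # The decoder loss always back-propagates into the encoder.
    g_enc = 2 * mel_err * w_dec * x
    if not stop_grad:
        # Without the stop-gradient, the predictor loss also moves the encoder.
        g_enc += 2 * dur_err * w_pred * x
    return g_enc, g_dec, g_pred


x, mel = 1.0, 2.0
w_enc, w_dec, w_pred = 0.5, 1.0, 1.0

# Same input and mel target, two very different predictor targets.
enc_sg_a = grads(x, mel, 5.0, w_enc, w_dec, w_pred, stop_grad=True)[0]
enc_sg_b = grads(x, mel, 50.0, w_enc, w_dec, w_pred, stop_grad=True)[0]
enc_free_a = grads(x, mel, 5.0, w_enc, w_dec, w_pred, stop_grad=False)[0]
enc_free_b = grads(x, mel, 50.0, w_enc, w_dec, w_pred, stop_grad=False)[0]

print("with stop-gradient:   ", enc_sg_a, enc_sg_b)
print("without stop-gradient:", enc_free_a, enc_free_b)
```

With the stop-gradient, the encoder gradient is identical for both predictor targets, so the encoder and decoder train as if the predictor were not there. In PyTorch, this kind of stop-gradient is commonly written by calling `.detach()` on the tensor before it enters the predictor.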

You can also train the two stages together with augmentation. It is possible to apply the duration-invariant data augmentation when you train end-to-end (E2E), although the decoder will then have to reconstruct the mel-spectrogram from the stretched or compressed representations. If your decoder is not well trained, however, this will derail the predictor and make it converge more slowly, or possibly to a worse minimum. So I don't believe it would be better than 2-stage training, given that there is a no-grad operation applied to the predictor (i.e., no other component needs the gradient from the predictor).

If you do not apply the no-grad operation, I don't know what will happen. You may try it and see. However, I do believe there is a reason why a no-grad operation is applied between the variance predictor and the rest of the components in most TTS systems. It is likely because the F0 predicted by the variance predictor cannot exactly match the ground-truth F0. If you force the decoder to reconstruct the mel-spectrogram with the incorrect F0 and also with the correct F0, it will settle on a point in between as the optimal solution, which leads to worse sound quality, or it may not use the F0 information at all.
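The "point in between" intuition can be illustrated with a deliberately simplified toy. Everything here is an assumption for illustration, not a StyleTTS component: the decoder is a single gain `a` applied to its F0 input, the reconstruction target is the true F0 itself, and the predictor errors are made-up numbers. The least-squares gain is computed in closed form for a decoder trained on both the true and the predicted F0.

```python
def best_gain(f0_true, f0_pred):
    """Least-squares gain a for a toy decoder y_hat = a * f0_in that must
    output y = f0_true when fed the true F0 and when fed the predicted F0.
    Minimizes sum((a*f - f)^2 + (a*p - f)^2) over all (f, p) pairs."""
    num = sum(f * f + f * p for f, p in zip(f0_true, f0_pred))
    den = sum(f * f + p * p for f, p in zip(f0_true, f0_pred))
    return num / den


f0_true = [1.0, 2.0, 3.0, 4.0]          # toy values, not real F0 in Hz
error_pattern = [1.0, -1.0, -1.0, 1.0]  # hypothetical predictor errors

gains = {}
for scale in (0.0, 0.5, 1.0):
    f0_pred = [f + scale * e for f, e in zip(f0_true, error_pattern)]
    gains[scale] = best_gain(f0_true, f0_pred)
    print(f"predictor error scale {scale}: best gain = {gains[scale]:.4f}")
```

In this toy, a perfect predictor gives a gain of exactly 1, and the gain shrinks as the predictor error grows, meaning the decoder relies less and less on its F0 input. Whether a real decoder behaves the same way is untested here.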
