Comments (4)
Sorry for the dense calculation of the MLE loss...
I'll let you know when I clean up the clutter in the code.
For now, I'll explain the loss term by term.
The original line I implemented was:
l_mle = 0.5 * math.log(2 * math.pi) + (
    torch.sum(y_logs) + 0.5 * torch.sum(torch.exp(-2 * y_logs) * (z - y_m)**2) - torch.sum(logdet)
) / (torch.sum(y_lengths // hps.model.n_sqz) * hps.model.n_sqz * hps.data.n_mel_channels)
It can be decomposed as
l_mle_normal = torch.sum(y_logs) + 0.5 * torch.sum(torch.exp(-2 * y_logs) * (z - y_m)**2)
l_mle_jacob = -torch.sum(logdet)
l_mle_sum = l_mle_normal + l_mle_jacob
denom = torch.sum(y_lengths // hps.model.n_sqz) * hps.model.n_sqz * hps.data.n_mel_channels
l_mle = 0.5 * math.log(2 * math.pi) + l_mle_sum / denom
l_mle_normal is the negative log-likelihood of the normal distribution N(z | y_m, y_logs) (excluding the constant term 0.5*log(2*pi)), where y_m and y_logs are the mean and the logarithm of the standard deviation of the prior distribution. Please see Equation 2 in the paper.
l_mle_normal = torch.sum(y_logs) + 0.5 * torch.sum(torch.exp(-2 * y_logs) * (z - y_m)**2)
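As a sanity check (a minimal sketch with made-up tensor shapes, not code from the repo), this term can be compared against torch.distributions.Normal, whose log_prob includes the 0.5*log(2*pi) constant that l_mle_normal omits:

```python
import math
import torch

torch.manual_seed(0)
# Made-up stand-ins for the model's outputs:
z = torch.randn(3, 5)            # latent variables
y_m = torch.randn(3, 5)          # prior mean
y_logs = torch.randn(3, 5) * 0.1 # prior log standard deviation

l_mle_normal = torch.sum(y_logs) + 0.5 * torch.sum(torch.exp(-2 * y_logs) * (z - y_m) ** 2)

# Full NLL from torch.distributions; it differs from l_mle_normal only by
# the constant 0.5*log(2*pi) per element.
nll = -torch.distributions.Normal(y_m, torch.exp(y_logs)).log_prob(z).sum()
const = 0.5 * math.log(2 * math.pi) * z.numel()
assert torch.allclose(l_mle_normal + const, nll, atol=1e-4)
```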
l_mle_jacob denotes the negative log-determinant of the Jacobian of the flows. Please see Equation 1 in the paper.
l_mle_jacob = -torch.sum(logdet)
l_mle_sum denotes the total negative log-likelihood of the model, and denom is a denominator that averages the total negative log-likelihood across the batch, time steps, and mel channels. (Our model forces the mel-spectrogram lengths y_lengths to be a multiple of n_sqz.)
l_mle_sum = l_mle_normal + l_mle_jacob
denom = torch.sum(y_lengths // hps.model.n_sqz) * hps.model.n_sqz * hps.data.n_mel_channels
- Add the constant term, 0.5*log(2*pi), that was excluded earlier.
l_mle = 0.5 * math.log(2 * math.pi) + l_mle_sum / denom
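Putting it together, here is a minimal self-contained sketch (random tensors and made-up shapes standing in for the model's outputs and the hps hyperparameters) showing that the step-by-step decomposition matches the original one-line expression:

```python
import math
import torch

# Hypothetical hyperparameters standing in for hps.model.n_sqz and
# hps.data.n_mel_channels from the repo:
n_sqz, n_mel_channels = 2, 80
batch, t = 4, 100  # t is already a multiple of n_sqz

torch.manual_seed(0)
z = torch.randn(batch, n_mel_channels, t)             # latent from the flow
y_m = torch.randn(batch, n_mel_channels, t)           # prior mean
y_logs = torch.randn(batch, n_mel_channels, t) * 0.1  # prior log-std
logdet = torch.randn(batch)                           # log|det J| per utterance
y_lengths = torch.full((batch,), t)                   # frame counts

# Step-by-step decomposition:
l_mle_normal = torch.sum(y_logs) + 0.5 * torch.sum(torch.exp(-2 * y_logs) * (z - y_m) ** 2)
l_mle_jacob = -torch.sum(logdet)
l_mle_sum = l_mle_normal + l_mle_jacob
denom = torch.sum(y_lengths // n_sqz) * n_sqz * n_mel_channels
l_mle = 0.5 * math.log(2 * math.pi) + l_mle_sum / denom

# One-line form; should match the decomposition exactly.
l_mle_oneline = 0.5 * math.log(2 * math.pi) + (
    torch.sum(y_logs)
    + 0.5 * torch.sum(torch.exp(-2 * y_logs) * (z - y_m) ** 2)
    - torch.sum(logdet)
) / (torch.sum(y_lengths // n_sqz) * n_sqz * n_mel_channels)

assert torch.allclose(l_mle, l_mle_oneline)
```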
from glow-tts.
Yes, the constant term is ignored in backpropagation; I just left it in for the exact calculation of the log-likelihood. And I saw AlignTTS, which also proposes an alignment search algorithm similar to Glow-TTS's. I think it is clever, thanks for the heads up! Btw, I hope you enjoy the interesting characteristics of our model, such as manipulating the latent representation of speech :)
Thanks for your detailed explanation. I think you could ignore the constant term; it does not contribute to backpropagation. Btw, I found another paper, AlignTTS, that has the same idea of implicitly learning the duration of each character, but with a different approach.
Just wanted to say: amazing work! I love the controllability of length and expressiveness. I wanted to try a few ideas of my own using your repository as a codebase, but I've run into a strange phenomenon. It's related to the loss function, so maybe you could help me understand the cause. The strange thing is that the value of the l_mle (g0) loss depends on the value range of the mel spectrograms.
- Orange: LJSpeech wavs transformed into mel spectrograms using the default parameters. Mel values range from 0.5 to -11.5.
- Pink: my data transformed the same way as LJSpeech.
- Blue: my data transformed into mel spectrograms with different STFT parameters and then scaled to the 0.5 to -11.5 range.
- Gray: my data transformed into mel spectrograms with different STFT parameters. Values range from 0 to 0.76 (the same results if multiplied by -1).
From what I was able to check, in the case of data in the 0 to 0.76 range the values differ in the following way:
- l_mle_jacob is bigger for mel spectrograms with smaller absolute values. I think this makes sense, because the Jacobian is calculated from the weights, and they have to be bigger to produce the same output values.
- l_mle_normal is about the same.
- denom is obviously the same.
- l_mle: with a different proportion between l_mle_sum and denom, l_mle no longer normalizes to 1. I think it's a problem, because the balance between g0 and g1 is disturbed and the alignment gets worse.
Also, I find it quite strange that the grad norm keeps increasing on both the Blue and Gray curves. The only thing they have in common is mel-spectrogram STFT parameters different from the defaults.
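For intuition on the range dependence (a toy sketch, not code from the repo): rescaling data by a factor a shifts its negative log-likelihood by log(a) per dimension via the change-of-variables term, which is why l_mle values computed on differently scaled mel spectrograms aren't directly comparable:

```python
import torch

torch.manual_seed(0)
x = torch.randn(1000)
a = 12.0  # e.g. stretching a [0, 0.76] range toward a [-11.5, 0.5]-sized range

# A model perfectly adapted to the unscaled data, and one perfectly
# adapted to the scaled data a*x:
d = torch.distributions.Normal(0.0, 1.0)
d_scaled = torch.distributions.Normal(0.0, a)

nll = -d.log_prob(x).mean()
nll_scaled = -d_scaled.log_prob(a * x).mean()

# Even with a perfectly adapted model, the two NLLs differ by exactly
# log(a) per dimension.
assert torch.allclose(nll_scaled - nll, torch.log(torch.tensor(a)), atol=1e-4)
```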