
Comments (4)

jaywalnut310 commented on July 18, 2024

Sorry for the dense calculation of the MLE loss...

I'll let you know when I clean up the clutter in the code.
For now, I'll explain the loss term by term.

The original line I implemented was:

l_mle = (0.5 * math.log(2 * math.pi)
    + (torch.sum(y_logs) + 0.5 * torch.sum(torch.exp(-2 * y_logs) * (z - y_m)**2) - torch.sum(logdet))
    / (torch.sum(y_lengths // hps.model.n_sqz) * hps.model.n_sqz * hps.data.n_mel_channels))

It can be decomposed as

l_mle_normal = torch.sum(y_logs) + 0.5 * torch.sum(torch.exp(-2 * y_logs) * (z - y_m)**2)
l_mle_jacob = -torch.sum(logdet)
l_mle_sum = l_mle_normal + l_mle_jacob
denom = torch.sum(y_lengths // hps.model.n_sqz) * hps.model.n_sqz * hps.data.n_mel_channels
l_mle = 0.5 * math.log(2 * math.pi)  + l_mle_sum / denom
  1. l_mle_normal is the negative log-likelihood of the normal distribution N(z | y_m, y_logs) (except for the constant term 0.5*log(2*pi)), where y_m and y_logs are the mean and the logarithm of the standard deviation of the prior distribution. Please see Equation 2 in the paper.
l_mle_normal = torch.sum(y_logs) + 0.5 * torch.sum(torch.exp(-2 * y_logs) * (z - y_m)**2)
  2. l_mle_jacob denotes the negative log-determinant of the Jacobian of the flows. Please see Equation 1 in the paper.
l_mle_jacob = -torch.sum(logdet)
  3. l_mle_sum denotes the total negative log-likelihood of the model, and denom is the denominator used to average the total negative log-likelihood over the batch, time steps and mel channels (our model forces the mel-spectrogram lengths y_lengths to be a multiple of n_sqz).
l_mle_sum = l_mle_normal + l_mle_jacob
denom = torch.sum(y_lengths // hps.model.n_sqz) * hps.model.n_sqz * hps.data.n_mel_channels
  4. Add the constant term, 0.5*log(2*pi), excluded in step 1 (a consolidated sketch of the whole computation follows this list).
l_mle = 0.5 * math.log(2 * math.pi) + l_mle_sum / denom
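
Putting these pieces together, here is a self-contained sketch of the same computation as one function. The function name mle_loss and the explicit n_sqz / n_mel_channels arguments in place of hps.model.n_sqz and hps.data.n_mel_channels are just for illustration, not the exact code in the repository:

import math
import torch

def mle_loss(z, y_m, y_logs, logdet, y_lengths, n_sqz, n_mel_channels):
    # Negative log-likelihood of z under N(y_m, exp(y_logs)**2), constant term excluded
    l_mle_normal = torch.sum(y_logs) + 0.5 * torch.sum(torch.exp(-2 * y_logs) * (z - y_m) ** 2)
    # Negative log-determinant of the Jacobian of the flows
    l_mle_jacob = -torch.sum(logdet)
    # Average over batch, (squeezed) time steps and mel channels
    denom = torch.sum(y_lengths // n_sqz) * n_sqz * n_mel_channels
    # Add back the constant term 0.5*log(2*pi) per dimension
    return 0.5 * math.log(2 * math.pi) + (l_mle_normal + l_mle_jacob) / denom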


jaywalnut310 commented on July 18, 2024

Yes, the constant term is ignored in backpropagation; I just left it in for the exact calculation of the log-likelihood. And I saw AlignTTS, which also proposes an alignment search algorithm similar to Glow-TTS. I think it is clever, thanks for the heads up! Btw, I hope you enjoy the interesting characteristics of our model, such as manipulating the latent representation of speech :)
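
To illustrate why it's safe to drop: adding a constant to the loss leaves the gradients untouched, so it only changes the reported value. A toy check (unrelated to the actual training code):

import math
import torch

x = torch.tensor([1.5], requires_grad=True)

loss = (x ** 2).sum()                                        # toy loss without the constant
loss.backward()
grad_without = x.grad.clone()

x.grad = None
loss_const = (x ** 2).sum() + 0.5 * math.log(2 * math.pi)    # same loss plus the constant
loss_const.backward()
grad_with = x.grad.clone()

print(torch.equal(grad_without, grad_with))                  # True: the constant does not affect gradients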


trfnhle commented on July 18, 2024

Thanks for your detailed explanation. I think you could ignore the constant term, since it does not contribute to backpropagation. Btw, I found another paper, AlignTTS, that has the same idea of implicitly learning the duration of each character, but with a different approach.


RKorzeniowski commented on July 18, 2024

Just wanted to say: amazing work! I love the controllability of length and expressiveness. I wanted to try a few ideas of my own using your repository as a codebase, but I've run into a strange phenomenon. It's related to the loss function, so maybe you could help me understand the cause. The strange thing is that the value of the l_mle (g0) loss depends on the value range of the mel spectrograms.

Orange - LJSpeech wavs transformed into mel spectrograms using the default parameters. Mel-spectrogram values range from 0.5 to -11.5.
Pink - My data transformed in the same way as LJSpeech.
Blue - My data transformed into mel spectrograms with different STFT parameters and then scaled to the 0.5 to -11.5 range.
Gray - My data transformed into mel spectrograms with different STFT parameters. Values range from 0 to 0.76 (same results if multiplied by -1).

[Screenshot 2020-07-29 at 12:23:09: training loss curves for the four runs]

From what I was able to check, in the case of the data in the 0 to 0.76 range the values differ in the following way:
l_mle_jacob - larger for mel spectrograms with smaller absolute values. I think this makes sense, because the Jacobian is computed from the flow's weights, which have to be larger to produce the same output values.
l_mle_normal - about the same.
denom - obviously the same.
l_mle - with a different proportion between l_mle_sum and denom, l_mle no longer normalizes to 1. I think this is a problem because the balance between g0 and g1 is disturbed and the alignment gets worse.
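
Part of this might simply be a change-of-variables effect: rescaling the data by a factor c shifts the exact NLL per dimension by log(c), so runs with different mel value ranges may not be directly comparable in absolute loss value. A toy sanity check of my own (a plain Gaussian fit, not the actual model):

import math
import torch

torch.manual_seed(0)
x = torch.randn(100_000)          # toy "data" with std 1
c = 10.0                          # rescaling factor

for data, label in [(x, "std 1"), (c * x, "std 10")]:
    mu, std = data.mean(), data.std()
    # Exact Gaussian NLL per dimension with the best-fitting mean/std
    nll = torch.log(std) + 0.5 * ((data - mu) / std).pow(2).mean() + 0.5 * math.log(2 * math.pi)
    print(label, nll.item())      # the two values differ by ~log(10) ≈ 2.30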

Also, I find it quite strange that the grad norm keeps increasing for both the Blue and Gray curves. The only thing they have in common is non-default mel-spectrogram STFT parameters.
[Screenshot 2020-07-29 at 12:57:10: gradient norm curves]

