Comments (3)

jhauret commented on June 27, 2024

This point has also caught my attention. The change occurred between SEANet, which introduced $\mathbb{E}[\textrm{max}(0,1-D(\hat{x}))]$, and MelGAN, which used $-\mathbb{E}[D(\hat{x})]$. Since then, many similar papers have used it, including encodec's direct parent, SoundStream.

In my case, with EBEN, I simply found that this criterion worked well (better than LSGAN, WGAN, the classic GAN, or the original geometric GAN formulation). A possible explanation is the symmetric role of the discriminator, which should only output values in the range [-1,1]; this helps stabilize training by avoiding overconfidence.

from encodec.

turian commented on June 27, 2024

@jhauret it's worth noting that BigVGAN, which is also SOTA, uses an LSGAN loss.

I am not aware that the discriminators only output values in the range [-1, 1]. Why do you say that? It appears to me that many discriminators do not apply a squashing function at the last layer, in order to avoid vanishing gradient to the generator.
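
As a small illustration of the vanishing-gradient concern, here is a minimal sketch in plain Python. It assumes `tanh` as the squashing function purely as an example; `tanh_grad` is a hypothetical helper, not something taken from encodec or any of the papers mentioned.

```python
import math


def tanh_grad(z):
    """Derivative of tanh(z) with respect to z: 1 - tanh(z)^2."""
    return 1.0 - math.tanh(z) ** 2


# The gradient that would flow back through a tanh output layer
# shrinks quickly as the pre-activation moves away from zero.
for z in [0.0, 2.0, 5.0, 10.0]:
    print(f"z={z:5.1f}  d tanh/dz={tanh_grad(z):.3e}")
```

With an unsquashed (linear) last layer, the corresponding factor is 1 for every value of `z`, which is the motivation given above.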

With that said, to answer @BakerBunker's question of why [formula, shown as an image in the original post] isn't used, one good reason is the loss balancer that encodec uses, so that many reconstruction, generator, and feature map losses can be combined elegantly. It's not clear how loss balancing should work if any of those loss values are negative, which could be the case with the hinge loss or the LSGAN loss.

jhauret commented on June 27, 2024

Thanks for pointing out the use of LSGAN loss in BigVGAN.

Sorry if I was unclear. In fact, the values of the discriminators can be outside [-1, 1], but if you minimize $\mathbb{E}[\textrm{max}(0, 1-D(\hat{x}))]$, no further optimization is needed once $D(\hat{x})>1$, and vice versa for $\mathbb{E}[\textrm{max}(0,1+D(\hat{x}))]$ once $D(\hat{x})<-1$. So the values of the discriminators tend to be in [-1,1] once they are trained.
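
To make the saturation argument concrete, here is a minimal sketch in plain Python. It works on scalar discriminator scores instead of real network outputs, and all function names are hypothetical (they are not taken from encodec, EBEN, or any other codebase mentioned here).

```python
def hinge_generator_loss(scores):
    """Mean of max(0, 1 - D(x_hat)) over a list of discriminator scores."""
    return sum(max(0.0, 1.0 - s) for s in scores) / len(scores)


def hinge_generator_grad(score):
    """Derivative of max(0, 1 - s) w.r.t. s; taken as 0 at the kink s = 1."""
    return -1.0 if score < 1.0 else 0.0


def hinge_discriminator_fake_grad(score):
    """Derivative of max(0, 1 + s) w.r.t. s; taken as 0 at the kink s = -1."""
    return 1.0 if score > -1.0 else 0.0


def linear_generator_grad(score):
    """Derivative of -s w.r.t. s: always -1, whatever the score is."""
    return -1.0


# Compare the per-sample gradients for a few example scores.
for s in [-3.0, -1.0, 0.0, 0.5, 1.0, 3.0]:
    print(
        f"D(x_hat)={s:5.1f}  "
        f"hinge G grad={hinge_generator_grad(s):5.1f}  "
        f"hinge D (fake) grad={hinge_discriminator_fake_grad(s):5.1f}  "
        f"linear G grad={linear_generator_grad(s):5.1f}"
    )
```

In this sketch the two hinge terms stop producing any gradient once the score leaves [-1, 1] on the favorable side, whereas the $-\mathbb{E}[D(\hat{x})]$ form keeps pushing with a constant gradient.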

You are also right about your last point, but this loss change has been seen in other papers before such a loss balancer was introduced.
