❓ Questions: In the paper (https://arxiv.org/abs/2210.13438) …


A question about adversarial loss · encodec · Open · 3 comments

facebookresearch commented on June 27, 2024



Comments (3)

jhauret commented on June 27, 2024

This point also caught my attention. The change occurred between MelGAN, which used $-\mathbb{E}[D(\hat{x})]$, and SEANet, which introduced $\mathbb{E}[\max(0, 1-D(\hat{x}))]$. Since then, many similar papers have adopted it, including EnCodec's direct predecessor, SoundStream.
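To make the contrast concrete, here is a minimal sketch of the two generator objectives. It assumes `d_fake` is a plain list of discriminator scores $D(\hat{x})$ (a simplification: real codecs average over several discriminators and time steps), and the function names are hypothetical. The MelGAN-style objective keeps rewarding ever-larger scores and can go negative, while the hinge variant stops contributing once $D(\hat{x}) > 1$.

```python
def melgan_generator_loss(d_fake):
    # -E[D(x_hat)]: unbounded below, keeps pushing scores up indefinitely.
    return -sum(d_fake) / len(d_fake)


def hinge_generator_loss(d_fake):
    # E[max(0, 1 - D(x_hat))]: never negative, zero once every score exceeds 1.
    return sum(max(0.0, 1.0 - d) for d in d_fake) / len(d_fake)


scores = [-0.5, 0.2, 3.0]  # hypothetical discriminator outputs for generated audio
print(melgan_generator_loss(scores))
print(hinge_generator_loss(scores))  # the score 3.0 contributes nothing here
```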

In my own work on EBEN, I found that this criterion worked well (better than the LSGAN, WGAN, classic GAN, or original geometric GAN formulations). One possible explanation is the symmetric use of the discriminator, whose outputs should then stay in the range [-1, 1], which may help stabilize training by avoiding overconfidence.
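The symmetric setup can be sketched as follows, assuming discriminator scores arrive as plain lists (a simplification) and using hypothetical function names. The discriminator applies the hinge loss to both real and generated samples, and the generator reuses the same margin form, so scores beyond ±1 produce no further loss.

```python
def hinge(v):
    # max(0, v): zero once the margin is satisfied.
    return max(0.0, v)


def discriminator_loss(d_real, d_fake):
    # E[max(0, 1 - D(x))] + E[max(0, 1 + D(x_hat))]:
    # real scores are pushed above +1, generated scores below -1.
    real_term = sum(hinge(1.0 - d) for d in d_real) / len(d_real)
    fake_term = sum(hinge(1.0 + d) for d in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    # E[max(0, 1 - D(x_hat))]: same margin form as the discriminator's real-sample term.
    return sum(hinge(1.0 - d) for d in d_fake) / len(d_fake)


# Scores already past the margins contribute nothing, however extreme they are.
print(discriminator_loss([1.0, 5.0], [-1.0, -7.0]))  # 0.0
print(generator_loss([2.0]))  # 0.0
```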


turian commented on June 27, 2024

@jhauret it's worth noting that BigVGAN, which is also SOTA, uses an LSGAN loss.
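For comparison, a least-squares (LSGAN) objective in the form used by HiFi-GAN regresses real scores toward 1 and generated scores toward 0. It can be sketched as below. This is an illustration under that assumption, not BigVGAN's actual code, and the function names are hypothetical.

```python
def lsgan_discriminator_loss(d_real, d_fake):
    # E[(D(x) - 1)^2] + E[D(x_hat)^2]: real scores -> 1, generated scores -> 0.
    real_term = sum((d - 1.0) ** 2 for d in d_real) / len(d_real)
    fake_term = sum(d ** 2 for d in d_fake) / len(d_fake)
    return real_term + fake_term


def lsgan_generator_loss(d_fake):
    # E[(D(x_hat) - 1)^2]: generated scores are pulled toward the "real" target 1.
    return sum((d - 1.0) ** 2 for d in d_fake) / len(d_fake)


print(lsgan_generator_loss([3.0]))  # 4.0
```

Unlike the hinge form, the quadratic penalty also acts on scores that overshoot the target, pulling them back toward 1 rather than ignoring them.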

I am not aware of discriminators outputting only values in the range [-1, 1]. Why do you say that? As far as I can tell, many discriminators do not apply a squashing function at the last layer, precisely to avoid vanishing gradients to the generator.
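The vanishing-gradient argument can be checked numerically. The sketch assumes a discriminator whose raw score $z$ is passed through a sigmoid $\sigma$, with the generator minimizing the original minimax term $\log(1 - \sigma(z))$. The derivative of that term with respect to $z$ is $-\sigma(z)$, which all but vanishes when the discriminator confidently rejects a fake ($z \ll 0$). A loss applied to the unsquashed score, such as $-z$, keeps a constant gradient. The helper names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_saturating(z):
    # d/dz log(1 - sigmoid(z)) = -sigmoid(z): minimax generator loss on a squashed output.
    return -sigmoid(z)


def grad_unsquashed(z):
    # d/dz (-z) = -1: generator loss applied directly to the raw discriminator score.
    return -1.0


for z in (-10.0, -2.0, 0.0):
    print(f"z={z:5.1f}  saturating={grad_saturating(z):.2e}  unsquashed={grad_unsquashed(z):.2e}")
```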

With that said, to answer @BakerBunker's question about why … isn't used: one good reason is the loss balancer that EnCodec uses, which lets the many reconstruction, generator, and feature-map losses be combined elegantly. It's not clear how loss balancing should work if any of those loss values are negative, which could be the case with the hinge loss or the LSGAN loss.
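For readers unfamiliar with the balancer, here is a minimal sketch. It assumes the scheme described in the EnCodec paper, in which the gradient of each loss with respect to the decoder output is normalized by its norm and scaled by $\lambda_i / \sum_j \lambda_j$. It is simplified here to use the current gradient norm instead of the paper's exponential moving average. Gradients are plain lists, and `balance_gradients` is a hypothetical name, not the library's API.

```python
import math


def balance_gradients(grads, weights, reference_norm=1.0, eps=1e-12):
    # Hypothetical helper: combine per-loss gradients so that loss i contributes
    # a vector of norm reference_norm * w_i / sum(w), whatever its raw magnitude.
    # Simplification: current gradient norm instead of a moving average.
    total_weight = sum(weights.values())
    dim = len(next(iter(grads.values())))
    combined = [0.0] * dim
    for name, g in grads.items():
        norm = math.sqrt(sum(v * v for v in g))
        scale = reference_norm * weights[name] / total_weight / max(norm, eps)
        combined = [c + scale * v for c, v in zip(combined, g)]
    return combined


grads = {"reconstruction": [3.0, 4.0], "adversarial": [0.0, 0.1]}
weights = {"reconstruction": 1.0, "adversarial": 1.0}
print(balance_gradients(grads, weights))
```

With equal weights, each loss contributes half of the unit-norm update here, even though the raw gradients differ in magnitude by a factor of 50.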


jhauret commented on June 27, 2024

Thanks for pointing out the use of LSGAN loss in BigVGAN.

Sorry if I was unclear. The discriminator outputs can indeed fall outside [-1, 1]. However, when the generator minimizes $\mathbb{E}[\max(0, 1-D(\hat{x}))]$, nothing more is gained once $D(\hat{x}) > 1$. Conversely, when the discriminator minimizes $\mathbb{E}[\max(0, 1+D(\hat{x}))]$, nothing more is gained once $D(\hat{x}) < -1$. As a result, the discriminator outputs tend to stay in [-1, 1] once training has progressed.
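The flat regions described here follow from the subgradient of $\max(0, 1 + s\,d)$ with respect to the score $d$. That subgradient is $s$ inside the margin and $0$ outside it. In the sketch, `sign = -1` gives the generator term and `sign = +1` the discriminator's fake-sample term. The function name is hypothetical. The toy loop updates the score directly, standing in for updates to network parameters.

```python
def hinge_subgradient(d, sign):
    # Subgradient w.r.t. d of max(0, 1 + sign * d).
    # sign = -1: generator term max(0, 1 - D(x_hat)).
    # sign = +1: discriminator fake-sample term max(0, 1 + D(x_hat)).
    # At the kink we return 0, which is a valid subgradient.
    return float(sign) if 1.0 + sign * d > 0.0 else 0.0


# Toy gradient descent on the generator term: the score rises until it passes 1, then stops.
d = 0.0
for _ in range(10):
    d -= 0.3 * hinge_subgradient(d, sign=-1)
print(d)
```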

You are also right about your last point, although this change in the loss already appeared in other papers before such a loss balancer was introduced.

