Comments (3)
This point also caught my attention. The change occurred between SEANet, which introduced
In my case with EBEN, I simply found that this criterion works well (better than LSGAN, WGAN, the classic GAN, or the original geometric GAN formulation). A possible explanation is the symmetric use of the discriminator, which should only output values in the range [-1, 1]; this may help stabilize training by avoiding overconfidence.
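To make the comparison concrete, here is a minimal sketch. It assumes the criterion under discussion applies the hinge to the generator as well, and that the "original geometric GAN formulation" uses a linear generator term. The function names are hypothetical and are not encodec's API; scores are plain Python floats standing in for discriminator logits.

```python
from statistics import fmean
from typing import Sequence


def relu(v: float) -> float:
    """Clamp negative values to zero."""
    return max(0.0, v)


def d_hinge_loss(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """Discriminator hinge loss: push real scores above +1 and fake scores below -1."""
    real_term = fmean([relu(1.0 - s) for s in real_scores])
    fake_term = fmean([relu(1.0 + s) for s in fake_scores])
    return real_term + fake_term


def g_linear_loss(fake_scores: Sequence[float]) -> float:
    """Assumed geometric-GAN generator loss: unbounded below, can be negative."""
    return -fmean(fake_scores)


def g_hinge_loss(fake_scores: Sequence[float]) -> float:
    """Assumed hinge generator loss: non-negative, zero once every score is >= 1."""
    return fmean([relu(1.0 - s) for s in fake_scores])


if __name__ == "__main__":
    fake = [-0.5, 0.5, 2.0]
    print("linear generator loss:", g_linear_loss(fake))
    print("hinge generator loss: ", g_hinge_loss(fake))
```

Under these assumptions, the hinge generator term stops rewarding the generator once the discriminator scores its output above 1, whereas the linear term keeps decreasing without bound.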
from encodec.
@jhauret it's worth noting that BigVGAN, which is also SOTA, uses an LSGAN loss.
I am not aware that discriminators only output values in the range [-1, 1]. Why do you say that? It appears to me that many discriminators do not apply a squashing function at the last layer, in order to avoid vanishing gradients for the generator.
With that said, to answer @BakerBunker's question why
isn't used, one good reason is the loss balancer that encodec uses, so that many reconstruction, generator, and feature map losses can be combined elegantly. It's not clear how loss balancing should work if any of those loss values are negative, which could be the case with the hinge loss or the LSGAN loss.
Thanks for pointing out the use of LSGAN loss in BigVGAN.
Sorry if I was unclear. In fact, the values of the discriminators can be outside [-1, 1], but if you minimize
You are also right about your last point, but this loss change has been seen in other papers before such a loss balancer was introduced.