Comments (4)

hubertsiuzdak commented on May 24, 2024

This setting needs to be consistent with the feature extractor. I guess you specifically mean the vocos.yaml config file that takes mel-spectrograms as inputs. Note that input mel-spectrograms are "center" padded, so to compensate for this we can use torch.istft with center=True which trims the corresponding samples.

However, for features like EnCodec tokens that are not padded in that specific way, using torch.istft with center=True would trim too many samples. In the vocos-encodec.yaml config file, you'll find padding: same in the ISTFTHead.
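To make the length difference concrete, here is a minimal sketch of the sample counts under the two trimming rules. It assumes that "center" trims n_fft // 2 samples from each side and that "same" trims (n_fft - hop_length) // 2 from each side; the function name `istft_output_length` and the 75-frame input are hypothetical and chosen only for illustration.

```python
def istft_output_length(n_frames, n_fft, hop_length, padding):
    """Number of samples left after overlap-add and trimming (illustrative only)."""
    # Length of the raw overlap-add buffer before any trimming.
    full = n_fft + hop_length * (n_frames - 1)
    if padding == "center":
        trim = n_fft // 2  # assumed: what center=True removes per side
    elif padding == "same":
        trim = (n_fft - hop_length) // 2  # assumed: what "same" removes per side
    else:
        raise ValueError("padding must be 'center' or 'same'")
    return full - 2 * trim


n_frames, n_fft, hop_length = 75, 1280, 320
center_len = istft_output_length(n_frames, n_fft, hop_length, "center")
same_len = istft_output_length(n_frames, n_fft, hop_length, "same")
print(center_len, same_len)
```

Under these assumptions "center" yields hop_length * (n_frames - 1) samples, one hop short of what unpadded features call for, while "same" yields hop_length * n_frames.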

It would certainly be simpler if we could use torch.istft with center=False and slice the output audio. However, PyTorch does not allow this (for specific windows) due to how the NOLA (Nonzero Overlap Add) is checked. You might want to check out this issue: pytorch/pytorch#91309
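The NOLA problem can be seen without PyTorch by computing the overlap-added squared window, which is the quantity an iSTFT divides by. The sketch below is plain Python and only approximates what the check inspects; the helper names and the values n_fft=1024, hop_length=256 and 20 frames are assumptions for illustration.

```python
import math


def hann(n_fft):
    # Periodic Hann window; its first sample is exactly zero.
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def ola_envelope(window, hop_length, n_frames):
    # Overlap-add of the squared window across all frames.
    n_fft = len(window)
    env = [0.0] * (n_fft + hop_length * (n_frames - 1))
    for frame in range(n_frames):
        start = frame * hop_length
        for n, w in enumerate(window):
            env[start + n] += w * w
    return env


n_fft, hop_length, n_frames = 1024, 256, 20
env = ola_envelope(hann(n_fft), hop_length, n_frames)
pad = n_fft // 2

min_untrimmed = min(env)            # what a center=False check would see
min_trimmed = min(env[pad:-pad])    # what remains after center-style trimming
print(min_untrimmed, min_trimmed)
```

Without trimming, the very first sample is covered only by the zero-valued start of the first window, so the envelope touches zero there. Trimming n_fft // 2 samples from each side removes those edges before the envelope is inspected.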

Hope it helps!

from vocos.

nukes commented on May 24, 2024

That's helpful~ Thanks!

HeJinLing commented on May 24, 2024

Hi, I noticed that n_fft in the head is set to 1280. Why didn't you use 1024?

hubertsiuzdak commented on May 24, 2024

@HeJinLing

The key parameter here is hop_length. It should align with the resolution of your input features. Since EnCodec tokens are downsampled by a factor of 320, we've set the hop_length to 320.

Now, when you use a Hann window in the iSTFT, it's common to have a 75% overlap. This means our window_len should be four times the hop_length. That's why we set window_len (and n_fft) to 1280.

If you want to dive a bit deeper, I'd recommend looking into the constant overlap-add (COLA) constraint. There's a helpful discussion on this topic here: https://dsp.stackexchange.com/a/33615.
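As a rough illustration of the COLA argument, the sketch below sums shifted copies of a periodic Hann window at a hop of 320 and measures how far the sum deviates from a constant. It compares a window length of 1280 with 1024; the helper name `cola_ripple` is hypothetical, and the check covers only the steady-state region away from the signal edges.

```python
import math


def hann(n_fft):
    # Periodic Hann window.
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def cola_ripple(n_fft, hop_length):
    """Max minus min of the overlap-added window over one hop (0 means COLA holds)."""
    window = hann(n_fft)
    sums = []
    for offset in range(hop_length):
        # All window samples that land on the same output position in steady state.
        sums.append(sum(window[i] for i in range(offset, n_fft, hop_length)))
    return max(sums) - min(sums)


ripple_1280 = cola_ripple(1280, 320)  # window is exactly 4 hops long
ripple_1024 = cola_ripple(1024, 320)  # window is 3.2 hops long
print(ripple_1280, ripple_1024)
```

With a window of 1280 the shifted windows add up to a constant, so the ripple is zero up to floating-point error. With 1024 the window is not an integer multiple of the hop and the sum visibly varies across each hop.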
