
ISTFT head uses center padding instead of same padding (vocos issue, open)

nukes commented on May 24, 2024

The ISTFT head uses center padding instead of same padding.

Comments (4)

hubertsiuzdak commented on May 24, 2024

This setting needs to be consistent with the feature extractor. I guess you specifically mean the vocos.yaml config file, which takes mel-spectrograms as inputs. Note that the input mel-spectrograms are "center" padded, so to compensate for this we can use torch.istft with center=True, which trims the corresponding samples.
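To make the "center" bookkeeping concrete, here is a small sketch of the frame-count and output-length arithmetic. It assumes the torch.stft/torch.istft conventions: with center=True the input is padded by n_fft // 2 on each side before framing, and torch.istft (when no `length` is passed) trims n_fft // 2 from each end of the overlap-added signal. The helper names and the example values (one second of 24 kHz audio, n_fft=1024, hop_length=256) are illustrative assumptions, not read from the Vocos configs.

```python
def stft_num_frames(num_samples: int, n_fft: int, hop_length: int, center: bool) -> int:
    """Frames produced by an STFT; center=True pads n_fft // 2 samples on each side first."""
    if center:
        num_samples += 2 * (n_fft // 2)
    return 1 + (num_samples - n_fft) // hop_length


def istft_num_samples(num_frames: int, n_fft: int, hop_length: int, center: bool) -> int:
    """Overlap-add length, minus the n_fft // 2 samples trimmed per side when center=True."""
    full_length = n_fft + hop_length * (num_frames - 1)
    if center:
        full_length -= 2 * (n_fft // 2)
    return full_length


# Hypothetical mel settings: 1 s of 24 kHz audio, n_fft=1024, hop_length=256.
frames = stft_num_frames(24000, n_fft=1024, hop_length=256, center=True)
samples = istft_num_samples(frames, n_fft=1024, hop_length=256, center=True)
print(frames, samples)  # 94 23808
```

The padding added before framing is exactly what center=True removes after overlap-add, so the two ends line up. The remaining 192-sample shortfall comes from 24000 not being a multiple of 256; torch.istft's `length` argument is the usual way to request an exact output length.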

However, for features like EnCodec tokens that are not padded in that specific way, using torch.istft with center=True would trim too many samples. In the vocos-encodec.yaml config file, you'll find padding: same in the ISTFTHead.
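A sketch of why center=True over-trims for features that were not center padded. It assumes (hypothetically) that "same" padding trims (n_fft - hop_length) // 2 samples from each end of the overlap-added signal, so that T frames map to exactly T * hop_length samples. The values n_fft = 1280 and hop_length = 320 match the numbers discussed later in this thread; the function name is illustrative.

```python
def trimmed_length(num_frames: int, n_fft: int, hop_length: int, padding: str) -> int:
    """Samples left after overlap-adding num_frames frames of n_fft samples and trimming both ends."""
    full_length = n_fft + hop_length * (num_frames - 1)
    if padding == "center":
        trim = n_fft // 2  # per side, as torch.istft(center=True) trims
    elif padding == "same":
        trim = (n_fft - hop_length) // 2  # per side, assumed "same" convention
    else:
        raise ValueError(f"unknown padding: {padding!r}")
    return full_length - 2 * trim


num_frames = 75  # e.g. one second of tokens at a 320x downsampling of 24 kHz audio (assumed)
for padding in ("center", "same"):
    print(padding, trimmed_length(num_frames, n_fft=1280, hop_length=320, padding=padding))
# center 23680
# same 24000
```

Under these assumptions, center=True returns (T - 1) * hop_length samples, one hop short of the T * hop_length samples the unpadded tokens represent, while the "same" trim returns exactly T * hop_length.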

It would certainly be simpler if we could use torch.istft with center=False and slice the output audio. However, PyTorch does not allow this (for certain windows) because of how the NOLA (nonzero overlap-add) condition is checked. You might want to check out this issue: pytorch/pytorch#91309
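A small illustration of the NOLA problem, assuming a periodic Hann window (the default of torch.hann_window). torch.istft divides the overlap-added output by the overlap-added squared window and raises an error if that envelope is numerically zero anywhere in the span it returns. With center=False nothing is trimmed from the start, and the first sample of a periodic Hann window is exactly 0, so the envelope at sample 0 is 0; slicing the audio yourself happens only after torch.istft returns, so it cannot avoid the error. The sketch computes the envelope in plain Python (no PyTorch); the trim of (n_fft - hop_length) // 2 per side is an assumed "same"-style convention.

```python
import math


def periodic_hann(n: int) -> list[float]:
    """Periodic Hann window, w[k] = 0.5 - 0.5 * cos(2*pi*k / n)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def squared_window_envelope(window: list[float], hop_length: int, num_frames: int) -> list[float]:
    """Overlap-add of window**2 over all frames: the denominator an iSTFT divides by."""
    n_fft = len(window)
    envelope = [0.0] * (n_fft + hop_length * (num_frames - 1))
    for frame in range(num_frames):
        start = frame * hop_length
        for k, w in enumerate(window):
            envelope[start + k] += w * w
    return envelope


n_fft, hop_length, num_frames = 1280, 320, 10
env = squared_window_envelope(periodic_hann(n_fft), hop_length, num_frames)

# center=False keeps sample 0, where the envelope is exactly zero, so a NOLA check fails there.
print(env[0])  # 0.0
# Dropping (n_fft - hop_length) // 2 samples per side leaves a strictly positive envelope.
pad = (n_fft - hop_length) // 2
print(min(env[pad:-pad]) > 1e-11)  # True
```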

Hope it helps!

nukes commented on May 24, 2024

That's helpful~ Thanks!

HeJinLing commented on May 24, 2024

Hi, I noticed that the n_fft parameter in the head is 1280. Why didn't you use 1024?

hubertsiuzdak commented on May 24, 2024

@HeJinLing

The key parameter here is hop_length. It should align with the resolution of your input features. Since EnCodec tokens are downsampled by a factor of 320, we've set the hop_length to 320.
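A quick sanity check of that alignment, assuming 24 kHz audio (the sample rate of the EnCodec 24 kHz model; it is not stated in this thread) and the 320x downsampling factor from the comment. Each token frame is decoded into hop_length samples, so a hop that does not match the feature resolution changes the duration of the decoded audio. The helper name is hypothetical.

```python
def decoded_seconds(num_frames: int, hop_length: int, sample_rate: int) -> float:
    """Duration of audio produced when each feature frame is decoded into hop_length samples."""
    return num_frames * hop_length / sample_rate


sample_rate = 24_000                  # assumed sample rate
num_frames = 2 * sample_rate // 320   # 2 s of tokens at a 320x downsampling -> 150 frames

print(decoded_seconds(num_frames, hop_length=320, sample_rate=sample_rate))  # 2.0 (aligned hop)
print(decoded_seconds(num_frames, hop_length=256, sample_rate=sample_rate))  # 1.6 (mismatched hop)
```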

Now, when you use a Hann window in the iSTFT, it's common to have a 75% overlap. This means our window_len should be four times the hop_length. That's why we set window_len (and n_fft) to 1280.
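The arithmetic is n_fft = hop_length / (1 - overlap) = 320 / 0.25 = 1280. One common rationale for 75% (not stated in the thread, offered here as an assumption): in an STFT/iSTFT pair the window is applied twice, at analysis and at synthesis, so it is the overlap-added squared window that should stay flat. For a periodic Hann window that sum is constant (1.5) at hop = n_fft / 4 but ripples at hop = n_fft / 2. The sketch checks this away from the signal edges; the function names are illustrative.

```python
import math


def periodic_hann(n: int) -> list[float]:
    """Periodic Hann window, w[k] = 0.5 - 0.5 * cos(2*pi*k / n)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def steady_state_ola(values: list[float], hop_length: int) -> list[float]:
    """Overlap-added value at each phase 0..hop_length-1, away from the signal edges."""
    # A sample at phase p receives values[k] from every frame with k % hop_length == p.
    return [sum(values[phase::hop_length]) for phase in range(hop_length)]


hop_length = 320
n_fft = 4 * hop_length  # 75% overlap: hop_length / n_fft = 0.25, so n_fft = 1280
squared = [w * w for w in periodic_hann(n_fft)]  # analysis window times synthesis window

for hop in (n_fft // 4, n_fft // 2):  # 75% vs 50% overlap
    ola = steady_state_ola(squared, hop)
    print(hop, round(min(ola), 6), round(max(ola), 6))
# 320 1.5 1.5
# 640 0.5 1.0
```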

If you want to dive a bit deeper, I'd recommend looking into the constant overlap-add (COLA) constraint; there's a helpful discussion on this topic here: https://dsp.stackexchange.com/a/33615.
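A minimal COLA check comparing the two n_fft values from the question at hop_length = 320, assuming a periodic Hann window. In steady state, a sample receives window[k] from every frame with k congruent to its position modulo hop_length, so the window is COLA when that sum is the same for every phase. The helper names are illustrative.

```python
import math


def periodic_hann(n: int) -> list[float]:
    """Periodic Hann window, w[k] = 0.5 - 0.5 * cos(2*pi*k / n)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def is_cola(window: list[float], hop_length: int, tol: float = 1e-9) -> bool:
    """Constant overlap-add check: the shifted-window sum must be equal at every hop phase."""
    sums = [sum(window[phase::hop_length]) for phase in range(hop_length)]
    return max(sums) - min(sums) < tol


hop_length = 320
for n_fft in (1280, 1024):
    print(n_fft, is_cola(periodic_hann(n_fft), hop_length))
# 1280 True
# 1024 False
```

With n_fft = 1024 the ratio n_fft / hop_length is 3.2, so the frames do not tile evenly and the summed window ripples, whereas 1280 = 4 * 320 gives an exactly constant sum.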
