Hello, thank you for this project. I'm aware of two different implementations of FastS

FastSpeech 2 about tensorflowtts HOT 12 CLOSED

tensorspeech commented on May 15, 2024

FastSpeech 2

from tensorflowtts.

Comments (12)

dathudeptrai commented on May 15, 2024

@loretoparisi Yes, I have a plan :))). But because it's a new paper, I won't make it public right away when I finish the implementation :D.

superhg2012 commented on May 15, 2024

Hi, I am reading the FastSpeech 2 paper. Could you help explain how to quantize F0 and energy, with log-scale bins for F0 and uniform bins for energy? @dathudeptrai

rishikksh20 commented on May 15, 2024

@dathudeptrai @superhg2012 Check this:
Energy quantization with uniform bins

# Extract RMS energy; y is the raw waveform
import librosa
import numpy as np
import torch
import torch.nn.functional as F

S = librosa.magphase(librosa.stft(y, n_fft=1024, hop_length=256))[0]  # magnitude spectrogram
e = librosa.feature.rms(S=S, frame_length=1024)  # frame_length must match n_fft; shape (1, n_frames)
bins = np.linspace(e.min(), e.max(), num=256)  # uniform bin edges
e_quantize = np.digitize(e, bins)  # indices in 1..256
e_quantize = torch.from_numpy(e_quantize - 1).to(torch.device("cuda"))  # shift 1..256 to 0..255
one_hot_e = F.one_hot(e_quantize.long(), 256).float()
one_hot_e.shape  # torch.Size([1, 654, 256])

For Pitch

# Extract pitch (F0) from the raw waveform using PyWORLD
import numpy as np
import pyworld as pw
import torch
import torch.nn.functional as F

y = y.astype(np.float64)  # PyWORLD expects float64 input
f0, timeaxis = pw.harvest(y, 22050, f0_ceil=8000.0, frame_period=11.6)  # a hop size of 256 at 22050 Hz is about 11.6 ms
f0[f0 == 0] = 1  # unvoiced frames have f0 == 0, and log(0) is negative infinity
log_f0 = np.log(f0)
bins = np.linspace(log_f0.min(), log_f0.max(), num=256)  # uniform edges in the log domain
p_quantize = np.digitize(log_f0, bins)  # indices in 1..256
p_quantize = torch.from_numpy(p_quantize - 1).to(torch.device("cuda"))  # shift 1..256 to 0..255
one_hot_p = F.one_hot(p_quantize.long(), 256).float()
one_hot_p.shape  # torch.Size([654, 256])

dathudeptrai commented on May 15, 2024

@superhg2012 I haven't read the paper in detail yet. This is what I think: when you compute F0 (say the F0 values are in the range 0 - 100), we split that range into 10 bins (F0 from 0 - 9 in bin 1, F0 from 10 - 19 in bin 2, ...), and then we learn 10 F0 embeddings, one for each bin :))). The procedure is the same for energy. But note that before we split into bins, we need to apply a function F to F0 and energy to rescale them; it seems the paper uses the log function :D
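
The binning idea above could be sketched roughly like this. It is only an illustration, assuming F0 values in 0 - 100 and 10 equal-width bins as in the example. The names `quantize_f0` and `embedding_table` and the embedding size of 4 are made up here, and the random table merely stands in for embeddings that a real model would learn. Bin indices are 0-based in the sketch, so "bin 1" above is index 0.

```python
import numpy as np

NUM_BINS = 10                 # as in the example above
F0_MIN, F0_MAX = 0.0, 100.0   # assumed F0 range from the example
EMBED_DIM = 4                 # hypothetical embedding size

# Interior bin edges 10, 20, ..., 90: nine edges split the range into 10 bins.
edges = np.linspace(F0_MIN, F0_MAX, NUM_BINS + 1)[1:-1]

def quantize_f0(f0):
    """Map each F0 value to a bin index in 0..NUM_BINS-1."""
    f0 = np.clip(np.asarray(f0, dtype=np.float64), F0_MIN, F0_MAX)
    return np.digitize(f0, edges)

# Stand-in for a learned embedding table: one row per bin.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(NUM_BINS, EMBED_DIM))

f0 = [0.0, 9.0, 10.0, 19.0, 55.5, 100.0]
bin_ids = quantize_f0(f0)                  # [0, 0, 1, 1, 5, 9]
f0_embeddings = embedding_table[bin_ids]   # one embedding row per frame
print(bin_ids, f0_embeddings.shape)
```

Every frame whose F0 falls in the same bin gets the same embedding row, which is why only `NUM_BINS` embeddings are needed.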

ZDisket commented on May 15, 2024

Is the latest released FastSpeech V3 = FastSpeech 2?

dathudeptrai commented on May 15, 2024

@ZDisket Nope :)). It's FastSpeech V3 :(. I'm trying to find free time to implement FastSpeech 2. It is not hard to implement, since almost all of the layers are already implemented in this framework.

superhg2012 commented on May 15, 2024

> @superhg2012 I haven't read the paper in detail yet. This is what I think: when you compute F0 (say the F0 values are in the range 0 - 100), we split that range into 10 bins (F0 from 0 - 9 in bin 1, F0 from 10 - 19 in bin 2, ...), and then we learn 10 F0 embeddings, one for each bin :))). The procedure is the same for energy. But note that before we split into bins, we need to apply a function F to F0 and energy to rescale them; it seems the paper uses the log function :D

In the FastSpeech 2 paper, they quantize the F0 and energy of each frame to 256 possible values.

dathudeptrai commented on May 15, 2024

@superhg2012 Yes, after you apply the log function to F0, you can quantize it into any number of values; 256 is just one option :D. For example, if after applying log the min F0 is 0 and the max F0 is 20, then the width of each bin is (20 - 0) / 256 = 0.078125. So all F0 values in the range [0, 0.078125) are quantized to 0, F0 values in the range [0.078125, 0.078125 * 2) are quantized to 1, and so on. Then we just need to learn 256 F0 embeddings.
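
As a quick check of the arithmetic above, here is a small sketch, assuming the log-F0 values span 0 to 20 as in the example. The function name `quantize` is only illustrative, and clamping the maximum into the last bin is a choice made for this sketch.

```python
LOG_F0_MIN, LOG_F0_MAX = 0.0, 20.0   # example range from the comment above
NUM_BINS = 256

bin_width = (LOG_F0_MAX - LOG_F0_MIN) / NUM_BINS   # 0.078125

def quantize(log_f0):
    """Return the bin index (0..255) for a single log-F0 value."""
    index = int((log_f0 - LOG_F0_MIN) / bin_width)
    # The maximum value would land on index 256, so clamp it into the last bin.
    return min(max(index, 0), NUM_BINS - 1)

print(bin_width)        # 0.078125
print(quantize(0.05))   # 0
print(quantize(0.1))    # 1
print(quantize(20.0))   # 255
```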

superhg2012 commented on May 15, 2024

@dathudeptrai @rishikksh20 Got it! Thanks very much.

dathudeptrai commented on May 15, 2024

@rishikksh20 @superhg2012 Note that the min and max should be computed over the entire training set :))).

rishikksh20 commented on May 15, 2024

Yes

superhg2012 commented on May 15, 2024

@dathudeptrai You mean the min and max should be computed over the whole dataset, then used to compute the bins, and then those bins are applied to each waveform?
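
One way to read that procedure, sketched under assumptions: take the minimum and maximum over every utterance in the training set, build one shared set of bin edges from them, and reuse those edges for each utterance. The toy `dataset_log_f0` arrays and the helper names below are hypothetical and not part of TensorFlowTTS.

```python
import numpy as np

NUM_BINS = 256

def global_min_max(features):
    """Min and max over all utterances, not per utterance."""
    lo = min(float(np.min(f)) for f in features)
    hi = max(float(np.max(f)) for f in features)
    return lo, hi

def make_edges(lo, hi, num_bins=NUM_BINS):
    """Interior edges that split [lo, hi] into num_bins equal-width bins."""
    return np.linspace(lo, hi, num_bins + 1)[1:-1]

def quantize(feature, edges):
    """Bin indices in 0..len(edges), computed with the shared edges."""
    return np.digitize(feature, edges)

# Toy stand-in for per-utterance log-F0 arrays from a training set.
dataset_log_f0 = [
    np.array([4.0, 4.5, 5.0]),
    np.array([3.0, 5.5, 6.0]),
]

lo, hi = global_min_max(dataset_log_f0)   # statistics over the whole set
edges = make_edges(lo, hi)                # one set of edges for all utterances
quantized = [quantize(f, edges) for f in dataset_log_f0]
```

With shared edges, the same F0 value maps to the same bin index in every utterance. Per-utterance edges (as in the snippet earlier in the thread) would instead stretch each utterance to cover all 256 bins, so an index would mean something different from one file to the next.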
