Comments (12)

dathudeptrai avatar dathudeptrai commented on May 15, 2024 1

@loretoparisi yes, I have a plan :))). But because it's a new paper, I won't make it public right after I finish the implementation :D.

from tensorflowtts.

superhg2012 avatar superhg2012 commented on May 15, 2024 1

Hi, I am reading the FastSpeech 2 paper. Could you help explain how to quantize F0 and energy, with log-scale bins for F0 and uniform bins for energy? @dathudeptrai

rishikksh20 avatar rishikksh20 commented on May 15, 2024 1

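@dathudeptrai @superhg2012 Check this:
Quantizing energy into uniform bins

import librosa
import numpy as np
import torch
import torch.nn.functional as F

# Extract RMS energy; y is the raw waveform
S = librosa.magphase(librosa.stft(y, n_fft=1024, hop_length=256))[0]
# frame_length must match n_fft when passing a spectrogram; recent librosa versions check this
e = librosa.feature.rms(S=S, frame_length=1024)  # shape (1, n_frames)
bins = np.linspace(e.min(), e.max(), num=256)
e_quantize = np.digitize(e, bins)  # indices in 1..256
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
e_quantize = torch.from_numpy(e_quantize - 1).to(device)  # shift 1..256 to 0..255
one_hot_e = F.one_hot(e_quantize.long(), 256).float()
one_hot_e.shape  # e.g. torch.Size([1, 654, 256])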

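For pitch:

import numpy as np
import pyworld as pw
import torch
import torch.nn.functional as F

# Extract pitch (F0) from the raw waveform using PyWORLD
y = y.astype(np.float64)
# A hop size of 256 at 22050 Hz gives a frame period of about 11.6 ms
# (f0_ceil is kept as posted; pyworld's default is 800.0)
f0, timeaxis = pw.harvest(y, 22050, f0_ceil=8000.0, frame_period=11.6)
f0[f0 == 0] = 1  # unvoiced frames: log(0) is -inf, so map them to log(1) = 0
log_f0 = np.log(f0)
bins = np.linspace(log_f0.min(), log_f0.max(), num=256)
p_quantize = np.digitize(log_f0, bins)  # indices in 1..256
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
p_quantize = torch.from_numpy(p_quantize - 1).to(device)  # shift 1..256 to 0..255
one_hot_p = F.one_hot(p_quantize.long(), 256).float()
one_hot_p.shape  # e.g. torch.Size([654, 256])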

dathudeptrai avatar dathudeptrai commented on May 15, 2024

@superhg2012 I haven't read the paper in detail yet, but this is what I think: say you compute F0 and its values fall in the range 0 - 100. With 10 bins (F0 from 0 - 9 in bin 1, F0 from 10 - 19 in bin 2, ...), we learn one F0 embedding per bin, 10 in total :))). The procedure is the same for energy. Note that before splitting into bins we need to apply a function F to F0 and energy to rescale them; the paper seems to use a log function :D
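
A minimal sketch of the bin-to-embedding idea in this comment, assuming F0 values in the range 0 to 100 and 10 equal-width bins as in the example. The names `f0_to_bin`, `embedding_table`, and `EMBED_DIM` are hypothetical, bins are numbered from 0 rather than 1, and the random vectors only stand in for weights that a real model would learn.

```python
import random

NUM_BINS = 10
EMBED_DIM = 4  # hypothetical embedding size, chosen only for illustration

rng = random.Random(0)
# One vector per bin; in a real model these would be trainable parameters.
embedding_table = [
    [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(NUM_BINS)
]

def f0_to_bin(f0, lo=0.0, hi=100.0, num_bins=NUM_BINS):
    """Return the 0-based index of the equal-width bin that contains f0."""
    width = (hi - lo) / num_bins
    index = int((f0 - lo) // width)
    # Clamp so that out-of-range values still land in a valid bin.
    return max(0, min(num_bins - 1, index))

frames = [3.0, 12.5, 19.9, 99.0]  # toy per-frame F0 values
bin_ids = [f0_to_bin(value) for value in frames]
# Every frame is replaced by the embedding of its bin.
frame_embeddings = [embedding_table[i] for i in bin_ids]
print(bin_ids)
```

Frames that fall in the same bin (12.5 and 19.9 here) share one embedding vector, which is why only as many embeddings as bins need to be learned.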

ZDisket avatar ZDisket commented on May 15, 2024

Is the latest released FastSpeech V3 = FastSpeech 2?

dathudeptrai avatar dathudeptrai commented on May 15, 2024

@ZDisket nope :)). It's FastSpeech V3 :(. I'm trying to find free time to implement FastSpeech 2. It's not hard to implement, since almost all the layers are already implemented in this framework.

superhg2012 avatar superhg2012 commented on May 15, 2024

@dathudeptrai wrote: "@superhg2012 I haven't read the paper in detail yet, but this is what I think: say you compute F0 and its values fall in the range 0 - 100. With 10 bins (F0 from 0 - 9 in bin 1, F0 from 10 - 19 in bin 2, ...), we learn one F0 embedding per bin, 10 in total :))). The procedure is the same for energy. Note that before splitting into bins we need to apply a function F to F0 and energy to rescale them; the paper seems to use a log function :D"

In the FastSpeech 2 paper, they quantize the F0 and energy of each frame to 256 possible values.

dathudeptrai avatar dathudeptrai commented on May 15, 2024

@superhg2012 yes, after you apply the log function to F0, you can quantize it to any number of possible values; 256 is just one option :D. For example, if after applying log the min F0 is 0 and the max F0 is 20, the width of each bin is (20 - 0) / 256 = 0.078125. So all F0 values in the range [0, 0.078125) quantize to 0, F0 values in the range [0.078125, 0.078125 * 2) quantize to 1, and so on. Then we just need to learn 256 F0 embeddings.
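
The arithmetic in this comment can be sketched as follows, assuming a log-F0 range of 0 to 20 split into 256 equal-width bins. The helper name `quantize` is hypothetical. Note that this uses 256 intervals, which differs slightly from calling `np.linspace(min, max, num=256)`, where the 256 points are bin edges and therefore define only 255 intervals.

```python
import math

def quantize(value, lo, hi, num_bins=256):
    """Map a value in [lo, hi] to an integer bin index in [0, num_bins - 1]."""
    width = (hi - lo) / num_bins
    index = math.floor((value - lo) / width)
    # Clamp so that value == hi (or anything out of range) stays in a valid bin.
    return max(0, min(num_bins - 1, index))

width = (20.0 - 0.0) / 256
print(width)                      # 0.078125
print(quantize(0.05, 0.0, 20.0))  # falls in [0, 0.078125)
print(quantize(0.10, 0.0, 20.0))  # falls in [0.078125, 0.15625)
print(quantize(20.0, 0.0, 20.0))  # the maximum is clamped into the last bin
```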

superhg2012 avatar superhg2012 commented on May 15, 2024

@dathudeptrai @rishikksh20 Got it! Thanks very much.

dathudeptrai avatar dathudeptrai commented on May 15, 2024

@rishikksh20 @superhg2012 note that the min and max should be computed over the whole training set :))).

rishikksh20 avatar rishikksh20 commented on May 15, 2024

Yes

superhg2012 avatar superhg2012 commented on May 15, 2024

@dathudeptrai you mean the min and max should be computed over the whole dataset, then used to compute the bins, and then those bins are applied to each waveform?
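
One way to read that procedure, as a rough sketch rather than the project's actual implementation: first scan the whole training set for the log-F0 range, build the bin edges once, then reuse the same edges for every utterance. The function names and the toy data are hypothetical, and skipping unvoiced frames (F0 equal to 0) when collecting the statistics is an assumption made here, not something stated in the thread.

```python
import bisect
import math

def global_log_f0_range(utterances):
    """Return (min, max) of log-F0 over the voiced frames of all utterances."""
    lo, hi = math.inf, -math.inf
    for f0_track in utterances:
        for value in f0_track:
            if value > 0:  # assumption: unvoiced frames are left out of the statistics
                log_value = math.log(value)
                lo = min(lo, log_value)
                hi = max(hi, log_value)
    return lo, hi

def make_edges(lo, hi, num_bins=256):
    """Return the num_bins - 1 interior edges that split [lo, hi] evenly."""
    width = (hi - lo) / num_bins
    return [lo + width * i for i in range(1, num_bins)]

def quantize(log_value, edges):
    """The number of edges at or below log_value is the 0-based bin index."""
    return bisect.bisect_right(edges, log_value)

# Toy "training set": two utterances with per-frame F0 in Hz.
train_f0 = [[0.0, 110.0, 220.0], [0.0, 440.0, 880.0]]
lo, hi = global_log_f0_range(train_f0)  # computed once over all data
edges = make_edges(lo, hi)              # shared by every utterance
first_ids = [quantize(math.log(v), edges) for v in train_f0[0] if v > 0]
print(first_ids)
```

Because the edges come from the global range, the same F0 value always maps to the same bin regardless of which utterance it appears in.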
