Comments (12)
@loretoparisi yes, I have a plan :))). But since it's a new paper, I won't publish it right away once I finish the implementation :D.
from tensorflowtts.
Hi, I am reading the FastSpeech 2 paper. Could you help explain how to quantize F0 with log-scale bins and energy with uniform bins? @dathudeptrai
from tensorflowtts.
@dathudeptrai @superhg2012 Check this :
Energy quantization with uniform bins:
import librosa
import numpy as np
import torch
import torch.nn.functional as F

# Extract RMS energy; y is the raw waveform
S = librosa.magphase(librosa.stft(y, n_fft=1024, hop_length=256))[0]
e = librosa.feature.rms(S=S)

# Uniform bins between the min and max energy
bins = np.linspace(e.min(), e.max(), num=256)
e_quantize = np.digitize(e, bins)
e_quantize = torch.from_numpy(e_quantize - 1).to(torch.device("cuda"))  # shift 1..256 down to 0..255
one_hot_e = F.one_hot(e_quantize.long(), 256).float()
one_hot_e.shape  # torch.Size([1, 654, 256])
For pitch:
import numpy as np
import pyworld as pw
import torch
import torch.nn.functional as F

# Extract F0 from the raw waveform using PyWORLD's harvest
y = y.astype(np.float64)
f0, timeaxis = pw.harvest(y, 22050, f0_ceil=8000.0, frame_period=11.6)  # hop size 256 at 22050 Hz is ~11.6 ms
f0[f0 == 0] = 1  # because log(0) is -inf; unvoiced frames become log-F0 = 0
log_f0 = np.log(f0)

# Uniform bins in log-F0 space, i.e. log-scale bins for F0
bins = np.linspace(log_f0.min(), log_f0.max(), num=256)
p_quantize = np.digitize(log_f0, bins)
p_quantize = torch.from_numpy(p_quantize - 1).to(torch.device("cuda"))  # shift 1..256 down to 0..255
one_hot_p = F.one_hot(p_quantize.long(), 256).float()
one_hot_p.shape  # torch.Size([654, 256])
from tensorflowtts.
@superhg2012 I haven't read the paper in detail yet. Here is what I think: when you compute F0 (say F0 values in the range 0-100), we split that range into 10 bins (F0 from 0-9 in bin 1, F0 from 10-19 in bin 2, ...) and then learn 10 F0 embeddings, one per bin :))). The procedure is the same for energy. But note that before we split into bins we need to apply a function F to rescale F0 and energy; the paper seems to use the log function :D
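A minimal sketch of that binning idea, using 10 uniform bins over an assumed 0-100 F0 range and a few hypothetical F0 values:

```python
import numpy as np

# Hypothetical F0 values in Hz, within the assumed 0-100 range
f0 = np.array([3.0, 15.0, 47.0, 99.0])

# 10 bin edges over the range; digitize returns 1-based indices
bins = np.linspace(0, 100, num=10)
bin_ids = np.digitize(f0, bins) - 1  # shift to 0-based bin indices

# Each bin id would index one of 10 learned F0 embeddings
print(bin_ids)  # [0 1 4 8]
```

The same digitize-then-shift pattern appears in the snippet above with 256 bins instead of 10.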
from tensorflowtts.
Is the latest released FastSpeech V3 = FastSpeech 2?
from tensorflowtts.
@ZDisket nope :)), it's FastSpeech V3 :(. I'm trying to find free time to implement FastSpeech 2. It is not hard to implement, since almost all of its layers are already implemented in this framework.
from tensorflowtts.
In the FastSpeech 2 paper, they quantize the F0 and energy of each frame into 256 possible values.
from tensorflowtts.
@superhg2012 yes, after you apply the log function to F0, you can quantize it into any number of values; 256 is just one option :D. For example, if after applying log the min F0 is 0 and the max is 20, the width of each bin is (20 - 0) / 256 = 0.078125. So all F0 values in the range [0, 0.078125) are quantized to 0, values in [0.078125, 0.15625) to 1, and so on; then we just need to learn 256 F0 embeddings.
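The arithmetic in that comment, as a short sketch (the 0-20 log-F0 range and the sample values are assumptions for illustration):

```python
import numpy as np

# Assumed log-F0 range from the example: min 0, max 20
log_f0_min, log_f0_max = 0.0, 20.0
num_bins = 256

# Width of each uniform bin in log-F0 space
bin_width = (log_f0_max - log_f0_min) / num_bins  # 0.078125

# Quantize a few hypothetical log-F0 values to bin indices 0..255
values = np.array([0.05, 0.1, 19.9])
ids = np.minimum((values // bin_width).astype(int), num_bins - 1)
print(ids)  # [  0   1 254]
```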
from tensorflowtts.
@dathudeptrai @rishikksh20 got it! Thanks very much.
from tensorflowtts.
@rishikksh20 @superhg2012 note that the min and max should be computed over the whole training set :))).
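A small sketch of what that looks like (the per-utterance arrays are hypothetical stand-ins for log-F0 extracted from each training file):

```python
import numpy as np

# Hypothetical per-utterance log-F0 arrays standing in for the training set
utterances = [
    np.array([4.1, 5.3, 4.8]),
    np.array([3.9, 6.2]),
    np.array([5.0, 5.5, 4.2]),
]

# Global min/max over the entire training set, not per utterance
global_min = min(u.min() for u in utterances)
global_max = max(u.max() for u in utterances)

# One shared set of bin edges, reused to quantize every utterance
bins = np.linspace(global_min, global_max, num=256)
quantized = [np.digitize(u, bins) - 1 for u in utterances]
```

Computing bins per utterance instead would give the same F0 value a different embedding index in different files, which is why the global statistics matter.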
from tensorflowtts.
Yes
from tensorflowtts.
@dathudeptrai you mean the min and max should be computed over the whole dataset, then used to compute the bins, and the bins then applied to each waveform?
from tensorflowtts.