Comments (8)
Hi, by alignments do you mean the durations? ForwardTacotron.generate() returns a tuple where the durations are the last element.
from forwardtacotron.
Thanks! Closing
How do I correlate these durations with my input sequence? Are they values in milliseconds? The length of the wav file from Melgan does not match the length of my durations.
It's the number of frames per symbol. Multiply by the hop size to get the number of samples (and then divide by the sampling rate if you need time in seconds).
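As a rough sketch of that conversion (not code from the repository): the function name `durations_to_seconds` is made up for illustration, and the hop length of 256 and sampling rate of 22050 are only example defaults, so substitute the values from your own configuration. Note that the durations are per input symbol, so their count will not match the wav length; it is the sum of the durations times the hop size that should roughly correspond to the number of audio samples.

```python
import numpy as np

# Example values only; take the real ones from your own configuration.
HOP_LENGTH = 256      # audio samples per mel frame
SAMPLE_RATE = 22050   # audio samples per second


def durations_to_seconds(durations, hop_length=HOP_LENGTH, sample_rate=SAMPLE_RATE):
    """Convert per-symbol durations in frames to per-symbol durations in seconds."""
    frames = np.asarray(durations, dtype=np.float64)
    samples = frames * hop_length   # frames -> samples
    return samples / sample_rate    # samples -> seconds


# Toy durations (in frames) for three symbols.
dur = np.array([2.0, 5.0, 3.0])
seconds = durations_to_seconds(dur)

# Approximate length of the synthesized audio in samples.
total_samples = dur.sum() * HOP_LENGTH
```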
Thanks for answering my beginner question, @m-toman! Just to clarify, do I have to compute
dur[0] * hop_length / sample_rate
for every dur value? That is, dur[0] * 256 / 22050,
using the values given in hparams.py?
These are my sample durations:
[2.2612348e-02 9.4400036e-01 6.7674098e+00 5.0057626e+00 5.0080757e+00
4.6233215e+00 2.9760070e+00 2.3248816e+00 1.9487766e+00 3.7769635e+00
4.6194615e+00 2.9589453e+00 5.8335261e+00 3.9601390e+00 4.8922276e+00
3.4933169e+00 4.3110704e+00 3.4784627e+00 4.4144235e+00 3.7462690e+00
3.7650070e+00 6.9434915e+00 6.1158099e+00 8.0207386e+00 4.3046041e+00
4.8667979e+00 3.5990303e+00 5.2950010e+00 4.5506124e+00 4.6985164e+00
3.9566250e+00 3.9664533e+00 5.8161201e+00 4.8916755e+00 2.1811330e+00
5.6532249e+00 4.0426488e+00 2.8019662e+00 5.3324432e+00 5.3495054e+00
2.2381749e+00 5.9258566e+00 3.4014165e+00 3.8372374e+00 4.1451240e+00
5.8608556e+00 3.8050027e+00 4.6089807e+00 5.5579085e+00 4.4227767e+00
4.1547208e+00 3.6267126e+00 5.6249385e+00 5.5808525e+00 3.0357168e+00
4.4313054e+00 5.9691072e+00 5.0331984e+00 5.3593402e+00 4.9867053e+00
3.2402251e+00 1.3627540e+00 2.2513001e+00 3.8786397e+00 4.6491919e+00
3.3426161e+00 5.1169338e+00 6.7440410e+00 5.0903549e+00 3.9325664e+00
3.1786590e+00 7.3065720e+00 6.7319651e+00 5.6631103e+00 3.7352045e+00
3.5417433e+00 4.1710243e+00 4.9729519e+00 3.0776734e+00 3.7664495e+00
4.0060887e+00 2.8614111e+00 3.7265482e+00 5.6844616e+00 5.0201597e+00
1.9191515e+00 2.1967342e+00 4.0543704e+00 3.6931145e+00 4.0714145e+00
5.0415058e+00 4.0292721e+00 3.3990631e+00 4.0123186e+00 3.6286800e+00
3.1456099e+00 3.5747185e+00 4.1106396e+00 4.0564170e+00 4.4438252e+00
5.8869915e+00 4.4292011e+00 6.1011786e+00 7.4916244e+00 6.2005177e+00
3.5886323e+00 2.7478096e+00 3.5223446e+00 1.9621753e+00 3.6003213e+00
3.8641853e+00 3.1578965e+00 3.6853700e+00 5.5982189e+00 4.8571954e+00
2.8158152e+00 3.4190748e+00 2.6847386e+00 4.3211427e+00 2.9020162e+00
3.8492827e+00 7.1659880e+00 6.5406370e+00 5.6910257e+00 4.5328012e+00
5.2582273e+00 3.9277108e+00 4.6910095e+00 2.3240290e+00 4.4862123e+00
3.7534225e+00 8.3864613e+00 7.8354692e+00 4.1877365e+00 8.0944567e+00
7.5279021e+00 3.6475091e+00 3.7855072e+00 4.5493841e+00 3.5912817e+00
3.2720089e+00 2.9245596e+00 2.7519238e+00 3.4271433e+00 3.6931608e+00
3.2099092e+00 5.8654342e+00 5.5745549e+00 4.6561956e+00 3.4918787e+00
2.6784046e+00 3.9213536e+00 4.4136915e+00 5.0109935e+00 3.7836897e+00
3.4412189e+00 4.7609525e+00 2.6459830e+00 3.6646116e+00 3.5330944e+00
2.6928513e+00 3.7275863e+00 4.9359412e+00 3.2233644e+00 3.5892196e+00
2.1269495e+00 3.2069538e+00 5.3838339e+00 4.5727897e+00 3.7962928e+00
3.8841012e+00 3.8840828e+00 4.4379663e+00 4.4393778e+00 3.9065814e+00
2.8252225e+00 4.5640140e+00 2.2762156e+00 3.4408638e+00 5.4917574e+00
4.8258510e+00 4.1536679e+00 3.8375087e+00 4.8880949e+00 2.1367426e+00
1.8380073e+00 2.7101300e+00 4.7183986e+00 4.1725836e+00 3.7513199e+00
3.6671889e+00 3.1984258e+00 5.9751067e+00 4.5713787e+00 5.4638953e+00
3.4530182e+00 3.4203277e+00 3.3166778e+00 2.8922384e+00 4.1497169e+00
4.8107204e+00 5.7362590e+00 2.8947024e+00 3.9340632e+00 5.9751110e+00
5.0411377e+00 1.9212455e+00 2.3416703e+00 4.1463981e+00 4.0034518e+00
4.8253579e+00 4.4901481e+00 5.1589351e+00 4.1915011e+00 2.9115534e+00
3.3291063e+00 3.7888665e+00 4.7728491e+00 3.4904647e+00 3.9432387e+00
7.3696356e+00 6.9134116e+00 6.0367808e+00 4.4357910e+00 5.4384804e+00
2.8855910e+00 6.4224935e+00 4.1878357e+00 4.0874143e+00 4.2316189e+00
3.6370394e+00 5.9071083e+00 5.3159356e+00 5.2228103e+00 6.1559463e+00
5.7995853e+00 5.6163077e+00 5.2773423e+00 5.5773730e+00 7.1068287e+00
3.8198769e+00 4.1103125e+00 6.0597706e+00 3.9329143e+00 3.1912720e+00
5.0875068e+00 4.9033203e+00 4.8280387e+00 5.4035139e+00 3.6662331e+00
4.0772071e+00 4.5702014e+00 4.6465340e+00 4.1961226e+00 6.0504155e+00
3.8360379e+00 4.9358091e+00 4.1648808e+00 4.6339092e+00 2.9729676e+00
4.7386842e+00 5.6018043e+00 4.3810005e+00 6.0938888e+00 7.3011327e+00
6.1679420e+00 3.2874308e+00 3.4410133e+00 3.1851013e+00 3.7927074e+00
5.4336090e+00 5.0340385e+00 2.8764882e+00 3.5582292e+00 3.8865030e+00
3.0821476e+00 2.9812224e+00 4.0978575e+00 5.1852741e+00 5.3205285e+00
4.1200075e+00 4.0693798e+00 5.4547844e+00 3.2341957e+00 5.0997691e+00
6.1641455e+00 4.0800910e+00 4.8740706e+00 4.2244139e+00 4.9745173e+00
4.5002141e+00 4.5753255e+00 3.7428198e+00 2.3115499e+00 2.7880044e+00
4.1610107e+00 4.1307893e+00 4.7656217e+00 5.2930527e+00 3.4151716e+00
5.7608094e+00 5.2816525e+00 7.6552243e+00 5.6390800e+00 3.8849237e+00
3.4349124e+00 5.5267310e+00 5.7679639e+00 4.9572883e+00 3.7846608e+00
0.0000000e+00 1.5842640e-01 1.6065538e-03 1.7802417e-04]
Yes. To get the same rounding the model uses when it expands the states, you can use this line:
So this then gives positions instead of durations.
Edit: I'll have to check that line again tomorrow. Too late already ;)
So is there no way to know the end timing of a word?
Got it working as @m-toman mentioned! Thank You!
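import numpy as np
dur_hop = dur * 256  # frames -> samples (hop length)
dur_sample = dur_hop / 22050  # samples -> seconds, based on your sampling frequency
phone_timings = np.cumsum(dur_sample)  # end time of each phone, in seconds
Word timings can then be generated by splitting at the " " (space) phones.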
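For anyone finding this later, here is a minimal sketch of that splitting step. It assumes there is exactly one duration per input symbol and that " " is the word separator; the function name `word_timings` and the toy input are made up for illustration and are not part of ForwardTacotron.

```python
import numpy as np


def word_timings(symbols, durations, hop_length=256, sample_rate=22050, sep=" "):
    """Return a list of (word, start_seconds, end_seconds) tuples.

    Assumes one duration (in frames) per symbol and that `sep` marks word
    boundaries. Both are assumptions of this sketch.
    """
    if len(symbols) != len(durations):
        raise ValueError("expected exactly one duration per symbol")

    seconds = np.asarray(durations, dtype=np.float64) * hop_length / sample_rate
    ends = np.cumsum(seconds)   # end time of every symbol
    starts = ends - seconds     # start time of every symbol

    words = []
    current, word_start, word_end = "", None, None
    for sym, start, end in zip(symbols, starts, ends):
        if sym == sep:
            # A separator closes the current word, if there is one.
            if current:
                words.append((current, word_start, word_end))
            current, word_start = "", None
            continue
        if not current:
            word_start = float(start)
        current += sym
        word_end = float(end)
    if current:
        words.append((current, word_start, word_end))
    return words


# Toy example: five symbols with durations in frames.
timings = word_timings(list("hi yo"), [2, 3, 1, 4, 2])
```

In this sketch the time spent on the separator itself is treated as a gap between words, so a word ends where its last symbol ends. If you would rather attach the pause to the preceding word, use the end time of the separator instead.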