Comments (6)
Hi, sounds interesting, thanks for sharing! I'm definitely going to try this again (at first I actually fed the prenet outputs and found the mel loss to be higher, but I didn't really check for prosody). Do you have a fork I could test locally? I would also be interested in whether taking the log of the durations really helps; maybe it emphasizes short durations more (given you use L1 or L2 losses)?
from forwardtacotron.
Unfortunately I cannot share the repository because I am working on a private project, but I'll be glad to share ideas.
I am not sure how much calculating the losses in the log domain helps. I guess not much, but it seems reasonable to make the value range more compact (durations can range from 0 to more than 100, where the long ones correspond to pauses in the audio). The LengthRegulator converts them back to the linear domain before applying the expansion.
PS: Yes, I also switched to L2 loss for log-durations; I forgot to mention it.
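For readers following along, here is a minimal sketch of the idea described above, written in plain Python rather than taken from either codebase. The function names, the `+ 1` offset inside the logarithm (used so that zero-length durations stay finite), and the rounding step are all assumptions made for illustration; the actual implementation may differ.

```python
import math


def to_log_durations(durations):
    """Map frame counts to the log domain (assumed +1 offset keeps 0 finite)."""
    return [math.log(d + 1.0) for d in durations]


def from_log_durations(log_durations):
    """Invert the mapping and round to non-negative integer frame counts."""
    return [max(0, round(math.exp(x) - 1.0)) for x in log_durations]


def l2_loss(pred, target):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def length_regulate(encodings, log_durations):
    """Hypothetical length regulator: convert predicted log-durations back
    to the linear domain, then repeat each encoder output that many times."""
    durations = from_log_durations(log_durations)
    expanded = []
    for vec, n in zip(encodings, durations):
        expanded.extend([vec] * n)
    return expanded


# A one-frame error on a short duration vs. the same error on a long one.
short_err = l2_loss(to_log_durations([2]), to_log_durations([1]))
long_err = l2_loss(to_log_durations([101]), to_log_durations([100]))

# Expansion: three symbols with durations 2, 0 and 3 frames.
frames = length_regulate(["a", "b", "c"], to_log_durations([2, 0, 3]))
```

Under these assumptions, the same one-frame error costs far more on a short duration than on a long one, which is consistent with the guess that a log-domain loss emphasizes short durations.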
Sounds good! I'll do another training soon with the mentioned changes.
Replaced with fastspeech duration model. Subjectively the improvement isn't that huge but noticeable in longer sentences.
Prosody not yet at vanilla taco2 level in my case but close. Overall l quite a bit worse still but the taco2 model I extracted durations from was awful. Perhaps I'll see improvement with forced alignment.
Cool thx for sharing. Did you use a vocoder to compare the results?
Yeah when I used our preprocessing it worked with my melgan pretrained model although a bit noisy. I usually do a couple hours finetuning then. Also got a couple wavernn versions, which usually give slightly better results - but melgan is nicer for testing.
Hope I haven't got an error somewhere in the interface because the results are rather noisy. But not noisy enough for something serious ;)