Hi, I just found the current duration model suffers when synthesizin

Changes on duration model about forwardtacotron HOT 6 CLOSED

as-ideas commented on June 30, 2024

Changes on duration model

from forwardtacotron.

Comments (6)

cschaefer26 commented on June 30, 2024

Hi, sounds interesting, thanks for sharing! I'm definitely going to try this again (at first I actually fed the prenet outputs and found the mel loss to be higher, but I didn't really check for prosody). Do you have a fork I could test locally? I would also be interested in whether taking the log of the durations really helps; maybe it emphasizes short durations more (given you use L1 or L2 losses)?
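A quick way to check the intuition about short durations is to compare the same one-frame prediction error on a short and a long duration after the log transform. The sketch below assumes `log(1 + d)` targets and a squared (L2) error. The thread does not state the exact transform, and `log_sq_error` is a hypothetical helper, not part of ForwardTacotron.

```python
import math

def log_sq_error(true_frames, predicted_frames):
    """Squared error between two durations after an assumed log(1 + d) transform."""
    return (math.log1p(predicted_frames) - math.log1p(true_frames)) ** 2

# Same absolute error (one frame) on a short and on a long duration.
short_err = log_sq_error(2, 3)
long_err = log_sq_error(100, 101)

# In the linear domain both errors would be 1; in the log domain the
# short-duration error is much larger than the long-duration one.
print(short_err, long_err)
```

Under these assumptions the loss does weight errors on short durations more heavily, since the log compresses differences between large values.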

alexdemartos commented on June 30, 2024

Unfortunately, I cannot share the repository since I am working on a private project, but I'll be glad to share ideas.

I am not sure how much calculating the losses in the log domain helps. I guess not much, but it seems reasonable to make the value range more compact (durations can go from 0 to more than 100, where the largest values correspond to pauses in the audio). The LengthRegulator converts them back to the linear domain before applying the expansion.

PS: Yes, I also switched to L2 loss for log-durations; I forgot to mention it.
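The log-domain round trip described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the commenter's or ForwardTacotron's actual code: the `log(1 + d)` offset is assumed so that a duration of 0 stays finite, and `to_log_domain`, `l2_loss`, and `length_regulate` are hypothetical names chosen for this example.

```python
import math

def to_log_domain(durations):
    """Map frame counts to log(1 + d); the +1 offset is an assumption for d = 0."""
    return [math.log1p(d) for d in durations]

def l2_loss(pred_log, target_log):
    """Mean squared error between predicted and target log-durations."""
    return sum((p - t) ** 2 for p, t in zip(pred_log, target_log)) / len(target_log)

def length_regulate(encodings, pred_log):
    """Convert log-durations back to frame counts, then repeat each encoding."""
    expanded = []
    for vec, log_d in zip(encodings, pred_log):
        # Invert the log transform, round to whole frames, never go below 0.
        frames = max(0, round(math.expm1(log_d)))
        expanded.extend([vec] * frames)
    return expanded

# Toy example: three symbols with durations 0, 3 and 120 frames (a long pause).
targets = to_log_domain([0, 3, 120])
expanded = length_regulate(["a", "b", "c"], targets)
print(len(expanded))  # total number of output frames
```

With this transform the target range shrinks from 0..120 frames to roughly 0..4.8, while the expansion step still operates on integer frame counts.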

cschaefer26 commented on June 30, 2024

Sounds good! I'll do another training soon with the mentioned changes.

m-toman commented on June 30, 2024

I replaced it with the FastSpeech duration model. Subjectively the improvement isn't that huge, but it is noticeable in longer sentences.

Prosody is not yet at vanilla taco2 level in my case, but close. Overall it is still quite a bit worse, but the taco2 model I extracted durations from was awful. Perhaps I'll see an improvement with forced alignment.

cschaefer26 commented on June 30, 2024

Cool, thanks for sharing. Did you use a vocoder to compare the results?

m-toman commented on June 30, 2024

Yeah, when I used our preprocessing it worked with my pretrained MelGAN model, although it was a bit noisy. I usually do a couple of hours of fine-tuning then. I also have a couple of WaveRNN versions, which usually give slightly better results, but MelGAN is nicer for testing.

Hope I haven't got an error somewhere in the interface because the results are rather noisy. But not noisy enough for something serious ;)
