I was using the inference notebook with the pre-trained models. I noticed that the syn


Synthesized voice does not correspond to the speaker id about mellotron HOT 4 CLOSED

nvidia commented on August 23, 2024

Synthesized voice does not correspond to the speaker id

from mellotron.

Comments (4)

Sangkikim-77 commented on August 23, 2024

Hi,

Training from a pre-trained model can lead to faster convergence.
By default, the speaker embedding layer is ignored.
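
As a rough illustration of what "ignored" could mean here, the sketch below drops selected layers from a checkpoint's state dict before it is loaded, so those layers keep their fresh initialization. The function name `drop_ignored_layers` and the key `speaker_embedding.weight` are assumptions made for this example, not names confirmed in this thread, and plain dicts stand in for real tensors.

```python
def drop_ignored_layers(state_dict, ignore_layers):
    """Return a copy of state_dict without the keys listed in ignore_layers.

    Hypothetical helper: a warm start would load the returned dict into the
    model, leaving the ignored layers at their freshly initialized values.
    """
    ignored = set(ignore_layers)
    return {name: value for name, value in state_dict.items() if name not in ignored}


# Toy stand-in for a checkpoint; real values would be tensors.
pretrained = {
    "embedding.weight": [0.1, 0.2],
    "speaker_embedding.weight": [0.3, 0.4],  # assumed key name
    "decoder.linear.weight": [0.5],
}

warm_start = drop_ignored_layers(pretrained, ["speaker_embedding.weight"])
print(sorted(warm_start))
```

If this is what happens during fine-tuning, the speaker table is relearned from the new data, and the speaker ids of the original checkpoint no longer apply to the fine-tuned model.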


mr-muyu commented on August 23, 2024

The pre-trained model does have a speaker embedding; you can load the model and see that layer.
But the output does seem to be strongly tied to pitch and rhythm. You can try extracting pitch and rhythm from a different wav to test this.


paarthneekhara commented on August 23, 2024

Never mind, I think there was a bug in how I loaded the speaker dictionary at inference on my end. For some speakers, though, the voice still does not quite match the data, maybe because there were fewer samples for those speakers during training.
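
For anyone hitting the same kind of bug, here is a minimal sketch of one way a speaker dictionary can be built, assuming a pipe-separated filelist of the form `path|text|speaker_id` and an embedding table indexed by the rank of each raw id in sorted order. Both the format and the helper name `build_speaker_lookup` are assumptions for illustration; check them against the data loader you actually use.

```python
def build_speaker_lookup(filelist_lines):
    """Map raw speaker ids to contiguous embedding indices.

    Assumes each non-empty line looks like 'path|text|speaker_id' and that
    indices are assigned by sorting the unique raw ids (an assumption).
    """
    raw_ids = sorted(
        {int(line.strip().split("|")[2]) for line in filelist_lines if line.strip()}
    )
    return {raw_id: index for index, raw_id in enumerate(raw_ids)}


# Made-up filelist entries, only to show the mapping.
lines = [
    "a.wav|hello there|1088",
    "b.wav|good morning|40",
    "c.wav|see you soon|1088",
]

lookup = build_speaker_lookup(lines)
print(lookup)
```

Under this assumption the dictionary depends on the filelist it was built from, so building it at inference from a different list than the one used in training would shift the indices and select the wrong voice.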


deepuvikraman commented on August 23, 2024

@paarthneekhara - I am facing a similar issue. I am using the LibriTTS pre-trained model and trying to generate speech for custom text using a reference style wav file. Although I specify a different speaker_id (in example_filelists.txt, along with the style wav and its transcript), the generated voice is always the same female voice. Do you know how to generate the voice of a different speaker that is present in the pre-trained model?
