Hi Jaehyeon, Could you please provide instructions how to use pretra

Add new speaker voice about glow-tts (open)

jaywalnut310 commented on August 18, 2024

Add new speaker voice

from glow-tts.

Comments (5)

echelon commented on August 18, 2024

Add these two hparams:

"n_speakers": 10,
"gin_channels": 16

I'm not sure what the ideal value for gin_channels is to get a rich embedding, and I asked in another thread.
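As a rough illustration of the two hparams above, here is a minimal sketch that adds them to a config dictionary. It assumes the model hyperparameters live under a "model" key in the JSON config; that placement, the helper name `add_speaker_hparams`, and the trimmed example config are assumptions for illustration, not something confirmed in this thread.

```python
import copy
import json


def add_speaker_hparams(config, n_speakers, gin_channels):
    """Return a copy of `config` with the two multi-speaker hparams set.

    Assumption: model hyperparameters sit under a "model" key.
    """
    updated = copy.deepcopy(config)
    model = updated.setdefault("model", {})
    model["n_speakers"] = n_speakers
    model["gin_channels"] = gin_channels
    return updated


# Hypothetical, heavily trimmed config used only for this example.
base = {"model": {"hidden_channels": 192}}
multi = add_speaker_hparams(base, n_speakers=10, gin_channels=16)
print(json.dumps(multi, indent=2))
```

The helper copies the config instead of editing it in place, so the single-speaker settings stay available for comparison.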

Your training data and validation CSVs should be in this format:

filename|numeric_speaker_id|transcript
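To make the format concrete, here is a small sketch that parses such lines and flags speaker ids outside the expected range. The function names and sample rows are hypothetical, and the range check assumes ids must fall in 0..n_speakers-1, which would be the case if n_speakers sizes an embedding table.

```python
def parse_filelist_line(line):
    """Split one 'filename|numeric_speaker_id|transcript' line into typed fields."""
    # maxsplit=2 keeps any later '|' characters inside the transcript.
    parts = line.rstrip("\n").split("|", 2)
    if len(parts) != 3:
        raise ValueError(f"expected 3 '|'-separated fields, got {len(parts)}: {line!r}")
    filename, speaker_id, transcript = parts
    return filename, int(speaker_id), transcript


def rows_with_bad_speaker_id(rows, n_speakers):
    """Return the rows whose speaker id is outside 0..n_speakers-1 (assumed range)."""
    return [row for row in rows if not 0 <= row[1] < n_speakers]


# Hypothetical sample lines, not taken from any real dataset.
sample = [
    "wavs/a_0001.wav|0|Hello there.\n",
    "wavs/b_0001.wav|3|Either this | or that.\n",
]
rows = [parse_filelist_line(line) for line in sample]
print(rows)
print(rows_with_bad_speaker_id(rows, n_speakers=2))
```

Running a check like this over both CSVs before training can catch a malformed line or an out-of-range id early.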

You'll need to swap out the loader:

-from data_utils import TextMelLoader, TextMelCollate 
+from data_utils import TextMelSpeakerLoader, TextMelSpeakerCollate

You'll also need to change the forward function to accept the g speaker id parameter and unpack the speaker ids from the loader enumerations.
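A minimal sketch of that last step, under stated assumptions: it supposes the speaker-aware collate yields five items per batch, with the speaker ids last, and that the model call takes them as a `g` keyword. `StubModel`, `train_one_epoch`, and the fake batches are hypothetical stand-ins so the snippet runs without the repository or PyTorch; check the real loader and forward signature before copying the pattern.

```python
class StubModel:
    """Stand-in for the real model; it only records the speaker ids it receives."""

    def __init__(self):
        self.seen_speaker_ids = []

    def __call__(self, x, x_lengths, y, y_lengths, g=None):
        self.seen_speaker_ids.append(g)
        return 0.0  # placeholder loss


def train_one_epoch(model, loader):
    """Unpack the extra speaker-id item from each batch and pass it on as `g`."""
    n_batches = 0
    # Single-speaker batches would unpack four items; here a fifth (sid) is assumed.
    for batch_idx, (x, x_lengths, y, y_lengths, sid) in enumerate(loader):
        model(x, x_lengths, y, y_lengths, g=sid)
        n_batches += 1
    return n_batches


# Hypothetical batches: (text, text_lengths, mel, mel_lengths, speaker_ids).
fake_loader = [
    ([[1, 2], [3, 0]], [2, 1], [[0.1], [0.2]], [1, 1], [0, 1]),
    ([[4, 5], [6, 7]], [2, 2], [[0.3], [0.4]], [1, 1], [2, 0]),
]
model = StubModel()
print(train_one_epoch(model, fake_loader))
print(model.seen_speaker_ids)
```

The same unpacking change would apply to the evaluation loop if it iterates over the validation loader in the same way.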


marlon-br commented on August 18, 2024

I meant without retraining the whole model again, only adding one more voice.


dechubby commented on August 18, 2024


Sorry for jumping in. @echelon, could you please elaborate on the last part of your comment, about changing the forward function? Thanks in advance!


ppanja commented on August 18, 2024


Hi @echelon ,
This information is really useful. I believe I've made the necessary changes you suggested. In my case I've set n_speakers = 24 and gin_channels = 256, and the rest of the parameters in base.json are unchanged. The training set has 9102 samples. I'm getting the runtime error below.

RuntimeError: Given groups=1, weight of size [256, 448, 3], expected input[1, 192, 89] to have 448 channels, but got 192 channels instead

Can you please advise what is going wrong here?
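Reading only the shapes in the error: 448 = 192 + 256, which is the 192-channel input plus the gin_channels = 256 set above. One possible cause, offered as a guess from the numbers and not a confirmed diagnosis, is that a layer built for input concatenated with the speaker embedding received input without it, for example because the speaker ids were not passed as `g`. The sketch below just spells out that arithmetic; the function name and the idea that 192 is the hidden size are assumptions.

```python
def expected_in_channels(hidden_channels, gin_channels, speaker_concatenated):
    """Channel count entering a layer, with or without a concatenated speaker embedding.

    Assumption: the speaker embedding (gin_channels wide) is concatenated
    onto the hidden representation along the channel axis.
    """
    if speaker_concatenated:
        return hidden_channels + gin_channels
    return hidden_channels


# Values read off the error message and the settings quoted in this comment.
built_for = expected_in_channels(192, 256, speaker_concatenated=True)
received = expected_in_channels(192, 256, speaker_concatenated=False)
print(built_for, received)
```

If this reading is right, the layer was built for 448 channels while the tensor reaching it carried only the 192 hidden channels.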


ppanja commented on August 18, 2024

Hi @marlon-br, @dechubby ,
Were you able to run in multi-speaker mode? Did you make any other changes apart from those mentioned by echelon?
I'm hitting an issue that I'm not able to debug.

Any help will be really appreciated.

Regards,
Prasanta

