My trained model is 4GB, how that? about waveglow HOT 11 CLOSED

nvidia commented on May 18, 2024

My trained model is 4GB. How is that?

from waveglow.

Comments (11)

hcwu1993 commented on May 18, 2024

Because the default config is huge: 12*8 (512 channels).

RPrenger commented on May 18, 2024

We haven't done a lot of ablative analysis yet to see how few channels we could get away with or how few layers. A lot of architecture decisions were made based on the early parts of the training curves which seem to favor bigger models. But if smaller models were trained for 500k iterations they might sound essentially as good.

rafaelvalle commented on May 18, 2024

@hcwu1993 Do you mean the trained model, or the checkpoint file that is saved during training and includes the optimizer states?

hcwu1993 commented on May 18, 2024

According to the PyTorch docs, it should save the model parameters and structure information.

hcwu1993 commented on May 18, 2024

By the way, the provided model is 2GB. So is the config different from the paper? I also got an unusual result using this model: the F0 of the generated wav is lower than the natural one, so it sounds like a male voice.

rafaelvalle commented on May 18, 2024

@hcwu1993 Unlike the checkpoints saved during training, which include optimizer states, the checkpoint we shared with the pretrained model only contains the model. Hence the difference in size.
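
To illustrate the size difference described above, here is a minimal sketch in PyTorch. It is not the WaveGlow training code. The tiny `torch.nn.Linear` stand-in, the dictionary keys, and the helper `serialized_size` are assumptions made for illustration, and it uses `state_dict()` for simplicity. It only shows that a checkpoint that also stores Adam's optimizer state is noticeably larger than one holding the model alone.

```python
import io

import torch


def serialized_size(obj) -> int:
    """Hypothetical helper: bytes written by torch.save for `obj`."""
    buffer = io.BytesIO()
    torch.save(obj, buffer)
    return buffer.getbuffer().nbytes


# Tiny stand-in model (assumption); a real vocoder is far larger.
model = torch.nn.Linear(256, 256)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Take one step so Adam allocates its per-parameter moment buffers.
loss = model(torch.randn(4, 256)).pow(2).mean()
loss.backward()
optimizer.step()

# Checkpoint as it might be written during training (illustrative keys).
training_checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "iteration": 1,
}

# Inference-only checkpoint: drop everything except the model weights.
model_only_checkpoint = {"model": training_checkpoint["model"]}

full_size = serialized_size(training_checkpoint)
slim_size = serialized_size(model_only_checkpoint)
print(f"with optimizer state: {full_size} bytes, model only: {slim_size} bytes")
```

If disk space matters, the same idea applies to an existing checkpoint: load it, keep only the model entry, and save that under a new name.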

hcwu1993 commented on May 18, 2024

Thank you. I have another question. I use this command to synthesize the wav; the default sampling rate is 22050 Hz. It sounds like a male voice. Is there a problem with my command?

rafaelvalle commented on May 18, 2024

Your mel-spectrograms must have the same parameters (sampling_rate, filter_length, hop_length, win_length, mel_fmin, mel_fmax) as your model.

The pretrained model we share was trained with "mel_fmax": 8000.0.
We eventually updated the config.json file to have "mel_fmax": 8000.0 as the default. https://github.com/NVIDIA/waveglow/blob/master/config.json#L20

If you trained your model before this update, it is possible that your model was trained with librosa's default: "mel_fmax": sampling_rate/2.
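
A quick way to catch this kind of mismatch is to compare the mel parameters used to compute the spectrograms against those the model was trained with. The sketch below is plain Python and not part of the WaveGlow repository. The helper names `resolve_fmax` and `mismatched_mel_params` are hypothetical, and the values other than `sampling_rate` and `mel_fmax` are illustrative placeholders. A missing `mel_fmax` is treated as `sampling_rate / 2`, mirroring the librosa default mentioned above.

```python
MEL_KEYS = (
    "sampling_rate",
    "filter_length",
    "hop_length",
    "win_length",
    "mel_fmin",
    "mel_fmax",
)


def resolve_fmax(cfg: dict) -> dict:
    """Return a copy where mel_fmax=None becomes sampling_rate / 2."""
    resolved = dict(cfg)
    if resolved.get("mel_fmax") is None:
        resolved["mel_fmax"] = resolved["sampling_rate"] / 2
    return resolved


def mismatched_mel_params(model_cfg: dict, data_cfg: dict) -> dict:
    """Map each differing key to its (model value, data value) pair."""
    model_resolved = resolve_fmax(model_cfg)
    data_resolved = resolve_fmax(data_cfg)
    return {
        key: (model_resolved.get(key), data_resolved.get(key))
        for key in MEL_KEYS
        if model_resolved.get(key) != data_resolved.get(key)
    }


# Parameters the model was trained with (mel_fmax from the thread;
# the other values are illustrative).
model_cfg = {
    "sampling_rate": 22050,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "mel_fmin": 0.0,
    "mel_fmax": 8000.0,
}

# Spectrograms computed without an explicit mel_fmax.
data_cfg = dict(model_cfg, mel_fmax=None)

mismatches = mismatched_mel_params(model_cfg, data_cfg)
print(mismatches)
```

An empty result means the two configurations agree on all six parameters; any entry points at a setting to fix before synthesis.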

hdmjdp commented on May 18, 2024

@ Did you train with batch_size=24 and fp16?

rafaelvalle commented on May 18, 2024

No, we trained with FP32.

rafaelvalle commented on May 18, 2024

Closing issue. Please re-open if needed.
