Comments (11)
Because the default config is huge: 12*8 (512 channels).
from waveglow.
We haven't done a lot of ablative analysis yet to see how few channels we could get away with or how few layers. A lot of architecture decisions were made based on the early parts of the training curves which seem to favor bigger models. But if smaller models were trained for 500k iterations they might sound essentially as good.
from waveglow.
@hcwu1993 Do you mean the trained model, or the checkpoint file that is saved during training and includes the optimizer states?
from waveglow.
According to the PyTorch docs, it should save the model parameters and structure information.
from waveglow.
By the way, the given model is 2GB. So is the config different from the paper? Also, I got an unusual result using this model: the F0 of the generated wav is lower than that of the natural speech, so it sounds like a male voice.
from waveglow.
@hcwu1993 Unlike the checkpoints saved during training, which include optimizer states, the checkpoint we shared as the pretrained model contains only the model. Hence the difference in size.
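As a rough illustration of why optimizer states inflate a checkpoint, here is a back-of-the-envelope sketch. It assumes FP32 values (4 bytes each) and an Adam-style optimizer that keeps two extra buffers per parameter; the thread does not say which optimizer was used, and `estimated_checkpoint_bytes` and the parameter count are hypothetical, chosen only for the arithmetic.

```python
def estimated_checkpoint_bytes(n_params, bytes_per_value=4, optimizer_buffers_per_param=0):
    """Estimate checkpoint size: one value per parameter plus any optimizer buffers.

    Assumption: every parameter and every optimizer buffer is stored at the
    same precision (4 bytes for FP32). File-format overhead is ignored.
    """
    return n_params * bytes_per_value * (1 + optimizer_buffers_per_param)


# Illustrative parameter count, NOT the real size of any WaveGlow model.
n_params = 1_000_000

# Model weights only (what a shared pretrained file would hold).
model_only = estimated_checkpoint_bytes(n_params)

# Weights plus two Adam-style buffers per parameter (assumed optimizer).
with_optimizer = estimated_checkpoint_bytes(n_params, optimizer_buffers_per_param=2)

print(model_only, with_optimizer, with_optimizer / model_only)
```

Under these assumptions a training checkpoint is about three times the size of the model-only file.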
from waveglow.
Thank you. I have another question. I use this command to synthesize the wav, with the default sampling rate of 22050 Hz, and it sounds like a male voice. Is there a problem with my command?
from waveglow.
Your mel-spectrograms must be computed with the same parameters (sampling_rate, filter_length, hop_length, win_length, mel_fmin, mel_fmax) that your model was trained with.
The pretrained model we share was trained with "mel_fmax": 8000.0.
We eventually updated the config.json file to have "mel_fmax": 8000.0 as the default: https://github.com/NVIDIA/waveglow/blob/master/config.json#L20
If you trained your model before this update, it is possible that your model was trained with librosa's default: "mel_fmax": sampling_rate/2.
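A quick way to catch this kind of mismatch is to compare the two sets of mel parameters before running inference. The sketch below is only illustrative: `mismatched_mel_params` is a hypothetical helper, not part of the waveglow repository, and every value other than the sampling rate (22050) and mel_fmax (8000.0) mentioned in this thread is an assumed placeholder.

```python
# Hypothetical helper, not part of the waveglow repository.
MEL_KEYS = ("sampling_rate", "filter_length", "hop_length",
            "win_length", "mel_fmin", "mel_fmax")


def mismatched_mel_params(model_cfg, data_cfg):
    """Return {key: (model_value, data_value)} for each mel parameter that differs."""
    return {
        key: (model_cfg.get(key), data_cfg.get(key))
        for key in MEL_KEYS
        if model_cfg.get(key) != data_cfg.get(key)
    }


# Settings the model was trained with. Only sampling_rate and mel_fmax come
# from this thread; the other values are illustrative assumptions.
model_cfg = {"sampling_rate": 22050, "filter_length": 1024, "hop_length": 256,
             "win_length": 1024, "mel_fmin": 0.0, "mel_fmax": 8000.0}

# Settings used to compute the input mel-spectrograms: identical, except
# mel_fmax was left at sampling_rate / 2.
data_cfg = dict(model_cfg, mel_fmax=model_cfg["sampling_rate"] / 2)

print(mismatched_mel_params(model_cfg, data_cfg))
# → {'mel_fmax': (8000.0, 11025.0)}
```

An empty result means the spectrogram settings agree with the model's; any entry it returns names a parameter to fix before synthesis.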
from waveglow.
@ Did you train with batch_size=24 and FP16?
from waveglow.
No, we trained with FP32.
from waveglow.
Closing issue. Please re-open if needed.
from waveglow.