stabletts's Issues

Question about speaker style

I am trying to train a TTS model and have a question about speaker style. My dataset contains multiple speakers with different speaking styles. Does the model retain the style of each voice, does it learn only a single style, or does the style depend on the reference audio? For example, my dataset contains an Indian speaker who pauses nervously in conversation. If I train on the whole dataset and then run inference with one audio clip from that speaker as the reference, will the output inherit the nervous speaking style? I look forward to your response, and thanks for this great repo.

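Audio quality at inference?

Great project. After training I can run inference normally, and the pronunciation is fine.
However, compared with the training material, the output does not sound as bright and crisp (I have confirmed that this is not a quality problem in the training material).

I checked that the sample rate of the training audio matches the configuration: 44100.
How can I improve the audio quality at inference?
Thanks again!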
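
For anyone who wants to repeat the sample-rate check described above, here is a minimal sketch using only the Python standard library. The function name `find_mismatched_wavs` and the directory argument are hypothetical and not part of the repo, and the `wave` module is assumed to be sufficient, which holds for plain PCM WAV files.

```python
import wave
from pathlib import Path


def find_mismatched_wavs(root, expected_sr=44100):
    """Return (path, sample_rate) for each .wav under root whose header rate differs.

    Hypothetical helper for illustration; it only reads PCM WAV headers.
    """
    mismatched = []
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            sample_rate = wav_file.getframerate()
        if sample_rate != expected_sr:
            mismatched.append((path, sample_rate))
    return mismatched
```

An empty result only means the header rates match the expected value. It does not show whether a file was upsampled from a lower rate, in which case the high-frequency content would still be missing.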

Time required for the training of current pretrained models

Hi

I discovered this project, and the results are pretty amazing. I saw that it was updated to support Japanese, which made me curious about how many epochs or hours of training the current pretrained models required. If it is possible to share that, I would be very thankful: I am going to train Japanese from scratch and would like some idea of the time and hardware I will need.

Thanks!

New model release date?

I have been working with your repo for a while now, and it is great and fast. I love it a lot. However, the model you released fails to pronounce some words correctly. When will you release the second model? Thanks a lot for providing this repo to the community.

ValueError: not enough values to unpack (expected 2, got 1)

python train.py

加载自定义词典成功 (custom dictionary loaded successfully)
加载自定义词典成功 (custom dictionary loaded successfully)
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0425 17:05:53.758841 4566 ProcessGroupNCCL.cpp:686] [Rank 0] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 1800000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: OFF, NCCL_DEBUG: OFF, ID=187465152
Traceback (most recent call last):
  File "train.py", line 106, in <module>
    torch.multiprocessing.spawn(train, args=(world_size,), nprocs=world_size, join=True)
  File "/usr/local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 246, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
  File "/usr/local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 202, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 145, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGBUS

env

pip list | grep torch

apex 0.1+2a4864d.abi0.dtk2310.torch2.1
torch 2.1.0a0+git793d2b5.abi0.dtk2310
torchaudio 2.1.2+4b32183.abi0.dtk2310.torch2.1.0a0
torchvision 0.16.0+git267eff6.abi0.dtk2310.torch2.1.0

Congratulations on the repo!

Hey hey @KdaiP,

Thanks for open-sourcing your implementation. I'm VB, I work in the open source audio team at Hugging Face. I'd love to know more and see how we can potentially help you with your experiments or share some of the learnings.

If you are interested then feel free to ping me at vaibhav[at]hf[dot]co

Looking forward to the release of the checkpoints!

Cheers,
VB

Is there a trick to stabilize training?

Hi! Thanks for open-sourcing your work! I like the idea, so I tried copying the CFM decoder and using it in my TTS setup. At first I got NaN values after the attention in the estimator, at every position that was padding. I fixed this by using masked_fill with the x_mask instead of just multiplying by the x_mask, although I'm not sure why that was necessary for me.

Once I could run a forward pass and compute a CFM loss that was not NaN, I thought it was good to go. However, more problems came up. No matter what I tried, the CFM decoder always produced a couple of NaN values after its first update during training. Do you have any idea what could cause this? Is there anything specific I need to do to stabilize the training? The setup I am using works very well for lots of other architectures, including e.g. the normalizing flow decoder of PortaSpeech, which seems pretty similar.
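
On why masked_fill can behave differently from multiplying by the mask, one plausible explanation (an assumption, not a diagnosis of this setup) is plain IEEE 754 arithmetic: NaN times zero is still NaN, so multiplication cannot clear a NaN, whereas a select-style fill replaces it. A row of attention scores that is entirely masked with -inf is one common way such NaN values arise. The sketch below reproduces this in plain Python; the variable names are hypothetical, and the PyTorch expressions in the comments are the rough equivalents.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    # For a fully masked row, m is -inf and (-inf) - (-inf) evaluates to NaN.
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


NEG_INF = float("-inf")

# Two query positions; the second is padding, so all of its keys are masked.
scores = [[0.5, 1.0], [NEG_INF, NEG_INF]]
valid = [1.0, 0.0]  # hypothetical x_mask: 1 = real frame, 0 = padding

attn = [softmax(row) for row in scores]

# Multiplying by the mask, roughly `x * x_mask`: NaN * 0 stays NaN.
multiplied = [[a * v for a in row] for row, v in zip(attn, valid)]

# Selecting by the mask, roughly `x.masked_fill(~x_mask.bool(), 0.0)`:
# padded positions are overwritten, so the NaN disappears.
filled = [[a if v else 0.0 for a in row] for row, v in zip(attn, valid)]

print(multiplied[1])  # both entries are NaN
print(filled[1])      # both entries are 0.0
```

If something similar happens in the backward pass, a NaN gradient at a padded position could still reach the weights even when the forward output looks clean, which may be worth checking for the first-update problem.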
