
Comments (10)

casic commented on June 2, 2024

Thank you very much, Martin. Have a nice year.

from styletts2.


martinambrus commented on June 2, 2024

You're running out of memory, quite possibly because of your config settings.

On a T4 GPU, you won't be able to train much, since it has very limited VRAM (16 GB). The best I could do with a T4 on Google Colab was to fine-tune the LJSpeech model on my own set of WAV files, each 1 to 1.25 seconds long, with batch_size set to 2 and max_len set to 100 in config.yml.
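As a rough sanity check on those numbers, the sketch below converts max_len (a length in mel frames) to seconds. The hop size of 300 samples and the 24 kHz sample rate are assumptions based on the StyleTTS2 defaults, not values stated in this thread, and the function names are made up for illustration. Under those assumptions, max_len = 100 corresponds to 1.25 seconds, which matches the clip lengths mentioned above.

```python
# Assumed defaults (not stated in this thread): 24 kHz audio, hop size of 300 samples.
ASSUMED_SAMPLE_RATE = 24000
ASSUMED_HOP_SIZE = 300


def max_len_to_seconds(max_len, hop_size=ASSUMED_HOP_SIZE, sample_rate=ASSUMED_SAMPLE_RATE):
    """Convert a length in mel frames to an approximate duration in seconds."""
    return max_len * hop_size / sample_rate


def seconds_to_max_len(seconds, hop_size=ASSUMED_HOP_SIZE, sample_rate=ASSUMED_SAMPLE_RATE):
    """Convert a duration in seconds to a whole number of mel frames (rounded down)."""
    return int(seconds * sample_rate // hop_size)


print(max_len_to_seconds(100))   # 1.25
print(seconds_to_max_len(1.25))  # 100
```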

For an example of how to fine-tune on a T4, see the Colab notebook at https://github.com/yl4579/StyleTTS2/blob/main/Colab/StyleTTS2_Finetune_Demo.ipynb, or browse the other Colab notebooks ( https://github.com/yl4579/StyleTTS2/tree/main/Colab ) for inspiration.


casic commented on June 2, 2024

Thanks.
But is there a way to fine-tune on all GPUs?
Thanks in advance!


martinambrus commented on June 2, 2024

No, this is not yet possible because of issue #7: the fine-tuning script is built on top of the second-stage training script, which suffers from that issue. The best you can get is accelerated fine-tuning with a single process, which is marginally faster and uses slightly less memory: accelerate launch --mixed_precision=fp16 --num_processes=1 train_finetune_accelerate.py --config_path ./Configs/config_ft.yml


casic commented on June 2, 2024

Thanks. One more question: I started the first stage and it ran for about 10 hours before the terminal killed the process. I can see it stopped at epoch 7/200. Is there a way to continue after a stop instead of starting from zero every time?


martinambrus commented on June 2, 2024

Yes, the training process saves checkpoint files at the interval (in epochs) that you set in the config via the save_freq option. It defaults to 2, so it saves files such as StyleTTS2/Models/LJSpeech/epoch_1st_00002.pth, and you can use those to continue from where you stopped.

To do so, set the pretrained_model config option to the full path of the last checkpoint file (e.g. /home/user/StyleTTS2/Models/LJSpeech/epoch_1st_00007.pth), and set both the second_stage_load_pretrained parameter and the load_only_params option to false. Then start training and it should pick up from the given checkpoint file.

If you want to resume second-stage training, you'll need to provide a second-stage checkpoint file and set the second_stage_load_pretrained parameter to true.
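The resume steps above could be sketched roughly as follows: find the newest checkpoint in the model directory and build the three config values to put in the YAML file by hand. The helper names are hypothetical, the epoch_2nd_ file prefix for second-stage checkpoints is an assumption (only epoch_1st_ appears in this thread), and keeping load_only_params at false for the second stage is likewise assumed.

```python
import re
from pathlib import Path


def latest_checkpoint(model_dir, stage="1st"):
    """Return the checkpoint with the highest epoch number, or None if there is none.

    Assumes files are named epoch_<stage>_<number>.pth, e.g. epoch_1st_00002.pth.
    """
    pattern = re.compile(rf"epoch_{stage}_(\d+)\.pth$")
    best = None
    for path in Path(model_dir).glob(f"epoch_{stage}_*.pth"):
        match = pattern.search(path.name)
        if match and (best is None or int(match.group(1)) > best[0]):
            best = (int(match.group(1)), path)
    return None if best is None else best[1].resolve()


def resume_settings(checkpoint, second_stage=False):
    """Build the config values described above for resuming from a checkpoint."""
    return {
        "pretrained_model": str(checkpoint),
        "second_stage_load_pretrained": second_stage,
        "load_only_params": False,  # assumed to stay false for both stages
    }


# Example usage (path is illustrative):
# ckpt = latest_checkpoint("/home/user/StyleTTS2/Models/LJSpeech")
# print(resume_settings(ckpt))
```

The returned dictionary only mirrors the option names from the comment above; it does not modify any config file.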


yl4579 commented on June 2, 2024

Sorry for the late reply. I was quite busy recently. Fine-tuning should use all GPUs. I have tested the fine-tuning script on four NVIDIA A100 GPUs and it worked perfectly well. Have you checked with nvidia-smi whether all GPUs were being used when you ran the script, or only one of them?
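One way to make that nvidia-smi check repeatable is sketched below: query per-GPU utilization in CSV form and list the GPUs that look idle. The function names and the 5% idle threshold are arbitrary choices for illustration; the query flags used are standard nvidia-smi options.

```python
import subprocess

QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_usage(text):
    """Parse 'index, utilization, memory' CSV lines into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, util, mem = (field.strip() for field in line.split(","))
        gpus.append({"index": int(index), "util_percent": int(util), "memory_mib": int(mem)})
    return gpus


def idle_gpus(gpus, util_threshold=5):
    """Return indices of GPUs whose utilization is below the (arbitrary) threshold."""
    return [gpu["index"] for gpu in gpus if gpu["util_percent"] < util_threshold]


def query_gpus():
    """Run nvidia-smi and parse its output; requires an NVIDIA driver to be installed."""
    result = subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True)
    return parse_gpu_usage(result.stdout)


# Example usage while training is running:
# print(idle_gpus(query_gpus()))
```

If the returned list contains every GPU except one during training, only a single GPU is doing the work.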


martinambrus commented on June 2, 2024

"Sorry for the late reply. I was quite busy recently. Finetuning should use all GPUs. I have tested the finetuning script on 4 NVidia A100 and it worked perfectly well. Have you checked using nvidia-smi that all GPUs were being used when you run the script, or only one of them?"

Then is the following statement from the README incorrect or did I simply misunderstand what you meant there?

"The script is modified from train_second.py which uses DP, as DDP does not work for train_second.py. Please see the bold section above if you are willing to help with this problem. ... If you are using a single GPU (because the script doesn't work with DDP) and want to save training speed and VRAM, you can do (thank @korakoe for making the script at #100): accelerate launch --mixed_precision=fp16 --num_processes=1 train_finetune_accelerate.py --config_path ./Configs/config_ft.yml"


teamblubee commented on June 2, 2024

The fine-tuning script is a modified version of the second-stage training script, so it might work, but I don't think so.

If the author can create a post showing where train_second.py fails, I can take some time to examine it and see if I can debug the issue.
