
Comments (13)

MXGray commented on August 15, 2024

@rafaelvalle
Can you share your pretrained Tacotron2 model and hparams that generated the sample audio?


rafaelvalle commented on August 15, 2024

The pre-trained model has been made available on our README page.


rafaelvalle commented on August 15, 2024

Checkpoint files are saved to the output_directory specified when running train.py. The example command in our repo saves to outdir.
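
For reference, a typical invocation (matching the flags used in the resume command later in this thread) looks like the line below; checkpoints such as outdir/checkpoint_32500 then accumulate in that directory as training progresses:

python train.py --output_directory=outdir --log_directory=logdir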


OptimusPrimeCao commented on August 15, 2024

@rafaelvalle
When I run train.py with the default hparams on an 8GB 1080 GPU, I get the error below. I changed the batch size to 24, but the error still occurs.

src/tcmalloc.cc:278] Attempt to free invalid pointer 0x100000009
Traceback (most recent call last):
  File "train.py", line 291, in <module>
    args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
  File "train.py", line 209, in train
    for i, batch in enumerate(train_loader):
  File "/data00/home/caoyuetian/share/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 330, in __next__
    idx, batch = self._get_batch()
  File "/data00/home/caoyuetian/share/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 309, in _get_batch
    return self.data_queue.get()
  File "/data00/home/caoyuetian/share/anaconda3/lib/python3.6/multiprocessing/queues.py", line 335, in get
    res = self._reader.recv_bytes()
  File "/data00/home/caoyuetian/share/anaconda3/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/data00/home/caoyuetian/share/anaconda3/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/data00/home/caoyuetian/share/anaconda3/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "/data00/home/caoyuetian/share/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 227, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 125263) is killed by signal: Aborted.
Segmentation fault (core dumped)


rafaelvalle commented on August 15, 2024

This is probably related to your CPU not being able to keep up with loading the data.
Try reducing the number of data loader workers or using a smaller batch size.
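
If your copy of train.py accepts comma-separated --hparams overrides (the NVIDIA repo's train.py exposes such a flag), the batch size can be lowered without editing any code, for example:

python train.py --output_directory=outdir --log_directory=logdir --hparams=batch_size=16

The worker count is a DataLoader argument (num_workers) set where the training loader is constructed in train.py; lowering it there trades loading throughput for less host memory pressure.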


rafaelvalle commented on August 15, 2024

@OptimusPrimeCao The model uses approximately 300MB per sample. Try reducing your batch size to 16.
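
Rough arithmetic behind that suggestion: 24 samples × ~300MB ≈ 7.2GB of activations alone, which leaves essentially no headroom on an 8GB card once model weights, optimizer state, and the CUDA context are added, whereas 16 × ~300MB ≈ 4.8GB does.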


rafaelvalle commented on August 15, 2024

@MXGray We can share the hparams.
Please do not post the same message on multiple issues. I deleted your message from the Audio Examples issue.


gsoul commented on August 15, 2024

Please share the hparams.


vijaysumaravi commented on August 15, 2024

Is there a way I can continue training my model from a particular point?

To be specific, my training crashed at checkpoint_32000 because of a memory issue, which I have since fixed. Can I now resume training from that point, or do I have to start again from scratch? If resuming is possible, how do I do it?

I wasn't sure whether this warranted a new issue, hence posting my comment here.

Any help is appreciated. Thanks!


vijaysumaravi commented on August 15, 2024

Never mind, figured it out.

python train.py --output_directory=outdir --log_directory=logdir --checkpoint_path='outdir/checkpoint_32500'
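
For context, resuming amounts to reloading the saved state before continuing the training loop. A minimal, generic PyTorch sketch of such a checkpoint load, assuming the checkpoint dictionary stores keys named 'state_dict', 'optimizer', 'learning_rate', and 'iteration' (not a verbatim copy of the repo's code):

import torch

def load_checkpoint(checkpoint_path, model, optimizer):
    # Restore model weights, optimizer state, and the bookkeeping needed to resume.
    # The dictionary keys below are assumed to match how the checkpoint was saved.
    checkpoint = torch.load(checkpoint_path, map_location='cpu')
    model.load_state_dict(checkpoint['state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    return model, optimizer, checkpoint['learning_rate'], checkpoint['iteration']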


beknazar commented on August 15, 2024

> Never mind, figured it out.
>
> python train.py --output_directory=outdir --log_directory=logdir --checkpoint_path='outdir/checkpoint_32500'

@vijaysumaravi Mine also stopped after 32.5 epochs. Did you figure out the reason?


vijaysumaravi commented on August 15, 2024

Reducing my batch size helped. I was training it on a single GPU.


ErfolgreichCharismatisch commented on August 15, 2024

Tutorial: Training on GPU with Colab, Inference with CPU on Server here.

