
Comments (6)

manojpamk commented on May 24, 2024

Hi,

mfcc.conf is from the VoxCeleb recipe in Kaldi. I forgot to add the symlink from pytorch_run.sh; it should be added after the recent commit.

Best,
Manoj


LCF2764 commented on May 24, 2024

Hi,
Thanks for your quick reply!
I found that when I set num_workers=2 for the DataLoader in train_xent.py, it throws the following error:
Traceback (most recent call last):
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 333, in reduce_storage
RuntimeError: unable to open shared memory object </torch_20266_1801196515> in read-write mode
Traceback (most recent call last):
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 337, in reduce_storage
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/reduction.py", line 191, in DupFd
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/resource_sharer.py", line 48, in __init__
OSError: [Errno 24] Too many open files

Does it not support multi-process data loading?


manojpamk commented on May 24, 2024

I faced the same issue, and unfortunately I couldn't find a solution. From what I understood: since we are serially accessing the egs.*.scp files, it is not possible to read them with num_workers > 0.

Let me know if you find a workaround!

Best,
Manoj


LCF2764 commented on May 24, 2024

I found it can be solved by adding the code below to train_xent.py:

import torch.multiprocessing as mp
mp.set_sharing_strategy('file_system')
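
For anyone hitting the same error, here is a minimal self-contained sketch of how the fix fits together with a multi-worker DataLoader; the random TensorDataset is only a placeholder for illustration, not the repo's egs-based dataset:

import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, TensorDataset

# Share tensors between worker processes via files on disk instead of file
# descriptors; this avoids "OSError: [Errno 24] Too many open files".
mp.set_sharing_strategy('file_system')

if __name__ == '__main__':
    # Placeholder dataset: random "feature chunks" with integer speaker labels.
    # The repo's egs-based dataset is different; this only shows that
    # num_workers > 0 works once the sharing strategy is set.
    dataset = TensorDataset(torch.randn(100, 30, 300), torch.randint(0, 10, (100,)))
    loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=2)
    for X, Y in loader:
        pass  # training step would go here

Note that set_sharing_strategy should be called before the DataLoader spawns its worker processes.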

However, this leads to a new problem. When I set num_workers=32 it runs, but the code keeps looping over for _, (X, Y) in par_data_loader: and never reaches print('Archive processing time: %1.3f' %(time.time()-archive_start_time)) or print('Validation accuracy is %1.2f precent' %(valAcc)).

The reason I wanted to use multiple worker processes to load data is that the code has very low GPU utilization and trains slowly.
In the end I found that the main factor limiting training speed is not data loading, but the model definition.

In the models.py script, the following code is very time-consuming:

if self.training:
    x = x + torch.randn(x.size()).to(self.device)*eps

After I modified it as below, so that the noise is generated directly on the GPU instead of being created on the CPU and copied over, training is much faster:

if self.training:
    #x = x + torch.randn(x.size()).to(self.device)*eps
    shape = x.size()
    # allocate the noise tensor directly on the GPU (when available) and fill it in place
    noise = torch.cuda.FloatTensor(shape) if torch.cuda.is_available() else torch.FloatTensor(shape)
    torch.randn(shape, out=noise)
    x += noise*eps
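
As an aside, torch.randn_like would be an even shorter way to get the same effect, since it creates the noise tensor on the same device and with the same dtype as x; this is just an untested sketch, not part of the change above:

if self.training:
    # noise is created directly on x's device/dtype, so no CPU-to-GPU copy is needed
    x = x + torch.randn_like(x) * eps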


manojpamk commented on May 24, 2024

When I set num_workers=32 it runs, but the code keeps looping over for _, (X, Y) in par_data_loader: and never reaches print('Archive processing time: %1.3f' %(time.time()-archive_start_time)) or print('Validation accuracy is %1.2f precent' %(valAcc))

Does the logging statement if batchI-loggedBatch >= args.logStepSize: execute?

After I modify it as below, the training speed is much faster

if self.training:
    #x = x + torch.randn(x.size()).to(self.device)*eps
    shape = x.size()
    noise = torch.cuda.FloatTensor(shape) if torch.cuda.is_available() else torch.FloatTensor(shape)
    torch.randn(shape, out=noise)
    x += noise*eps

I can confirm the speedup! Do you want to create a PR?


LCF2764 commented on May 24, 2024

The if batchI-loggedBatch >= args.logStepSize: statement is executed, and batchI grows larger than numBatchsPerArk, but the for loop mentioned above never breaks.

Yes, I will create a PR, thanks!

