
Comments (6)

manojpamk commented on May 24, 2024

Hi,

mfcc.conf is from the VoxCeleb recipe in Kaldi. I forgot to add the symlink from pytorch_run.sh; it should be added after the recent commit.

Best,
Manoj


LCF2764 commented on May 24, 2024

Hi,
Thanks for your quick reply!
I found that when I set num_workers=2 for the DataLoader in train_xent.py, it throws the following error:
Traceback (most recent call last):
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 333, in reduce_storage
RuntimeError: unable to open shared memory object </torch_20266_1801196515> in read-write mode
Traceback (most recent call last):
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 337, in reduce_storage
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/reduction.py", line 191, in DupFd
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/resource_sharer.py", line 48, in __init__
OSError: [Errno 24] Too many open files

Does it not support multi-process data loading?


manojpamk commented on May 24, 2024

I faced the same issue, and unfortunately I couldn't find a solution. From what I understood: since we are serially accessing the egs.*.scp files, it is not possible to read them with num_workers > 0.

Let me know if you find a workaround!

Best,
Manoj


LCF2764 commented on May 24, 2024

I found it can be solved by adding the code below to train_xent.py:

import torch.multiprocessing as mp
mp.set_sharing_strategy('file_system')
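
For anyone hitting the same error, here is a minimal self-contained sketch of how the fix fits together with a multi-worker DataLoader; the random TensorDataset is only a placeholder for illustration, not the repo's egs-based dataset:

import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, TensorDataset

# Share tensors between worker processes via files on disk instead of file
# descriptors; this avoids "OSError: [Errno 24] Too many open files".
mp.set_sharing_strategy('file_system')

if __name__ == '__main__':
    # Placeholder dataset: random "feature chunks" with integer speaker labels.
    # The repo's egs-based dataset is different; this only shows that
    # num_workers > 0 works once the sharing strategy is set.
    dataset = TensorDataset(torch.randn(100, 30, 300), torch.randint(0, 10, (100,)))
    loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=2)
    for X, Y in loader:
        pass  # training step would go here

Note that set_sharing_strategy should be called before the DataLoader spawns its worker processes.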

However, this leads to a new problem. When I set num_workers=32 it runs, but the code keeps looping over for _, (X, Y) in par_data_loader: and never reaches print('Archive processing time: %1.3f' %(time.time()-archive_start_time)) or print('Validation accuracy is %1.2f precent' %(valAcc)).

The reason I wanted to use multiple worker processes to load data is that the code has very low GPU utilization and trains slowly.
In the end I found that the main factor limiting training speed is not data loading, but the model definition.

In the models.py script, the following code is very time-consuming:

if self.training:
    x = x + torch.randn(x.size()).to(self.device)*eps

After I modified it as below, so that the noise is generated directly on the GPU instead of being created on the CPU and copied over, training is much faster:

if self.training:
    #x = x + torch.randn(x.size()).to(self.device)*eps
    shape = x.size()
    # allocate the noise tensor directly on the GPU (when available) and fill it in place
    noise = torch.cuda.FloatTensor(shape) if torch.cuda.is_available() else torch.FloatTensor(shape)
    torch.randn(shape, out=noise)
    x += noise*eps
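
As an aside, torch.randn_like would be an even shorter way to get the same effect, since it creates the noise tensor on the same device and with the same dtype as x; this is just an untested sketch, not part of the change above:

if self.training:
    # noise is created directly on x's device/dtype, so no CPU-to-GPU copy is needed
    x = x + torch.randn_like(x) * eps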


manojpamk commented on May 24, 2024

When I set num_workers=32 it runs, but the code keeps looping over for _, (X, Y) in par_data_loader: and never reaches print('Archive processing time: %1.3f' %(time.time()-archive_start_time)) or print('Validation accuracy is %1.2f precent' %(valAcc))

Does the logging statement if batchI-loggedBatch >= args.logStepSize: execute?

After I modify it as below, the training speed is much faster

if self.training:
    #x = x + torch.randn(x.size()).to(self.device)*eps
    shape = x.size()
    noise = torch.cuda.FloatTensor(shape) if torch.cuda.is_available() else torch.FloatTensor(shape)
    torch.randn(shape, out=noise)
    x += noise*eps

I can confirm the speedup! Do you want to create a PR?


LCF2764 commented on May 24, 2024

The if batchI-loggedBatch >= args.logStepSize: statement is executed, and batchI grows larger than numBatchsPerArk, but the for loop mentioned above never breaks.

Yes, I will create a PR, thanks!

