Comments (6)
Hi,
mfcc.conf is from the VoxCeleb recipe in Kaldi. I forgot to add the symlink from pytorch_run.sh; it should be there after the recent commit.
Best,
Manoj
from pytorch_xvectors.
Hi,
Thanks for your quick reply!
I found when I set the num_workers=2 at DataLoader in train_xent.py, it will throw an error as follow:
Traceback (most recent call last):
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 333, in reduce_storage
RuntimeError: unable to open shared memory object </torch_20266_1801196515> in read-write mode
Traceback (most recent call last):
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 337, in reduce_storage
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/reduction.py", line 191, in DupFd
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/resource_sharer.py", line 48, in __init__
OSError: [Errno 24] Too many open files
Does it not support multi-processing for data loading?
I faced the same issue, and unfortunately I couldn't find a solution. From what I understood, since we access the egs.*.scp files serially, it is not possible to read them with num_workers > 0.
Let me know if you find a workaround!
Best,
Manoj
I found it can be solved by adding the code below to train_xent.py:
import torch.multiprocessing as mp
mp.set_sharing_strategy('file_system')
but this introduces a new problem. With num_workers=32 it runs, but the code loops forever in for _, (X, Y) in par_data_loader: and never reaches print('Archive processing time: %1.3f' %(time.time()-archive_start_time)) or print('Validation accuracy is %1.2f precent' %(valAcc)).
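As a self-contained sketch of the sharing-strategy fix above (the ToyDataset here is only an illustrative stand-in, not the actual egs-based dataset from this repo), the key point is that the strategy must be set before any DataLoader workers are spawned:

```python
import torch
import torch.multiprocessing as mp
from torch.utils.data import Dataset, DataLoader

# The file_system strategy shares tensors via files in shared memory
# rather than via one open file descriptor per tensor, which is what
# triggers "OSError: [Errno 24] Too many open files" with many workers.
# It must be set before any DataLoader with num_workers > 0 is created.
mp.set_sharing_strategy('file_system')

class ToyDataset(Dataset):
    """Illustrative stand-in for the egs-based dataset."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return torch.full((3,), float(idx)), idx

if __name__ == '__main__':
    loader = DataLoader(ToyDataset(), batch_size=4, num_workers=2)
    total = sum(X.shape[0] for X, Y in loader)
    print(total)  # all 8 items arrive through the 2 worker processes
```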
The reason I wanted to load data with multiple processes is that this code has very low GPU utilization and trains slowly.
In the end I found that the main factor limiting training speed is not data loading but the model definition.
In the models.py script, the following code is very time-consuming:
if self.training:
    x = x + torch.randn(x.size()).to(self.device)*eps
After I modified it as below, training is much faster:
if self.training:
    #x = x + torch.randn(x.size()).to(self.device)*eps
    shape = x.size()
    noise = torch.cuda.FloatTensor(shape) if torch.cuda.is_available() else torch.FloatTensor(shape)
    torch.randn(shape, out=noise)
    x += noise*eps
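For what it's worth, the same speedup (generating the noise on the GPU instead of sampling on the CPU and copying it over with .to(device)) can also be had in one line via torch.randn's device argument; a minimal sketch, where add_training_noise is an illustrative name rather than a function from this repo:

```python
import torch

def add_training_noise(x, eps=1e-5):
    # Sample the noise directly on x's device and with x's dtype,
    # avoiding the CPU->GPU copy done by torch.randn(...).to(device).
    noise = torch.randn(x.size(), dtype=x.dtype, device=x.device)
    return x + noise * eps

x = torch.ones(2, 3)
y = add_training_noise(x, eps=0.0)  # eps=0 leaves x unchanged
print(torch.equal(x, y))  # prints True
```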
When I set num_workers=32, it can work, but the code always executes the for loop part for _, (X, Y) in par_data_loader: and can't executes print('Archive processing time: %1.3f' %(time.time()-archive_start_time)) and print('Validation accuracy is %1.2f precent' %(valAcc))
Does the logging statement if batchI-loggedBatch >= args.logStepSize: execute?
After I modify it as below, the training speed is much faster
if self.training:
    #x = x + torch.randn(x.size()).to(self.device)*eps
    shape = x.size()
    noise = torch.cuda.FloatTensor(shape) if torch.cuda.is_available() else torch.FloatTensor(shape)
    torch.randn(shape, out=noise)
    x += noise*eps
I can confirm the speedup! Do you want to create a PR?
The if batchI-loggedBatch >= args.logStepSize: line is executed, and batchI grows larger than numBatchsPerArk, but the for loop mentioned above never breaks.
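A sketch of an explicit guard for that situation (batchI and numBatchsPerArk follow the names used in this thread; the data source here is a simple stand-in for par_data_loader, not the real loader):

```python
# If the loader keeps yielding batches past the expected archive size,
# an explicit break on the batch counter stops the loop deterministically.
numBatchsPerArk = 3
par_data_loader = enumerate(range(10))  # stand-in: yields (index, batch)

batchI = 0
for _, batch in par_data_loader:
    batchI += 1
    if batchI >= numBatchsPerArk:
        break  # stop even if the loader would keep yielding
print(batchI)  # prints 3
```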
Yes, I will create a PR, thanks!