Could not find common file: exp/xvector_nnet_1a/egs//egs.1.ark (pytorch_xvectors, CLOSED)

manojpamk commented on May 27, 2024

Could not find common file: exp/xvector_nnet_1a/egs//egs.1.ark

from pytorch_xvectors.

Comments (13)

fengye-lu commented on May 27, 2024

I have the same problem. Could you tell me how to solve it?

manojpamk commented on May 27, 2024

Hello,

The path exp/xvector_nnet_1a/egs/egs.1.ark should point to the nnet3-egs files prepared by the get_egs.sh command. The nnet3-egs files contain data suitable for DNN training.

Unfortunately, I cannot share this data directly. You can download it from the authors (https://www.robots.ox.ac.uk/~vgg/data/voxceleb/) and place it on your computer. Make sure to provide the links here.

Manoj

NPC51129 commented on May 27, 2024

Hi, thank you for the advice. I already have the VoxCeleb1 and VoxCeleb2, MUSAN and RIR datasets on my disk, and I have updated the paths in pytorch_run.sh, but the problem still exists. When I look at the project directory, there is no xvector_nnet_1a folder under exp/. It seems the egs files are either not generated or not located there. What might cause this?

manojpamk commented on May 27, 2024

Hello,

If I understand correctly, the script fails at the get_egs.sh command. As far as this command is concerned, exp/xvector_nnet_1a/egs/ is an output directory. You can replace it with any location where you would like to create the egs.*.ark files, ideally somewhere with more than 400 GB of free space.

Just make sure to use the same path in the next step (train_xent.py).

Manoj
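
As a rough illustration of the advice above, the egs directory could be checked before train_xent.py is launched, so that a missing archive is reported up front rather than deep inside the data loader. This is only a sketch: the helper names find_egs_archives and check_egs_dir are hypothetical and not part of pytorch_xvectors, and the only assumption about file naming is the egs.*.ark pattern mentioned in this thread.

```python
import glob
import os


def find_egs_archives(egs_dir):
    """Return the sorted egs.*.ark files found directly under egs_dir."""
    pattern = os.path.join(egs_dir, "egs.*.ark")
    return sorted(glob.glob(pattern))


def check_egs_dir(egs_dir):
    """Raise FileNotFoundError early if egs_dir holds no training archives.

    Hypothetical pre-flight check; it is not part of the repository.
    """
    archives = find_egs_archives(egs_dir)
    if not archives:
        raise FileNotFoundError(
            "No egs.*.ark files in {!r}; did get_egs.sh finish?".format(egs_dir)
        )
    return archives
```

Calling check_egs_dir with the same path that is passed to train_xent.py would fail immediately if get_egs.sh wrote nothing to that directory.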

fengye-lu commented on May 27, 2024

Hi, thanks for your reply. But I can't find get_egs.sh and train_xent.py in your project, so my problem still exists.

manojpamk commented on May 27, 2024

Hi,

get_egs.sh is part of Kaldi; it becomes available once you create the softlink for the sid directory at the beginning of pytorch_run.sh.
train_xent.py is available in this repo.
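
One way to confirm that the softlink is in place is to test whether it resolves and whether the script is reachable through it. The sketch below assumes the link is named sid in the working directory and that the script lives at nnet3/xvector/get_egs.sh beneath it, as the log later in this thread suggests; check_kaldi_link is a hypothetical helper, not something shipped with Kaldi or this repo.

```python
import os


def check_kaldi_link(link_path="sid", script="nnet3/xvector/get_egs.sh"):
    """Classify the state of a Kaldi softlink and the script expected under it."""
    if not os.path.lexists(link_path):
        return "link missing"
    if not os.path.exists(link_path):
        return "link broken"  # the link exists but its target does not
    if not os.path.isfile(os.path.join(link_path, script)):
        return "script missing"
    return "ok"
```

Any result other than "ok" would suggest revisiting the softlink step at the start of pytorch_run.sh before going further.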

NPC51129 commented on May 27, 2024

Hi, thank you so much for your time.
I think the script failed at line 205, train_xent.py exp/xvector_nnet_1a/egs/, not at the get_egs.sh command. Here is my full log from stage 6:

sid/nnet3/xvector/get_egs.sh --cmd run.pl --nj 8 --stage 0 --frames-per-iter 1000000000 --frames-per-iter-diagnostic 100000 --min-frames-per-chunk 200 --max-frames-per-chunk 400 --num-diagnostic-archives 3 --num-repeats 50 data/train_combined_no_sil exp/xvector_nnet_1a/egs/
sid/nnet3/xvector/get_egs.sh: expected file data/train_combined_no_sil/feats.scp
Namespace(baseLR=0.001, batchSize=32, featDim=30, featDir='exp/xvector_nnet_1a/egs/', local_rank=0, logStepSize=200, maxLR=0.002, modelType='xvecTDNN', noiseEps=1e-05, numArchives=84, numEgsPerArk=366150, numEpochs=2, numSpkrs=7323, optimMomentum=0.5, pDropMax=0.2, preFetchRatio=30, preTrainedModelDir=None, protoEpisodesPerArk=25, protoMaxClasses=35, protoMinClasses=5, resumeModelDir=None, stepFrac=0.5, supportFrac=0.7, totalEpisodes=100, trainingMode='init')
Initializing Model..
Reading from archive 1
Traceback (most recent call last):
  File "train_xent.py", line 69, in <module>
    for _,(X, Y) in par_data_loader:
  File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 28, in fetch
    data.append(next(self.dataset_iter))
  File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/site-packages/kaldi_python_io/inst.py", line 284, in __iter__
    with ext_open(self.ark_or_pipe, "rb") as fd:
  File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/site-packages/kaldi_python_io/inst.py", line 106, in __enter__
    self.fd = _fopen(self.fname, self.mode)
  File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/site-packages/kaldi_python_io/inst.py", line 79, in _fopen
    "Could not find common file: {}".format(fname))
FileNotFoundError: Could not find common file: exp/xvector_nnet_1a/egs//egs.1.ark
Traceback (most recent call last):
  File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/tjw/anaconda3/envs/xvec/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/tjw/anaconda3/envs/xvec/bin/python', '-u', 'train_xent.py', '--local_rank=0', 'exp/xvector_nnet_1a/egs/']' returned non-zero exit status 1.

As the traceback shows, the error occurred in the Python script: File "train_xent.py", line 69, in <module>

fengye-lu commented on May 27, 2024

Hello, have you run this project successfully?

manojpamk commented on May 27, 2024

The second line of the output indicates that feats.scp is missing, so get_egs.sh did not actually succeed.
The error in the train_xent.py log is a consequence of that failure.
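
To catch this situation earlier, the data directory could be inspected before get_egs.sh runs. The sketch below is a hypothetical helper (missing_data_files is not part of Kaldi or this repo); it checks only for feats.scp by default, because that is the one file the log above names, and the required tuple can be extended if other files turn out to be needed.

```python
import os


def missing_data_files(data_dir, required=("feats.scp",)):
    """Return the names in `required` that are absent or empty in data_dir."""
    missing = []
    for name in required:
        path = os.path.join(data_dir, name)
        # An empty file is treated like a missing one.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing
```

For the directory in this thread, missing_data_files("data/train_combined_no_sil") returning a non-empty list would point to a failure in an earlier stage of pytorch_run.sh rather than in train_xent.py.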

NPC51129 commented on May 27, 2024

> Hello, have you run this project successfully?

Not yet. I switched to the voxceleb v2 demo provided by Kaldi, which is also an x-vector implementation. I hope this helps you.

NPC51129 commented on May 27, 2024

Hi, thanks. I checked data/train_combined_no_sil/ and there is no file named feats.scp, but I still don't understand why it is missing.
The only things I changed in your repository before running were voxceleb1_root and voxceleb2_root in pytorch_run.sh. What else do I need to do to run this project?

fengye-lu commented on May 27, 2024

Yes, I have the same problem as you. But today I found out that my VoxCeleb dataset directory did not have the right structure, which may have caused the data to be read incorrectly. So I am adjusting the file structure of the dataset.

NPC51129 commented on May 27, 2024

May I know what structure you have now? And does it work?
