mravanelli / pytorch-kaldi

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by PyTorch, while feature extraction, label computation, and decoding are performed with the Kaldi toolkit.

Languages: Shell 19.52%, Python 43.98%, Perl 36.50%
Topics: speech-recognition, gru, dnn, kaldi, rnn-model, pytorch, timit, deep-learning, deep-neural-networks, recurrent-neural-networks

pytorch-kaldi's People

Contributors

akashmjn, dmitriy-serdyuk, hlthu, larspars, mravanelli, shigetajima, shuttle1987, sungyihsun, tparcollet, tzyll, xinchiqiu

pytorch-kaldi's Issues

Why are there a lot of errors when running joint training?

I want to test the joint training part, but when I run the scripts I first encounter:
"ERROR: features names (fea_name=) must contain only letters or numbers (no special characters as "_,$,..")"
So I changed fbank_clean to fbankclean in TIMIT_joint_training_liGRU_fbank.cfg, and that error went away.
But when I run the scripts again, I encounter "IndexError: list index out of range" at
"pytorch-kaldi-master/utils.py", line 1546, in dict_fea_lab_arch:
fea_lst_used.append((inp2+","+",".join(list(re.findall(pattern_fea,fea_field)[0]))).split(','))
I printed the variables:
pattern_fea is
"
fea_name=fbankrev
fea_lst=(.*)
fea_opts=(.*\n)
cw_left=(.*)
cw_right=(.*\n)
fea_name=fbankclean
fea_lst=exp/TIMIT_joint_training_liGRU_fbank/exp_files/train_TIMIT_tr_ep00_ck0_fbankclean.lst
fea_opts=apply-cmvn --utt2spk=ark:timit-data/timit/data/train/utt2spk ark:kalditimit/timit/s5/fbank/cmvn_train.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
cw_left=0
cw_right=0"

the fea_field is
"
fea_name=fbankrev
fea_lst=exp/TIMIT_joint_training_liGRU_fbank/exp_files/train_TIMIT_tr_ep00_ck0_fbankrev.lst
fea_opts=apply-cmvn --utt2spk=ark:timit-data/timit-rev/data/train/utt2spk scp:timit-data/timit-rev/data/train/cmvn.scp ark:- ark:- |
add-deltas --delta-order=0 ark:- ark:- |
cw_left=0
cw_right=0
"

But re.findall(pattern_fea, fea_field) returns an empty list.

Am I doing something wrong? I just ran the original scripts, but they produce so many errors.
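
A minimal sketch of the suspected mismatch (with simplified, hypothetical strings): the printed pattern_fea spans both feature blocks (fbankrev and fbankclean), while fea_field contains only the fbankrev block, so findall has nothing to match:

    import re

    # The pattern requires both fea_name blocks, but the field holds only
    # one, so findall returns [] and indexing [0] raises IndexError.
    pattern_fea = r"fea_name=fbankrev\nfea_lst=(.*)\nfea_name=fbankclean"
    fea_field = "fea_name=fbankrev\nfea_lst=some.lst"
    print(re.findall(pattern_fea, fea_field))  # -> []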

Compute the alignments for test

Hi Mirco (@mravanelli). In step 4 of the TIMIT tutorial, we should compute the alignments for the test set.

steps/align_fmllr.sh --nj 4 data/dev data/lang exp/tri3 exp/tri3_ali_dev

Usually, I only calculate MFCC features for test data. Why do you do this step?

John

Pytorch 0.4

Hello and thank you for this very useful repo! Will you update the code for PyTorch 0.4.0?
Thank you very much in advance!

License

What is the license for this repo?

How to decode from the dev data?

The cfg files specify $data_dir (which is $KALDI_ROOT....../data/test/ by default). To find the LMWT with the lowest WER on the dev set (./RESULTS only uses the one with the lowest WER on the test set), I changed $data_dir to point to the dev data folder ($KALDI_ROOT/....../data/dev).

However, decoding failed because the feature/frame files being decoded do not match the files with the correct labels for any score from 0 to 9. Probably there is still something pointing to the test set.

Is there any guide for decoding from the Kaldi dev data set?

[Question] Error when running run_nn.py

I feel really sorry for asking this stupid question. I was going through the TIMIT experiment and got this error when I ran ./run_exp.sh cfg/baselines/TIMIT_MLP.cfg:
The key mrjm0_sx148 is not present in utt2spk. But it seems to me this key belongs to the TIMIT training set rather than the dev set. Do you know which part of the process I got wrong? Thanks

copy-feats scp:/home-nfs/yangc1/pytorch-kaldi/mfcc_shu/dev_split.000 ark:-
add-deltas --delta-order=2 ark:- ark:-
apply-cmvn --utt2spk=ark:/share/data/speech/yangc1/kaldi/egs/timit/s5/data/dev/utt2spk ark:/home-nfs/yangc1/pytorch-kaldi/exp/splits_fea/dev_cmvn_speaker.ark ark:- ark:-
ERROR (apply-cmvn[5.4.197~1-8a27]:HasKey():util/kaldi-table-inl.h:2639) Attempting to read key mrjm0_sx148, which is not present in utt2spk map or similar map being read from ark:/share/data/speech/yangc1/kaldi/egs/timit/s5/data/dev/utt2spk

[ Stack-Trace: ]
apply-cmvn() [0x5722a8]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::RandomAccessTableReaderMapped<kaldi::KaldiObjectHolder<kaldi::Matrix > >::HasKey(std::string const&)
main
__libc_start_main
apply-cmvn() [0x4aa509]

ali-to-pdf /share/data/speech/yangc1/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_dev/final.mdl ark:- ark:-
LOG (ali-to-pdf[5.4.197~1-8a27]:main():ali-to-pdf.cc:68) Converted 400 alignments to pdf sequences.
Traceback (most recent call last):
File "run_nn.py", line 95, in
[data_name,data_set,data_end_index]=load_chunk(fea_scp,fea_opts,lab_folder,lab_opts,left,right,seed)
File "/home-nfs/yangc1/pytorch-kaldi/data_io.py", line 57, in load_chunk
[data_name,data_set,data_lab,end_index]=load_dataset(fea_scp,fea_opts,lab_folder,lab_opts,left,right)
File "/home-nfs/yangc1/pytorch-kaldi/data_io.py", line 35, in load_dataset
end_index[-1]=end_index[-1]-right
IndexError: list index out of range

feats_raw.scp (TIMIT_CNN_raw.conf)

Hi!
Thanks to your help, I'm understanding how it works.

This time, I'm trying out "TIMIT_CNN_raw.conf", supplying "feats_raw.scp" in place of "wav.scp",
and it did not work.
I think I don't know how to extract the correct .scp for this conf.
I could not find any proper "steps/make_**".

In-training quantization of weights (parameters) and activations

Have there been any attempts at implementing in-training quantization, like that in "Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1", using pytorch-kaldi? If not, can the admin provide direction on how to proceed to obtain in-training quantization?
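
One possible starting point (a sketch of the straight-through estimator from that paper, not anything shipped with pytorch-kaldi) is to binarize weights in the forward pass while letting gradients flow through:

    import torch

    class BinarizeSTE(torch.autograd.Function):
        # Sign-binarize in the forward pass; in the backward pass, pass the
        # gradient straight through, zeroed where |x| > 1 (Courbariaux et al.).
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return torch.sign(x)

        @staticmethod
        def backward(ctx, grad_output):
            (x,) = ctx.saved_tensors
            return grad_output * (x.abs() <= 1).float()

    w = torch.randn(5, requires_grad=True)
    BinarizeSTE.apply(w).sum().backward()  # gradients reach w despite sign()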

Failed to open script file /mfcc_shu/train_split.000

Hello and thank you for this useful project. I tried to run the TIMIT experiment following your instructions, using tri3 alignments, but it returns the following error: WARNING (copy-feats[5.4]:Open():util/kaldi-table-inl.h:106) Failed to open script file /mfcc_shu/train_split.000.

Here are the conf and log file:

conf&log.zip

What should I do now? Thanks in advance.

n_chunks > 100

Hi,
Setting the number of chunks to a number > 100 doesn't result in an error, but the model will not be trained on all of the data. compute_n_chunks in utils.py will return 99 in this case, due to "02d" formatting of the checkpoint number. So maybe you'd like to add a warning about this, or change the checkpoint number formatting.
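
A hypothetical sketch of how a two-digit checkpoint field can silently lose chunk indices (the exact parsing in utils.py may differ):

    # Checkpoint names are built with two-digit formatting; if the index is
    # later recovered with a fixed two-character slice, anything >= 100 is
    # mangled, so the extra chunks are never trained on.
    name = "train_TIMIT_tr_ep000_ck{:02d}".format(123)  # -> '..._ck123'
    ck = int(name.split("_ck")[-1][:2])                 # -> 12, not 123
    print(name, ck)
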
Joanna

online speech recognition

Hi, I have completed the training of the acoustic model. How can I use it for online speech recognition? I look forward to your kind advice.

Prediction on new audio file

Hi,
How can I use pytorch-kaldi in a production environment after training the model?
I have models ready, which I generated with core Kaldi. The problem I am facing is that decoding/prediction takes a lot of time.

So please let me know how to use this tool in a live environment.
Also, if you have useful suggestions for Kaldi deployments, please share.

I am also planning to integrate a Kaldi model into one of our applications, which is live,
so your suggestions will be very useful to me.

--
thanks
Nisar

Can't find egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_al

Hi,

I followed the instructions, and after running run.sh in the timit example

I only get dnn4_pretrain-dbn_dnn_al_dev in the exp folder; the corresponding test and train files are not there, so I can't run the DNN alignments step.

The RESULTS file looks like this:

Hybrid System (Karel's DNN)

---------------------------------Dev Set------------------------------------------
%WER 17.5 | 400 15057 | 84.6 10.5 4.8 2.2 17.5 98.5 | -0.471 | exp/dnn4_pretrain-dbn_dnn/decode_dev/score_6/ctm_39phn.filt.sys
--------------------------------Test Set------------------------------------------
%WER 18.5 | 192 7215 | 84.2 11.0 4.8 2.7 18.5 100.0 | -1.151 | exp/dnn4_pretrain-dbn_dnn/decode_test/score_4/ctm_39phn.filt.sys

Hybrid System (Karel's DNN), sMBR training

---------------------------------Dev Set------------------------------------------
%WER 17.3 | 400 15057 | 85.5 10.6 4.0 2.7 17.3 98.5 | -0.696 | exp/dnn4_pretrain-dbn_dnn_smbr/decode_dev_it1/score_5/ctm_39phn.filt.sys
%WER 17.3 | 400 15057 | 85.4 10.7 3.9 2.7 17.3 98.5 | -0.380 | exp/dnn4_pretrain-dbn_dnn_smbr/decode_dev_it6/score_7/ctm_39phn.filt.sys
--------------------------------Test Set------------------------------------------
%WER 18.6 | 192 7215 | 84.2 11.1 4.7 2.8 18.6 100.0 | -0.816 | exp/dnn4_pretrain-dbn_dnn_smbr/decode_test_it1/score_5/ctm_39phn.filt.sys
%WER 18.8 | 192 7215 | 84.7 11.4 3.9 3.5 18.8 100.0 | -0.819 | exp/dnn4_pretrain-dbn_dnn_smbr/decode_test_it6/score_5/ctm_39phn.filt.sys

sMBR not helpful here...

How do I get the code working for the TIMIT example in this case?

Thanks!

Using SincNet for speech recognition

I want to train a model using SincNet, so I copied the data in kaldi/egs/timit/s5/data into my pytorch-kaldi/quick_test/data and edited save_raw_fea.py as follows:

lab_folder='quick_test/tri3_ali'
lab_opts='ali-to-pdf'
out_folder='raw_TIMIT_200ms/train'
wav_lst='/data/zhanghao/pytorch-kaldi/quick_test/data/train/wav.scp'
scp_file_out='quick_test/data/train/feats_raw.scp'

Note: because I don't have the wav_lst, I use the wav.scp that originally comes from kaldi/egs/timit/s5/data/train. Then, when I ran save_raw_fea.py, I got this error:

ali-to-pdf quick_test/exp_ali/tri3_ali/final.mdl ark:- ark:- 
LOG (ali-to-pdf[5.5.166~1-013489]:main():ali-to-pdf.cc:68) Converted 3696 alignments to pdf sequences.
Traceback (most recent call last):
  File "save_raw_fea.py", line 76, in <module>
    [fs,signal]=scipy.io.wavfile.read(sig_path)
  File "/usr/lib/python2.7/dist-packages/scipy/io/wavfile.py", line 193, in read
    fsize, is_big_endian = _read_riff_chunk(fid)
  File "/usr/lib/python2.7/dist-packages/scipy/io/wavfile.py", line 140, in _read_riff_chunk
    raise ValueError("Not a WAV file.")
ValueError: Not a WAV file.

Then I edited line 75 to sig_path=sig_file.split(' ')[4] so that sig_path was correct (e.g. /data/zhanghao/kaldi/egs/timit/data/TIMIT/TRAIN/DR2/FAEM0/SI1392.WAV).
I think the files that come from TIMIT are all WAV files, so I don't know why the error occurred.
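
A quick diagnostic (with a hypothetical path, not part of the repo): the original TIMIT .WAV files are NIST SPHERE rather than RIFF WAV, which scipy.io.wavfile cannot read, and the first four bytes tell the two apart:

    # First four bytes identify the container: b'RIFF' for a real WAV file,
    # b'NIST' for a SPHERE file (which triggers "Not a WAV file." in scipy).
    path = "/data/TIMIT/TRAIN/DR2/FAEM0/SI1392.WAV"  # hypothetical path
    with open(path, "rb") as f:
        print(f.read(4))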

ValueError: operands could not be broadcast together with shapes (446,1,1100) (3440,)

Hi, when I tested the thchs30_test chunk, an error occurred.

The details show as follows:

 Testing thchs30_test chunk = 1 / 10
Traceback (most recent call last):
  File "run_exp.py", line 330, in <module>
[data_name,data_set,data_end_index,fea_dict,lab_dict,arch_dict]=run_nn(data_name,data_set,data_end_index,fea_dict,lab_dict,arch_dict,config_chunk_file,processed_first,next_config_file)
  File "/home/server/pytorch-kaldi-thchs30/core.py", line 233, in run_nn
    out_save=out_save-np.log(counts/np.sum(counts))             
ValueError: operands could not be broadcast together with shapes (446,1,1100) (3440,) 

Many warnings appear during the training process.

OMP: Warning #190: Forking a process while a parallel region is active is potentially unsafe.

What caused this error?
BTW, I guess it may be related to the value of n_chunks.

IndexError: list index out of range

I have installed all the required dependencies mentioned in the readme file. But while running the command mentioned below
python run_exp.py cfg/TIMIT_baselines/TIMIT_MLP_mfcc_basic.cfg
I am facing the following issue:
(screenshot attached: 2019-02-13 19-44-08)

data_io.py doesn't work under Py3k

Blender 2.6x comes with its own Python 3.3.x, and in Py3k next() becomes __next__().
Better fix this if you want perfect Python 3 compatibility.

Here is the error message in log.log:
File ".../pytorch-kaldi/data_io.py", line 86, in load_counts
row = f.next().strip().strip('[]').strip()
AttributeError: '_io.TextIOWrapper' object has no attribute 'next'

I was using python 3.6.2 with Anaconda.
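
A portable rewrite of the offending line (a sketch; the file name here is hypothetical): the built-in next() works on both Python 2 and Python 3 file objects, unlike the Py2-only f.next():

    # data_io.py line 86, rewritten with the built-in next().
    with open("counts_file") as f:  # hypothetical file
        row = next(f).strip().strip('[]').strip()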

How to train and evaluate a monophone model?

Hi. Thanks for your work on such an amazing package.
I have trained and tested a triphone model and it worked pretty well. Now I hope to train a DNN model using per-frame monophone labels. I changed the cfg file as you mentioned and set the MLP output size to 48. I finished the training and validation phases, but when I try to decode, an error occurs saying "48 mismatched with 144".

So I wonder how I should set up training and decode the output of a monophone DNN. I guess 144 may refer to the 144 HMM states, but I don't understand why I need these states for per-frame monophone labels. And if we don't need to decode in the monophone case, how can I evaluate on test data?

Thanks

run TIMIT_SincNet_raw error

When I ran python ./run_exp.py cfg/TIMIT_baselines/TIMIT_SincNet_raw.cfg, this error occurred:

- Reading config file......OK!
- Chunk creation......OK!

------------------------------ Epoch 000 / 023 ------------------------------
Training TIMIT_tr chunk = 1 / 10
ERROR: training epoch 0, chunk 0 not done! File exp/TIMIT_SincNet_raw/exp_files/train_TIMIT_tr_ep000_ck00.info does not exist.
See exp/TIMIT_SincNet_raw/log.log 

the logs are as follows:

copy-feats scp:exp/TIMIT_SincNet_raw/exp_files/train_TIMIT_tr_ep000_ck00_raw.lst ark:- 
  LOG (copy-feats[5.5.166~1-013489]:main():copy-feats.cc:143) Copied 370 feature matrices. 
  ali-to-pdf quick_test/exp_ali/tri3_ali/final.mdl ark:- ark:-           
  LOG (ali-to-pdf[5.5.166~1-013489]:main():ali-to-pdf.cc:68) Converted 3696 alignments to pdf sequences. 
  copy-feats scp:exp/TIMIT_SincNet_raw/exp_files/train_TIMIT_tr_ep000_ck00_raw.lst ark:- 
  LOG (copy-feats[5.5.166~1-013489]:main():copy-feats.cc:143) Copied 370 feature matrices. 
  ali-to-phones --per-frame=true quick_test/exp_ali/tri3_ali/final.mdl ark:- ark:-
  LOG (ali-to-phones[5.5.166~1-013489]:main():ali-to-phones.cc:134) Done 3696 utterances. 
  Traceback (most recent call last):
    File "run_nn.py", line 208, in <module>
       outs_dict=forward_model(fea_dict,lab_dict,arch_dict,model,nns,costs,inp,inp_out_dict,max_len,batch_size,to_do,forward_outs) 
    File "/data/zhanghao/pytorch-kaldi/utils.py", line 1584, in forward_model
       outs_dict[out_name]=nns[inp1](inp_dnn)
    File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 477, in __call__ 
      result = self.forward(*input, **kwargs)
    File "/data/zhanghao/pytorch-kaldi/neural_networks.py", line 1363, in forward
      x = self.drop[i](self.act[i](self.ln[i](F.max_pool1d(self.conv[i](x), self.sinc_max_pool_len[i])))) 
    File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 477, in __call__ 
      result = self.forward(*input, **kwargs) 
    File "/data/zhanghao/pytorch-kaldi/neural_networks.py", line 1500, in forward
      self.filters = (band_pass * self.window_).view( 
  RuntimeError: The size of tensor a (127) must match the size of tensor b (128) at non-singleton dimension 1

My torch version is 0.4.0, on Ubuntu 16.04.

Train with LF-MMI in Kaldi

Does this support LF-MMI training in Kaldi? Or have you compared the results between LF-MMI in Kaldi and PyTorch?
Thank you!

Decoding issue

I'm getting an error in the decoding section of run_exp.py.
When running cmd_decode (kaldi_decoding_scripts/decode_dnn.sh) I'm getting an ls error saying that the script cannot access these .ark files. Are these .ark files supposed to be generated somewhere else in the code, or are they created after running decode_dnn.sh?

log file output:

/bin/sh: 1: ./path.sh: not found

1947

hmm-info /z/mkperez/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali/final.mdl

ali-to-pdf /z/mkperez/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali/final.mdl 'ark:gunzip -c /z/mkperez/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali/ali.*.gz |' ark:-
LOG (ali-to-pdf[5.0.13~1-95f0f]:main():ali-to-pdf.cc:68) Converted 5352 alignments to pdf sequences.
LOG (analyze-counts[5.0.13~1-95f0f]:main():analyze-counts.cc:193) Summed 5352 int32 vectors to counts, skipped 0 vectors.
LOG (analyze-counts[5.0.13~1-95f0f]:main():analyze-counts.cc:195) Counts written to exp/TIMIT_liGRU_mfcc/exp_files/forward_out_dnn2_lab_cd.count

kaldi_decoding_scripts/decode_dnn.sh /home/mkperez/pytorch-kaldi/exp/TIMIT_liGRU_mfcc/decoding_TIMIT_test_out_dnn2.conf /home/mkperez/pytorch-kaldi/exp/TIMIT_liGRU_mfcc/decode_TIMIT_test_out_dnn2 /home/mkperez/pytorch-kaldi/exp/TIMIT_liGRU_mfcc/exp_files/forward_TIMIT_test_ep*_ck*_out_dnn2_to_decode.ark
featString /home/mkperez/pytorch-kaldi/exp/TIMIT_liGRU_mfcc/exp_files/forward_TIMIT_test_ep*_ck*_out_dnn2_to_decode.ark
kaldi_decoding_scripts/decode_dnn.sh: no such file /home/mkperez/pytorch-kaldi/quick_test/graph/HCLG.fst

ls: cannot access '/home/mkperez/pytorch-kaldi/exp/TIMIT_liGRU_mfcc/exp_files/forward_TIMIT_test_ep*_ck*_out_dnn2_to_decode.ark': No such file or directory

MemoryError and IndexError: list index out of range

Hello, I'm trying to run pytorch-kaldi with the LRS2 database, but I get this error. I have checked that all the paths exist and the alignments also exist. Please help me, thank you.

  • Reading config file......OK!
  • Chunk creation......OK!

------------------------------ Epoch 00 / 23 ------------------------------
Training pretrain_train chunk = 1 / 10
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/media/wentao/wentaodisk/pytorch_kaldi/pytorch-kaldi/data_io.py", line 207, in read_lab_fea
[data_name_fea,data_set_fea,data_end_index_fea]=load_chunk(fea_scp,fea_opts,lab_folder,lab_opts,cw_left,cw_right,max_seq_length, output_folder, fea_only)
File "/media/wentao/wentaodisk/pytorch_kaldi/pytorch-kaldi/data_io.py", line 133, in load_chunk
data_set=(data_set-np.mean(data_set,axis=0))/np.std(data_set,axis=0)
MemoryError

Traceback (most recent call last):
File "/media/wentao/wentaodisk/pytorch_kaldi/pytorch-kaldi/run_exp.py", line 202, in
[data_name,data_set,data_end_index,fea_dict,lab_dict,arch_dict]=run_nn(data_name,data_set,data_end_index,fea_dict,lab_dict,arch_dict,config_chunk_file,processed_first,next_config_file)
File "/media/wentao/wentaodisk/pytorch_kaldi/pytorch-kaldi/core.py", line 81, in run_nn
data_name=shared_list[0]
IndexError: list index out of range

Process finished with exit code 1
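
For the MemoryError at the normalization line, a lower-memory variant (a sketch with a hypothetical chunk, not the repo's code) is to normalize in place instead of allocating chunk-sized temporaries:

    import numpy as np

    # In-place mean/variance normalization avoids the extra chunk-sized
    # arrays created by the expression (data_set - mean) / std.
    data_set = np.random.rand(100000, 40).astype(np.float32)  # hypothetical chunk
    data_set -= data_set.mean(axis=0)
    data_set /= data_set.std(axis=0)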

Assertion `t >= 0 && t < n_classes` failed

I am trying to train a model for AMI dataset and I get the following error:

  • Reading config file......OK!
  • Chunk creation......OK!
    ------------------------------ Epoch 000 / 023 ------------------------------
    Training AMI_tr chunk = 1 / 50
    ERROR: training epoch 0, chunk 0 not done! File exp/AMI_MLP_fbank/exp_files/train_AMI_tr_ep000_ck00.info does not exist.
    See exp/AMI_MLP_fbank/log.log

##########################################################

exp/AMI_MLP_fbank/log.log:

add-deltas --delta-order=0 ark:- ark:-
apply-cmvn --utt2spk=ark:/disk/scratch1/s1569548/software/kaldi/egs/ami/s5b/data/ihm/train_cleaned/utt2spk scp:/disk/scratch1/s1569548/software/kaldi/egs/ami/s5b/data-fbank/ihm/train_cleaned/cmvn.scp ark:- ark:-

copy-feats scp:exp/AMI_MLP_fbank/exp_files/train_AMI_tr_ep000_ck00_fbank.lst ark:-
LOG (copy-feats[5.5.105~1-d3379]:main():copy-feats.cc:143) Copied 1969 feature matrices.
LOG (apply-cmvn[5.5.105~1-d3379]:main():apply-cmvn.cc:162) Applied cepstral mean normalization to 1969 utterances, errors on 0
ali-to-phones --per-frame=true /disk/scratch1/s1569548/software/pytorch-kaldi/kaldi/exp/ihm/tri3_cleaned_ali_train_cleaned/final.mdl ark:- ark:-
LOG (ali-to-phones[5.5.105~1-d3379]:main():ali-to-phones.cc:134) Done 98455 utterances.
copy-feats scp:exp/AMI_MLP_fbank/exp_files/train_AMI_tr_ep000_ck00_fbank.lst ark:-
apply-cmvn --utt2spk=ark:/disk/scratch1/s1569548/software/kaldi/egs/ami/s5b/data/ihm/train_cleaned/utt2spk scp:/disk/scratch1/s1569548/software/kaldi/egs/ami/s5b/data-fbank/ihm/train_cleaned/cmvn.scp ark:- ark:-
add-deltas --delta-order=0 ark:- ark:-
LOG (copy-feats[5.5.105~1-d3379]:main():copy-feats.cc:143) Copied 1969 feature matrices.
LOG (apply-cmvn[5.5.105~1-d3379]:main():apply-cmvn.cc:162) Applied cepstral mean normalization to 1969 utterances, errors on 0
ali-to-pdf /disk/scratch1/s1569548/software/pytorch-kaldi/kaldi/exp/ihm/tri3_cleaned_ali_train_cleaned/final.mdl ark:- ark:-
LOG (ali-to-pdf[5.5.105~1-d3379]:main():ali-to-pdf.cc:68) Converted 98455 alignments to pdf sequences.
Traceback (most recent call last):
File "run_nn.py", line 207, in
outs_dict=forward_model(fea_dict,lab_dict,arch_dict,model,nns,costs,inp,inp_out_dict,max_len,batch_size,to_do,forward_outs)
File "/disk/scratch1/s1569548/software/pytorch-kaldi/utils.py", line 1630, in forward_model
lab_dnn=lab_dnn.view(-1).long()
RuntimeError: CUDA error: device-side assert triggered
/opt/conda/conda-bld/pytorch_1544081127912/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion t >= 0 && t < n_classes failed.
[the same assertion line repeats for many other threads]

##########################################################

So it looks like something is wrong with the targets. Therefore, I added these lines (in load_chunk() in data_io.py):
print("Min label value of this chunk: ", min(data_lab))
print("Max label value of this chunk: ", max(data_lab))

and got:

Min label value of this chunk: 1
Max label value of this chunk: 175
Min label value of this chunk: 0
Max label value of this chunk: 3983

I didn't modify the default architecture, so it is a monophone+cd model.

From gmm-info on the triphone model used for alignments, I get the following:

number of phones 176
number of pdfs 3984
number of transition-ids 27460
number of transition-states 13650
feature dimension 40
number of gaussians 80060

So for the pdfs, the labels satisfy t >= 0 && t < n_classes.
For the monophone part, the silence label is missing. I'm not sure why...

##########################################################

Do you know what can be the issue here?
I was trying to train with only monophone and with only cd targets, but I got the same error.
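
A quick CPU-side check (a sketch using the numbers above) makes this assertion readable: CrossEntropyLoss requires every target t to satisfy 0 <= t < n_classes, and on CPU an out-of-range target fails with a clear Python error instead of a device-side assert:

    import torch

    # Validate the chunk's labels against the output dimension before
    # training; 3984 is the pdf count reported by gmm-info above.
    n_classes = 3984
    lab = torch.tensor([1, 175, 3983])  # hypothetical per-frame labels
    assert int(lab.min()) >= 0 and int(lab.max()) < n_classes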

How to apply this repo for another emotion task?

Hi,
Thank you for your work.
I've read the instructions and the SincNet paper. I wonder how I can use pytorch-kaldi and, especially, SincNet for an emotion recognition task, since the repo instructions and the SincNet paper are all about speaker identification, which differs from emotion recognition in terms of labels.
For example, is all I need to do to modify the label file TIMIT_labels.npy to the labels of my emotion dataset (0-7, for 8 emotions), of course along with the other instruction steps?

Thank you for your time.
tsly

I am a little confused about the chunk?

Does a chunk mean a split of the whole training dataset? If so, when I set n_chunks to 100, every chunk has 10833 batches:

------------------------------ Epoch 000 / 001 ------------------------------
Training WSJ0_train chunk = 1 / 100
[=-----------------------------------------] 3.8% Training | (Batch 411/10833)

When I change n_chunks, the number of batches in every chunk stays the same.

------------------------------ Epoch 000 / 001 ------------------------------
Training WSJ0_train chunk = 1 / 10
[====------------------------------------] 9.4% Training | (Batch 1021/10833)
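
What I would expect (a sketch with hypothetical totals) is that the batches per chunk shrink as n_chunks grows, if chunks really partition the data:

    # If chunks partition the training set, batches per chunk should scale
    # inversely with n_chunks (hypothetical example count and batch size).
    total_examples = 13_866_240
    batch_size = 128
    for n_chunks in (10, 100):
        print(n_chunks, total_examples // (n_chunks * batch_size))
    # -> 10 10833
    # -> 100 1083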

So what does the n_chunks mean?

overwriting config parameters from the command line

Hi,
I was trying to train a new model with a different seed by overwriting config parameters from the command line with this command:
python run_exp.py cfg/my_cfg --exp,out_folder="exp/new_dir" --exp,seed=1234
As a result, exp_new_dir was created with exp_files copied from the original model, a new model wasn't trained, and the information for the original model from cfg/my_cfg was printed out. Can the exp,out_folder be specified this way to train and store a new model with modified options?
Thanks,
Joanna

train_TIMIT_tr_ep000_ck00.info does not exist

I have got the following error. Could you tell me how to solve this?

  • Reading config file......OK!
  • Chunk creation......OK!

------------------------------ Epoch 000 / 023 ------------------------------
Training TIMIT_tr chunk = 1 / 5
ERROR: training epoch 0, chunk 0 not done! File exp/TIMIT_MLP_basic/exp_files/train_TIMIT_tr_ep000_ck00.info does not exist.
See exp/TIMIT_MLP_basic/log.log

exp/TIMIT_MLP_basic/log.log:

apply-cmvn --utt2spk=ark:/audio/kaldi/kaldi/egs/timit/s5/data/train/utt2spk ark:/audio/kaldi/kaldi/egs/timit/s5/mfcc/cmvn_train.ark ark:- ark:-
copy-feats scp:exp/TIMIT_MLP_basic/exp_files/train_TIMIT_tr_ep000_ck00_mfcc.lst ark:-
add-deltas --delta-order=2 ark:- ark:-
LOG (copy-feats[5.5.1931-05d9a]:main():copy-feats.cc:143) Copied 739 feature matrices.
LOG (apply-cmvn[5.5.193
1-05d9a]:main():apply-cmvn.cc:162) Applied cepstral mean normalization to 739 utterances, errors on 0
ali-to-pdf /audio/kaldi/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali/final.mdl ark:- ark:-
LOG (ali-to-pdf[5.5.193~1-05d9a]:main():ali-to-pdf.cc:68) Converted 6648 alignments to pdf sequences.
Traceback (most recent call last):
File "run_nn.py", line 109, in
[nns,costs]=model_init(inp_out_dict,model,config,arch_dict,use_cuda,multi_gpu,to_do)
File "/audio/kaldi/pytorch-kaldi/utils.py", line 1455, in model_init
net=nn_class(config[arch_dict[inp1][0]],inp_dim)
File "/audio/kaldi/pytorch-kaldi/neural_networks.py", line 71, in init
self.wx = nn.ModuleList([])
AttributeError: 'module' object has no attribute 'ModuleList'

python2 or python3 ?

Could you tell me which Python version this project supports?
I installed Python 3 with torch and other libs, but I get this error:

python3 run_exp.py cfg/TIMIT_baselines/TIMIT_MLP_mfcc_basic.cfg
Traceback (most recent call last):
  File "run_exp.py", line 48, in <module>
    [config,name_data,name_arch]=check_cfg(cfg_file,config,cfg_file_proto)
  File "/mnt/lustre/asrdata/users/xch27/experiments/pytorch_kaldi/pytorch-kaldi/utils.py", line 501, in check_cfg
    N_out=int(output.decode().rstrip())
ValueError: invalid literal for int() with base 10: ''

Maybe I should try Python 2... but matplotlib doesn't support Python 2.x, so I feel a bit confused.

Getting NaN training loss

Hi, thanks for the great job! @mravanelli
I am running the TIMIT_1dnn+5liGRU+1dnn_best config, but for my use case I changed the mfcc, fbank and fmllr frame shift to 5 ms instead of the default 10 ms. The problem is that the training loss and the validation loss are NaN. BTW, I have successfully run your default setting, where the frame shift is 10 ms. Could you please suggest possible issues? Thank you.

stm file and glm file

Hi, I want to use this demo for Chinese speech recognition. The example I use is thchs30 in Kaldi, but there was an error in the decoding process, caused by the stm file not being found during scoring. I compared the thchs30 and timit recipes in Kaldi and found that thchs30 does not generate the glm and stm files. What are these two files used for, and what should I do to solve this problem?

log.log
Cannot open transcription file '/home/LLL/kaldi/egs/thchs30/s5/data/mfcc/test/stm': No such file or directory at local/timit_norm_trans.pl line 74, line 61.
cp: cannot stat '/home/LLL/kaldi/egs/thchs30/s5/data/mfcc/test/glm': No such file or directory
run.pl: 10 / 10 failed, log is in /home/LLL/pytorch-kaldi-thchs30/exp/thchs30_MLP_basic_low/decode_thchs30_test_out_dnn1/scoring/log/score.*.log

score.1.log
Error: Unable to stat GLM file '/home/LLL/pytorch-kaldi-thchs30/exp/thchs30_MLP_basic_low/decode_thchs30_test_out_dnn1/scoring/glm_39phn' at /home/LLL/kaldi/tools/sctk/bin/hubscr.pl line 265.

ERROR: hmm-info command doesn't exist. Make sure your .bashrc contains the Kaldi paths and correctly exports it.

Hi, I'm running the run_exp.py, but I have got an error:
ERROR: hmm-info command doesn't exist. Make sure your .bashrc contains the Kaldi paths and correctly exports it.

But I have already added the Kaldi paths to my .bashrc, and when I run hmm-info in bash I get this output:
~$ hmm-info
hmm-info

Write to standard output various properties of HMM-based transition model
Usage: hmm-info [options]
e.g.:
hmm-info trans.mdl

Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
I'm trying to figure out what the problem is but I can't, so can you please help me? Thanks.
In run_shell(cmd,log_file), my cmd is 'hmm-info /media/wentao/wentaodisk/pytorch_kaldi/kaldi_prepare/exp/tri4a_pt/final.mdl | awk \'/pdfs/{print $4}\''.
But after p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True) and (output, err) = p.communicate(), output is b'' and err is b'/bin/sh: 1: hmm-info: not found\n'.
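
One likely mechanism (a sketch, assuming standard shell behavior rather than anything specific to this repo): Popen(shell=True) runs /bin/sh non-interactively, which does not source ~/.bashrc, so the Kaldi PATH exported there never reaches the subprocess. A quick check:

    import subprocess

    # Print the PATH the subprocess actually sees and look for hmm-info on
    # it; if it is only on your login shell's PATH, export the Kaldi paths
    # in the environment before launching run_exp.py.
    p = subprocess.Popen("echo $PATH; which hmm-info", shell=True,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    print(out.decode(), err.decode())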

Parallel decoding w/o reloading

Hey, have you thought of any way to do parallel Kaldi decoding w/o reloading the fst graph
(like this class http://kaldi-asr.org/doc/decoder-wrappers_8cc_source.html called in https://github.com/kaldi-asr/kaldi/blob/master/src/nnet3bin/nnet3-latgen-faster-parallel.cc)

Seems like this parallel method will simply reload the HCLG.fst in every process, making it much slower than just loading it once and doing it serially.
https://github.com/mravanelli/pytorch-kaldi/blob/master/kaldi_decoding_scripts/decode_dnn.sh#L85

Maybe I'm misinterpreting how this works? I'm relatively new to Kaldi, but I'm working on integrating parallel lattice decoding with Torch on my own right now, and I'm running into enough issues that I'm considering:
A) Using the new PyTorch 1.0 c++ frontend
B) Torch --> ark files --> parallel decoder.
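
For option B, a minimal sketch of the Torch-to-ark step (assuming the third-party kaldi_io package, which is not part of this repo):

    import numpy as np
    import kaldi_io  # assumption: vesis84's kaldi_io package is installed

    # Dump hypothetical per-frame posteriors to a Kaldi ark that a parallel
    # decoder such as latgen-faster-mapped can then read outside Python.
    post = np.random.rand(100, 1944).astype(np.float32)
    with open("posteriors.ark", "wb") as f:
        kaldi_io.write_mat(f, post, key="utt_001")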

Are you guys looking into this problem? Happy to help if you are.

Can't find cfg for librispeech?

Hey,
trying to reproduce the results for the twinreg paper. The cfg folder only seems to contain stuff for TIMIT though?

Also, in the paper it says you just ran the Kaldi recipe for the corpus; does that mean you trained alignments using the full ~900-hour dataset (as is done in the current Kaldi recipe for librispeech)? The paper says you only used the 100-hour subset (train-clean); was that only for training the DNN part (and I'm guessing you did not use speed perturbation, so it really is 100 hours in the end)?

nn alignments in TIMIT tutorial

Hi, I'm following your TIMIT tutorial.
And... I'm stuck at step 4.
I could run the first two commands, but the last two did not work.
I thought they could be ignored, but step 7's command requires their outputs.
Sorry if this is a basic issue. Can I get some comments?

CNN_feaproc shape issue

Hi Mirco,

I think this is a bug, but please correct me if I'm wrong.

When constructing context windows, the left and right context are constructed by flattening a slice of an input feature array of shape (num_frames, num_feats). Later, CNN_feaproc reconstructs the 2D context window by calling x.view(batch * steps, 1, -1, 11), which, as I understand it, is supposed to generate a tensor of shape (n, num_input_channels, num_feats, window_length) (11 is the fixed context window length). Since the original slice of shape (window_length, num_feats) would be flattened in c-contiguous order, asking for a view of shape (num_feats, window_length) would not recover the original slice.
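
A small repro of the suspected mismatch (hypothetical sizes):

    import torch

    # A (window_length, num_feats) slice flattened in C order cannot be
    # recovered by viewing the flat vector as (num_feats, window_length).
    window_length, num_feats = 11, 40
    win = torch.arange(window_length * num_feats).view(window_length, num_feats)
    flat = win.reshape(-1)                       # how the context window is stored
    wrong = flat.view(num_feats, window_length)  # what view(..., -1, 11) produces
    right = win.t()                              # the actual 2-D window, transposed
    print(torch.equal(wrong, right))             # False: rows are scrambled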

Best,
Sean

Input shape issue

Hello,
Thank you for sharing your work! I've been trying to train the baseline (python3 run_exp.py cfg/TIMIT_baselines/TIMIT_MLP_mfcc_basic.cfg), but I get an input shape error (I'm attaching the log). I already checked a previous issue about shapes, but this seems to be a different one. I would appreciate any help you could provide!

log.log

Usage case for Kaldi

Mirco,
Thanks for posting this.

I have digits decoded by LeNet5 and I want an HMM to combine the probability vector for each character position with a prior distribution of words to get the posterior distribution of the word.

How would I adapt Kaldi to do so?

when running decode_dnn_TIMIT.sh, I get "bash: line 1: 49185 Aborted (core dumped)"

./decode_dnn_TIMIT.sh /home/gkb/Download/kaldi/egs/timit/s5/exp/tri3/graph /home/gkb/Download/kaldi/egs/timit/s5/data/test/ /home/gkb/Download/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali ..//home/gkb/torch_kaldi_exp/exp/TIMIT_MLP/decoding_test cat ..//home/gkb/torch_kaldi_exp/exp/TIMIT_MLP/forward_ep_24_ck_4.pkl
bash: line 1: 49185 Aborted (core dumped) ( latgen-faster-mapped --min-active=200 --max-active=7000 --max-mem=50000000 --beam=13.0 --lattice-beam=8.0 --acoustic-scale=0.2 --allow-partial=true --word-symbol-table=/home/gkb/Download/kaldi/egs/timit/s5/exp/tri3/graph/words.txt /home/gkb/Download/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali/final.mdl /home/gkb/Download/kaldi/egs/timit/s5/exp/tri3/graph/HCLG.fst "ark,s,cs:cat ..//home/gkb/torch_kaldi_exp/exp/TIMIT_MLP/forward_ep_24_ck_4.pkl |" "ark:|gzip -c > ..//home/gkb/torch_kaldi_exp/exp/TIMIT_MLP/decoding_test/lat.1.gz" ) 2>> ..//home/gkb/torch_kaldi_exp/exp/TIMIT_MLP/decoding_test/log/decode.1.log >> ..//home/gkb/torch_kaldi_exp/exp/TIMIT_MLP/decoding_test/log/decode.1.log
run.pl: job failed, log is in ..//home/gkb/torch_kaldi_exp/exp/TIMIT_MLP/decoding_test/log/decode.1.log

How to use pytorch-kaldi for Speaker Recognition (SRE)?

Hi,
Thank you very much for this interesting project.
I am wondering if it is possible to use pytorch-kaldi for speaker recognition. If yes, could you please give me some hints and clues on how to start? Is there any tutorial for this?

Thanks in advance,
Tony

Unable to run run_exp.sh with liGRU_mfcc

Hi Mirco,

I am trying to reproduce the li-GRU results on the TIMIT dataset in your paper. For this I followed the readme and changed all relevant paths in the liGRU_mfcc.cfg. Training with './run_exp.sh cfg/baselines/liGRU_mfcc.cfg' runs fine, but as soon as it goes to the next step I get the following error:

bash: line 1: 63283 Aborted (core dumped) ( latgen-faster-mapped --min-active=200 --max-active=7000 --max-mem=50000000 --beam=13.0 --lattice-beam=8.0 --acoustic-scale=0.2 --allow-partial=true --word-symbol-table=/home/algo-phd-01/tools/kaldi/egs/timit/s5/exp/tri3/graph/words.txt /home/algo-phd-01/tools/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali/final.mdl /home/algo-phd-01/tools/kaldi/egs/timit/s5/exp/tri3/graph/HCLG.fst "ark,s,cs:cat ..//home/algo-phd-01/workspace/phd/benchmarking/pytorch-kaldi/exp/TIMIT_liGRU_official_v1/forward_ep_24_ck_4.pkl |" "ark:|gzip -c > ..//home/algo-phd-01/workspace/phd/benchmarking/pytorch-kaldi/exp/TIMIT_liGRU_official_v1/decoding_test/lat.1.gz" ) 2>> ..//home/algo-phd-01/workspace/phd/benchmarking/pytorch-kaldi/exp/TIMIT_liGRU_official_v1/decoding_test/log/decode.1.log >> ..//home/algo-phd-01/workspace/phd/benchmarking/pytorch-kaldi/exp/TIMIT_liGRU_official_v1/decoding_test/log/decode.1.log
run.pl: job failed, log is in ..//home/algo-phd-01/workspace/phd/benchmarking/pytorch-kaldi/exp/TIMIT_liGRU_official_v1/decoding_test/log/decode.1.log
run.pl: job failed, log is in ..//home/algo-phd-01/workspace/phd/benchmarking/pytorch-kaldi/exp/TIMIT_liGRU_official_v1/decoding_test/scoring/log/best_path.1.1.log

The two log files at the end of the error don't exist. When I run latgen-faster-mapped with all parameters directly I get this error:

latgen-faster-mapped --min-active=200 --max-active=7000 --max-mem=50000000 --beam=13.0 --lattice-beam=8.0 --acoustic-scale=0.2 --allow-partial=true --word-symbol-table=/home/algo-phd-01/tools/kaldi/egs/timit/s5/exp/tri3/graph/words.txt /home/algo-phd-01/tools/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali/final.mdl /home/algo-phd-01/tools/kaldi/egs/timit/s5/exp/tri3/graph/HCLG.fst "ark,s,cs:cat ..//home/algo-phd-01/workspace/phd/benchmarking/pytorch-kaldi/exp/TIMIT_liGRU_official_v1/forward_ep_24_ck_4.pkl |" "ark:|gzip -c > ..//home/algo-phd-01/workspace/phd/benchmarking/pytorch-kaldi/exp/TIMIT_liGRU_official_v1/decoding_test/lat.JOB.gz"
latgen-faster-mapped --min-active=200 --max-active=7000 --max-mem=50000000 --beam=13.0 --lattice-beam=8.0 --acoustic-scale=0.2 --allow-partial=true --word-symbol-table=/home/algo-phd-01/tools/kaldi/egs/timit/s5/exp/tri3/graph/words.txt /home/algo-phd-01/tools/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali/final.mdl /home/algo-phd-01/tools/kaldi/egs/timit/s5/exp/tri3/graph/HCLG.fst 'ark,s,cs:cat ..//home/algo-phd-01/workspace/phd/benchmarking/pytorch-kaldi/exp/TIMIT_liGRU_official_v1/forward_ep_24_ck_4.pkl |' 'ark:|gzip -c > ..//home/algo-phd-01/workspace/phd/benchmarking/pytorch-kaldi/exp/TIMIT_liGRU_official_v1/decoding_test/lat.JOB.gz'
cat: ..//home/algo-phd-01/workspace/phd/benchmarking/pytorch-kaldi/exp/TIMIT_liGRU_official_v1/forward_ep_24_ck_4.pkl: No such file or directory
WARNING (latgen-faster-mapped[5.4.198~1-be7c1]:Close():kaldi-io.cc:515) Pipe cat ..//home/algo-phd-01/workspace/phd/benchmarking/pytorch-kaldi/exp/TIMIT_liGRU_official_v1/forward_ep_24_ck_4.pkl | had nonzero return status 256
ERROR (latgen-faster-mapped[5.4.198~1-be7c1]:~SequentialTableReaderArchiveImpl():util/kaldi-table-inl.h:678) TableReader: error detected closing archive 'cat ..//home/algo-phd-01/workspace/phd/benchmarking/pytorch-kaldi/exp/TIMIT_liGRU_official_v1/forward_ep_24_ck_4.pkl |'

[ Stack-Trace: ]
latgen-faster-mapped() [0xb1aae0]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::SequentialTableReaderArchiveImpl<kaldi::KaldiObjectHolder<kaldi::Matrix > >::~SequentialTableReaderArchiveImpl()
kaldi::SequentialTableReaderArchiveImpl<kaldi::KaldiObjectHolder<kaldi::Matrix > >::~SequentialTableReaderArchiveImpl()
kaldi::SequentialTableReader<kaldi::KaldiObjectHolder<kaldi::Matrix > >::~SequentialTableReader()
main
__libc_start_main
_start

terminate called after throwing an instance of 'std::runtime_error'
what():
Aborted (core dumped)

How to use batch normalization effectively?

Dear mravanelli,
Recently, I read your paper "BATCH-NORMALIZED JOINT TRAINING FOR DNN-BASED DISTANT SPEECH RECOGNITION". The paper says that batch normalization can replace pre-training and gives a significant improvement.
Now I want to use joint training on the AMI dataset, but I don't get any improvement when using batch normalization. Can you give me some idea of how you use batchnorm? Or does pytorch-kaldi provide the setup from your paper?
Thank you !
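
For reference, a minimal sketch of the layer ordering used in that line of work (affine transform, then batch norm, then the nonlinearity); this is an illustration, not pytorch-kaldi's exact code:

    import torch.nn as nn

    # Linear -> BatchNorm1d -> ReLU; the linear bias is dropped because
    # batch norm's own shift parameter makes it redundant.
    layer = nn.Sequential(
        nn.Linear(440, 1024, bias=False),  # hypothetical layer sizes
        nn.BatchNorm1d(1024),
        nn.ReLU(),
    )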

TIMIT sample problem

After following the installation instructions step by step, I received an error when running python run_exp.py cfg/TIMIT_baselines/TIMIT_MLP_mfcc_basic.cfg, as follows.

Traceback (most recent call last):
   File "run_exp.py", line 49, in
     [config,name_data,name_arch]=check_cfg(cfg_file,config,cfg_file_proto)
   File "/home/chen/pytorch-kaldi/utils.py", line 565, in check_cfg
     N_out=int(output.decode().rstrip())
ValueError: invalid literal for int() with base 10: ''

Can you tell me the reason for the error and how to fix it?
