
CAT's People

Contributors

aky15, alex-ht, balezemaldo, csukuangfj, hujuntao123, l2009312042, maxwellzh, nlpvv, ou-zhijian, sar-dar, thu-spmi, wangjk233, zhangyi24


CAT's Issues

Error in "Test CTC-CRF installation" at the end of installation

Test CTC-CRF installation:
Traceback (most recent call last):
  File "main.py", line 1, in <module>
    import ctc_crf
  File "/root/miniconda3/lib/python3.8/site-packages/ctc_crf-0.1.1-py3.8-linux-x86_64.egg/ctc_crf/__init__.py", line 15, in <module>
    import ctc_crf._C as core
ImportError: /root/miniconda3/lib/python3.8/site-packages/ctc_crf-0.1.1-py3.8-linux-x86_64.egg/ctc_crf/_C.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN2at5emptyEN3c108ArrayRefIlEENS0_13TensorOptionsENS0_8optionalINS0_12MemoryFormatEEE
failed to install cat.

Could you help me figure out what is causing this?
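
For reference, an undefined at::empty symbol like this typically points to an ABI or version mismatch between the PyTorch that ctc_crf was compiled against and the PyTorch installed at runtime (an assumption based on the symbol name, not confirmed from the build). A minimal sanity check:

import torch

# Compare these against the environment in which ctc_crf was built; a mismatch
# in torch version, CUDA version, or C++ ABI commonly produces undefined-symbol
# errors when the extension is loaded.
print(torch.__version__)
print(torch.version.cuda)
print(torch.compiled_with_cxx11_abi())

If they differ, rebuilding and reinstalling the ctc_crf extension inside the current environment usually resolves it.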

FAQ

I have written answers to some frequently asked questions about CAT. It would be nice if others could share their suggestions and experience with CAT. We also welcome anyone interested in CAT to translate the FAQs into English or to supplement the relevant English documentation.

Hard to converge? Is it possible to release model configs or trained models?

Hi,
I have tried different model topologies and found it really hard to get them to converge.

I am reproducing the results reported in the paper "CAT: CRF-BASED ASR TOOLKIT" on the Aishell dataset. To be exact, I am reproducing Table 4 in that paper.
However, I was only able to reproduce the BLSTM model (pointed to by the red arrow in the graph below).

image

Here is the convergence of my successful BLSTM experiment. It takes only one epoch to reach a CER of 9.9%, but another 16 epochs to reach a CER of 7.42%.

image

I failed to get a decent result with the LSTM, VGG-LSTM, and TDNN-LSTM models. I have tried different initial learning rates, decay strategies, and CTC objective weights (lambda), but they simply failed to converge.

Could you release the associated configs for these models? Trained models along with them would be even more appreciated.

Thank you!

Decode error

Hi, when I run the decoding process, an error occurs:

==================== Stage 4 Decode ====================
Decode: set 'inference:infer:option:resume' -> /home/test/hd1/work/CAT/egs/aishell/exp/char_ctc-crf-cuside/check/checkpoint.1e1000s.pt
Decode: set 'inference:infer:option:output_dir' -> exp/char_ctc-crf-cuside/decode/{}
Decode: test: set 'output_dir' -> exp/char_ctc-crf-cuside/decode/test
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
Traceback (most recent call last):
  File "/home/test/hd1/work/CAT/egs/aishell/utils/pipeline/asr.py", line 464, in <module>
    interface.main(parse_args_from_var(
  File "/home/test/hd1/work/CAT/cat/ctc/cal_logit.py", line 54, in main
    mp.spawn(worker, nprocs=world_size, args=(args, q, model))
  File "/home/test/miniconda3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/test/miniconda3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/home/test/miniconda3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 149, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 1
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/test/miniconda3/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/home/test/miniconda3/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/home/test/miniconda3/lib/python3.10/multiprocessing/synchronize.py", line 110, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory

Can you help figure out the problem? Thanks.
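
For what it's worth, the mkl-service message above spells out its own workaround. A minimal sketch (assuming the variable must be set before numpy or torch is imported in each process):

import os

# Force the Intel threading layer, as the error message recommends. This must
# run before anything that loads MKL (numpy, torch) is imported.
os.environ["MKL_SERVICE_FORCE_INTEL"] = "1"
import numpy  # alternatively, importing numpy first is the other suggested fix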

Example & test of the installation of `ctc_crf`

You should first install the ctc_crf module following the guide:
https://github.com/thu-spmi/CAT/blob/master/install.md#cat (see "4. install ctc-crf")

After installation, download the mini denominator LM den_lm.txt.
NOTE: I renamed den_lm.fst -> den_lm.txt due to a GitHub limitation. The suffix does not matter; the content is still binary.

Use the following code to test the ctc_crf loss:

import ctc_crf
import torch

# Path to the denominator LM downloaded above (renamed from den_lm.fst).
den_lm = "den_lm.txt"

# Token mapping used in this toy example:
# 0: <blk>
# 1: a
# 2: c
# 3: s
# 4: t
vocab_size = 5


def test():
    criterion = ctc_crf.CTC_CRF_LOSS(lamb=0.01)
    # Hand-written per-frame probabilities of shape (N=1, T=5, V=5);
    # .log() turns them into the log-probabilities the loss expects.
    logits = torch.tensor([
        [
            [0.1, 0.1, 0.5, 0.1, 0.2],
            [0.5, 0.1, 0.1, 0.2, 0.2],
            [0.1, 0.7, 0.1, 0.05, 0.05],
            [0.6, 0.1, 0.1, 0.1, 0.1],
            [0.1, 0.1, 0.1, 0.6, 0.1]
        ]
    ], device=0, dtype=torch.float32, requires_grad=True).log()
    # [2, 1, 4] -> c a t
    labels = torch.tensor([2, 1, 4], dtype=torch.int32)
    frame_lens = torch.tensor([5], dtype=torch.int32)
    label_lens = torch.tensor([3], dtype=torch.int32)
    print("Frame len: {}".format(frame_lens.tolist()))
    print("Label len: {}".format(label_lens.tolist()))
    print("Logit shape: {}".format(logits.shape))
    print("Label shape: {}".format(labels.shape))

    loss = criterion(logits, labels, frame_lens, label_lens)
    print("CRF loss:", loss.item())

    loss.backward()


if __name__ == "__main__":
    # Initialize the CRF context (loads the denominator LM onto GPU 0)
    # before computing any losses.
    ctx = ctc_crf.CRFContext(den_lm, gpus=0)
    test()

The output should be

Frame len: [5]
Label len: [3]
Logit shape: torch.Size([1, 5, 5])
Label shape: torch.Size([3])
CRF loss: -2.4786245822906494

Librispeech data preparation

There are some basic steps missing, for example:
the symbolic links to the wsj sub-directories, and
execute permission on the .sh and .pl files.

Also, could you clarify what is expected in the Librispeech directory?

I got the following error:

Stage 1: ========================================
utils/data/get_utt2dur.sh: segments file does not exist so getting durations from wave files
utils/data/get_utt2dur.sh: could not get utterance lengths from sphere-file headers, using wav-to-duration
utils/data/get_utt2dur.sh: line 91: utils/data/split_data.sh: No such file or directory
run.pl: 4 / 4 failed, log is in data/dev_clean/log/get_durations.*.log
utils/data/get_utt2dur.sh: there was a problem getting the durations
utils/data/get_utt2dur.sh: segments file does not exist so getting durations from wave files
utils/data/get_utt2dur.sh: could not get utterance lengths from sphere-file headers, using wav-to-duration
utils/data/get_utt2dur.sh: line 91: utils/data/split_data.sh: No such file or directory
run.pl: 4 / 4 failed, log is in data/test_clean/log/get_durations.*.log
utils/data/get_utt2dur.sh: there was a problem getting the durations
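
For reference, "utils/data/split_data.sh: No such file or directory" is consistent with the missing symlinks mentioned above. A minimal sketch for restoring them (assuming a compiled Kaldi checkout under $KALDI_ROOT; the default path is illustrative):

import os

kaldi_wsj = os.path.join(os.environ.get("KALDI_ROOT", "/opt/kaldi"), "egs/wsj/s5")
for name in ("steps", "utils"):
    # recreate the conventional Kaldi recipe symlinks if they are missing
    if not os.path.islink(name) and not os.path.exists(name):
        os.symlink(os.path.join(kaldi_wsj, name), name)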

Problem running in Docker

RuntimeError: NCCL Error 2: unhandled system error

I encountered the above problem when running training in Docker.

Is CUDA 11.1 supported?

My configuration has changed to a GeForce RTX 3080, which requires CUDA 11.1 and pytorch=1.8.1. How can this be supported?

How to rerun the test step?

When I run this script

# uncomment the following line if you want to use specified GPUs
# CUDA_VISIBLE_DEVICES="0,1" \
python3 ctc-crf/train.py --seed=0 \
--world-size 1 --rank $NODE \
--batch_size=128 \
--dir=$dir \
--config=$dir/config.json \
--trset=data/pickle/train.pickle \
--devset=data/pickle/dev.pickle \
--data=$DATAPATH ||
exit 1
fi

it succeeds in the training step, but not in the testing step:

Test: [5260/5412]       Time  0.573 ( 1.062)    Data  0.001 ( 0.551)    Loss_real 2.5580e+01 (2.1091e+01)
Test: [5270/5412]       Time  0.575 ( 1.061)    Data  0.002 ( 0.550)    Loss_real 2.1622e+01 (2.1092e+01)
Test: [5280/5412]       Time  0.581 ( 1.060)    Data  0.002 ( 0.549)    Loss_real 2.6419e+01 (2.1094e+01)
Test: [5290/5412]       Time  0.481 ( 1.059)    Data  0.001 ( 0.548)    Loss_real 2.5244e+01 (2.1089e+01)
Test: [5300/5412]       Time  0.542 ( 1.058)    Data  0.002 ( 0.547)    Loss_real 2.9388e+01 (2.1091e+01)
Test: [5310/5412]       Time  0.549 ( 1.057)    Data  0.001 ( 0.545)    Loss_real 1.4099e+01 (2.1093e+01)
Test: [5320/5412]       Time  0.510 ( 1.056)    Data  0.000 ( 0.544)    Loss_real 3.1891e+01 (2.1092e+01)
Test: [5330/5412]       Time  0.541 ( 1.055)    Data  0.000 ( 0.543)    Loss_real 1.4288e+01 (2.1090e+01)
Test: [5340/5412]       Time  0.508 ( 1.054)    Data  0.001 ( 0.542)    Loss_real 2.1985e+01 (2.1087e+01)
Test: [5350/5412]       Time  0.469 ( 1.052)    Data  0.001 ( 0.541)    Loss_real 2.3008e+01 (2.1091e+01)
Test: [5360/5412]       Time  0.436 ( 1.051)    Data  0.002 ( 0.540)    Loss_real 2.0166e+01 (2.1095e+01)
Test: [5370/5412]       Time  0.527 ( 1.050)    Data  0.001 ( 0.539)    Loss_real 2.4653e+01 (2.1095e+01)
Test: [5380/5412]       Time  0.434 ( 1.049)    Data  0.001 ( 0.538)    Loss_real 2.5790e+01 (2.1101e+01)
Test: [5390/5412]       Time  0.512 ( 1.048)    Data  0.002 ( 0.537)    Loss_real 1.9989e+01 (2.1098e+01)
Test: [5400/5412]       Time  0.558 ( 1.047)    Data  0.001 ( 0.536)    Loss_real 3.2615e+01 (2.1102e+01)
Test: [5410/5412]       Time  0.556 ( 1.046)    Data  0.018 ( 0.535)    Loss_real 1.0644e+01 (2.1104e+01)
Epoch: [2@0] | best=21.11 | current=21.11 | worse_count=0 | lr=1.00e-04
> Monitor figure saved at exp/mc_flatphone/monitor.png
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/ssdhome/sardar321/anaconda3/envs/torch/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/ssdhome/sardar321/anaconda3/envs/torch/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
MemoryError
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/ssdhome/sardar321/anaconda3/envs/torch/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/ssdhome/sardar321/anaconda3/envs/torch/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
MemoryError

So how can I rerun just the testing step?

Test on python3

Thanks for your great work!
Have you tried installing the dependencies on Python 3 and running the code?

[librispeech] Not converging? Is it possible to release model configs or trained models?

Hi, I ran the libri/run.sh demo, but the loss stays very large and the model cannot converge. Can you help me? Is it possible to release model configs or trained models?
My env: pytorch 1.5, cuda 10.1, python 3.7
I run:
python3 steps/train.py --lr=0.001 --output_unit=72 --lamb=0.001 --data_path=$dir --batch_size=256
The loss:

libri $ cat log_train_1.log |grep "mean"
mean_cv_loss: 687.0884070194129
mean_cv_loss: 1222.706387606534
mean_cv_loss: 1671.5333037405303
mean_cv_loss: 1874.3074100378788
mean_cv_loss: 1370.3589680989583
mean_cv_loss: 1030.87519679214
mean_cv_loss: 1022.6952799479167
mean_cv_loss: 816.0803873697917
mean_cv_loss: 1366.5945001775567
mean_cv_loss: 1273.6065474076704
mean_cv_loss: 1495.3210212476326
mean_cv_loss: 1720.553447561553
mean_cv_loss: 1175.598328006629
mean_cv_loss: 2410.611476089015
mean_cv_loss: 1572.0060176964962
mean_cv_loss: 1302.1010786576705
mean_cv_loss: 1690.6941273082386
mean_cv_loss: 1641.5495087594697
mean_cv_loss: 1623.9612718986743

Looking forward to your reply. Thank you!

TLG.fst was not determinizable

Hi
I am a newbie to WFSTs, and I found that TLG.fst was not determinizable. Does this have any effect on the results?

fstdeterminizestar TLG.fst
ERROR (fstdeterminizestar[5.5.567~1-daf9d]:AddOneElement():fstext/determinize-star-inl.h:791) FST was not functional -> not determinizable.
First string: 74084
Second string: 6413

Refactoring

Hi, I am trying to make a Dockerfile and refactor the directory structure.

In my branch docker, I have finished a Dockerfile based on Kaldi's official gpu-latest image. PyTorch is at version 1.2.0 because the base image, kaldiasr/kaldi:gpu-latest, stays at CUDA 10.0. I'm not familiar with PyTorch, so I just copied the instructions from the official site here.

Update: I chose pytorch 1.5 as the base image now.

After that work, I found that the scripts in wsj/steps and utils do not follow Kaldi's guidelines.
Some of them were copied from old versions of Kaldi, Eesen, etc. I would suggest untracking the typical scripts from old Kaldi and moving the remainder into scripts/ctc-crf. Every egs would have two soft links, steps and utils, to Kaldi's wsj, and one soft link going to scripts/ctc-crf.

Do you agree with the above changes? I'm happy to make a PR.

I have implemented a TensorFlow binding, but the gradient may be wrong

I have modified the warp-ctc TensorFlow binding and the CAT PyTorch binding; besides, I have removed costs_beta (maybe useless).

def ctc_crf_loss(logits, labels, input_lengths,
                 blank_label=0, lamb=0.1):
  '''Computes the CTC-CRF loss between a sequence of logits and a
  ground truth labeling.

  Args:
      logits: A 3-D Tensor of floats. The dimensions
                   should be (t, n, a), where t is the time index, n
                   is the minibatch index, and a indexes over
                   logits for each symbol in the alphabet.

      labels: An int32 SparseTensor. labels.indices[i, :] == [b, t] means 
              labels.values[i] stores the id for (batch b, time t). 
              labels.values[i] must take on values in [0, num_labels).

      input_lengths: A 1-D Tensor of ints, the number of time steps
                     for each sequence in the minibatch.

      blank_label: int, the label value/index that the CTC
                   calculation should use as the blank label.

      lamb: float, A weight α for CTC Loss. 
                  Combined with the CRF loss to help convergence.

  Returns:
      1-D float Tensor, the cost of each example in the minibatch
      (as negative log probabilities).

  * This class performs the softmax operation internally.

  * The label reserved for the blank symbol should be label 0.

  '''
  # The input of the warp-ctc is modified to be the log-softmax output of the bottom neural network.
  activations = tf.nn.log_softmax(logits) # (t, n, a)
  activations_ = tf.transpose(activations, (1, 0, 2)) # (n, t, a)
  loss, _, _, costs_alpha = _ctc_crf.ctc_crf_loss(
      activations, activations_, labels.indices, labels.values,
      input_lengths, blank_label, lamb) # costs, gradients, grad_net, costs_alpha

  return (costs_alpha - (1 + lamb) * loss)  # (n,)


@ops.RegisterGradient("CtcCrfLoss")
def _CTCLossGrad(op, grad_loss, a, b, c):
  """The derivative provided by CTC-CRF Loss.

  Args:
     op: the CtcCrfLoss op.
     grad_loss: The backprop for cost.

  Returns:
     The CTC-CRF Loss gradient.
  """
  lamb = op.get_attr('lamb')
  grad_ctc = op.outputs[1] # (t, n, a)
  grad_den = tf.transpose(op.outputs[2], (1, 0, 2)) # (t, n, a)
  grad = grad_den - (1 + lamb) * grad_ctc # (t, n, a)
  # average with batch size.
  grad /= tf.cast(_get_dim(grad, 1), dtype=tf.float32) # (t, n, a)

  # Return gradient for inputs and None for
  # activations_, labels_indices, labels_values and sequence_length.
  return [_BroadcastMul(grad_loss, grad), None, None, None, None]
  # return [_BroadcastMul(grad_loss, op.outputs[1]), None, None, None, None]

I can provide all the code if necessary, but my result is wrong: the TER is over 100%.

Question about JoinAP

Hi. I apologize if this is not the right place to post a question, but I wasn't sure where to post it.

I have a few questions about JoinAP from Zhu et al. 2021 (https://arxiv.org/pdf/2107.05038.pdf).

  • Where does the model obtain the phones in Figure 2? Are the phones obtained from the ground truth transcriptions or are they first predicted by the acoustic model?
  • By top-down, are you referring to breaking phones down into articulatory phonetic features using panphon?
  • During test time, are the phonetic transcriptions generated by Phonetisaurus also fed into the acoustic model as phone sequences? If not, where do the phones come from?

Thank you in advance!!

Issue with python setup_1_0.py install

Hi, my env:
torch 1.5.0
conda gcc version 5.2.0 (GCC)
python 3.6.5

When I run python setup_1_0.py install, I get the following errors:
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
running install
running bdist_egg
running egg_info
writing ctc_crf_base.egg-info/PKG-INFO
writing dependency_links to ctc_crf_base.egg-info/dependency_links.txt
writing top-level names to ctc_crf_base.egg-info/top_level.txt
reading manifest file 'ctc_crf_base.egg-info/SOURCES.txt'
writing manifest file 'ctc_crf_base.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'ctc_crf_base' extension
gcc -pthread -B /home/sean/anaconda3/envs/p36/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/sean/anaconda3/envs/p36/lib/python3.6/site-packages/torch/include -I/home/sean/anaconda3/envs/p36/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/sean/anaconda3/envs/p36/lib/python3.6/site-packages/torch/include/TH -I/home/sean/anaconda3/envs/p36/lib/python3.6/site-packages/torch/include/THC -I/home/sean/anaconda3/envs/p36/include/python3.6m -c binding_1_0.cpp -o build/temp.linux-x86_64-3.6/binding_1_0.o -std=c++14 -fPIC -I/usr/local/cuda/include -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=ctc_crf_base -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
creating build/lib.linux-x86_64-3.6
g++ -pthread -shared -B /home/sean/anaconda3/envs/p36/compiler_compat -L/home/sean/anaconda3/envs/p36/lib -Wl,-rpath=/home/sean/anaconda3/envs/p36/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/binding_1_0.o -L/data00/home/sean/02_workspace/00_github_code/CAT/src/ctc_crf/gpu_ctc/build -L/data00/home/sean/02_workspace/00_github_code/CAT/src/ctc_crf/gpu_den/build -L/home/sean/anaconda3/envs/p36/lib/python3.6/site-packages/torch/lib -lfst_den -lwarpctc -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.6/ctc_crf_base.cpython-36m-x86_64-linux-gnu.so -Wl,-rpath,/data00/home/sean/02_workspace/00_github_code/CAT/src/ctc_crf/gpu_ctc/build -Wl,-rpath,/data00/home/sean/02_workspace/00_github_code/CAT/src/ctc_crf/gpu_den/build
/home/sean/anaconda3/envs/p36/compiler_compat/ld: cannot find -lm
/home/sean/anaconda3/envs/p36/compiler_compat/ld: cannot find -lpthread
/home/sean/anaconda3/envs/p36/compiler_compat/ld: cannot find -lc
collect2: error: ld returned 1 exit status
/home/sean/anaconda3/envs/p36/lib/python3.6/distutils/extension.py:131: UserWarning: Unknown Extension options: 'headers', 'with_cuda'
warnings.warn(msg)
/home/sean/anaconda3/envs/p36/lib/python3.6/site-packages/torch/utils/cpp_extension.py:304: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
error: command 'g++' failed with exit status 1

Can you please suggest steps to resolve it? Thanks.

Question about the implementation of the CTC backward-propagation gradient formula

In gpu_ctc_kernels.h, the gradient with respect to the softmax is computed as follows:

// grad for softmax
for (int idx = tid, j = 0; idx < uniquelabels; idx += blockDim.x, ++j) {
     const int grads_offset = prob_offset + start_prob_col + keys_shared[idx];
     // output[idx] = accum[j];
     grads[grads_offset] = exp(accum[j] - probs[grads_offset] - log_partition);
}

Different initial values of beta lead to different gradients. The figure below shows the backward-propagation derivation:
the left side is from a blog post,
the right side is from the original paper.
We use the initialization on the right, so it seems the gradient should be
grads[grads_offset] = exp(accum[j] - 2*probs[grads_offset] - log_partition);
Is there a problem with the implementation here? Could you please confirm?

image

With your corrected T.fst, the TLG.fst got larger, not smaller

Your project is great; I have also been working on top of Eesen. According to your paper CRF-BASED SINGLE-STAGE ACOUSTIC MODELING WITH CTC TOPOLOGY, changing how T.fst is generated should make the model smaller and slightly improve performance. I tried it in practice: with my language model, the TLG.fst generated by the original Eesen is 16 MB, but after directly replacing the script in Eesen with your ctc_token_fst_corrected.py, the generated TLG.fst is 20 MB. Your paper says the number of states in T.fst is reduced, but correspondingly the number of arcs increases. Have you looked into this? Or is there something I have not understood? Looking forward to your reply, thanks.

Installation error

Please help; the error is:
Building wheel for ctcdecode (setup.py) ... error
error: subprocess-exited-with-error
ERROR: Failed building wheel for ctcdecode
error: legacy-install-failure

Meaning of the training log in the aishell recipe

Hello, I ran the aishell recipe through with your v2 version. Looking at the log printed during training, I don't quite understand what each column means.
1) Why can the loss be negative, and what do the values in parentheses mean? Also, what do Loss_real and its parenthesized value mean? A detailed explanation would be appreciated. For example:

Epoch: [1][ 940/11260] Time 1.033 ( 0.931) Data 0.023 ( 0.031) Loss -2.0485e+01 (-1.2205e+00) Loss_real 6.3320e+01 (7.0811e+01)

Thanks
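
For context, logs in this "value (value)" style usually follow the PyTorch ImageNet-example convention: the first number is the current batch's value and the parenthesized one is the running average. A minimal sketch of that convention (an assumption about CAT's logger, not taken from its source):

class AverageMeter:
    """Tracks the latest value and the running average, printed as 'val (avg)'."""
    def __init__(self):
        self.val, self.sum, self.count = 0.0, 0.0, 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n

    @property
    def avg(self):
        return self.sum / max(self.count, 1)


loss = AverageMeter()
loss.update(-20.485)
print("Loss {:.4e} ({:.4e})".format(loss.val, loss.avg))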

Decoding error: Cannot read file at utils/scoring/wer_per_spk_details.pl line 135

@liubin3702
I have completed the full training and tested the character error rate before decoding. The problem I now face is that in step 8, the TLG decoding test, running ctc-crf/decode.sh fails with "run.pl: job failed, log is in exp/TDNN_LSTM/decode_test/scoring_kaldi/log/stats1.log". The log contains no "error", only one suspicious line: "Cannot read file at utils/scoring/wer_per_spk_details.pl line 135." I cannot locate where the problem is; could you help me check what might be going wrong?

Originally posted by @liubin3702 in #64 (comment)

Issue on speed of denominator calculation

Hi, thank you for sharing this code, first of all.
I'm curious about the speed of the denominator calculation during training; can you share some information on this?
Did you try this method on any Mandarin dataset, like HKUST or AISHELL? I tried it on an in-house Mandarin dataset of my lab; the modeling unit is the syllable (>1000 units), by the way. The denominator calculation seems too slow. The generated denominator FST has about 2600 states and 24e5 arcs (it is a bigram syllable LM). It takes about 2.6 s for one example with 200 frames on an RTX-2080Ti GPU, and 7.8 s for 600 frames. Should I expect this to happen?
I think the reason might be the large number of modeling units, which produces a big, dense denominator graph. Do you have any way to handle this situation better?
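
As a back-of-envelope check on that reasoning (assuming the denominator forward-backward cost scales roughly with frames times arcs):

# Rough per-utterance cost estimate, using the numbers quoted above.
frames, arcs = 200, int(24e5)
print("~{:.1e} arc visits per forward-backward pass".format(frames * arcs))  # ~4.8e+08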

Thanks!

Problem installing the ctc-crf package

I followed the tutorial and the installation seemed fine; no errors appeared in between and it reported success. But then I get the error: No module named 'ctc_crf._C'

CTC Beam Search for CTC-CRF

I have two questions:
1. After obtaining the logits, if I apply a softmax on top, do I get a probability distribution over phones?
2. Can I apply CTC greedy or CTC beam search decoding to those phone probabilities to obtain the best phone sequence? Or does den_lm.fst have to participate in the computation? (Phone-to-character decoding is not involved here.)

Thanks
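
For concreteness, here is a minimal CTC greedy-decoding sketch over softmax outputs. It is illustrative only: it shows the collapse-repeats-then-drop-blanks rule being asked about and assumes blank is index 0 (as in the ctc_crf example earlier); whether this is meaningful without den_lm.fst is exactly the question above.

import torch

def ctc_greedy_decode(logits):
    """logits: (T, V) unnormalized scores; returns the collapsed phone ids."""
    ids = torch.softmax(logits, dim=-1).argmax(dim=-1).tolist()
    out, prev = [], None
    for i in ids:
        if i != prev and i != 0:  # collapse repeats, then drop blanks (id 0)
            out.append(i)
        prev = i
    return out

print(ctc_greedy_decode(torch.randn(10, 5)))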

Problem with the language model in the thchs30 experiment

When running the language-model training script train_lm.sh on the thchs30 data, the following exception occurs:
Usage: optimize_alpha.pl alpha1 perplexity@alpha1 alpha2 perplexity@alpha2 alpha3 perplexity@alph3 at /opt/kaldi//tools/kaldi_lm/optimize_alpha.pl line 26.

python steps/train.py error

Hi, my env :
torch 1.2.0
python:3.6.5

When I run egs/libri/run.sh
python steps/train.py --output_unit=72 --lamb=0.01 --data_path=$dir, I get the error:
from batchnorm_utils import pytorch as batch_norm
ModuleNotFoundError: No module named 'batchnorm_utils'

How can I solve it? Looking forward to your reply.
Thank you

On support for phonological features

Hello. The paper "MULTILINGUAL AND CROSSLINGUAL SPEECH RECOGNITION USING PHONOLOGICAL-VECTOR BASED PHONE EMBEDDINGS" is interesting work in cross-lingual speech recognition, and it mentions that the related work will be implemented in this toolkit. What is the current plan for supporting it? Thanks!

SIGSEGV segmentation fault

Hi, thank you for your work.

I imported your code as the ctc-crf package into my DL model.
In my training code, I calculate the loss with
loss = CRFLoss(logits, labels, inputLenBatch, input_lengths)

I ran my training code with faulthandler and received a segmentation fault.
segfault(1)
train_crf.py imports general_crf.py, and general_crf.py imports CRFLoss from __init__.py.

In __init__, line 86 is the blank_label index.
segfault(2)
By the way, due to a different dataset and dataloader, my batch size is logits.size(1); that is the only part I changed.

As you can see, the last input of gpu_ctc is the blank_label index.
segfault(4)

However, when I run the training code with gdb, I receive a segmentation fault too, but it occurs on a different line.
segfault(3)
I guess it's because of the options in get_workspace_size,
so the problem is still the blank_label index.

I don't understand why the blank_label index causes this fault.
I'm wondering how to solve it; could you help?

Issue in compiling Kaldi patch

Copying the patch into the bin directory and compiling gives an error, since it depends on files from directories one level above:

(base):~/kaldi/src/bin$ gcc -o latgen-faster latgen-faster.cc
latgen-faster.cc:23:31: fatal error: base/kaldi-common.h: No such file or directory

Can you please suggest the steps to resolve it?

get_word_map.pl: command not found

When I run the script commonvoice/run_mc.sh, line 98 (local/mozilla_train_lms.sh) raises an error:
local/mozilla_train_lms.sh: line 63: get_word_map.pl: command not found

It seems the file get_word_map.pl is missing.
Is the problem on my side?
Asking for your help.

Question about the blank index

Although I have read the answers to issues #11 and #22 repeatedly, I am still unclear about how the blank index is used, so I am asking here.
During training, in ctc_crf.py:
costs_ctc = torch.zeros(logits.size(0))
act = torch.transpose(logits, 0, 1).contiguous()
grad_ctc = torch.zeros(act.size()).type_as(logits)
ctc_crf_base.gpu_ctc(act, grad_ctc, labels, label_lengths, input_lengths, logits.size(0), costs_ctc, 0)
ctc_crf_base.gpu_den(logits, grad_den, input_lengths.cuda(), costs_alpha_den, costs_beta_den)
Since the blank argument of gpu_ctc is set to 0, the blank index in logits must be 0, so the blank index fed to gpu_den is also 0.

When generating the denominator:
In run.sh:

Prepare denominator

python3 ctc-crf/prep_ctc_trans.py data/lang_phn/lexicon_numbers.txt data/train_tr95/text "<UNK>" > data/train_tr95/text_number || exit 1
cat data/train_tr95/text_number | sort -k 2 | uniq -f 1 > data/train_tr95/unique_text_number || exit 1
mkdir -p data/den_meta
chain-est-phone-lm ark:data/train_tr95/unique_text_number data/den_meta/phone_lm.fst || exit 1
python3 ctc-crf/ctc_token_fst_corrected.py den data/lang_phn/tokens.txt | fstcompile | fstarcsort --sort_type=olabel > data/den_meta/T_den.fst || exit 1
fstcompose data/den_meta/T_den.fst data/den_meta/phone_lm.fst > data/den_meta/den_lm.fst || exit 1
echo "Prepare denominator finished"

I inspected lexicon.txt, units.txt, and lexicon_num.txt, and inferred that the blank index in unique_text_number is still 0?
Although <blk> is 1 in tokens.txt, in ctc_token_fst_corrected.py:
for line in lines:
    sp = line.split()
    phone = sp[0]
    if phone == '<eps>' or phone == '<blk>':
        continue
Both <eps> and <blk> are skipped via continue, so does this mean the blank index is still 0?
Why did you answer in #11 that the blank index is 1? I don't quite understand this part.
@alex-ht

New error on mc flatphone finetuning

finetune_dir="exp/mc_flatphone_finetune/"
if [ $stage -le 8 ] && [ $stop_stage -ge 8 ]; then
# finetune
if [[ $NODE == 0 && ! -f $dir/scripts.tar.gz ]]; then
echo ""
tar -zcf $dir/scripts.tar.gz $(readlink ctc-crf) $0
elif [ $NODE == 0 ]; then
echo ""
echo "'$dir/scripts.tar.gz' already exists."
echo "If you want to update it, please manually rm it then re-run this script."
fi
for x in de; do
CUDA_VISIBLE_DEVICES=0,1,2 \
python3 ctc-crf/train.py --seed=0 \
--world-size 1 --rank $NODE \
--batch_size=128 \
--resume=$dir/ckpt/bestckpt.pt \
--den-lm=data/den_meta_${x}/den_lm.fst \
--mc-conf=conf/mc_flatphone_finetune_${x}.json \
--trset=data/pickle/train_${x}.pickle \
--devset=data/pickle/dev_${x}.pickle \
--dir=$finetune_dir \
--config=$dir/config.json \
--data=data/train_${x}_sp
done
fi

When I run the fine-tune stage, this error occurred. Could you help me find a solution?

'exp/mc_flatphone/scripts.tar.gz' already exists.
If you want to update it, please manually rm it then re-run this script.
Global number of GPUs: 1
Use GPU: local[0] | global[0]
> Data prepare
  Data prepared.

>>> Disable SpecAug <<<

[GPU 0]: Resuming from: exp/mc_flatphone/ckpt/bestckpt.pt
Traceback (most recent call last):
  File "ctc-crf/train.py", line 211, in <module>
    main_spawner(args, main_worker)
  File "ctc-crf/train.py", line 38, in main_spawner
    mp.spawn(_main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
  File "/ssdhome/sardar321/anaconda3/envs/torch/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/ssdhome/sardar321/anaconda3/envs/torch/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/ssdhome/sardar321/anaconda3/envs/torch/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/ssdhome/sardar321/anaconda3/envs/torch/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/ssdhome/sardar321/CAT/egs/commonvoice/ctc-crf/train.py", line 94, in main_worker
    manager = coreutils.Manager(build_model, args)
  File "/ssdhome/sardar321/CAT/scripts/ctc-crf/coreutils.py", line 66, in __init__
    self.model, lr = update_model(
  File "/ssdhome/sardar321/CAT/scripts/ctc-crf/mc_lingual.py", line 85, in update_model
    src_idx, des_idx, pv, hdim, odim, mode, usg, lr = load_mc_conf(args)
  File "/ssdhome/sardar321/CAT/scripts/ctc-crf/mc_lingual.py", line 74, in load_mc_conf
    pv = load_pv(config["P"])
  File "/ssdhome/sardar321/CAT/scripts/ctc-crf/mc_lingual.py", line 61, in load_pv
    pv = np.load(fin)
  File "/ssdhome/sardar321/anaconda3/envs/torch/lib/python3.8/site-packages/numpy/lib/npyio.py", line 417, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
TypeError: expected str, bytes or os.PathLike object, not NoneType
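
From the traceback, np.load receives None because config["P"] is unset in the mc-conf file. A minimal pre-flight check (assuming mc_flatphone_finetune_de.json is meant to carry a "P" entry pointing at the phonological-vector file; the key name comes from the traceback, the rest is an assumption):

import json

with open("conf/mc_flatphone_finetune_de.json") as f:
    conf = json.load(f)

# load_pv(config["P"]) fails with "expected str ... not NoneType" when unset
assert conf.get("P"), "set 'P' to the path of the phonological-vector file"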

Question about alpha and beta in the forward-backward algorithm

1. Looking at the ReadFst function in fst_read.cc, do beta_next and alpha_next carry the same meaning as in the conventional forward-backward algorithm?
Their actual assignments seem to be the reverse of the conventional formulas.
It is normal: computing alpha needs to look at what comes before, and computing beta at what comes after; there is no problem.

2. One more question: path_weight is basically unused in the loss computation. What is the purpose of computing this value?

Issue in make OPENFST=/path/to/your/openfst

I am using pytorch 1.5, and the ctc_crf/Makefile runs python setup_1_0.py install.

/home/anaconda3/envs/torch_w2l/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/expanding_array.h:23:7: note: in call to ‘torch::ExpandingArray<2ul, double>& torch::ExpandingArray<2ul, double>::operator=(const torch::ExpandingArray<2ul, double>&)’
class ExpandingArray {
^
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/anaconda3/envs/torch_w2l/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1395, in _run_ninja_build
    subprocess.run(
  File "/home/anaconda3/envs/torch_w2l/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:
...
    _write_ninja_file_and_compile_objects(
  File "/home/anaconda3/envs/torch_w2l/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1135, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/home/anaconda3/envs/torch_w2l/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1413, in _run_ninja_build
    raise RuntimeError(message)
RuntimeError: Error compiling objects for extension
Makefile:22: recipe for target 'CTCCRF' failed
make: *** [CTCCRF] Error 1

Can you please suggest next steps?
