z-yq / tensorflowasr

A project dedicated to bringing CPU/on-device models close to GPU-model performance, with a real-time factor (RTF) below 0.1 on CPU.

License: Apache License 2.0

Python 66.72% C++ 32.32% CMake 0.04% Shell 0.01% C 0.91%

transformer bert tensorflow2 automatic-speech-recognition state-of-the-art ctc listen-attend-and-spell transducers cpp tensorflow-cpp

tensorflowasr's Introduction

TensorflowASR

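An end-to-end speech recognition model based on Conformer, implemented in TensorFlow 2, with a real-time factor (RTF) of around 0.1 on CPU.

The current branch is V2, which uses a CTC + translate structure.

Feedback and bug reports are welcome.

For the old version, see the V1 version.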

Project comparison

Results of training on Aishell-1:

Offline results

Name	Params	Chinese CER	Epochs	online/offline	Test data	Decoding
Wenet(Conformer)	9.5M	6.48%	100	offline	aishell1-test	ctc_greedy
Wenet(transformer)	9.7M	8.68%	100	offline	aishell1-test	ctc_greedy
Wenet(Paraformer)	9.0M	6.99%	100	offline	aishell1-test	paraformer_greedy
FunASR(Paraformer)	9.5M	6.37%	100	offline	aishell1-test	paraformer_greedy
FunASR(Conformer)	9.5M	6.64%	100	offline	aishell1-test	ctc_greedy
FunASR(e_branchformer)	10.1M	6.65%	100	offline	aishell1-test	ctc_greedy
repo(ConformerCTC)	10.1M	6.8%	100	offline	aishell1-test	ctc_greedy

Streaming results

Name	Params	Chinese CER	Epochs	online/offline	Test data	Decoding
Wenet(U2++Conformer)	10.6M	8.18%	100	online	aishell1-test	ctc_greedy
Wenet(U2++transformer)	10.3M	9.88%	100	online	aishell1-test	ctc_greedy
repo(StreamingConformerCTC)	10.1M	7.2%	100	online	aishell1-test	ctc_greedy
repo(ChunkConformer)	10.7M	8.9%	100	online	aishell1-test	ctc_greedy

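Features

VAD + noise reduction
Online streaming recognition / offline recognition
Punctuation restoration
TTS data augmentation
Voice-conversion data augmentation
Far-field/near-field data augmentation

Other projects

TTS: https://github.com/Z-yq/TensorflowTTS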

NLU: -

BOT: -

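TTS data augmentation system

Even without any data, you can reach a reasonable level of ASR performance.

TTS for ASR: the TTS models are trained on aishell1 and aishell3, so the synthesized speech is well suited to ASR.

Tips:

There are 500 voices in total
Only Chinese is supported
If the text to synthesize contains punctuation, remove it manually
To add a pause, insert sil in the middle of the text

Step 1: prepare a list of texts to synthesize, named for example text.list, e.g.: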

这是第一句话
这是第二句话
这是一句sil有停顿的话
...
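
The tips above ask for punctuation to be removed by hand. A minimal sketch of automating that step, assuming that every character in a Unicode punctuation category should be dropped; the helper names `strip_punctuation` and `clean_sentences` are illustrative and not part of the project:

```python
import unicodedata


def strip_punctuation(text):
    """Drop every character whose Unicode category is punctuation (P*)."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


def clean_sentences(sentences):
    """Return cleaned sentences, skipping any that become empty."""
    cleaned = (strip_punctuation(s).strip() for s in sentences)
    return [s for s in cleaned if s]
```

Writing the result of `clean_sentences(...)` to text.list, one sentence per line, gives a file in the format shown above. The `sil` pause marker consists of letters, so it is left untouched.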

Step 2: download the models

Link: https://pan.baidu.com/s/1deN1PmJ4olkRKw8ceQrUNA extraction code: c0tp

Download both models and put them in the directory ./augmentations/tts_for_asr/models

Step 3: run the script from the repository root:

python ./augmentations/tts_for_asr/tts_augment.py -f text.list -o save_dir --voice_num 10 --vc_num 3

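where:

-f is the list prepared in step 1

-o is the path where the synthesized corpus is saved; an absolute path is recommended.

--voice_num is the number of voices used to synthesize each sentence

--vc_num is the number of times each sentence is augmented with voice conversion

When the script finishes, a wavs directory and utterance.txt are generated under the -o path.

Mel Layer

A speech spectral feature extraction layer implemented in TF2, modeled on the librosa library.

Alternatively, you can use Leaf, which has fewer parameters.

Usage: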

am_data.yml

mel_layer_type: Melspectrogram #Spectrogram/leaf
trainable_kernel: True # training the kernel is supported, but not recommended

Cpp Inference

The ONNX-based C++ project has been updated;

see CppInference ONNX for details.

Python Inference

An ONNX-based Python inference solution; see python inference for details.

Streaming Conformer

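The streaming Conformer structure is now supported.

Two approaches are currently implemented:

Block Conformer + Global CTC
- Suitable for short-utterance recognition systems with VAD; the global CTC builds the context information.
Chunk Conformer + CTC Picker
- Inspired by Baidu's SMLTA2: a phoneme CTC first samples the effective features, which are then passed to a lookahead chunk Conformer that builds the context information and makes the prediction. Suitable for long-duration streaming recognition systems.

When trained for the same number of epochs, the CER of Block Conformer differs from that of the global Conformer by only 0.8%.

Causal Chunk Conformer manages its cached state; with the default configuration, one inference step costs 350 MFLOPs.

The logic of the two approaches is shown in the figure:

Pretrained Model

All results are tested on the AISHELL TEST dataset.

RTF (real-time factor) is measured on a single-core CPU decoding task.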
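
For readers unfamiliar with the metric, RTF is the time spent decoding divided by the duration of the audio decoded, so a value below 1 means faster than real time. A minimal sketch of how it could be measured; `decode_fn` is a hypothetical stand-in for whatever recognition call is being timed, and this is not the project's own benchmark code:

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent decoding / duration of the audio decoded."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(decode_fn, samples, sample_rate=16000):
    """Time one call of decode_fn(samples) and return its RTF."""
    start = time.perf_counter()
    decode_fn(samples)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)
```

For example, decoding 10 s of audio in 0.56 s corresponds to an RTF of 0.056.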

AM:

Model Name	Mel layer(USE/TRAIN)	link	code	train data	phoneme CER(%)	Params Size	RTF
ConformerCTC(S)	True/False	pan.baidu.com/s/1k6miY1yNgLrT0cB-xsqqag	8s53	aishell-1(50 epochs)	6.4	10M	0.056
StreamingConformerCTC	True/False	pan.baidu.com/s/1Rc0x7LOiExaAC0GNhURkHw	zwh9	aishell-1(50 epochs)	7.2	15M	0.08
ChunkConformer	True/False	pan.baidu.com/s/1o_x677WUyWNld-8sNbydxg	ujmg	aishell-1(50 epochs)	11.4	15M	0.1

VAD:

Model Name	link	code	train data	params size	RTF
8k_online_vad	pan.baidu.com/s/1ag9VwTxIqW4C2AgF-6nIgg	ofc9	openslr open-source data	80K	0.0001

Punc:

Model Name	link	code	train data	acc	params size	RTF
PuncModel	pan.baidu.com/s/1gtvRKYIE2cAbfiqBn9bhaw	515t	open-source NLP data	95%	600K	0.0001

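Usage:

Use test_asr.py to convert the model to an ONNX file, then place it in pythonInference.

Community

You are welcome to join to discuss and share problems. The group has reached 200 members, so joining requires an invitation; please add the note "TensorflowASR".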

What's New?

Supported Structure

CTC+Streaming

Supported Models

Conformer
BlockConformer
ChunkConformer

Requirements

Python 3.6+
Tensorflow 2.8+: pip install tensorflow-gpu (for reference, see https://www.bilibili.com/read/cv14876435)
librosa
pypinyin, if you need to use the default phonemes
keras-bert
addons, for the LAS structure: pip install tensorflow-addons
tqdm
tf2onnx
rir_generator: pip install rir-generator
onnxruntime: pip install onnxruntime or pip install onnxruntime-gpu

Usage

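Prepare train_list and test_list.

asr_train_list format: path + '\t' + text, where '\t' is a tab character. Writing the file programmatically is recommended:

wav_path = "xxx/xx/xx/xxx.wav"
wav_label = "这是个例子"
# One sample per line: audio path, a tab, then the transcript.
with open('train.list', 'w', encoding='utf-8') as f:
    f.write(wav_path + '\t' + wav_label + '\n')

For example, the resulting train.list: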

/opt/data/test.wav	这个是一个例子
......
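
Because a stray space or a missing tab is easy to overlook, it can help to validate the list before training. A minimal sketch under the format described above (one `path<TAB>text` pair per line); `parse_asr_lines` is an illustrative helper, not a function from the project:

```python
def parse_asr_lines(lines):
    """Return (wav_path, text) pairs; raise ValueError on malformed lines."""
    pairs = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2 or not fields[0] or not fields[1]:
            raise ValueError(f"line {line_no}: expected 'path<TAB>text'")
        pairs.append((fields[0], fields[1]))
    return pairs
```

It accepts any iterable of lines, so an open file object works directly: `parse_asr_lines(open('train.list', encoding='utf-8'))`.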

The following are the training data formats for VAD and punctuation restoration (optional):

vad_train_list format:

wav_path1
wav_path2
……

For example:

/opt/data/test.wav

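VAD training derives its training samples from signal energy, so make sure your training corpus was recorded in quiet conditions.

punc_train_list format:

 text1
 text2
 ……

Same as the LM format: each line of text contains punctuation. Currently at most one punctuation mark may follow each character; consecutive punctuation marks are treated as invalid.

For example:

这是：一个例子哦。 √ (correct format)

这是：“一个例子哦”。 × (wrong format)

这是：一个例子哦“。 × (wrong format)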
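
The rule above can be checked mechanically before training. A minimal sketch, assuming that "punctuation" means any character in a Unicode punctuation category and that a line may not start with a punctuation mark; the project's own definition may differ, and `is_valid_punc_line` is an illustrative name:

```python
import unicodedata


def is_punctuation(ch):
    """True for characters in any Unicode punctuation category (P*)."""
    return unicodedata.category(ch).startswith("P")


def is_valid_punc_line(text):
    """True if every punctuation mark directly follows a non-punctuation character."""
    # Treat the line start like punctuation so a mark cannot come first.
    previous_was_punct = True
    for ch in text.strip():
        if is_punctuation(ch):
            if previous_was_punct:
                return False
            previous_was_punct = True
        else:
            previous_was_punct = False
    return True
```

Applied to the three examples, the first line passes and the other two are rejected because they contain two punctuation marks in a row.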

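Download the pretrained BERT model, which assists the training of the punctuation-restoration model; skip this step if you do not need punctuation restoration: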
```
 https://pan.baidu.com/s/1_HDAhfGZfNhXS-cYoLQucA extraction code: 4hsa
```
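Edit the config file am_data.yml (in ./asr/configs) to set the training options, and edit the name parameter in the model YAML (e.g. ./asr/configs/conformer.yml) to choose the model structure.

Then run:

python train_asr.py --data_config ./asr/configs/am_data.yml --model_config ./asr/configs/ConformerS.yml

To test, refer to the demo in ./test_asr.py; you can of course modify the stt method to suit your needs: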
```
 python ./test_asr.py  
```

You can also use the Tester to evaluate your model's performance on a large batch of data:

Run:

python eval_am.py --data_config ./asr/configs/am_data.yml --model_config ./asr/configs/ConformerS.yml

The script reports the SER/CER/DEL/INS/SUB metrics.
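
To make the metric names concrete, here is a minimal sketch of how they are commonly computed from a character-level edit distance. It assumes CER = (SUB + DEL + INS) / reference characters and SER = share of sentences with at least one error; eval_am.py may define or aggregate them differently, and the function names are illustrative:

```python
def edit_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) along one minimal edit path."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # table[i][j] = (cost, sub, del, ins) for ref[:i] against hyp[:j]
    table = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = (i, 0, i, 0)
    for j in range(1, cols):
        table[0][j] = (j, 0, 0, j)
    for i in range(1, rows):
        for j in range(1, cols):
            cost, sub, dele, ins = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (cost, sub, dele, ins)
            else:
                diagonal = (cost + 1, sub + 1, dele, ins)
            cost, sub, dele, ins = table[i - 1][j]
            deletion = (cost + 1, sub, dele + 1, ins)
            cost, sub, dele, ins = table[i][j - 1]
            insertion = (cost + 1, sub, dele, ins + 1)
            table[i][j] = min(diagonal, deletion, insertion, key=lambda t: t[0])
    return table[-1][-1][1:]


def corpus_metrics(pairs):
    """pairs: iterable of (reference, hypothesis) strings."""
    sub = dele = ins = chars = wrong_sentences = sentences = 0
    for ref, hyp in pairs:
        s, d, i = edit_counts(ref, hyp)
        sub, dele, ins = sub + s, dele + d, ins + i
        chars += len(ref)
        sentences += 1
        wrong_sentences += (s + d + i) > 0
    return {
        "CER": (sub + dele + ins) / chars,
        "SER": wrong_sentences / sentences,
        "SUB": sub, "DEL": dele, "INS": ins,
    }
```

For the reference 这是一个例子 and the hypothesis 这是个例字, one character is deleted and one substituted, so that sentence contributes 2 errors over 6 reference characters.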

To train the VAD or punctuation-restoration model, follow the same steps as above.

Tips

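If you want to use your own phonemes, adapt the conversion method in am_dataloader.py accordingly.

def init_text_to_vocab(self):  # keep this name

    def text_to_vocab_func(txt):
        # Replace your_convert_function with your own text-to-token conversion.
        return your_convert_function(txt)

    self.text_to_vocab = text_to_vocab_func  # assign the function itself; do not call it here

Remember that your phoneme list must begin with <S> and </S>, e.g.: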

    <S>
    </S>
    de
    shì
    ……
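
A minimal sketch of how such a phoneme list could be turned into a lookup table, with a check for the required `<S>` / `</S>` header. The helper names are illustrative and this is not the project's TextFeaturizer:

```python
def build_token_index(tokens):
    """Map each token to its position, checking the <S>, </S> header."""
    tokens = [t.strip() for t in tokens if t.strip()]
    if tokens[:2] != ["<S>", "</S>"]:
        raise ValueError("phoneme list must begin with <S> and </S>")
    return {token: index for index, token in enumerate(tokens)}


def encode(tokens, token_to_index):
    """Convert a token sequence to ids; unknown tokens raise KeyError."""
    return [token_to_index[t] for t in tokens]
```

With the list above, `<S>` maps to 0 and `</S>` to 1, and a custom conversion function only needs to emit tokens that appear in the list.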

References

This project draws on the following excellent projects:

https://github.com/usimarit/TiramisuASR

https://github.com/noahchalifour/warp-transducer

https://github.com/PaddlePaddle/DeepSpeech

https://github.com/baidu-research/warp-ctc

Licence

You are welcome to use this project for academic research, commercial product development, and similar purposes; commercial and non-commercial use alike are unrestricted.

However, trading this project itself as a commodity is prohibited.

Overall, almost all models here are licensed under Apache 2.0, for all countries in the world.


tensorflowasr's Issues

About using the Tester for large-batch testing

Can I directly download one of your trained models, such as ConformerCTC(S), and use it for testing?
I used your trained ConformerCTC(S) with a prepared test_list, and running python eval_am.py --data_config ./asr/configs/am_data.yml --model_config ./asr/configs/ConformerS.yml produces the following error:
outputs = call_fn(inputs, *args, **kwargs)
TypeError: call() got multiple values for argument 'training'
How can I solve this?

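How do I run inference on CPU only?

The requirements mention tensorflow-gpu, and there is no separate explanation of how to run without a GPU. How can I run on CPU only? Thanks.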
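
One common way to force CPU execution, independent of this project, is to hide the GPUs through the CUDA_VISIBLE_DEVICES environment variable before TensorFlow is imported. A minimal sketch; it has not been checked against this repository's scripts:

```python
import os

# Hide all CUDA devices from this process. This must run before
# TensorFlow is imported, otherwise the GPUs are already initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# import tensorflow as tf  # import only after the variable is set
```

The same effect can be had from the shell by setting the variable for a single command, e.g. `CUDA_VISIBLE_DEVICES=-1 python ./test_asr.py`.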

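Language model training is too slow

I plan to train a language model with TensorflowASR on about 10 million sentences. I am using the default train_lm.py and have downloaded the BERT model as instructed.

At the current speed, one epoch would take more than a hundred days.
I am using a single 3090Ti GPU.

How to load the pretrained model

How do I load the pretrained model, and what should the directory structure look like?

About the 2020/12/1 update: the automatic punctuation language model

Hello, the link to the automatic punctuation language model open-sourced on 2020/12/1, which you mentioned in this post, is no longer valid. Could you please provide a new link? Thank you, and sorry for the trouble ☺️

bert_feature_loss in LM

What is the motivation for setting bert_feature_loss in the language model?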

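Streaming recognition

I want to build a streaming recognition demo that captures microphone data in real time. My idea is to buffer the incoming microphone data and send it for recognition at a fixed interval (say 0.5 s), then clear the buffer once the end of an utterance is detected. Is this reasonable, and is there a better approach?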
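
The buffering scheme described in this question can be sketched as follows. `recognize` is a hypothetical callback that takes the buffered samples and returns text, and end-of-utterance detection (for example by a VAD) is assumed to happen outside the class:

```python
class StreamingBuffer:
    """Accumulate microphone samples and decode the utterance so far at a fixed interval."""

    def __init__(self, recognize, sample_rate=16000, interval_s=0.5):
        self.recognize = recognize  # hypothetical callback: list of samples -> result
        self.interval = int(sample_rate * interval_s)
        self.samples = []   # audio of the current utterance
        self.pending = 0    # samples received since the last decode

    def push(self, chunk):
        """Add new samples; return a partial result once an interval has elapsed."""
        self.samples.extend(chunk)
        self.pending += len(chunk)
        if self.pending < self.interval:
            return None
        self.pending = 0
        return self.recognize(self.samples)

    def end_utterance(self):
        """Decode what is buffered one last time, then clear the cache."""
        result = self.recognize(self.samples) if self.samples else None
        self.samples = []
        self.pending = 0
        return result
```

Note that this sketch re-decodes the whole utterance at every interval, so its cost grows with utterance length; the chunk-based streaming models described in the README are designed to avoid that by processing audio chunk by chunk.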

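Off topic: the group QR code has expired; could you post it again?

I would like to join the group to chat. Could you post a new QR code?

mel_layer may have a problem

Hello,

Based on your framework I built a TDNNTransducer network with 1-D dilated convolutions (dilated Conv1D). When I tried to train it, I got the following error:

indices[1] = [1 , 776, 22] does not index into shape [4, 776, 23]

The error is raised here: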
last_grads_blank = -1 * tf.scatter_nd( tf.concat([tf.reshape(tf.range(batch_size, dtype=tf.int32), shape=[batch_size, 1]), indices], axis=1), tf.ones(batch_size, dtype=tf.float32), [batch_size, input_max_len, target_max_len])

Intermediate values printed:

input_max_len = 776

logit_length = [398, 777, 378, 777]

indices = 
[[0 397 12]
 [1 776 22]
 [2 377 12]
 [3 776 22]]

[batch_size, input_max_len, target_max_len]  = [4, 776, 23]

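tf.scatter_nd() requires the maximum value in each column of indices to be smaller than the corresponding entry of [batch_size, input_max_len, target_max_len]; otherwise it raises the index error shown above.

As for why the two can be equal (in the example the index is 776, the same as input_max_len): logit_length shows that the logits computed by the network have size 777 along that dimension, whereas the given size is 776. The logits dimension should actually be 776, so that the index is computed as 776 - 1 = 775 and the condition is satisfied.

Here input_max_len is computed when the batch is generated, using the traditional method of extracting acoustic features, whereas with mel_layer the raw audio is fed in directly. In other words, the traditional feature extraction and mel_layer processing of raw audio produce different numbers of feature frames: mel_layer produces one extra frame.

After setting use_mel_layer to False, training ran normally, which confirms my diagnosis.

Also, the error does not appear immediately; it appears after a dozen or so batches have been computed normally.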
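
One general way two feature pipelines can disagree on length is a difference in padding convention. The sketch below compares frame counts with and without librosa-style center padding, using a 25 ms window and 10 ms hop at 16 kHz; the rounding-up in `reduced_length` is an assumption, and whether any of this is the actual cause of the mismatch reported here has not been confirmed against the code:

```python
def frames_without_padding(num_samples, win_length, hop_length):
    """Frames when only full windows inside the signal are kept."""
    if num_samples < win_length:
        return 0
    return 1 + (num_samples - win_length) // hop_length


def frames_with_center_padding(num_samples, hop_length):
    """Frames when the signal is padded so that windows are centered."""
    return 1 + num_samples // hop_length


def reduced_length(num_frames, reduction_factor):
    """Length after subsampling, rounding up (an assumption)."""
    return -(-num_frames // reduction_factor)


if __name__ == "__main__":
    n = 16000  # one second of 16 kHz audio
    a = frames_without_padding(n, win_length=400, hop_length=160)
    b = frames_with_center_padding(n, hop_length=160)
    print(a, b, reduced_length(a, 4), reduced_length(b, 4))
```

Under these assumptions a one-second clip yields 98 versus 101 frames, which become 25 versus 26 after a reduction factor of 4. Only some clip lengths produce a difference after reduction, which would be consistent with an error that appears only after several batches.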

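Could you provide a complete test example?

Thank you for this excellent work.
Could you provide a complete test example?
As a beginner, I have no idea where to start with this code to get from audio to output text.
Thanks.

Some audio fails to be recognized

Hi, first of all, thumbs up for the project.
I am using the C++ inference. Why are some audio files recognized successfully while others produce no result after running, with no error either? Is it a problem with the audio?

Warning when running train_am.py

I get the following warning when running the command below. Has anyone else seen it?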
python train_am.py --data_config ./configs/am_data.yml --model_config ./configs/conformer.yml

WARNING:tensorflow:7 out of the last 9 calls to <function MultiHeadAttention.call at 0x7f2b902fb840> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
2020-11-17 10:08:31,362 - tensorflow - WARNING - 7 out of the last 9 calls to <function MultiHeadAttention.call at 0x7f2b902fb840> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.

Fix package names in requirements.txt

Hi,

I get an error when running pip install -r requirements.txt.

The package names should be keras-bert and tensorflow-addons.

The program stops running after successfully saving a checkpoint, but the GPU is still occupied

Environment: ubuntu16, tensorflow2.2
Dataset: aishell1
Config file: conformer.yml model_config.name=ConformerCTC\ConformerLAS
Command: python train_am.py --data_config ./configs/am_data.yml --model_config ./configs/conformer.yml

2020-11-16 15:36:09,744 - numba.core.ssa - DEBUG - on stmt: $478.2 = cast(value=$const478.1)
2020-11-16 15:36:09,744 - numba.core.ssa - DEBUG - on stmt: return $478.2
INFO:tensorflow:batch_all_reduce: 203 all-reduces with algorithm = nccl, num_packs = 1
2020-11-16 15:36:18,367 - tensorflow - INFO - batch_all_reduce: 203 all-reduces with algorithm = nccl, num_packs = 1
[Train] [Epoch 123/2] |▋ | 500/14016 [34:39<12:23:38, 3.30s/batch, Successfully Saved Checkpoint]

Has anyone else run into this problem?
Does anyone know the cause?

transducer data process

Hello,
I want to train an RNN-Transducer. Where should prepand_blank() of TextFeaturizer be used? Thank you.

Mask for MultiHeadAttention in Conformer?

Hi,
I see that the MultiHeadAttention in Conformer needs a mask. Is there code that generates the mask? Thank you.

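Available memory keeps dropping while training the AM model, down to a few hundred MB

Training the AM model
Data: aishell 1
am_data.yml
conformerS.yml

Has anyone run into this problem?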

Loss fluctuates when training a re-implemented Conformer with CTC loss on a Chinese dataset

Hi, thanks for your project. I re-implemented Conformer for TF 1.15. When I train it with CTC loss on a 1000-hour Chinese audio dataset, the loss fluctuates and does not decline.
The conv subsampling is a 3-layer conv2d with 144 filters, kernel_size=3, and reduction_factor=4.
The Conformer is like BERT-base but with T5-style relative position encoding.
The optimizer is Adam with weight decay; the parameters are the BERT-base defaults.

Could you help me with this?

Incompatible shapes in lm_runner

tensorflow.python.framework.errors_impl.InvalidArgumentError: 4 root error(s) found.
  (0) Invalid argument:  Incompatible shapes: [1,51,768] vs. [1,52,768]
         [[node replica_2/sub (defined at ./trainer/lm_runners.py:100) ]]
         [[Adamax/concat_8/_2086]]
  (1) Invalid argument:  Incompatible shapes: [1,51,768] vs. [1,52,768]
         [[node replica_2/sub (defined at ./trainer/lm_runners.py:100) ]]
         [[replica_2/Cast_5/_1460]]
  (2) Invalid argument:  Incompatible shapes: [1,51,768] vs. [1,52,768]
         [[node replica_2/sub (defined at ./trainer/lm_runners.py:100) ]]
         [[replica_1/transformer/decoder/sequential_5/dense_42/Tensordot/Prod/_1302]]
  (3) Invalid argument:  Incompatible shapes: [1,51,768] vs. [1,52,768]
         [[node replica_2/sub (defined at ./trainer/lm_runners.py:100) ]]

This corresponds to def bert_feature_loss(self, real, pred):

My current workaround is to skip the step whenever real and pred have different shapes.

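Required files are missing from the dict directory

pinyin.txt and lm_tokens.txt are missing from the dict directory; I hope you can add them.

Problem converting the punctuation prediction model to TFLite

Hello, I tried to convert the H5 model file you shared on Baidu Pan, but the conversion reports that the model config is missing. Could you save an H5 file that contains both the model graph structure and the weights? Thank you.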

ValueError: No model config found in the file at models/model_0.h5.

tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed

tensorflow==2.2.0
tensorflow-addons==0.10.0
keras-bert>=0.81.0

These are the versions of the installed packages. When I run "python train_am.py", the error "tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed" occurs.
Thanks for your help.

Can I use it on a Windows system?

I want to use it on a Windows system. Is there anything I need to pay attention to?

Fix model input error

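After updating the code, I got the following error when trying to train a transducer model:

 Cannot convert a list containing a tensor of dtype <dtype: 'int32'> to <dtype: 'float32'>

After some analysis I found that the problem is here:

The expected input signature is
call(self, features, predicted=None, training=False)

features is a tensor holding the signal or the features, and predicted is the label tensor.

The input actually given to the model is
logits = self.model([wavs, pred_inp], training=True)
logits = self.model([features, pred_inp], training=True)

Here features becomes a list, which causes an error when mel_layer applies a transpose to the tensor. The square brackets should be removed.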

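Request: add a command-word model or an example tutorial

Almost every speech library nowadays feels like a training library that is hard to use directly, which does not live up to the name.
I hope there can be a separate library dedicated to specific speech recognition functions (functions, not an academically oriented trainer): VAD and analysis functions that take audio as the argument and return whether it contains speech, together with the recognition result.
The library's functions could be: speech to command/phrase, speech to phoneme, speech to sentence.

What I need most right now is speech to phoneme, for use on a microcontroller. The trained library should be as small as possible, no larger than 1 MB, and the recognition library needs to support C/C++.
For reference, the ESP32 MCU from Espressif (Shanghai) has only 4 MB of storage and less than 500 KB of RAM, yet the official speech recognition can do speech to phoneme; it is not open source, however.

Hello, would you consider adding silero-vad to the project?

Digits do not seem to be recognized; how can this be improved?

For example, for the audio "2020年12月12号", the recognized result is: "年。月。号。/S"

What is causing this error?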

Traceback (most recent call last):
File "run-test.py", line 82, in
asr = ASR(am_config, lm_config,punc_config)
File "run-test.py", line 12, in init
self.lm = LM(lm_config,punc_config)
File "/home/w/ASR/LMmodel/trm_lm.py", line 16, in init
self.am_featurizer = TextFeaturizer(config['am_token'])
File "/home/w/ASR/utils/text_featurizers.py", line 78, in init
self.stop = self.endid()
File "/home/w/ASR/utils/text_featurizers.py", line 90, in endid
return self.token_to_index['']
KeyError: ''

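Training loss does not decrease

Hello, I trained the conformerS model on the AISHELL-1 dataset for 20 epochs, but the loss never decreases and the accuracy is 0; training seems to have no effect.
All configs are the project defaults; only the data paths were changed.

data.config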

speech_config:
mel_layer_type: Melspectrogram #Spectrogram/Melspectrogram/leaf
mel_layer_trainable: False #leaf support train
add_wav_info: False
sample_rate: 16000
frame_ms: 25
stride_ms: 10
num_feature_bins: 80
reduction_factor: 4 #should keep the same with model_config, DS2 : time_reduction_factor *= s[0] for s in 'conv_strides'
train_list: '/remote-home/jzhan/Datasets/AISHELL-1/train/transcripts.txt'
eval_list: '/remote-home/jzhan/Datasets/AISHELL-1/dev/transcripts.txt'
wav_max_duration: 30 # s
only_chinese: True
streaming: False
streaming_bucket: 0.5 #s
pinyin_map: './asr/configs/dict/pinyin2phone.map'
inp_config:
vocabulary: './asr/configs/dict/pinyin.txt'
blank_at_zero: False
beam_width: 1
tar_config:
vocabulary: './asr/configs/dict/lm_tokens.txt'
blank_at_zero: False
beam_width: 1
augments_config:
noise:
active: False
sample_rate: 16000
SNR: [0,15]
noises: './noise'
masking:
active: False
zone: (0.1,0.9)
mask_ratio: 0.3
mask_with_noise: False
pitch:
active: False
zone: (0.0,1.0)
sample_rate: 16000
factor: (-1,3)
speed:
active: False
factor: (0.9,1.2)
hz:
active: False
optimizer_config:
lr: 0.001
warmup_steps: 10000
beta1: 0.9
beta2: 0.98
epsilon: 0.000001
running_config:
batch_size: 32
train_steps_per_batches: 10
eval_steps_per_batches: 10
num_epochs: 20
outdir: './models'
log_interval_steps: 300
eval_interval_steps: 500
save_interval_steps: 500

conformerS.config

model_config:
name: OfflineConformerCTC
dmodel: 144
reduction_factor: 4
num_blocks: 13
head_size: 36
num_heads: 4
kernel_size: 32
fc_factor: 0.5
dropout: 0.1
ctcdecoder_num_blocks: 1
ctcdecoder_kernel_size: 32
ctcdecoder_fc_factor: 0.5
ctcdecoder_dropout: 0.1
translator_num_blocks: 2
translator_kernel_size: 32
translator_fc_factor: 0.5
translator_dropout: 0.1

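Training results

Could you update the WeChat group? I would like to join the discussion.

Question about the relationship between train_loss, ctc_loss and translate_loss

Hello, I looked at how the loss is computed in the code.
My understanding is that ctc_loss is the loss of the encoder + decoder, translate_loss is the loss of the translator, and train_loss = ctc_loss + translate_loss * 2, with train_loss then used to train the whole model end to end. Is that right?
However, the losses printed during training do not follow this rule. For example, at one point train_loss=210.327, ctc_loss=3.709, translate_loss=1.432. Is there something wrong with my understanding? I would appreciate an explanation. Thanks!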
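
Plugging the quoted numbers into the formula from the question makes the gap concrete. This only checks the arithmetic of the questioner's hypothesis; it does not inspect how the project actually computes or logs its losses:

```python
ctc_loss = 3.709
translate_loss = 1.432
reported_train_loss = 210.327

# Value implied by the hypothesis train_loss = ctc_loss + 2 * translate_loss.
expected_train_loss = ctc_loss + 2 * translate_loss

# How many times larger the reported value is than the implied one.
ratio = reported_train_loss / expected_train_loss

print(f"expected {expected_train_loss:.3f}, reported {reported_train_loss:.3f}")
print(f"ratio {ratio:.2f}")
```

The hypothesis gives 6.573, and the reported value is roughly 32 times larger. The config quoted in an earlier issue uses batch_size: 32, so one possible (unverified) reading is that train_loss is summed over the batch while the two component losses are reported as per-sample averages.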

No reference in the code

Did you copy the code from my repository https://github.com/usimarit/TiramisuASR?
I looked into the code and it looks like the code I wrote.
It would be nice if you could cite the sources from which you learned or took source code, or any other knowledge used in this repo. Other authors would appreciate it.

train failed

(tf2) root@adminer-X10SRA:~/debug# python train_am.py
2020-11-03 14:07:11.171313: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-11-03 14:07:11.213020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:02:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.635GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2020-11-03 14:07:11.213266: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-03 14:07:11.214429: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-11-03 14:07:11.215292: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-11-03 14:07:11.215560: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-11-03 14:07:11.216746: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-11-03 14:07:11.217707: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-11-03 14:07:11.220552: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-03 14:07:11.222021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-11-03 14:07:11,222 - root - INFO - valid gpus:1
2020-11-03 14:07:11.236938: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-11-03 14:07:11.243922: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2500075000 Hz
2020-11-03 14:07:11.244808: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fb304000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-03 14:07:11.244827: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-11-03 14:07:11.386453: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5642b12f8f90 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-03 14:07:11.386519: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-11-03 14:07:11.388405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:02:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.635GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2020-11-03 14:07:11.388517: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-03 14:07:11.388569: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-11-03 14:07:11.388614: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-11-03 14:07:11.388660: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-11-03 14:07:11.388705: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-11-03 14:07:11.388751: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-11-03 14:07:11.388798: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-03 14:07:11.392179: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-11-03 14:07:11.392285: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-03 14:07:11.395987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-03 14:07:11.396028: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-11-03 14:07:11.396044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-11-03 14:07:11.399441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9620 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5)
not found state file
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
2020-11-03 14:07:12,898 - tensorflow - INFO - Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
2020-11-03 14:07:12.948908: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-03 14:07:13.979204: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:14,960 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:14,961 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,066 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,067 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,169 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,169 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,257 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,258 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,346 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
2020-11-03 14:07:15,346 - tensorflow - INFO - Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[Train] [Epoch 1/2] | | 0/216 [00:00<?, ?batch/s]

def generator(self,train=True):
    while 1:
        x, wavs,bert_feature, input_length, words_label, words_label_length, phone_label, phone_label_length, py_label, py_label_length, txt_label, txt_label_length=self.generate(train)

        guide_matrix = self.guided_attention(input_length, txt_label_length, np.max(input_length),
                                             txt_label_length.max())
        yield x, wavs, bert_feature,input_length, words_label, words_label_length, phone_label, phone_label_length, py_label, py_label_length, txt_label, txt_label_length,guide_matrix

No new data is produced.

Problem training the language model

I tried to train a language model with this project.
I modified configs/lm_data.yml

train_list: './common.all.1w'
eval_list: './common.all.1w'
...

bert:
  config_json: './LMmodel/bert/bert_config.json'
  bert_ckpt: './LMmodel/bert/bert_model.ckpt'
  bert_vocab: './LMmodel/bert/vocab.txt'

After that, running python train_lm.py always fails. Here is the error log:

2020-12-22 23:56:06,255 - root - INFO - start training language model
Traceback (most recent call last):
  File "train_lm.py", line 90, in <module>
    train.train()
  File "train_lm.py", line 45, in train
    self.runner.set_datasets(train, test)
  File "/home/user/TensorflowASR/trainer/base_runners.py", line 164, in set_datasets
    self.train_datasets=self.strategy.experimental_distribute_dataset(train)
  File "/home/user/miniconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 805, in experimental_distribute_dataset
    return self._extended._experimental_distribute_dataset(dataset)  # pylint: disable=protected-access
  File "/home/user/miniconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 638, in _experimental_distribute_dataset
    return input_lib.get_distributed_dataset(
  File "/home/user/miniconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/distribute/input_lib.py", line 84, in
get_distributed_dataset
    return DistributedDataset(
  File "/home/user/miniconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/distribute/input_lib.py", line 659, in __init__
    with ops.colocate_with(dataset._variant_tensor):
AttributeError: 'generator' object has no attribute '_variant_tensor'

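I tried both the CPU and GPU builds of tf2.2.0 and got the same error log.

About the VAD training set

Hello! Could you share the link for training the VAD model?

The LSTM does a lot of redundant computation during transducer decoding

In the perform_greedy method of the Transducer class, the pred_net inside the loop uses an LSTM for decoding, and every iteration runs the LSTM over the entire decoded sequence. In fact only the last step needs to be decoded to decide whether to output 0 or blank, which would save a lot of time when predicting long sequences.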
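
The saving described here comes from carrying the recurrent state forward instead of recomputing it from the start of the sequence. A toy illustration that counts recurrent steps; `toy_cell` is a stand-in for one LSTM step and has nothing to do with the project's actual pred_net:

```python
def toy_cell(state, token):
    """Stand-in for one recurrent step: (state, token) -> new state."""
    return (state * 31 + token + 1) % 1_000_003


def full_recompute(tokens):
    """Re-run the cell over the whole prefix each time a token is appended."""
    steps = 0
    state = 0
    for end in range(1, len(tokens) + 1):
        state = 0
        for token in tokens[:end]:
            state = toy_cell(state, token)
            steps += 1
    return state, steps


def cached_state(tokens):
    """Carry the state forward so each new token costs a single step."""
    steps = 0
    state = 0
    for token in tokens:
        state = toy_cell(state, token)
        steps += 1
    return state, steps
```

Both functions end in the same state, but for a sequence of n tokens the first performs n(n+1)/2 cell evaluations and the second only n. In a real decoder the cached state would be the LSTM's hidden and cell states.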

Random "generator" data-fetching error and how to handle it

Hello,

I have a small question.

I am training on CPU, Linux 16.04, with the data of a few speakers from aishell_1 (2100 audio clips, used to validate the code), training ConformerTransducer with all other parameters at their defaults.

2020-10-05 10:28:11,241 - root - INFO - trainer resume failed
[Train] [Epoch 1/2] |                    | 7/2096 [00:36<2:07:57,  3.68s/batch, transducer_loss=373.089]
WARNING:tensorflow:5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f911405de60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
2020-10-05 10:28:47,972 - tensorflow - WARNING - 5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f911405de60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
WARNING:tensorflow:5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f910c7b83b0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
2020-10-05 10:28:48,185 - tensorflow - WARNING - 5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f910c7b83b0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
WARNING:tensorflow:5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f910c7648c0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
...
[Train] [Epoch 1/2] |████▊               | 500/2096 [09:30<26:07,  1.02batch/s, Successfully Saved Checkpoint]
...
[Train] [Epoch 1/2] |█████▏              | 547/2096 [10:39<23:14,  1.11batch/s, transducer_loss=85.972]
...
ValueError: `generator` yielded an element of shape (0,) where an element of shape (None, None, 80, 1) was expected.

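The error appeared at step 547, but it does not occur at a fixed step; it occurs randomly.

After studying the internal data processing, I found that the data processing script applies the following filter: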

if len(data) < 400:
    continue
elif len(data) > self.speech_featurizer.sample_rate * 7:
    continue

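In other words, audio (sampled at 16 kHz) shorter than 25 ms or longer than 7 s is discarded. When every clip in a batch is longer than 7 s, all of them are discarded, the generator produces None, and the error above results.

The fix is simple: increase the number 7.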
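
The failure mode can be reproduced without any audio by applying the same length thresholds to a list of clip lengths. A minimal sketch; `keep_clip` and `filter_batch` are illustrative names and only mirror the two conditions quoted above:

```python
def keep_clip(num_samples, sample_rate=16000, min_samples=400, max_seconds=7):
    """Mirror of the quoted filter: drop clips that are too short or too long."""
    return min_samples <= num_samples <= sample_rate * max_seconds


def filter_batch(clip_lengths, **limits):
    """Return the clip lengths that survive the filter."""
    return [n for n in clip_lengths if keep_clip(n, **limits)]


if __name__ == "__main__":
    batch = [16000 * 8, 16000 * 9]  # two clips of 8 s and 9 s
    print(filter_batch(batch))                  # every clip is dropped
    print(filter_batch(batch, max_seconds=10))  # both clips survive
```

At 16 kHz, 400 samples correspond to 25 ms and 7 s to 112000 samples, so a batch made only of clips longer than 7 s comes out empty, which matches the shape (0,) in the error message.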

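So here is my question: I can understand discarding data shorter than 25 ms, but why discard data longer than 7 s as well? Is it because audio longer than 7 s degrades recognition quality?

When processing the AISHELL2 dataset, did you discard all audio longer than 7 s?

Also, what is going on with the tensorflow - WARNING messages? I do not understand them.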