kaituoxu / speech-transformer

A PyTorch implementation of Speech Transformer, an End-to-End ASR with Transformer network on Mandarin Chinese.
Line 64 of run.sh is . utils/parse_options.sh || exit 1;
but there is no such folder or file. How can I fix this?
When I train the model for the first time, train.log reports an error. At that point I have no trained model to load for continued training, and continue_from="" is at its default, empty value.
error: unrecognized arguments: --continue_from
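For what it's worth, "unrecognized arguments" from argparse usually means the script being run does not declare the flag at all. Below is a minimal sketch (names and defaults are illustrative, not necessarily the repo's exact definitions) of how such a flag is typically declared, with an empty-string default meaning "train from scratch":

```python
import argparse

# Illustrative sketch: declare --continue_from with an empty default so that
# passing no checkpoint means a fresh training run.
parser = argparse.ArgumentParser()
parser.add_argument('--continue_from', default='', type=str,
                    help='path of a checkpoint to resume from ("" = fresh start)')

args = parser.parse_args(['--continue_from', 'exp/final.pth.tar'])
```

If the flag is declared like this, an empty value is harmless; the error only appears when the parser in the installed script lacks the argument entirely.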
Hi,
sorry to bother you.
I want to ask: even though I changed the batch size to 2, it still says out of CUDA memory while training. Is that possible? My input dimension is only 23.
Please help me if you have time.
Thanks very much.
Hi, I ran into some problems when running stage 4 (decoder):
run.pl: job failed, log is in exp/............/decoder_test_beam5_nbest1_ml100/decode.log
The log shows:
UnicodeEncodeError: 'ascii' codec can't encode character '\u4e00' in position 29: ordinal not in range(128)
No such file or directory: exp/................/decode_test_beam5_nbest1_ml100/data.json
I am using Python 3.6.8, and the default encoding prints as utf-8.
How can I solve this?
Thank you
Hi, could you please offer a demo for this project?
Hello, in data.py the return includes an ilen value. Shouldn't it be the length of each xs before padding? But the lengths obtained here are already all the same.
I tested with batch_size=4:
ilens = np.array([x.shape[0] for x in xs])
ilens = torch.from_numpy(ilens) # [235, 235, 235, 235] [250, 250, 250, 250]...
It seems that earlier, in
batch = load_inputs_and_targets(batch[0], LFR_m=LFR_m, LFR_n=LFR_n)
xs, ys = batch
xs has already been padded at this point?
This problem with the ilen values causes the later enc_dec_attn_mask to be all False...
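To illustrate the concern, here is a toy sketch (made-up shapes, not the repo's code): ilens must be computed from the raw, variable-length utterances before padding. If it is computed after padding, every entry equals the padded length and the masks derived from it degenerate:

```python
import numpy as np

# Two variable-length feature matrices (frames x feature_dim); values are dummies.
xs = [np.zeros((235, 23)), np.zeros((180, 23))]

# Record the true lengths BEFORE padding.
ilens = np.array([x.shape[0] for x in xs])        # [235, 180]

# Pad to the longest utterance in the batch.
max_len = int(ilens.max())
padded = np.zeros((len(xs), max_len, xs[0].shape[1]))
for i, x in enumerate(xs):
    padded[i, : x.shape[0]] = x

# A pad mask built from the pre-padding lengths: True marks padded steps.
pad_mask = np.arange(max_len)[None, :] >= ilens[:, None]
```

If ilens were read off the padded tensor instead, it would be [235, 235] here, pad_mask would be all False, and any attention mask derived from it would treat padding as real frames.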
When decoding (stage 4), I hit the following problem: UnicodeEncodeError: 'ascii' codec can't encode character '\u751a' in position 6: ordinal not in range(128). I traced it to this line in decoder.py: print('hypo: ' + ''.join([char_list[int(x)] for x in hyp['yseq'][0, 1:]])).
The advice online is to set the utf-8 encoding, but Python 3.6 already uses it by default, so this problem puzzles me.
Hello, is this an implementation of the code from reference [1]? I didn't see any conv operations in it.
Hi, could you please share the reference paper with me? Looking forward to your reply, thanks!
Hello, Mr. Xu:
I am a postgraduate student of speech recognition at Jiangnan University. I have long heard of your name. Could you add me on WeChat to facilitate future academic exchanges (my WeChat ID is 'zw76859420')?
By Wei Zhang
May I ask: is the final output pinyin or Chinese characters? @kaituoxu
Hi, Xu.
I want to use the model to test my own datasets in some real situations.
I have wav files but no transcripts.
I want to generate some sentences.
Can you tell me what I should do?
Thanks!
Excuse me! Is there a pre-trained model to share? I couldn't get good results.
Also, how long a sentence can the model recognize? Is there a length restriction on the model?
Sorry for the bother! Hoping for a reply!
(1/7176) decoding BAC009S0764W0121
remeined hypothes: 5
Traceback (most recent call last):
  File "/data/gjj/Speech-Transformer/egs/aishell/../../src/bin/recognize.py", line 70, in <module>
    recognize(args)
  File "/data/gjj/Speech-Transformer/egs/aishell/../../src/bin/recognize.py", line 59, in recognize
    nbest_hyps = model.recognize(input, input_length, char_list, args)
  File "/data/gjj/Speech-Transformer/src/transformer/transformer.py", line 46, in recognize
    args)
  File "/data/gjj/Speech-Transformer/src/transformer/decoder.py", line 222, in recognize_beam
    for x in hyp['yseq'][0, 1:]]))
UnicodeEncodeError: 'ascii' codec can't encode character '\u751a' in position 6: ordinal not in range(128)
Hello, I got this error after training, and I don't know what it means or how to solve it.
Traceback (most recent call last):
  File "../../src/bin/recognize.py", line 69, in <module>
    recognize(args)
  File "../../src/bin/recognize.py", line 58, in recognize
    nbest_hyps = model.recognize(input, input_length, char_list, args)
  File "/home/tt/Speech-Transformer/src/transformer/transformer.py", line 46, in recognize
    args)
  File "/home/tt/Speech-Transformer/src/transformer/decoder.py", line 222, in recognize_beam
    for x in hyp['yseq'][0, 1:]]))
UnicodeEncodeError: 'ascii' codec can't encode character '\u751a' in position 6: ordinal not in range(128)
Any solution?
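One common workaround, assuming the crash comes from an ASCII default locale in the shell that run.pl spawns: force UTF-8 on stdout, either by exporting PYTHONIOENCODING=utf-8 before running recognize.py, or by rewrapping stdout inside the script. A sketch of the in-script variant:

```python
import io
import sys

# If stdout is not already UTF-8 (e.g. an ASCII POSIX locale), rewrap it so
# that printing CJK characters such as '\u751a' no longer raises
# UnicodeEncodeError. Guarded so it is a no-op on already-UTF-8 streams.
enc = (getattr(sys.stdout, 'encoding', '') or '').lower().replace('-', '')
if enc != 'utf8' and hasattr(sys.stdout, 'buffer'):
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

print('hypo: ' + '\u751a')  # the character from the traceback
```

sys.getdefaultencoding() being utf-8 is a red herring here: it governs str/bytes conversion defaults, not the encoding of the stdout stream, which is taken from the locale.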
You could try adding an n-gram language model during decoding; I tried it and it gives some improvement.
Hi kaituo, thanks for sharing such a useful speech transformer.
I have run your code and successfully reproduced the reported results, but I have a question about non_pad_mask. I notice that the padded part is already masked when calculating the loss. Since the loss calculation is restricted, why do we also need to apply the mask at every step of the forward pass?
def forward(self, dec_input, enc_output, non_pad_mask=None, slf_attn_mask=None, dec_enc_attn_mask=None):
    dec_output, dec_slf_attn = self.slf_attn(
        dec_input, dec_input, dec_input, mask=slf_attn_mask)
    dec_output *= non_pad_mask
    dec_output, dec_enc_attn = self.enc_attn(
        dec_output, enc_output, enc_output, mask=dec_enc_attn_mask)
    dec_output *= non_pad_mask
    dec_output = self.pos_ffn(dec_output)
    dec_output *= non_pad_mask

loss = loss.masked_select(non_pad_mask).sum() / n_word
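A toy illustration of the quoted loss masking (made-up numbers): only non-pad positions contribute to the average. The per-layer non_pad_mask multiplications are presumably there in addition so that padded positions stay at zero and do not leak into later attention and feed-forward computations, even though the loss already ignores them.

```python
import torch

# Per-token losses for a batch of 2 sequences, padded to length 3.
per_token_loss = torch.tensor([[0.5, 0.7, 0.0],
                               [0.2, 0.0, 0.0]])

# True marks real tokens; False marks padding.
non_pad_mask = torch.tensor([[True, True, False],
                             [True, False, False]])

# Average only over real tokens, exactly as in the quoted line.
n_word = non_pad_mask.sum()
loss = per_token_loss.masked_select(non_pad_mask).sum() / n_word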
Hi, Xu. Thanks for your contribution to this excellent work.
However, in stage -3 of run.sh, I couldn't find where train.py's code is.
Can you tell me ?
Thank you !
Will the four flags be added to the softmax classification number (i.e., the output vocabulary size)?
Speech-Transformer is obviously coded in Python 3, yet some strange errors occurred, e.g.:
def __init__(self, *args, LFR_m=1, LFR_n=1, **kwargs):
SyntaxError: invalid syntax
How can I solve it?
Thanks in advance.
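The error above is consistent with the file being executed by Python 2: keyword-only arguments after *args (PEP 3102) are Python 3-only syntax, and Python 2 rejects the line with exactly this SyntaxError. A small self-contained sketch (class name is illustrative, not the repo's code):

```python
# Keyword-only arguments after *args are legal only in Python 3; running this
# class definition under Python 2 fails with "SyntaxError: invalid syntax".
class Loader(object):
    def __init__(self, *args, LFR_m=1, LFR_n=1, **kwargs):
        self.LFR_m = LFR_m
        self.LFR_n = LFR_n

loader = Loader(LFR_m=4, LFR_n=3)
```

So the fix is to make sure run.sh invokes a Python 3 interpreter for these scripts, not the system Python 2.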
I found that every feature Kaldi produced is passed through the function "build_LFR_features". Does it serve any special purpose? What is meant by "stacking frames and skipping frames"?
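As I understand it, "stacking frames and skipping frames" is the Low Frame Rate (LFR) trick: concatenate LFR_m consecutive frames into one super-frame, then move ahead LFR_n frames for the next one, shortening the input sequence roughly LFR_n-fold before it reaches the encoder's self-attention. A hedged sketch of the idea (not the repo's exact code; padding behavior at the tail is an assumption):

```python
import numpy as np

def build_lfr_features(inputs, m=4, n=3):
    """Stack m consecutive frames into one super-frame, then skip ahead n
    frames. inputs: (T, d) feature matrix; returns (ceil(T/n), m*d)."""
    T, d = inputs.shape
    out = []
    for i in range(0, T, n):
        frames = inputs[i:i + m]
        if frames.shape[0] < m:  # pad the tail by repeating the last frame
            pad = np.repeat(frames[-1:], m - frames.shape[0], axis=0)
            frames = np.concatenate([frames, pad], axis=0)
        out.append(frames.reshape(-1))
    return np.stack(out)
```

With m=4, n=3 a 10-frame, 2-dim input becomes a 4-frame, 8-dim sequence, which is why the encoder runs much faster on LFR features.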
Hello, I tried to run your code but it fails. Could you give me some help? The error is:
run.sh: line 64: utils/parse_options.sh: No such file or directory
The current scripts don't support multi-GPU. Has anyone modified them successfully? If so, please push the changes.
Hi, kaituo, I'm trying to train this network on LibriSpeech. The loss curve of epoch 1 shows that the model tends to saturate after the first few steps (there are approximately 4k iterations per epoch, and my loss drops from 4 to 3 after 100 iterations and then stays the same).
I have not made any changes to the model; the only change is using my own dataloader (for loading the LibriSpeech corpus). So I wonder: did you see the same loss-decline trend when training on the AISHELL corpus?
Hello, could you send me a copy of the paper? I can't download it from where I am. My email is [email protected]. Thanks!
Hello, I'd like to continue training from the model final.pth.tar saved after running 2 epochs. How should continue_from be set?
How can I solve this problem? Thanks a lot.
Hi kaituo,
Thank you very much for your nice code. I was wondering how long it takes to train?
I find that the loss on the thchs dataset is smaller than on the aishell dataset, yet the prediction results on the thchs test set are bad (it can hardly recognize anything). What's the reason? Can you explain? Thanks.
Can this code use batches for prediction?
When I run bash run.sh, I get an error because line 65 of run.sh references a path that was not provided. How should I fix this?
Hi, I found that stage 1 takes a long time. Is there something wrong with my operation? How long should this process take?
Hi, Your work is excellent.
Since the transformer is very difficult to train, would you be willing to provide a pre-trained model? Thank you very much!
Although adding a language model improves the results, it introduces a broken-path problem: some utterances cannot be decoded at all, while they decode fine without the language model. What could be the reason?
Hello, thanks for open-sourcing this. Stages 0, 1, and 2 of training all run fine, but why does stage 3 seem to use Python 2? After I changed the system's default python symlink to Python 3.7, I get:
Unknown option: --
usage: python3 [option] ... [-c cmd | -m mod | file | -] [arg] ...
What does your training environment look like?
Hi Kaituo,
I'm doing a very naive test. I selected one speaker from aishell (spkr = 2, #snt = 364, #wrd = 4,989) and used this speaker for training, dev, and test alike. I expected decent results because I'm overfitting the model. However, here are the results I got (#epoch = 50):
SPKR    | # Snt  # Wrd | Corr  Sub   Del  Ins  Err    S.Err |
Sum/Avg |   364   4989 |  0.2  95.9  3.9  3.9  103.7  100.0 |
I understand an NN needs much, much more data to work, but I wonder why the results are so poor for this toy example when we use the same data for train, dev, and test. Shouldn't we at least get better results?