
githubharald / ctcdecoder

802 stars · 25 watchers · 179 forks · 1.01 MB

Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.

Home Page: https://towardsdatascience.com/3797e43a86c

License: MIT License

Languages: Python 91.37%, C 8.63%
Topics: token-passing, beam-search, ctc, language-model, best-path, prefix-search, handwriting-recognition, speech-recognition, recurrent-neural-networks, loss

ctcdecoder's People

Contributors

a-sneddon, chwick, githubharald, thomasdelteil


ctcdecoder's Issues

beam_search with a beam_width=1

Hi!
Could you please tell me why beam_search with beam_width equal to 1 does not give the same result as best_path?

For example:

import numpy as np
from ctc_decoder import best_path, beam_search

chars = 'ab'
mat = np.array([[0.8, 0, 0.2], [0.4, 0.0, 0.6], [0.8, 0, 0.2]])

print(f'Best path: "{best_path(mat, chars)}"')
print(f'Beam search: "{beam_search(mat, chars, beam_width=1)}"')

Gives:
Best path: "aa"
Beam search: "a"

Thanks!
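The two algorithms optimize different objectives: best path takes the per-frame argmax (here "a", blank, "a", which collapses to "aa"), while beam search scores each labeling by the sum over all alignments that collapse to it, so even a width-1 beam can prefer a different answer. A minimal exhaustive check (assuming the blank is the last index, as in this repo's examples) makes this concrete:

```python
import itertools
import numpy as np

chars = 'ab'
mat = np.array([[0.8, 0, 0.2], [0.4, 0.0, 0.6], [0.8, 0, 0.2]])
blank = len(chars)  # blank assumed to be the last index

def collapse(path):
    # standard CTC collapse: merge repeated indices, then drop blanks
    out, prev = [], None
    for idx in path:
        if idx != blank and idx != prev:
            out.append(chars[idx])
        prev = idx
    return ''.join(out)

# sum alignment probabilities per collapsed labeling
totals = {}
for path in itertools.product(range(len(chars) + 1), repeat=mat.shape[0]):
    p = float(np.prod([mat[t, c] for t, c in enumerate(path)]))
    lab = collapse(path)
    totals[lab] = totals.get(lab, 0.0) + p
```

Here "aa" comes from the single alignment (a, blank, a) with probability 0.8 * 0.6 * 0.8 = 0.384, but "a" collects 0.592 from six alignments, so summing over alignments favors "a".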

No module named 'editdistance'

I installed editdistance successfully from the terminal with:

pip install editdistance

Then installing requirements.txt failed with:

Could not find a version that satisfies the requirement pkg-resources==0.0.0 (from -r requirements.txt (line 5)) (from versions: )
No matching distribution found for pkg-resources==0.0.0 (from -r requirements.txt (line 5))

There was no error message about editdistance, but running main.py still fails with:

ModuleNotFoundError: No module named 'editdistance'

beam search

Hi! I have a question. The result of best_path is as I expect, but when I use beam_search the result is empty (all blank). What could be the cause, and how can I fix it?

beam_search.py doesn't support batch data

def ctcBeamSearch(mat, classes, lm, beamWidth=25):
    blankIdx = len(classes)
    maxT, maxC = mat.shape

The matrix only has two dimensions (length, char_size); there is no batch dimension.
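Until the library supports batches directly, a thin wrapper that decodes each sample independently is a workable sketch (decode_batch and decode_fn are illustrative names, not this repo's API):

```python
import numpy as np

def decode_batch(mats, decode_fn):
    # mats: array of shape (batch, time, classes); decode each sample on its own
    return [decode_fn(mat) for mat in mats]

# demo with a stand-in per-sample decoder (per-frame argmax)
batch = np.zeros((2, 3, 4))
out = decode_batch(batch, lambda m: m.argmax(axis=1).tolist())
```

With this repo, decode_fn could be something like lambda m: beam_search(m, chars, beam_width=25), at the cost of running the decoder once per sample.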

About blankIdx

I have a question: how can I use a blankIdx value of zero?
Thank you so much in advance.

Handling duplicate paths in Beam Search

Hey! I am wondering if you could help me figure something out. In [1] you mentioned that summing up the probabilities Pr, Pr+ and Pr- leads to better results. I tried it in my implementation based on [2], and results do improve, but some of my probabilities become positive (I am working with log probabilities, so values should lie in (-inf, 0]). Did you experience this phenomenon while implementing the sum in your algorithm?

[1] Stackexchange CTC
[2] CTC implementation github

PS: Sorry if this is not the place to ask this question, but I have no other way to reach you.
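One likely culprit, offered as an assumption rather than a diagnosis of the code in [2]: if the path sets being summed are not disjoint (e.g. the same beam appears twice before merging), the same probability mass is counted more than once, and the log of the sum can then exceed 0 even though every operand is a valid log probability. A small sketch of the difference:

```python
import numpy as np

# merging disjoint blank/non-blank masses (they sum to <= 1)
# keeps a valid log probability
log_pb, log_pnb = np.log(0.3), np.log(0.5)
merged = np.logaddexp(log_pb, log_pnb)  # log(0.8), still < 0

# double-counting the same mass pushes the total past 1,
# so its log becomes positive
overcounted = np.logaddexp(np.log(0.7), np.log(0.7))  # log(1.4) > 0
```

So a positive log probability is usually a sign that duplicate beams/paths were summed without first being merged, not a numerical artifact of logaddexp itself.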

TensorFlow op

Great code!!
Can you please post the C++ implementation for TensorFlow that you mentioned?

thanks

Test custom image and word

Hello, I'm trying to test a custom word and image, but it doesn't work for me. Can you tell me how to use a specific word and line for the test?

CTC Token Passing

Hi!

I'm trying to use the Token Passing algorithm for decoding a model trained on IAM-DB. I'm using a language model built from the LOB corpus; however, there are situations in which the word passed to the wordToLabelSeq method contains a character that is not mapped to any class, e.g. '>'. What do you advise in these situations?

Thanks in advance,
Dayvid.
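One pragmatic option (a sketch, not this repo's API): before building the word lists for token passing, drop every LM word that contains a character outside the model's class set, since such words can never be emitted anyway. The helper name below is hypothetical:

```python
def filter_vocab(words, charset):
    # keep only words whose every character maps to a known model class
    charset = set(charset)
    return [w for w in words if all(c in charset for c in w)]

kept = filter_vocab(['hello', 'a>b', 'world'], 'abcdefghijklmnopqrstuvwxyz')
# -> ['hello', 'world']
```

An alternative is to normalize the corpus first (e.g. strip or map characters like '>') so the vocabulary and the class set agree by construction.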

module 'pyopencl' has no attribute 'enqueue_write_buffer'

I am having a difficult time using the GPU. It runs fine without the GPU, but with it I get this error:

=====Line example (GPU)=====
Traceback (most recent call last):
File "main.py", line 147, in
testLineExampleGPU()
File "main.py", line 122, in testLineExampleGPU
resBatch = BestPathCL.ctcBestPathCL(batch, classes, clWrapper)
File "/home/ubuntu/handwrite/CTCDecoder/src/BestPathCL.py", line 109, in ctcBestPathCL
labelStrBatch = clWrapper.compute(batch)
File "/home/ubuntu/handwrite/CTCDecoder/src/BestPathCL.py", line 84, in compute
cl.enqueue_write_buffer(self.queue, self.batchBuf, batch.astype(np.float32), is_blocking=False)
AttributeError: module 'pyopencl' has no attribute 'enqueue_write_buffer'

Difference between this repo and the CTC decoder in TensorFlow

Thanks for your code.
I tested my sequences with both the TensorFlow CTC decoder and this repo, and I always get different results. TensorFlow is always right; this repo sometimes returns the right result and sometimes the wrong one.
Have you ever compared these two implementations?

Language model at word level

Hi, did you add a word-level language model for beam search?

Currently it's easy to add a character-level bi-gram, but I find it much harder to add a word-level model. I tried the CTC token passing algorithm, but it's just way too slow compared to beam search.

In beamsearch.py, why is last.norm() only applied in the last step?

Great work, thanks to the author.

I have a question about beam search + LM: why is last.norm() only applied in the last step, and not at every time step? The longer the sequence, the smaller the LM score, so it should be compensated by length normalization. I think it should be normalized at every time step. Is that right? Thanks in advance.

Support for K-Gram LM where k > 2?

I wanted to experiment with LMs other than the bigram, i.e. with the probability of the last character conditioned on all previous characters. Any suggestion on how to approach extending the current codebase?
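As a sketch of the general direction (the function names are illustrative, not this repo's API), a character-level n-gram generalizes the bigram by conditioning on the previous n-1 characters instead of just one:

```python
from collections import defaultdict

def train_char_ngram(text, n=3):
    # count how often each character follows each (n-1)-character context
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(text) - n + 1):
        ctx, ch = text[i:i + n - 1], text[i + n - 1]
        counts[ctx][ch] += 1
    return counts

def char_prob(counts, ctx, ch):
    # relative frequency of ch given its context; 0 if the context is unseen
    total = sum(counts[ctx].values())
    return counts[ctx][ch] / total if total else 0.0

lm = train_char_ngram('abab', n=3)
print(char_prob(lm, 'ab', 'a'))  # -> 1.0
```

A real integration would also need smoothing for unseen contexts (e.g. backoff to the bigram) and would plug char_prob into the beam-scoring step where the bigram probability is currently applied.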

Different beam search output using different blankIdx value

Hi, I have a question regarding your beam search implementation.

In your ctcBeamSearch method, you set blankIdx to the length of classes (in this case, the known letters and symbols). But some other beam-search implementations set it to zero.

I tested this using your example, and indeed both the decoded result and its distance from the ground truth differ (I'm using CER and WER):

=====Line example (using blankIdx = len(classes))=====
TARGET                  : "the fake friend of the family, like the"
BEAM SEARCH             : "the fak friend of the fomcly hae tC" CER/WER: 0.25714/0.15000
BEAM SEARCH LM          : "the fake friend of the family, lie th" CER/WER: 0.05405/0.03226
=====Line example (using blankIdx = 0)=====
TARGET                  : "the fake friend of the family, like the"
BEAM SEARCH             : "the faetker friend of ther foarmnacly,  harse. tHhC." CER/WER: 0.33333/0.22368
BEAM SEARCH LM          : "the fake friend of the family, like the " CER/WER: 0.00000/0.00000

So in which cases is blankIdx not zero? Which value is suitable for beam search decoding?
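The blank index is just a convention that the decoder must share with the network it decodes: TensorFlow's CTC ops place the blank at the last index, while some implementations (e.g. PyTorch's CTC loss by default) use index 0. Feeding a blank-first matrix to a blank-last decoder misreads every column, which matches the garbled output above. If your network was trained with the blank at column 0, one sketch of a fix is to rotate the columns before decoding:

```python
import numpy as np

def blank_first_to_last(mat):
    # move column 0 (the blank) to the last column, shifting the others left
    return np.roll(mat, -1, axis=1)

rotated = blank_first_to_last(np.array([[0.5, 0.3, 0.2]]))
# -> [[0.3, 0.2, 0.5]]
```

Neither value is "more suitable" for beam search as such; what matters is that blankIdx in the decoder matches the index the network emits blanks on.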

Are BPE tokens supported?

My acoustic model outputs BPE tokens. Can I use lexicon search to get the correct words from BPE tokens, or beam search with an n-gram LM?
Thanks!

Best path decoding (negative values of logits)

Hello Sir,
I have tested the code below; it is quite interesting. However, best path search on my own data performs like this example:
TARGET : "Le trois Janvier mil neuf cent soixante dix,"
BEST PATH : "|Je|Je|trois|fanmier|mil|neuf|cont|soitante|dex|.||Je|trois|fanmier|mil|neuf|cont|soitante|dex|.||"

I would be grateful if you could help me solve this problem.
The logits are generated by a linear layer (tf.layers.dense).
Values in the logits matrix look like:

4.287187 -32.6091 -12.860022 -18.233511 -12.024508 -32.516006 -31.813993 -12.016912 -11.002839
7.621706 -39.008682 -17.869062 -20.652061 -18.614656 -38.91413 -39.586323 -17.21066 -14.14866
11.9552145 -41.16309 -21.07399 -20.94039 -23.124344 -41.045444 -39.15103 -22.799099 -15.869296
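Those rows are raw logits (unbounded, negative values), while the decoders in this page's examples are fed per-frame probabilities whose rows sum to 1. Assuming that mismatch is the cause, applying a row-wise softmax before decoding is a reasonable first step:

```python
import numpy as np

def softmax(logits):
    # numerically stable row-wise softmax: subtract the row max before exp
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[4.287187, -32.6091, -12.860022],
                   [7.621706, -39.008682, -17.869062]])
probs = softmax(logits)  # rows now sum to 1; pass probs to the decoder
```

In TensorFlow this corresponds to calling tf.nn.softmax on the dense layer's output (or omitting the activation only when the downstream op, like the CTC loss, applies it internally).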
