
githubharald / ctcdecoder

802 stars · 25 watchers · 179 forks · 1.01 MB

Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.

Home Page: https://towardsdatascience.com/3797e43a86c

License: MIT License

Languages: Python 91.37%, C 8.63%
Topics: token-passing, beam-search, ctc, language-model, best-path, prefix-search, handwriting-recognition, speech-recognition, recurrent-neural-networks, loss

ctcdecoder's People

Contributors

a-sneddon, chwick, githubharald, thomasdelteil


ctcdecoder's Issues

beam_search with a beam_width=1

Hi!
Could you please tell me why beam_search with beam_width equal to 1 does not give the same result as best_path?

For example:

import numpy as np
from ctc_decoder import best_path, beam_search

chars = 'ab'
mat = np.array([[0.8, 0, 0.2], [0.4, 0.0, 0.6], [0.8, 0, 0.2]])

print(f'Best path: "{best_path(mat, chars)}"')
print(f'Beam search: "{beam_search(mat, chars, beam_width=1)}"')

Gives:
Best path: "aa"
Beam search: "a"

Thanks!
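The two algorithms optimize different objectives: best path takes the per-frame argmax (here "a", blank, "a", which collapses to "aa"), while beam search scores each labeling by the sum over all alignments that collapse to it, so even a width-1 beam can prefer a different answer. A minimal exhaustive check (assuming the blank is the last index, as in this repo's examples) makes this concrete:

```python
import itertools
import numpy as np

chars = 'ab'
mat = np.array([[0.8, 0, 0.2], [0.4, 0.0, 0.6], [0.8, 0, 0.2]])
blank = len(chars)  # blank assumed to be the last index

def collapse(path):
    # standard CTC collapse: merge repeated indices, then drop blanks
    out, prev = [], None
    for idx in path:
        if idx != blank and idx != prev:
            out.append(chars[idx])
        prev = idx
    return ''.join(out)

# sum alignment probabilities per collapsed labeling
totals = {}
for path in itertools.product(range(len(chars) + 1), repeat=mat.shape[0]):
    p = float(np.prod([mat[t, c] for t, c in enumerate(path)]))
    lab = collapse(path)
    totals[lab] = totals.get(lab, 0.0) + p
```

Here "aa" comes from the single alignment (a, blank, a) with probability 0.8 * 0.6 * 0.8 = 0.384, but "a" collects 0.592 from six alignments, so summing over alignments favors "a".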

No module named 'editdistance'

I installed editdistance successfully from the terminal with:

pip install editdistance

Then installing requirements.txt failed with:

Could not find a version that satisfies the requirement pkg-resources==0.0.0 (from -r requirements.txt (line 5)) (from versions: )
No matching distribution found for pkg-resources==0.0.0 (from -r requirements.txt (line 5))

There was no error message about editdistance, but running main.py still fails with:

ModuleNotFoundError: No module named 'editdistance'

beam search

Hi! I have a question. The result of best_path is as I expect, but when I use beam_search the result is empty (all blank). What could be the cause, and how can I fix it?

beam_search.py doesn't support batch data

def ctcBeamSearch(mat, classes, lm, beamWidth=25):
    blankIdx = len(classes)
    maxT, maxC = mat.shape

The matrix only has two dimensions (length, char_size); there is no batch dimension.
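Until the library supports batches directly, a thin wrapper that decodes each sample independently is a workable sketch (decode_batch and decode_fn are illustrative names, not this repo's API):

```python
import numpy as np

def decode_batch(mats, decode_fn):
    # mats: array of shape (batch, time, classes); decode each sample on its own
    return [decode_fn(mat) for mat in mats]

# demo with a stand-in per-sample decoder (per-frame argmax)
batch = np.zeros((2, 3, 4))
out = decode_batch(batch, lambda m: m.argmax(axis=1).tolist())
```

With this repo, decode_fn could be something like lambda m: beam_search(m, chars, beam_width=25), at the cost of running the decoder once per sample.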

About blankIdx

I have a question: how can I use a blankIdx value of zero?
Thank you so much in advance.

Handling duplicate paths in Beam Search

Hey! I am wondering if you could help me figure something out. In [1] you mentioned that summing up the probabilities Pr, Pr+ and Pr- leads to better results. I tried it in my implementation based on [2], and results do improve, but some of my probabilities become positive (I am working with log probabilities, so values should lie in (-inf, 0]). Did you experience this phenomenon while implementing the sum in your algorithm?

[1] Stackexchange CTC
[2] CTC implementation github

PS: Sorry if this is not the place to ask this question, but I have no other way to reach you.
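One likely culprit, offered as an assumption rather than a diagnosis of the code in [2]: if the path sets being summed are not disjoint (e.g. the same beam appears twice before merging), the same probability mass is counted more than once, and the log of the sum can then exceed 0 even though every operand is a valid log probability. A small sketch of the difference:

```python
import numpy as np

# merging disjoint blank/non-blank masses (they sum to <= 1)
# keeps a valid log probability
log_pb, log_pnb = np.log(0.3), np.log(0.5)
merged = np.logaddexp(log_pb, log_pnb)  # log(0.8), still < 0

# double-counting the same mass pushes the total past 1,
# so its log becomes positive
overcounted = np.logaddexp(np.log(0.7), np.log(0.7))  # log(1.4) > 0
```

So a positive log probability is usually a sign that duplicate beams/paths were summed without first being merged, not a numerical artifact of logaddexp itself.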

TensorFlow op

Great code!!
Can you please post the C++ implementation for TensorFlow that you mentioned?

thanks

Test custom image and word

Hello, I'm trying to test a custom word and image, but it doesn't work for me. Can you tell me how to use a specific word and line for the test?

CTC Token Passing

Hi!

I'm trying to use the Token Passing algorithm for decoding a model trained on IAM-DB. I'm using a language model built from the LOB corpus; however, there are situations in which the word passed to the wordToLabelSeq method contains a character that is not mapped to any class, e.g. '>'. What do you advise in these situations?

Thanks in advance,
Dayvid.
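One pragmatic option (a sketch, not this repo's API): before building the word lists for token passing, drop every LM word that contains a character outside the model's class set, since such words can never be emitted anyway. The helper name below is hypothetical:

```python
def filter_vocab(words, charset):
    # keep only words whose every character maps to a known model class
    charset = set(charset)
    return [w for w in words if all(c in charset for c in w)]

kept = filter_vocab(['hello', 'a>b', 'world'], 'abcdefghijklmnopqrstuvwxyz')
# -> ['hello', 'world']
```

An alternative is to normalize the corpus first (e.g. strip or map characters like '>') so the vocabulary and the class set agree by construction.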

module 'pyopencl' has no attribute 'enqueue_write_buffer'

I am having a difficult time using the GPU. It runs fine without the GPU, but with it I get this error:

=====Line example (GPU)=====
Traceback (most recent call last):
File "main.py", line 147, in
testLineExampleGPU()
File "main.py", line 122, in testLineExampleGPU
resBatch = BestPathCL.ctcBestPathCL(batch, classes, clWrapper)
File "/home/ubuntu/handwrite/CTCDecoder/src/BestPathCL.py", line 109, in ctcBestPathCL
labelStrBatch = clWrapper.compute(batch)
File "/home/ubuntu/handwrite/CTCDecoder/src/BestPathCL.py", line 84, in compute
cl.enqueue_write_buffer(self.queue, self.batchBuf, batch.astype(np.float32), is_blocking=False)
AttributeError: module 'pyopencl' has no attribute 'enqueue_write_buffer'

Difference between this repo and the CTC decoder in TensorFlow

Thanks for your code.
I tested my sequences with both the TensorFlow CTC decoder and this repo, and I always get different results. TensorFlow is always right; this repo sometimes returns the right result and sometimes the wrong one.
Have you ever compared these two implementations?

Language model at word level

Hi, did you add a word-level language model for beam search?

Currently it's easy to add a character-level bi-gram, but I find it much harder to add a word-level model. I tried the CTC token passing algorithm, but it's just way too slow compared to beam search.

In beamsearch.py, why is last.norm() only applied in the last step?

Great work, thanks to the author.

I have a question about beam search + LM: why is last.norm() only applied in the last step, and not at every time step? The longer the sequence, the smaller the LM score, so it should be compensated by length normalization. I think it should be normalized at every time step. Is that right? Thanks in advance.

Support for K-Gram LM where k > 2?

I wanted to experiment with LMs other than the bigram, i.e. with the probability of the last character conditioned on all previous characters. Any suggestion on how to approach extending the current codebase?
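As a sketch of the general direction (the function names are illustrative, not this repo's API), a character-level n-gram generalizes the bigram by conditioning on the previous n-1 characters instead of just one:

```python
from collections import defaultdict

def train_char_ngram(text, n=3):
    # count how often each character follows each (n-1)-character context
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(text) - n + 1):
        ctx, ch = text[i:i + n - 1], text[i + n - 1]
        counts[ctx][ch] += 1
    return counts

def char_prob(counts, ctx, ch):
    # relative frequency of ch given its context; 0 if the context is unseen
    total = sum(counts[ctx].values())
    return counts[ctx][ch] / total if total else 0.0

lm = train_char_ngram('abab', n=3)
print(char_prob(lm, 'ab', 'a'))  # -> 1.0
```

A real integration would also need smoothing for unseen contexts (e.g. backoff to the bigram) and would plug char_prob into the beam-scoring step where the bigram probability is currently applied.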

Different beam search output using different blankIdx value

Hi, I have a question regarding your beam search implementation.

In your ctcBeamSearch method, you set blankIdx to the length of classes (in this case, the known letters and symbols). But some other beam-search implementations set it to zero.

I tested this using your example, and indeed both the decoded result and its distance from the ground truth differ (I'm using CER and WER):

=====Line example (using blankIdx = len(classes))=====
TARGET                  : "the fake friend of the family, like the"
BEAM SEARCH             : "the fak friend of the fomcly hae tC" CER/WER: 0.25714/0.15000
BEAM SEARCH LM          : "the fake friend of the family, lie th" CER/WER: 0.05405/0.03226
=====Line example (using blankIdx = 0)=====
TARGET                  : "the fake friend of the family, like the"
BEAM SEARCH             : "the faetker friend of ther foarmnacly,  harse. tHhC." CER/WER: 0.33333/0.22368
BEAM SEARCH LM          : "the fake friend of the family, like the " CER/WER: 0.00000/0.00000

So in which cases is blankIdx not zero? Which value is suitable for beam search decoding?
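The blank index is just a convention that the decoder must share with the network it decodes: TensorFlow's CTC ops place the blank at the last index, while some implementations (e.g. PyTorch's CTC loss by default) use index 0. Feeding a blank-first matrix to a blank-last decoder misreads every column, which matches the garbled output above. If your network was trained with the blank at column 0, one sketch of a fix is to rotate the columns before decoding:

```python
import numpy as np

def blank_first_to_last(mat):
    # move column 0 (the blank) to the last column, shifting the others left
    return np.roll(mat, -1, axis=1)

rotated = blank_first_to_last(np.array([[0.5, 0.3, 0.2]]))
# -> [[0.3, 0.2, 0.5]]
```

Neither value is "more suitable" for beam search as such; what matters is that blankIdx in the decoder matches the index the network emits blanks on.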

Are BPE tokens supported?

My acoustic model outputs BPE tokens. Can I use lexicon search to get the correct words from BPE tokens, or beam search with an n-gram LM?
Thanks!

Best path decoding (negative values of logits)

Hello Sir,
I have tested the code below; it is quite interesting. However, best path search on my own data performs like this example:
TARGET : "Le trois Janvier mil neuf cent soixante dix,"
BEST PATH : "|Je|Je|trois|fanmier|mil|neuf|cont|soitante|dex|.||Je|trois|fanmier|mil|neuf|cont|soitante|dex|.||"

I would be grateful if you could help me solve this problem.
The logits are generated by a linear layer (tf.layers.dense).
Values in the logits matrix look like:

4.287187 -32.6091 -12.860022 -18.233511 -12.024508 -32.516006 -31.813993 -12.016912 -11.002839
7.621706 -39.008682 -17.869062 -20.652061 -18.614656 -38.91413 -39.586323 -17.21066 -14.14866
11.9552145 -41.16309 -21.07399 -20.94039 -23.124344 -41.045444 -39.15103 -22.799099 -15.869296
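Those rows are raw logits (unbounded, negative values), while the decoders in this page's examples are fed per-frame probabilities whose rows sum to 1. Assuming that mismatch is the cause, applying a row-wise softmax before decoding is a reasonable first step:

```python
import numpy as np

def softmax(logits):
    # numerically stable row-wise softmax: subtract the row max before exp
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[4.287187, -32.6091, -12.860022],
                   [7.621706, -39.008682, -17.869062]])
probs = softmax(logits)  # rows now sum to 1; pass probs to the decoder
```

In TensorFlow this corresponds to calling tf.nn.softmax on the dense layer's output (or omitting the activation only when the downstream op, like the CTC loss, applies it internally).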
