CTC Decoding Algorithms with Language Model

This repository implements Connectionist Temporal Classification (CTC) decoding algorithms as Python scripts. A minimalistic language model (LM) is included.

Algorithms

  • Best Path Decoding: takes best label per time-step, then removes repeated labels and blanks from this path. File: BestPath.py [1]
  • Prefix Search Decoding: best-first search through tree of labellings. File: PrefixSearch.py [1]
  • Beam Search Decoding: iteratively searches for best labelling, uses a character-level LM. File: BeamSearch.py [2]
  • Token Passing: searches for most probable word sequence, words are restricted to the words from a dictionary. Can be extended to use a word-level LM. File: TokenPassing.py [1]
  • Word Beam Search: for a TensorFlow implementation, see the CTCWordBeamSearch repository
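
As a rough illustration of the first and third items in the list, the following sketch implements best path decoding and a plain CTC beam search without a language model. It is not the code from BestPath.py or BeamSearch.py. The function names, the `beam_width` parameter and the convention that the blank is the last column are assumptions made for this example. The input matrix is the 2-time-step "Mini example" described in the Notes.

```python
from collections import defaultdict


def best_path(mat, classes):
    """Take the most probable label per time-step, collapse repeats, drop blanks."""
    blank = len(classes)  # assumption: the blank label is the last column
    path = [max(range(len(row)), key=row.__getitem__) for row in mat]
    out = []
    prev = None
    for label in path:
        if label != prev and label != blank:
            out.append(classes[label])
        prev = label
    return "".join(out)


def beam_search(mat, classes, beam_width=25):
    """CTC beam search without LM; keeps the beam_width best labellings per step."""
    blank = len(classes)  # assumption: the blank label is the last column
    # labelling -> [prob. of paths ending in blank, prob. of paths ending in non-blank]
    beams = {(): [1.0, 0.0]}
    for row in mat:
        new = defaultdict(lambda: [0.0, 0.0])
        best = sorted(beams, key=lambda b: sum(beams[b]), reverse=True)[:beam_width]
        for lab in best:
            p_b, p_nb = beams[lab]
            # keep the labelling: repeat its last label or emit a blank
            if lab:
                new[lab][1] += p_nb * row[lab[-1]]
            new[lab][0] += (p_b + p_nb) * row[blank]
            # extend the labelling by one label
            for c in range(len(classes)):
                if lab and lab[-1] == c:
                    # a repeated label only counts as new after a blank
                    new[lab + (c,)][1] += p_b * row[c]
                else:
                    new[lab + (c,)][1] += (p_b + p_nb) * row[c]
        beams = new
    best_lab = max(beams, key=lambda b: sum(beams[b]))
    return "".join(classes[c] for c in best_lab)


# Mini example: columns are a, b, blank; one row per time-step
MINI = [[0.4, 0.0, 0.6], [0.4, 0.0, 0.6]]
print(repr(best_path(MINI, "ab")))    # ''
print(repr(beam_search(MINI, "ab")))  # 'a'
```

Best path decoding only follows the single most probable path, while beam search sums the probabilities of all paths that collapse to the same labelling, which is why the two results differ on this input.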

Choosing the right algorithm

This paper compares beam search decoding and token passing. It also gives suggestions when to use best path decoding, beam search decoding and token passing.

Run

python main.py

Expected results:

=====Mini example=====                                   
TARGET       : "a"                                       
BEST PATH    : ""                                        
PREFIX SEARCH: "a"                                       
BEAM SEARCH  : "a"                                       
TOKEN        : "a"                                       
=====Real example=====                                   
TARGET        : "the fake friend of the family, like the"
BEST PATH     : "the fak friend of the fomly hae tC"     
PREFIX SEARCH : "the fak friend of the fomcly hae tC"    
BEAM SEARCH   : "the fak friend of the fomcly hae tC"
BEAM SEARCH LM: "the fake friend of the family, fake th"
TOKEN         : "the fake friend of the family fake the" 

Data files

  • data/rnnOutput.csv: output of RNN layer (softmax not yet applied), which contains 100 time-steps and 80 label scores per time-step
  • data/corpus.txt: the text from which the language model is generated. In this case it is just the scrambled ground-truth text.
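
Because the scores in rnnOutput.csv have not passed through a softmax yet, they need to be normalised before decoding. The sketch below shows one way to do this with the standard library only. The file layout (one time-step per row) and the `;` delimiter are assumptions, as are the names `softmax` and `load_rnn_output`; adjust them to the actual file.

```python
import csv
import io
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def load_rnn_output(fileobj, delimiter=";"):
    """Read one time-step per row and apply softmax to each row.

    The delimiter is an assumption. Empty fields (for example from a
    trailing delimiter) and empty rows are skipped.
    """
    mat = []
    for row in csv.reader(fileobj, delimiter=delimiter):
        scores = [float(v) for v in row if v.strip()]
        if scores:
            mat.append(softmax(scores))
    return mat


# Tiny stand-in for the real file: 2 time-steps, 3 label scores each
demo = io.StringIO("0;0;0\n1;2;3\n")
mat = load_rnn_output(demo)
print(len(mat), len(mat[0]))  # 2 3
```

For the real data the call would look like `with open("data/rnnOutput.csv", newline="") as f: mat = load_rnn_output(f)`, which should give 100 rows of 80 probabilities if the layout assumption holds.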

Notes

These Python scripts are intended for tests and experiments. For production use, I implemented the functions in C++ (for performance reasons) and then added them to TensorFlow as custom ops.

The ground-truth text is "the fake friend of the family, like the" and is a sample from the IAM Handwriting Database [4]. RNN output was generated by a partially trained TensorFlow model which was inspired by CRNN [3]. The visualisation below shows the input image and the RNN output as a matrix with 100 time-steps and 80 classes (the last one being the blank label). Each column sums to 1 and each entry shows the probability of seeing a label at a given time-step.

[Figure: input image and RNN output matrix]

Illustration of the "Mini example" test case: the RNN output is a table containing 2 time-steps (t0 and t1) and 3 labels (a, b and - as the special blank label). Best path decoding takes the most probable label per time-step, which gives the path "--" and therefore the recognized text "" with probability 0.6*0.6=0.36. Beam search and prefix search calculate the probability of labellings instead of single paths. For the labelling "a" they sum over the paths (see thin lines) "-a", "a-" and "aa", which gives the probability 0.4*0.4+2*0.6*0.4=0.64. The only path (see dashed line) which gives "" still has probability 0.36, therefore "a" is the result returned by beam search and prefix search.
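
The arithmetic above can be checked by brute force: enumerate every path, collapse it to its labelling, and sum the path probabilities per labelling. This is only a sketch for tiny inputs, since the number of paths grows exponentially with the number of time-steps. The matrix entries are inferred from the probabilities quoted above (a=0.4, b=0, blank=0.6 at both time-steps), and the names `collapse` and `labelling_probs` are made up for this example.

```python
from itertools import product


def collapse(path, blank="-"):
    """Map a path to its labelling: merge repeated labels, then drop blanks."""
    out = []
    prev = None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)


def labelling_probs(mat, labels, blank="-"):
    """Sum the probabilities of all paths that collapse to the same labelling."""
    probs = {}
    for path in product(range(len(labels)), repeat=len(mat)):
        p = 1.0
        for t, idx in enumerate(path):
            p *= mat[t][idx]
        text = collapse([labels[i] for i in path], blank)
        probs[text] = probs.get(text, 0.0) + p
    return probs


labels = "ab-"  # column order: a, b, blank
mat = [[0.4, 0.0, 0.6],  # t0 (values inferred from the text above)
       [0.4, 0.0, 0.6]]  # t1
probs = labelling_probs(mat, labels)
print(round(probs[""], 2), round(probs["a"], 2))  # 0.36 0.64
```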

[Figure: paths through the RNN output of the "Mini example"]

References

[1] Graves - Supervised sequence labelling with recurrent neural networks

[2] Hwang - Character-level incremental speech recognition with recurrent neural networks

[3] https://github.com/bgshih/crnn

[4] http://www.fki.inf.unibe.ch/databases/iam-handwriting-database
