Comments (8)

githubharald commented on August 22, 2024

@Qhuangcn: see my repo CTCDecoder.

from crnn.

bgshih commented on August 22, 2024

@chengzhanzhan I have also observed the same phenomenon. Perhaps it's not a bug, but a strategy learned by the LSTM. Notice that the CTC loss function does not care about the positions of individual letters, only about the final recognition result.
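To make the point about the CTC loss concrete: CTC scores a label by summing over every frame-level path that collapses to it, where collapsing means merging repeated symbols and then removing blanks. The minimal Python sketch below implements only that collapse step (it is not a CTC loss); the function name `ctc_collapse` is hypothetical, and `-` is used as the blank to match the decoded strings posted later in this thread. Paths that place the same characters at very different time steps all collapse to the same label, so nothing in the loss itself pins a character to a particular position.

```python
BLANK = "-"  # blank symbol, matching the notation of the decoded paths in this thread


def ctc_collapse(path: str, blank: str = BLANK) -> str:
    """Map a frame-level CTC path to its label: merge repeats, then drop blanks."""
    out = []
    prev = None
    for sym in path:
        # A symbol is emitted only when it differs from the previous frame
        # and is not the blank; a blank between repeats keeps them separate.
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)


# Hypothetical paths for the label "ab": early, late and spread-out alignments.
for p in ["ab----", "----ab", "-a--bb", "aa-b--"]:
    print(p, "->", ctc_collapse(p))
```

All four paths print `ab`, which is why the loss alone gives the network no reason to prefer one alignment over another.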

chengzhanzhan commented on August 22, 2024

@bgshih I tested 38k images for this phenomenon. Because of the convolution operations, a section of consecutive pixels is represented by a shorter interval, which may result in the first position containing some information about the first character.

Then I conducted experiments with two settings:

setting 1: 3 * 256 * 32 (channel * width * height) image ==> convolution layers ==> 128 * 65 * 1 ==> LSTM layers ==> CTC. Here, each column (of 65) represents several columns of the original image.

setting 2: 3 * 256 * 32 (channel * width * height) image ==> convolution layers ==> 128 * 225 * 1 ==> LSTM layers ==> CTC, so that each of the 225 steps corresponds to (roughly) a single column of the image.
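The shrinking described above can be quantified with the standard output-size and receptive-field formulas. The sketch below assumes a CRNN-style convolutional stack modeled on the crnn.pytorch port (3x3 convolutions with padding 1, two 2x2 max-pools with stride 2, two 2x2 max-pools with stride (2, 1) and padding (0, 1), and a final 2x2 convolution without padding). The layers actually used in the experiments above are not given, so the layer list, and the names `LAYERS` and `width_stats`, are assumptions for illustration. Only the width axis is tracked.

```python
from typing import List, Tuple

# (name, kernel, stride, padding) along the WIDTH axis only.
# Assumed CRNN-style stack; not taken from the experiments above.
LAYERS: List[Tuple[str, int, int, int]] = [
    ("conv0", 3, 1, 1), ("pool0", 2, 2, 0),
    ("conv1", 3, 1, 1), ("pool1", 2, 2, 0),
    ("conv2", 3, 1, 1), ("conv3", 3, 1, 1), ("pool2", 2, 1, 1),
    ("conv4", 3, 1, 1), ("conv5", 3, 1, 1), ("pool3", 2, 1, 1),
    ("conv6", 2, 1, 0),
]


def width_stats(input_width: int, layers=LAYERS) -> Tuple[int, int, int]:
    """Return (output width, nominal receptive field, jump) along the width axis."""
    width, rf, jump = input_width, 1, 1
    for _name, k, s, p in layers:
        width = (width + 2 * p - k) // s + 1  # standard conv/pool output size
        rf += (k - 1) * jump                  # nominal receptive field (ignores border padding)
        jump *= s                             # spacing of adjacent outputs, in input columns
    return width, rf, jump


steps, rf, jump = width_stats(256)
print(f"time steps={steps}, receptive field={rf} columns, stride={jump} columns")
```

Under these assumed layers, a 256-column input yields 65 time steps (the same count as setting 1), and each step nominally sees 54 input columns while adjacent steps are only 4 columns apart, so neighbouring steps overlap heavily.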

An example is as follows:

Given image:
121457

setting 1 result:
3-----------22----6----22-----b----------------------------------
This shows that the first time step already outputs character 3.

setting 2 result:
-----33------------------2----------------------6-------------------22-------------------------b---------------------------------------------------------------------------------------------------------------------------------
This shows that the first several time steps are blanks.

In CTC, each peak represents the corresponding character, and every other position represents blank. Therefore, I think the convolution strategy causes this phenomenon, and that CTC can indicate the probable position of each character.
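The argument above reads character positions off the peaks of the frame-wise output. A minimal sketch of that reading, assuming the two decoded strings above are best-path (per-frame argmax) outputs with one symbol per time step and `-` as the blank: every new run of a non-blank symbol counts as a peak, and its first frame is reported as the character's position. The function name `peak_positions` is hypothetical.

```python
from typing import List, Tuple


def peak_positions(path: str, blank: str = "-") -> List[Tuple[str, int]]:
    """Return (symbol, first time step) for every run of a non-blank symbol."""
    peaks = []
    prev = None
    for t, sym in enumerate(path):
        # A new peak starts when a non-blank symbol differs from the previous frame.
        if sym != blank and sym != prev:
            peaks.append((sym, t))
        prev = sym
    return peaks


# Decoded paths copied from the two settings above.
setting1 = "3-----------22----6----22-----b----------------------------------"
setting2 = "-----33------------------2----------------------6-------------------22-------------------------b---------------------------------------------------------------------------------------------------------------------------------"

for name, path in [("setting 1", setting1), ("setting 2", setting2)]:
    print(name, peak_positions(path))
```

Both paths contain the same sequence of peaks (3, 2, 6, 2, b), but in setting 1 the first peak sits at time step 0, while in setting 2 it appears only after several blank frames.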

githubharald commented on August 22, 2024

@chengzhanzhan just want to let you know that I've observed the same phenomenon with a different model, a different framework (TensorFlow) and different input data (IAM). The last character is always predicted at the last time step of the RNN output sequence, while all other character predictions are roughly aligned with the characters in the image. Look at the attached image, where the probability of the characters over time is plotted in the bottom-most graph: the "i", the "l"s and the "t"s are aligned with the image, while the final "e" is not.

[Image: rnn_output]

sheirving commented on August 22, 2024

@githubharald
Thank you for your experiment. Did you figure out what causes this phenomenon?

githubharald commented on August 22, 2024

no.

JimmyJuan commented on August 22, 2024

@githubharald Hello, can you tell me how to calculate character-level probabilities in CTC beam search? Is there a tutorial on implementing that in TensorFlow?

Thank you.

luvwinnie commented on August 22, 2024

Has anyone researched this CTC phenomenon? My model shows the same problem. For example, when a blank image is input, it sometimes decodes something instead of outputting only blanks. This is a serious problem for OCR applications.
Let's say I have 4 fields to recognize, and 1 field on the document is optional to fill in; a blank image should then be recognized as blank output. If not, the recognition accuracy within a document will be affected.
