It's not clear to me why the first position of the time sequence is always predicted as the fir

Question for CTC decoding about crnn (closed, 8 comments)

bgshih commented on August 22, 2024

Question for CTC decoding

from crnn.

Comments (8)

githubharald commented on August 22, 2024

@Qhuangcn: see my repo CTCDecoder.

bgshih commented on August 22, 2024

@chengzhanzhan I have also observed the same phenomenon. Perhaps it's not a bug but a strategy learned by the LSTM. Notice that the CTC loss function does not care about the positions of individual letters, only about the final recognition result.
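The point about the loss ignoring letter positions can be made concrete: the CTC probability of a label is a sum over every frame-wise alignment that collapses to that label, so moving a peak to a different time step does not by itself change the loss. Below is a minimal pure-Python sketch with toy probabilities and `-` as the blank; the function names are illustrative and are not crnn's code. It compares the standard forward recursion with brute-force enumeration of all alignments.

```python
import itertools
import math

BLANK = "-"  # CTC blank symbol used in this toy example


def collapse(path: str) -> str:
    """Map a frame-wise path to a label: merge repeats, then drop blanks."""
    out, prev = [], None
    for ch in path:
        if ch != prev and ch != BLANK:
            out.append(ch)
        prev = ch
    return "".join(out)


def brute_force_prob(probs, alphabet, label):
    """Sum P(path) over every path of length T that collapses to `label`."""
    total = 0.0
    for path in itertools.product(alphabet, repeat=len(probs)):
        if collapse("".join(path)) == label:
            p = 1.0
            for t, ch in enumerate(path):
                p *= probs[t][ch]
            total += p
    return total


def ctc_forward_prob(probs, label):
    """Same quantity computed with the CTC forward (alpha) recursion."""
    ext = [BLANK]
    for ch in label:
        ext += [ch, BLANK]  # blank-extended label, length 2L+1
    S, T = len(ext), len(probs)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][ext[0]]  # start with a blank...
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]  # ...or with the first character
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]  # stay on the same symbol
            if s >= 1:
                a += alpha[t - 1][s - 1]  # advance by one position
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]  # skip the blank between distinct chars
            alpha[t][s] = a * probs[t][ext[s]]
    # a valid path ends on the last character or on the trailing blank
    return alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)


alphabet = [BLANK, "a", "b"]
probs = [  # toy per-frame softmax outputs, T = 4
    {"-": 0.6, "a": 0.3, "b": 0.1},
    {"-": 0.2, "a": 0.7, "b": 0.1},
    {"-": 0.5, "a": 0.1, "b": 0.4},
    {"-": 0.3, "a": 0.1, "b": 0.6},
]
print(collapse("a--b"), collapse("-abb"), collapse("aab-"))  # -> ab ab ab
p = ctc_forward_prob(probs, "ab")
print(f"P(ab): forward={p:.6f} brute_force={brute_force_prob(probs, alphabet, 'ab'):.6f}")
print(f"CTC loss = {-math.log(p):.4f}")
```

Since the loss is the negative log of this summed probability, it only rewards putting mass on *some* valid alignment; nothing in it pins a character to the time step where it appears in the image.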

chengzhanzhan commented on August 22, 2024

@bgshih I tested 38k images for this phenomenon. Because of the convolution operations, a section of consecutive pixel columns is represented by a shorter interval, which may result in the first position including some information about the first character.

Then I conducted experiments with two settings:

setting 1: 3 * 256 * 32 (channel * width * height) image ==> convolution layers ==> 128 * 65 * 1 ==> LSTM layers ==> CTC. Here, each column (of 65) represents several columns of the original image.

setting 2: 3 * 256 * 32 (channel * width * height) image ==> convolution layers ==> 128 * 225 * 1 ==> LSTM layers ==> CTC, to make sure that each step (of 225) represents a single column of the image.
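To put rough numbers on the two settings, one can idealize the convolution stack as a uniform downsampling of the image width (an assumption: real receptive fields are wider than the stride and overlap). Under that idealization each of the 65 steps in setting 1 spans about 256/65 ≈ 3.9 pixel columns, and each of the 225 steps in setting 2 about 1.1 columns. The helper `step_to_columns` below is hypothetical and only illustrates this arithmetic.

```python
from __future__ import annotations


def step_to_columns(t: int, num_steps: int, image_width: int) -> tuple[float, float]:
    """Approximate [start, end) range of input pixel columns for time step t.

    Idealization: the conv stack is treated as a uniform downsampling of the
    width by image_width / num_steps. Real receptive fields are wider and
    overlap, so neighbouring steps also "see" each other's pixels.
    """
    if not 0 <= t < num_steps:
        raise ValueError(f"t must be in [0, {num_steps}), got {t}")
    stride = image_width / num_steps
    return t * stride, (t + 1) * stride


for num_steps in (65, 225):  # setting 1 and setting 2
    start, end = step_to_columns(0, num_steps, 256)
    print(f"T={num_steps}: ~{256 / num_steps:.2f} columns per step; "
          f"step 0 covers [{start:.2f}, {end:.2f})")
```

The stride alone understates what a step "sees": because each step's receptive field is typically wider than its stride, a step can carry information from neighbouring columns as well.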

An example is as follows:

Given image: (image not preserved)

setting 1 result:
3-----------22----6----22-----b----------------------------------
This shows that the first position is already predicted as the character 3.

setting 2 result:
-----33------------------2----------------------6-------------------22-------------------------b---------------------------------------------------------------------------------------------------------------------------------
This shows that the first several positions are blanks.

In CTC, each peak represents the corresponding character, and all other positions represent blank. Therefore, I think the convolution strategy causes this phenomenon, and CTC can indicate the probable position of each character.
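The "peak" reading can be sketched with greedy (best-path) decoding: merge consecutive repeats, drop blanks, and record the time step where each surviving character's run starts. This minimal sketch treats the two frame strings from the settings above as per-frame argmax labels with `-` as the blank; `decode_with_positions` is a hypothetical helper, not part of crnn.

```python
BLANK = "-"


def decode_with_positions(frames: str, blank: str = BLANK):
    """Greedy CTC decoding that also records, for each emitted character,
    the first time step of its run of identical non-blank frames."""
    emitted = []  # (character, first time step of its run)
    prev = None
    for t, ch in enumerate(frames):
        if ch != prev and ch != blank:
            emitted.append((ch, t))
        prev = ch
    label = "".join(ch for ch, _ in emitted)
    return label, emitted


# Frame-wise outputs quoted in the comment ("-" is blank).
setting1 = "3-----------22----6----22-----b----------------------------------"
setting2 = (
    "-----33------------------2----------------------6-------------------22"
    "-------------------------b"
    "---------------------------------------------------------------------------------------------------------------------------------"
)

for name, frames in (("setting 1", setting1), ("setting 2", setting2)):
    label, emitted = decode_with_positions(frames)
    steps = [f"{ch}@{t}/{len(frames)}" for ch, t in emitted]
    print(name, label, steps)
```

Both settings decode to the same label `3262b`; only the alignment differs. The first character starts at step 0 in setting 1 and at step 5 in setting 2, so the difference lies entirely in where the peaks sit, not in the recognized text.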

githubharald commented on August 22, 2024

@chengzhanzhan just want to let you know that I've observed the same phenomenon with a different model, a different framework (TensorFlow), and different input data (IAM). The last character is always predicted at the last time step of the RNN output sequence. All other character predictions are roughly aligned with the characters in the image. Look at the attached image (not preserved), where the probability of the characters over time is plotted in the bottom-most graph: the "i", the "l"s and the "t"s are aligned with the image, while the final "e" is not.

sheirving commented on August 22, 2024

@githubharald
Thank you for your experiment. Did you figure out the reason for this phenomenon?

githubharald commented on August 22, 2024

no.

JimmyJuan commented on August 22, 2024

@githubharald Hello, can you tell me how to calculate the character-level probability in CTC beam search? Is there a tutorial on implementing that in TensorFlow?

Thank you.
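Character-level probabilities are not uniquely defined in CTC, because a character's probability mass is spread over many alignments. One simple heuristic, swapped in here for illustration, uses best-path decoding rather than beam search: decode greedily and score each character by the highest per-frame probability within its run of frames. The sketch assumes the network output is available as one dict of softmax probabilities per time step; `char_confidences` is a hypothetical helper, not a TensorFlow API.

```python
from typing import Dict, List, Tuple

BLANK = "-"


def char_confidences(
    probs: List[Dict[str, float]], blank: str = BLANK
) -> List[Tuple[str, int, float]]:
    """Best-path decode a T x C probability table and attach a heuristic
    confidence to each emitted character: the maximum per-frame probability
    within the run of frames that produced it."""
    results = []
    prev = None
    for t, frame in enumerate(probs):
        ch = max(frame, key=frame.get)  # per-frame argmax
        p = frame[ch]
        if ch != blank:
            if ch != prev:
                results.append([ch, t, p])  # a new character starts at step t
            else:
                results[-1][2] = max(results[-1][2], p)  # same run: keep the peak
        prev = ch
    return [tuple(r) for r in results]


probs = [  # toy per-frame softmax outputs
    {"-": 0.1, "a": 0.8, "b": 0.1},
    {"-": 0.2, "a": 0.7, "b": 0.1},
    {"-": 0.9, "a": 0.05, "b": 0.05},
    {"-": 0.3, "a": 0.1, "b": 0.6},
]
print(char_confidences(probs))  # [('a', 0, 0.8), ('b', 3, 0.6)]
```

TensorFlow's `tf.nn.ctc_beam_search_decoder` returns a log probability per decoded sequence rather than per character, which is why some per-character heuristic like this one (or a custom beam search that tracks it) is needed.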

luvwinnie commented on August 22, 2024

Has anyone researched this CTC phenomenon? My model shows the same problem. For example, when a blank image is input, it sometimes decodes something instead of outputting only blank. This is a serious problem for OCR applications.
Let's say I have 4 fields to recognize, and 1 field on the document is optional to fill in; the model should recognize a blank field image as blank output. If not, the recognition accuracy for the whole document will be affected.
