sbillburg / CRNN-with-STN
Implement CRNN in Keras with a Spatial Transformer Network.
For a sample like 30807140_1102551758.jpg with the text 压抑的,十日关白而已, is the annotation format filename + text, or filename + the numeric indices corresponding to the text?
Could you provide an example?
Thanks.
You could read the paper A Multi-Object Rectified Attention Network for Scene Text Recognition at https://paperswithcode.com/paper/a-multi-object-rectified-attention-network.
The author introduces a training scheme called curriculum training: first train the CRNN alone, then train the STN alone, and finally train the STN and the CRNN together.
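A rough sketch of what such staged training could look like in Keras (the stn_ name prefix, train_gen, and the epoch counts are placeholders, not the author's actual code):

```python
# Hypothetical staged ("curriculum") training sketch in Keras. Assumes
# STN layers can be identified by an 'stn_' name prefix; adjust to the
# actual model. train_gen and the epoch counts are placeholders.

def set_trainable(model, stn, crnn):
    for layer in model.layers:
        layer.trainable = stn if layer.name.startswith('stn_') else crnn
    # recompile so the new trainable flags take effect
    model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='adam')

# Stage 1: train the CRNN only
set_trainable(model, stn=False, crnn=True)
model.fit_generator(train_gen, steps_per_epoch=1000, epochs=5)

# Stage 2: train the STN only
set_trainable(model, stn=True, crnn=False)
model.fit_generator(train_gen, steps_per_epoch=1000, epochs=5)

# Stage 3: fine-tune both jointly
set_trainable(model, stn=True, crnn=True)
model.fit_generator(train_gen, steps_per_epoch=1000, epochs=5)
```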
When I want to use the saved model to get predictions, it raises:
File "/home/sgnbx/Downloads/projects/CRNN-with-STN-master/prediction.py", line 20, in <module>
model = load_model('weightswithoutstnlrchanged.best.hdf5', custom_objects={"bknd": backend})
File "/home/sgnbx/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/keras/engine/saving.py", line 419, in load_model
model = _deserialize_model(f, custom_objects, compile)
File "/home/sgnbx/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/keras/engine/saving.py", line 312, in _deserialize_model
sample_weight_mode=sample_weight_mode)
File "/home/sgnbx/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/keras/engine/training.py", line 129, in compile
loss_functions.append(losses.get(loss.get(name)))
File "/home/sgnbx/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/keras/losses.py", line 133, in get
return deserialize(identifier)
File "/home/sgnbx/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/keras/losses.py", line 114, in deserialize
printable_module_name='loss function')
File "/home/sgnbx/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/keras/utils/generic_utils.py", line 165, in deserialize_keras_object
':' + function_name)
ValueError: Unknown loss function:<lambda>
Do you have any idea about this?
This is my piece of code:
model = load_model('weightswithoutstnlrchanged.best.hdf5', custom_objects={"bknd": backend})
sgd = SGD(lr=0.0001, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5)
adam = optimizers.Adam()
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=sgd)
I passed backend as a custom_object because at first it did not recognize backend.
Now it raises an error for the loss; I tried adding the loss to custom_objects as well, but it did not work, or maybe I have to try something else.
Can you please have a look at this?
Thank you
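For anyone hitting the same "Unknown loss function: &lt;lambda&gt;" error: a common workaround is to load with compile=False so Keras skips deserializing the lambda loss, then recompile manually. A sketch, untested against this exact checkpoint:

```python
from keras.models import load_model
from keras.optimizers import SGD
from keras import backend as bknd

# Load without compiling so Keras does not try to deserialize the
# un-serializable lambda CTC loss stored in the HDF5 file.
model = load_model('weightswithoutstnlrchanged.best.hdf5',
                   custom_objects={'bknd': bknd}, compile=False)

# Recompile with the same dummy loss used at training time: the CTC
# loss is computed inside the model's 'ctc' Lambda layer, so the loss
# function only passes y_pred through.
sgd = SGD(lr=0.0001, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5)
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=sgd)
```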
When I change the model path in config.py to load_model_path = '/home/user/WWY/CRNN-with-STN-master/weights_with_STN.hdf5', the command line reports:
ValueError: Layer #10 (named "spatial_transformer_1" in the current model) was found to correspond to layer spatial_transformer_1 in the save file. However the new layer spatial_transformer_1 expects 16 weights, but the saved weights have 8 elements.
If load_model_path = '/home/user/WWY/CRNN-with-STN-master/weights_without_STN.hdf5', the command line reports:
ValueError: axes don't match array
I have never run into these two errors before and don't know how to fix them. Could the author please help clear up my confusion? Thanks!
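A sketch that sometimes helps with mismatched checkpoints in Keras 2.x (assuming the load_weights in your version supports skip_mismatch): build the architecture first, then load only the layers whose shapes match, leaving the mismatched STN layer freshly initialized.

```python
# model = build_model(...)  # construct the architecture first
# Skips layers whose saved weights do not match the current model,
# so the spatial_transformer_1 layer keeps its fresh initialization.
model.load_weights('/home/user/WWY/CRNN-with-STN-master/weights_with_STN.hdf5',
                   by_name=True, skip_mismatch=True)
```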
Hi!
I read in a different issue that you were able to achieve 90% accuracy in 24 hours.
I'd like to ask what learning rate you used for that. Is it 0.0001 or 0.002? There are two values in config.py, so I was wondering which of the two you used.
I'm really struggling to train the model, as I can only reach about 3% accuracy in 48 hours when training on 4 GPUs...
Hello again :)
I am working with your code and am almost done, except that I need to change one line:
rnn2_merged = concatenate([rnn_2, rnn_2b])
in which you concatenate the two outputs.
Can you please help me with this? I want to keep the same structure but without Concatenate. To put it another way, what would be an alternative way of merging them in Keras without using concatenate?
Thanks for taking the time.
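For what it's worth, one alternative sketch: merge with an element-wise sum via keras.layers.add, which keeps the two-direction structure but halves the width of the merged tensor (downstream layers must accept the smaller dimension):

```python
from keras.layers import add

# Element-wise sum instead of concatenation: the output width equals a
# single LSTM's width (e.g. 256) rather than twice that (512).
rnn2_merged = add([rnn_2, rnn_2b])
```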
Your bidirectional LSTM implementation has a mistake: go_backwards=True in an LSTM (rnn_1b, rnn_2b) only reverses the input before it is fed to the LSTM, so you have to reverse the LSTM's output before adding or concatenating it with the forward LSTM (rnn_1, rnn_2).
More info about the issue here: qjadud1994/CRNN-Keras#26
You can fix it, or simplify your code by using Bidirectional, like this one:
https://github.com/tuanphan09/captcha-recognition/blob/master/model.py
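A minimal sketch of the fix described above, assuming the Keras functional API; x, the 256-unit width, and the layer names are placeholders:

```python
from keras.layers import LSTM, Lambda, Bidirectional, concatenate
from keras import backend as K

# x is the tensor feeding the recurrent layers; 256 units is a placeholder.
rnn_2 = LSTM(256, return_sequences=True)(x)

# go_backwards=True reverses only the *input*, so the output comes out
# in reversed time order and must be flipped back before merging.
rnn_2b = LSTM(256, return_sequences=True, go_backwards=True)(x)
rnn_2b = Lambda(lambda t: K.reverse(t, axes=1))(rnn_2b)
rnn2_merged = concatenate([rnn_2, rnn_2b])

# Equivalent and less error-prone: let Keras handle the reversal.
rnn2_merged = Bidirectional(LSTM(256, return_sequences=True), merge_mode='concat')(x)
```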
I see that your model has a fixed input size, so how can it recognize images of varying size? For example, resizing a 64*500 image would destroy its aspect ratio and hurt the result, wouldn't it?
Thank you very much!
Hi,
I've just learned about CTC loss, and as I understand it, it allows labels of various lengths as long as they are not longer than label_len. For that reason, I don't understand why you needed to pad the labels with '-' (your comment doesn't make sense, by the way):
# due to the explanation of ctc_loss, try to not add "-" for blank
while len(lexicon) < label_len:
    lexicon += "-"
and why you added the '-' symbol to your vocabulary (characters):
characters = '0123456789'+string.ascii_lowercase+'-'
label_classes = len(characters)+1
EDIT:
I figured out that you need to pad the label to make the code run. Last question: with, e.g., label='12345---' and label_len=5, CTC just uses label[:label_len] for calculating the loss, right?
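For reference, a toy sketch of how K.ctc_batch_cost treats label_length in general Keras usage (an assumption about the generic API, not a statement of how this repo wires it up):

```python
import numpy as np
from keras import backend as K

# Only the first label_length entries of each label row enter the loss,
# so the filler values after position 5 are never used.
labels = np.array([[1, 2, 3, 4, 5, 36, 36, 36]])  # '12345' padded with filler index 36
label_length = np.array([[5]])   # true label length, not the padded length 8
input_length = np.array([[32]])  # number of RNN time steps fed to CTC

# y_pred would be the (batch, time_steps, num_classes) softmax output:
# loss = K.ctc_batch_cost(labels, y_pred, input_length, label_length)
```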
After successfully running with the dataset you provided, I tried adding my own data. But whether I prepare lexicon/train/vali the same way as the original dataset or directly overwrite the original dataset with the new data, it always errors out, and it is always the same error:
(0) Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 38 num_classes: 38 labels: 0,255,0,1,255,1,5,9,0,0,36,36,36,36,36,36 labels seen so far: 0
[[{{node ctc/CTCLoss}}]]
[[training/SGD/gradients/ctc/CTCLoss_grad/mul/_431]]
(1) Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 38 num_classes: 38 labels: 0,255,0,1,255,1,5,9,0,0,36,36,36,36,36,36 labels seen so far: 0
[[{{node ctc/CTCLoss}}]]
How should I fix this?
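Not from the author, but the 255 in that log usually means a label character that is missing from the vocabulary got mapped to an out-of-range index. A quick sanity-check sketch, with a hypothetical lexicon.txt path:

```python
import string

# Vocabulary from this repo's config; any character outside it will map
# to an invalid index (such as 255) and make TensorFlow's CTC fail with
# "Saw a non-null label (index >= num_classes - 1)".
characters = '0123456789' + string.ascii_lowercase + '-'

def find_unknown_chars(labels):
    """Return every character in the labels that is not in the vocabulary."""
    return {ch for label in labels for ch in label if ch not in characters}

# 'lexicon.txt' is a placeholder for wherever your labels live.
unknown = find_unknown_chars(open('lexicon.txt').read().splitlines())
if unknown:
    print('Characters missing from the vocabulary:', sorted(unknown))
```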
Hi,
I wonder what the theoretical basis is for starting the decoding from the 3rd position. I'm referring to this line:
ctc_decode = bknd.ctc_decode(y_pred[:, 2:, :], input_length=np.ones(shape[0])*shape[1])[0][0]
In image_ocr.py example on keras github there's a comment:
# the 2 is critical here since the first couple outputs of the RNN
# tend to be garbage
But why? And why does everyone use 2 regardless of dataset, image width and text length?
After my model.add(SpatialTransformer(localization_net=locnet, output_size=(2,128), input_shape=input_shape)), the output tensor becomes the following, and then the model cannot run. Many of the STN models I found on GitHub have the same problem.
Layer (type)                 Output Shape          Param #
spatial_transformer (Spatial (None, 2, 128, None)  65656
Hi,
Why don't you use the whole y_pred[:, :, :] tensor instead of y_pred[:, 2:, :]? Why do you drop time steps 0 and 1?
def evaluate(input_model):
    correct_prediction = 0
    generator = img_gen_val()
    x_test, y_test = next(generator)
    # print(" ")
    y_pred = input_model.predict(x_test)
    shape = y_pred[:, 2:, :].shape
    ctc_decode = bknd.ctc_decode(y_pred[:, 2:, :], input_length=np.ones(shape[0]) * shape[1])[0][0]
    out = bknd.get_value(ctc_decode)[:, :label_len]
I tried training the EAST text detection algorithm, but wasn't successful in detecting all the lines in a document.
I tried the CTPN model on the PASCAL VOC dataset...
It works really well for fonts above a certain size, but if the font in a document is too small, it fails to detect lines properly...
Any suggestions?
STN should be added at this location
I completely followed your approach but have some doubts about the CTPN model for text detection....
References:
1) https://github.com/YCG09/chinese_ocr/
2) https://github.com/xiaomaxiao/keras_ocr?files=1
Could you explain how robust this approach, which has two parts (text detection and text recognition), is compared to yours?
Thank you...
Again, thanks for sharing your code with us.
I'd love to keep working with your code, and I want to use it on my phone, so I need to convert the Keras model to Core ML.
This is the function I need to call:
def convert(model,
            mode=None,
            image_input_names=[],
            preprocessing_args={},
            image_output_names=[],
            deprocessing_args={},
            class_labels=None,
            predicted_feature_name='classLabel',
            add_custom_layers=False,
            custom_conversion_functions={})
I was wondering, can you help me with the arguments according to your code? For example, what should image_input_names be, and so on?
Thanks in advance for taking the time :)
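Not the author, but for reference, a rough sketch of calling coremltools' Keras converter. The layer names below are guesses (check model.summary()), and the CTC Lambda layer is not natively supported, so add_custom_layers=True and doing the decoding outside Core ML are likely needed:

```python
import coremltools

# Sketch only: 'the_input' and 'softmax_output' are guesses at the
# model's input/output layer names, not names taken from this repo.
coreml_model = coremltools.converters.keras.convert(
    model,
    input_names=['the_input'],
    image_input_names=['the_input'],   # treat this input as an image
    output_names=['softmax_output'],   # use the pre-CTC softmax layer
    add_custom_layers=True,            # the CTC Lambda needs custom handling
)
coreml_model.save('crnn_stn.mlmodel')
```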