sbillburg / CRNN-with-STN
Implement CRNN in Keras with a Spatial Transformer Network.
For a sample like 30807140_1102551758.jpg with the text 压抑的,十日关白而已, is the annotation format filename + text, or filename + the numeric indices corresponding to the text?
Could you provide an example?
Thanks.
You could read the paper A Multi-Object Rectified Attention Network for Scene Text Recognition at https://paperswithcode.com/paper/a-multi-object-rectified-attention-network.
The author introduces a training scheme called curriculum training: first train the CRNN alone, then train the STN alone, and finally train the STN and the CRNN together.
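A rough sketch of what such staged training could look like in Keras (the stn_ name prefix, train_gen, and the epoch counts are placeholders, not the author's actual code):

```python
# Hypothetical staged ("curriculum") training sketch in Keras. Assumes
# STN layers can be identified by an 'stn_' name prefix; adjust to the
# actual model. train_gen and the epoch counts are placeholders.

def set_trainable(model, stn, crnn):
    for layer in model.layers:
        layer.trainable = stn if layer.name.startswith('stn_') else crnn
    # recompile so the new trainable flags take effect
    model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='adam')

# Stage 1: train the CRNN only
set_trainable(model, stn=False, crnn=True)
model.fit_generator(train_gen, steps_per_epoch=1000, epochs=5)

# Stage 2: train the STN only
set_trainable(model, stn=True, crnn=False)
model.fit_generator(train_gen, steps_per_epoch=1000, epochs=5)

# Stage 3: fine-tune both jointly
set_trainable(model, stn=True, crnn=True)
model.fit_generator(train_gen, steps_per_epoch=1000, epochs=5)
```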
When I want to use the saved model to get predictions, it raises:
File "/home/sgnbx/Downloads/projects/CRNN-with-STN-master/prediction.py", line 20, in <module>
model = load_model('weightswithoutstnlrchanged.best.hdf5', custom_objects={"bknd": backend})
File "/home/sgnbx/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/keras/engine/saving.py", line 419, in load_model
model = _deserialize_model(f, custom_objects, compile)
File "/home/sgnbx/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/keras/engine/saving.py", line 312, in _deserialize_model
sample_weight_mode=sample_weight_mode)
File "/home/sgnbx/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/keras/engine/training.py", line 129, in compile
loss_functions.append(losses.get(loss.get(name)))
File "/home/sgnbx/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/keras/losses.py", line 133, in get
return deserialize(identifier)
File "/home/sgnbx/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/keras/losses.py", line 114, in deserialize
printable_module_name='loss function')
File "/home/sgnbx/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/keras/utils/generic_utils.py", line 165, in deserialize_keras_object
':' + function_name)
ValueError: Unknown loss function:<lambda>
Do you have any idea about this?
This is my piece of code:
model = load_model('weightswithoutstnlrchanged.best.hdf5', custom_objects={"bknd": backend})
sgd = SGD(lr=0.0001, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5)
adam = optimizers.Adam()
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=sgd)
I passed backend as a custom_object because at first it did not recognize backend.
Now it raises an error for the loss; I tried adding the loss to custom_objects as well, but it did not work, or maybe I have to try something else.
Can you please have a look at this?
Thank you
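For anyone hitting the same "Unknown loss function: &lt;lambda&gt;" error: a common workaround is to load with compile=False so Keras skips deserializing the lambda loss, then recompile manually. A sketch, untested against this exact checkpoint:

```python
from keras.models import load_model
from keras.optimizers import SGD
from keras import backend as bknd

# Load without compiling so Keras does not try to deserialize the
# un-serializable lambda CTC loss stored in the HDF5 file.
model = load_model('weightswithoutstnlrchanged.best.hdf5',
                   custom_objects={'bknd': bknd}, compile=False)

# Recompile with the same dummy loss used at training time: the CTC
# loss is computed inside the model's 'ctc' Lambda layer, so the loss
# function only passes y_pred through.
sgd = SGD(lr=0.0001, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5)
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=sgd)
```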
When I change the model path in config.py to load_model_path = '/home/user/WWY/CRNN-with-STN-master/weights_with_STN.hdf5', the command line reports:
ValueError: Layer #10 (named "spatial_transformer_1" in the current model) was found to correspond to layer spatial_transformer_1 in the save file. However the new layer spatial_transformer_1 expects 16 weights, but the saved weights have 8 elements.
If load_model_path = '/home/user/WWY/CRNN-with-STN-master/weights_without_STN.hdf5', the command line reports:
ValueError: axes don't match array
I have never run into these two errors before and don't know how to fix them. Could the author please help clear up my confusion? Thanks!
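A sketch that sometimes helps with mismatched checkpoints in Keras 2.x (assuming the load_weights in your version supports skip_mismatch): build the architecture first, then load only the layers whose shapes match, leaving the mismatched STN layer freshly initialized.

```python
# model = build_model(...)  # construct the architecture first
# Skips layers whose saved weights do not match the current model,
# so the spatial_transformer_1 layer keeps its fresh initialization.
model.load_weights('/home/user/WWY/CRNN-with-STN-master/weights_with_STN.hdf5',
                   by_name=True, skip_mismatch=True)
```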
Hi!
I read in a different issue that you were able to achieve 90% accuracy in 24 hours.
I'd like to ask what learning rate you used for that. Is it 0.0001 or 0.002? There are two values in config.py, so I was wondering which of the two you used.
I'm really struggling to train the model, as I can only reach about 3% accuracy in 48 hours when training on 4 GPUs...
Hello again :)
I am working with your code and am almost done, except that I need to change one line:
rnn2_merged = concatenate([rnn_2, rnn_2b])
in which you concatenate the two outputs.
Can you please help me with this? I want to keep the same structure but without Concatenate. To put it another way, what would be an alternative way of merging them in Keras without using concatenate?
Thanks for taking the time.
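For what it's worth, one alternative sketch: merge with an element-wise sum via keras.layers.add, which keeps the two-direction structure but halves the width of the merged tensor (downstream layers must accept the smaller dimension):

```python
from keras.layers import add

# Element-wise sum instead of concatenation: the output width equals a
# single LSTM's width (e.g. 256) rather than twice that (512).
rnn2_merged = add([rnn_2, rnn_2b])
```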
Your bidirectional LSTM implementation has a mistake: go_backwards=True in an LSTM (rnn_1b, rnn_2b) only reverses the input before it is fed to the LSTM, so you have to reverse the LSTM's output before adding or concatenating it with the forward LSTM (rnn_1, rnn_2).
More info about the issue here: qjadud1994/CRNN-Keras#26
You can fix it, or simplify your code by using Bidirectional, like this one:
https://github.com/tuanphan09/captcha-recognition/blob/master/model.py
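A minimal sketch of the fix described above, assuming the Keras functional API; x, the 256-unit width, and the layer names are placeholders:

```python
from keras.layers import LSTM, Lambda, Bidirectional, concatenate
from keras import backend as K

# x is the tensor feeding the recurrent layers; 256 units is a placeholder.
rnn_2 = LSTM(256, return_sequences=True)(x)

# go_backwards=True reverses only the *input*, so the output comes out
# in reversed time order and must be flipped back before merging.
rnn_2b = LSTM(256, return_sequences=True, go_backwards=True)(x)
rnn_2b = Lambda(lambda t: K.reverse(t, axes=1))(rnn_2b)
rnn2_merged = concatenate([rnn_2, rnn_2b])

# Equivalent and less error-prone: let Keras handle the reversal.
rnn2_merged = Bidirectional(LSTM(256, return_sequences=True), merge_mode='concat')(x)
```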
I see that your model has a fixed input size, so how can it recognize images of varying size? For example, resizing a 64*500 image would destroy its aspect ratio and hurt the result, wouldn't it?
Thank you very much!
Hi,
I've just learned about CTC loss, and as I understand it, it allows labels of various lengths as long as they are not longer than label_len. For that reason, I don't understand why you needed to pad the labels with '-' (your comment doesn't make sense, by the way):
# due to the explanation of ctc_loss, try to not add "-" for blank
while len(lexicon) < label_len:
    lexicon += "-"
and why you added the '-' symbol to your vocabulary (characters):
characters = '0123456789'+string.ascii_lowercase+'-'
label_classes = len(characters)+1
EDIT:
I figured out that you need to pad the label to make the code run. Last question: with, e.g., label='12345---' and label_len=5, CTC just uses label[:label_len] for calculating the loss, right?
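For reference, a toy sketch of how K.ctc_batch_cost treats label_length in general Keras usage (an assumption about the generic API, not a statement of how this repo wires it up):

```python
import numpy as np
from keras import backend as K

# Only the first label_length entries of each label row enter the loss,
# so the filler values after position 5 are never used.
labels = np.array([[1, 2, 3, 4, 5, 36, 36, 36]])  # '12345' padded with filler index 36
label_length = np.array([[5]])   # true label length, not the padded length 8
input_length = np.array([[32]])  # number of RNN time steps fed to CTC

# y_pred would be the (batch, time_steps, num_classes) softmax output:
# loss = K.ctc_batch_cost(labels, y_pred, input_length, label_length)
```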
After successfully running with the dataset you provided, I tried adding my own data. But whether I prepare lexicon/train/vali the same way as the original dataset or directly overwrite the original dataset with the new data, it always errors out, and it is always the same error:
(0) Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 38 num_classes: 38 labels: 0,255,0,1,255,1,5,9,0,0,36,36,36,36,36,36 labels seen so far: 0
[[{{node ctc/CTCLoss}}]]
[[training/SGD/gradients/ctc/CTCLoss_grad/mul/_431]]
(1) Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 38 num_classes: 38 labels: 0,255,0,1,255,1,5,9,0,0,36,36,36,36,36,36 labels seen so far: 0
[[{{node ctc/CTCLoss}}]]
How should I fix this?
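Not from the author, but the 255 in that log usually means a label character that is missing from the vocabulary got mapped to an out-of-range index. A quick sanity-check sketch, with a hypothetical lexicon.txt path:

```python
import string

# Vocabulary from this repo's config; any character outside it will map
# to an invalid index (such as 255) and make TensorFlow's CTC fail with
# "Saw a non-null label (index >= num_classes - 1)".
characters = '0123456789' + string.ascii_lowercase + '-'

def find_unknown_chars(labels):
    """Return every character in the labels that is not in the vocabulary."""
    return {ch for label in labels for ch in label if ch not in characters}

# 'lexicon.txt' is a placeholder for wherever your labels live.
unknown = find_unknown_chars(open('lexicon.txt').read().splitlines())
if unknown:
    print('Characters missing from the vocabulary:', sorted(unknown))
```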
Hi,
I wonder what the theoretical basis is for starting the decoding from the 3rd position. I'm referring to this line:
ctc_decode = bknd.ctc_decode(y_pred[:, 2:, :], input_length=np.ones(shape[0])*shape[1])[0][0]
In image_ocr.py example on keras github there's a comment:
# the 2 is critical here since the first couple outputs of the RNN
# tend to be garbage
But why? And why does everyone use 2 regardless of dataset, image width and text length?
After my model.add(SpatialTransformer(localization_net=locnet, output_size=(2,128), input_shape=input_shape)), the output tensor becomes the following, and then the model cannot run. Many of the STN models I found on GitHub have the same problem.
Layer (type)                 Output Shape          Param #
spatial_transformer (Spatial (None, 2, 128, None)  65656
Hi,
Why don't you use the whole y_pred[:, :, :] tensor instead of y_pred[:, 2:, :]? Why do you drop time steps 0 and 1?
def evaluate(input_model):
    correct_prediction = 0
    generator = img_gen_val()
    x_test, y_test = next(generator)
    # print(" ")
    y_pred = input_model.predict(x_test)
    shape = y_pred[:, 2:, :].shape
    ctc_decode = bknd.ctc_decode(y_pred[:, 2:, :], input_length=np.ones(shape[0]) * shape[1])[0][0]
    out = bknd.get_value(ctc_decode)[:, :label_len]
I tried training the EAST text detection algorithm, but wasn't successful in detecting all the lines in a document.
I tried the CTPN model on the PASCAL VOC dataset...
It works really well for fonts above a certain size, but if the font in a document is too small, it fails to detect lines properly...
Any suggestions?
STN should be added at this location
I completely followed your approach but have some doubts about the CTPN model for text detection....
References:
1) https://github.com/YCG09/chinese_ocr/
2) https://github.com/xiaomaxiao/keras_ocr?files=1
Could you explain how robust this approach, which has two parts (text detection and text recognition), is compared to yours?
Thank you...
Again, thanks for sharing your code with us.
I'd love to keep working with your code, and I want to use it on my phone, so I need to convert the Keras model to Core ML.
This is the function I need to call:
def convert(model,
            mode=None,
            image_input_names=[],
            preprocessing_args={},
            image_output_names=[],
            deprocessing_args={},
            class_labels=None,
            predicted_feature_name='classLabel',
            add_custom_layers=False,
            custom_conversion_functions={})
I was wondering, can you help me with the arguments according to your code? For example, what should image_input_names be, and so on?
Thanks in advance for taking the time :)
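Not the author, but for reference, a rough sketch of calling coremltools' Keras converter. The layer names below are guesses (check model.summary()), and the CTC Lambda layer is not natively supported, so add_custom_layers=True and doing the decoding outside Core ML are likely needed:

```python
import coremltools

# Sketch only: 'the_input' and 'softmax_output' are guesses at the
# model's input/output layer names, not names taken from this repo.
coreml_model = coremltools.converters.keras.convert(
    model,
    input_names=['the_input'],
    image_input_names=['the_input'],   # treat this input as an image
    output_names=['softmax_output'],   # use the pre-CTC softmax layer
    add_custom_layers=True,            # the CTC Lambda needs custom handling
)
coreml_model.save('crnn_stn.mlmodel')
```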