belval / crnn
A TensorFlow implementation of https://github.com/bgshih/crnn
License: MIT License
Line 167 in 0633495
Actually, I think you should set the parameter merge_repeated to False, so the result keeps repeated characters: decoded, log_prob = tf.nn.ctc_beam_search_decoder(logits, seq_len, merge_repeated=False)
Otherwise the ground-truth text 'semantically' in your readme.md will be recognized as 'semanticaly'.
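To illustrate the difference, here is a minimal pure-Python sketch (not this repo's code) of the two behaviours. Standard CTC collapsing merges repeats only within the alignment path, so a blank between two identical labels keeps both; merging repeats in the already-decoded output (what merge_repeated=True effectively does) loses a legitimate double letter.

```python
def merge_repeated_labels(labels):
    """Merge adjacent identical labels in the decoded output:
    a legitimate double letter is lost."""
    out = []
    for ch in labels:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)

def ctc_collapse(path, blank="-"):
    """Standard CTC collapsing: drop repeats within the alignment path,
    then drop blanks. A blank between two identical labels keeps both."""
    out = []
    prev = None
    for ch in path:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)

print(merge_repeated_labels("semantically"))    # the double 'l' is merged away
print(ctc_collapse("s-e-m-a-n-t-i-c-a-l-l-y"))  # blanks separate the two 'l's
```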
Hello, sorry to trouble you again. Can the pretrained model be used to test Chinese text?
When I test with Chinese I get this error:
File "/Users/liufengnan/workspace/OCR/CRNN/CRNN/utils.py", line 48, in <listcomp> return [config.CHAR_VECTOR.index(x) for x in label] ValueError: substring not found
I then changed CHAR_VECTOR in config.py to use Chinese characters, but got a shape error:
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [512,3992] rhs shape= [512,70] [[Node: save/Assign = Assign[T=DT_FLOAT, _class=["loc:@W"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](W, save/RestoreV2)]]
I hope you can understand my English; it is not very good.
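The shape mismatch comes from the output layer: its width depends on the number of characters in CHAR_VECTOR, so a checkpoint trained on one alphabet cannot be restored into a graph built for another. A sketch of the relationship, assuming (as is usual with TensorFlow's CTC ops, and consistent with the 70 vs. 3992 in the error) one extra class for the CTC blank:

```python
NUM_HIDDEN = 512  # width of the layer feeding the output, per the error's [512, ...]

def output_weight_shape(char_vector):
    """The output weight matrix W has shape [NUM_HIDDEN, num_classes],
    where num_classes = len(char_vector) + 1 (the +1 is the CTC blank)."""
    return (NUM_HIDDEN, len(char_vector) + 1)

# A checkpoint saved with a 69-character CHAR_VECTOR stores W as [512, 70];
# a graph built for ~3991 Chinese characters expects [512, 3992], hence the
# InvalidArgumentError. After changing CHAR_VECTOR you must train from scratch.
```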
Bug position: data_manager.py -> resize_image function
code:
im_arr = imread(image, mode='L')
r, c = np.shape(im_arr)
if c > input_width:
    c = input_width
    ratio = float(input_width) / c
    final_arr = imresize(im_arr, (int(32 * ratio), input_width))
else:
    final_arr = np.zeros((32, input_width))
    ratio = 32.0 / r
    im_arr_resized = imresize(im_arr, (32, int(c * ratio)))
    final_arr[:, 0:np.shape(im_arr_resized)[1]] = im_arr_resized
return final_arr, c
detail:
If the shape of im_arr is (22, 92), the program runs the 'else' branch, so the shape of im_arr_resized will be (32, int(c * (32 / r))), i.e. (32, int(92 * (32 / 22))) = (32, 133).
At the line final_arr[:, 0:np.shape(im_arr_resized)[1]] = im_arr_resized, you assign final_arr[:, 0:133] = im_arr_resized, but the maximum width of final_arr is only 100 (input_width), so this fails.
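One possible fix, sketched here with a numpy-only stand-in for the (now deprecated) scipy imresize: clamp the resized width to input_width before writing into final_arr. Function names and the nearest-neighbour resize are mine, not the repo's.

```python
import numpy as np

def nn_resize(arr, out_h, out_w):
    """Stand-in nearest-neighbour resize (the repo uses scipy's imresize)."""
    r, c = arr.shape
    rows = np.arange(out_h) * r // out_h
    cols = np.arange(out_w) * c // out_w
    return arr[rows][:, cols]

def resize_image_fixed(im_arr, input_width):
    """Sketch of resize_image with the overflow fixed: the resized width
    is clamped to input_width before the assignment."""
    r, c = im_arr.shape
    if c > input_width:
        c = input_width
        final_arr = nn_resize(im_arr, 32, input_width)
    else:
        final_arr = np.zeros((32, input_width), dtype=im_arr.dtype)
        ratio = 32.0 / r
        resized = nn_resize(im_arr, 32, int(c * ratio))
        w = min(resized.shape[1], input_width)  # clamp: avoids the broadcast error
        final_arr[:, :w] = resized[:, :w]
    return final_arr, c
```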
Hello! I use your network to train on Chinese and English. My char_vector is 1000+ characters long, like '®·、。〇《》一七万三上下不专且世业东丝两个中丰串临丶主丽举久义乐乒乔九也习书买了事' and so on.
The training process is as follows:
---- 50 ----
GT: 水果捞
PREDICT:
---- 50 ----
GT: 品牌电脑
PREDICT:
---- 50 ----
[50] Iteration loss: 482.9037551879883 Error rate: 1.0
step: 51
PREDICT is always blank.
Where is the problem?
It seems that it only uses the CPU to train the model...
When I have trained my model, I want to predict the characters in a picture. But when I use the command 'python3 run.py -ex ../data/test --test --restore', it shows the following results.
Restoring
Checkpoint is valid
0
Loading data
Testing
I want to know how to predict on a picture. Thanks!
What if the character counts of my training samples are different? How should I train?
Hello @Belval. I want to train a new model to recognize captchas. I use 3000 samples, but the loss falls slowly. Can you give me some advice and tricks? Should I add epochs? Thank you!
How can I modify the Connectionist Temporal Classification (CTC) layer of the network to also give a confidence score?
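One option, for what it's worth: tf.nn.ctc_beam_search_decoder already returns a log_prob score alongside the decoded paths, which can be exponentiated into a probability. For a greedy decode, a crude alternative is the product of the per-timestep maximum softmax probabilities. A numpy-only sketch of the greedy variant (not code from this repo):

```python
import numpy as np

def greedy_confidence(probs):
    """probs: array of shape (time_steps, num_classes), each row a softmax
    distribution. Returns the probability of the greedy path, usable as a
    rough confidence score for the prediction."""
    best = probs.max(axis=1)
    # sum of logs rather than a direct product, for numerical stability
    return float(np.exp(np.log(best).sum()))
```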
Hello @Belval. How could I solve an overfitting problem? Thanks!
Hi, can you give an approximate training time for 100k examples on a GTX 1080? I started it and it seems very slow. Thanks.
How do I use the pretrained model in the "save" folder? I have tried many times. Thank you for your reply.
Hello, when I tried to train the model, I got the following:
InvalidArgumentError (see above for traceback): sequence_length(0) <= 15. It happened at the line loss = tf.nn.ctc_loss(targets, logits, seq_len) when feeding data.
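This error usually means a label is longer than the model's number of time steps: CTC needs at least len(label) frames, plus one extra frame for every pair of adjacent identical characters (a blank must separate them). One way to guard against it, as a sketch with helper names of my own choosing (not the repo's):

```python
def min_time_steps(label):
    """Minimum CTC time steps for a label: its length plus one blank
    for every pair of adjacent identical characters."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

def filter_fitting(samples, time_steps):
    """Drop (image, label) pairs whose label cannot fit in time_steps."""
    return [(img, lab) for img, lab in samples
            if min_time_steps(lab) <= time_steps]
```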
Hello, I want to try this command:
python3 run.py -ex ../data/test --test --restore
But I found there is no ../data/test in the CRNN folder.
Could you help me?
I want to know if there have been any developments in this LSTM+CTC project; I notice it was established two years ago.
Waiting for your response!
Thank you for sharing this wonderful project. It works quite well for my number recognition problem.
But I have some confusion.
In feed_dict, self.__seq_len: [self.__max_char_count] * self.__data_manager.batch_size,
and
max_char_count = reshaped_cnn_output.get_shape().as_list()[1].
I don't quite understand this. Shouldn't seq_len be the per-image width after the CNN part, i.e. of shape (batch_size, ?)?
Sorry for my poor English.
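For what it's worth: seq_len here is the number of time steps the CNN produces, and because every image is resized and padded to the same input_width, that count is identical for every item in a batch, so one constant repeated batch_size times is enough. A sketch of that construction (function name is mine, not the repo's):

```python
def make_seq_len(cnn_output_shape, batch_size):
    """cnn_output_shape: (batch, time_steps, features) after reshaping the
    CNN output. Since all images share input_width, time_steps is constant,
    so seq_len is that constant repeated once per batch item."""
    time_steps = cnn_output_shape[1]  # corresponds to max_char_count here
    return [time_steps] * batch_size
```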
Hello Belval, it's me again~ My mentor wants me to implement CRNN with TensorFlow... But I'm not good at coding and I don't know how to start this project. Can you give me some advice? Thanks very, very much!
When I train the model, I use 200000W words, batch_size = 128, Adam learning rate = 0.00001, and epoch = 100.
The results look like this:
nonlister
nonlaterzzzzzzzz
neep
miepzzzzzzzz
I don't know why the results are not good.
Hello, I want to run your project, but I don't know which dataset to choose. Can you help me? Thanks a lot @Belval
logits = tf.transpose(logits, (1, 0, 2)) — why? The original order is [batch, time, class].
self.__seq_len: [self.__max_char_count] * self.__data_manager.batch_size — why? I think seq_len should be a variable number equal to each individual target sequence length.
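On the first question: TF 1.x's CTC ops (tf.nn.ctc_loss and the beam-search decoder) expect time-major input of shape [max_time, batch_size, num_classes], which is why the [batch, time, class] logits are transposed. On the second: seq_len is the number of CNN time steps fed to the CTC loss (the input length), not the target length; target lengths are carried by the sparse targets tensor. A numpy shape check of the same permutation (sizes are examples only):

```python
import numpy as np

batch, time_steps, num_classes = 2, 24, 37  # example sizes, not the repo's
logits = np.zeros((batch, time_steps, num_classes))
time_major = np.transpose(logits, (1, 0, 2))  # same perm as tf.transpose(logits, (1, 0, 2))
assert time_major.shape == (time_steps, batch, num_classes)
```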
When I test with python3 run.py -ex ../out --test --restore, I can't get any results. The out folder contains pictures generated by the code from your repo (TextRecognitionDataGenerator).
The console just outputs this:
Restoring
Checkpoint is valid
0
Loading data
Testing
Process finished with exit code 0
Thank you for helping me!
def resize_image(image, input_width):
    im_arr = imread(image, mode='L')
    r, c = np.shape(im_arr)
    if c > input_width:
        c = input_width
        ratio = float(input_width) / c
        final_arr = imresize(im_arr, (int(32 * ratio), input_width))
    else:
        final_arr = np.zeros((32, input_width))

The last line above should be changed to:
        final_arr = np.zeros((32, input_width), dtype=np.uint8)
If, when we test, the name of the image already contains the result, why do I need to train it?
Hi, Belval!
First, I want to say thanks. Your code has helped me a lot, but I have some problems.
I used your pretrained model to recognize some numbers and symbols like "+", but the results are not good, so I want to train my own model for my task. However, I don't know how to train; can you give me some guidance?
Hoping for your reply. Thanks!!
I have a little question about the part below. Does this mean you slice it along the first axis, i.e. along the batch-size dimension? But according to the paper, shouldn't it be sliced along the 'w' dimension?
def MapToSequences(x):
x = tf.squeeze(x, [1])
x = tf.unstack(x)
return x
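For reference, tf.unstack defaults to axis=0, so after the squeeze this does slice along the batch dimension; slicing along 'w' would need axis=1 (or a transpose first). A numpy sketch of the shapes involved (sizes are examples, not the repo's):

```python
import numpy as np

x = np.zeros((4, 1, 24, 512))  # (batch, height=1, w, channels)
squeezed = x.squeeze(axis=1)   # (4, 24, 512), like tf.squeeze(x, [1])

# tf.unstack default axis=0: slices along batch -> 4 arrays of (24, 512)
per_batch = list(squeezed)
# slicing along 'w' instead would need axis=1 -> 24 arrays of (4, 512)
per_w = [squeezed[:, i] for i in range(squeezed.shape[1])]

assert len(per_batch) == 4 and per_batch[0].shape == (24, 512)
assert len(per_w) == 24 and per_w[0].shape == (4, 512)
```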
When I run testing I get the error below:
(vinayak) C:\Users\vinayak\Documents\github\CRNN-1\CRNN>python run.py -ex ../samples --test --restore
Restoring
Checkpoint is valid
0
Loading data
examples 10
Traceback (most recent call last):
File "run.py", line 118, in <module>
main()
File "run.py", line 111, in main
args.restore
File "C:\Users\vinayak\Documents\github\CRNN-1\CRNN\crnn.py", line 53, in __init__
self.__data_manager = DataManager(batch_size, model_path, examples_path, max_image_width, train_test_ratio, self.__max_char_count)
File "C:\Users\vinayak\Documents\github\CRNN-1\CRNN\data_manager.py", line 26, in __init__
self.test_batches = self.__generate_all_test_batches()
File "C:\Users\vinayak\Documents\github\CRNN-1\CRNN\data_manager.py", line 108, in __generate_all_test_batches
(-1)
File "C:\Users\vinayak\Documents\github\CRNN-1\CRNN\utils.py", line 17, in sparse_tuple_from
indices.extend(zip([n]*len(seq), [i for i in range(len(seq))]))
TypeError: object of type 'numpy.int32' has no len()
What result do you get on ICDAR2013? My result is lower than the paper's.
What is the format of your training data? I want to train your model from scratch. Thanks a lot.
max_width was 256 during training; can I use an input of 1024 at prediction time, or is this parameter fixed?
Can I see the actual recognition effect of your program after training?
I know you add a character count limit (24) when loading pictures, and after the CNN part the image becomes a tensor of size 24x512.
If I want to recognize longer text, what should I do?
I generated training data using TextRecognitionDataGenerator.
ValueError: substring not found
When running data_manager, a bug occurs caused by pictures named 'A&P_58395.jpg', 'R&D_14671.jpg', ...
I think that's because '&' is a special character in Linux shells.
Hi Belval, when I retrained the CRNN code, I found that if the text contained consecutive repeated letters, the result often dropped the repetition and output a single letter. For example, with ground truths '0870011' and '37075337', the predictions are '08701' and '3707537'. If there are no consecutive repeated letters, the result is correct. My training data only includes digits, generated with your project https://github.com/Belval/TextRecognitionDataGenerator.
Is this a problem with the CTC? How can I solve it?
Thanks!
The output shows:
Restoring
Checkpoint is valid
0
Loading data
Testing
Process finished with exit code 0
In def __generate_all_train_batches(self):
batch_dt = sparse_tuple_from(
np.reshape(
np.array(raw_batch_la),
(-1)
)
)
This raises an error: int object has no len attribute.
After changing it to
batch_dt = sparse_tuple_from(
np.array(raw_batch_la)
)
training works, but it is extremely slow: after 10 hours the loss has not decreased at all.
I'm getting a different output every time I run the code to test the pretrained model, and it's not correct.
Kindly help.
Thanks
Then it exits.
When there are two consecutive identical characters, one is dropped:
For example, 'aaby' is recognized as 'aby', and '887y' as '87y'.
I wanted to create some images with your generator with
python3 run.py -c 200000 -w 1 -t 8
as documented, but wasn't able to, since run.py is missing there. Did something change in the meantime?
My labels contain Arabic/Urdu text.
For example "اسلام آباد : چیئرمین رضابانی کی زیر صدارت سینیٹ کا اجلاس"
What changes are required to train the model with non-English labels?
I trained on data generated by your tool TextRecognitionDataGenerator.
After 100 iterations the loss is still always inf. I'm wondering why, thanks a lot.
This code is relatively old, uses a lot of deprecated APIs, and could use a refactor in order to be maintainable.