Giter VIP home page Giter VIP logo

a-crnn-model-for-text-recognition-in-keras's Introduction

a-crnn-model-for-text-recognition-in-keras's People

Contributors

theailearner avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

a-crnn-model-for-text-recognition-in-keras's Issues

CTC decoder giving -1 in out

This is what im getting in out. Can anyone comment on this?
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1]

Dataset

The synth90K data set is very hug its 10GB, can you share your copy of dataset.

ValueError: Error when checking input: expected input_4 to have 4 dimensions, but got array with shape (3, 1)

ValueError Traceback (most recent call last)
in ()
1 batch_size = 256
2 epochs = 1
----> 3 model.fit(x=[training_img, train_padded_txt, train_input_length, train_label_length], y=np.zeros(len(training_img)), batch_size=batch_size, epochs = epochs, validation_data = ([valid_img, valid_padded_txt, valid_input_length, valid_label_length], [np.zeros(len(valid_img))]), verbose = 1, callbacks = callbacks_list)

2 frames
/usr/local/lib/python3.6/dist-packages/keras/engine/training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
133 ': expected ' + names[i] + ' to have ' +
134 str(len(shape)) + ' dimensions, but got array '
--> 135 'with shape ' + str(data_shape))
136 if not check_batch_axis:
137 data_shape = data_shape[1:]

Any reason why this error is occurring and how to solve it?

Infinite loss showing after first few iterations of each epoch

Hi,
I'm facing inf loss while training. Although val loss is constantly coming down, train loss always ends up as inf. I have tried lowering lr to 0.00001 also but it happens anyway. What val and train loss should we expect generally after its fully trained as per your dataset?

Error: Expect x to be a non-empty array or dataset.

batch_size = 256
epochs = 10
model.fit(x=[training_img, train_padded_txt, train_input_length, train_label_length], y=np.zeros(len(training_img)), batch_size=batch_size, epochs = epochs, validation_data = ([valid_img, valid_padded_txt, valid_input_length, valid_label_length], [np.zeros(len(valid_img))]), verbose = 1, callbacks = callbacks_list)

while running above code, I am getting below error. How to resolve this error

Epoch 1/10


ValueError Traceback (most recent call last)

in ()
1 batch_size = 256
2 epochs = 10
----> 3 model.fit(x=[training_img, train_padded_txt, train_input_length, train_label_length], y=np.zeros(len(training_img)), batch_size=batch_size, epochs = epochs, validation_data = ([valid_img, valid_padded_txt, valid_input_length, valid_label_length], [np.zeros(len(valid_img))]), verbose = 1, callbacks = callbacks_list)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1108
1109 if logs is None:
-> 1110 raise ValueError('Expect x to be a non-empty array or dataset.')
1111 epoch_logs = copy.copy(logs)
1112

ValueError: Expect x to be a non-empty array or dataset.

A request for model weights

Hi,
Sorry this is not an issue but I was not sure how to contact you.
Could you please somehow give me the best model weights for what you have trained as it would save me a lot of time if I could initialize with pretrained weights during training.
my email is [email protected]

why can't the model.fit method take images to x and labels to y? What is the reason for passing array of zeroes to y?

In this line of code, in CRNN Model.ipynb notebook:

batch_size = 256
epochs = 10
model.fit(x=[training_img, train_padded_txt, train_input_length, train_label_length], y=np.zeros(len(training_img)), batch_size=batch_size, epochs = epochs, validation_data = ([valid_img, valid_padded_txt, valid_input_length, valid_label_length], [np.zeros(len(valid_img))]), verbose = 1, callbacks = callbacks_list)

From what I understand,
x argument of fit method have been assigned with array of images, indexed text with zeroes padded, an array of 31s with same length as input array and length of each target word.
y argument of fit method have been assigned with an array of zeroes.

My doubts:

  1. why are zeroes being sent to y, not train_padded_txt, since this is the array that contains text or target labels.
  2. Why is x being assigned with all of those things. What is the reason. Why can't it just be:
    model.fit(x=training_img, y=train_padded_txt,,,...)

Please help me understand these. Thanks in advance and very nice work and nice of you to open source this.

All input arrays (x) should have the same number of samples.

maybe there is something wrong with this

`labels = Input(name='the_labels', shape=[max_label_len], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')

def ctc_lambda_func(args):
y_pred, labels, input_length, label_length = args

return K.ctc_batch_cost(labels, y_pred, input_length, label_length)

loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([outputs, labels, input_length, label_length])

#model to be used at training time
model = Model(inputs=[inputs, labels, input_length, label_length], outputs=loss_out)
model.summary()`

I don't know
can you help me?
I want to load my own data
I forked the code
you can see it

this is the error that I get when


ValueError Traceback (most recent call last)
in
5 batch_size=batch_size, epochs = epochs,
6 validation_data = ([valid_img, valid_padded_txt, valid_input_length, valid_label_length], np.zeros(len(valid_img))),
----> 7 verbose = 1, callbacks = callbacks_list)

c:\users\yehya\appdata\local\programs\python\python36\lib\site-packages\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
970 val_x, val_y,
971 sample_weight=val_sample_weight,
--> 972 batch_size=batch_size)
973 if self._uses_dynamic_learning_phase():
974 val_ins = val_x + val_y + val_sample_weights + [0.]

c:\users\yehya\appdata\local\programs\python\python36\lib\site-packages\keras\engine\training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_array_lengths, batch_size)
802 ]
803 # Check that all arrays have the same length.
--> 804 check_array_length_consistency(x, y, sample_weights)
805 if self._is_graph_network:
806 # Additional checks to avoid users mistakenly

c:\users\yehya\appdata\local\programs\python\python36\lib\site-packages\keras\engine\training_utils.py in check_array_length_consistency(inputs, targets, weights)
226 raise ValueError('All input arrays (x) should have '
227 'the same number of samples. Got array shapes: ' +
--> 228 str([x.shape for x in inputs]))
229 if len(set_y) > 1:
230 raise ValueError('All target arrays (y) should have '

ValueError: All input arrays (x) should have the same number of samples. Got array shapes: [(4500, 32, 200, 1), (500, 20), (500, 1), (500, 1)]

when I run your repo,

Error when checking input: expected input_1 to have 4 dimensions, but got array with shape (0, 1)

3 model.fit(x=[training_img, train_padded_txt, train_input_length, train_label_length], y=np.zeros(len(training_img)), batch_size=batch_size, epochs = epochs, validation_data = ([valid_img, valid_padded_txt, valid_input_length, valid_label_length], [np.zeros(len(valid_img))]), verbose = 1, callbacks = callbacks_list)

the only thing is that my images are 100X200, what is the problem?

Out value is empty

Hi, I am faced with a problem when I execute this model in the last step, I don't have any ctc value and I don't have any result of predicted value, any help please,
this is the result:
original_text = Enuresis
predicted text = []
and this is my code:
char_list = string.ascii_letters + string.digits
print("Character List: ", char_list)

#function to decode the text into indice of char list
def encode_to_labels(text):
# We encode each output word into digits
digit_list = []
for index, character in enumerate(text):
try:
digit_list.append(char_list.index(character))
except:
print("Error in finding index for character ", character)
#End For
return digit_list
a= encode_to_labels('hola')
print(a)

#preprocess the data
#read the image from IAM Dataset
n_samples = len(os.listdir('/home/yosra/Desktop/imagetest'))

#Number of samples in xml file

xml_samples = len(dic)

#list of trining_set

training_img = []
training_txt=[]
train_input_length = []
train_label_length = []
orig_txt = []

#lists for validation dataset
valid_img = []
valid_txt = []
valid_input_length = []
valid_label_length = []
valid_orig_txt = []

max_label_len = 0

Training Variables

k=1

for i, pic in enumerate(os.listdir('/home/yosra/Desktop/imagetest')):
# Read image as grayscale
img = cv2.imread(os.path.join('/home/yosra/Desktop/imagetest', pic), cv2.IMREAD_GRAYSCALE)

    pic_target = pic[:-4]
    # convert each image of shape (32, 128, 1)
    w, h = img.shape

    if h > 128 or w > 32:
        continue
    # endif

    # Process the images to bring them to scale
    if w < 32:
        add_zeros = np.ones((32-w, h))*255
        img = np.concatenate((img, add_zeros))
    # endif
    if h < 128:
        add_zeros = np.ones((32, 128-h))*255
        img = np.concatenate((img, add_zeros), axis=1)
    # endif    

    img = np.expand_dims(img , axis = 2)

    # Normalise the image
    img = img/255.

    # Get the text for the image
    txt = pic_target.split('_')[1]
            
    # compute maximum length of the text
    if len(txt) > max_label_len:
        max_label_len = len(txt)

    if k%10 == 0:     
        valid_orig_txt.append(txt)   
        valid_label_length.append(len(txt))
        valid_input_length.append(31)
        valid_img.append(img)
        valid_txt.append(encode_to_labels(txt))
    else:
        orig_txt.append(txt)   
        train_label_length.append(len(txt))
        train_input_length.append(31)
        training_img.append(img)
        training_txt.append(encode_to_labels(txt))
    k+=1

pad each output label to maximum text length

train_padded_txt = pad_sequences(training_txt, maxlen=max_label_len, padding='post', value = len(char_list))
valid_padded_txt = pad_sequences(valid_txt, maxlen=max_label_len, padding='post', value = len(char_list))

input with shape of height=32 and width=128

input with shape of height=32 and width=128

inputs = Input(shape=(32,128,1))

convolution layer with kernel size (3,3)

conv_1 = Conv2D(64, (3,3), activation = 'relu', padding='same')(inputs)

poolig layer with kernel size (2,2)

pool_1 = MaxPool2D(pool_size=(2, 2), strides=2)(conv_1)

conv_2 = Conv2D(128, (3,3), activation = 'relu', padding='same')(pool_1)
pool_2 = MaxPool2D(pool_size=(2, 2), strides=2)(conv_2)

conv_3 = Conv2D(256, (3,3), activation = 'relu', padding='same')(pool_2)

conv_4 = Conv2D(256, (3,3), activation = 'relu', padding='same')(conv_3)

poolig layer with kernel size (2,1)

pool_4 = MaxPool2D(pool_size=(2, 1))(conv_4)

conv_5 = Conv2D(512, (3,3), activation = 'relu', padding='same')(pool_4)

Batch normalization layer

batch_norm_5 = BatchNormalization()(conv_5)

conv_6 = Conv2D(512, (3,3), activation = 'relu', padding='same')(batch_norm_5)
batch_norm_6 = BatchNormalization()(conv_6)
pool_6 = MaxPool2D(pool_size=(2, 1))(batch_norm_6)

conv_7 = Conv2D(512, (2,2), activation = 'relu')(pool_6)

squeezed = Lambda(lambda x: K.squeeze(x, 1))(conv_7)

bidirectional LSTM layers with units=128

blstm_1 = Bidirectional(LSTM(128, return_sequences=True, dropout = 0.2))(squeezed)
blstm_2 = Bidirectional(LSTM(128, return_sequences=True, dropout = 0.2))(blstm_1)

outputs = Dense(len(char_list)+1, activation = 'softmax')(blstm_2)

model to be used at test time

act_model = Model(inputs, outputs)

act_model.summary()

#the CTC loss fnction is to predict the output text, it is very helpfull for the
#text recognition topic.
labels = Input(name='the_labels', shape=[max_label_len], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')

def ctc_lambda_func(args):
y_pred, labels, input_length, label_length = args

return K.ctc_batch_cost(labels, y_pred, input_length, label_length)

loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([outputs, labels, input_length, label_length])

#model to be used at training time
model = Model(inputs=[inputs, labels, input_length, label_length], outputs=loss_out)

model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer = 'adam')

filepath="/home/yosra/Downloads/best_model.hdf5"
checkpoint = ModelCheckpoint(filepath=filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='auto')
callbacks_list = [checkpoint]

#train the model

callbacks_list = [checkpoint]
training_img = np.array(training_img)
train_input_length = np.array(train_input_length)
train_label_length = np.array(train_label_length)

valid_img = np.array(valid_img)
valid_input_length = np.array(valid_input_length)
valid_label_length = np.array(valid_label_length)

batch_size = 256
epochs = 10

model.fit(x=[training_img, train_padded_txt, train_input_length, train_label_length], y=np.zeros(len(training_img)),
batch_size=batch_size,
epochs = epochs,
validation_data = ([valid_img, valid_padded_txt, valid_input_length, valid_label_length], [np.zeros(len(valid_img))]),
verbose = 1, callbacks = callbacks_list)

act_model.save(filepath)

#test the model
from keras.models import load_model

load the saved best model weights

load the saved best model weights

act_model.load_weights(filepath)

image=valid_img[:10]

predict outputs on validation images

prediction = act_model.predict(image)

use CTC decoder

out = K.get_value(K.ctc_decode(prediction, input_length=np.ones(prediction.shape[0])*prediction.shape[1],
greedy=True)[0][0])

see the results

i = 0
for x in out:
print("original_text = ", valid_orig_txt[i])
print("predicted text = ", end = '')
print(x)
for p in x:
if int(p) != -1:
print(char_list[int(p)], end = '')
print('\n')
i+=1

Error when checking input: expected input_1 to have 4 dimensions, but got array with shape (0, 1)

Getting this error while training the images. Also, I have checked, the dataset is successfully stored in the variable. Why this error is occurring?

`ValueError Traceback (most recent call last)
in
6 epochs = epochs,
7 validation_data = ([valid_img, valid_padded_txt, valid_input_length, valid_label_length],[np.zeros(len(valid_img))]),
----> 8 verbose = 1, callbacks = callbacks_list)

E:\Anaconda\envs\PythonCPU\lib\site-packages\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
1173 val_x, val_y,
1174 sample_weight=val_sample_weight,
-> 1175 batch_size=batch_size)
1176 if self._uses_dynamic_learning_phase():
1177 val_inputs = val_x + val_y + val_sample_weights + [0]

E:\Anaconda\envs\PythonCPU\lib\site-packages\keras\engine\training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_array_lengths, batch_size)
577 feed_input_shapes,
578 check_batch_axis=False, # Don't enforce the batch size.
--> 579 exception_prefix='input')
580
581 if y is not None:

E:\Anaconda\envs\PythonCPU\lib\site-packages\keras\engine\training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
133 ': expected ' + names[i] + ' to have ' +
134 str(len(shape)) + ' dimensions, but got array '
--> 135 'with shape ' + str(data_shape))
136 if not check_batch_axis:
137 data_shape = data_shape[1:]

ValueError: Error when checking input: expected input_9 to have 4 dimensions, but got array with shape (0, 1)
`
Here's my Code:
print(training_img.shape)
print(train_padded_txt.shape)
print(train_input_length.shape)
print(train_label_length.shape)

(3, 32, 128, 1)
(3, 13)
(3,)
(3,)

batch_size = 8
epochs = 20
model.fit(x=[training_img, train_padded_txt, train_input_length, train_label_length],
y=np.zeros(len(training_img)),
batch_size=batch_size,
epochs = epochs,
validation_data = ([valid_img, valid_padded_txt, valid_input_length, valid_label_length],
[np.zeros(len(valid_img))]),
verbose = 1, callbacks = callbacks_list)

Predictions are blank

Hi TheAILearner, It's an urgent need could please tell me, why the predictions are blank, ie. nothing is being predicted when I train the model on our own using your code, but when we use your trained model for predictions, then the predictions are well good. (we tried with training 50000, 100000, 150000 images are well with validation images as 10% of total )
Thank you it would be great if you could help ASAP.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.