Comments (24)

lgaida avatar lgaida commented on June 7, 2024 1

I also wanted to play around with the pre-trained weights of the holistic model, so I downloaded 'vgg16_weights_th_dim_ordering_th_kernels_Holistic_91.11.h5'.

I used Keras with TensorFlow and assumed that using the VGG16 from keras.applications should work:

from keras import applications
vgg = applications.VGG16(include_top=True, weights='PATH_TO_WEIGHTSFILE', classes=16)

Turns out you don't even have to convert the weights from theano to tensorflow on your own, since keras does this internally in model.load_weights (which is called inside vgg16 if you provide a weightsfile).

Initializing the model and loading the weights seemed to work; I didn't get any errors.
I then used a few examples from RVL-CDIP to test everything. Sadly, every image tested was classified as memo.

Being suspicious about the weight conversion, I set up a new project and installed Keras with Theano. Again, loading the model with the weights worked, but all test images were classified as memo.

In 'IV-B Preprocessing' of the paper it is said that:

Following the resizing, all datasets were standardized

Can someone clarify what "standardized" means? Mean Pixel Subtraction? Rescaling?

I would appreciate it if someone could confirm that the provided weights actually work.

from document-image-classification-tl-sg.

martinnormark avatar martinnormark commented on June 7, 2024 1

For anyone looking to run this with Tensorflow 2.0, the following will work.

Install dependencies:

pip install tensorflow
pip install keras
pip install pillow  # used for inference later

Download a weights file, e.g. vgg16_weights_th_dim_ordering_th_kernels_Holistic_91.11.h5 from Google Drive

Download the convert script from this repo.

Open the convert script, and make the following changes:

  • Set the model_weights array at the top to point to the weight file(s) you have downloaded
  • Replace K.set_image_dim_ordering('th') with K.common.set_image_dim_ordering('th').

Run python Weight_conversion_th_to_tf_Keras2.py from terminal/command prompt.

A new folder is created (tf-kernels-channels-last-dim-ordering) and contains the converted weights file.

Open the folder and create a new file called test.py with the following code:

from keras import applications
vgg = applications.VGG16(include_top=True, weights='./vgg16_weights_th_dim_ordering_th_kernels_Holistic_91.11.h5', classes=16)
import numpy as np
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

class_map = ['letter', 'form', 'email', 'handwritten', 'advertisement',
	'scientific report', 'scientific publication', 'specification', 'file folder',
	'news article', 'budget', 'invoice', 'presentation', 'questionnaire',
	'resume', 'memo']

def test(path):
	img = image.load_img(path, target_size=(224, 224))
	img = image.img_to_array(img)
	img = preprocess_input(img)
	x = np.expand_dims(img, 0)
	y = vgg.predict(x)
	print(y)

	idx = np.argmax(y)
	print('predicted class: {}'.format(class_map[idx]))

test('../form.jpg')

Now run the code: python test.py and it will print out the predicted class of the image.

saikat-roy avatar saikat-roy commented on June 7, 2024 1

@martinnormark Hey thanks for the guide. I've added a link to this on the main Readme.

hiepph avatar hiepph commented on June 7, 2024

Thanks @lgaida, I successfully loaded the trained weights as you suggested. My input is preprocessed as follows:

import numpy as np
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

img = image.load_img('my_image', target_size=(224, 224))
img = image.img_to_array(img)
img = preprocess_input(img)

x = np.expand_dims(img, 0)  # add the batch dimension

But when I tried to predict with the holistic model, I had the same problem as you:

y = vgg.predict(x)
np.argmax(y)  # always ends up at id 8 (which is file folder)

lgaida avatar lgaida commented on June 7, 2024

@hiepph too bad 😢

saikat-roy avatar saikat-roy commented on June 7, 2024

Can someone clarify what "standardized" means? Mean Pixel Subtraction? Rescaling?

By "standardized", we mean subtract the mean and divide by the standard deviation.

Regarding the data loading issues, I can try to look into our old code and configurations and try to elaborate.

saikat-roy avatar saikat-roy commented on June 7, 2024

I can confirm however that everything we did was using theano as the backend.

So the input dimensions as well as the weights are in theano ordering. If you are using tensorflow as the backend, then you have to either switch backends to theano or change weight orderings for everything to work I think.
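
For readers on a TensorFlow backend, the difference in input layout can be sketched roughly as follows. This is a minimal NumPy illustration, assuming that "theano ordering" here means the channels-first layout (samples, channels, height, width). The helper name to_channels_first is hypothetical, and converting the convolution kernels themselves is a separate step that is not shown.

```python
import numpy as np

def to_channels_first(batch):
    """Reorder (samples, height, width, channels) to (samples, channels, height, width)."""
    batch = np.asarray(batch)
    if batch.ndim != 4:
        raise ValueError("expected a 4-D batch, got shape {}".format(batch.shape))
    # Move the channel axis (last) to position 1; the other axes keep their order.
    return np.transpose(batch, (0, 3, 1, 2))

# Example: two 224x224 three-channel images in channels-last layout.
tf_batch = np.zeros((2, 224, 224, 3), dtype=np.float32)
th_batch = to_channels_first(tf_batch)
print(th_batch.shape)  # (2, 3, 224, 224)
```

Only the axis order changes; the pixel values are untouched, so any standardization can be applied before or after this step as long as the statistics use the matching layout.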

Turns out you don't even have to convert the weights from theano to tensorflow on your own, since keras does this internally in model.load_weights (which is called inside vgg16 if you provide a weightsfile).

I can, however, neither confirm nor deny this, since I have not worked with that functionality myself.

lgaida avatar lgaida commented on June 7, 2024

Hi @saikat-roy thank you for replying 👍
It would be fantastic if you could peek at your code again, maybe providing some code snippets. Playing around with dim-ordering is fine, but guessing and assuming preprocessing is way harder.

saikat-roy avatar saikat-roy commented on June 7, 2024

Hey @lgaida. We apologize for not replying sooner but the source code of the project was never really written for what you might call, public consumption (also known as, it's an absolute mess) so we are scrambling to dig it out of storage.

It would be fantastic if you could peek at your code again, maybe providing some code snippets. Playing around with dim-ordering is fine, but guessing and assuming preprocessing is way harder.

# X is the main data matrix, laid out as (samples, channels, height, width).
# X was created with 3 channels to match the original VGG16 input, but since
# RVL-CDIP images are grayscale, we simply copy the 1st channel onto the
# 2nd and 3rd channels. The copy happens after standardization, as shown below.

_mean = X[:,0,:,:].mean(axis=0)
_std  = X[:,0,:,:].std(axis=0)
	
_jmp = 1000 # We essentially do the standardization in mini-batches 
            # of size '_jmp' due to memory constraints

for i in range(0,X.shape[0],_jmp):
	end = min(i+_jmp,X.shape[0])
	X[i:end,0,:,:] = (X[i:end,0,:,:]-_mean)/_std # batch standardization
	X[i:end,1,:,:] = X[i:end,0,:,:] # batch copying to channel 2
	X[i:end,2,:,:] = X[i:end,0,:,:] # batch copying to channel 3

I am digging through our old files, and this is the preprocessing snippet that I found we had used. I should warn you, however, that the _mean and _std calculations are the naive versions: they consume a ridiculous amount of memory and will probably crash on machines without a very large amount of RAM. We used AWS EC2 instances (and we were being a bit lazy), so it wasn't a problem for us, but I would recommend modifying it in some way (maybe computing the statistics in mini-batches) to suit lower hardware configurations.
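
One way the mini-batch suggestion could look is sketched below. It is an illustration under assumptions, not the code used for the paper: the function name batched_mean_std and the random stand-in data are hypothetical, and the variance uses the simple E[x^2] - E[x]^2 identity in float64, which is less numerically robust than a Welford-style update but keeps the temporaries bounded by one batch.

```python
import numpy as np

def batched_mean_std(X, batch_size=1000):
    """Per-pixel mean and std of channel 0, accumulated over mini-batches.

    X has shape (samples, channels, height, width). Only two running sums
    of shape (height, width) are kept between batches.
    """
    n = X.shape[0]
    total = np.zeros(X.shape[2:], dtype=np.float64)
    total_sq = np.zeros(X.shape[2:], dtype=np.float64)
    for start in range(0, n, batch_size):
        chunk = X[start:start + batch_size, 0, :, :].astype(np.float64)
        total += chunk.sum(axis=0)
        total_sq += (chunk ** 2).sum(axis=0)
    mean = total / n
    # Population variance; clip tiny negative values caused by rounding.
    var = np.maximum(total_sq / n - mean ** 2, 0.0)
    return mean, np.sqrt(var)

# Random stand-in data (hypothetical; not RVL-CDIP).
rng = np.random.default_rng(0)
X = rng.random((2500, 3, 8, 8)).astype(np.float32)
mean, std = batched_mean_std(X, batch_size=1000)

# Compare against the one-shot computation in float64.
reference = X[:, 0].astype(np.float64)
print(np.allclose(mean, reference.mean(axis=0)))
print(np.allclose(std, reference.std(axis=0)))
```

The returned mean and std can then be plugged into the standardization loop in place of _mean and _std.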

lgaida avatar lgaida commented on June 7, 2024

Thanks for replying so quickly. I'm going to play around with your code snippet; I'm currently implementing something very similar on my own.

To eliminate a few more assumptions:
X in your code snippet represents the train images of rvl-cdip, and you normalize the test samples with the mean & std of this X (=train samples), right?
Or is X the whole rvl-cdip including train, test, validation?

saikat-roy avatar saikat-roy commented on June 7, 2024

X in your code snippet represents the train images of rvl-cdip, and you normalize the test samples with the mean & std of this X (=train samples), right?
Or is X the whole rvl-cdip including train, test, validation?

While the first case you suggested might be more experimentally sound, we actually ran this snippet separately for the train, test and validation sets, standardizing each dataset with its own mean and standard deviation.
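
The per-split procedure described above could be sketched like this. It is a rough illustration on random stand-in data, assuming the (samples, channels, height, width) layout from the earlier snippet; the function name standardize_split is hypothetical, and the small eps added to the denominator is an assumption made here to avoid division by zero.

```python
import numpy as np

def standardize_split(X, eps=1e-4):
    """Standardize channel 0 with this split's own per-pixel statistics,
    then copy the result onto channels 1 and 2."""
    X = X.astype(np.float32, copy=True)
    mean = X[:, 0].mean(axis=0)
    std = X[:, 0].std(axis=0)
    X[:, 0] = (X[:, 0] - mean) / (std + eps)  # eps is an assumed guard
    X[:, 1] = X[:, 0]
    X[:, 2] = X[:, 0]
    return X

# Random stand-in splits (hypothetical sizes; not RVL-CDIP).
rng = np.random.default_rng(0)
sizes = {"train": 64, "val": 16, "test": 16}
splits = {name: rng.random((n, 3, 4, 4)).astype(np.float32)
          for name, n in sizes.items()}

# Each split is standardized independently of the others.
standardized = {name: standardize_split(X) for name, X in splits.items()}
for name, X in standardized.items():
    print(name, X.shape, float(np.abs(X[:, 0].mean(axis=0)).max()))
```

A consequence of this design is that a single image cannot be standardized on its own: the per-pixel statistics only exist for a whole set of images.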

lgaida avatar lgaida commented on June 7, 2024

Hello again,
I installed Theano and tested both my own and your implementation of the normalization. Still not able to make good predictions 👎
If you don't want to publish the code, any chances i might get it? I would try to come up with a publishable code snippet, providing a small example on how to use the weights for prediction.

saikat-roy avatar saikat-roy commented on June 7, 2024

I installed Theano and tested both my own and your implementation of the normalization. Still not able to make good predictions 👎

That's odd. I'm guessing you have already made all the changes in the keras.json configuration file by setting "backend" and "image_data_format". Strange that it wouldn't work.
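
For reference, the two settings mentioned above could look like the sketch below for multi-backend Keras. The "floatx" and "epsilon" entries are the usual defaults and are included only for completeness. The file is written to a temporary directory so the sketch has no side effects; Keras itself reads ~/.keras/keras.json by default.

```python
import json
import os
import tempfile

# Settings for a Theano backend with channels-first inputs.
config = {
    "backend": "theano",
    "image_data_format": "channels_first",
    "floatx": "float32",
    "epsilon": 1e-07,
}

# Temporary location for illustration only; the real file lives in ~/.keras/.
config_path = os.path.join(tempfile.mkdtemp(), "keras.json")
with open(config_path, "w") as f:
    json.dump(config, f, indent=4)

# Read it back to confirm what a backend would see.
with open(config_path) as f:
    loaded = json.load(f)
print(loaded["backend"], loaded["image_data_format"])
```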

If you don't want to publish the code, any chances i might get it? I would try to come up with a publishable code snippet, providing a small example on how to use the weights for prediction.

Sure. Give us a little time, like a day or so, and we'll give you the version of the code that we had used.

saikat-roy avatar saikat-roy commented on June 7, 2024

I installed Theano and tested both my own and your implementation of the normalization. Still not able to make good predictions

Hey @lgaida. I was digging around our code and I saw something. I know the last version I gave you didn't have a NaN guard for the standardization. Did your version have one?

_jmp = 1000
eps = 0.0001
for i in range(0,X.shape[0],_jmp):
	end = min(i+_jmp,X.shape[0])
	X[i:end,0,:,:] = (X[i:end,0,:,:]-_mean)/(_std+eps) # batch standardization
	X[i:end,1,:,:] = X[i:end,0,:,:] # batch copying to channel 2
	X[i:end,2,:,:] = X[i:end,0,:,:] # batch copying to channel 3		

lgaida avatar lgaida commented on June 7, 2024

Hey @lgaida. I was digging around our code and I saw something. I know the last version I gave you didn't have a NaN guard for the standardization. Did your version have one?

Kind of, i initialized the array with zeroes.

Sure. Give us a little time, like a day or so, and we'll give you the version of the code that we had used.

Sounds great 👍 I'll be waiting until then :) Feel free to contact me via github or email (see github-profile)

saikat-roy avatar saikat-roy commented on June 7, 2024

Kind of, i initialized the array with zeroes.

I mean to say that (as far as I remember) the std of X in some places is 0. So you would be getting NaNs in the standardized input in some places. Do we mean the same thing? It was an issue for us if I am still remembering correctly. Try adding a small value like 0.0001 or something to the _std like above and try running the examples again if you haven't yet specifically guarded against this.
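
The failure mode described here can be reproduced on a toy array. The data below is made up for illustration: pixel (0, 0) has the same value in every sample (think of an always-white page margin), so its standard deviation is exactly 0 and the unguarded division evaluates 0 / 0.

```python
import numpy as np

# Toy data: 4 samples, 1 channel, 2x2 pixels. Pixel (0, 0) is constant.
X = np.array([[[[255.0, 10.0], [20.0, 30.0]]],
              [[[255.0, 12.0], [25.0, 35.0]]],
              [[[255.0, 14.0], [30.0, 40.0]]],
              [[[255.0, 16.0], [35.0, 45.0]]]])
mean = X[:, 0].mean(axis=0)
std = X[:, 0].std(axis=0)

# Silence the runtime warnings so the NaN can be inspected directly.
with np.errstate(invalid="ignore", divide="ignore"):
    unguarded = (X[:, 0] - mean) / std       # 0 / 0 at the constant pixel
guarded = (X[:, 0] - mean) / (std + 1e-4)    # eps keeps the denominator non-zero

print(np.isnan(unguarded).any())  # True
print(np.isnan(guarded).any())    # False
```

Initializing the output array with zeros does not help, because the assignment overwrites those zeros with the NaN produced by the division.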

hiarindam avatar hiarindam commented on June 7, 2024

Hello @lgaida , thanks for your interest in our work and reaching out to us.

Kind of, i initialized the array with zeroes.

I would repeat what @saikat-roy mentioned: even though initialization was done with all zeros, unfortunately that doesn't guarantee that you won't get NaN. Please consider adding this safeguard to your code and let us know.

lgaida avatar lgaida commented on June 7, 2024

Hello @lgaida , thanks for your interest in our work and reaching out to us.

Kind of, i initialized the array with zeroes.

I would repeat the same thing as mentioned by @saikat-roy that even though initialization was done with all zeros, unfortunately that doesn't guarantee that you won't get NaN. Please consider adding this safe guard in your code and let us know.

I just added the guard but still get Label 8 for every tested sample 😢

hiepph avatar hiepph commented on June 7, 2024

Hi @saikat-roy, can you provide the mean and std values of your training set so I can standardize my inputs before forwarding through the trained model?

saikat-roy avatar saikat-roy commented on June 7, 2024

Hey sorry for the late reply.

Hi @saikat-roy, can you provide the mean and std values of your training set so I can standardize my inputs before forwarding through the trained model?

I'm really sorry, but the computational environment that we had set up for processing the dataset is currently not available to us.

I just added the guard but still get Label 8 for every tested sample

We will however, be looking into releasing more of our code and testing the model weights ourselves since it is disturbing to hear the model weights do not load as expected. While we cannot do it immediately, we do plan to try it in a week or two.

So I would request your patience for a while longer and hopefully we can get back to you with better news than "we don't know what's wrong, this shouldn't be happening".

lgaida avatar lgaida commented on June 7, 2024

Just want to remind you that i could also take a look at the code 👋

puneetiitian avatar puneetiitian commented on June 7, 2024

Hi Saikat, Arindam,

First of all, thanks for writing this great article.
I am also getting everything predicted as 8. My code is below; kindly assist and let us know how to get this resolved.

from keras import applications
vgg = applications.VGG16(include_top=True, weights='F:/Doc_Image_Classification/vgg16_weights_th_dim_ordering_th_kernels_Holistic_91.11.h5', classes=16)
import numpy as np
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
img = image.load_img('F:/Doc_Image_Classification/images/pic1.png', target_size=(224, 224))
img = image.img_to_array(img)
img = preprocess_input(img)
x = np.expand_dims(img, 0)
y = vgg.predict(x)
np.argmax(y)

saikat-roy avatar saikat-roy commented on June 7, 2024

Okay, so first and foremost, we are sincerely sorry about the ridiculously late updates to this issue. Unfortunately, as we mentioned, we have since stopped working on this project and have literally no hardware or software setup available to test the models any more. I know it's frustrating to have your queries go unanswered, but we have had little to no time to really go through the code for this bug. We have thought a lot about it and, simply put, it did NOT exist when we worked on it.

The reason I am writing this update is to mention that we recently went through multiple issues on the keras forums regarding issues with model.save and model.load in keras. From our end, the code should run fine if the data is simply standardized as I had mentioned earlier, which everyone seems to be doing as well - so if you are still using our code I gently urge you to look into whether the keras bugs for serialization are to blame here. We will go over it ourselves if we can but without a proper hardware setup, we sincerely can't promise anything in terms of time.

I thank you for being patient with us and again we sincerely apologize for not actively helping out with the issue. To anyone who needs our code, we will attempt to simply just release the .py files with some minor cleaning soon - since we can't help out actively this is the least we can do at this point.

saikat-roy avatar saikat-roy commented on June 7, 2024

An attempt to solve the weight loading has been added to the readme. So we'll be closing this issue.
