hiarindam / document-image-classification-tl-sg

Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks

Home Page: https://arxiv.org/abs/1801.09321

License: MIT License

Python 100.00%
document-classification deep-learning transfer-learning structure-learning deep-convolutional-neural-networks image-classification document-image-classification training-strategies

document-image-classification-tl-sg's Introduction

Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks

Contributors: Arindam Das, Saikat Roy, Ujjwal Bhattacharya, S.K. Parui

The paper describing this work is available on arXiv (https://arxiv.org/abs/1801.09321).

This page provides region-based pre-trained models for document image classification and document structure learning. When using the weight matrices, please note that Theano was the backend for all our experiments, so everything follows Theano's ordering conventions.

Please cite our work if you find it useful for your research.

@inproceedings{das2018document,
  title={Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks},
  author={Das, Arindam and Roy, Saikat and Bhattacharya, Ujjwal and Parui, Swapan K},
  booktitle={2018 24th International Conference on Pattern Recognition (ICPR)},
  pages={3180--3185},
  year={2018},
  organization={IEEE}
}

Theano to TensorFlow Weight Converter

Several users have reported being unable to load the weights properly in TensorFlow, with or without a converter, because the versions of Theano and Keras used for this project are quite old (late 2017/early 2018). Please also see the section on preprocessing the input. This section covers weight conversion from Theano to TensorFlow. The conversion module was developed by Auke Zijlstra. Although he was unable to replicate our exact results with this script, he did get the weights working. We provide excerpts from his communication with us on how to use the script.

"... Although I have not been able to fully replicate your results, I have been able to achieve 0.87 accuracy score on the RVL-CDIP test set using your holistic model weights with a Keras+tensorflow setup. My steps to convert your Theano ordered weights into Tensorflow ordering were as follows:

Hopefully this gives a way forward for people having issues using our weights for newer versions of keras, theano, tensorflow and the like.
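The conversion steps themselves are not reproduced on this page. As a rough illustration of what such a conversion usually involves, the following is a minimal NumPy sketch, not the script referred to above. It assumes the old Keras/Theano layout of `(out_channels, in_channels, rows, cols)` for convolution kernels with Theano's flipped-kernel convention, and a `(channels, rows, cols)` flatten order before the first dense layer. The function names are hypothetical, and whether these assumptions hold for the released weight files should be checked against the files themselves.

```python
import numpy as np


def conv_kernel_th_to_tf(kernel_th):
    """Convert a conv kernel from (out, in, rows, cols) to (rows, cols, in, out).

    Assumption: the kernel was trained under Theano, whose convolution flips
    the kernel, so both spatial axes are reversed before transposing.
    """
    flipped = kernel_th[:, :, ::-1, ::-1]        # reverse rows and cols
    return np.transpose(flipped, (2, 3, 1, 0))   # -> (rows, cols, in, out)


def dense_after_flatten_th_to_tf(weights, channels, rows, cols):
    """Permute the input rows of the first dense layer after Flatten.

    Theano-ordered models flatten (channels, rows, cols); TensorFlow-ordered
    models flatten (rows, cols, channels).
    """
    units = weights.shape[1]
    w = weights.reshape(channels, rows, cols, units)
    w = np.transpose(w, (1, 2, 0, 3))            # -> (rows, cols, channels, units)
    return w.reshape(rows * cols * channels, units)


# Toy shapes for illustration only (not the real VGG-16 sizes).
kernel_th = np.arange(2 * 3 * 3 * 3, dtype=np.float32).reshape(2, 3, 3, 3)
kernel_tf = conv_kernel_th_to_tf(kernel_th)
print(kernel_tf.shape)  # (3, 3, 3, 2)
```

Biases and the remaining dense layers need no reordering under these assumptions, since they do not depend on the spatial layout.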

Detailed Guide for Tensorflow 2.0

Martin H. Normark was nice enough to provide a detailed guide for running the models with Tensorflow 2.0.

Dataset

RVL-CDIP has been used to validate the proposed methodology. This dataset consists of 400,000 scanned grayscale images distributed among 16 categories. The collection is subdivided into training, validation and test sets containing 320,000, 40,000 and 40,000 images respectively.
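As a quick sanity check on these figures, the short sketch below recomputes the total and the split proportions. The per-class count assumes the 16 categories are evenly balanced, which this page does not state explicitly.

```python
NUM_CLASSES = 16
SPLITS = {"train": 320_000, "validation": 40_000, "test": 40_000}

total = sum(SPLITS.values())
fractions = {name: count / total for name, count in SPLITS.items()}

# Assumption: classes are evenly balanced across the whole collection.
images_per_class = total // NUM_CLASSES

print(total)             # 400000
print(fractions)         # {'train': 0.8, 'validation': 0.1, 'test': 0.1}
print(images_per_class)  # 25000
```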

Preprocessing

Please look at this comment to see a small example on how to preprocess the input for the networks.

Proposed Architecture

Experimental Results

Performance Comparison with State-of-the-art Approaches

| Method | Accuracy (%) | Comments |
|---|---|---|
| Harley et al. [1] | 89.90 | Document region based DCNN models with transfer learning |
| Tensmeyer et al. [2] | 89.31 | Spatial pyramidal pooling based AlexNet without transfer learning |
| Tensmeyer et al. [2] | 90.94 | Same model as above with increased image dimension (384×384), keeping the aspect ratio the same |
| Csurka et al. [3] | 90.70 | GoogLeNet with weights transferred from ImageNet |
| Afzal et al. [4] | 90.97 | VGG-16 with weights transferred from ImageNet |
| Kölsch et al. [5] | 90.05 | Weights transferred from ImageNet to VGG-16, with an ELM in place of the MLP |
| Proposed | 91.11 | VGG-16 model trained on holistic samples with weights transferred from ImageNet |
| Proposed | 92.21 | Inter- and intra-domain transfer learning on region based DCNNs and MLNN based stacking |

Pre-trained Models

The trained models from this publication have been made available here. Please note that all weight matrices are stored for the Theano backend, not TensorFlow. This also applies to the input dimension ordering, which follows Theano's convention.
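Theano-style input ordering places the channel axis before the spatial axes. The sketch below shows one way to convert a batch from the TensorFlow-style `(batch, rows, cols, channels)` layout to the Theano-style `(batch, channels, rows, cols)` layout. The helper name and the 224×224 three-channel input size are assumptions for illustration, not values confirmed on this page.

```python
import numpy as np


def to_channels_first(batch_nhwc):
    """Reorder (batch, rows, cols, channels) -> (batch, channels, rows, cols)."""
    return np.transpose(batch_nhwc, (0, 3, 1, 2))


# Assumed input size for illustration: 224x224 images with 3 channels.
batch = np.zeros((8, 224, 224, 3), dtype=np.float32)
print(to_channels_first(batch).shape)  # (8, 3, 224, 224)
```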

References

[1] A. W. Harley, A. Ufkes, and K. G. Derpanis, “Evaluation of deep convolutional nets for document image classification and retrieval,” in Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. IEEE, 2015, pp. 991–995.

[2] C. Tensmeyer and T. Martinez, “Analysis of convolutional neural networks for document image classification,” arXiv preprint arXiv:1708.03273, 2017.

[3] G. Csurka, D. Larlus, A. Gordo, and J. Almazan, “What is the right way to represent document images?” arXiv preprint arXiv:1603.01076, 2016.

[4] M. Z. Afzal, A. Kölsch, S. Ahmed, and M. Liwicki, “Cutting the error by half: Investigation of very deep CNN and advanced training strategies for document image classification,” arXiv preprint arXiv:1704.03557, 2017.

[5] A. Kölsch, M. Z. Afzal, M. Ebbecke, and M. Liwicki, “Real-time document image classification using deep CNN and extreme learning machines,” in Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017.

document-image-classification-tl-sg's Issues

train code

I'd like to use this algorithm to train some other ticket classification task. Could you offer the train code?

Concatenation of the base class softmax predictions

Dear author,

Please allow me to clarify some details regarding the implementation. To run inference on an image using your proposed "MLNN based stacking of holistic & region-based models with inter and intra-domain weights transfer" method, I would first run each of the 5 base models (with your given weights) on the image and get 5 different "base class softmax predictions". I would then concatenate those 5 base class softmax predictions... but since each softmax prediction has 16 scores wouldn't their concatenation have a total of 5*16 scores. Afterwards, I am supposed to feed those scores to an MLNN to get the final prediction. Is that right? Do you also have the weights for the MLNN? Or do I have to train it on the validation set myself?

Please help me as I would like to be able to reproduce your results (92.21% accuracy) for my research project.

Yours sincerely,
Gordon
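The concatenation described in this question can be sketched as follows. This is only an illustration of the feature shape: the base-model outputs are random stand-ins, the function name is hypothetical, and the architecture and weights of the MLNN meta-classifier are not given on this page.

```python
import numpy as np

NUM_BASE_MODELS = 5
NUM_CLASSES = 16


def stack_features(base_predictions):
    """Concatenate per-model softmax outputs along the class axis.

    base_predictions: list of arrays, each of shape (n_samples, NUM_CLASSES).
    Returns an array of shape (n_samples, NUM_BASE_MODELS * NUM_CLASSES).
    """
    return np.concatenate(base_predictions, axis=1)


def softmax(logits):
    """Row-wise softmax, used here only to build stand-in predictions."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Stand-in predictions for 4 images from 5 hypothetical base models.
rng = np.random.default_rng(0)
predictions = [softmax(rng.normal(size=(4, NUM_CLASSES)))
               for _ in range(NUM_BASE_MODELS)]

features = stack_features(predictions)
print(features.shape)  # (4, 80)
```

Each row of `features` would then be the input to the meta-classifier, so its input layer would need 80 units under this reading.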

Source code

Other than pretrained weights, are you planning to also provide the reference source code (preprocessing, meta-classifier, etc...) used for the article?

Thanks

For evaluation, is test set normalized in the same manner as the train set?

I computed mean and std for rvlcdip to be 0.9919 and 0.1853 so these are the following transforms I'm using for the train set:
dataset = RvlCdipDataset(
    'labels/train.txt',
    'images/',
    transform=transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.Normalize(0.9199, 0.1853),
        transforms.Lambda(lambda x: x.repeat(3, 1, 1)),
    ]),
)

Do I use the same transforms for the test set?
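A common convention, though not one confirmed by the authors on this page, is to compute the normalization statistics on the training split only and reuse the same constants for validation and test data. A minimal NumPy sketch of that convention, using random stand-in images and hypothetical helper names:

```python
import numpy as np


def fit_normalizer(train_images):
    """Compute a global mean and standard deviation from the training split only."""
    return float(train_images.mean()), float(train_images.std())


def normalize(images, mean, std):
    """Apply fixed statistics to any split (train, validation or test)."""
    return (images - mean) / std


# Random stand-ins for grayscale images scaled to [0, 1].
rng = np.random.default_rng(0)
train_images = rng.random((10, 32, 32))
test_images = rng.random((4, 32, 32))

mean, std = fit_normalizer(train_images)
train_norm = normalize(train_images, mean, std)
test_norm = normalize(test_images, mean, std)  # same constants as the training split
```

Under this convention the test set is never used to estimate the statistics, so the normalized test data will not have exactly zero mean or unit variance.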

How to load pre-trained weights?

I see you post the link to pre-trained weights but no tutorial how to use them.

Tried with keras:

from keras.models import load_model
load_model('vgg16_weights_th_dim_ordering_th_kernels_Holistic_91.11.h5')

but failed ValueError: No model found in config file.

How do I load these h5 weights? Can you provide the model source so I can use model.load_weights(...)
