This project forked from ankanbhunia/attenscriptnetpr

Script Identification in Natural Scene Image and Video Frame using Attention based Convolutional-LSTM Network [PDF][Arxiv]

https://www.sciencedirect.com/science/article/pii/S0031320318302590

Published in Pattern Recognition 2019

This repository contains the codes and instructions to use the trained models for all the four datasets described in the paper.

Introduction

Script identification plays a significant role in analysing documents and videos. In this paper, we focus on the problem of script identification in scene text images and video scripts. Low image quality, complex backgrounds, and the similar character layouts shared by some scripts (for example, Greek and Latin) make text recognition in these cases challenging.

Pre-requisites

  • Python 2.7
  • TensorFlow
  • OpenCV
  • matplotlib
  • Jupyter Notebook

Training

  • Network Architecture

In this paper, we propose a novel method that extracts local and global features using a CNN-LSTM framework and weights them dynamically for script identification. First, we convert the images into patches and feed them into the CNN-LSTM framework. Attention-based patch weights are calculated by applying a softmax layer after the LSTM. We then multiply these weights patch-wise with the corresponding CNN features to yield local features. Global features are extracted from the last cell state of the LSTM. Finally, we employ a fusion technique that dynamically weights the local and global features for each individual patch.
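The weighting and fusion steps described above can be sketched in NumPy. This is only an illustrative sketch, not the code from the notebook: the array shapes, the function name `attention_fuse`, and in particular the sigmoid gate driven by a hypothetical parameter vector `gate_w` are assumptions, since this README does not state the exact fusion formula.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

def attention_fuse(cnn_feats, lstm_scores, global_feat, gate_w):
    """Illustrative attention weighting and local/global fusion.

    cnn_feats   : (n_patches, d) per-patch CNN features
    lstm_scores : (n_patches,)   per-patch scores taken after the LSTM
    global_feat : (d,)           feature from the last LSTM cell state
    gate_w      : (2 * d,)       hypothetical gate parameters (assumption)
    """
    n_patches = cnn_feats.shape[0]
    # Attention-based patch weights: softmax over the LSTM scores.
    attn = softmax(lstm_scores)
    # Patch-wise multiplication of the weights with the CNN features.
    local = attn[:, None] * cnn_feats
    # Repeat the global feature so it can be paired with every patch.
    glob = np.tile(global_feat, (n_patches, 1))
    # Assumed gate: one scalar in (0, 1) per patch decides the mix.
    gate_in = np.concatenate([local, glob], axis=1)
    alpha = 1.0 / (1.0 + np.exp(-np.dot(gate_in, gate_w)))
    fused = alpha[:, None] * local + (1.0 - alpha[:, None]) * glob
    return attn, fused
```

Calling `attention_fuse` with random inputs of shape `(5, 8)`, `(5,)`, `(8,)` and `(16,)` returns five attention weights that sum to 1 and a fused feature matrix of shape `(5, 8)`.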

You can also interactively visualize the tensorboard graph by running the following:

tensorboard --logdir='logfiles'

  • Training curves

TensorBoard scalar plots of the loss function and accuracy over the number of epochs on the CVSI-15 dataset.

  • Visualize Attention maps

We can visualize the relative importance of the patches from the attention maps. High attention is shown in white and low attention is shown in black.
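As a rough illustration of this colour convention, the sketch below turns a vector of patch attention weights into a grayscale strip in which the highest weight is white (255) and the lowest is black (0). The function name `attention_to_map`, the min-max normalisation, and the assumption that patches are equal-width vertical slices laid out left to right are ours, not taken from the notebook.

```python
import numpy as np

def attention_to_map(weights, patch_width, height):
    """Render per-patch attention weights as a grayscale image.

    weights     : 1-D sequence of attention weights, one per patch
    patch_width : width in pixels of each patch (assumed equal for all)
    height      : height in pixels of the output image
    """
    w = np.asarray(weights, dtype=np.float64)
    span = w.max() - w.min()
    # Min-max normalise so the strongest patch is 1 and the weakest is 0.
    norm = (w - w.min()) / span if span > 0 else np.zeros_like(w)
    gray = np.round(norm * 255).astype(np.uint8)
    # Stretch each patch value across its columns, then across all rows.
    row = np.repeat(gray, patch_width)
    return np.tile(row, (height, 1))
```

The resulting array can be displayed with matplotlib, for example `plt.imshow(attention_to_map(weights, 8, 32), cmap='gray', vmin=0, vmax=255)`.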

Datasets & Trained models

We evaluated our model on four datasets: CVSI-2015, SIW-13, ICDAR-2017, and MLe2e. To facilitate further research, we are making the trained models for all four datasets available.

Dataset     Trained Model   Accuracy
CVSI-15     Download        97.75%
SIW-13      Download        96.5%
ICDAR-17    Download        90.23%
MLe2e       Download        96.7%

Usage

Clone or download this repository. Download the pre-trained models and unzip them into the Models/<dataset_name> folder. Then start Jupyter Notebook, open main.ipynb, and run its cells.
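The unzip step can also be scripted. The helper below is a minimal sketch using only the standard library; the name `install_model` and the example archive name are hypothetical, and the sketch assumes the downloaded archive should be extracted directly into Models/<dataset_name> (if the archive already contains a top-level folder, the files may need to be moved up one level).

```python
import os
import zipfile

def install_model(archive_path, dataset_name, repo_root="."):
    """Extract a downloaded model archive into Models/<dataset_name>.

    archive_path : path to the downloaded .zip file
    dataset_name : folder name under Models/, e.g. "CVSI-15"
    repo_root    : path to the cloned repository (assumed layout)
    """
    target = os.path.join(repo_root, "Models", dataset_name)
    # Create Models/<dataset_name> if it does not exist yet.
    if not os.path.isdir(target):
        os.makedirs(target)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target)
    return target
```

For example, `install_model("cvsi15_model.zip", "CVSI-15")` run from the repository root would place the archive contents under Models/CVSI-15, where `cvsi15_model.zip` stands in for whatever file name the download actually has.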

Citations

If you find this code useful in your research, please consider citing:

  @article{BHUNIA2018,
  title = "Script Identification in Natural Scene Image and Video Frames using an Attention based Convolutional-LSTM Network",
  journal = "Pattern Recognition",
  year = "2018",
  issn = "0031-3203",
  doi = "https://doi.org/10.1016/j.patcog.2018.07.034",
  url = "http://www.sciencedirect.com/science/article/pii/S0031320318302590",
  author = "Ankan Kumar Bhunia and Aishik Konwer and Ayan Kumar Bhunia and Abir Bhowmick and Partha P. Roy and Umapada Pal",
  keywords = "Script Identification, Convolutional Neural Network, Long Short-Term Memory, Local feature, Global feature, Attention Network, Dynamic Weighting"
  }
