Giter VIP home page Giter VIP logo

fendaq / weakly-supervised-text-detection Goto Github PK

View Code? Open in Web Editor NEW

This project forked from penny4860/weakly-supervised-text-detection

0.0 2.0 0.0 417.8 MB

I implemented a detection algorithm with a classification data set that does not have annotation information for the bounding box. Based on resnet50 network, I implemented text detector using class activation mapping method.

Python 100.00%

weakly-supervised-text-detection's Introduction

Weakly Supervised Text Detector

I am implementing a detection algorithm with a classification data set that does not have annotation information for the bounding box. I used the class activation mapping proposed in Learning Deep Features for Discriminative Localization.

Requirements

  • python 3.6
  • tensorflow 1.2.1
  • keras 2.1.1
  • opencv 3.3.0

Usage

The procedure to build detector is as follows:

1. Fine Tuning (1_train.py)

Pre-trained weight files are stored at https://drive.google.com/drive/folders/1kj398ZW3zwk-KMDA0_Y5tq6Gr09oFDbZ. This file allows you to skip the fine tuning process.

  • First build a binary classifier using the pretrained resnet50 structure. As with the structure in the Learning Deep Features for Discriminative Localization, I removed all layers after the activation layer with spatial resoultion of 14x14, and added a convolution layer of 3x3 size instead. The last few layers of the network have the following structure.
activation_40 (Activation)      (None, 14, 14, 1024) 0           add_13[0][0]                     
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 14, 14, 1024) 9438208     activation_40[0][0]              
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 14, 14, 1024) 4096        conv2d_1[0][0]                   
__________________________________________________________________________________________________
activation_50 (Activation)      (None, 14, 14, 1024) 0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
cam_average_pooling (AveragePoo (None, 1, 1, 1024)   0           activation_50[0][0]              
__________________________________________________________________________________________________
flatten_2 (Flatten)             (None, 1024)         0           cam_average_pooling[0][0]        
__________________________________________________________________________________________________
cam_cls (Dense)                 (None, 2)            2050        flatten_2[0][0]                  
  • Next, you should train a model that can distinguish between images that contain text and those that do not. The dataset is stored in this repository. To reproduce the result, just run 1_train.py.

Running the 1_train.py will do all of this.

2. Plot Class Actication Map (2_cam_plot.py)

  • Now configure the model to output the class activation map. In the previous structure, modify several layers with the following structure. Please refer to the paper for details.
activation_50 (Activation)      (None, 14, 14, 1024) 0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
binear_up_sampling2d_1 (BinearU (None, 224, 224, 102 0           activation_50[0][0]              
__________________________________________________________________________________________________
reshape_1 (Reshape)             (None, 50176, 1024)  0           binear_up_sampling2d_1[0][0]     
__________________________________________________________________________________________________
cam_cls (Dense)                 (None, 50176, 2)     2050        reshape_1[0][0]                  
__________________________________________________________________________________________________
reshape_2 (Reshape)             (None, 224, 224, 2)  0           cam_cls[0][0]                    
  • Load the weight we learned earlier on this network. image classifiers and CAM networks have different structures. However, since the layer naming is set in advance, the weight file can be used in both networks in common.

  • Entering image on this network will generate a text activation map.

Running the 2_cam_plot.py will do all of this.

Results

References

weakly-supervised-text-detection's People

Contributors

penny4860 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.