
Visual Semantic Relatedness with Word Embedding (SWE)

Improved implementation of the paper Visual Re-ranking with Natural Language Understanding for Text Spotting. Sabir et al. ACCV 2018.


Introduction

Many scene text recognition approaches are based on purely visual information and ignore the semantic relation between scene and text. In this paper, we tackle this problem from a natural language processing perspective to fill the gap between language and vision. We propose a post-processing approach that improves scene text recognition accuracy by using occurrence probabilities of words (a unigram language model) and the semantic correlation between scene and text. For this, we initially rely on an off-the-shelf deep neural network, already trained with a large amount of data, which provides a series of text hypotheses per input image. These hypotheses are then re-ranked using word frequencies and semantic relatedness with objects or scenes in the image. As a result of this combination, the performance of the original network is boosted with almost no additional cost. We validate our approach on the ICDAR'17 dataset.

Model

Count-based word embedding visual re-ranker

Requirement

conda create -n Visual_w2v python=3.8 anaconda
conda activate Visual_w2v
pip install gensim==4.1.0

Data

Download the GloVe pre-trained word vectors glove.6B.300d.txt. Bigger is better: the 840B pre-trained word vectors are recommended. We use GloVe as the main embedding in this work. The advantage of GloVe over Word2Vec is that it does not rely only on local word-context information but also incorporates global co-occurrence statistics.

For w2v, download GoogleNews-vectors-negative300.bin

For fastText, download crawl-300d-2M.vec
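Once downloaded, all three embedding files can be loaded with gensim 4.x (the `no_header` flag handles GloVe's raw text format). The snippet below sketches the loading calls as comments (the files are large, so they are not loaded here) and shows cosine similarity, the standard relatedness measure between a spotted word and a visual context word; the toy vectors and word pairs are illustrative, not values from the repo.

```python
import numpy as np

# Loading the three formats with gensim 4.x (assumed paths; uncomment
# after downloading the files listed above):
#
# from gensim.models import KeyedVectors
# glove = KeyedVectors.load_word2vec_format("glove.6B.300d.txt", binary=False, no_header=True)
# w2v   = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
# ft    = KeyedVectors.load_word2vec_format("crawl-300d-2M.vec", binary=False)

def cosine(u, v):
    """Cosine similarity: the semantic relatedness between two word
    vectors, e.g. a spotted word and the visual context label
    (hypothetically, cosine(glove["quarters"], glove["building"]))."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-d vectors stand in for real 300-d embeddings:
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])
print(cosine(a, b))  # parallel vectors -> 1.0
```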

Quick Start

Familiarize yourself with the model architecture by running it in Colab

How to run

To use w2v/GloVe as a visual re-ranker, we need the following information:

  • The spotted text text_spotted.txt: word candidates from the baseline
  • The original hypothesis score baseline.txt: softmax output from the baseline
  • The hypothesis LM.txt: initialized by common observation (i.e., the unigram LM)
  • Visual information from the image visual-context_label.txt: the initialized visual context or classifier label
  • Visual confidence visual-context_prob.txt: the confidence from the classifier, e.g. ResNet152
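These inputs can be combined into a single re-ranking score. The sketch below is a minimal, hypothetical scoring scheme (a plain product of the baseline softmax, the LM probability, the classifier confidence, and the word-to-context relatedness); the actual scripts may combine the terms differently. All function names and numbers are illustrative, though the candidate words and their rough magnitudes mirror Example 1 below.

```python
def rerank(candidates, lm_prob, visual_prob, relatedness):
    """Re-rank text hypotheses with language and visual evidence.
    candidates: {word: baseline softmax score}
    lm_prob: {word: unigram LM probability}
    visual_prob: classifier confidence for the visual context label
    relatedness: {word: embedding similarity to the visual context}"""
    scores = {w: s * lm_prob[w] * visual_prob * relatedness[w]
              for w, s in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy numbers (not the repo's real values): the visually related and
# more frequent word overtakes the baseline's top hypothesis.
ranked = rerank(
    {"quartos": 0.060, "quotas": 0.041, "quarters": 0.030},
    {"quartos": 1e-6, "quotas": 1e-4, "quarters": 1e-3},
    visual_prob=0.9,
    relatedness={"quartos": 0.05, "quotas": 0.10, "quarters": 0.60},
)
print(ranked[0][0])  # "quarters" now ranks first
```

Multiplying probabilities like this is why the re-ranked scores in the examples below are so small: each is a product of several values well below 1.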

Once all the required files are in place, run as shown in Example 1 (below).

For GloVe

cd quarters-example
python glove-visual.py --ulm LM.txt --bl baseline.txt --text spotted-text.txt --vis visual-context_label.txt --vis_prob visual-context_prob.txt

For w2v

cd quarters-example
python w2v-visual.py --ulm LM.txt --bl baseline.txt --text spotted-text.txt --vis visual-context_label.txt --vis_prob visual-context_prob.txt

For fastText

cd quarters-example
python fastext-visual.py --ulm LM.txt --bl baseline.txt --text spotted-text.txt --vis visual-context_label.txt --vis_prob visual-context_prob.txt

Example 1


Original baseline softmax score

quartos  0.060192
quotas   0.040944	
quarters 0.03037

After visual re-ranking visual_glove_result.txt

quarters 7.040899415659617e-06
quotas   4.0903987856408736e-07
quartos  2.0644119047556385e-09

Example 2


Original baseline softmax score

stook 0.4865732956	
sioux 0.0919743552	
stock 0.0703927792

After visual re-ranking visual_glove_result.txt

stock 0.00018136249963338343
sioux 7.23838175424e-06
stook 8.07711670696e-07

Citation

Please use the following bibtex entry:

@inproceedings{sabir2018visual,
  title={Visual re-ranking with natural language understanding for text spotting},
  author={Sabir, Ahmed and Moreno-Noguer, Francesc and Padr{\'o}, Llu{\'\i}s},
  booktitle={Asian Conference on Computer Vision},
  pages={68--82},
  year={2018},
  organization={Springer}
}
