Character Aware Alignment Contrastive Learning for Chinese Scene Text Recognition

Official PyTorch implementation for Character Aware Alignment Contrastive Learning for Chinese Scene Text Recognition (CAAC).

Abstract

The main challenge for Chinese scene text recognition is the difficulty of distinguishing characters, caused by the enormous number of character categories and the significant similarity between characters. In this paper, we explore the extraction of discriminative Chinese character representations to cope with these difficulties. We propose Character Aware Alignment Contrastive learning (CAAC) for Chinese scene text recognition, which achieves superior performance within a concise overall framework. We leverage the character-aware properties of the attentional decoder to instantiate character-level contrastive learning, whose atomic elements are more fine-grained than those of previous self-supervised contrastive text recognition methods operating at the sub-word level. In addition, we investigate a projection-free strategy that directly couples the task loss with the supervised contrastive loss, jointly guiding the recognizer to identify Chinese characters flexibly and to balance intra- and inter-domain generalization. All the proposed strategies are plug-and-play: we demonstrate that CAAC brings stable performance gains to existing methods and that the projection-free strategy yields better cross-domain generalization than three projection heads. Extensive experiments have been conducted on multiple text recognition benchmarks, including a self-collected ship license plate dataset, to verify recognition performance, generalization capability, and transferability. The experimental results show that our method outperforms previous methods by 2.81% and 1.34% on the Chinese Scene and Web text recognition datasets, respectively.
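To make the idea of a character-level supervised contrastive loss coupled directly with the task loss more concrete, here is a minimal, framework-free sketch. It is not the repository's implementation: it assumes one feature vector per decoded character, uses the standard supervised contrastive formulation (as in SupContrast), and applies it to the features directly with no projection head. The names `supcon_loss` and `joint_loss`, the `weight` parameter, and the temperature value are illustrative assumptions. In practice the same arithmetic would be done on PyTorch tensors.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over per-character feature vectors.

    features: list of equal-length float lists, one per character.
    labels:   list of character class ids, same length as features.
    Anchors without any same-class partner are skipped.
    """
    feats = [l2_normalize(f) for f in features]
    n = len(feats)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other character.
        logits = {
            j: sum(a * b for a, b in zip(feats[i], feats[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values())
        )
        total -= sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def joint_loss(task_loss, features, labels, weight=1.0, temperature=0.1):
    """Projection-free coupling: add the contrastive term to the task loss."""
    return task_loss + weight * supcon_loss(features, labels, temperature)
```

Features of characters from the same class are pulled together and all others are pushed apart, so a batch whose same-class characters already lie close to each other yields a smaller loss than one where they are scattered.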

Runtime Environment

  • Inference requires PyTorch >= 1.7.1
  • For training and evaluation, install the dependencies:
pip install -r requirements.txt

Datasets

  • Download the Scene and Web LMDB datasets for training and evaluation.
  • For cross-domain generalization analysis, the commonly used English datasets are available for download: IIIT5K (IIIT), Street View Text (SVT), ICDAR 2015 (IC15 1811), Street View Text Perspective (SVTP), and WordArt.
  • The newly collected SLPR and SLPR-P datasets consist of 6,922 artistic text images, with 5,131 training images and 1,791 testing images. The datasets are available at Project.

Training and Evaluation

  • Training
python main.py --config=configs/xxx.yaml
  • Evaluation
python main.py --config=configs/xxx.yaml --phase test --image_only

Pretrained Models

Get the pretrained models from GoogleDrive. The performance of some pretrained models is summarized below. Each result is given as ACC / NED, with ACC as a percentage and NED as a decimal.

Model             | Scene         | Web           | log (Scene) | log (Web) | SLPR          | SLPR-P        | log (SLPR) | log (SLPR-P)
ResNet-45-CAAC    | 65.11 / 0.815 | 63.40 / 0.801 | Link        | Link      | 92.85 / 0.977 | 86.60 / 0.945 | Link       | Link
ResNet-45-no-CAAC | 64.07 / 0.809 | 62.86 / 0.799 | Link        | Link      | 92.85 / 0.976 | 88.50 / 0.954 | Link       | Link
Swin-S-CAAC       | 74.91 / 0.88. | 64.74 / 0.813 | Link        | Link      | -             | -             | -          | -
Swin-S-no-CAAC    | 72.93 / 0.873 | 62.64 / 0.800 | Link        | Link      | -             | -             | -          | -
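For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them from (ground truth, prediction) pairs. It assumes ACC is the percentage of exact matches and NED is one minus the edit distance divided by the longer string's length, averaged over samples, so that higher is better for both. The function names are illustrative, and the repository's evaluation code may differ in details such as text normalization.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(
                min(
                    prev[j] + 1,  # delete ca
                    curr[j - 1] + 1,  # insert cb
                    prev[j - 1] + (ca != cb),  # substitute or match
                )
            )
        prev = curr
    return prev[-1]


def acc_and_ned(pairs):
    """Return (ACC in percent, NED as a decimal) for (gt, pred) string pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (gt, pred) pair is required")
    correct = 0
    ned_sum = 0.0
    for gt, pred in pairs:
        correct += int(gt == pred)
        longest = max(len(gt), len(pred))
        # Two empty strings count as a perfect match.
        ned_sum += 1.0 - (edit_distance(gt, pred) / longest if longest else 0.0)
    return 100.0 * correct / len(pairs), ned_sum / len(pairs)


# One exact match and one prediction with a single wrong character out of four.
print(acc_and_ned([("浙杭渔1234", "浙杭渔1234"), ("中国银行", "中国很行")]))  # (50.0, 0.875)
```

NED gives partial credit for near misses, which is why it moves less than ACC between the models listed above.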

Evaluation logs

Click the links below to see the detailed experimental results. Each entry follows the format ([gt] [pt]), where [gt] is the ground-truth text and [pt] is the predicted text.

  • ResNet-45-Scene-CAAC: Link
  • Swin-S-Scene-CAAC: Link
  • ResNet-45-Web-CAAC: Link
  • Swin-S-Web-CAAC: Link

Acknowledgements

This implementation is based on the repositories ABINet, SupContrast, FudanVI/benchmarking-chinese-text-recognition, and WordArt.

License

Following ABINet, this project is free for academic research purposes only and is licensed under the 2-clause BSD License; see the LICENSE file for details.
