
Sequence_Digit_Recognization

Flowchart

Status

Overall Process

    • Auto-rotate
    • Detection cropping
    • Recognition
    • Connect all the sections
    • Auxiliary methods (see the Recognition part in TODO)
  • 2021/02/25 results (before -> after):
    • Rotation head: 98~100 %
    • YOLO mAP: 96.8 -> 100 %
    • Upper part: 92.77 -> 96.39 %
    • Lower part: 92.77 -> 98.80 %
    • Combination: 84.34 -> 90.36~92 %

Information

Note

  • data.csv: contains the overall annotations (a crop sketch follows this list)
    • columns: |ID|file_name|GT_1|GT_2|xmin|ymin|xmax|ymax|ymax_2|mode|
    • GT_1: ground truth of the upper region
    • GT_2: ground truth of the lower region
    • mode: 0/1/2 = training/validation/testing
    (xmin,ymin) --------------------
                |   upper region   |
                -------------------- (xmax,ymax_2)
                |   lower region   |
                -------------------- (xmax,ymax)
    
  • MultiplicativeNoise, RandomBrightness, and GaussNoise augmentations hurt LPRnet accuracy
  • Rewriting the LPRnet decoder helps a lot!
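
A minimal sketch of reading data.csv and cropping the two regions described above (pandas and Pillow assumed; this code is illustrative, not from the repo):

    import pandas as pd
    from PIL import Image

    df = pd.read_csv("data.csv")
    row = df.iloc[0]
    img = Image.open(row["file_name"])

    # Upper region spans ymin..ymax_2, lower region spans ymax_2..ymax.
    upper = img.crop((row["xmin"], row["ymin"], row["xmax"], row["ymax_2"]))
    lower = img.crop((row["xmin"], row["ymax_2"], row["xmax"], row["ymax"]))

    # The mode column selects the split: 0 = training, 1 = validation, 2 = testing.
    train_df = df[df["mode"] == 0]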

How to Train

  • rotation model: train_rotation.py
    • data in 'data/20201229/EXT/resize/new_image'
  • YOLO: yolov4/train.py
    • data in 'yolov4/VOCdevkit/VOC2007'
  • lprnet: train_lprnet.py
    • data in 'data/20201229/EXT/resize/new_image'
    • uses data.csv to crop the images (see the dataset sketch below)
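
A minimal dataset sketch of the cropping step mentioned above (the class name and file layout are assumptions; the actual logic lives in train_lprnet.py):

    import pandas as pd
    from PIL import Image
    from torch.utils.data import Dataset

    class CropDataset(Dataset):
        """Hypothetical sketch: one cropped region plus its ground-truth string."""

        def __init__(self, csv_path, image_dir, mode=0, region="upper"):
            df = pd.read_csv(csv_path)
            self.df = df[df["mode"] == mode].reset_index(drop=True)
            self.image_dir = image_dir
            self.region = region

        def __len__(self):
            return len(self.df)

        def __getitem__(self, idx):
            row = self.df.iloc[idx]
            img = Image.open(f"{self.image_dir}/{row['file_name']}")
            if self.region == "upper":
                box = (row["xmin"], row["ymin"], row["xmax"], row["ymax_2"])
                label = row["GT_1"]
            else:
                box = (row["xmin"], row["ymax_2"], row["xmax"], row["ymax"])
                label = row["GT_2"]
            return img.crop(box), str(label)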

How to Run Inference

Rotation Model

  • Single image: check the rotation part in output_pipeline.py (a minimal sketch follows)
  • Evaluate the performance: check test_rotation.py
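
A minimal single-image sketch (the model class name and preprocessing are assumptions; the real code is in output_pipeline.py, and the weight path comes from the Finished section below):

    import torch
    from torchvision import transforms
    from PIL import Image

    from model import RotationNet  # class name is an assumption

    net = RotationNet()
    net.load_state_dict(torch.load(
        "weights/rotation/acc_100.00_loss_0.747.pth", map_location="cpu"))
    net.eval()

    img = Image.open("some_image.bmp")
    x = transforms.ToTensor()(img).unsqueeze(0)  # input size/normalization assumed

    with torch.no_grad():
        angle_class = net(x).argmax(dim=1).item()  # predicted rotation bucket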

YOLO

  • Qualitative result (bboxes drawn on images): check yolov4/predict.py
  • Quantitative result (mAP, F1, diagrams, etc.); a driver sketch follows this list:
    1. Modify the model path in yolov4/yolo.py (point it to your own training result)
    2. Run yolov4/get_dr_txt.py
    3. Run yolov4/get_gt_txt.py
    4. Run yolov4/get_map.py
    5. Results will be saved in yolov4/results
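
The three scripts can be chained with a small driver; a sketch (run from the repo root; not part of the repo itself):

    import subprocess

    # Dump detection results, dump ground truth, then compute mAP/F1.
    for script in ("yolov4/get_dr_txt.py", "yolov4/get_gt_txt.py", "yolov4/get_map.py"):
        subprocess.run(["python", script], check=True)
    # Metrics and diagrams end up in yolov4/results.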

LPRnet

  • Inference and image output are in test_lprnet.py
  • Drawing diagrams on the validation set is also in test_lprnet.py
  • Video output of the LPRnet result is in utils.py (a decoding sketch follows this list)
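
LPRnet-style models are typically decoded with CTC: take the argmax per time step, collapse repeats, and drop blanks. A minimal greedy-decoder sketch (the alphabet and blank index are assumptions, not the repo's rewritten decoder):

    import torch

    def greedy_ctc_decode(logits, alphabet, blank=0):
        """logits: (T, num_classes) scores; returns the decoded string."""
        best = logits.argmax(dim=1).tolist()
        chars, prev = [], blank
        for idx in best:
            if idx != blank and idx != prev:  # skip blanks and repeats
                chars.append(alphabet[idx])
            prev = idx
        return "".join(chars)

    # '-' stands in for the CTC blank; the characters match the GT strings in data.csv.
    alphabet = ["-"] + list("0123456789MV")
    print(greedy_ctc_decode(torch.randn(20, len(alphabet)), alphabet))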

TODO

Improvements

    • Use cutmix augmentation (postponed)
    • Shuffle the number sequence
    • Test-time augmentation
    • Use the pattern-removed images

Other methods

Finished

    • Change the image resolution
    • Enlarge the annotations a bit in the x direction
    • Change to a cosine scheduler
    • Use a different decoder
    • Move to the upper region
    • Split the data into two sets:
      1. one with 9 digits
      2. one with 3 digits + anything else
    • Rotation
      • the rotation model is in 'model.py'
      • the weight is in 'weights/rotation/acc_100.00_loss_0.747.pth'
      • results:
        • val : 98.21 % (56 images)
        • test: 100 % (55 images)
    • YOLO
      • modify the xml files
      • check the code in EDA.ipynb (in "new annotation for yolo")
      • feed the new annotations into YOLO


MTCNN

  ((V) = done, (-) = pending)

    • Combine the two regions
  • Add the src file to increase the features (-)
    • Write the metric
  • Refine the training (-)
  • Use the DFT (V)
    • Apply it to the data and check the gain (-)
  • Run k-means on the bounding boxes (-)

MTCNN Workflow

  1. Might need to rotate the images first (in preprocessing.py)
  2. data_check
  3. Add in the annotation data (column notes are also added)
  4. dataframe_creation
  5. The result below will be saved in yourDefineName_EXT_clear_2data_mode_resize.csv:

        ID  file_name                 GT_1    GT_2   xmin  ymin  xmax  ymax  ymax_2  mode
     0  0   20201229080315_0EXT.bmp   389708  102MV  237   756   1267  1336  1046    0
     1  1   20201229080446_0EXT.bmp   389708  207V   257   553   1318  1143  848     1
     ...

  6. pnet_traindata generates the PNet training data
  7. assemble_split assembles the image list
  8. Train PNet with train_pnet.py
  9. onet_traindata
  10. assemble_split again
  11. train_onet.py
  12. MTCNN_evaluation.py gives qualitative results
  13. mtcnn_metric.py gives quantitative results

Rewrite the data-generation scripts

  • 'MTCNN/data_preprocessing/gen_Pnet_train_data.py' (V)
  • 'MTCNN/data_preprocessing/gen_Onet_train_data.py' (V)
  • 'MTCNN/data_preprocessing/assemble_Pnet_imglist.py' (V)
  • 'MTCNN/data_preprocessing/assemble_Onet_imglist.py' (V)

Training

  • 'MTCNN/train/Train_Pnet.py' (V)
  • 'MTCNN/train/Train_Onet.py' (V)

Evaluation

  • mtcnn_visual.py (V)
  • mtcnn_metric.py (V)
