Comments (11)

LiangHao92 avatar LiangHao92 commented on June 7, 2024 2

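@sbillburg Hahaha, thank you. I think the reason adding the STN did not beat the version without it is that the STN is placed late in the network. If the text line itself is not rotated much, the deformation is small, and the later feature maps, especially those that have passed through max pooling, already hold distilled features. Applying the STN's affine transform at that point probably works less well than applying the STN directly to the input.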

from crnn-with-stn.

sbillburg avatar sbillburg commented on June 7, 2024

The input size is set by you before training starts, and it is fixed. Once you train a model with one input shape, all other inputs must have the same size, in both the training dataset and the test dataset.

My method is to set an aspect ratio such as width:height = 5:1. Only a few inputs exceed this ratio, and I resize those to 5:1. The neural network will learn features from these resized images, and a very long image contains distinctive features that help recognition.
For images whose ratio is smaller than this, I add an empty block (a pure black RGB(0, 0, 0) image) on both sides. In other words, I generate a pure black image with a 5:1 aspect ratio and place the input image, whose aspect ratio is smaller than 5:1, in its center.
You can find my method in CRNN-with-STN/Batch_Generator.py, line38~line44.
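The squeeze-or-pad idea described above could be sketched roughly as follows. This is not the code from Batch_Generator.py: the function names `resize_nearest` and `fit_to_ratio`, the output height of 32, and the NumPy nearest-neighbour resize (a stand-in for an image library's resize) are all assumptions made for illustration.

```python
import numpy as np

TARGET_RATIO = 5.0  # width:height from the comment above; treat it as a tunable assumption


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize with plain NumPy (hypothetical stand-in for a library resize)."""
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows][:, cols]


def fit_to_ratio(img, out_h=32, ratio=TARGET_RATIO):
    """Return an (out_h, out_h * ratio, 3) image: squeeze if too wide, pad if too narrow."""
    out_w = int(out_h * ratio)
    h, w = img.shape[:2]
    if w / float(h) >= ratio:
        # Wider than the target: squeeze straight to the target size (this distorts the text).
        return resize_nearest(img, out_h, out_w)
    # Narrower: scale to the target height keeping the aspect ratio, then centre on black.
    new_w = min(out_w, max(1, int(round(w * out_h / float(h)))))
    scaled = resize_nearest(img, out_h, new_w)
    canvas = np.zeros((out_h, out_w, 3), dtype=img.dtype)
    left = (out_w - new_w) // 2
    canvas[:, left:left + new_w] = scaled
    return canvas
```

With these defaults, a 32x64 image ends up centred in a 32x160 black canvas with 48 black columns on each side, while a 10x100 image is squeezed to 32x160.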

My explanation may not be clear. If you still have any questions, please tell me. My English is not very good, but I'd love to help you.

LiangHao92 avatar LiangHao92 commented on June 7, 2024

@sbillburg thanks a lot! I have got your point.

sbillburg avatar sbillburg commented on June 7, 2024

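I just realized you are Chinese, so I'll explain it once more directly in Chinese.
Inputs with different aspect ratios do affect the recognition results after resizing.

So my approach is to resize as little as possible. For example, I set a width-to-height ratio of 5:1, and when generating training batches from the dataset, I squeeze every image whose ratio is above 5:1 (meaning the image is very wide) directly to 5:1. This loses some image information, or distorts it, but a very high ratio means the word is long and its features are distinctive, so it is not hard for the network to recognize.

Images whose ratio is below 5:1 are narrow, so I add pure black blocks on both sides to produce a 5:1 image. The aspect ratio of the original image is unchanged; the extra padding is what brings the image to the required ratio. The network also learns to output nothing for the pure black blocks, so there is no need to worry about recognition errors.

The implementation is in CRNN-with-STN/Batch_Generator.py, line38~line44. If anything is still unclear, feel free to ask me here or send me an email.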

qwzhong1988 avatar qwzhong1988 commented on June 7, 2024

Line 38 of CRNN-with-STN/Batch_Generator.py:
if (img_size[1]/img_size[0]*1.0) < 6.4:
needs an extra pair of parentheses:
if (img_size[1]/(img_size[0]*1.0)) < 6.4:
Line 76 is similar.

sbillburg avatar sbillburg commented on June 7, 2024

Line 38 of CRNN-with-STN/Batch_Generator.py:
if (img_size[1]/img_size[0]*1.0) < 6.4:
needs an extra pair of parentheses:
if (img_size[1]/(img_size[0]*1.0)) < 6.4:
Line 76 is similar.

Can you tell me the difference? It seems to be the same in Python 3 with or without the parentheses.

qwzhong1988 avatar qwzhong1988 commented on June 7, 2024

There is no problem in Python 3, but it makes a difference in Python 2, so it is a good habit to add the parentheses.
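To make the difference concrete, here is a small sketch that runs under Python 3 and uses `//` to reproduce what Python 2's `/` does with two integers. The function names and the sample size 205x32 are made up for illustration; only the 6.4 threshold comes from the line quoted above.

```python
def ratio_py2_unparenthesized(width, height):
    # Python 2 evaluates width / height first; with two ints that is floor division.
    # `//` reproduces that behaviour in Python 3.
    return width // height * 1.0


def ratio_parenthesized(width, height):
    # Converting the divisor to float first forces true division in both versions.
    return width / (height * 1.0)


width, height = 205, 32
print(ratio_py2_unparenthesized(width, height))        # 6.0
print(ratio_parenthesized(width, height))              # 6.40625
print(ratio_py2_unparenthesized(width, height) < 6.4)  # True
print(ratio_parenthesized(width, height) < 6.4)        # False
```

For an image like this one, the truncated ratio falls below the 6.4 threshold while the true ratio does not, so the two forms would take different branches under Python 2.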

qwzhong1988 avatar qwzhong1988 commented on June 7, 2024

I'd like to ask: is there a paper or theoretical basis for placing the STN at batchnorm_7?

sbillburg avatar sbillburg commented on June 7, 2024

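I'd like to ask: is there a paper or theoretical basis for placing the STN at batchnorm_7?

No. The whole STN is effectively one module, and I simply placed it between the CNN and the RNN. You can put this module anywhere in the network, and it might give better results. This project is just a Keras implementation of CRNN, plus some experiments with STN.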
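To illustrate why the STN can be treated as a movable module, here is a minimal NumPy sketch of its sampling step only. It is not the repository's loc_net or its Keras layer: the name `affine_sample` is hypothetical, the affine matrix is passed in by hand rather than predicted by a localisation network, and nearest-neighbour sampling replaces the bilinear sampling an STN would normally use.

```python
import numpy as np


def affine_sample(feature_map, theta):
    """Resample an (H, W, C) array through a 2x3 affine matrix `theta`.

    Coordinates are normalised to [-1, 1]. Nearest-neighbour sampling keeps the
    sketch short; it is an assumption, not what a trained STN layer would use.
    """
    h, w = feature_map.shape[:2]
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # shape (3, H*W)
    src_x, src_y = np.asarray(theta, dtype=float) @ grid  # source coords per output pixel
    # Map normalised coordinates back to pixel indices and clamp to the border.
    col = np.clip(np.rint((src_x + 1) * (w - 1) / 2), 0, w - 1).astype(int)
    row = np.clip(np.rint((src_y + 1) * (h - 1) / 2), 0, h - 1).astype(int)
    return feature_map[row, col].reshape(feature_map.shape)


identity = [[1, 0, 0], [0, 1, 0]]
image = np.random.rand(32, 160, 3)    # e.g. a network input
features = np.random.rand(8, 40, 64)  # e.g. a feature map after pooling
assert affine_sample(image, identity).shape == image.shape
assert affine_sample(features, identity).shape == features.shape
```

Because the output shape always matches the input shape, the same function works on a raw input image or on a pooled feature map, which is what makes both placements discussed in this thread possible.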

jingwanli6666 avatar jingwanli6666 commented on June 7, 2024

I get an error when calling the loc_net function:
[image]
How can I resolve it? Thanks!

sbillburg avatar sbillburg commented on June 7, 2024
