Comments (11)
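@sbillburg Hahaha, thank you! I think the reason adding the STN didn't do better than leaving it out is that the STN was placed late in the network. If the text line is only slightly rotated, the deformation is small, and the later feature maps, especially those that have already been through max pooling, hold distilled features. Applying the STN's affine transformation at that point may work less well than applying the STN directly to the input.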
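To make the placement question concrete, here is a minimal NumPy sketch of what an STN does once its localization network has produced a 2x3 affine matrix theta: build a sampling grid in normalized [-1, 1] coordinates and bilinearly sample the input. The function names are hypothetical and this is not the repository's Keras layer. Because it only assumes an (H, W, C) array, the same sampler can be applied to the raw input image or to a later feature map.

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    """Map every output location (normalized to [-1, 1]) through the 2x3 affine matrix theta."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_h), np.linspace(-1, 1, out_w), indexing="ij")
    grid = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (out_h, out_w, 3) homogeneous coords
    src = grid @ theta.T                                  # (out_h, out_w, 2): source (x, y)
    return src[..., 0], src[..., 1]

def bilinear_sample(feature_map, x, y):
    """Bilinearly sample an (H, W, C) array at normalized coords; points outside read as 0."""
    h, w = feature_map.shape[:2]
    px = (x + 1) * (w - 1) / 2  # normalized -> pixel column
    py = (y + 1) * (h - 1) / 2  # normalized -> pixel row
    x0 = np.floor(px).astype(int)
    y0 = np.floor(py).astype(int)
    wx = (px - x0)[..., None]
    wy = (py - y0)[..., None]

    def pixel(yy, xx):
        # Gather values, zeroing any neighbour that falls outside the map.
        valid = (xx >= 0) & (xx < w) & (yy >= 0) & (yy < h)
        vals = feature_map[np.clip(yy, 0, h - 1), np.clip(xx, 0, w - 1)]
        return vals * valid[..., None]

    return ((1 - wx) * (1 - wy) * pixel(y0, x0)
            + wx * (1 - wy) * pixel(y0, x0 + 1)
            + (1 - wx) * wy * pixel(y0 + 1, x0)
            + wx * wy * pixel(y0 + 1, x0 + 1))

def spatial_transform(feature_map, theta):
    """Warp an (H, W, C) input image or feature map with theta, keeping its size."""
    h, w = feature_map.shape[:2]
    x, y = affine_grid(theta, h, w)
    return bilinear_sample(feature_map, x, y)
```

Calling spatial_transform on a full-resolution input image or on a smaller, max-pooled feature map is the same operation; only the resolution being warped differs.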
from crnn-with-stn.
The input size is set before training starts, and it's fixed. Once you train a model on one input shape, every other input must have the same size, in both the training and the test dataset.
My method is to set an aspect ratio such as width:height = 5:1. Only a few inputs are wider than this ratio, and I resize those to 5:1. The network learns features from these resized images, and a very long image contains distinctive features that still make it easy to recognize.
For images narrower than this ratio, I add blank padding (pure black, RGB(0, 0, 0)) on both sides. In other words, I generate a pure black image with a 5:1 aspect ratio and put the input image, whose aspect ratio is below 5:1, in its center.
You can find my method in CRNN-with-STN/Batch_Generator.py, lines 38 to 44.
My explanation may not be clear, so if you still have any questions, please tell me. My English is not very good, but I'd love to help you.
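A minimal NumPy sketch of the resize-or-pad rule described above, assuming a fixed target height and a width:height ratio parameter (5.0 here, following the example; the check on line 38 of Batch_Generator.py compares against 6.4). fit_to_ratio and resize_nearest are hypothetical names, and the nearest-neighbour resize stands in for whatever resize function the repository actually uses; this is not the repository's code.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W[, C]) array (stand-in for a library resize)."""
    h, w = img.shape[:2]
    rows = (np.arange(out_h) * h / out_h).astype(int)
    cols = (np.arange(out_w) * w / out_w).astype(int)
    return img[rows][:, cols]

def fit_to_ratio(img, target_h=32, ratio=5.0):
    """Squeeze wide images to the target size; center narrow ones on a black canvas."""
    target_w = int(round(target_h * ratio))
    h, w = img.shape[:2]
    if w / h >= ratio:
        # Wide image: resize straight to target_h x target_w, accepting some distortion.
        return resize_nearest(img, target_h, target_w)
    # Narrow image: scale to target_h while keeping its own aspect ratio...
    new_w = min(target_w, max(1, int(round(w * target_h / h))))
    scaled = resize_nearest(img, target_h, new_w)
    # ...then paste it into the center of a pure black (all-zero) canvas.
    canvas = np.zeros((target_h, target_w) + img.shape[2:], dtype=img.dtype)
    left = (target_w - new_w) // 2
    canvas[:, left:left + new_w] = scaled
    return canvas
```

Squeezing wide images distorts them but keeps all of their content, while padding narrow ones preserves their aspect ratio; every output ends up the same fixed size, as the model requires.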
@sbillburg Thanks a lot! I get your point now.
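I just noticed you're Chinese as well, so let me explain it once more in Chinese.
Different input aspect ratios do affect the recognition result after resizing.
So my approach is to resize as little as possible. For example, I set a width-to-height ratio of 5:1. When generating training batches from the dataset, I compress every image whose ratio is above 5:1 (meaning the image is wide, i.e. long horizontally) straight to 5:1. This introduces some loss or distortion, but a high aspect ratio means the word is long and its features are distinctive, so the network can still recognize it without much difficulty.
For images with a ratio below 5:1, which are relatively narrow, I add pure black blocks on both sides to produce a 5:1 image. The original image's aspect ratio is not changed; the extra padding alone brings the image to the required ratio. The network also learns to output nothing for the pure black blocks, so there's no need to worry about them causing recognition errors.
The implementation is in CRNN-with-STN/Batch_Generator.py, lines 38 to 44. If anything is still unclear, feel free to ask me directly or send me an email.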
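The point that the network learns to output nothing for the black padding fits how CTC decoding works, assuming the usual CRNN setup with CTC loss. If the model predicts the blank class for the padded columns, those frames are simply dropped. A minimal greedy-decoding sketch follows; greedy_ctc_decode is a hypothetical name, and the blank index is a parameter because frameworks differ on where they put it.

```python
import numpy as np

def greedy_ctc_decode(probs, blank=0):
    """Greedy CTC decoding of a (T, num_classes) array of per-frame scores."""
    best = probs.argmax(axis=-1).tolist()  # most likely class for each frame
    decoded = []
    prev = -1
    for k in best:
        # Merge consecutive repeats, then drop blank frames (e.g. padded columns).
        if k != prev and k != blank:
            decoded.append(k)
        prev = k
    return decoded

# Frames 0-1 and 6-7 stand for black padding predicted as blank (class 0).
frames = np.eye(6)[[0, 0, 3, 3, 0, 5, 0, 0]]
print(greedy_ctc_decode(frames))  # [3, 5]
```

A blank between two identical labels keeps them separate, so a genuine double letter survives decoding while the padding contributes nothing.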
Line 38 of CRNN-with-STN/Batch_Generator.py,
if (img_size[1]/img_size[0]*1.0) < 6.4:
needs an extra pair of parentheses:
if (img_size[1]/(img_size[0]*1.0)) < 6.4:
Line 76 has the same issue.
Can you tell me the difference? It seems to be the same in Python 3 with or without the parentheses.
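It's fine in Python 3, but it makes a difference in Python 2: there, if img_size holds integers, dividing them with / truncates, so without the parentheses the integer division happens before the multiplication by 1.0. Adding the parentheses is a good habit anyway.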
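A small Python 3 sketch of why the parentheses matter, assuming img_size is an integer (height, width) pair such as a NumPy image shape. The Python 2 behaviour is emulated with //, which matches Python 2's / for positive integers; the function names and the 50x330 size are hypothetical.

```python
def ratio_py2_unparenthesized(img_size):
    # Python 2 evaluates img_size[1]/img_size[0] first as integer (floor) division,
    # so the "* 1.0" only converts an already-truncated result.
    return (img_size[1] // img_size[0]) * 1.0

def ratio_parenthesized(img_size):
    # The divisor is a float, so the division is exact in both Python 2 and 3.
    return img_size[1] / (img_size[0] * 1.0)

img_size = (50, 330)  # hypothetical image: true width/height ratio is 6.6
print(ratio_py2_unparenthesized(img_size) < 6.4)  # True: 6.0, wrongly treated as narrow
print(ratio_parenthesized(img_size) < 6.4)        # False: 6.6
```

In Python 3, / is always true division, so the original line already yields 6.6 there, which is why both forms look the same.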
I'd like to ask: is there a paper or any theoretical basis for placing the STN at batchnorm_7?
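No. The STN as a whole acts as a self-contained module; I just put it between the CNN and the RNN. You can place it anywhere in the network, and another position might even give better results. This project is only a Keras implementation of CRNN plus some experiments with STN.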