Comments (10)
Hi!
Please include the Chinese words you were unable to print.
Thanks
from textrecognitiondatagenerator.
Thank you for replying!
If I use the font that you provided, I can generate good samples, but when I change the font (e.g. use another .ttf file), the results may not be as good.
That would be because the font you used does not support all the characters. I'll try to provide more fonts in the future.
In the meantime, I know https://github.com/JarveeLee/SynthText_Chinese_version/tree/master/data/fonts has a lot of choices, but I cannot add them to this project because of copyright concerns.
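One way to check whether a given .ttf file covers the text you want to render is to compare the text against the font's character map. The following is only a rough sketch, not part of this project: the function names are made up for illustration, and it assumes the third-party fontTools package is installed (`pip install fonttools`).

```python
def missing_characters(text, supported_codepoints):
    """Return the distinct non-space characters of `text` whose code points
    are not in `supported_codepoints`, in first-seen order."""
    missing = []
    for ch in text:
        if ch.isspace():
            continue
        if ord(ch) not in supported_codepoints and ch not in missing:
            missing.append(ch)
    return missing


def load_supported_codepoints(font_path):
    """Read the Unicode code points a font file maps to glyphs.

    Hypothetical helper; fontTools is imported lazily so the pure function
    above can be used without it.
    """
    from fontTools.ttLib import TTFont

    font = TTFont(font_path)
    try:
        cmap = font.getBestCmap()  # dict {code point: glyph name}, or None
        return set(cmap) if cmap else set()
    finally:
        font.close()
```

For example, `missing_characters("你好 world", load_supported_codepoints("my_font.ttf"))` would list the characters that `my_font.ttf` (a placeholder path) has no mapping for. A character being present in the character map does not guarantee the glyph looks right, so this only catches the fully unsupported cases.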
Thank you very much!
@DLUTfangping, have you tried training on a dataset generated by this tool? I'm training a CRNN, but the results are not good.
@liangshuang1993 I can't say for Chinese, but I got decent results in English when using lowercase only (lowercase plus uppercase was a challenge). Also, while I don't know which implementation of CRNN you used, mine takes a long time to train (50+ hours on a GTX 1080 Ti), so it's quite normal for the initial performance to be very poor.
Hi @Belval, thanks for your answer.
Maybe I generated the training data incorrectly.
First, I generated one dataset with a word length of 5, using Gaussian noise as the background. The performance is good on the training and validation datasets, but bad on some real pictures.
Then I generated another dataset with a word length of 8, using the given pictures as backgrounds, and trained the CRNN on the whole dataset (dataset 1 plus dataset 2). I have trained for 7000 epochs, but the training accuracy is still 0.5590. The strange thing is that when I tested on the training data, I found it can barely recognize the words. Does this mean all my datasets must have the same length? Thanks a lot.
By the way, each dataset has 500,000 pictures containing English, Chinese, and numbers (they may appear in the same picture).
@liangshuang1993 That is indeed a rather ambitious idea, learning both English and Chinese. Most implementations I know of only do one.
But yes, the same word count is required. The original author even went as far as only recognizing single words instead of multi-word sentences.
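To see whether a combined dataset actually mixes label lengths, a quick audit of the labels can help. This is a minimal sketch under stated assumptions: the labels are already loaded into a Python list of strings (how they are stored depends on how the data was generated), and the function names are illustrative only.

```python
from collections import Counter


def word_count_histogram(labels):
    """Map each word count (whitespace-separated) to the number of labels with it."""
    return Counter(len(label.split()) for label in labels)


def char_length_histogram(labels):
    """Map each character length to the number of labels with it."""
    return Counter(len(label) for label in labels)


def has_uniform_word_count(labels):
    """True when every label has the same number of words (or there are no labels)."""
    return len(word_count_histogram(labels)) <= 1
```

Note that whitespace splitting treats an unspaced run of Chinese characters as one word, so for Chinese labels the character-length histogram is probably the more useful of the two.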
OK. Thanks a lot!!
Hello, have you solved this problem? I've run into it too: the results on real samples are very poor.