Comments (4)
- Your problem is with this specific kind of images, correct? Answer: YES
- If you e.g. try to recognize text like this paragraph of my comment, it works? Answer: there is an issue with I (upper i), which was read as | (pipe). The same issue occurs whether I choose Italian+English or English only.
I tried with v0.4.0.
I will try tesseract and, if it works, I will post the solution.
Thanks
from normcap.
@dynobo I asked for help in Google Groups and Nguyen answered me.
I hope it helps you improve NormCap (if you want to).
I reproduce the answer faithfully.
I think you may need to do some preprocessing on your image before sending it to tesseract:
For example:
[Images of the intermediate processing steps, in order: image, gray_image, blur1, otsu, erosion, blur]
SINGLE_LINE
6KDYT?79M"
AUTO
6KDYT?79M"
RAW_LINE
6KDYT79M
SPARSE_TEXT_OSD
6KDYT?79M"
SINGLE_WORD
6KDYT79M
As you can see, two PSM modes (RAW_LINE and SINGLE_WORD) give the correct result.
Here is the full code in Python:
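import cv2
import numpy as np

# NOTE: cv2_show() and get_text() are helper functions from the original
# answer; their definitions were not posted.

image_org = cv2.imread("unnamed.png")
height, width = image_org.shape[:2]

# Calculate the number of pixels to crop from each border (10 % per side)
x_border = int(width * 0.1)
y_border = int(height * 0.1)
image = image_org[y_border:height - y_border, x_border:width - x_border]
cv2_show("image", image, 600)

gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2_show("gray_image", gray_image, 600)

# Strong blur to merge the dots of the font into solid strokes
blur1 = cv2.GaussianBlur(gray_image, (21, 21), 0)
cv2_show("blur1", blur1, 600)

# Global thresholding (Otsu), inverted so that the text becomes white
ret, otsu = cv2.threshold(blur1, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
cv2_show("otsu", otsu, 800)

# Erode the white strokes to thin them out
kernel = np.ones((3, 3), np.uint8)
erosion = cv2.erode(otsu, kernel, iterations=1)
cv2_show("erosion", erosion, 800)

# Light blur to smooth the edges
blur = cv2.GaussianBlur(erosion, (5, 5), 0)
cv2_show("blur", blur, 600)

# Invert back to dark text on a light background before OCR
results = get_text(255 - blur)
for result in results:
    print(result[0][0])  # PSM mode name
    print(result[1][0])  # recognized text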
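The get_text helper called in the code above was not included in the answer. The following is only a guess at what it might look like, assuming it runs tesseract once per page segmentation mode through the pytesseract package. The mode numbers are tesseract's documented --psm values for the five modes listed in the output; the return shape, the PSM_MODES table and the injectable ocr parameter are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Tesseract --psm numbers for the modes that appear in the output above.
PSM_MODES = {
    "SINGLE_LINE": 7,
    "AUTO": 3,
    "RAW_LINE": 13,
    "SPARSE_TEXT_OSD": 12,
    "SINGLE_WORD": 8,
}


def _tesseract_ocr(image, config: str) -> str:
    """Run tesseract via pytesseract (imported lazily, needs tesseract installed)."""
    import pytesseract

    return pytesseract.image_to_string(image, config=config)


def get_text(
    image, ocr: Callable[[object, str], str] = _tesseract_ocr
) -> List[Tuple[Tuple[str], Tuple[str]]]:
    """Hypothetical helper: OCR the image once per PSM mode.

    Each entry is ((mode_name,), (text,)), so that result[0][0] is the mode
    name and result[1][0] is the recognized text, matching the print loop.
    """
    results = []
    for name, psm in PSM_MODES.items():
        text = ocr(image, f"--psm {psm}").strip()
        results.append(((name,), (text,)))
    return results
```

Passing the OCR function as a parameter keeps the loop easy to try out without tesseract installed, for example with a stub that returns a fixed string.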
I'm glad you found a solution, and thanks a lot for taking the time to share it here 🙂
I probably won't include the sequence of filters in NormCap, as these seem very use-case specific and might hurt detection under different circumstances.
But your experiments regarding PSM modes are really interesting. In the past, I also stumbled upon the mediocre detection quality for character sequences which are not real words (like UUIDs or hashes), and always wanted to add a mode to NormCap that helps in such use cases. There are also the tesseract settings load_system_dawg and load_freq_dawg to disable the dictionary-based heuristics, and I can imagine that those settings, combined with the PSM setting RAW_LINE or SINGLE_WORD, could be added as such a new mode...
I've created a new issue #412 to follow up on that idea, and I'm closing this issue.
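As a rough sketch of how such a mode could pass these settings to tesseract, the snippet below builds a command-line config string that combines a PSM value with the two dictionary switches. The function name build_config and its parameters are hypothetical and not part of NormCap; the sketch assumes the variables are set with tesseract's -c NAME=VALUE option, as one would do when calling the CLI or passing config= to pytesseract.

```python
def build_config(psm: int, use_dictionaries: bool = True) -> str:
    """Build a tesseract config string (hypothetical helper, for illustration).

    psm: page segmentation mode number, e.g. 13 (RAW_LINE) or 8 (SINGLE_WORD).
    use_dictionaries: if False, switch off the word-list based heuristics.
    """
    parts = [f"--psm {psm}"]
    if not use_dictionaries:
        parts.append("-c load_system_dawg=0")
        parts.append("-c load_freq_dawg=0")
    return " ".join(parts)


# Config for random character sequences such as hashes or serial numbers.
raw_line_config = build_config(13, use_dictionaries=False)
print(raw_line_config)
```

The resulting string could then be handed to tesseract unchanged, for example as the config argument of pytesseract.image_to_string.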
@danibs , thanks for reporting this issue and submitting a sample!
Just to be sure: Your problem is with this specific kind of images, correct? If you e.g. try to recognize text like this paragraph of my comment, it works?
I tried to detect your sample, and the result is indeed a complete mess. I locally tried a lot of different settings, downloaded the larger "best" .traineddata files and tested various kinds of image pre-processing (especially scaling it down, as the font seems to be made for very small text), but I wasn't able to improve the detection quality significantly. 🙁
I'm afraid the problem is too difficult for NormCap with its general-purpose settings. The combination of an unusual "dotted" font with random letters (no "real" words) makes it especially hard to detect.
If you have a lot of those sequences to detect, you could try to run tesseract directly and tweak the preprocessing and settings for your specific use case.
I'll leave this issue open for some weeks, maybe someone else has an idea...