
Comments (4)

danibs commented on June 10, 2024
  • Your problem is with this specific kind of images, correct? Answer: YES
  • If you e.g. try to recognize text like this paragraph of my comment, it works? Answer: there is an issue with I (uppercase i), which was read as | (pipe). The same issue occurs whether I choose Italian+English or English only.

I tried with v0.4.0.

I will try tesseract directly and, if it works, I will post the solution.

Thanks


danibs commented on June 10, 2024

@dynobo I asked for help in Google Groups and Nguyen answered me.
I hope it helps you improve NormCap (if you want to).
I reproduce the answer faithfully.

I think you may need to do some preprocessing on your image before sending it to tesseract:

For example:
[Intermediate images, in processing order: image, gray_image, blur1, otsu, erosion, blur]

SINGLE_LINE
6KDYT?79M"

AUTO
6KDYT?79M"

RAW_LINE
6KDYT79M

SPARSE_TEXT_OSD
6KDYT?79M"

SINGLE_WORD
6KDYT79M

As you can see, two PSM modes (RAW_LINE and SINGLE_WORD) give the correct result.

Here is the full code in Python:

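import cv2
import numpy as np

# Note: cv2_show(title, image, width) and get_text(image) are helper
# functions whose definitions were not included in the answer.

image_org = cv2.imread("unnamed.png")
height, width = image_org.shape[:2]

# calculate the amount of pixels to crop from the border (10% per side)
x_border = int(width * 0.1)
y_border = int(height * 0.1)

image = image_org[y_border:height-y_border, x_border:width-x_border]
cv2_show("image", image, 600)

# convert to grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2_show("gray_image", gray_image, 600)

# heavy Gaussian blur (21x21 kernel) before thresholding
blur1 = cv2.GaussianBlur(gray_image, (21, 21), 0)
cv2_show("blur1", blur1, 600)

# global thresholding (Otsu), inverted
ret, otsu = cv2.threshold(blur1, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
cv2_show("otsu", otsu, 800)

# erode with a 3x3 kernel
kernel = np.ones((3, 3), np.uint8)
erosion = cv2.erode(otsu, kernel, iterations=1)
cv2_show("erosion", erosion, 800)

# light Gaussian blur (5x5 kernel) to smooth the edges
blur = cv2.GaussianBlur(erosion, (5, 5), 0)
cv2_show("blur", blur, 600)

# invert the image again before OCR; each result presumably holds the
# PSM mode name and the recognized text, judging by the output above
results = get_text(255 - blur)
for result in results:
    print(result[0][0])
    print(result[1][0])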
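
The answer calls get_text but does not show its definition. The following is only a guess at what such a helper might look like, assuming it runs Tesseract once per page segmentation mode through pytesseract and returns ([mode_name], [text]) pairs, which would match the ret[0][0] / ret[1][0] indexing in the loop. The names PSM_MODES, _tesseract_ocr and the ocr parameter are hypothetical and not from the original answer.

```python
from typing import Callable, List, Optional, Tuple

# Tesseract page segmentation modes seen in the output above,
# mapped to their numeric --psm values.
PSM_MODES = {
    "SINGLE_LINE": 7,
    "AUTO": 3,
    "RAW_LINE": 13,
    "SPARSE_TEXT_OSD": 12,
    "SINGLE_WORD": 8,
}


def _tesseract_ocr(image, config: str) -> str:
    # Imported lazily so this module loads even without pytesseract installed.
    import pytesseract

    return pytesseract.image_to_string(image, config=config)


def get_text(image, ocr: Optional[Callable] = None) -> List[Tuple[List[str], List[str]]]:
    """Hypothetical helper: run OCR on `image` once per PSM mode.

    `ocr` is any callable taking (image, config) and returning a string;
    it defaults to pytesseract. Returns ([mode_name], [text]) pairs.
    """
    ocr = ocr or _tesseract_ocr
    results = []
    for name, psm in PSM_MODES.items():
        text = ocr(image, f"--psm {psm}").strip()
        results.append(([name], [text]))
    return results
```

The ocr parameter exists only so the loop can be tried without Tesseract installed; in real use, get_text(255 - blur) would call pytesseract directly.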


dynobo commented on June 10, 2024

I'm glad you found a solution, and thanks a lot for taking the time to share it here 🙂

I probably won't include the sequence of filters in NormCap, as these seem very use-case specific and might hurt detection under different circumstances.

But your experiments regarding PSM modes are really interesting. In the past, I also stumbled upon the mediocre detection quality for character sequences which are not real words (like UUIDs or hashes), and always wanted to add a mode to NormCap that helps in such use cases. There are also the tesseract settings load_system_dawg and load_freq_dawg to disable the dictionary-based heuristics, and I can imagine that those settings, combined with the PSM setting RAW_LINE or SINGLE_WORD, could be added as such a new mode...
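
A rough sketch of how such a mode's Tesseract options might be assembled, assuming they are passed as a command-line style config string (for example to pytesseract's config argument). The function name build_config and the constants are hypothetical and not part of NormCap, and whether the dawg settings take effect may depend on the Tesseract version and language model in use.

```python
# Numeric --psm values for the two modes mentioned above.
PSM_SINGLE_WORD = 8
PSM_RAW_LINE = 13


def build_config(psm: int, use_dictionary: bool = False) -> str:
    """Build a Tesseract config string for non-dictionary text.

    Hypothetical helper: with use_dictionary=False it switches off the
    system and frequent-word dictionaries via `-c` variable overrides.
    """
    parts = [f"--psm {psm}"]
    if not use_dictionary:
        parts.append("-c load_system_dawg=0")
        parts.append("-c load_freq_dawg=0")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_config(PSM_RAW_LINE))
```

The resulting string could then be handed to something like pytesseract.image_to_string(image, config=build_config(PSM_RAW_LINE)).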

I've created a new issue #412 to follow up on that idea, and I'm closing this issue.


dynobo commented on June 10, 2024

@danibs , thanks for reporting this issue and submitting a sample!

Just to be sure: Your problem is with this specific kind of image, correct? If you e.g. try to recognize text like this paragraph of my comment, does it work?

I tried to detect your sample, and the result is indeed a complete mess. I locally tried a lot of different settings, downloaded the larger "best" .traineddata files and tested various pre-processing of the image (especially scaling it down, as the font seems to be made for very small text), but I wasn't able to improve the detection quality significantly. 🙁

I'm afraid the problem is too difficult for NormCap with its general-purpose settings. Especially the combination of an unusual "dotted" font with random letters (no "real" words) makes it really hard to detect.

If you have a lot of those sequences to detect, you could try to run tesseract directly and try to tweak preprocessing and settings for your specific use case.

I'll leave this issue open for some weeks, maybe someone else has an idea...
