
Comments (5)

zdenop commented on May 22, 2024

Please provide input images and example C++ code that demonstrate your problem.

from tesseract.

MK-3PP commented on May 22, 2024

Input image

input

Code

#include "leptonica/allheaders.h"
#include "leptonica/pix_internal.h"
#include "tesseract/baseapi.h"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp"
#include <memory>
#include <sstream> // std::stringstream

int main() {
    cv::Mat in_img = cv::imread("./input.png", cv::ImreadModes::IMREAD_GRAYSCALE);
    tesseract::TessBaseAPI tess;

    // Set tesseract parameters.
    tess.Init(".", "eng");
    tess.SetVariable("thresholding_method", "2"); // Tiled Sauvola
    tess.SetPageSegMode(tesseract::PageSegMode::PSM_SINGLE_BLOCK);
    tess.SetImage(in_img.data, in_img.cols, in_img.rows, in_img.channels(), static_cast<int>(in_img.step1()));

    // Output thresholded image.
    std::unique_ptr<Pix, void(*)(Pix*)> thrs_pix(tess.GetThresholdedImage(), [](Pix* val) { pixDestroy(&val); });
    cv::Mat out_img(cv::Size(thrs_pix->w, thrs_pix->h), CV_8UC1);
    for (uint32_t y = 0; y < thrs_pix->h; ++y) {
        for (uint32_t x = 0; x < thrs_pix->w; ++x) {
            l_uint32 val;
            if (0 == pixGetPixel(thrs_pix.get(), x, y, &val)) {
                out_img.at<unsigned char>(y, x) = val ? 255 : 0;
            }
        }
    }
    cv::cvtColor(out_img, out_img, cv::COLOR_GRAY2BGR); // prepare colored output image

    // Perform recognition.
    if (0 != tess.Recognize(nullptr)) // Recognize() returns 0 on success
        return 1;

    std::unique_ptr<tesseract::ResultIterator> res_iter(tess.GetIterator());

    if (nullptr == res_iter)
        return 2;

    // Extract image information. Generate output image for symbols and words.
    for (auto block_level : { tesseract::PageIteratorLevel::RIL_SYMBOL , tesseract::PageIteratorLevel::RIL_WORD }) {
        cv::Mat curr_img;
        cv::cvtColor(in_img, curr_img, cv::COLOR_GRAY2BGR); // prepare colored current image
        res_iter->Begin();

        do {
            // Only text blocks.
            if (PTIsTextType(res_iter->BlockType())) {
                cv::Point2i p1, p2;

                if (res_iter->BoundingBox(block_level, &p1.x, &p1.y, &p2.x, &p2.y)) {
                    // Draw bounding box.
                    cv::rectangle(curr_img, cv::Rect(p1, p2), cv::Scalar(0, 255, 0));

                    // Prepare text output.
                    const int font = cv::HersheyFonts::FONT_HERSHEY_PLAIN;
                    cv::Size text_size;

                    // Write confidence.
                    std::stringstream conf;
                    conf.precision(0);
                    conf << std::fixed << res_iter->Confidence(block_level) << '%';
                    text_size = cv::getTextSize(conf.str(), font, 1.0, 1, nullptr);
                    cv::putText(curr_img, conf.str(), cv::Point2i(p2.x - text_size.width - 2, p2.y - 2), font, 1.0, cv::Scalar(255, 100, 0));

                    // Write detected text (OpenCV does only have ASCII, but close enough).
                    std::unique_ptr<const char[]> raw_text(res_iter->GetUTF8Text(block_level));
                    if (raw_text != nullptr) {
                        text_size = cv::getTextSize(raw_text.get(), font, 1.0, 1, nullptr);
                        cv::putText(curr_img, raw_text.get(), cv::Point2i(p1.x + 2, p1.y + text_size.height + 2), font, 1, cv::Scalar(0, 0, 255));
                    }
                }
            }
        } while (res_iter->Next(block_level));

        // Stack current image on top of output image.
        cv::vconcat(curr_img, out_img, out_img);
    }

    cv::imwrite("./output.png", out_img);

    return 0;
}

Output

output

Remarks

The program above reproduces the error shown in the original issue post, but in a self-contained program. Hence the colors, fonts, etc. differ.
The output consists of three stacked, annotated versions of the input image (top to bottom):

  • Recognized words
  • Recognized symbols
  • Threshold image (for visual proof of Tesseract's working space)

Each word or symbol comes with its bounding box (green), the recognized text (red) and the confidence (blue).

Dependencies

  • Tesseract
  • Leptonica
  • OpenCV

Setup

To execute the program, you need to put the input image into the executable's current directory as "input.png".
Also, you need the English language model from here in the same folder.
The output will be saved as "output.png" in the same folder.

Discussion

As you can see in the output image provided, the word "29M1" is recognized as "29M" with 0% confidence, even though it consists of three characters, '2', '9' and 'M', with above 90% confidence each. The 'M' is a misdetection of the actual printed "M1".
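The mismatch between word and symbol confidence can be caught with a simple consistency check. The following is a minimal sketch, not part of the Tesseract API: `IsWordConfidenceSuspicious` and its `max_gap` threshold are hypothetical names, and the confidences are assumed to have been collected beforehand via `ResultIterator::Confidence()` at `RIL_WORD` and `RIL_SYMBOL` level.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not a Tesseract function).
// Returns true when the word confidence lies more than max_gap percentage
// points below the weakest symbol confidence of that word.
bool IsWordConfidenceSuspicious(float word_conf,
                                const std::vector<float>& symbol_confs,
                                float max_gap = 50.0f) {
    if (symbol_confs.empty())
        return false; // nothing to compare against

    const float min_symbol_conf =
        *std::min_element(symbol_confs.begin(), symbol_confs.end());
    return (min_symbol_conf - word_conf) > max_gap;
}
```

For a word with 0% confidence whose symbols are all above 90%, as described above, this check would return true with the default threshold. The threshold itself is an arbitrary choice for illustration and would need tuning.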

Notably, the next character might be what breaks things: the first '1' of "10210A" gets detected as 3 different symbols, '1', '1' and 'T', where the glitched '1' and 'T' seem to share the exact same location. They have a taller bounding box than the neighboring characters but are only 1 px wide. It seems those glitched symbols break up the word "29M110210A", dividing it in two parts and subsequently setting their confidences to zero.
Detail shot from our customer application (I can zoom in there, but the boxes are drawn 0.5 pixels off - it is just a quick debug view):
grafik
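The glitched symbols described above (only 1 px wide, or sharing the exact location of the previous symbol) could be flagged after recognition. This is a rough sketch under assumptions: `Box`, `SameBox`, `FindGlitchedSymbols` and the `min_width` parameter are hypothetical names, and the boxes are assumed to hold the values that `ResultIterator::BoundingBox()` returned at `RIL_SYMBOL` level, in reading order.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical box type holding the (left, top, right, bottom) values of a
// symbol bounding box.
struct Box {
    int left, top, right, bottom;
    int Width() const { return right - left; }
};

// True if both boxes cover exactly the same rectangle.
bool SameBox(const Box& a, const Box& b) {
    return a.left == b.left && a.top == b.top &&
           a.right == b.right && a.bottom == b.bottom;
}

// Returns the indices of symbols that are narrower than min_width pixels or
// that have the same box as the preceding symbol.
std::vector<std::size_t> FindGlitchedSymbols(const std::vector<Box>& symbols,
                                             int min_width = 2) {
    std::vector<std::size_t> glitched;
    for (std::size_t i = 0; i < symbols.size(); ++i) {
        const bool too_narrow = symbols[i].Width() < min_width;
        const bool duplicate = i > 0 && SameBox(symbols[i], symbols[i - 1]);
        if (too_narrow || duplicate)
            glitched.push_back(i);
    }
    return glitched;
}
```

This only detects the artifact; it does not explain or repair it. A caller could, for instance, discard the flagged symbols before judging the word.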

And just for funsies, on the left side the word "paper" is recognized from random cracks. With 16% confidence, which is infinitely more than the 0% for the second line of the actual printed text.


zdenop commented on May 22, 2024

I just manually preprocessed the image based on the documentation:

input4175p

and the result is:

tesseract input4175p.png -
9200795018 -
20M110210A

=>

  • Tesseract is (usually) not suitable for text detection.
  • Tesseract is an OCR engine; for good output you need to provide a good input image.


MK-3PP commented on May 22, 2024

Thank you. As you guessed, text detection is what we aimed for.

Just to reemphasize, I was thrown off neither by the random junk being detected outside the obvious text label nor by the inserted blank between '1' and '1'.

What caught my attention was that

  • "M1" became "M"
  • "1" became "11" (and this was not a '1' being carried over the blank, it was a coincidentally occurring actual '1' that was detected with a very deformed bounding box)
  • The confidence dropped to 0 %
  • and the broken overlapping bounding boxes left of the second '1' glyph in the second line.
    And all that while the same image, rotated 1° or 2° to the left or right, yielded OK results.

I think this is dangerous: there is a continuous sweep of angles by which the image can be rotated with good results, and then, amidst those, there is a discontinuity where obvious recognition artifacts ruin the result.
Even for non-optimal inputs the results should not glitch out like that.

But I understand that there is machine learning behind the scenes and those models tend to have this kind of discontinuity issue.
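One way to guard against such a discontinuity is to run recognition at several small rotation angles and look for an isolated drop in confidence. The sketch below only covers the evaluation step and assumes the per-angle confidences have already been measured; `SweepSample`, `FindConfidenceDips` and `drop_threshold` are hypothetical names, not part of Tesseract or OpenCV.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record: one recognition run at a given rotation angle.
struct SweepSample {
    double angle_deg; // rotation applied to the input image
    float confidence; // e.g. the confidence of the word of interest
};

// Returns the indices of samples whose confidence is lower than that of BOTH
// neighbors by more than drop_threshold. The samples are expected to be
// sorted by angle; the first and last sample are never reported.
std::vector<std::size_t> FindConfidenceDips(
        const std::vector<SweepSample>& sweep, float drop_threshold = 30.0f) {
    std::vector<std::size_t> dips;
    for (std::size_t i = 1; i + 1 < sweep.size(); ++i) {
        const float curr = sweep[i].confidence;
        const float prev = sweep[i - 1].confidence;
        const float next = sweep[i + 1].confidence;
        if (prev - curr > drop_threshold && next - curr > drop_threshold)
            dips.push_back(i);
    }
    return dips;
}
```

If a dip is reported at the unrotated image, the result of a neighboring angle could be used instead. Whether that is acceptable depends on the application.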


MK-3PP commented on May 22, 2024

One last question:

Do you have any educated guess on why this is happening?

grafik

As far as I understand the documentation, the image acquired by GetThresholdedImage() is the true image presented to the OCR. How come there is a character, 'a', recognised in a pitch black area without a single white pixel?

To me this looks as if the character recognition model has not been trained with empty images as part of the rejection class(es).
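Independent of the cause, symbols reported in an empty region could be rejected by checking the thresholded image under their bounding box. This is a minimal sketch with assumptions: `BinaryImage` and `RegionHasForeground` are hypothetical names, the image is assumed to be a row-major copy with one byte per pixel, and non-zero is assumed to mean "text pixel" (which polarity that is depends on how the thresholded image was converted).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal binary image: row-major, one byte per pixel.
// Assumption: non-zero marks a text (foreground) pixel.
struct BinaryImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

// Returns true if the rectangle [left, right) x [top, bottom) contains at
// least one foreground pixel. The rectangle is clipped to the image.
bool RegionHasForeground(const BinaryImage& img,
                         int left, int top, int right, int bottom) {
    left = std::max(left, 0);
    top = std::max(top, 0);
    right = std::min(right, img.width);
    bottom = std::min(bottom, img.height);

    for (int y = top; y < bottom; ++y) {
        for (int x = left; x < right; ++x) {
            const std::size_t idx =
                static_cast<std::size_t>(y) * img.width + x;
            if (img.pixels[idx] != 0)
                return true;
        }
    }
    return false;
}
```

A symbol whose box yields false here cannot be backed by any pixel of the thresholded image and could be dropped by the caller.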
