Characters assigned to wrong RIL_WORD block, 0 % confidence.

MK-3PP commented on May 22, 2024

Characters assigned to wrong RIL_WORD block, 0 % confidence.

from tesseract.

Comments (5)

zdenop commented on May 22, 2024

Please provide input images and example C++ code that demonstrate your problem.

MK-3PP commented on May 22, 2024

Input image

Code

#include "leptonica/allheaders.h"
#include "leptonica/pix_internal.h"
#include "tesseract/baseapi.h"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp"
#include <cstdint>
#include <memory>
#include <sstream>

int main() {
    cv::Mat in_img = cv::imread("./input.png", cv::ImreadModes::IMREAD_GRAYSCALE);
    tesseract::TessBaseAPI tess;

    // Set tesseract parameters.
    if (0 != tess.Init(".", "eng")) // Init() returns 0 on success
        return 3;
    tess.SetVariable("thresholding_method", "2"); // Tiled Sauvola
    tess.SetPageSegMode(tesseract::PageSegMode::PSM_SINGLE_BLOCK);
    tess.SetImage(in_img.data, in_img.cols, in_img.rows, in_img.channels(), static_cast<int>(in_img.step1()));

    // Output thresholded image.
    std::unique_ptr<Pix, void(*)(Pix*)> thrs_pix(tess.GetThresholdedImage(), [](Pix* val) { pixDestroy(&val); });
    cv::Mat out_img(cv::Size(thrs_pix->w, thrs_pix->h), CV_8UC1);
    for (uint32_t y = 0; y < thrs_pix->h; ++y) {
        for (uint32_t x = 0; x < thrs_pix->w; ++x) {
            l_uint32 val;
            if (0 == pixGetPixel(thrs_pix.get(), x, y, &val)) {
                out_img.at<unsigned char>(y, x) = val ? 255 : 0;
            }
        }
    }
    cv::cvtColor(out_img, out_img, cv::COLOR_GRAY2BGR); // prepare colored output image

    // Perform recognition. Recognize() returns 0 on success.
    if (0 != tess.Recognize(nullptr))
        return 1;

    std::unique_ptr<tesseract::ResultIterator> res_iter(tess.GetIterator());

    if (nullptr == res_iter)
        return 2;

    // Extract image information. Generate output image for symbols and words.
    for (auto block_level : { tesseract::PageIteratorLevel::RIL_SYMBOL , tesseract::PageIteratorLevel::RIL_WORD }) {
        cv::Mat curr_img;
        cv::cvtColor(in_img, curr_img, cv::COLOR_GRAY2BGR); // prepare colored current image
        res_iter->Begin();

        do {
            // Only text blocks.
            if (PTIsTextType(res_iter->BlockType())) {
                cv::Point2i p1, p2;

                if (res_iter->BoundingBox(block_level, &p1.x, &p1.y, &p2.x, &p2.y)) {
                    // Draw bounding box.
                    cv::rectangle(curr_img, cv::Rect(p1, p2), cv::Scalar(0, 255, 0));

                    // Prepare text output.
                    const int font = cv::HersheyFonts::FONT_HERSHEY_PLAIN;
                    cv::Size text_size;

                    // Write confidence.
                    std::stringstream conf;
                    conf.precision(0);
                    conf << std::fixed << res_iter->Confidence(block_level) << '%';
                    text_size = cv::getTextSize(conf.str(), font, 1.0, 1, nullptr);
                    cv::putText(curr_img, conf.str(), cv::Point2i(p2.x - text_size.width - 2, p2.y - 2), font, 1.0, cv::Scalar(255, 100, 0));

                    // Write detected text (OpenCV's Hershey fonts only cover ASCII, but close enough).
                    std::unique_ptr<const char[]> raw_text(res_iter->GetUTF8Text(block_level));
                    if (raw_text != nullptr) {
                        text_size = cv::getTextSize(raw_text.get(), font, 1.0, 1, nullptr);
                        cv::putText(curr_img, raw_text.get(), cv::Point2i(p1.x + 2, p1.y + text_size.height + 2), font, 1, cv::Scalar(0, 0, 255));
                    }
                }
            }
        } while (res_iter->Next(block_level));

        // Stack current image on top of output image.
        cv::vconcat(curr_img, out_img, out_img);
    }

    cv::imwrite("./output.png", out_img);

    return 0;
}

Output

Remarks

The program above reproduces the error shown in the original issue post, but as a self-contained program. Hence the colors, fonts etc. deviate.
The output consists of three stacked, augmented versions of the input image:

Recognized words
Recognized symbols
Threshold image (for visual proof of Tesseract's working space)

Each word or symbol comes with its bounding box (green), the recognized text (red) and the confidence (blue).

Dependencies

Tesseract
Leptonica
OpenCV

Setup

To execute the program, you need to put the input image into the executable's current directory as "input.png".
Also, you need the English language model from here in the same folder.
The output will be saved as "output.png" in the same folder.

Discussion

As you can see in the output image provided, the word "29M1" is recognized as "29M" with 0% confidence, albeit consisting of three characters '2', '9' and 'M' with above 90% confidence each. The 'M' is a misdetection of the actual printed "M1".
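The mismatch described above (a word at 0% whose symbols all score high) can be flagged automatically. The following is only a minimal sketch: `Recognized`, `IsConfidenceInconsistent` and the `max_gap` threshold are hypothetical names and values chosen for illustration and are not part of Tesseract. The data would be filled from `GetUTF8Text()` and `Confidence()` at `RIL_WORD` and `RIL_SYMBOL` level.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical plain-data record. In the program above, the values would come
// from ResultIterator::GetUTF8Text() and ResultIterator::Confidence().
struct Recognized {
    std::string text;
    float confidence;  // percent, 0..100
};

// Returns true if the word confidence lies more than `max_gap` percentage
// points below the confidence of its weakest symbol.
// `max_gap` is an arbitrary illustrative threshold, not a Tesseract parameter.
bool IsConfidenceInconsistent(const Recognized& word,
                              const std::vector<Recognized>& symbols,
                              float max_gap = 50.0f) {
    assert(max_gap >= 0.0f);
    if (symbols.empty()) {
        return false;  // nothing to compare against
    }
    const auto weakest = std::min_element(
        symbols.begin(), symbols.end(),
        [](const Recognized& a, const Recognized& b) {
            return a.confidence < b.confidence;
        });
    return weakest->confidence - word.confidence > max_gap;
}
```

A word flagged this way is not necessarily wrong, but it is a cheap hint that word-level and symbol-level results disagree and deserve a closer look.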

Notably, the next character might be what screws things up: the first '1' of "10210A" gets detected as three different symbols, '1', '1' and 'T', where the glitched '1' and 'T' seem to share the exact same location. Their bounding boxes are taller than those of the neighboring characters but only 1 px wide. It seems those glitched symbols screw up the word "29M110210A", divide it in two parts and subsequently set their confidences to zero.
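The two symptoms described here (symbols only 1 px wide, and symbols sharing the exact same location) are easy to test for once the symbol boxes are collected. This is a sketch under assumptions: `Box`, `SameLocation` and `FindSuspiciousSymbols` are hypothetical helpers, and the 1 px default simply mirrors the observation above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical box type; the fields mirror the four values filled in by
// ResultIterator::BoundingBox() in the program above.
struct Box {
    int left, top, right, bottom;
};

bool SameLocation(const Box& a, const Box& b) {
    return a.left == b.left && a.top == b.top &&
           a.right == b.right && a.bottom == b.bottom;
}

// Returns the indices of symbols that are at most `max_width` px wide or
// whose box coincides exactly with the box of another symbol.
std::vector<std::size_t> FindSuspiciousSymbols(const std::vector<Box>& boxes,
                                               int max_width = 1) {
    assert(max_width >= 0);
    std::vector<std::size_t> suspicious;
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        const bool narrow = boxes[i].right - boxes[i].left <= max_width;
        bool duplicate = false;
        for (std::size_t j = 0; j < boxes.size() && !duplicate; ++j) {
            duplicate = (j != i) && SameLocation(boxes[i], boxes[j]);
        }
        if (narrow || duplicate) {
            suspicious.push_back(i);
        }
    }
    return suspicious;
}
```

The quadratic duplicate search is fine for the handful of symbols on a label; it only reports candidates and does not try to repair the word segmentation.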
Detail shot from our customer application (I can zoom in there, but the boxes are drawn 0.5 pixels off - it is just a quick debug view):

And just for funsies, on the left side the word "paper" is recognized from random cracks, with 16% confidence, which is infinitely more than the 0% for the second line of the actual printed text.

zdenop commented on May 22, 2024

I just manually preprocessed the image based on the documentation:

and the result is:

tesseract input4175p.png -
9200795018 -
20M110210A

Tesseract is (usually) not suitable for text detection.
Tesseract is an OCR engine; for good output, it needs a good input image.

MK-3PP commented on May 22, 2024

Thank you. As you guessed, text detection is what we aimed for.

Just to reemphasize, I was thrown off neither by the random junk being detected outside the obvious text label nor by the blank inserted between '1' and '1'.

What caught my attention was that

"M1" became "M"
"1" became "11" (and this was not a '1' being carried over the blank, it was a coincidentally occurring actual '1' that was detected with a very deformed bounding box)
The confidence dropped to 0 %
and the broken overlapping bounding boxes left of the second '1' glyph in the second line.
And all that while the same image rotated 1 ° or 2 ° to the left or right yielded OK results.

I think this is dangerous: there is a continuous sweep of angles by which the image can be rotated with good results, and then, amidst those, there is a discontinuity where obvious recognition artifacts screw up the result.
Even for non-optimal inputs, the results should not glitch out like that.
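One way to catch such a discontinuity is to run the recognition over a small sweep of rotation angles and look for a result that disagrees with both of its neighbors while those neighbors agree with each other. The sketch below only covers the comparison step and assumes the per-angle texts were already collected; `FindIsolatedOutliers` is a hypothetical helper, not a Tesseract or OpenCV function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// `angles_deg[i]` is the rotation applied before recognition and `texts[i]`
// the text recognized at that angle, ordered by increasing angle.
// Returns the angles whose text differs from both neighbors although the
// two neighbors agree with each other.
std::vector<double> FindIsolatedOutliers(const std::vector<double>& angles_deg,
                                         const std::vector<std::string>& texts) {
    assert(angles_deg.size() == texts.size());
    std::vector<double> outliers;
    for (std::size_t i = 1; i + 1 < texts.size(); ++i) {
        const bool neighbors_agree = texts[i - 1] == texts[i + 1];
        if (neighbors_agree && texts[i] != texts[i - 1]) {
            outliers.push_back(angles_deg[i]);
        }
    }
    return outliers;
}
```

In an application, the agreeing neighbors could also serve as a fallback result for the flagged angle, although that is a design decision outside the scope of this sketch.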

But I understand, there is machine learning behind the scenes and those models tend to have that kind of discontinuity issues.

MK-3PP commented on May 22, 2024

One last question:

Do you have any educated guess on why this is happening?

As far as I understand the documentation, the image returned by GetThresholdedImage() is the true image presented to the OCR. How come a character, 'a', is recognised in a pitch-black area without a single white pixel?

To me this looks as if the character recognition model has not been trained with empty images as part of the rejection class(es).
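As a workaround on the application side, symbols whose bounding box contains no foreground pixel in the thresholded image could be rejected after recognition. The following is a minimal sketch, not a fix in Tesseract itself: `BinaryImage`, `CountForeground` and `IsEmptyRegion` are hypothetical names, nonzero is assumed to mark foreground (as in `out_img` before the color conversion in the program above), and treating `right` and `bottom` as exclusive is likewise an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel image; nonzero pixels count as foreground.
struct BinaryImage {
    int width;
    int height;
    std::vector<unsigned char> pixels;  // row-major, width * height entries
};

// Counts the foreground pixels inside [left, right) x [top, bottom).
// The box is clamped to the image, so partially outside boxes are allowed.
int CountForeground(const BinaryImage& img, int left, int top, int right, int bottom) {
    assert(img.pixels.size() == static_cast<std::size_t>(img.width) * img.height);
    left = std::max(left, 0);
    top = std::max(top, 0);
    right = std::min(right, img.width);
    bottom = std::min(bottom, img.height);
    int count = 0;
    for (int y = top; y < bottom; ++y) {
        for (int x = left; x < right; ++x) {
            if (img.pixels[static_cast<std::size_t>(y) * img.width + x] != 0) {
                ++count;
            }
        }
    }
    return count;
}

// True if the box contains no foreground pixel at all.
bool IsEmptyRegion(const BinaryImage& img, int left, int top, int right, int bottom) {
    return CountForeground(img, left, top, right, bottom) == 0;
}
```

A symbol reported for a region where `IsEmptyRegion` returns true cannot be backed by any ink in the thresholded image and can be discarded regardless of its confidence.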
