Comments (5)
Please provide input images and example C++ code that demonstrate your problem.
from tesseract.
Input image
Code
#include "leptonica/allheaders.h"
#include "leptonica/pix_internal.h"
#include "tesseract/baseapi.h"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/imgproc.hpp"
#include <cstdint>
#include <memory>
#include <sstream>

int main() {
    cv::Mat in_img = cv::imread("./input.png", cv::ImreadModes::IMREAD_GRAYSCALE);
    if (in_img.empty())
        return 3; // input image missing or unreadable

    tesseract::TessBaseAPI tess;
    // Set tesseract parameters. Init() returns 0 on success.
    if (0 != tess.Init(".", "eng"))
        return 4; // language model could not be loaded
    tess.SetVariable("thresholding_method", "2"); // Tiled Sauvola
    tess.SetPageSegMode(tesseract::PageSegMode::PSM_SINGLE_BLOCK);
    tess.SetImage(in_img.data, in_img.cols, in_img.rows, in_img.channels(), static_cast<int>(in_img.step1()));

    // Output thresholded image.
    std::unique_ptr<Pix, void(*)(Pix*)> thrs_pix(tess.GetThresholdedImage(), [](Pix* val) { pixDestroy(&val); });
    cv::Mat out_img(cv::Size(thrs_pix->w, thrs_pix->h), CV_8UC1, cv::Scalar(0));
    for (uint32_t y = 0; y < thrs_pix->h; ++y) {
        for (uint32_t x = 0; x < thrs_pix->w; ++x) {
            l_uint32 val;
            if (0 == pixGetPixel(thrs_pix.get(), x, y, &val)) {
                out_img.at<unsigned char>(y, x) = val ? 255 : 0;
            }
        }
    }
    cv::cvtColor(out_img, out_img, cv::COLOR_GRAY2BGR); // prepare colored output image

    // Perform recognition. Recognize() returns 0 on success.
    if (0 != tess.Recognize(nullptr))
        return 1;
    std::unique_ptr<tesseract::ResultIterator> res_iter(tess.GetIterator());
    if (nullptr == res_iter)
        return 2;

    // Extract image information. Generate output image for symbols and words.
    for (auto block_level : { tesseract::PageIteratorLevel::RIL_SYMBOL, tesseract::PageIteratorLevel::RIL_WORD }) {
        cv::Mat curr_img;
        cv::cvtColor(in_img, curr_img, cv::COLOR_GRAY2BGR); // prepare colored current image
        res_iter->Begin();
        do {
            // Only text blocks.
            if (PTIsTextType(res_iter->BlockType())) {
                cv::Point2i p1, p2;
                if (res_iter->BoundingBox(block_level, &p1.x, &p1.y, &p2.x, &p2.y)) {
                    // Draw bounding box.
                    cv::rectangle(curr_img, cv::Rect(p1, p2), cv::Scalar(0, 255, 0));
                    // Prepare text output.
                    const int font = cv::HersheyFonts::FONT_HERSHEY_PLAIN;
                    cv::Size text_size;
                    // Write confidence.
                    std::stringstream conf;
                    conf.precision(0);
                    conf << std::fixed << res_iter->Confidence(block_level) << '%';
                    text_size = cv::getTextSize(conf.str(), font, 1.0, 1, nullptr);
                    cv::putText(curr_img, conf.str(), cv::Point2i(p2.x - text_size.width - 2, p2.y - 2), font, 1.0, cv::Scalar(255, 100, 0));
                    // Write detected text (OpenCV's Hershey fonts only cover ASCII, but close enough).
                    std::unique_ptr<const char[]> raw_text(res_iter->GetUTF8Text(block_level));
                    if (raw_text != nullptr) {
                        text_size = cv::getTextSize(raw_text.get(), font, 1.0, 1, nullptr);
                        cv::putText(curr_img, raw_text.get(), cv::Point2i(p1.x + 2, p1.y + text_size.height + 2), font, 1, cv::Scalar(0, 0, 255));
                    }
                }
            }
        } while (res_iter->Next(block_level));
        // Stack current image on top of output image.
        cv::vconcat(curr_img, out_img, out_img);
    }
    cv::imwrite("./output.png", out_img);
    return 0;
}
Output
Remarks
The program above reproduces the error shown in the original issue post as a self-contained program, so colors, fonts, etc. differ from the original post.
The output consists of three stacked, annotated versions of the input image (top to bottom):
- Recognized words
- Recognized symbols
- Threshold image (for visual proof of Tesseract's working space)
Each word or symbol comes with its bounding box (green), the recognized text (red) and the confidence (blue).
Dependencies
- Tesseract
- Leptonica
- OpenCV
Setup
To run the program, put the input image into the executable's current working directory as "input.png".
You also need the English language model (eng.traineddata) in the same folder.
The output will be saved as "output.png" in the same folder.
Discussion
As you can see in the output image provided, the word "29M1" is recognized as "29M" with 0% confidence, even though it consists of the three characters '2', '9' and 'M', each with above 90% confidence. The 'M' is a misdetection of the actual printed "M1".
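A check like the following could make this kind of mismatch visible automatically. It is only a minimal sketch: `IsWordConfidenceSuspicious` and the `max_gap` threshold are hypothetical names chosen for illustration, not part of Tesseract, and the confidences are assumed to be the percentage values returned by `ResultIterator::Confidence()`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not part of Tesseract): flags a word whose confidence
// is far below the confidence of its weakest symbol.
// All confidences are percentages in [0, 100].
bool IsWordConfidenceSuspicious(float word_conf,
                                const std::vector<float>& symbol_confs,
                                float max_gap = 50.0f) {
    assert(max_gap >= 0.0f);
    if (symbol_confs.empty())
        return false; // nothing to compare against
    const float min_symbol_conf =
        *std::min_element(symbol_confs.begin(), symbol_confs.end());
    // Suspicious if even the weakest symbol beats the word by more than max_gap.
    return (min_symbol_conf - word_conf) > max_gap;
}
```

With made-up symbol confidences of 93, 95 and 91 and a word confidence of 0, the helper returns true; with a word confidence of 88 it returns false.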
Notably, the next character seems to be what breaks things: the first '1' of "10210A" gets detected as three different symbols, '1', '1' and 'T', where the glitched '1' and 'T' seem to share the exact same location. Their bounding boxes are taller than those of the neighboring characters but only 1 px wide. It seems that these glitched symbols break the word "29M110210A" into two parts and subsequently set the confidences of both parts to zero.
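As a rough sketch of how such glitched symbols could be filtered out after recognition, the helper below flags boxes that are at most 1 px wide or that exactly repeat the previous box. `Box` and `FindGlitchedBoxes` are hypothetical names for illustration; the coordinates are assumed to be the left, top, right and bottom values filled in by `ResultIterator::BoundingBox()` at symbol level, in reading order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Symbol bounding box in pixels: left, top, right, bottom.
struct Box { int x1, y1, x2, y2; };

// Hypothetical helper: returns the indices of boxes that are at most
// max_width px wide or that exactly duplicate the preceding box.
std::vector<std::size_t> FindGlitchedBoxes(const std::vector<Box>& boxes,
                                           int max_width = 1) {
    assert(max_width >= 0);
    std::vector<std::size_t> glitched;
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        const Box& b = boxes[i];
        const bool too_narrow = (b.x2 - b.x1) <= max_width;
        const bool duplicate = i > 0 &&
            boxes[i - 1].x1 == b.x1 && boxes[i - 1].y1 == b.y1 &&
            boxes[i - 1].x2 == b.x2 && boxes[i - 1].y2 == b.y2;
        if (too_narrow || duplicate)
            glitched.push_back(i);
    }
    return glitched;
}
```

Whether dropping the flagged symbols is safe depends on the application; a genuinely thin glyph on a low-resolution scan could also be 1 px wide.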
Detail shot from our customer application (I can zoom in there, but the boxes are drawn 0.5 pixels off - it is just a quick debug view):
And just for funsies, on the left side the word "paper" is recognized from random cracks, with 16% confidence, which is infinitely more than the 0% for the second line of the actual printed text.
from tesseract.
I just manually preprocessed the image based on the documentation:
and the result is:
tesseract input4175p.png -
9200795018 -
20M110210A
=>
- Tesseract is (usually) not suitable for text detection.
- Tesseract is an OCR engine; for good output you need to give it a good input image.
from tesseract.
Thank you. As you guessed, text detection is what we aimed for.
Just to reemphasize: I was thrown off neither by the random junk detected outside the obvious text label nor by the blank inserted between '1' and '1'.
What caught my attention was that
- "M1" became "M"
- "1" became "11" (and this was not a '1' carried over the blank; it was a coincidentally occurring actual '1' that was detected with a very deformed bounding box)
- The confidence dropped to 0%
- and the broken overlapping bounding boxes left of the second '1' glyph in the second line.
And all that while the same image rotated by 1° or 2° to the left or right yielded OK results.
I think this is dangerous: there is a continuous sweep of angles by which the image can be rotated with good results, and then, amidst those, there is a discontinuity where obvious recognition artifacts screw up the result.
Even for non-optimal inputs the results should not glitch out like that.
But I understand, there is machine learning behind the scenes and those models tend to have that kind of discontinuity issues.
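One way to catch such a discontinuity in a test setup is to run the OCR over a sweep of rotation angles and look for an angle whose result disagrees with both of its neighbors while the neighbors agree with each other. The following is only a sketch under that assumption: `SweepResult` and `FindIsolatedOutliers` are hypothetical names, and the recognized strings are expected to come from separate OCR runs that are not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One OCR run: the rotation applied to the input and the recognized text.
struct SweepResult {
    double angle_deg;
    std::string text;
};

// Hypothetical helper: returns the angles whose text differs from both
// neighboring angles although those neighbors agree with each other.
// The sweep must be sorted by strictly ascending angle.
std::vector<double> FindIsolatedOutliers(const std::vector<SweepResult>& sweep) {
    std::vector<double> outliers;
    for (std::size_t i = 1; i + 1 < sweep.size(); ++i) {
        assert(sweep[i - 1].angle_deg < sweep[i].angle_deg);
        const bool neighbors_agree = sweep[i - 1].text == sweep[i + 1].text;
        if (neighbors_agree && sweep[i].text != sweep[i - 1].text)
            outliers.push_back(sweep[i].angle_deg);
    }
    return outliers;
}
```

This only detects single-angle glitches; a wider band of bad angles would need a comparison against the most frequent result instead.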
from tesseract.
One last question:
Do you have any educated guess on why this is happening?
As far as I understand the documentation, the image returned by GetThresholdedImage() is the actual image presented to the OCR. How come a character, 'a', is recognized in a pitch-black area without a single white pixel?
To me this looks as if the character recognition model has not been trained with empty images as part of the rejection class(es).
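The "not a single white pixel" observation can be verified programmatically by counting foreground pixels inside a symbol's bounding box. The sketch below assumes the thresholded image has been copied into a row-major 8-bit buffer in which any nonzero value is foreground (as in the single-channel `out_img` of the program above, before its color conversion); `CountForeground` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: counts nonzero pixels of a row-major 8-bit image
// inside the box [x1, x2) x [y1, y2). The box is clamped to the image.
std::size_t CountForeground(const std::vector<std::uint8_t>& img,
                            int width, int height,
                            int x1, int y1, int x2, int y2) {
    assert(width > 0 && height > 0);
    assert(img.size() == static_cast<std::size_t>(width) * height);
    x1 = std::max(x1, 0);
    y1 = std::max(y1, 0);
    x2 = std::min(x2, width);
    y2 = std::min(y2, height);
    std::size_t count = 0;
    for (int y = y1; y < y2; ++y) {
        for (int x = x1; x < x2; ++x) {
            if (img[static_cast<std::size_t>(y) * width + x] != 0)
                ++count;
        }
    }
    return count;
}
```

A symbol whose box yields a count of zero was recognized from an empty region and could be rejected regardless of its reported confidence.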
from tesseract.