Comments (3)
I replicated the `.jpg` result using the provided file, but was unable to replicate the `.png` result, so if you think this represents a distinct case, please upload a sample image.
Regarding the `.jpg` image, it sounds like the core issue here is that there's a disconnect between the JavaScript exception thrown (and handled by `errorHandler`) and the messages printed to `stderr`, with the latter being more informative. While I would agree this is not ideal, I do not think changing this would be feasible.
Of the messages listed, the only one that is created within this repo (and the only one that is a JavaScript exception) is `Error attempting to read image.` In the code that throws this error, the only information we have to go on when creating the exception is an integer return code (`1`) indicating the image was not read correctly, so the error message is as informative as it could be given that information.
The other messages listed are printed to `stderr` by dependencies, and are not created by the code in this repo. For example, the `Invalid SOS parameters` message is printed by `libjpeg`, and the `Error in pixReadStreamJpeg` errors are printed by `leptonica`.
We cannot change the fact that these dependencies send these messages to `stderr`, as we do not edit dependencies. Furthermore, we cannot send all `stderr` text to `errorHandler`, as `errorHandler` is for JavaScript exceptions, and not all messages printed to `stderr` will result in an exception (and vice versa). While many Tesseract.js exceptions are accompanied by meaningful messages printed to `stderr` by a dependency, this cannot be assumed as a rule. With these limitations in mind, I think that throwing the `Error attempting to read image` exception and having the `stderr` messages print to console is a reasonable behavior.
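To make the split between the two channels concrete, here is a minimal, self-contained sketch. It is not the Tesseract.js source: `fakeNativeReadImage`, `setImageOrThrow`, and `tryRead` are hypothetical names, and the header check is just a stand-in for whatever a compiled dependency does internally. The only parts taken from the discussion above are the integer return code `1` and the exception message.

```javascript
// Hypothetical stand-in for a compiled dependency: it writes details to
// stderr and hands back only an integer status (0 = ok, 1 = failure).
function fakeNativeReadImage(bytes) {
  const looksLikeJpeg = bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xd8;
  if (!looksLikeJpeg) {
    console.error('Error in fakeNativeReadImage: unrecognized header');
    return 1;
  }
  return 0;
}

// Hypothetical wrapper: the status code is the only input available when
// the exception is built, so the message has to stay generic.
function setImageOrThrow(bytes) {
  const status = fakeNativeReadImage(bytes);
  if (status === 1) {
    throw new Error('Error attempting to read image.');
  }
}

// Caller side: the exception reaches the handler, while the detailed text
// was already written to stderr and never passes through this function.
function tryRead(bytes, errorHandler) {
  try {
    setImageOrThrow(bytes);
    return true;
  } catch (err) {
    errorHandler(err);
    return false;
  }
}
```

In this sketch the handler only ever sees the generic message; the more specific line appears on `stderr` independently, which mirrors the behavior described above.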
from tesseract.js.
Thanks for looking into this. By the way, I'm running this with a scheduler and 6 workers under a 4GB RAM limit, and after a while the container is killed by the kernel because it consumes too much RAM. I have not investigated yet, but it looks like a leak. I'll see if I can create a repro for you when I have time!
@didiercolens Okay, sounds good. I opened a new GitHub issue (#900) to describe worker memory increasing due to large images, which is one cause of worker memory usage increasing over time. This may or may not be related to what you are experiencing. I am not currently aware of any leaks; however, there have been memory leaks in past versions, so it is possible.
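For anyone putting together a repro, a rough sketch of how memory growth could be recorded in Node.js follows. `createMemoryTracker` is a hypothetical helper, not part of Tesseract.js; the only real API it relies on is `process.memoryUsage().rss`, and the reader function is injectable so the logic can be checked without a real workload.

```javascript
// Hypothetical helper: collect resident set size (RSS) samples and report
// how much memory grew between the first and the most recent sample.
function createMemoryTracker(readRss = () => process.memoryUsage().rss) {
  const samples = [];
  return {
    // Record the current RSS in bytes and return it.
    sample() {
      const rss = readRss();
      samples.push(rss);
      return rss;
    },
    // Bytes gained since the first sample (0 until two samples exist).
    growth() {
      if (samples.length < 2) return 0;
      return samples[samples.length - 1] - samples[0];
    },
    // True when growth is strictly above the given threshold in bytes.
    exceeds(thresholdBytes) {
      return this.growth() > thresholdBytes;
    },
  };
}
```

Calling `sample()` after each batch of jobs and logging `growth()` gives a simple trend line. Steady growth alone does not prove a leak, since large images can also raise worker memory, but it helps narrow down when the increase happens.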