Comments (9)
On Arch Linux you have to install the package tesseract-data-<lang>, e.g. tesseract-data-eng for English.
from pytesseract.
My system is OS X El Capitan and I'm running Python 3.6.
I had the same problem, but a small fix to pytesseract.py resolved it. The error occurs because the environment variable TESSDATA_PREFIX is not set in the subprocess. So, in pytesseract.py, inside the function "run_tesseract", add:
my_env = {"TESSDATA_PREFIX":"/opt/local/share"}
(For me, /opt/local/share is the parent folder that contains tessdata; please change it accordingly.)
Then, in the same function, change the "proc = ..." line to:
proc = subprocess.Popen(command, env=my_env,
                        stderr=subprocess.PIPE)
Lastly, at the beginning of the file, change the definition of tesseract_cmd to:
tesseract_cmd = '/opt/local/bin/tesseract'
(That is, specify the absolute path. If you aren't sure what it is, run "which tesseract" in a shell to find out.)
After making the above changes, run "python3 setup.py install" to install the patched pytesseract package.
Good luck!
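A variant of the idea above, as a minimal sketch: instead of passing a dict that holds only TESSDATA_PREFIX, copy the current environment and add the variable to it, so that PATH and other variables still reach the subprocess. The helper name build_tesseract_env is hypothetical and not part of pytesseract, and the path is the one from this comment.

```python
import os


def build_tesseract_env(tessdata_parent, base_env=None):
    """Return a copy of the environment with TESSDATA_PREFIX set.

    tessdata_parent is the folder that contains "tessdata" (taken from the
    comment above; adjust it for your own install).
    """
    # Copy so the caller's environment (or os.environ) is never modified.
    env = dict(os.environ if base_env is None else base_env)
    env["TESSDATA_PREFIX"] = tessdata_parent
    return env


if __name__ == "__main__":
    env = build_tesseract_env("/opt/local/share")
    print(env["TESSDATA_PREFIX"])
    # The dict could then be passed as subprocess.Popen(command, env=env, ...).
```

Copying the environment is a design choice here: a subprocess started with a one-entry env loses everything else, which may or may not matter on a given system.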
from pytesseract.
Installing the training data for the particular language, as shown here, solves the problem.
from pytesseract.
Same error on Arch Linux.
from pytesseract.
Okay, it seems like the problem isn't related to Pillow at all. There actually is an error in tesseract.
But on the Python end, the error occurs because error_string is returned as a bytes literal, and the get_errors call appears to have trouble with it.
status, error_string = run_tesseract(input_file_name,
                                     output_file_name_base,
                                     lang=lang,
                                     boxes=boxes,
                                     config=config)
The error itself was:
b'Tesseract Open Source OCR Engine v3.04.01 with Leptonica\r\nError opening data file \\msys64\\mingw64\\bin\\tessdata/eng.traineddata\r\nPlease make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.\r\nFailed loading language \'eng\'\r\nTesseract couldn\'t load any languages!\r\nCould not initialize tesseract.\r\n'
from pytesseract.
@GelaniNijraj : Thank you, that works perfectly.
from pytesseract.
The error is an issue with pytesseract, not with the language pack per se. Essentially, there is an encoding conflict in Python 3 when errors are returned from the console. I fixed it in pull request #48, which hasn't been merged yet.
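To illustrate the bytes/str mismatch discussed in this thread, here is a minimal sketch. It is not the code from the pull request. It assumes the Tesseract stderr output arrives as bytes and decodes it before searching for 'Error'. The function name get_errors_decoded and the choice of UTF-8 with replacement characters are assumptions for illustration.

```python
def get_errors_decoded(error_string, encoding="utf-8"):
    """Return the lines containing 'Error', accepting bytes or str input."""
    if isinstance(error_string, bytes):
        # Decode first: calling bytes.find('Error') raises TypeError on Python 3.
        error_string = error_string.decode(encoding, errors="replace")
    lines = error_string.splitlines()
    error_lines = tuple(line for line in lines if "Error" in line)
    if error_lines:
        return "\n".join(error_lines)
    # No line mentions 'Error': fall back to the whole message.
    return error_string.strip()


if __name__ == "__main__":
    sample = (b"Tesseract Open Source OCR Engine v3.04.01 with Leptonica\r\n"
              b"Error opening data file \\msys64\\mingw64\\bin\\tessdata/eng.traineddata\r\n"
              b"Could not initialize tesseract.\r\n")
    print(get_errors_decoded(sample))
```

With the decoding in place, the underlying Tesseract message (here, the missing data file) surfaces instead of the TypeError.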
from pytesseract.
I have downloaded the training data into C:\Program Files (x86)\Tesseract-OCR\tessdata, but the error still occurs:
Traceback (most recent call last):
  File "D:/test.py", line 11, in <module>
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 164, in image_to_string
    errors = get_errors(error_string)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in get_errors
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in <genexpr>
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
TypeError: a bytes-like object is required, not 'str'
Then I downloaded @john-francis96's pytesseract.py file to replace the original pytesseract.py, and got this error instead:
Traceback (most recent call last):
  File "D:/test.py", line 11, in <module>
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 165, in image_to_string
    raise TesseractError(status, errors)
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\tessdata/chi_sim.traineddata')
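The second traceback suggests that Tesseract looked for chi_sim.traineddata under a tessdata folder and could not open it. A quick check like the sketch below can confirm whether the file exists where Tesseract is expected to look. The helper names are hypothetical, and the layout <parent>/tessdata/<lang>.traineddata is an assumption based on the error messages quoted in this thread.

```python
import os


def traineddata_path(tessdata_parent, lang):
    """Build the expected path of a language file under <parent>/tessdata."""
    return os.path.join(tessdata_parent, "tessdata", lang + ".traineddata")


def has_language(tessdata_parent, lang):
    """Return True if the language data file exists at the expected path."""
    return os.path.isfile(traineddata_path(tessdata_parent, lang))


if __name__ == "__main__":
    # Falls back to the current directory if TESSDATA_PREFIX is unset.
    parent = os.environ.get("TESSDATA_PREFIX", ".")
    print(traineddata_path(parent, "chi_sim"), has_language(parent, "chi_sim"))
```

If the check fails, either the file is missing or the parent folder differs from the one Tesseract actually uses.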
from pytesseract.
Ubuntu, Python 3.5, same issue. I have the training data.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in ()
----> 1 pytesseract.image_to_string(image)
/home/borisov/anaconda3/lib/python3.5/site-packages/pytesseract/pytesseract.py in image_to_string(image, lang, boxes, config)
161 config=config)
162 if status:
--> 163 errors = get_errors(error_string)
164 raise TesseractError(status, errors)
165 f = open(output_file_name)
/home/borisov/anaconda3/lib/python3.5/site-packages/pytesseract/pytesseract.py in get_errors(error_string)
109
110 lines = error_string.splitlines()
--> 111 error_lines = tuple(line for line in lines if line.find('Error') >= 0)
112 if len(error_lines) > 0:
113 return '\n'.join(error_lines)
/home/borisov/anaconda3/lib/python3.5/site-packages/pytesseract/pytesseract.py in (.0)
109
110 lines = error_string.splitlines()
--> 111 error_lines = tuple(line for line in lines if line.find('Error') >= 0)
112 if len(error_lines) > 0:
113 return '\n'.join(error_lines)
TypeError: a bytes-like object is required, not 'str'
from pytesseract.