Error parsing of tesseract output is brittle: a bytes-like object is required, not 'str' about pytesseract HOT 9 CLOSED

madmaze commented on July 30, 2024 9

Error parsing of tesseract output is brittle: a bytes-like object is required, not 'str'


Comments (9)

z3ntu commented on July 30, 2024 4

On Arch Linux you have to install the package tesseract-data-<lang>, e.g. tesseract-data-eng for English.


400yk commented on July 30, 2024 4

My system is OS X El Capitan and I'm running Python 3.6.
I had the same problem, but a small fix to pytesseract.py resolved it. The error occurs because the environment variable TESSDATA_PREFIX is not set in the subprocess. Therefore, in the pytesseract.py file, inside the function "run_tesseract", add

my_env = {"TESSDATA_PREFIX":"/opt/local/share"}

(for me, /opt/local/share is the parent folder that contains tessdata; please change it accordingly)

Then in the same function change the "proc=..." to

proc = subprocess.Popen(command, env=my_env,
        stderr=subprocess.PIPE)

Lastly, at the beginning of the file, change the definition of tesseract_cmd to:

tesseract_cmd = '/opt/local/bin/tesseract'

(that is, specify the absolute path; if you aren't sure what it is, run "which tesseract" in a shell to find out).

After making the above changes, you can run "python3 setup.py install" to install the pytesseract package.

Good luck!
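
One caveat about the steps above: passing a dict that contains only TESSDATA_PREFIX as env replaces the whole environment of the child process, which is why the absolute path to tesseract becomes necessary. A minimal sketch of an alternative is to copy the current environment and add the variable to it. The helper names build_tesseract_env and run_command are hypothetical (they are not part of pytesseract), and /opt/local/share is just the path from this comment:

```python
import os
import subprocess


def build_tesseract_env(tessdata_parent, base_env=None):
    """Return a copy of the environment with TESSDATA_PREFIX added.

    tessdata_parent is the folder that contains "tessdata".
    base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TESSDATA_PREFIX"] = tessdata_parent
    return env


def run_command(command, tessdata_parent):
    """Run command (a list of arguments); return (exit status, stderr bytes)."""
    proc = subprocess.Popen(command,
                            env=build_tesseract_env(tessdata_parent),
                            stderr=subprocess.PIPE)
    _, stderr = proc.communicate()
    return proc.returncode, stderr
```

Because PATH and the other variables are preserved, the command can still be found by name. Note that stderr comes back as bytes on Python 3.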


GelaniNijraj commented on July 30, 2024 3

Installing the training data for the particular language, as shown here, solves the problem.


doomzhou commented on July 30, 2024 1

Same error on Arch Linux.


barik commented on July 30, 2024

Okay, it seems the problem isn't related to Pillow at all. There actually is an error in tesseract. But on the Python end, the failure occurs because error_string is returned as a bytes object, and the get_errors call appears to have trouble with it.

status, error_string = run_tesseract(input_file_name,
                                     output_file_name_base,
                                     lang=lang,
                                     boxes=boxes,
                                     config=config)

The error itself was:

b'Tesseract Open Source OCR Engine v3.04.01 with Leptonica\r\nError opening data file \\msys64\\mingw64\\bin\\tessdata/eng.traineddata\r\nPlease make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.\r\nFailed loading language \'eng\'\r\nTesseract couldn\'t load any languages!\r\nCould not initialize tesseract.\r\n'
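
To illustrate the Python-side failure: on Python 3 the stderr of a subprocess is a bytes object, and calling line.find('Error') on a bytes line raises a TypeError. A minimal sketch of a get_errors that tolerates both bytes and str follows. It is modeled on the lines visible in the tracebacks in this thread, the decoding step is my assumption, and it is not necessarily what pull request #48 does:

```python
def get_errors(error_string):
    """Return the lines of tesseract's stderr that contain 'Error'.

    Accepts bytes (as returned by subprocess on Python 3) or str.
    Falls back to the whole stripped output if no such line exists.
    """
    if isinstance(error_string, bytes):
        # Assumption: the console output is UTF-8; undecodable bytes
        # are replaced rather than raising a second exception.
        error_string = error_string.decode("utf-8", errors="replace")
    lines = error_string.splitlines()
    error_lines = tuple(line for line in lines if "Error" in line)
    if error_lines:
        return "\n".join(error_lines)
    return error_string.strip()
```

With the output shown above, this would return only the "Error opening data file ..." line, so the real tesseract problem (the missing eng.traineddata) is reported instead of the TypeError.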


avindra commented on July 30, 2024

@GelaniNijraj : Thank you, that works perfectly.


johnfrancisgit commented on July 30, 2024

The error is an issue with pytesseract, not with the language pack per se. Essentially, there is an encoding conflict in Python 3 when errors are returned from the console. I fixed it in pull request #48, which hasn't been merged yet.


zwl1619 commented on July 30, 2024

I have downloaded the training data into C:\Program Files (x86)\Tesseract-OCR\tessdata, but the error still exists.



    Traceback (most recent call last):
      File "D:/test.py", line 11, in <module>
        print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))
      File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 164, in image_to_string
        errors = get_errors(error_string)
      File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in get_errors
        error_lines = tuple(line for line in lines if line.find('Error') >= 0)
      File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in <genexpr>
        error_lines = tuple(line for line in lines if line.find('Error') >= 0)
    TypeError: a bytes-like object is required, not 'str'

Then I downloaded @john-francis96's pytesseract.py file to replace the original pytesseract.py, and now there is an error like this:

Traceback (most recent call last):
  File "D:/test.py", line 11, in <module>
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))
  File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 165, in image_to_string
    raise TesseractError(status, errors)
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\tessdata/chi_sim.traineddata')
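
This second traceback is the underlying tesseract error: it cannot open chi_sim.traineddata at the path it derives from TESSDATA_PREFIX. A quick way to check which language files tesseract would fail to find is sketched below. The helper name missing_traineddata is hypothetical, and the sketch assumes the layout described in the tesseract message quoted earlier in this thread (TESSDATA_PREFIX is the parent directory of "tessdata"):

```python
from pathlib import Path


def missing_traineddata(tessdata_parent, langs):
    """Return the language codes whose .traineddata file is absent.

    tessdata_parent is the directory that contains "tessdata";
    langs is a list of individual codes such as ["eng", "chi_sim"].
    """
    tessdata_dir = Path(tessdata_parent) / "tessdata"
    return [lang for lang in langs
            if not (tessdata_dir / (lang + ".traineddata")).is_file()]
```

For example, missing_traineddata(r"C:\Program Files (x86)\Tesseract-OCR", ["chi_sim"]) returning ["chi_sim"] would mean the file is not where tesseract looks for it.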


unnir commented on July 30, 2024

Ubuntu, Python 3.5, same issue. I have the training data.

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    in ()
    ----> 1 pytesseract.image_to_string(image)

    /home/borisov/anaconda3/lib/python3.5/site-packages/pytesseract/pytesseract.py in image_to_string(image, lang, boxes, config)
        161                                          config=config)
        162     if status:
    --> 163         errors = get_errors(error_string)
        164         raise TesseractError(status, errors)
        165     f = open(output_file_name)

    /home/borisov/anaconda3/lib/python3.5/site-packages/pytesseract/pytesseract.py in get_errors(error_string)
        109
        110     lines = error_string.splitlines()
    --> 111     error_lines = tuple(line for line in lines if line.find('Error') >= 0)
        112     if len(error_lines) > 0:
        113         return '\n'.join(error_lines)

    /home/borisov/anaconda3/lib/python3.5/site-packages/pytesseract/pytesseract.py in (.0)
        109
        110     lines = error_string.splitlines()
    --> 111     error_lines = tuple(line for line in lines if line.find('Error') >= 0)
        112     if len(error_lines) > 0:
        113         return '\n'.join(error_lines)

    TypeError: a bytes-like object is required, not 'str'

