
pytesser's Introduction

Hi there! 👋

I'm Robin - I am a Software Security Researcher @ Quarkslab. Below are some projects I am developing or contributing to.

⚡ Projects

Fuzzing / Symbolic Execution:

  • PASTIS: Collaborative fuzzing infrastructure. It leverages AFL++, Honggfuzz and TritonDSE (thus greybox and whitebox fuzzing) for program coverage and bug discovery.
  • TritonDSE: Symbolic Execution engine based on Triton, built for automatic program coverage exploration. It has been designed to encompass multiple program analysis use cases.

Deobfuscation / Program Synthesis:

Diffing / Firmware Analysis: Portal

Other:

  • pydimacs: a simple module to manipulate CNF (Conjunctive Normal Form) and graphs using the Z3 Python API

📞 Contact

⬅️ just here

✍️ Technical Blog Posts

📈 Stats


More details 🔬 https://www.githubwrapped.io/RobinDavid

pytesser's People

Contributors

owlran, robindavid, zchen24


pytesser's Issues

Error in the readme file

The last section of the readme, which shows how to give pytesser a mat, says:

image = cv2.imwrite("myimage.jpg")
txt = pytesser.mat_to_string(image) 

I think you meant (imread instead of imwrite):

image = cv2.imread("myimage.jpg")
txt = pytesser.mat_to_string(image) 

FileNotFoundError due to line 61

Calling image_to_string throws a FileNotFoundError.
The method calls image_file_to_string, which already deletes the TEMP_IMAGE file, so the second delete fails.
Solution: just delete line 61 from pytesser.py.
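
The double deletion described above can be reproduced in isolation. The following is a minimal sketch, not the actual pytesser code: `remove_if_exists` is a hypothetical helper, and the temporary file stands in for TEMP_IMAGE. It shows an alternative to deleting the line, namely making the cleanup tolerate a file that is already gone.

```python
import os
import tempfile


def remove_if_exists(path):
    """Delete path; return False instead of raising if it is already gone."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


# Stand-in for TEMP_IMAGE: create a real temporary .bmp file.
fd, temp_image = tempfile.mkstemp(suffix=".bmp")
os.close(fd)

first = remove_if_exists(temp_image)   # inner cleanup (image_file_to_string)
second = remove_if_exists(temp_image)  # outer cleanup (image_to_string), now a no-op
```

With a plain `os.remove` the second call would raise FileNotFoundError, which matches the reported behaviour.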

Please tell me why my image is not working?

Below are the attached image and the output I got after using this repo.

values

Assessment Year

INDIAN INCOME TAX RETURN ACKNOWLEDGEMENT
[mm the an: d m. Rmm of Income in Form mu (SAHAJ). mm. mm.
mm . rrR-s. "mum-7 hmmmod um vomm dumb-fly]

Below is the attached output

PA

5'
BAH’PNS?”

ALOK PATII.

Rand/Sum.“ Oflk:

Sums Indwidual

w XXXX XXXX 0939

ITO 2(2). ND Original or Revised ORIGINAL
E-filjng Acknowledgemenl Number 09103470170717 DuctDD/MM/YYYY) 17-07-2017
_11
mauwommwa-w-A am

  • 675370

Currant Ya: loss. if any V

"E RSONAI. NmRM ITIOIANII TIIF
DATI OI' LLNTROIK‘
TR ANSWISSIIH‘

Designation of A0(Ward/Circle)

II:
é _
g g - Ncluxpayabl: n “376
h‘
g g I: mm 5—-
E F b Totaluxmdinmpayablc s 6l876
E i
‘_ 1-
3 a
I.
a i
5

Tau] T1715 Pad (761* 7b‘ 7c # 711)

_n
E—n

_—

Please tell me how I could improve it.

How to recognize small characters

After searching for pytesser, I tried:

>>> im = Image.open(r'C:\Users\martlee2\Documents\words\30.tif')
>>> text = image_to_string(im)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\pytesser\__init__.py", line 32, in image_to_string
    text = util.retrieve_text(scratch_text_name_root)
  File "C:\Python27\lib\pytesser\util.py", line 10, in retrieve_text
    inf = open(scratch_text_name_root + '.txt','r')
IOError: [Errno 2] No such file or directory: 'temp.txt'
>>> import pytesser
>>> txt = pytesser.image_to_string(r'C:\Users\martlee2\Documents\words\30.png',"en",pytesser.PSM_SINGLE_WORD)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\pytesser\__init__.py", line 78, in image_to_string
    process_request(file, TEMP_FILE, lang, psm) #Process command
  File "C:\Python27\lib\pytesser\__init__.py", line 64, in process_request
    raise TesseractException, ret[1]
pytesser.TesseractException
>>>
Even with the original file I get the temp file error, including after changing the arguments to:
args = [tesseract_exe_name, input_filename, output_filename,'-psm','7']

#http://code.google.com/p/tesseract-ocr/
#pip install pytesseract
from PIL import Image
from pytesser import *

im = Image.open(r'C:\Users\martlee2\Documents\words\30.png')

im.save(r'C:\Users\martlee2\Documents\words\30.tif')
im = Image.open(r'C:\Users\martlee2\Documents\words\30.tif')
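
Two details in the calls above may be worth checking, although neither is confirmed as the cause here. Tesseract language codes are three letters ("eng" rather than "en"), and the page segmentation flag is `-psm` in Tesseract 3.x but `--psm` in 4.x and later. The sketch below only builds the argument list so it can be inspected before running; `build_tesseract_args` and its `legacy_flags` parameter are hypothetical and not part of pytesser.

```python
def build_tesseract_args(input_file, output_base, lang=None, psm=None,
                         exe="tesseract", legacy_flags=True):
    """Return the argv list for a tesseract call (hypothetical helper)."""
    args = [exe, input_file, output_base]
    if lang is not None:
        args += ["-l", lang]
    if psm is not None:
        # Tesseract 3.x expects "-psm"; 4.x and later expect "--psm".
        args += ["-psm" if legacy_flags else "--psm", str(psm)]
    return args


PSM_SINGLE_WORD = 8  # same value as the constant in pytesser
args = build_tesseract_args("30.tif", "temp", lang="eng", psm=PSM_SINGLE_WORD)
print(" ".join(args))
```

Running the printed command by hand in a terminal usually shows the error message from tesseract itself, which the Python traceback hides.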

raise TesseractNotFound

I downloaded this library and imported it into my program. But when I use the function:
image_to_string()
it fails with an error:

Traceback (most recent call last):
  File "/home/jwc.py", line 48, in <module>
    getNum()
  File "/home/jwc.py", line 43, in getNum
    text = image_to_string(out)
  File "/home/pytesser.py", line 77, in image_to_string
    check_path() #Check if tesseract available in the path
  File "/home/pytesser.py", line 44, in check_path
    raise TesseractNotFound
pytesser.TesseractNotFound

I have no idea what is wrong. How can I fix it?
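
The traceback comes from check_path, so the likely reading is that the tesseract program itself was not found: pytesser only wraps the command-line tool, which has to be installed separately. A minimal way to check this, assuming Python 3 and using only the standard library (`find_tesseract` is a hypothetical helper name):

```python
import shutil


def find_tesseract(names=("tesseract", "tesseract.exe")):
    """Return the full path of the first matching executable on PATH, or None."""
    for name in names:
        found = shutil.which(name)
        if found:
            return found
    return None


location = find_tesseract()
print(location or "tesseract is not on PATH; install it or add its folder to PATH")
```

If this prints a path but pytesser still raises TesseractNotFound, the executable name that pytesser searches for may differ from the one installed.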

pytesser.TesseractNotFound

I keep getting the error:

Traceback (most recent call last):
  File "C:\Users\phijon0412\Desktop\Ny mapp (2)\test.py", line 6, in <module>
    txt = pytesser.image_to_string(image)
  File "C:\Users\phijon0412\Desktop\Ny mapp (2)\pytesser.py", line 60, in image_to_string
    txt = image_file_to_string(TEMP_IMAGE, lang, psm)
  File "C:\Users\phijon0412\Desktop\Ny mapp (2)\pytesser.py", line 65, in image_file_to_string
    check_path() #Check if tesseract available in the path
  File "C:\Users\phijon0412\Desktop\Ny mapp (2)\pytesser.py", line 35, in check_path
    raise TesseractNotFound()
pytesser.TesseractNotFound

But I'm really unsure why.

pytesser.py

import sys
from subprocess import Popen, PIPE
import os
import tempfile
import cv2

PROG_NAME = 'tesseract.exe'
TEMP_IMAGE = tempfile.mktemp()+'.bmp'
TEMP_FILE = tempfile.mktemp()

#All the PSM arguments as a variable name (avoid having to know them)
PSM_OSD_ONLY = 0
PSM_SEG_AND_OSD = 1
PSM_SEG_ONLY = 2
PSM_AUTO = 3
PSM_SINGLE_COLUMN = 4
PSM_VERTICAL_ALIGN = 5
PSM_UNIFORM_BLOCK = 6
PSM_SINGLE_LINE = 7
PSM_SINGLE_WORD = 8
PSM_SINGLE_WORD_CIRCLE = 9
PSM_SINGLE_CHAR = 10

class TesseractException(Exception): #Raised when tesseract does not return 0
    pass

class TesseractNotFound(Exception): #When tesseract is not found in the path
    pass

def check_path(): #Check if tesseract is in the path raise TesseractNotFound otherwise
    for path in os.environ.get('PATH', '').split(':'):
        filepath = os.path.join(path, PROG_NAME)
        if os.path.exists(filepath) and not os.path.isdir(filepath):
            return True
    raise TesseractNotFound()

def process_request(input_file, output_file, lang=None, psm=None):
    args = [PROG_NAME, input_file, output_file] #Create the arguments
    if lang is not None:
        args.append("-l")
        args.append(lang)
    if psm is not None:
        args.append("-psm")
        args.append(str(psm))
    proc = Popen(args, stdout=PIPE, stderr=PIPE) #Open process
    ret = proc.communicate() #Launch it

    code = proc.returncode
    if code != 0:
        if code == 2:
            raise TesseractException("File not found")
        if code == -11:
            raise TesseractException("Language code invalid: "+ret[1])
        else:
            raise TesseractException(ret[1])

def image_to_string(im, lang=None, psm=None):
    grayscale_image = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
    cv2.imwrite(TEMP_IMAGE, grayscale_image)
    txt = image_file_to_string(TEMP_IMAGE, lang, psm)
    os.remove(TEMP_IMAGE)
    return txt

def image_file_to_string(file, lang=None, psm=None):
    check_path() #Check if tesseract available in the path
    grayscale_image = cv2.cvtColor(cv2.imread(file), cv2.COLOR_BGR2GRAY)
    cv2.imwrite(TEMP_IMAGE, grayscale_image)
    #process_request(file, TEMP_FILE, lang, psm)
    process_request(TEMP_IMAGE, TEMP_FILE, lang, psm)
    f = open(TEMP_FILE+".txt", "r") #Open back the file
    txt = f.read()
    f.close()
    os.remove(TEMP_FILE+".txt")
    os.remove(TEMP_IMAGE)
    return txt


if __name__ =='__main__':
    print(image_file_to_string(sys.argv[2], sys.argv[1], PSM_AUTO))
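
One possible cause, offered as a guess rather than a confirmed diagnosis: check_path in the file above splits PATH on ':', but Windows separates PATH entries with ';' and every entry contains a drive-letter colon. The sketch below compares the two splits on a made-up PATH value; the install folder shown is only an example, and `split_search_path` is a hypothetical helper.

```python
import os


def split_search_path(path_value, sep=os.pathsep):
    """Split a PATH-style string into directories, dropping empty entries."""
    return [entry for entry in path_value.split(sep) if entry]


# Example value only; a real Windows PATH is much longer.
windows_path = r"C:\Windows\system32;C:\Program Files\Tesseract-OCR"

naive = split_search_path(windows_path, sep=":")    # what split(':') produces
correct = split_search_path(windows_path, sep=";")  # the Windows separator

print(naive)
print(correct)
```

None of the naive pieces is a valid directory, so the existence test can never succeed. Using `os.pathsep` (the default above) picks the right separator on each platform.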
