Comments (11)

GREGOR2000 avatar GREGOR2000 commented on June 12, 2024 1

Change two lines (258,259) in setup.py:

install_requires=[
    "tensorflow",
    "numpy",
    "six~=1.15.0",
    "datefinder==0.7.1",
    "opencv-python==4.5.1.48",
    "pdf2image==1.14.0",
    "pdfplumber==0.5.27",
    "PyPDF2==1.27.9",
    "pytesseract==0.3.7",
    "python-dateutil==2.8.1",
    "PyYAML==5.4.1",
    "simplejson==3.17.2",
    "tqdm==4.59.0",
    "google-api-python-client",
    "google-cloud-vision"
])

from invoicenet.

PanosHatz avatar PanosHatz commented on June 12, 2024

Thank you very much, it worked!

eshsu avatar eshsu commented on June 12, 2024

Have you gotten this repo working successfully on Windows?

GREGOR2000 avatar GREGOR2000 commented on June 12, 2024

Yes. On Win 10 with miniconda.

PanosHatz avatar PanosHatz commented on June 12, 2024

I ran into some other problems and more or less gave up. Any idea whether it works on Windows 11?

GREGOR2000 avatar GREGOR2000 commented on June 12, 2024

Please tell us what problems or errors you have.

PanosHatz avatar PanosHatz commented on June 12, 2024

Thanks a lot for the immediate response. Actually, I think I managed to make it work after a fresh "reinstall".
Just two questions:
Can I train using a regular CPU? And will it work if my invoices are in Greek?

GREGOR2000 avatar GREGOR2000 commented on June 12, 2024

You can train the network using only the CPU. TensorFlow detects the available hardware and falls back to the CPU when no GPU is present.
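To see what TensorFlow will actually run on, you can list the GPUs it detects. This is only a minimal sketch, assuming TensorFlow 2.x; the helper names detect_gpus and pick_device are made up for illustration and are not part of InvoiceNet.

```python
def detect_gpus():
    """Return the names of GPUs visible to TensorFlow (empty list if none)."""
    try:
        import tensorflow as tf  # imported lazily so the helper loads without TF
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


def pick_device(gpu_names):
    """Choose a TensorFlow device string: first GPU if present, else the CPU."""
    return "/GPU:0" if gpu_names else "/CPU:0"


if __name__ == "__main__":
    gpus = detect_gpus()
    print("Visible GPUs:", gpus or "none")
    print("Training would run on:", pick_device(gpus))
```

An empty list means training stays on the CPU, which works but is slower.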

As for the language, Tesseract OCR uses English by default. The program has to be told explicitly to use Greek, or English plus Greek, and the matching language data files must be installed:
https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html

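In InvoiceNet\invoicenet\common\util.py, line 95, change

data = pytesseract.image_to_data(img, output_type=Output.DICT)

to

data = pytesseract.image_to_data(img, lang='ell', output_type=Output.DICT)

Tesseract's code for Modern Greek is 'ell' ('grc' is Ancient Greek). For English plus Greek, use lang='eng+ell'.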
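If you want to switch languages without editing the call each time, a small wrapper could look like the sketch below. It assumes pytesseract and the Tesseract binary are installed and that Modern Greek uses the Tesseract code 'ell' ('grc' is Ancient Greek). The function names build_lang and ocr_words are hypothetical and do not exist in InvoiceNet.

```python
def build_lang(*codes):
    """Join Tesseract language codes into the 'a+b' form that lang= expects."""
    cleaned = [code.strip() for code in codes if code and code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def ocr_words(img, lang):
    """Run Tesseract on an image and return the word-level result dictionary."""
    # Lazy imports: these need pytesseract and the Tesseract binary installed.
    import pytesseract
    from pytesseract import Output

    return pytesseract.image_to_data(img, lang=lang, output_type=Output.DICT)
```

For example, ocr_words(img, build_lang("eng", "ell")) would ask Tesseract to recognise both English and Greek text in the same image.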

GREGOR2000 avatar GREGOR2000 commented on June 12, 2024

You need to check which languages your Tesseract OCR installation supports:

"c:\Program Files\Tesseract-OCR\tesseract.exe" --list-langs
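The same check can be done from Python before training starts. This is a rough sketch that calls the Tesseract binary directly; the function names and the default command name "tesseract" are assumptions, so pass the full path to tesseract.exe if it is not on your PATH.

```python
import subprocess


def parse_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # The first line is a header such as "List of available languages (3):".
    return [line for line in lines if not line.lower().startswith("list of")]


def installed_langs(tesseract_cmd="tesseract"):
    """Run the Tesseract binary and return its installed language codes."""
    result = subprocess.run(
        [tesseract_cmd, "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Some Tesseract builds print the list to stderr instead of stdout.
    return parse_langs(result.stdout or result.stderr)
```

If the code you need (for example 'ell') is missing from the returned list, install the corresponding traineddata file first.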

PanosHatz avatar PanosHatz commented on June 12, 2024

Hi, I tried training using only the CPU, and it took a huge amount of time. Can I somehow use Google Colab's free GPUs for this? Do I have to make any modifications to the code?

GREGOR2000 avatar GREGOR2000 commented on June 12, 2024

On an ordinary computer, processing 5,000 invoices and training on them takes a few hours. You only need to do it once. After that, the trained network runs quickly.

The only Google-related code I can see is for Google OCR, in util.py, line 37:

# API keys for google ocr
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="google_api_keys.json"
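Note that "google_api_keys.json" is a relative path, so it is resolved against the current working directory, which may differ on Colab. If you do use Google OCR, one option is to set the variable to an absolute path and fail early when the file is missing. The helper name set_google_credentials is hypothetical, not something InvoiceNet provides.

```python
import os
from pathlib import Path


def set_google_credentials(key_file):
    """Point GOOGLE_APPLICATION_CREDENTIALS at key_file using an absolute path."""
    path = Path(key_file).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"service account key not found: {path}")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return path
```

This only matters for the Google OCR path. With the default Tesseract OCR the variable is not needed.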
