Comments (11)
Change two lines (258,259) in setup.py:
install_requires=[
    "tensorflow",
    "numpy",
    "six~=1.15.0",
    "datefinder==0.7.1",
    "opencv-python==4.5.1.48",
    "pdf2image==1.14.0",
    "pdfplumber==0.5.27",
    "PyPDF2==1.27.9",
    "pytesseract==0.3.7",
    "python-dateutil==2.8.1",
    "PyYAML==5.4.1",
    "simplejson==3.17.2",
    "tqdm==4.59.0",
    "google-api-python-client",
    "google-cloud-vision"
])
from invoicenet.
Thank you very much, it worked!
from invoicenet.
Have you managed to run this repo successfully on Windows?
from invoicenet.
Yes. On Win 10 with miniconda.
from invoicenet.
I ran into some other problems and kind of gave up. Any idea if it works on Windows 11?
from invoicenet.
Please tell us what problems or errors you have.
from invoicenet.
Thanks a lot for the immediate response. Actually, I think I managed to make it work after a fresh "reinstall".
Just two questions:
Can I train using a regular CPU? And will it work if my invoices are in Greek?
from invoicenet.
You can easily train the network using only the CPU. The tensorflow library will detect what it can run on.
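To see which devices TensorFlow has actually detected on your machine, a quick check along these lines can help. This is only a minimal sketch: list_devices is a hypothetical helper name, not part of InvoiceNet, and it returns an empty list when TensorFlow cannot be imported.

```python
def list_devices():
    """Return the names of the physical devices TensorFlow can see."""
    try:
        import tensorflow as tf  # imported lazily so the script still runs without TF
    except ImportError:
        return []
    # With no argument, list_physical_devices() returns CPUs and GPUs alike.
    return [device.name for device in tf.config.list_physical_devices()]


if __name__ == "__main__":
    devices = list_devices()
    print("Visible devices:", devices)
    if any("GPU" in name for name in devices):
        print("A GPU is visible to TensorFlow.")
    else:
        print("No GPU is visible, so training would use the CPU.")
```

If no GPU shows up in the list, nothing needs to change in the code; training just takes longer.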
As for the language, Tesseract OCR uses English by default. The program has to be told explicitly to use Greek, or English+Greek.
https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
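In the file InvoiceNet\invoicenet\common\util.py, line 95, replace
data = pytesseract.image_to_data(img, output_type=Output.DICT)
with
data = pytesseract.image_to_data(img, lang='ell', output_type=Output.DICT)
Note that 'ell' is Tesseract's code for modern Greek ('grc' is Ancient Greek). For English+Greek use lang='eng+ell'. The matching traineddata file must be installed.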
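The language change above could be wrapped up roughly like this. It is a sketch, not InvoiceNet's actual code: build_lang and ocr_to_dict are hypothetical names, and the default assumes the invoices are in modern Greek, for which the Tesseract code is 'ell' ('grc' is Ancient Greek). Several languages are combined with '+'.

```python
def build_lang(codes):
    """Join Tesseract language codes into the 'a+b' form the lang option expects."""
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def ocr_to_dict(img, codes=("eng", "ell")):
    """Run Tesseract on an image with the given languages and return word-level data."""
    # Imported here so build_lang can be used even without pytesseract installed.
    import pytesseract
    from pytesseract import Output

    return pytesseract.image_to_data(
        img, lang=build_lang(codes), output_type=Output.DICT
    )
```

For example, build_lang(["eng", "ell"]) gives the string "eng+ell", which is what gets passed as lang.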
from invoicenet.
You need to check which languages your Tesseract installation supports (the path contains a space, so it has to be quoted):
"c:\Program Files\Tesseract-OCR\tesseract.exe" --list-langs
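The same check can be done from Python before starting a long run. The sketch below simply calls the tesseract executable and parses what it prints; parse_langs and installed_langs are hypothetical helper names, and the parser assumes the usual output of a "List of available languages" header followed by one code per line.

```python
import subprocess


def parse_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines()]
    # Skip blank lines and the "List of available languages ..." header line.
    return [line for line in lines if line and not line.lower().startswith("list of")]


def installed_langs(exe="tesseract"):
    """Run Tesseract and return the language codes it reports as installed."""
    result = subprocess.run(
        [exe, "--list-langs"], capture_output=True, text=True, check=True
    )
    # Read both streams in case the list is not written to stdout.
    return parse_langs(result.stdout + "\n" + result.stderr)
```

On Windows you would pass the full path, e.g. installed_langs(r"c:\Program Files\Tesseract-OCR\tesseract.exe"), and then test whether "ell" is in the returned list.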
from invoicenet.
Hi, I tried training using only the CPU and it took a huge amount of time. Can I somehow use Google Colab's free GPUs for this? Do I have to make any modifications to the code?
from invoicenet.
On a normal computer, processing and training on 5,000 invoices takes a few hours. You only need to do it once; after that, the trained network runs quickly.
The only Google-specific code I can see is the Google OCR setup in util.py, line 37:
# API keys for google ocr
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="google_api_keys.json"
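If you do use Google OCR, it may be worth checking that the key file actually exists before the environment variable is set, since a relative path like the one above depends on the working directory. A minimal sketch, with set_google_credentials as a hypothetical helper name that is not part of InvoiceNet:

```python
import os


def set_google_credentials(path="google_api_keys.json"):
    """Point GOOGLE_APPLICATION_CREDENTIALS at a key file, failing early if it is missing."""
    full_path = os.path.abspath(path)
    if not os.path.isfile(full_path):
        raise FileNotFoundError(f"credentials file not found: {full_path}")
    # Store the absolute path so later changes of working directory do not matter.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = full_path
    return full_path
```

Calling set_google_credentials() with no argument looks for google_api_keys.json in the current directory, matching the line above.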
from invoicenet.