receipt-parser-legacy's Introduction

A fuzzy receipt parser written in Python

This is a fuzzy receipt parser written in Python. It extracts information such as the shop, the date, and the total from scanned receipts. It can work as a standalone script or as part of our iOS and Android application.

Dependencies

The receipt-parser-core library depends on ImageMagick. Please install ImageMagick with your favorite package manager.

Usage

To convert all images from the data/img/ folder to text using tesseract and parse the resulting text files, run

make run

Docker

A Dockerfile is available with all dependencies needed to run the program.
To build the image, run

make docker-build

To run it on the sample files, try

make docker-run

By default, running the image will execute the make run command. To use with your own images, run the following:

docker run -v <path_to_input_images>:/app/data/img mre0/receipt_parser

History

This project started as a hackathon idea. Read more about it on the trivago techblog. Also read the comments on Hacker News. There's also a talk about the project. The library is now available on PyPI.

receipt-parser-legacy's People

Contributors

bram-atmire, ddzwiedziu, denvaar, dependabot-preview[bot], dependabot[bot], dielee, jonasmh, kiwita88, lyro1, michaelschem, monolidth, mre, rmad17, sando1, sirfoga, smallstepman, starbuck93, tomgross, uvg, vdmitriyev

receipt-parser-legacy's Issues

Fix date verification

In #10, @kiwita88 discovered that there is no date verification right now.
This means that dates like 32.08.2016 will not throw errors. We should fix that by creating proper dates from the parsed string, similar to this snippet:

from dateutil.parser import parse

a = "2012-10-09T19:00:55Z"

b = parse(a)

print(b.weekday())
# 1 (equal to a Tuesday)
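
As a rough sketch of such a check using only the standard library: `datetime.strptime` raises `ValueError` for impossible dates, so an invalid day like 32 can be rejected. The helper name `is_valid_date` and the `DD.MM.YYYY` format are assumptions made for illustration, not the project's actual code.

```python
from datetime import datetime


def is_valid_date(date_str, fmt="%d.%m.%Y"):
    """Return True if date_str is a real calendar date in the given format."""
    try:
        datetime.strptime(date_str, fmt)
    except ValueError:
        # strptime rejects out-of-range days and months, e.g. day 32
        return False
    return True


print(is_valid_date("31.08.2016"))  # True
print(is_valid_date("32.08.2016"))  # False
```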

Docker image not working in Ubuntu 22.04

Issue: Docker image not working in Ubuntu 22.04

Expected behaviour: Running "make docker-run" completes without error and processes the sample images.
Actual behaviour: Running make docker-run spits out error "make": executable file not found in $PATH: unknown.

Environment: Ubuntu 22.04
Python: 3.7.9 (from docker official images)
Docker version 20.10.16, build aa7e414

Additional info: make docker-build completed without errors after ensuring dockerfile and pyproject.toml pointed to python version 3.7.9

Attaching screenshot of exact error:

image

Linking Receipt Parser with Open Food Facts

Hi @mre,
We're building an open worldwide database of food products. You give it a barcode or a product name, and you get detailed information about the product.
It's a crowdsourced database fed by smartphone apps.

It would be cool to create a database of supermarket receipts and be able to look them up on Open Food Facts (how many calories did I buy today, or even how much did the average calorie cost).

Unclear documentation

What does this project actually do?

I read the README, the blog post, and the hacker news comments and I still have no idea what this actually outputs or how I would even use it. The most relevant thing in the README is "To convert all images from the data/img/ folder to text using tesseract and parse the resulting text files, run ..."

Parse the resulting text files into what?

Is this only meant to be used from the command line, or is this a Python library? The only reason I think it might be meant to be used as a Python library is that I looked at the tests.

Parsing date fails with unsanitized input

Using the included images:

❯ LANG=C make run
poetry run python parser/importer.py
Found the following images in /home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/img
['IMG0007.jpg', 'IMG0003.jpg', 'IMG0001.jpg', 'IMG0004.jpg', 'IMG0008.jpg', 'IMG0006.jpg']
Running convert -rotate ' 90' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/img/IMG0007.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0007.jpg'
Running convert -auto-level -sharpen 0x4.0 -contrast '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0007.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0007.jpg'
Running tesseract -l deu '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0007.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/txt/IMG0007.jpg.out.txt'
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Detected 233 diacritics
Running convert -rotate ' 90' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/img/IMG0003.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0003.jpg'
Running convert -auto-level -sharpen 0x4.0 -contrast '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0003.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0003.jpg'
Running tesseract -l deu '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0003.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/txt/IMG0003.jpg.out.txt'
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Detected 8 diacritics
Running convert -rotate ' 90' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/img/IMG0001.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0001.jpg'
Running convert -auto-level -sharpen 0x4.0 -contrast '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0001.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0001.jpg'
Running tesseract -l deu '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0001.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/txt/IMG0001.jpg.out.txt'
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Running convert -rotate ' 90' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/img/IMG0004.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0004.jpg'
Running convert -auto-level -sharpen 0x4.0 -contrast '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0004.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0004.jpg'
Running tesseract -l deu '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0004.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/txt/IMG0004.jpg.out.txt'
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Detected 62 diacritics
Image too small to scale!! (2x36 vs min width of 3)
Line cannot be recognized!!
Running convert -rotate ' 90' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/img/IMG0008.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0008.jpg'
Running convert -auto-level -sharpen 0x4.0 -contrast '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0008.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0008.jpg'
Running tesseract -l deu '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0008.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/txt/IMG0008.jpg.out.txt'
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Running convert -rotate ' 90' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/img/IMG0006.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0006.jpg'
Running convert -auto-level -sharpen 0x4.0 -contrast '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0006.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0006.jpg'
Running tesseract -l deu '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0006.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/txt/IMG0006.jpg.out.txt'
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
poetry run
Text, Market, Date, Sum
2 real
 1.0 Real
data/txt/IMG0004.jpg.out.txt.txt Real None 9.31
rewe
 1.0 REWE
data/txt/IMG0001.jpg.out.txt.txt REWE 04.12.2014 0.99
dm dm-drogerie markt
 0.8 Drogerie
data/txt/IMG0008.jpg.out.txt.txt Drogerie 11.12.2014 5.85
penny h-milch
 1.0 Penny
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/parser/__init__.py", line 6, in main
    stats = ocr_receipts(config, receipt_files)
  File "/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/parser/parse.py", line 124, in ocr_receipts
    receipt = Receipt(config, receipt.readlines())
  File "/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/parser/receipt.py", line 40, in __init__
    self.parse()
  File "/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/parser/receipt.py", line 62, in parse
    self.date = self.parse_date()
  File "/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/parser/receipt.py", line 94, in parse_date
    dateutil.parser.parse(date_str)
  File "/home/dzwiedziu/.cache/pypoetry/virtualenvs/parser-dlSOXmLn-py3.8/lib/python3.8/site-packages/dateutil/parser/_parser.py", line 1374, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/home/dzwiedziu/.cache/pypoetry/virtualenvs/parser-dlSOXmLn-py3.8/lib/python3.8/site-packages/dateutil/parser/_parser.py", line 649, in parse
    raise ParserError("Unknown string format: %s", timestr)
dateutil.parser._parser.ParserError: Unknown string format: 06.06. 2015
make: *** [Makefile:7: parse] Error 1

Notice the space in the date: "06.06. 2015".
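
One possible workaround, sketched here under assumptions, is to strip whitespace from the matched string before parsing it. The helper names `sanitize_date` and `parse_receipt_date` are hypothetical, and the sketch uses the standard library's `strptime` with an assumed `DD.MM.YYYY` format instead of `dateutil`.

```python
import re
from datetime import datetime


def sanitize_date(date_str):
    """Remove all whitespace so that '06.06. 2015' becomes '06.06.2015'."""
    return re.sub(r"\s+", "", date_str)


def parse_receipt_date(date_str, fmt="%d.%m.%Y"):
    """Return a datetime, or None if the cleaned string is still not a valid date."""
    try:
        return datetime.strptime(sanitize_date(date_str), fmt)
    except ValueError:
        # Returning None lets the caller continue with the next receipt
        return None


print(parse_receipt_date("06.06. 2015"))
```

Returning `None` instead of raising means a single unreadable date would not abort the whole run, which matches how the output above already reports `None` for a missing date.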

Add some unit tests

Right now, we only have some "integration tests", where we take a receipt and we pass it to parser.py. It would be better to write some proper unit tests for the different functionality, like the date parser.
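
A unit test for the date parser might look roughly like the following. The `parse_date` function here is a hypothetical stand-in, defined inline only so the example is self-contained; the real tests would import the project's own parser instead.

```python
import re
import unittest
from datetime import date, datetime

# Hypothetical stand-in for the project's date parser (assumption).
DATE_PATTERN = re.compile(r"\d{2}\.\d{2}\.\d{4}")


def parse_date(lines):
    """Return the first valid DD.MM.YYYY date found in the lines, else None."""
    for line in lines:
        match = DATE_PATTERN.search(line)
        if match:
            try:
                return datetime.strptime(match.group(), "%d.%m.%Y").date()
            except ValueError:
                continue  # looks like a date but is not a real one
    return None


class ParseDateTest(unittest.TestCase):
    def test_finds_date_in_noisy_line(self):
        self.assertEqual(parse_date(["REWE 04.12.2014 12:30"]), date(2014, 12, 4))

    def test_rejects_impossible_date(self):
        self.assertIsNone(parse_date(["32.08.2016"]))

    def test_returns_none_without_date(self):
        self.assertIsNone(parse_date(["penny h-milch"]))
```

Saved to a file, these tests can be run with `python -m unittest <file>`.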

make run

I keep getting the following error when I try to execute make run:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/rsorage/workspaces/majoris/receipt-parser/parser/__main__.py", line 1, in <module>
    from parser import parser
  File "parser/parser.py", line 25, in <module>
    from parser.objectview import ObjectView
ImportError: No module named objectview
Makefile:7: recipe for target 'parse' failed
make: *** [parse] Error 1

Explanation of sum_format and date_format?

Hi
Thanks for your tutorial; it's a nice heads-up. I was reading config.yml and could not understand how sum_format and date_format work. Could you explain them a little? Based on that, I will add some more fields to the parser.

Thanks in advance
Sagar
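
For illustration only, and assuming that sum_format and date_format hold regular expressions that are matched against each line of OCR text (an assumption, not something confirmed in this thread), the idea could be sketched as below. The two patterns and the sample line are hypothetical; the real values in config.yml may differ.

```python
import re

# Hypothetical patterns; the real values live in config.yml and may differ.
date_format = r"\d{2}\.\d{2}\.\d{2,4}"  # e.g. 04.12.2014
sum_format = r"\d+[.,]\d{2}"            # e.g. 0,99 or 15.95

line = "SUMME EUR 15,95 04.12.2014"  # made-up OCR line

date_match = re.search(date_format, line)
sum_match = re.search(sum_format, line)

found_date = date_match.group() if date_match else None
# Normalize a decimal comma to a dot before converting to a number
found_sum = float(sum_match.group().replace(",", ".")) if sum_match else None

print(found_date, found_sum)
```

Under that assumption, adding a new field would mean adding another pattern and another search of the same kind.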

make docker-run not working

Hi people,

today I'm evaluating your project, but when I run make docker-run it gives me an error:

docker run -v `pwd`/data/img:/app/data/img mre0/receipt-parser
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/app/parser/__init__.py", line 5, in main
    receipt_files = get_files_in_folder(config.receipts_path)
  File "/app/parser/parse.py", line 57, in get_files_in_folder
    files = os.listdir(os.path.join(BASE_PATH,folder))  # list content of folder
FileNotFoundError: [Errno 2] No such file or directory: '/app/data/txt'
make: *** [Makefile:26: docker-run] Error 1

Next, I tried with the suggested command:

$ docker run -v "$(pwd)/data/img:/usr/src/app/data/img" mre0/receipt-parser
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/app/parser/__init__.py", line 5, in main
    receipt_files = get_files_in_folder(config.receipts_path)
  File "/app/parser/parse.py", line 57, in get_files_in_folder
    files = os.listdir(os.path.join(BASE_PATH,folder))  # list content of folder
FileNotFoundError: [Errno 2] No such file or directory: '/app/data/txt'

then, with

$ docker run -v "$(pwd)/data/img:/usr/src/app/data/img" -v "$(pwd)/data/txt:/app/data/txt:rw" mre0/receipt-parser
Text, Market, Date, Sum
1587323551,0,0,0,0,

Then, I erased all images in data/img, and it gave more or less the same result, the first number changes, and that's it.

Am I doing something wrong?

Regards.

convert: no decode delegate for this image format `JPEG' @ error/constitute.c/ReadImage/508.

I receive this error when trying to run the program, in addition to many following errors.

Solutions that didn't work

brew unlink jpeg and then brew link jpeg
brew install jpeg; brew link jpeg

brew uninstall imagemagick jpeg libtiff jasper; brew install imagemagick

make run
pipenv run python parser/importer.py
/Usr/.local/share/virtualenvs/receipt-parser-master-rh6oEQUF/bin/python
('Found the following images in', '/Usr/Downloads/receipt-parser-master/data/img')
['IMG0008.jpg', 'IMG0003.jpg', 'IMG0001.jpg', 'IMG0004.jpg', 'IMG0007.jpg', 'IMG0006.jpg']
('Running', "convert -rotate ' 90' '/Usr/Downloads/receipt-parser-master/data/img/IMG0008.jpg' '/Usr/Downloads/receipt-parser-master/data/tmp/IMG0008.jpg'")
convert: no decode delegate for this image format JPEG' @ error/constitute.c/ReadImage/508. convert: no images defined /Usr/Downloads/receipt-parser-master/data/tmp/IMG0008.jpg' @ error/convert.c/ConvertImageCommand/3235.
('Running', "convert -auto-level -sharpen 0x4.0 -contrast '/Usr/Downloads/receipt-parser-master/data/tmp/IMG0008.jpg' '/Usr/Downloads/receipt-parser-master/data/tmp/IMG0008.jpg'")
convert: unable to open image '/Usr/Downloads/receipt-parser-master/data/tmp/IMG0008.jpg': No such file or directory @ error/blob.c/OpenBlob/2695.
convert: no decode delegate for this image format JPG' @ error/constitute.c/ReadImage/508. convert: no images defined /Usr/Downloads/receipt-parser-master/data/tmp/IMG0008.jpg' @ error/convert.c/ConvertImageCommand/3235.
('Running', "tesseract -l deu '/Usr/Downloads/receipt-parser-master/data/tmp/IMG0008.jpg' '/Usr/Downloads/receipt-parser-master/data/txt/IMG0008.jpg.out.txt'")
sh: tesseract: command not found
('Running', "convert -rotate ' 90' '/Usr/Downloads/receipt-parser-master/data/img/IMG0003.jpg' '/Usr/Downloads/receipt-parser-master/data/tmp/IMG0003.jpg'")
convert: no decode delegate for this image format JPEG' @ error/constitute.c/ReadImage/508. convert: no images defined /Usr/Downloads/receipt-parser-master/data/tmp/IMG0003.jpg' @ error/convert.c/ConvertImageCommand/3235.
('Running', "convert -auto-level -sharpen 0x4.0 -contrast '/Usr/Downloads/receipt-parser-master/data/tmp/IMG0003.jpg' '/Usr/Downloads/receipt-parser-master/data/tmp/IMG0003.jpg'")

OCR support for single articles

First of all, the script works very well, thanks for that.
Is it possible to read the individual articles from the receipt via OCR?

This would be very nice!

Vertical receipt OCR outputs garbage

Vertical receipts are rotated, and the OCR outputs garbage.

Image:
noExif

OCR:

—
oO
x
4 J SL a; Jg —J
DOT nom CD oo Or GICO oOO QD+rOoO ADn2O n RDDLDLDDNnD m RANnrnO oo 18)
BODOOOORGHSNAOIOODOH-DNOGOHD Dr OO DR ND OSJON NM DL OODOVOODODDPDDIDIODPDNGCSOO DON DXDNN DM O1
DOSIOOONDDSIOSOSJIIIINDOOITOOGIIDRB IS OGISIND OD TIAISIOOOOO OO m Hmm OO AG OIH HH MD MIND OO N O0 (Od
BDH-OCONDONDOD DO SIGIHHADODOOSNS- OIOCO OO DAGSI- ALDODDIDODTIODHTDODOOOGIGIGIO OT DD OD OD OO SS SSH O-m OO co
= OOo TI III DODOOTITSNSITITS-TOr-DDONDONSNIDOSIDLGIDODAROODTONT-DBODDOIOIGIDOCGIITDODDDOOON CID O1 O
€ TOT UV oO <TD... zz vo u> UVOoy  TUUDODU{VTUOVDUUODUIVUDUDU do UV OO DVD U P>;> > UUUD ZZ T2Z3ET
=& 9 CC — 7 Doo9 - OO 2 — << Do — OO —- < u () —- —- 09 — 00T) u u u Do — —- 7 TODD CO cc CC - -— - a -+--. 09
>t rDOo00%o08 - D35 oo - TS C To 5 JO I Too VO Oo CE DIDI PCOOO Oo Oo oT oo oo OL cc. cLO CE TC OO HH OO D ED ER
+5 ı ıı OO -O9 I 2—-00—-0 010-1 31 _I) JS5=s ı 1 1 ı  Lı DD ı ı __ID2IS ID 1 11V VD ı 1 I DDIDdDd A
> VS ZU u un OD AT — NIOD oo DD — ı Aw Ävy SIuvzZzu3 dd -— m << o vo oO DB. DI DD nn Tv DD zz OO VO > JH A or ©
xu_n ° cc: 0709 adornd-er I DDr DIO TO TA DO OHIO DCCOoO PP OT D OD — — ——— DO AT CC CHI IT ID oıo ” oo —
ca o N O00ZRFI 9 TO AO DD DoVOODOO —-O TI VO TO —- Oo cc rt 3 3 o- NN 3 — —- DD — DIVE rTITITO DD > bs yon ttnmdr-
<D a rt oda än a aoe esnn ı — 3; rt0 O0 © 02 X Zu 9) ONNCTOTOTDOTDOONDTODDI OO N Hd —_- —-
SE DV —-- Io TD ı DZ TUI TO dm I Dom IT D + I ON DOO—OD D-O0—929 000 oo Do DT.) = Or ntnro DD Dx x —- I
—- ge TITII-- oO -— oo TI) cc oo DD od —-D 9 rHtQO co ı DTIT7’WVOo ı TI DR RTITI RAR TID zCı ı O9 rHrtrtrt TI I — —ı
<D DO vv. AI DD I DI Oo u CZ 5 c [em A —- 0 c 7’ DpoD pp I ı VO CC DD U DI — — og TIororoIT, ı  ı 1ı 433m m =OoO
nm Tao 1) Jj -- oo x ch ı1 DI +07 vV or > O0 —-—- O7VOvVyvVv C vv DDD THIOTıboıvy om ovyovpgogogc mh —-
<D x — >. DD -5-...:. 7905 -00O0CcC > I a re 3 5 _—o0— —- co oe» OT AOODOTDT TC — 2x Dıxı — oO
=> MO > E 00 OX—-0O0d0 5 — OD +» © D N “D ON 07 an ze ı @ —NNI Do (o DD —.2Z2Z “oO
— Io O2 — J1O3 >= nn IT I 73 > —— D< _ —: + — ho DD. I —- _— . NNN -—- —.- Om Orm-
>t D-%0 oT OO ı © _- om + ua Q — oO © . — —— 2 no oO ge) {on =——r v0. %.
€ OD —- >D- Or De Tr &D oc —D- Ed “& DO 0 — —— vo © OOMNCOIND O0]
= DV A Ar on © . = 23 — 0 N — aA EC DD re = = a  e B oo 01
- Orts. os O-HOrs Hm DOHr HH Or OH Hs Or Or OO OO HH OO Hmm Dr OO HH. OOrHmdHe0o C
SORTITGTONSOOHDOCNOOROODSFENIGSODOOSITCOLAONSJTDTDOGTIDOALII OTTO ONSINNSIOIINSNSNSNS2OOGHEOImW u
DISOOOITOSOOSGOTASDTD TAI OCOODOCOOOOCOODSOCOODVOCOODODOOGOCOCOOOCODODOOOCOOCOSOSOGCOOGOOVDTOOCOOOO DO AO
VU>>r>>>>>- >> >> oT py,>->>>UVU>>>r>->>>r>r;,>>r>>->,D>->r>r>r>->->->,>,>r,>r,>r,>r,>-r>r,>>r>,>r>,>>->-UDOD ur


Any ideas?

Running the code on Windows (and tried Mac)

Sorry if I am missing something, but I ran the following commands and got some errors.
On Windows:
I opened the project in VSCode, ran pipenv install, and finally ran the parser, which gave the following:
PS C:\Users\lobvi02\Downloads\OCR\receipt-parser> & python c:/Users//Downloads/OCR/receipt-parser/parser.py
Text, Market, Date, Sum
1539958502,0,0,0,0,

On Mac:
As instructed, I ran make run and got:
make: *** No rule to make target 'Importer', needed by 'run'. Stop.

I am new to Python, hence my questions.

Add continuous integration

Would be nice to use Travis CI or Drone.io to run some tests on every change.
This would help a lot with refactoring the tool in the future without breaking it.
Accepting PRs for this. Just comment here if you like to work on it. I can provide support if you like.

UnicodeEncodeError while reading image data

Env:

Windows 10, Python 3.9.

Issue:

Got the UnicodeEncodeError while processing example tesseract image data with make run.
UnicodeEncodeError: 'charmap' codec can't encode character '\xfc' in position 57: character maps to <undefined>

Solution:

Specify UTF-8 encoding when opening the output file for writing.
out = open(output_file, "w", encoding='utf-8')
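
A small self-contained sketch of the same idea: passing an explicit encoding makes the write independent of the platform's default code page. The sample text and the temporary path are made up for the demonstration.

```python
import os
import tempfile

text = "Gemüse 1,99"  # contains 'ü' (U+00FC), the character named in the error

# Temporary path, used here only for the demonstration
output_file = os.path.join(tempfile.mkdtemp(), "receipt.out.txt")

# An explicit encoding avoids relying on the platform default
with open(output_file, "w", encoding="utf-8") as out:
    out.write(text)

# Read back with the same encoding to confirm the round trip
with open(output_file, encoding="utf-8") as f:
    round_trip = f.read()

print(round_trip == text)
```

Using `with` also closes the file automatically, even if the write fails.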

How to Run This code

When I run the command make run, it shows the message: make: Nothing to be done for 'run'.

Refactor code

Although the receipt-parser works, it's really not too well structured. Might be better for testability (see #4) to clean up the parsers and make the code a bit more object-oriented overall.
Would be happy for anyone who wants to tackle this. Providing mentorship if needed.

make docker-run

Hi,

When I run "make docker-run", I get the following error:
"Removing tmp folder
pipenv run python -m parser
Traceback (most recent call last):
File "/usr/local/lib/python3.8/runpy.py", line 193, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.8/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/src/app/parser/__main__.py", line 11, in <module>
main()
File "/usr/src/app/parser/__main__.py", line 6, in main
stats = parser.ocr_receipts(config, receipt_files)
File "/usr/src/app/parser/parser.py", line 125, in ocr_receipts
receipt = Receipt(config, receipt.readlines())
File "/usr/src/app/parser/receipt.py", line 40, in __init__
self.parse()
File "/usr/src/app/parser/receipt.py", line 62, in parse
self.date = self.parse_date()
File "/usr/src/app/parser/receipt.py", line 94, in parse_date
dateutil.parser.parse(date_str)
File "/root/.local/share/virtualenvs/app-lp47FrbD/lib/python3.8/site-packages/dateutil/parser/_parser.py", line 1374, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "/root/.local/share/virtualenvs/app-lp47FrbD/lib/python3.8/site-packages/dateutil/parser/_parser.py", line 649, in parse
raise ParserError("Unknown string format: %s", timestr)
dateutil.parser._parser.ParserError: Unknown string format: 06.06. 2015
Text, Market, Date, Sum
rewe
1.0 REWE
data/txt/IMG0001.jpg.out.txt.txt REWE 04.12.2014 0.99
kaiser's tengelmanrı gmbh
0.8 Kaiser's
data/txt/IMG0006.jpg.out.txt.txt Kaiser's 31.08.2015 15.95
dm dm-drogerie markt
0.8 Drogerie
data/txt/IMG0008.jpg.out.txt.txt Drogerie 11.12.2014 5.85
penny h-milch
1.0 Penny
make: *** [Makefile:7: parse] Error 1
Makefile:22: recipe for target 'docker-run' failed
make: *** [docker-run] Error 2
"
Any ideas what the problem could be?

Add pipenv

I'm a big fan of pipenv. It would be nice to use it for this project.
Accepting PRs for this. If you need support, just add a comment here. 😃

No image found

Hey,

When I try running OCR, it does not find any images in the folder I specified. Why?

image

Did I miss something?

Thanks for your help.

Support for PDF receipts

Not sure if this use case is shared by others: I use Scanbot to scan my receipts as multi-page PDFs. It would be great if this tool could work on these PDFs.

Scanbot does a sort of OCR itself, but it doesn't seem to be that good, in the sense that it adds too much noise: a receipt contains so much text, and I'm only interested in the articles, price per article, to see price evolution across multiple weeks.
