❯ LANG=C make run
poetry run python parser/importer.py
Found the following images in /home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/img
['IMG0007.jpg', 'IMG0003.jpg', 'IMG0001.jpg', 'IMG0004.jpg', 'IMG0008.jpg', 'IMG0006.jpg']
Running convert -rotate ' 90' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/img/IMG0007.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0007.jpg'
Running convert -auto-level -sharpen 0x4.0 -contrast '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0007.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0007.jpg'
Running tesseract -l deu '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0007.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/txt/IMG0007.jpg.out.txt'
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Detected 233 diacritics
Running convert -rotate ' 90' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/img/IMG0003.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0003.jpg'
Running convert -auto-level -sharpen 0x4.0 -contrast '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0003.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0003.jpg'
Running tesseract -l deu '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0003.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/txt/IMG0003.jpg.out.txt'
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Detected 8 diacritics
Running convert -rotate ' 90' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/img/IMG0001.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0001.jpg'
Running convert -auto-level -sharpen 0x4.0 -contrast '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0001.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0001.jpg'
Running tesseract -l deu '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0001.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/txt/IMG0001.jpg.out.txt'
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Running convert -rotate ' 90' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/img/IMG0004.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0004.jpg'
Running convert -auto-level -sharpen 0x4.0 -contrast '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0004.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0004.jpg'
Running tesseract -l deu '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0004.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/txt/IMG0004.jpg.out.txt'
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Detected 62 diacritics
Image too small to scale!! (2x36 vs min width of 3)
Line cannot be recognized!!
Running convert -rotate ' 90' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/img/IMG0008.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0008.jpg'
Running convert -auto-level -sharpen 0x4.0 -contrast '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0008.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0008.jpg'
Running tesseract -l deu '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0008.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/txt/IMG0008.jpg.out.txt'
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Running convert -rotate ' 90' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/img/IMG0006.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0006.jpg'
Running convert -auto-level -sharpen 0x4.0 -contrast '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0006.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0006.jpg'
Running tesseract -l deu '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/tmp/IMG0006.jpg' '/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/data/txt/IMG0006.jpg.out.txt'
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
poetry run
Text, Market, Date, Sum
2 real
1.0 Real
data/txt/IMG0004.jpg.out.txt.txt Real None 9.31
rewe
1.0 REWE
data/txt/IMG0001.jpg.out.txt.txt REWE 04.12.2014 0.99
dm dm-drogerie markt
0.8 Drogerie
data/txt/IMG0008.jpg.out.txt.txt Drogerie 11.12.2014 5.85
penny h-milch
1.0 Penny
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/parser/__init__.py", line 6, in main
stats = ocr_receipts(config, receipt_files)
File "/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/parser/parse.py", line 124, in ocr_receipts
receipt = Receipt(config, receipt.readlines())
File "/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/parser/receipt.py", line 40, in __init__
self.parse()
File "/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/parser/receipt.py", line 62, in parse
self.date = self.parse_date()
File "/home/dzwiedziu/Softwarez/gitbuckets/receipt-parser/parser/receipt.py", line 94, in parse_date
dateutil.parser.parse(date_str)
File "/home/dzwiedziu/.cache/pypoetry/virtualenvs/parser-dlSOXmLn-py3.8/lib/python3.8/site-packages/dateutil/parser/_parser.py", line 1374, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "/home/dzwiedziu/.cache/pypoetry/virtualenvs/parser-dlSOXmLn-py3.8/lib/python3.8/site-packages/dateutil/parser/_parser.py", line 649, in parse
raise ParserError("Unknown string format: %s", timestr)
dateutil.parser._parser.ParserError: Unknown string format: 06.06. 2015
make: *** [Makefile:7: parse] Error 1
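Notice the space in the date: "06.06. 2015". The string has a blank after the second dot (presumably from the OCR output), and dateutil.parser.parse raises ParserError on it, which aborts the whole run.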
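
A possible workaround, as a minimal sketch only: strip the whitespace around the dots before parsing, and return None instead of crashing when the string is still not a date. The helper names `normalize_date` and `parse_receipt_date` are hypothetical and not part of the project. The sketch also uses the standard library's `datetime.strptime` with an assumed `DD.MM.YYYY` format rather than the `dateutil` call in `receipt.py`, since the actual `parse_date` code is not shown here.

```python
import re
from datetime import datetime


def normalize_date(date_str):
    """Remove whitespace around the dots of a dotted date string.

    Hypothetical helper: turns "06.06. 2015" into "06.06.2015".
    """
    return re.sub(r"\s*\.\s*", ".", date_str.strip())


def parse_receipt_date(date_str):
    """Parse a DD.MM.YYYY date, tolerating stray spaces.

    Returns a datetime, or None if the cleaned string is still not a
    valid date, so one bad receipt does not stop the whole run.
    """
    cleaned = normalize_date(date_str)
    try:
        # Assumption: receipts use day.month.four-digit-year.
        return datetime.strptime(cleaned, "%d.%m.%Y")
    except ValueError:
        return None


if __name__ == "__main__":
    print(parse_receipt_date("06.06. 2015"))
    print(parse_receipt_date("04.12.2014"))
    print(parse_receipt_date("not a date"))
```

Returning None would match the "Date" column already showing None for IMG0004 in the output above. Dates with a two-digit year would also come back as None with this format string, so that case would need separate handling.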