Comments (3)
Hey @rayrr,
thanks for your input. Yes, switching to Levenshtein would be worth a try. Whether we use a library or not doesn't matter to me. Also GPLv2 is fine in my book.
So if you like and you find the time, please go ahead and whip up a PR for this. 👍
from receipt-parser-legacy.
difflib is what is currently in use for this buggy feature. The matching algorithm behind difflib's SequenceMatcher is based on Ratcliff-Obershelp and is generic with regard to data type (binary data, strings, etc.). There are algorithms better suited to fuzzy string similarity, such as Levenshtein distance. I believe switching algorithms is the best solution here.
What do you think, @mre? I could be convinced to write up a PR if no other contributor can. If you're comfortable adding a dependency, it might make sense to lean on https://github.com/seatgeek/fuzzywuzzy for this too.
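For reference, here is a minimal sketch of how difflib scores candidate strings. The keyword and the OCR tokens are made up for illustration and are not taken from the parser's code; only `SequenceMatcher` and `get_close_matches` are real standard-library calls.

```python
from difflib import SequenceMatcher, get_close_matches


def ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] as computed by difflib: 2 * matches / total length."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical keyword and OCR output, for illustration only.
keyword = "total"
tokens = ["tota1", "subtotal", "date"]

for token in tokens:
    # Print each candidate with its difflib similarity to the keyword.
    print(token, round(ratio(keyword, token), 2))

# get_close_matches ranks candidates by the same ratio and applies a cutoff.
print(get_close_matches(keyword, tokens, n=1, cutoff=0.6))
```

The score is driven by the longest shared blocks, so a longer word that merely contains the keyword can rank close to a genuine one-character OCR misread.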
Update: the existing packages I mentioned are GPLv2-licensed, which may not be desirable, so perhaps a direct implementation of the Levenshtein algorithm could be added for this feature instead. Plenty of inspiration is available.
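A dependency-free version could look roughly like the sketch below. It is the textbook dynamic-programming form of Levenshtein distance, not code from this repository, and the `similarity` helper (with its normalization by the longer string's length) is an assumption about how a score might be derived from the distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # Keep the shorter string in the inner loop so the row buffers stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical helper: distance normalized to [0, 1], where 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

Something like `similarity(keyword, token) >= threshold` could then stand in for the difflib ratio check, with the threshold tuned against real receipts.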