Comments (2)
Why do you consider the score 46 incorrect? I understand you would like to find `Real Madrid` as the best match for `real`, but according to the normalized Indel similarity it simply isn't.
To convert from `real` to `Barcelona` the following operations are required:
__r_e___al
Barcelona_
These are 7 operations. This is normalized in the following way: 1.0 - (dist / (len(s1) + len(s2))) -> 0.46.
On the other hand, for `Real Madrid` the following operations are required:
_real_______
R_eal Madrid
These are 9 operations. Normalized in the same way you get 0.4.
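The two operation counts can be checked with a small pure-Python sketch. It assumes the Indel distance equals len(s1) + len(s2) minus twice the length of the longest common subsequence, and that the comparison is case-sensitive. The function names are made up for illustration and are not part of fuzzywuzzy.

```python
def lcs_length(s1, s2):
    """Length of the longest common subsequence (case-sensitive)."""
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        curr = [0]
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(s1, s2):
    """Number of single-character insertions and deletions turning s1 into s2."""
    return len(s1) + len(s2) - 2 * lcs_length(s1, s2)


def normalized_indel_similarity(s1, s2):
    """1.0 - (dist / (len(s1) + len(s2))), as in the comment above."""
    total = len(s1) + len(s2)
    if total == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - indel_distance(s1, s2) / total


print(indel_distance("real", "Barcelona"))    # 7
print(indel_distance("real", "Real Madrid"))  # 9
print(round(normalized_indel_similarity("real", "Barcelona"), 2))    # 0.46
print(round(normalized_indel_similarity("real", "Real Madrid"), 2))  # 0.4
```

Only `eal` is shared between `real` and `Real Madrid` because `R` and `r` differ, which is why the longer string ends up with the lower score.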
> Note, without python-Levenshtein installed at all (clean, checked with Colab with and without) it works like a charm.

No clue how that would work, since in this specific case the difflib fallback actually returns the same results:
>>> import difflib
>>> difflib.SequenceMatcher(None, 'real', 'Barcelona').ratio()
0.46153846153846156
>>> difflib.SequenceMatcher(None, 'real', 'Real Madrid').ratio()
0.4
from fuzzywuzzy.
After investigation, you are indeed right. How do you suggest resolving this problem?
partial_ratio is not an option, as "Real Madrid" and "Real Saragosa" are the same for "Real", and their partial_ratio scores should be the same.
I wonder how to attack such a problem.
I went with fuzz.partial_token_set_ratio, as I think it should fit better.
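As a rough illustration of why a token-based scorer treats "Real Madrid" and "Real Saragosa" alike for the query "Real", here is a stdlib-only sketch. It is not fuzzywuzzy's implementation. It only assumes that the strings are lowercased and split on whitespace before comparison, and the helper names are hypothetical.

```python
def token_set(text):
    """Lowercase the text and split it into a set of whitespace-separated tokens."""
    return set(text.lower().split())


def shared_tokens(query, choice):
    """Tokens that appear in both the query and the choice."""
    return token_set(query) & token_set(choice)


choices = ["Barcelona", "Real Madrid", "Real Saragosa"]

# Keep only the choices that share at least one whole token with the query.
matches = [c for c in choices if shared_tokens("Real", c)]
print(matches)  # ['Real Madrid', 'Real Saragosa']
```

Both "Real ..." entries share the full token `real` with the query, so a token-set comparison cannot tell them apart, while "Barcelona" shares no token at all.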