systran / fuzzy-match
Library and command line utility to do approximate string matching of a source against a bitext index and get matched source and target.
License: MIT License
Hi, this looks very promising for a project where we would run this on iOS/macOS wrapped in Swift, and also compiled to wasm for web.
Are there any production experiences with this to share, or demos / products which integrate it that we could try out? How does it compare with the popular "fzf" algorithm for instance, or other popular fuzzy match tools? Thank you.
Fuzz testing could help to get rid of most types of crashes.
Generate a large index with random entries of varied size, then fuzz the pattern using different options.
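The two steps above could be sketched roughly as follows. This is only an illustration of the idea, not code from the repository: the helper names (`random_entry`, `generate_entries`, `fuzz_pattern`), the token-level edit operations, and the lowercase-ASCII alphabet are all assumptions, and the sketch does not call the library or the CLI because the input format and options to exercise are not specified here.

```python
import random
import string


def random_entry(rng, min_tokens=1, max_tokens=40):
    """Return one random 'sentence' with a varied number of tokens (hypothetical helper)."""
    n_tokens = rng.randint(min_tokens, max_tokens)
    tokens = []
    for _ in range(n_tokens):
        length = rng.randint(1, 12)
        tokens.append("".join(rng.choice(string.ascii_lowercase) for _ in range(length)))
    return " ".join(tokens)


def generate_entries(count, seed=0):
    """Generate `count` random entries; a fixed seed makes a crash reproducible."""
    rng = random.Random(seed)
    return [random_entry(rng) for _ in range(count)]


def fuzz_pattern(pattern, rng, n_edits=3):
    """Apply a few random token-level edits (delete, insert, replace, swap)."""
    tokens = pattern.split()
    for _ in range(n_edits):
        op = rng.choice(["delete", "insert", "replace", "swap"])
        if op == "delete" and len(tokens) > 1:
            del tokens[rng.randrange(len(tokens))]
        elif op == "insert":
            tokens.insert(rng.randint(0, len(tokens)), random_entry(rng, 1, 1))
        elif op == "replace" and tokens:
            tokens[rng.randrange(len(tokens))] = random_entry(rng, 1, 1)
        elif op == "swap" and len(tokens) > 1:
            i, j = rng.sample(range(len(tokens)), 2)
            tokens[i], tokens[j] = tokens[j], tokens[i]
    return " ".join(tokens)


if __name__ == "__main__":
    entries = generate_entries(1000, seed=42)
    rng = random.Random(7)
    # Derive patterns from indexed entries so that near matches exist.
    patterns = [fuzz_pattern(entry, rng) for entry in entries[:10]]
    for pattern in patterns:
        print(pattern)
```

Deriving patterns from existing entries keeps most of them close to something in the index, which is more likely to reach the matching code paths than fully random input. Recording the seeds makes any crash easy to replay.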
CI would be useful to check that compilation works and tests pass.
Let's try to use GitHub Actions.
We use FuzzyMatch-cli for bigger data, so indexing time counts.
Is there a plan for when the incremental add feature mentioned in TODO.md will be implemented?
Hi,
We tried to build and use the utility, but in the end this is all that happened:
FuzzyMatch-cli -c mycorpus
STEP Importing TM: mycorpus ELAPSE 0.005 TOTAL 0.005
STEP Sorting Index ELAPSE 0.117 TOTAL 0.123
STEP Dump: mycorpus.fmi ELAPSE 0.029 TOTAL 0.152
Segmentation fault
We did not hit any compilation problems, so we don't really have any idea what could have gone wrong. This is the short report from our cmake (v. 3.18.4):
-- The C compiler identification is GNU 4.8.5
-- The CXX compiler identification is GNU 4.8.5
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Boost: /usr/include (found version "1.57.0") found components: serialization iostreams system regex
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Found ICU: /usr/lib64/libicuuc.so;/usr/lib64/libicui18n.so;/usr/lib64/libicudata.so (found version "50.2.0")
-- Found Boost: /usr/include (found version "1.57.0") found components: program_options
-- Found GTest: /share/local/src/googletest/build/lib/libgtest.a
-- Found Boost: /usr/include (found version "1.57.0") found components: filesystem system
-- Configuring done
-- Generating done
-- Build files have been written to: /share/local/src/fuzzy-match/build
And everything compiles fine.
Scanning dependencies of target FuzzyMatch
[ 7%] Building CXX object src/CMakeFiles/FuzzyMatch.dir/fuzzy_match.cc.o
[ 15%] Building CXX object src/CMakeFiles/FuzzyMatch.dir/ngram_matches.cc.o
[ 23%] Building CXX object src/CMakeFiles/FuzzyMatch.dir/suffix_array_index.cc.o
[ 30%] Building CXX object src/CMakeFiles/FuzzyMatch.dir/vocab_indexer.cc.o
[ 38%] Building CXX object src/CMakeFiles/FuzzyMatch.dir/suffix_array.cc.o
[ 46%] Building CXX object src/CMakeFiles/FuzzyMatch.dir/sentence.cc.o
[ 53%] Building CXX object src/CMakeFiles/FuzzyMatch.dir/fuzzy_matcher_binarization.cc.o
[ 61%] Building CXX object src/CMakeFiles/FuzzyMatch.dir/edit_distance.cc.o
[ 69%] Linking CXX shared library libFuzzyMatch.so
[ 69%] Built target FuzzyMatch
Scanning dependencies of target FuzzyMatch-cli
Scanning dependencies of target FuzzyMatch-test
[ 76%] Building CXX object cli/src/CMakeFiles/FuzzyMatch-cli.dir/FuzzyMatch-cli.cc.o
[ 84%] Building CXX object test/CMakeFiles/FuzzyMatch-test.dir/test.cc.o
[ 92%] Linking CXX executable FuzzyMatch-test
[ 92%] Built target FuzzyMatch-test
[100%] Linking CXX executable FuzzyMatch-cli
[100%] Built target FuzzyMatch-cli
Any help is greatly appreciated.
Explained at the end of this lecture https://www.youtube.com/watch?v=NinWEPPrkDQ
We must check that the (possibly) increased memory consumption is not prohibitive.
A regression was fixed in commit b3c9e17, but that regression was not caught by tests.