pombredanne / tlsh-2 Goto Github PK
View Code? Open in Web Editor NEWJava version of the TLSH similarity hashing algorithm
License: Other
Java version of the TLSH similarity hashing algorithm
License: Other
TLSH ======================================= TLSH is a fuzzy matching library. Given a binary object, it generates a hash value. The hash values can be used for similarity comparison. Similary objects have similar hash values. Similar hash values signal similar objects. The output hash is 35 bytes long. The first 3 bytes are used to capture the global similarity. The last 32 bytes are used to capture the local similarity. Here are the design choices. * To improve comparison accuracy, TLSH tracks counting bucket height distribution in quartiles. Bigger quartile difference results in higher difference score. * Use specially 6 trigrams to give equal representation of the bytes in the 5 byte sliding window which produces improved results. * Pearson hash is used to distribute the trigram counts to the counting buckets. * The global similarity score distances objects with significant size difference. Global similarity can be disabled. It also distances objects with different quartile distributions. * TLSH can be compiled to generate 70 or 134 characters hash strings. The longer version is more accurate. TLSH similarity is expressed as a difference score. * A score of 0 means the objects are almost identical. * For the 70 characters hash, a score of 200 or higher means the objects are very different. For the 134 characters hash, a score of 400 or higher means the objects are very different. LICENSE ======================================= TLSH was originally written by Trend Micro: https://github.com/trendmicro/tlsh Code converted to Java by TripleCheck: https://github.com/triplecheck Released under the Apache License (see LICENSE.txt)
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.