dhash's Introduction

dHash.php

A perceptual hash is a fingerprint of a multimedia file derived from various features of its content. Unlike cryptographic hash functions, which rely on the avalanche effect (small changes in the input lead to drastic changes in the output), perceptual hashes are "close" to one another if the features are similar.

This code was based on:

Installation

Just put dhash.php where you want it.

Supported systems

Most likely all (Windows/Linux/OSX, in 32bit and 64bit variants, are fine)

Usage

Calculating a perceptual hash for an image:

include 'dhash.php';

$hash = dhash('path/to/image.jpg');

The resulting hash is a 64-bit image fingerprint, returned as a hexadecimal string, that can be stored in your database once calculated. The Hamming distance is used to compare two image fingerprints for similarity. Low distance values indicate that the images are similar or the same; high distance values indicate that the images are different. Use the following function to check whether two images are similar:

$distance = dhash_distance($hash1, $hash2);

Equal images will not always have a distance of 0, so you will need to decide at which distance you will treat images as equal. In my tests, a distance of 5 means the images are almost identical, but this will depend on the images and how many of them there are. For example, when comparing a small set of images, a more permissive maximum distance should be acceptable, as the chances of false positives are quite low. If, however, you are comparing a large number of images, 5 might already be too much. If you want to check whether a given high-resolution image is the same as some thumbnail, you should try a distance of 0.
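To make the comparison concrete, here is a minimal sketch of counting differing bits between two hex hashes and applying a maximum distance. The names example_hex_hamming and example_is_similar are hypothetical and are not part of dhash.php; in real code you would call dhash_distance. The default threshold of 5 is only the value mentioned above, not a recommendation.

```php
<?php
// Hypothetical helper (not part of dhash.php): counts the differing bits
// between two equal-length hexadecimal hash strings.
function example_hex_hamming(string $a, string $b): int
{
    if (strlen($a) !== strlen($b)) {
        throw new InvalidArgumentException('Hashes must have the same length');
    }
    // Number of set bits in each 4-bit value 0..15.
    $counts = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4];
    $distance = 0;
    for ($i = 0, $n = strlen($a); $i < $n; $i++) {
        // XOR one hex digit at a time, so no 64-bit integers are needed.
        $distance += $counts[hexdec($a[$i]) ^ hexdec($b[$i])];
    }
    return $distance;
}

// Hypothetical threshold check; tune $maxDistance for your own image set.
function example_is_similar(string $a, string $b, int $maxDistance = 5): bool
{
    return example_hex_hamming($a, $b) <= $maxDistance;
}

// Hashes taken from the Demo section of this readme.
var_dump(example_is_similar('e0f8fef6f2ecfcf4', 'e0f8eed4d2ecfcf4')); // bool(true)
var_dump(example_is_similar('68484849535b7575', 'e1c1e2a7bbaf6faf')); // bool(false)
```

Working digit by digit avoids any dependence on the integer size of the PHP build, at the cost of a few more loop iterations.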

Demo

These images are similar:

Equals1 Equals2

Image 1 hash: e0f8fef6f2ecfcf4 (1110000011111000111111101111011011110010111011001111110011110100)
Image 2 hash: e0f8eed4d2ecfcf4 (1110000011111000111011101101010011010010111011001111110011110100)
Hamming distance: 4

These images are different:

Equals1 Equals2

Image 1 hash: 68484849535b7575 (0110100001001000010010000100100101010011010110110111010101110101)
Image 2 hash: e1c1e2a7bbaf6faf (1110000111000001111000101010011110111011101011110110111110101111)
Hamming distance: 33
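
The bit strings in parentheses are the hex hashes written out digit by digit, and the Hamming distance is the number of positions where the two strings differ. A small sketch that reproduces the numbers above; example_hex_to_bits and example_bit_diff are hypothetical helpers, not functions provided by dhash.php:

```php
<?php
// Hypothetical helper: expands a hex hash into its bit string,
// four bits per hex digit.
function example_hex_to_bits(string $hex): string
{
    $bits = '';
    foreach (str_split($hex) as $digit) {
        $bits .= str_pad(decbin(hexdec($digit)), 4, '0', STR_PAD_LEFT);
    }
    return $bits;
}

// Hypothetical helper: counts the positions where two bit strings differ.
function example_bit_diff(string $bitsA, string $bitsB): int
{
    $diff = 0;
    $n = min(strlen($bitsA), strlen($bitsB));
    for ($i = 0; $i < $n; $i++) {
        if ($bitsA[$i] !== $bitsB[$i]) {
            $diff++;
        }
    }
    return $diff;
}

$similar = example_bit_diff(
    example_hex_to_bits('e0f8fef6f2ecfcf4'),
    example_hex_to_bits('e0f8eed4d2ecfcf4')
);
$different = example_bit_diff(
    example_hex_to_bits('68484849535b7575'),
    example_hex_to_bits('e1c1e2a7bbaf6faf')
);
echo $similar, "\n";   // 4
echo $different, "\n"; // 33
```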

Differences from ImageHash by Jenssegers

There are some similarities to the code written by Jenssegers (and this readme is heavily inspired by that project's). So if you need a library for PHP and are wondering which one to choose, here are a few differences:

  • dHash works on 32-bit and 64-bit PHP and always returns 64-bit hashes as hex strings, while ImageHash returns 64-bit hashes on 64-bit PHP and 32-bit hashes on 32-bit PHP
  • dHash is just 50 lines of code, single file
  • dHash implements only 1 hashing algorithm (difference hash) while ImageHash implements 2 more (perceptual hash and average hash)
  • dHash has a few optimizations to make it faster, especially when reading JPEG files. In some cases it's over 10 times faster
  • While calculated distances are similar, the actual hashes are different, so you can't compare hashes from dHash with those from ImageHash. This is due to a different implementation of the algorithm

dhash's People

Contributors

tom64b

dhash's Issues

Hamming Distance in PostgreSQL

I've calculated hashes of over 400 000 images using your code. Thanks! I've used the hamming_distance function from this repository: https://github.com/palestamp/hamming_distance. I've found that most of the time it calculates the same distance as your dhash_distance, but sometimes the result is very different. Is it possible to change your code so that it works in PostgreSQL and calculates the distance from stored hashes? I could probably convert the hex hashes to bit strings and do a bitwise XOR and popcount. Is there a better way?

Remove border noise

First, I'd like to thank you for your extremely lightweight implementation of dhash!

I've submitted my first update via GitHub Desktop: I added a function that removes the outer pixel to reduce border noise. This improves file comparisons for me.

I do have a small question: where do the numbers come from in:
$counts = array(0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4);
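
For what it's worth, those sixteen values match the number of set bits in each 4-bit value from 0 to 15, so the array works as a lookup table for counting bits one hex digit at a time. How dhash.php uses it is not shown here, so treat that reading as an assumption; the sketch below only shows that the table can be regenerated this way:

```php
<?php
// Rebuild the 16-entry table by counting the '1' characters
// in the binary form of each value 0..15.
$table = [];
for ($n = 0; $n < 16; $n++) {
    $table[] = substr_count(decbin($n), '1');
}
echo implode(',', $table), "\n"; // 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4
```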

Mysql Bit Count

Hi

Firstly, thanks for your work, as this will help my project a lot.

I am going to be storing hashes generated with your PHP code in a MySQL database, and there will be around 100,000 of them.

I then want to be able to query that database to check whether a duplicate or similar image is already in it, and return matching rows with a Hamming distance of up to 10. I thought MySQL's BIT_COUNT would be the best way to do this, but I wanted to ask for your thoughts and advice, as I believe BIT_COUNT works on integers only and there may be a better way to do this.
