blockhash's Introduction

blockhash

This is a perceptual image hash calculation tool based on the algorithm described in Block Mean Value Based Image Perceptual Hashing by Bian Yang, Fan Gu and Xiamu Niu.

Build and install

Blockhash requires libmagickwand. On Debian/Ubuntu it can be installed using the following command:

sudo apt-get install libmagickwand-dev

On Fedora and friends:

sudo dnf install ImageMagick-devel

To build blockhash, cd to the source directory and type:

./waf configure
./waf

The program binary will land in ./build. To install it to /usr/local/bin/ type:

./waf install

Usage

Run blockhash [list of images] to calculate hashes.

Run blockhash --help for the list of options.

License

Copyright 2014 Commons Machinery http://commonsmachinery.se/

Distributed under an MIT license; please see LICENSE in the top-level directory.

Contact: [email protected]

blockhash's People

Contributors

artfwo, bobobo1618, dsoprea, equal-l2, eredotpkfr, jonasob, petli, ploober, wjt

blockhash's Issues

animated gif

hello,

during automated image processing, I came across an animated GIF image that completely freezes my computer when blockhash tries to calculate its hash.

the problem does not occur with the Python version.

regards, lacsaP.

pdf-ownership-variant1-2x.gif.gz (this image, ~900 KB, is provided by Adobe for Acrobat Reader)

Algorithm for video hashing

We're looking to extend the blockhash software to cover videos in addition to the images it already processes. There are existing alternatives for this, such as the pHash DCT hashes which were originally conceived for videos but then adapted for images. The first rough draft of what this could look like in blockhash is done in #12

That version is dirt simple in that it only picks four key frames from the video (one at the beginning, one at the end, and two at defined points in between), does a standard 64-bit blockhash of those frames and then concatenates them into a 256-bit hash. I haven't done the cross-compare yet to evaluate accuracy or false positives/negatives, but it seems to work, and any alternative to this will surely only be an improvement. So that will be the "baseline" comparison :)
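The baseline described above can be sketched roughly as follows. This is only an illustration, not the code from #12: the function names are hypothetical, the two inner frames are assumed to sit at one third and two thirds of the video (the issue only says "defined points"), and the per-frame hashes are assumed to arrive as 16-digit hex strings.

```python
def pick_frame_indices(frame_count):
    """Return four frame indices: first, two inner points, last.

    Assumption: the inner points are at 1/3 and 2/3 of the video;
    the original issue does not say where they are.
    """
    if frame_count < 1:
        raise ValueError("video must contain at least one frame")
    last = frame_count - 1
    return [0, last * 1 // 3, last * 2 // 3, last]


def combine_frame_hashes(frame_hashes):
    """Concatenate four 64-bit frame hashes (hex strings) into a 256-bit hash."""
    if len(frame_hashes) != 4:
        raise ValueError("expected exactly four frame hashes")
    for frame_hash in frame_hashes:
        if len(frame_hash) != 16:
            raise ValueError("each frame hash must be 16 hex digits (64 bits)")
        int(frame_hash, 16)  # raises ValueError if not valid hexadecimal
    return "".join(frame_hashes)
```

With a 100-frame video, pick_frame_indices(100) selects frames 0, 33, 66 and 99; the four resulting hashes are then joined into one 64-digit hex string.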

Problems encountered so far with videos include:

  • (Overall) there are many more variations to video than to images, and there are literally thousands of variations to how a video can be encoded
  • Neither frame counting nor length measurement is a precise science: calculating the frame count from video length and fps will fail for variable-fps movies, and it doesn't account for situations where you have duplicate frames.
  • Not all frames are key frames (I frames). If you just seek to a position, you may end up on a P or B frame, which depends on a previous I frame to make a complete image. In theory, you may end up with very long streams of P frames, and H.264 takes this to even further extremes.
  • Stepping through each frame (decoding each frame, with reference to previous frames) is the most reliable way to get accurate frame information, but it takes time.

That said, it's not obnoxiously slow to step through videos, but of course, it depends heavily on the length of the video itself and the processing power available. But we're talking about seconds, rather than milliseconds.

Add version flag argument

It would be nice if the executable could tell us what version it is.

Being able to run something like blockhash --version or blockhash -v and get output like 0.3.1 would be perfect.

waf raises StopIteration with Python 3.7

hello,

since the python3 update (3.6.6-1 -> 3.7.0-3), blockhash no longer builds:

python3.7 waf configure

Traceback (most recent call last):
  File "/tmp/blockhash-0.3.1/.waf3-1.7.16-9ca17eb492c97b689870b4ff9db75880/waflib/Node.py", line 282, in ant_iter
    raise StopIteration
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tmp/blockhash-0.3.1/.waf3-1.7.16-9ca17eb492c97b689870b4ff9db75880/waflib/Scripting.py", line 97, in waf_entry_point
    run_commands()
  File "/tmp/blockhash-0.3.1/.waf3-1.7.16-9ca17eb492c97b689870b4ff9db75880/waflib/Scripting.py", line 149, in run_commands
    parse_options()
  File "/tmp/blockhash-0.3.1/.waf3-1.7.16-9ca17eb492c97b689870b4ff9db75880/waflib/Scripting.py", line 127, in parse_options
    Context.create_context('options').execute()
  File "/tmp/blockhash-0.3.1/.waf3-1.7.16-9ca17eb492c97b689870b4ff9db75880/waflib/Options.py", line 134, in execute
    super(OptionsContext,self).execute()
  File "/tmp/blockhash-0.3.1/.waf3-1.7.16-9ca17eb492c97b689870b4ff9db75880/waflib/Context.py", line 84, in execute
    self.recurse([os.path.dirname(g_module.root_path)])
  File "/tmp/blockhash-0.3.1/.waf3-1.7.16-9ca17eb492c97b689870b4ff9db75880/waflib/Context.py", line 125, in recurse
    user_function(self)
  File "/tmp/blockhash-0.3.1/wscript", line 8, in options
    opt.load('compiler_c')
  File "/tmp/blockhash-0.3.1/.waf3-1.7.16-9ca17eb492c97b689870b4ff9db75880/waflib/Context.py", line 81, in load
    fun(self)
  File "/tmp/blockhash-0.3.1/.waf3-1.7.16-9ca17eb492c97b689870b4ff9db75880/waflib/Tools/compiler_c.py", line 31, in options
    opt.load_special_tools('c_*.py',ban=['c_dumbpreproc.py'])
  File "/tmp/blockhash-0.3.1/.waf3-1.7.16-9ca17eb492c97b689870b4ff9db75880/waflib/Context.py", line 261, in load_special_tools
    lst=self.root.find_node(waf_dir).find_node('waflib/extras').ant_glob(var)
  File "/tmp/blockhash-0.3.1/.waf3-1.7.16-9ca17eb492c97b689870b4ff9db75880/waflib/Node.py", line 331, in ant_glob
    ret=[x for x in self.ant_iter(accept=accept,pats=[to_pat(incl),to_pat(excl)],maxdepth=kw.get('maxdepth',25),dir=dir,src=src,remove=kw.get('remove',True))]
  File "/tmp/blockhash-0.3.1/.waf3-1.7.16-9ca17eb492c97b689870b4ff9db75880/waflib/Node.py", line 331, in <listcomp>
    ret=[x for x in self.ant_iter(accept=accept,pats=[to_pat(incl),to_pat(excl)],maxdepth=kw.get('maxdepth',25),dir=dir,src=src,remove=kw.get('remove',True))]
RuntimeError: generator raised StopIteration

no issue with python2.7

python2.7 waf configure

Setting top to                           : /tmp/blockhash-0.3.1 
Setting out to                           : /tmp/blockhash-0.3.1/build 
Checking for 'gcc' (c compiler)          : /bin/gcc 
Checking for library m                   : yes 
Checking for program pkg-config          : /bin/pkg-config 
Checking for 'MagickWand'                : yes 
Checking for 'MagickWand' version        : yes 
'configure' finished successfully (1.043s)

python2 waf

Waf: Entering directory `/tmp/blockhash-0.3.1/build'
[1/2] c: blockhash.c -> build/blockhash.c.1.o
[2/2] cprogram: build/blockhash.c.1.o -> build/blockhash
Waf: Leaving directory `/tmp/blockhash-0.3.1/build'
'build' finished successfully (0.583s)

regards.
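The RuntimeError in the Python 3.7 traceback above is consistent with PEP 479, which is enabled by default from Python 3.7: a StopIteration raised inside a generator body is converted to "RuntimeError: generator raised StopIteration". The sketch below reproduces the pattern with hypothetical function names (it is not the waf code itself) and shows the usual fix of replacing the raise with a plain return.

```python
def old_style(items):
    """Generator that ends early by raising StopIteration (pre-PEP 479 style)."""
    for item in items:
        if item is None:
            raise StopIteration  # Python 3.7+: becomes RuntimeError
        yield item


def fixed(items):
    """Same generator, ending early with return instead."""
    for item in items:
        if item is None:
            return  # cleanly finishes the generator on all Python versions
        yield item


def demo():
    """Run both variants and report what happens on Python 3.7 or later."""
    try:
        old_result = list(old_style([1, 2, None, 3]))
    except RuntimeError as exc:
        old_result = "RuntimeError: %s" % exc
    return old_result, list(fixed([1, 2, None, 3]))
```

On Python 3.7 or later, the first element returned by demo() is the same RuntimeError message as in the traceback, while the fixed generator simply yields 1 and 2.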

Tests fail, hash values do not match

The tests fail when I run the following commands:

:~# cd blockhash/ && ./waf configure
:~# ./waf
:~# ./test.sh

The output is:

--- exact-hashes.txt	2021-08-20 19:48:07.233964244 +0300
+++ -	2021-08-20 19:49:03.403555970 +0300
@@ -1,5 +1,5 @@
 00000000fffffffff803f807f807f80ff90ff90fb90c9980ffffffff00000000  puffy_white.png
-00000fe07ff8fff81ff8399831983bcc35ac303c384c3ffc0ef08660c003ffff  00133601.jpg
+00000fe07ff8fff81ff8399831983bcc35ac303c384c3ffc8ef00660c003ffff  00133601.jpg
 00007ff07ff07fe07fe67ff07560600077fe701e7f5e000079fd40410001ffff  clipper_ship.jpg
 000083f887fe8fff0fe00fe00ff80ff807fc07f807f803f803e003e007e0ffff  00136101.jpg
 0002001f1fff1fff7ff71ff301f300007f9278f0f8700fc0ff98cc88c1cc03fc  32499201.jpg

The hash value for 00133601.jpg does not match. I looked at another blockhash implementation (blockhash-python), where the hash values for 00133601.jpg are as follows:

exact: 00000fe07ff8fff81ff8399831983bcc35ac303c384c3ffc8ef00660c003ffff
quick(even): 00000fe07ff8fff81ff839d831883bcc39dc31ac300c3ffc9ef88e708041f1bf

The hash values for 00133601.jpg in this repository:
exact: 00000fe07ff8fff81ff8399831983bcc35ac303c384c3ffc0ef08660c003ffff
quick: 00000fe07ff8fff81ff839d831883bcc39dc31ac300c3ffc9ef88e708041f1bf

As I understand it, the exact hash values do not match, so the tests fail. Can you fix it?

Use EXIF orientation tag when processing JPEG images

Hello, I wanted to document a drastically different Hamming distance between two identical images with different compression.

compressed.jpg fe7ffefffefffc3cf839f823f81cf818f00ff00ff007e007e003e003c0000000
original.jpg ecfce4fcf2f0fbc0fb00fc00e00060000000e000f000ff00ffe0fffcfffefffe

The Hamming distance function in blockhash-js returns a value of 120 for these two hashes.
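For reference, the distance can be recomputed independently of blockhash-js. The helper below is a minimal sketch (the function name is my own, not part of any blockhash package): it XORs the two hex hashes and counts the set bits.

```python
def hamming_distance(hash_a, hash_b):
    """Count differing bits between two equal-length hexadecimal hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


compressed = "fe7ffefffefffc3cf839f823f81cf818f00ff00ff007e007e003e003c0000000"
original = "ecfce4fcf2f0fbc0fb00fc00e00060000000e000f000ff00ffe0fffcfffefffe"

print(hamming_distance(compressed, original))  # prints 120
```

A distance of 120 out of 256 bits is close to what two unrelated images would give, which fits the issue title's suggestion that one of the files is being hashed in a different orientation.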

The images:
https://s3.amazonaws.com/denisnazarov/blockhash/compressed.jpg
https://s3.amazonaws.com/denisnazarov/blockhash/original.jpg

Has this implementation been tested with high resolution images? I can add some failing tests.

Test alternate hashing mechanisms

According to https://twitter.com/jonaso/status/549967116485808128

Instead of taking a mean over horizontal bands and comparing each reduced pixel (i.e. each pixel of the 16x16 image) against this mean, make a local mean over a 3x3 pixel cell and compare the pixel against this mean. Set the bit to 1 or 0 according to whether the pixel value is above or below the mean. This will generate a hash with more vivid bit changes and thus fewer hash collisions.

It is also possible to calculate the gradient for each cell (pixel) and compare that to a mean for neighboring cells, as in the previously described method. This will generate a bit that is independent of local intensity variations; only the gradient will remain.
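The local-mean idea could look roughly like the sketch below. Several details are assumptions, since the suggestion does not pin them down: the 3x3 cell is centred on the pixel and includes it, cells are clipped at the image border, and a pixel exactly equal to the mean produces a 0 bit. The input is a plain list of rows holding the already reduced (for example 16x16) image.

```python
def local_mean_bits(grid):
    """Return one bit per pixel: 1 if the pixel is above its 3x3 local mean."""
    height = len(grid)
    width = len(grid[0])
    bits = []
    for y in range(height):
        for x in range(width):
            # 3x3 neighbourhood around (x, y), clipped at the borders (assumption)
            cell = [
                grid[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            mean = sum(cell) / len(cell)
            bits.append(1 if grid[y][x] > mean else 0)
    return bits
```

For a 3x3 grid that is zero everywhere except a bright centre pixel, only the centre bit is set; a perfectly uniform grid gives all zeros under the tie rule assumed here.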

If you use a gradient it could be wise to use two orthogonal vectors to describe the plane, and it could also be wise to only use the absolute value.

One interesting variation is to hash the gradient into bins and form fingerprints from that. Such a fingerprint can be somewhat resistant to cropping if done right.

Always even? Never odd!

When I was looking at the statistics of our generated hashes from WMC, it struck me that the Hamming distance between two hashes always seems to be an even number. This means we see differences of 2, 4, 6 bits etc. but never just 1 or 3 bits. I wonder if this is an implementation fault or a peculiarity of the algorithm: it seems to effectively make it 128 bits.
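One possible explanation, offered as an assumption rather than something confirmed here: for any two bit strings a and b, distance(a, b) = popcount(a) + popcount(b) - 2 * popcount(a AND b). If every hash contains the same number of set bits (which would happen if each band is thresholded so that exactly half of its bits are 1), the first two terms sum to an even number and the distance can never be odd. The sketch below checks this on random 256-bit values with a fixed number of set bits; all names and parameters are illustrative.

```python
import random


def popcount(value):
    """Number of set bits in a non-negative integer."""
    return bin(value).count("1")


def random_hash_with_weight(bits, weight, rng):
    """Random integer of the given bit width with exactly `weight` set bits."""
    positions = rng.sample(range(bits), weight)
    return sum(1 << position for position in positions)


def all_distances_even(bits=256, weight=128, trials=1000, seed=0):
    """Check that equal-weight hashes always differ in an even number of bits."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = random_hash_with_weight(bits, weight, rng)
        b = random_hash_with_weight(bits, weight, rng)
        if popcount(a ^ b) % 2 != 0:
            return False
    return True
```

If the real hashes do not all share the same popcount parity, this argument does not apply and the even distances would need another explanation.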

Something for @petli to consider during his travels :)

Print detailed error messages for failed operations

Seems to work reasonably well, but I've found a PNG file on which it breaks. Might be good to figure out at least why it breaks on this one :)

jonas@dev:~$ blockhash/build/blockhash EZSYG.png
Error opening image file EZSYG.png

(it doesn't actually have a problem opening it, but it seems this is the error thrown for any error parsing a file once opened)

Error opening image file. -- Please improve Exception handling.

After downloading all required packages that are needed to run blockhash on my system I kept encountering the message "Error opening image file «image»." I thought this was rather odd seeing how the images are not broken, and the program seems to report this for every image I tried to open.

After spending some time with gdb I came to the conclusion that an exception was being raised by ImageMagick, but this exception was completely ignored in the program's output. The exception was a MissingDelegateError with the message "no decode delegate for this image format: JPG"

After a quick search the cause of this seems to be that some Linux distributions that offer ImageMagick and the dev packages don't include the delegates that ImageMagick provides when you build from source. -- I resolved the "Error opening image file «image»." issue by compiling the ImageMagick package (from here) again and verifying that it now did have the necessary delegates.

blockhash is now working as expected, but the error could really use some improvement! I spent close to two hours on resolving this issue, and a more detailed error message could really save other people this headache! (also see this page)

Steps I took to compile the delegates, for anyone who happens to stumble across this:

$ wget https://www.imagemagick.org/download/releases/ImageMagick-7.0.8-8.tar.gz
$ extract ImageMagick-7.0.8-8.tar.gz
$ cd ImageMagick-7.0.8-8
$ ./configure
$ make
$ buildpkg

Compiler warnings with clang

==> ./waf configure --prefix=/usr/local/Cellar/blockhash/0.2
Setting top to                           : /private/tmp/blockhash-20170113-97602-92yo9u/blockhash-0.2 
Setting out to                           : /private/tmp/blockhash-20170113-97602-92yo9u/blockhash-0.2/build 
Checking for 'gcc' (c compiler)          : clang 
Checking for library m                   : yes 
Checking for program pkg-config          : /usr/local/opt/pkg-config/bin/pkg-config 
Checking for 'MagickWand'                : yes 
Checking for 'MagickWand' version        : yes 
'configure' finished successfully (0.175s)
==> ./waf
Waf: Entering directory `/private/tmp/blockhash-20170113-97602-92yo9u/blockhash-0.2/build'
[1/2] c: blockhash.c -> build/blockhash.c.1.o
../blockhash.c:96:35: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
            blocks[j] = v > m || (abs(v - m) < 1 && m > half_block_value);
                                  ^
../blockhash.c:96:35: note: use function 'fabsf' instead
            blocks[j] = v > m || (abs(v - m) < 1 && m > half_block_value);
                                  ^~~
                                  fabsf
../blockhash.c:114:35: warning: using integer absolute value function 'abs' when argument is of floating point type [-Wabsolute-value]
            result[j] = v > m || (abs(v - m) < 1 && m > half_block_value);
                                  ^
../blockhash.c:114:35: note: use function 'fabsf' instead
            result[j] = v > m || (abs(v - m) < 1 && m > half_block_value);
                                  ^~~
                                  fabsf
../blockhash.c:352:1: warning: control may reach end of non-void function [-Wreturn-type]
}
^
../blockhash.c:364:1: warning: return type of 'main' is not 'int' [-Wmain-return-type]
void main (int argc, char **argv) {
^
../blockhash.c:364:1: note: change return type to 'int'
void main (int argc, char **argv) {
^~~~
int
4 warnings generated.
[2/2] cprogram: build/blockhash.c.1.o -> build/blockhash
Waf: Leaving directory `/private/tmp/blockhash-20170113-97602-92yo9u/blockhash-0.2/build'
'build' finished successfully (0.232s)

C++11 implementation

Of possible interest: I have ported your implementation to C++11. All the relevant code is here in this one include file. Specifically it leverages iterators, C++11’s std::bitset, and RAII (nearly all allocations are on the stack). No benchmarks yet, and I am still tweaking the hexify function to match the output of your C99 code – but I thought someone might want to see it. Feedback is of course welcome!

Algorithm improvement

We're coming across a few images like the ones below that result in hashes that are close enough that they fall below or at 10 bits:

http://commons.wikimedia.org/wiki/File:Field,_tracks_and_The_Wrekin_-_geograph.org.uk_-_780016.jpg
http://commons.wikimedia.org/wiki/File:N734511468_542359_2084.jpg

It's the age-old problem of images with a lighter shade at the top and darker shade at the bottom. Not surprisingly, many landscape images are exactly like that :-)

It strikes me that perhaps there's a way to improve the algorithm by normalising images first. This could probably be very elaborate, but a simple mechanism would be to take every second column of the image and flip it 180 degrees. It should likely not have any effect on normal images, but on images like the above, it would result in hashes that are a bit more divergent since the differences between the lighter and darker parts would even out more.
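As a rough illustration of that normalisation step, here is a sketch that reverses every second column top to bottom. It assumes that "flip it 180 degrees" means a vertical reversal of a one-pixel-wide column and that the flipped columns are the ones at odd indices; the function name and the list-of-rows image representation are both hypothetical.

```python
def flip_alternate_columns(pixels):
    """Return a copy of the image with every second column reversed vertically."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    flipped = [row[:] for row in pixels]  # copy so the input is left untouched
    for x in range(1, width, 2):  # columns 1, 3, 5, ... (assumption)
        for y in range(height):
            flipped[y][x] = pixels[height - 1 - y][x]
    return flipped
```

Applied to a two-row image that is light on top and dark below, such as [[9, 9, 9, 9], [0, 0, 0, 0]], the result is [[9, 0, 9, 0], [0, 9, 0, 9]], so the top and bottom halves no longer differ only by overall brightness.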

@artfwo and @petli, what do you think about that?

New release

Could you release a new version? I'm mainly interested in getting Python 3 to work, which was done in 07268ae, but is not included in the latest release. Relevant is NixOS/nixpkgs#149400 which currently needs to use the master blockhash version to allow removal of Python 2.

Uses: Web server modules

Just wanted to share an idea after hearing your talk at FOSDEM.

Web servers like nginx, lighttpd, apache, could use a blockhash module that would calculate (before serving an image) or fetch the blockhashes from an existing database and then keep the hashes in a cache.

As images are served to web clients, the blockhash metadata could be part of the HTTP headers served along the image, then web browsers/web clients could use the blockhash information for automatic attribution purposes.

blockhash.io domain doesn't resolve

Josephs-MacBook-Pro:~ joe$ curl -IL http://blockhash.io
curl: (6) Couldn't resolve host 'blockhash.io'
Josephs-MacBook-Pro:~ joe$ ping blockhash.io
ping: cannot resolve blockhash.io: Unknown host
Josephs-MacBook-Pro:~ joe$ 

Same in the browser.

Support for ImageMagick >= 7

The build fails when ImageMagick 7.0.2 is present during compilation. I could fix the failure by changing the include path to MagickWand/MagickWand.h when the following error showed up:

../blockhash.c:14:10: fatal error: 'wand/MagickWand.h' file not found
#include <wand/MagickWand.h>

Complete build logs - https://gist.github.com/02477d5bcbbbf2b4697991d18cd48add

Could you please add support for ImageMagick 7.0.x?
