iHashDNA

A perceptual hashing library in Python (backed by Redis), a wannabe PhotoDNA.

What is Perceptual Hashing

Perceptual hashing is the use of an algorithm that produces a snippet or fingerprint of various forms of multimedia.[1][2] Perceptual hash functions are analogous if features of the multimedia are similar, whereas cryptographic hashing relies on the avalanche effect of a small change in input value creating a drastic change in output value. Perceptual hash functions are widely used in finding cases of online copyright infringement as well as in digital forensics because of the ability to have a correlation between hashes so similar data can be found (for instance with a differing watermark). Based on research at Northumbria University,[3] it can also be applied to simultaneously identify similar contents for video copy detection and detect malicious manipulations for video authentication. The system proposed performs better than current video hashing techniques in terms of both identification and authentication.

Wikipedia, Perceptual Hashing

TLDR: How Perceptual Hashing works

Pic Source: Why we created 'Imageid' and saved 47% of the moderation effort | by Diego Essaya | Taringa! | Medium

Perceptual hashing converts an image into a binary (or hexadecimal) sequence by first degrading it, reducing it to a small grid of "pixels". Unlike cryptographic hashing, perceptual hashing lacks the avalanche effect: a small change in the image produces only a small change in the hash, so similar images end up with similar hashes.
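The contrast with cryptographic hashing can be sketched with a toy example. This is only an illustration, not the algorithm used by iHashDNA: `average_hash` below is a simplified "average hash" written for this example, and the image is a hypothetical 8x8 grayscale grid given as nested lists.

```python
import hashlib


def average_hash(pixels, hash_size=4):
    """Toy perceptual hash of a square grayscale image (rows of 0-255 ints).

    The image is degraded to hash_size x hash_size blocks, and each block
    becomes one bit: 1 if it is brighter than the mean of all blocks, else 0.
    Assumes the image side is a multiple of hash_size.
    """
    block = len(pixels) // hash_size
    cells = []
    for by in range(hash_size):
        for bx in range(hash_size):
            total = sum(
                pixels[y][x]
                for y in range(by * block, (by + 1) * block)
                for x in range(bx * block, (bx + 1) * block)
            )
            cells.append(total / (block * block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def sha256_int(pixels):
    """Cryptographic hash of the same pixel data, as an integer."""
    data = bytes(v for row in pixels for v in row)
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


# An 8x8 image: dark left half, bright right half.
original = [[20] * 4 + [200] * 4 for _ in range(8)]
# The same image with a single pixel changed.
edited = [row[:] for row in original]
edited[0][0] = 60

perceptual_diff = hamming(average_hash(original), average_hash(edited))
crypto_diff = hamming(sha256_int(original), sha256_int(edited))
print("bits changed in perceptual hash:", perceptual_diff)
print("bits changed in SHA-256:", crypto_diff)
```

In this toy case the perceptual hash does not change at all, while the SHA-256 digest changes in many of its 256 bits. Comparing hashes by Hamming distance is what makes "similar image" lookups possible.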

What iHashDNA does

It uses two perceptual hash functions, phash and whash: an image is checked against phash first, then against whash.

Combine these two with a database (Redis) and you get this library.
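A two-stage check of this kind might look like the sketch below. The source only states that phash is checked first and whash second; how the two results are combined, the `max_distance` threshold, the `is_match` name, and the representation of hashes as plain integers are all assumptions made for illustration. In a real setup the hashes would come from a perceptual hashing library and the banned pairs from Redis.

```python
def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def is_match(candidate, banned, max_distance=4):
    """Two-stage comparison of (phash, whash) pairs stored as integers.

    candidate: (phash, whash) of the image being checked.
    banned: iterable of (phash, whash) pairs already in the database.
    Assumption: an image matches only if BOTH hashes are within max_distance.
    """
    cand_p, cand_w = candidate
    for ban_p, ban_w in banned:
        # Stage 1: filter on the first hash.
        if hamming(cand_p, ban_p) > max_distance:
            continue
        # Stage 2: confirm with the second hash.
        if hamming(cand_w, ban_w) <= max_distance:
            return True
    return False


banned = [(0b11110000, 0b10101010)]
print(is_match((0b11110001, 0b10101011), banned))  # near-duplicate of the banned pair
print(is_match((0b00001111, 0b10101010), banned))  # first hash is too far away
```

Checking one hash before the other means most non-matching images are rejected after a single comparison per stored entry.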

You can:

  1. Ban images: add the hash of the image to the DB (after checking whether it is already there). Rotated versions of the picture (90 degrees left, 90 degrees right, and 180 degrees, i.e. upside down) are included as well;
  2. Unban images: remove the hash and all similar hashes from the DB;
  3. Whitelist images: ignore a picture's hash.
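The three operations above can be sketched with an in-memory stand-in for the database. Everything here is hypothetical and not the library's API: `HashStore`, `toy_hash`, `rotations`, and the `max_distance` threshold are names invented for this example, images are small grayscale grids given as nested lists, and a Python set replaces Redis.

```python
def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def rotations(grid):
    """Return the grid rotated by 0, 90, 180 and 270 degrees clockwise."""
    result = [grid]
    for _ in range(3):
        grid = [list(row) for row in zip(*grid[::-1])]
        result.append(grid)
    return result


def toy_hash(grid):
    """Stand-in for a real perceptual hash: one bit per cell, 1 if above the mean."""
    flat = [v for row in grid for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


class HashStore:
    """In-memory stand-in for a Redis-backed hash store (illustration only)."""

    def __init__(self, max_distance=1):
        self.banned = set()
        self.whitelisted = set()
        self.max_distance = max_distance

    def is_banned(self, h):
        """True if h is not whitelisted and is close to a banned hash."""
        if h in self.whitelisted:
            return False
        return any(hamming(h, b) <= self.max_distance for b in self.banned)

    def ban(self, grid):
        """Ban an image and its rotations; return False if it was already banned."""
        if self.is_banned(toy_hash(grid)):
            return False
        self.banned |= {toy_hash(g) for g in rotations(grid)}
        return True

    def unban(self, h):
        """Remove h and every stored hash within max_distance of it."""
        self.banned = {b for b in self.banned if hamming(h, b) > self.max_distance}

    def whitelist(self, h):
        """Mark a hash as always allowed."""
        self.whitelisted.add(h)


store = HashStore()
image = [[200, 10, 10], [10, 10, 10], [10, 10, 10]]    # bright top-left corner
rotated = [[10, 10, 200], [10, 10, 10], [10, 10, 10]]  # same image, rotated 90 degrees
store.ban(image)
print(store.is_banned(toy_hash(rotated)))  # the rotation was banned together with the original
```

Storing the hashes of the rotations at ban time keeps lookups simple: a rotated copy is found with the same comparison as any other image.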

Practical examples

Perceptual hashing is a good way to recognize two similar images. It is useful if you need to:

  • Quickly index similar images;
  • Check for prohibited content without saving it into your DB (child pornography, pornography, gore...);
  • Check for watermarked copies of original copyrighted content;

and more...

The library can easily detect an edited photo if it has:

  • Color changes;
  • Random garbage over it (watermarks, stickers....);
  • Slight cropping.

Issues and limitations

Remember that this is not ML-based.

It can be easily bypassed by cropping the image by more than a small amount.

Here you will find an interesting article that evaluates the various functions of perceptual hashing.

This library is a wannabe PhotoDNA.

How to use it

Requirements

  1. Install Redis

  2. Start Redis

  3. git clone https://github.com/matteounitn/iHashDNA.git

  4. cd into the cloned folder (iHashDNA)

  5. (Optional) create a venv:

    python3 -m venv venv && source venv/bin/activate

  6. pip3 install -r requirements.txt

Then you are good to go!

Example

Check out this example.

ihashdna's Issues

Conversion to SQLite

Because of its lower speed, the Redis backend (pottery) should be removed.

One approach would be to use SQLite instead.
