Giter VIP home page Giter VIP logo

flexihash's Introduction

Flexihash

Build Status Coverage Status

Flexihash is a small PHP library which implements consistent hashing, which is most useful in distributed caching. It requires PHP5 and uses PHPUnit for unit testing.

Installation

Composer is the recommended installation technique. You can find flexihash on Packagist so installation is as easy as

composer require flexihash/flexihash

or in your composer.json

{
    "require": {
        "flexihash/flexihash": "^3.0.0"
    }
}

Usage

$hash = new Flexihash();

// bulk add
$hash->addTargets(['cache-1', 'cache-2', 'cache-3']);

// simple lookup
$hash->lookup('object-a'); // "cache-1"
$hash->lookup('object-b'); // "cache-2"

// add and remove
$hash
  ->addTarget('cache-4')
  ->removeTarget('cache-1');

// lookup with next-best fallback (for redundant writes)
$hash->lookupList('object', 2); // ["cache-2", "cache-4"]

// remove cache-2, expect object to hash to cache-4
$hash->removeTarget('cache-2');
$hash->lookup('object'); // "cache-4"

Tests

Unit Test

% vendor/bin/phpunit

Benchmark Test

% vendor/bin/phpunit tests/BenchmarkTest.php

Further Reading

flexihash's People

Contributors

dmnc avatar myxobek avatar pda avatar sergeyklay avatar serima avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

flexihash's Issues

Using MD5Hasher with large server pool breaks correctness of algorithm

Here is a fun bug.

In a simulation/test we had 10 nodes added like this:

$hasher->addTargets(['node1:port1', ... 'node10:port10']);

We also set num replicas to 100 but you can reproduce with 64.

Then we tested that 1000 keys looked up a roughly evenly distributed (pass), and that if we remove a node (node4:port4 in our case) they remain even which works fine. We also test that each node retains all of the keys it had originally after one node is removed (as well as something close to 1/N-1 of the 1/N keys the failed host had in the original ring).

This part fails - somehow removing a node (or strictly building same ring without node4 but they should be equivalent), makes some small percentage of keys move which node they are on even if they weren't on the failed node.

After some debugging, I think I can explain this.

When using MD5Hasher you chose to keep hex strings at the ring positions rather than converting back to ints. That in itself is not necessarily wrong, however it falls into a PHP dynamic typing gotcha: the internal positionToTarget lookup ends up looking like this (after sorting):

...
  'ff4a2aff' => 'host7:port7',
  'ff6c8fa3' => 'host5:port5',
  'ff820a72' => 'host5:port5',
  'ffc8b06f' => 'host6:port6',
  'fff31063' => 'host10:port10',
  11023620 => 'host4:port4',
  13715227 => 'host7:port7',
  13952778 => 'host4:port4',
  16385505 => 'host6:port6',
  17878142 => 'host2:port2',
...

i.e. using an all-decimal-digit string as an array key in PHP causes PHP to convert it to an int key which then sorts after all string keys not in natural hex-digit order. Keys that are in these ring segments then move unpredictably between ring reconfigurations because the algorithm is no longer correct and comparisons are not working as expected.

Switching to CRC32Hasher with identical params fixes the issue because those are always integer keys.

2 possible fixes:

  1. You //could// do some magic in Flexihash to make this work correctly - it's not really MD5Hasher's fault PHP does funny things with "int-like" strings. For example you could prefix string keys in the array so they look like _ffaabb33, the leading underscore would stop PHP mangling to an int and would keep sort order correct. You'd obviously need to also prepend the hashed key with _ at lookup time etc.
  2. Alternatively, you could decide that HasherInterface requires returned values to be ints and no larger than PHP_INT_MAX and fix MD5Hasher to meet this - just need hexdec() around the return statement.

Please let us know which you'd prefer both are simple changes.

Create a composer package

It would be great (and appreciated) if you could create a composer package for this library. Thank you!

Versioning, SemVer, tags, CHANGELOG

  • Current version, which has barely been touched for seven years, should be tagged as v1.0.0.
  • CHANGELOG.md should be created.
  • Future versioning should follow SemVer
    • Changing autoloaders, namespacing etc should be a major release e.g. v2.0.0

Consistent hashing question

This isn't an issue with the library, just a question about how consistent hashing works with weights. If I change the weight of a target by some small amount, let's say out of 10 targets each with a weight of 1, I change the weight of one target to 3. How significantly will the keys be redistributed?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.