Giter VIP home page Giter VIP logo

hat-trie's Introduction

hat-trie

HAT-Trie structure for Python (2.x and 3.x).

This package is a Python wrapper for hat-trie C library.

https://travis-ci.org/kmike/hat-trie.svg?branch=master

Installation

pip install hat-trie

Usage

Create a new trie:

>>> from hat_trie import Trie
>>> trie = Trie()

trie variable is a dict-like object that support unicode keys and can have any Python object as a value. For keys that share prefixes it usually uses less memory than Python dict.

There is also hat_trie.IntTrie which only supports positive integers as values. It can be more efficient when you don't need arbitrary objects as values. For example, if you need to store float values then storing them in an array (either numpy or stdlib's array.array) and using IntTrie values as indices could be more memory efficient than storing Python float objects directly in hat_trie.Trie.

Another way to store float values is to use hat_trie.FloatTrie(). In this case precision is limited to float32.

Currently implemented methods are:

  • __getitem__()
  • __setitem__()
  • __contains__()
  • __len__()
  • get()
  • setdefault()
  • keys()
  • iterkeys()

Other methods are not implemented - contributions are welcome!

Performance

Performance is measured for hat_trie.Trie against Python's dict with 100k unique unicode words (English and Russian) as keys and '1' numbers as values.

Benchmark results for Python 3.3 (intel i5 1.8GHz, "1.000M ops/sec" == "1 000 000 operations per second"):

dict __getitem__ (hits)      6.874M ops/sec
trie __getitem__ (hits)      3.754M ops/sec
dict __contains__ (hits)     7.035M ops/sec
trie __contains__ (hits)     3.772M ops/sec
dict __contains__ (misses)   5.356M ops/sec
trie __contains__ (misses)   3.364M ops/sec
dict __len__                 785958.286 ops/sec
trie __len__                 574164.704 ops/sec
dict __setitem__ (updates)   6.830M ops/sec
trie __setitem__ (updates)   3.472M ops/sec
dict __setitem__ (inserts)   6.774M ops/sec
trie __setitem__ (inserts)   2.460M ops/sec
dict setdefault (updates)    3.522M ops/sec
trie setdefault (updates)    2.680M ops/sec
dict setdefault (inserts)    4.062M ops/sec
trie setdefault (inserts)    1.866M ops/sec
dict keys()                  189.564 ops/sec
trie keys()                  16.067 ops/sec

HAT-Trie is about 1.5x faster that datrie on all supported operations; it also supports fast inserts unlike datrie. On the other hand, datrie has more features (e.g. better iteration support and richer API); datrie is also more memory efficient.

If you need a memory efficient data structure and don't need inserts then marisa-trie or DAWG should work better.

Contributing

Development happens at github:

Feel free to submit ideas, bugs, pull requests or regular patches.

Please don't commit changes to generated C files; I will rebuild them myself.

Running tests and benchmarks

Make sure tox is installed and run

$ ./update_c.sh
$ tox

from the source checkout. You will need Cython to do that.

Tests should pass under python 2.7 and 3.3+.

$ tox -c bench.ini

runs benchmarks.

Authors & Contributors

This module wraps hat-trie C library by Daniel Jones & contributors.

License

Licensed under MIT License.

hat-trie's People

Contributors

b4hand avatar kmike avatar mheilman avatar mikepb avatar svisser avatar yflau avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.