Giter VIP home page Giter VIP logo

learned-filters's Introduction

learned-filters

Experimental work on learning-augmented AMQs

Technical stack

Requires bitarray, mmh3, and torch, all available on pip.

How to use

To learn how to run tests, go to the src directory and run

python test.py -h

This will display a help message that lists all of the available options.

Quick start

Run the following to see the bloom filter in action. (Use toast or sandwich in place of bloom to see how well the other filters do on the default parameters.)

python test.py -f bloom

Approach

Filter architectures

  • Bloom filter
  • Open sandwich bloom filter (Kraska)
  • Sandwiched bloom filter (Mitzenmacher)

Testing setup

The Word type

The Bloom filter needs a hashable input, and the model needs matrix of inputs in [0,1]. To meet both requirements, we use the following approach:

Our abstraction is that we are working with strings of n letters on an alphabet of size C.

The model gets a 1x(Cn) row vector of values in {0, 1} where only one of the scalars in the vector is 1, all others 0. In other words, for the letter with index 0 <= i < C, the vector has 0s everywhere except at location i, which has value 1.

The filter gets an n-element tuple of integers from 0 to C-1, which is hashable.

Note: We can convert decimal numbers to Words by using C=10, converting each number's decimal place to a letter in [0..9]

We handle these conversions by adding a Word type, along with WordNet and WordBloom wrappers for Net and Bloom, respectively.

Estimating model size

We estimate the size of the machine learniing model by counting the number of elements in the two matrices that comprise the model's layers, and then multiply by the size of a float in python (24 bytes).

Hyperparameters

  • heuristic for choosing tau
  • training time/epochs
  • neural net hyperparameters
  • ML model

Lingering questions

  • How much do we train the neural net / pre-filter?

TODO

  • Build overflow Bloom filter
  • Testing rig (setup sending different inputs to model vs filter)
  • Sandwich
  • Tuning tau

learned-filters's People

Contributors

djslzx avatar pzhao1799 avatar

Stargazers

 avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.