InsanelyFastStringQuantization

This repository implements "Extremely Fast Text Feature Extraction for Classification and Indexing" in pure Python for extremely fast string quantization.

There are no dependencies, unless you plan to use the progress bar (tqdm).

A pure JavaScript implementation is in the works for in-browser deep learning (TensorFlow.js).

NOTE: Tested only on Python >= 3.7. It may not work on other versions of Python!

About

Given an input string, the library returns a hash of the string. The approach has the following properties:

  • No model is required to generate features from a string of arbitrary length.
  • Extremely low memory requirements for the lookup table.
  • Insanely fast: over 7,200,000 characters/sec in pure Python!
  • The quantized feature vector represents the presence of words, rather than their frequency as in TF-IDF or bag-of-words (BOW).
  • Because this hashing is very lossy, it is recommended only for applications where inference speed is a priority.
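
To make the presence-versus-frequency distinction concrete, here is a minimal sketch of token hashing into a fixed-size vector. It is not this library's algorithm or lookup table: the helpers `presence_vector` and `count_vector` are hypothetical, and `zlib.crc32` is used only as a convenient deterministic hash for illustration.

```python
import zlib
from typing import List


def _bucket(token: str, size: int) -> int:
    # Deterministic bucket index; crc32 is an illustrative stand-in,
    # not the hash used by InsanelyFastStringQuantization.
    return zlib.crc32(token.encode("utf-8")) % size


def presence_vector(text: str, size: int = 16) -> List[int]:
    """Hypothetical helper: mark a bucket with 1 if any token hashes into it."""
    vec = [0] * size
    for token in text.lower().split():
        vec[_bucket(token, size)] = 1  # presence only, repeats are ignored
    return vec


def count_vector(text: str, size: int = 16) -> List[int]:
    """Hypothetical helper: bag-of-words style counts, for comparison."""
    vec = [0] * size
    for token in text.lower().split():
        vec[_bucket(token, size)] += 1  # frequency accumulates
    return vec


if __name__ == "__main__":
    print(presence_vector("buy buy now"))
    print(count_vector("buy buy now"))
```

In this sketch a repeated word leaves the presence vector unchanged, while the count vector grows with each repetition. Distinct words can also collide in the same bucket, which is one source of the lossiness mentioned above.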

Getting Started

from InsanelyFastStringQuantization import Hasher

# Generate feature vectors of size 16 using the static, hard-coded lookup table.
# Setting random_table=False is recommended for consistency across production environments;
# otherwise, control the seed carefully so that hashing stays consistent.
vectorizer = Hasher(16, random_table=False)

# Quantize a single string
print(vectorizer.vectorize("Hello World!")) # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]

# Quantize a list of strings
print(vectorizer.vectorize(["Hello World!", "Buy Now!", "Add to Cart"])) 
# [
#    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], 
#    [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
#    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
# ]
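
The note about controlling the seed can be illustrated with a small, self-contained sketch. It assumes nothing about how `Hasher` builds its table internally; `build_table` and its parameters are hypothetical names used only to show why a fixed seed gives reproducible lookups across processes and machines.

```python
import random
from typing import List


def build_table(seed: int, n_symbols: int = 256, size: int = 16) -> List[int]:
    """Hypothetical helper: map each of n_symbols byte values to a bucket in [0, size)."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    return [rng.randrange(size) for _ in range(n_symbols)]


if __name__ == "__main__":
    # The same seed always yields the same table, so features stay comparable
    # between training and production.
    assert build_table(42) == build_table(42)
    print(build_table(42)[:8])
```

A table generated without a fixed seed would generally differ from run to run, so vectors computed in one environment would not line up with vectors computed in another.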

Contributors

Surya Kant Sahu

  • Exploring the intersection of recommender systems and reinforcement learning. Built data pipelines for multiple real-time machine learning applications.
  • I play the piano. A huge fan of Frédéric Chopin and Japanese neo-classical music.
  • Contact:
