Giter VIP home page Giter VIP logo

kpruningradixtrie's Introduction

JPruningRadixTrie
MIT License

Java port of wolfgarbe/PruningRadixTrie. See link for the background of the project etc. All credit to Wolf Garbe.

No features have been added/removed other than where necessary to match Java conventions. So functionality and performance should be unchanged, aside only from these exceptions:

  • Removed parameter out long termFrequencyCountPrefix from method getTopkTermsForPrefix because awkward to achieve this in Java without changing the method's return type significantly. Specifically, the functionality lost is- getting the number of terms in the dictionary which begin with the given prefix (not an essential feature).
  • readTermsFromFile now requires that the frequency values in the file have a max value of 2^63 - 1, whereas before they could have a max value of 2^64 - 1 (due to the frequency values are interpreted as signed longs rather than unsigned longs).
  • Introduced parameter String delimiter to method readTermsFromFile so that text files with delimiters other than \t can be read. See Usage below.

Download Instructions:

Copy the source code into your project! There are only 4 classes to copy, and no external dependencies. I'm afraid I do not have plans to upload this project to Maven Central or the like.


Usage:

Create Object

PruningRadixTrie pruningRadixTrie = new PruningRadixTrie();

addTerm: insert term and term frequency count into Pruning Radix Trie. Frequency counts for same term are summed up.

pruningRadixTrie.addTerm("microsoft", 1000);

getTopkTermsForPrefix: retrieve the top-k most relevant terms for a given prefix from the Pruning Radix Trie.

String prefix = "micro";
int topK = 10;
List<TermAndFrequency> results = pruningRadixTrie.getTopkTermsForPrefix(prefix, topK);
for (TermAndFrequency result : results) {
  System.out.println(result.getTerm() + " " + result.getTermFrequencyCount());
}

readTermsFromFile: Deserialise the Pruning Radix Trie from disk for persistence. Note here that terms.txt contains tab-delimited lines, ie in the format term\tfrequency. If format is term,frequency then replace second argument with ",", etc.

pruningRadixTrie.readTermsFromFile("terms.txt", "\t");

writeTermsToFile: Serialise the Pruning Radix Trie to disk for persistence.

pruningRadixTrie.writeTermsToFile("terms.txt");

kpruningradixtrie's People

Contributors

benldr avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.