myutil

by Brendan O'Connor, http://brenocon.com

Java utilities for statistics/machine learning and various supporting tools. (Often intended for NLP applications, though there is not much NLP in this library.) This needs a better name; currently it's "myutil": https://github.com/brendano/myutil

The idea is to be a library of functions for well-known algorithms, as opposed to a grand ML/NLP framework, because those are never as useful as one would hope (in my experience at least).

This is under active development, so any of it may be broken at any time. If a class has comments describing a testing procedure, that may be a good sign.

Stuff in here

Math/stats/opt things:

  • Arr.java: lots of array/matrix math and manipulation utilities. Unlike Colt or Jama, uses the more natural Java arrays and array-of-arrays representations. Also includes all Java standard library methods, because I can't remember which class is which.
  • MCMC.java: generic MCMC algorithms: Slice sampling, Metropolis-Hastings
  • LibLBFGS: a port of LibLBFGS to Java. Seems to behave similarly to Stanford's OWLQN port, but it's more efficient.
  • FastRandom: a random number generator that's 10 times faster than the Java standard library's.
  • GaussianInference: conjugate posterior inference (exact and sampling) for Gaussian scalars, linear regression, and dynamic linear models (DLMs: Kalman filter, smoother, FFBS)
  • MVNormal2: linear algebra inference and samplers for multivariate normals (ported from Mallet)
  • LNInference: logistic normal MAP and samplers
  • ChainInfer.java: discrete chain inference: Viterbi, forward-backward, FFBS
  • Online algorithms: Vitter reservoir sampling (ReservoirSampler), and Welford running mean/variance (OnlineNormal1d(Weighted))
  • Util.java: some other math/stats functions
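
To give a flavor of the online algorithms listed above, here is a minimal, self-contained sketch of Welford's running mean/variance and of reservoir sampling (Vitter's Algorithm R). The class and method names (OnlineSketch, RunningStats, Reservoir) are made up for illustration; they are not the API of OnlineNormal1d or ReservoirSampler, which may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical names throughout; this is not the myutil API.
public class OnlineSketch {

    /** Welford's algorithm: one-pass, numerically stable mean and variance. */
    static class RunningStats {
        private long n = 0;
        private double mean = 0.0;
        private double m2 = 0.0; // running sum of squared deviations from the mean

        void add(double x) {
            n++;
            double delta = x - mean;   // deviation from the old mean
            mean += delta / n;
            m2 += delta * (x - mean);  // uses both the old and the new mean
        }

        long count() { return n; }
        double mean() { return mean; }

        /** Unbiased sample variance; NaN with fewer than two observations. */
        double variance() { return n > 1 ? m2 / (n - 1) : Double.NaN; }
    }

    /** Algorithm R: a uniform sample of up to k items from a stream of unknown length. */
    static class Reservoir<T> {
        private final int k;
        private final Random rng;
        private final List<T> sample = new ArrayList<>();
        private long seen = 0;

        Reservoir(int k, Random rng) {
            this.k = k;
            this.rng = rng;
        }

        void add(T item) {
            seen++;
            if (sample.size() < k) {
                sample.add(item); // fill the reservoir first
            } else {
                // Draw j uniformly from [0, seen); the new item replaces slot j
                // only if j < k, which happens with probability k / seen.
                long j = (long) (rng.nextDouble() * seen);
                if (j < k) {
                    sample.set((int) j, item);
                }
            }
        }

        List<T> sample() { return sample; }
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        Reservoir<Integer> reservoir = new Reservoir<>(5, new Random(0));
        for (int i = 1; i <= 1000; i++) {
            stats.add(i);
            reservoir.add(i);
        }
        System.out.println("mean=" + stats.mean() + " var=" + stats.variance());
        System.out.println("sample=" + reservoir.sample());
    }
}
```

Both structures use constant memory per stream (plus k slots for the reservoir), which is the point of doing these computations online.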

Non-math-y things:

  • ThreadUtil: basically ThreadPool wrappers for divide-and-conquer workloads
  • U.java: printing utilities (mostly)
  • BasicFileIO: IO utilities
  • Vocabulary: feature name/numberization (I'd love to get a better/more efficient one here)
  • Timer: timings for large sections of your program
  • JsonUtil: very simple wrappers for Jackson
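
For readers unfamiliar with "feature name/numberization", the idea is a two-way mapping between feature names and dense integer IDs. A minimal sketch, assuming a hash map plus a list is good enough; VocabSketch and its methods are hypothetical names, not the actual Vocabulary class:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; the real Vocabulary class may look quite different.
public class VocabSketch {
    private final Map<String, Integer> nameToNum = new HashMap<>();
    private final List<String> numToName = new ArrayList<>();
    private boolean locked = false;

    /** Returns the ID for a name, assigning the next free ID if it is new.
     *  Once locked, unknown names map to -1 instead of growing the vocabulary. */
    public int num(String name) {
        Integer id = nameToNum.get(name);
        if (id != null) {
            return id;
        }
        if (locked) {
            return -1;
        }
        int newId = numToName.size();
        nameToNum.put(name, newId);
        numToName.add(name);
        return newId;
    }

    /** Reverse lookup: ID back to feature name. */
    public String name(int num) {
        return numToName.get(num);
    }

    public int size() {
        return numToName.size();
    }

    /** Freeze the vocabulary, e.g. after training and before test time. */
    public void lock() {
        locked = true;
    }

    public static void main(String[] args) {
        VocabSketch vocab = new VocabSketch();
        for (String w : "the cat saw the other cat".split(" ")) {
            System.out.println(w + " -> " + vocab.num(w));
        }
        System.out.println("size = " + vocab.size());
    }
}
```

Boxing every ID into an Integer is the obvious inefficiency here; a primitive-valued hash map would avoid it.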

NLP things:

  • corenlp/: runners for Stanford CoreNLP that work with JSON or XML-based one-line-per-document formats. Once you have thousands of documents, these formats are typically much faster to deal with than CoreNLP's one-document-per-file strategy. They're more Hadoop-friendly too. To use these, drop the model file (stanford-corenlp-3.2.0-models.jar) into lib/stanford_extras

Example models:

  • In the root package, an example implementation of LDA with collapsed Gibbs sampling (CGS). When working on a related model, I copy-and-paste one to get started, then hack it up. scripts/ has viewers for it.
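
As a rough idea of what a collapsed Gibbs sampler for LDA involves, here is a compact sketch. It is not the code in the root package: the class name LdaGibbsSketch, the field names, and the symmetric hyperparameters alpha and beta are all assumptions made for illustration.

```java
import java.util.Random;

// Hypothetical sketch of collapsed Gibbs sampling for LDA; not the repository's implementation.
public class LdaGibbsSketch {
    final int K, V;              // number of topics, vocabulary size
    final double alpha, beta;    // symmetric Dirichlet hyperparameters (assumed)
    final int[][] docs;          // docs[d][i] = word ID of token i in document d
    final int[][] z;             // z[d][i] = current topic of that token
    final int[][] nDocTopic;     // nDocTopic[d][k] = tokens in doc d assigned to topic k
    final int[][] nTopicWord;    // nTopicWord[k][w] = times word w is assigned to topic k
    final int[] nTopic;          // nTopic[k] = total tokens assigned to topic k
    final Random rng;

    LdaGibbsSketch(int[][] docs, int V, int K, double alpha, double beta, long seed) {
        this.docs = docs; this.V = V; this.K = K;
        this.alpha = alpha; this.beta = beta;
        this.rng = new Random(seed);
        z = new int[docs.length][];
        nDocTopic = new int[docs.length][K];
        nTopicWord = new int[K][V];
        nTopic = new int[K];
        // Random initialization of topic assignments.
        for (int d = 0; d < docs.length; d++) {
            z[d] = new int[docs[d].length];
            for (int i = 0; i < docs[d].length; i++) {
                int k = rng.nextInt(K);
                z[d][i] = k;
                nDocTopic[d][k]++; nTopicWord[k][docs[d][i]]++; nTopic[k]++;
            }
        }
    }

    /** One full pass: resample the topic of every token given all the others. */
    void sweep() {
        double[] p = new double[K];
        for (int d = 0; d < docs.length; d++) {
            for (int i = 0; i < docs[d].length; i++) {
                int w = docs[d][i];
                int old = z[d][i];
                // Remove this token from the counts.
                nDocTopic[d][old]--; nTopicWord[old][w]--; nTopic[old]--;
                // Unnormalized conditional p(z = k | everything else).
                double total = 0.0;
                for (int k = 0; k < K; k++) {
                    p[k] = (nDocTopic[d][k] + alpha)
                         * (nTopicWord[k][w] + beta) / (nTopic[k] + V * beta);
                    total += p[k];
                }
                // Draw the new topic by inverting the cumulative sum.
                double u = rng.nextDouble() * total;
                int k = 0;
                double acc = p[0];
                while (u >= acc && k < K - 1) { k++; acc += p[k]; }
                z[d][i] = k;
                nDocTopic[d][k]++; nTopicWord[k][w]++; nTopic[k]++;
            }
        }
    }

    public static void main(String[] args) {
        int[][] docs = { {0, 1, 2, 0}, {3, 4, 3}, {0, 1, 4, 2, 2} }; // toy corpus
        LdaGibbsSketch lda = new LdaGibbsSketch(docs, 5, 2, 0.1, 0.01, 42L);
        for (int iter = 0; iter < 100; iter++) lda.sweep();
        for (int k = 0; k < lda.K; k++) {
            System.out.println("topic " + k + ": " + lda.nTopic[k] + " tokens");
        }
    }
}
```

The sampler only ever touches the three count tables, which is why this kind of code is easy to copy and adapt to a related model.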

Licenses

Let's say new code is GPL version 2. Note there's code from other libraries inside here too, like JAMA and LibLBFGS and the Java SDK, which have their own licenses.
