online-oceanarium's Introduction

Tim Hargreaves' GitHub Profile

Overview

The Code Consortium

The Code Consortium is a collection of three packages written in Python, R, and Julia. They consist of efficient implementations of algorithms related to machine learning, streaming, and Monte Carlo methods, respectively. They were developed in collaboration with other students with three goals: practicing modern package development (including CI/CD pipelines for testing, documentation, linting and benchmarking), building experience collaborating on large code bases, and sharing quality implementations of useful algorithms in a form more accessible to students than productionised packages.

Follow the image links below to check out the project.

online-oceanarium's People

Contributors

kasia-kobalczyk, thargreaves

online-oceanarium's Issues

Particle Filter

A generic particle filter with the ability to specify arbitrary noise, measurement, and control distributions.
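As a rough illustration of what "generic" could mean here, the sketch below implements one step of a bootstrap particle filter in plain Python, with the motion and measurement models passed in as functions. The names `particle_filter_step`, `transition`, `likelihood` and `run_example` are hypothetical and not part of the package; the 1D random-walk model in the example is an assumption chosen only to make the code runnable.

```python
import math
import random


def particle_filter_step(particles, control, observation, transition, likelihood, rng):
    """One predict/update/resample step of a bootstrap particle filter."""
    # Predict: push every particle through the (noisy) transition model.
    predicted = [transition(p, control, rng) for p in particles]
    # Update: weight each particle by how well it explains the observation.
    weights = [likelihood(observation, p) for p in predicted]
    if sum(weights) == 0.0:
        # Every weight underflowed to zero; fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw a new population in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


# Hypothetical models for a 1D example: the state moves by `control`
# plus Gaussian process noise, and is observed with Gaussian noise.
def transition(state, control, rng):
    return state + control + rng.gauss(0.0, 0.5)


def likelihood(observation, state):
    # Unnormalised Gaussian likelihood with unit standard deviation.
    return math.exp(-0.5 * (observation - state) ** 2)


def run_example(seed=0, n_particles=500, n_steps=30):
    rng = random.Random(seed)
    true_state = 0.0
    particles = [rng.gauss(0.0, 2.0) for _ in range(n_particles)]
    for _ in range(n_steps):
        true_state += 1.0 + rng.gauss(0.0, 0.5)
        observation = true_state + rng.gauss(0.0, 1.0)
        particles = particle_filter_step(
            particles, 1.0, observation, transition, likelihood, rng
        )
    estimate = sum(particles) / len(particles)
    return true_state, estimate
```

Passing the models as callables keeps the filter independent of any particular noise, measurement, or control distribution, which is one possible reading of the requirement above.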

Online linear regression

Not sure if this is even possible, but it would be nice to have an algorithm for updating the regression coefficients that has a lower time complexity than recomputing them from scratch.
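One known approach that fits this description is recursive least squares, which uses the Sherman-Morrison identity to update the inverse Gram matrix in O(d^2) per observation instead of refitting. The sketch below is a minimal pure-Python version under stated assumptions: the class name `RecursiveLeastSquares` and the small `ridge` term used to initialise the inverse are illustrative choices, not anything taken from the package.

```python
class RecursiveLeastSquares:
    """Online least squares; each update costs O(d^2) for d features."""

    def __init__(self, n_features, ridge=1e-6):
        d = n_features
        self.coef = [0.0] * d
        # P tracks (X^T X + ridge * I)^{-1}; it starts as (1 / ridge) * I.
        self.P = [[(1.0 / ridge if i == j else 0.0) for j in range(d)] for i in range(d)]

    def update(self, x, y):
        d = len(x)
        # Px = P @ x, reused for both the gain and the rank-one update.
        Px = [sum(self.P[i][j] * x[j] for j in range(d)) for i in range(d)]
        denom = 1.0 + sum(x[i] * Px[i] for i in range(d))
        gain = [v / denom for v in Px]
        # Prediction error of the current coefficients on the new point.
        error = y - sum(self.coef[i] * x[i] for i in range(d))
        self.coef = [self.coef[i] + gain[i] * error for i in range(d)]
        # Sherman-Morrison: P <- P - (P x)(P x)^T / denom (P is symmetric).
        for i in range(d):
            for j in range(d):
                self.P[i][j] -= gain[i] * Px[j]
        return self.coef


def fit_line_online(points):
    """Fit y = a + b * x one observation at a time; returns [a, b]."""
    model = RecursiveLeastSquares(n_features=2)
    for x, y in points:
        model.update([1.0, x], y)  # leading 1.0 is the intercept feature
    return model.coef
```

Because of the `ridge` initialisation, the result is a very lightly regularised fit rather than exact ordinary least squares; the difference shrinks as `ridge` approaches zero.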

There Be Bandits

Implement common strategies for Bernoulli and normal bandits:

  • Random
  • Epsilon greedy
  • Epsilon first
  • Epsilon decreasing
  • UCB
  • Thompson sampling
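To make two of the strategies above concrete, here is a small sketch of epsilon greedy and UCB (the UCB1 variant) on a simulated Bernoulli bandit. The function names, the shared `(counts, means, t, rng)` signature, and the simulation harness are assumptions made for illustration and do not reflect the package's interface.

```python
import math
import random


def epsilon_greedy(counts, means, t, rng, epsilon=0.1):
    """Explore a uniformly random arm with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    return max(range(len(counts)), key=lambda a: means[a])


def ucb1(counts, means, t, rng):
    """Pick the arm with the highest upper confidence bound (UCB1)."""
    # Play every arm once before the bound is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def run_bernoulli_bandit(probs, choose, n_rounds, seed=0):
    """Simulate a Bernoulli bandit; returns per-arm pull counts and total reward."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    means = [0.0] * len(probs)
    total_reward = 0
    for t in range(1, n_rounds + 1):
        arm = choose(counts, means, t, rng)
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        # Incremental mean update, so no reward history is stored.
        means[arm] += (reward - means[arm]) / counts[arm]
        total_reward += reward
    return counts, total_reward
```

For example, `run_bernoulli_bandit([0.1, 0.5, 0.9], ucb1, 5000)` returns the number of pulls per arm and the total reward collected. The random, epsilon first, and epsilon decreasing strategies could plug into the same `choose` slot.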

Sorting

Sort a list in an online fashion using insertion sort.
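A minimal sketch of the idea, assuming a streamer-like interface with `update` and `value` methods (both names, and the class name `OnlineSorter`, are hypothetical): each new element is inserted into its correct position in an already sorted list, which is the inner step of insertion sort.

```python
import bisect


class OnlineSorter:
    """Maintains a sorted list as values arrive one at a time."""

    def __init__(self, values=()):
        self._sorted = []
        for v in values:
            self.update(v)

    def update(self, value):
        # Binary search finds the slot in O(log n); the insertion itself
        # still shifts elements, so each update is O(n) in the worst case.
        bisect.insort(self._sorted, value)

    def value(self):
        # Return a copy so callers cannot break the sorted invariant.
        return list(self._sorted)
```

Usage: `s = OnlineSorter([3, 1]); s.update(2); s.value()` gives the three values in ascending order.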

Test coverage is poor

I think we should be aiming for test coverage in the 80% range at this point. An easy way to raise it is to add unit tests for all mean and variance streamers, which can be based on the existing tests for SMA and CMA.

Generic summary method

Implement a method for the built-in summary() function which prints a summary of the streamer's internal state.

Implement running mean and variance algorithms.

Implement streaming versions of the following statistics:

  • Variance (biased and unbiased)
  • Other streaming averages (see here)

You can base these off the Mean class. For the first, the goal is to solve for $\sigma^2_{n+1}$ in terms of $\sigma^2_n$ and $x_{n+1}$ (together with the running mean $\mu_n$ and the count $n$).
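For the biased variance, one standard form of this recurrence is $\sigma^2_{n+1} = \frac{n}{n+1}\sigma^2_n + \frac{n}{(n+1)^2}(x_{n+1} - \mu_n)^2$. In practice it is usually implemented via Welford's algorithm, which tracks the sum of squared deviations instead of the variance itself. The sketch below is one way this might look; the class name `RunningVariance` and its methods are illustrative assumptions, not the package's Mean-based design.

```python
class RunningVariance:
    """Streaming mean and variance via Welford's algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean          # deviation from the old mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # uses old and new mean

    def variance(self, unbiased=True):
        # Unbiased divides by n - 1, biased by n.
        denom = self.n - 1 if unbiased else self.n
        if denom <= 0:
            return float("nan")
        return self.m2 / denom
```

Both the biased and unbiased estimates come from the same state, so a single streamer can serve both cases listed above.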

Behaviour of SMA when window size > init length

When the window size is greater than the length of the initialisation vector, as in SMA$new(c(1, 2), window=3), the values vector is padded on the left with zeros, so that SMA$value() is initially 1 rather than 1.5. Is this the intended behaviour? If so, it might be worth adding that to the docs.
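The two candidate behaviours can be compared side by side in a small Python analogue of the R class. This is only a sketch: the Python `SMA` class and its `pad_with_zeros` flag are hypothetical and exist here purely to illustrate the question.

```python
from collections import deque


class SMA:
    """Simple moving average over the last `window` values."""

    def __init__(self, values, window, pad_with_zeros=True):
        # A bounded deque keeps only the most recent `window` values.
        self.values = deque(values, maxlen=window)
        if pad_with_zeros:
            # Behaviour described in the issue: left-pad with zeros.
            while len(self.values) < window:
                self.values.appendleft(0.0)

    def update(self, x):
        # Once the deque is full, appending drops the oldest value.
        self.values.append(x)

    def value(self):
        # Without padding this averages over the values seen so far.
        return sum(self.values) / len(self.values)
```

With the input from the issue, `SMA([1, 2], window=3).value()` is 1.0 with padding and `SMA([1, 2], window=3, pad_with_zeros=False).value()` is 1.5 without it. The two variants agree once the window has filled with real observations.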

Collaborative filtering/NMF

This requires some looking into before going ahead. What algorithms exist for online collaborative filtering?

Thompson Sampling

Thompson sampling for common distributions with conjugate priors. It might be cleanest to implement this as a general exponential family and then define special cases.
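As one concrete special case, the sketch below shows Thompson sampling for Bernoulli rewards with a conjugate Beta prior. It does not attempt the general exponential-family design suggested above; the class name `BetaBernoulliThompson`, its methods, and the uniform Beta(1, 1) default prior are assumptions made for illustration.

```python
import random


class BetaBernoulliThompson:
    """Thompson sampling for Bernoulli arms with Beta priors."""

    def __init__(self, n_arms, prior_alpha=1.0, prior_beta=1.0, seed=None):
        self.alpha = [prior_alpha] * n_arms
        self.beta = [prior_beta] * n_arms
        self.rng = random.Random(seed)

    def select_arm(self):
        # Draw one sample from each arm's posterior and play the best draw.
        samples = [
            self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Conjugate update: a success increments alpha, a failure beta.
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward


def simulate(probs, n_rounds, seed=0):
    """Run the sampler on simulated Bernoulli arms; returns pull counts."""
    env = random.Random(seed + 1)  # separate stream for the environment
    agent = BetaBernoulliThompson(len(probs), seed=seed)
    pulls = [0] * len(probs)
    for _ in range(n_rounds):
        arm = agent.select_arm()
        reward = 1 if env.random() < probs[arm] else 0
        agent.update(arm, reward)
        pulls[arm] += 1
    return pulls, agent
```

Other conjugate pairs would follow the same pattern: keep the posterior parameters per arm, sample from each posterior, and apply the closed-form update after observing a reward.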

Benchmarking

It would be helpful to add some benchmark vignettes or notebooks to demonstrate speed-ups/memory savings compared to offline methods using different implementations.
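A benchmark of this kind might look like the following sketch, which times a running mean computed by refitting on all data seen so far against an incremental update. The function names and the synthetic input are assumptions; a real vignette would benchmark the package's own streamers.

```python
import timeit


def offline_means(data):
    """Recompute the mean from scratch after every new value (O(n^2) total)."""
    seen, out = [], []
    for x in data:
        seen.append(x)
        out.append(sum(seen) / len(seen))
    return out


def streaming_means(data):
    """Maintain a running total so each new value costs O(1)."""
    total, out = 0.0, []
    for n, x in enumerate(data, start=1):
        total += x
        out.append(total / n)
    return out


def benchmark(n=2000, repeats=3):
    """Return the best wall-clock time in seconds for each approach."""
    data = [float(i % 17) for i in range(n)]
    return {
        "offline": min(
            timeit.repeat(lambda: offline_means(data), number=1, repeat=repeats)
        ),
        "streaming": min(
            timeit.repeat(lambda: streaming_means(data), number=1, repeat=repeats)
        ),
    }
```

Taking the minimum over repeats is a common way to reduce noise from other processes; memory comparisons would need a separate tool.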
