capaldi's Introduction

The Idea

Capaldi is a Python-and-R-based time series expert system. It's designed to take a generic time series (that is, floating point or integer values at a sequence of times) and return:

  • A list of candidate models fit to the data.
  • Some basic information about why these fitted models are suitable.
  • Some basic information about why other candidate models can't be reasonably fit.

This information will be returned as a mix of text and graphs, packed into a JSON object.

Status

Currently, Capaldi is set up to take a generic time series, regroup the data into a number of different date/time buckets, and run a few algorithms in bulk on it. It provides some minimal sanity checking based on the number of buckets produced and some of the properties of the data.
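As an illustration of the regrouping step, here is a minimal standard-library sketch that sums a series into fixed-width buckets at several widths. The function name `regroup`, the sample data, and the choice to sum values are assumptions made for this example; Capaldi's own code may bucket and aggregate differently.

```python
from datetime import datetime, timedelta


def regroup(series, width):
    """Sum values into consecutive fixed-width buckets.

    `series` is a list of (timestamp, value) pairs. Buckets start at the
    earliest timestamp; buckets with no observations keep a total of 0.
    """
    start = min(t for t, _ in series)
    end = max(t for t, _ in series)
    totals = [0] * ((end - start) // width + 1)
    for t, value in series:
        totals[(t - start) // width] += value
    return totals


# Hypothetical sample data: four observations over nine hours.
t0 = datetime(2020, 1, 1)
series = [
    (t0, 1),
    (t0 + timedelta(hours=1), 2),
    (t0 + timedelta(hours=2), 3),
    (t0 + timedelta(hours=9), 4),
]

# The same series regrouped at three different bucket widths.
hourly = regroup(series, timedelta(hours=1))
six_hourly = regroup(series, timedelta(hours=6))
half_daily = regroup(series, timedelta(hours=12))
```

The number of buckets produced (`len(hourly)`, `len(six_hourly)`, and so on) is the kind of quantity a sanity check could inspect before any algorithm is run.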

It is not yet set up to provide the planned expert feedback, plain text responses, or pretty graphics.

Design

Capaldi is implemented as a Python module, defined in the capaldi directory. Within the repository, the algs directory includes submodules for each individual time series algorithm. Many of these submodules are wrappers for R packages that must be hosted on a partner OpenCPU instance; OpenCPU will serve a local installation of R and make it accessible via a JSON API.
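To make the OpenCPU arrangement concrete, the sketch below builds (but does not send) a request to call an R function through OpenCPU's HTTP API, which exposes package functions at `/ocpu/library/<package>/R/<function>` and returns JSON when `/json` is appended. The helper name `build_opencpu_request`, the host and port, and the choice of `stats::arima` with these arguments are assumptions for illustration, not taken from Capaldi's wrappers.

```python
import json
from urllib.request import Request


def build_opencpu_request(base_url, package, function, args):
    """Build a POST request that calls an R function via OpenCPU.

    Arguments are sent as a JSON body; the trailing "/json" asks OpenCPU
    to return the function's result as JSON.
    """
    url = "{}/ocpu/library/{}/R/{}/json".format(
        base_url.rstrip("/"), package, function
    )
    body = json.dumps(args).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical call: fit an AR(1) model with R's stats::arima.
request = build_opencpu_request(
    "http://localhost:5656/",          # assumed OpenCPU host and port
    "stats",
    "arima",
    {"x": [1.0, 2.0, 1.5, 2.5, 2.0], "order": [1, 0, 0]},
)
```

Passing `request` to `urllib.request.urlopen` would perform the call against a running OpenCPU server. Whether a given R return value converts cleanly to JSON depends on the function, so a real wrapper may need to post-process the result on the R side.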

Sanity checks are currently hosted in algs as well but will be moved to a new checks module. Many of the algorithms depend on the same assumptions, so these can be collapsed into a simple checklist.

The principal function, capaldi() in capaldi/capaldi.py, takes a data frame, divides it into a number of chunks, tests each independence assumption, and then runs each algorithm on each chunk.
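The control flow described above can be sketched roughly as follows. This is not the body of capaldi(); the name `run_pipeline`, the dictionary-based inputs, and the report layout are hypothetical, and plain lists stand in for data frame chunks so the example stays self-contained.

```python
def run_pipeline(chunks, checks, algorithms):
    """Run every check once per chunk, then every algorithm per chunk.

    chunks:     {chunk name: list of values}
    checks:     {check name: callable(values) -> bool}
    algorithms: {algorithm name: (fit callable, [required check names])}

    An algorithm is skipped on a chunk when any check it requires failed,
    and the failed checks are recorded as the reason.
    """
    report = {}
    for chunk_name, values in chunks.items():
        passed = {name: check(values) for name, check in checks.items()}
        results = {}
        for alg_name, (fit, required) in algorithms.items():
            failed = [name for name in required if not passed[name]]
            if failed:
                results[alg_name] = {"status": "skipped", "failed_checks": failed}
            else:
                results[alg_name] = {"status": "fit", "model": fit(values)}
        report[chunk_name] = {"checks": passed, "algorithms": results}
    return report


# Hypothetical inputs: two chunks, one shared check, one toy "algorithm".
chunks = {"hourly": [1, 2, 3, 4], "daily": [5]}
checks = {"enough_points": lambda values: len(values) >= 3}
algorithms = {"mean": (lambda values: sum(values) / len(values), ["enough_points"])}

report = run_pipeline(chunks, checks, algorithms)
```

Evaluating each check once per chunk and letting algorithms refer to checks by name is one way to get the shared checklist mentioned above. The resulting dictionary is also straightforward to serialize as JSON.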

Algorithms

Currently-incorporated

Algorithm Source
ARIMA source
Markov-Modulated Poisson Processes (MMPP) source
Barry and Hartigan's Product Partition Model source

In-progress

Algorithm Source
Bayesian Structural Time Series / Google Causal Impact source
Seasonal Hybrid ESD (S-H-ESD) / Twitter Anomaly Detection source
E-Divisive With Medians (EDM) / Twitter Breakout Detection source
Kalman Filters source

Planned

Algorithm Source
... ...

capaldi's People

Contributors

pmlandwehr

capaldi's Issues

Better test data

We need test data that passes the error checks so that we can run the tests on it! (We can also disable the error checks so that the tests don't run, but that's a meh idea.)

Figure out how to cut off grouping process with luigi

Without luigi we use a for loop. When the buckets get too sparse, we break. Given that we've defined checking the empty buckets as a task, is there a luigi-ish way to break the loop? (Alternatively, we can undefine the check and throw it all into CapaldiWrapper, but that seems very wrong.)
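For reference, the non-luigi loop described in this issue might look like the sketch below. The names `empty_fraction` and `usable_widths`, the coarse-to-fine ordering, and the 0.5 threshold are assumptions for illustration; the sketch does not answer the luigi question, it only pins down the behaviour a luigi version would need to reproduce.

```python
from datetime import datetime, timedelta


def empty_fraction(timestamps, width):
    """Fraction of fixed-width buckets that contain no observations."""
    start = min(timestamps)
    n_buckets = (max(timestamps) - start) // width + 1
    occupied = {(t - start) // width for t in timestamps}
    return 1 - len(occupied) / n_buckets


def usable_widths(timestamps, widths, max_empty=0.5):
    """Walk bucket widths from coarse to fine, stopping at the first sparse one."""
    kept = []
    for width in sorted(widths, reverse=True):
        if empty_fraction(timestamps, width) > max_empty:
            # Assumption: finer widths will be at least as sparse, so stop here.
            break
        kept.append(width)
    return kept


# Hypothetical sample data: four observations over nine hours.
t0 = datetime(2020, 1, 1)
timestamps = [t0 + timedelta(hours=h) for h in (0, 1, 2, 9)]
widths = [timedelta(hours=1), timedelta(hours=6), timedelta(hours=12)]

kept = usable_widths(timestamps, widths)
```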

Simplify Twitter Anomaly Detection

Per the original code, we use a weird OrderedDict wrapper for anomaly detection. It seems like this could be a lot cleaner. Something to inspect when we have time.

Use binary search for error checks on one dimensional time buckets

The checks for whether there are too many or too few empty buckets can be treated as monotonic within a particular data frame: if X has too few buckets when split every six hours, it will also have too few buckets when split every 12 hours. As such, we should be able to do a binary search across the line of possible time slices to find the point where the bucket count crosses the minimum; we should also be able to use a histogram to find the optimal count.
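A minimal sketch of the binary search, assuming candidate widths are sorted from finest to coarsest and that the "too few buckets" check only depends on the time span covered. The names `n_buckets` and `widest_valid_width` and the sample widths are hypothetical.

```python
from datetime import timedelta


def n_buckets(span, width):
    """Number of fixed-width buckets needed to cover a time span."""
    return span // width + 1


def widest_valid_width(span, widths, min_buckets):
    """Return the widest width that still yields at least `min_buckets`.

    `widths` must be sorted ascending. Because the bucket count never grows
    as the width grows, the valid widths form a prefix of the list, and the
    boundary can be found by binary search. Returns None if no width works.
    """
    lo, hi = 0, len(widths)
    while lo < hi:
        mid = (lo + hi) // 2
        if n_buckets(span, widths[mid]) >= min_buckets:
            lo = mid + 1   # still enough buckets: try wider widths
        else:
            hi = mid       # too few buckets: every wider width fails too
    return widths[lo - 1] if lo > 0 else None


# Hypothetical candidates for a series spanning 48 hours.
span = timedelta(hours=48)
widths = [timedelta(hours=h) for h in (1, 6, 12, 24, 48)]

best = widest_valid_width(span, widths, min_buckets=4)
```

With these inputs the search evaluates three widths instead of five. The saving matters more when each evaluation involves regrouping a large data frame rather than a single division.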
