
Crucible


Crucible is a refinement and feedback suite for algorithm testing. It was designed for building anomaly detection algorithms, but it is simple enough that it can probably be extended to work with your particular domain. It evolved out of a need to test anomaly detection algorithms and rapidly generate standardized feedback while iterating on them.

How it works

Crucible takes its library of timeseries in /data and tests every algorithm in algorithms.py against each one. It builds each timeseries datapoint by datapoint and runs each algorithm at every step, as a way of simulating a production environment. For every anomaly an algorithm detects, Crucible draws a red dot at the x value where the anomaly occurred. It then saves each graph to disk in /results for you to check, grouped by algorithm-timeseries.
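The datapoint-by-datapoint replay can be sketched roughly as follows. This is a minimal illustration, not Crucible's actual code: `replay`, `above_three_times_mean`, and the `min_points` parameter are hypothetical names, and the assumption is that an algorithm receives the series seen so far and returns True when the newest point looks anomalous.

```python
def replay(timeseries, algorithm, min_points=3):
    """Feed the series to `algorithm` one datapoint at a time.

    Returns the timestamps (x values) at which the newest point was flagged.
    `min_points` is a hypothetical warm-up length, not a Crucible setting.
    """
    anomalies = []
    for end in range(min_points, len(timeseries) + 1):
        window = timeseries[:end]            # everything "seen so far"
        if algorithm(window):
            anomalies.append(window[-1][0])  # timestamp of the flagged point
    return anomalies


def above_three_times_mean(window):
    """Toy detector: flag the latest value if it exceeds 3x the mean of earlier values."""
    *history, (_, latest) = window
    mean = sum(value for _, value in history) / len(history)
    return latest > 3 * mean


series = [[0, 1.0], [1, 1.2], [2, 0.9], [3, 1.1], [4, 9.0], [5, 1.0]]
flagged = replay(series, above_three_times_mean)
print(flagged)
```

Because the detector only ever sees the points up to the current step, it cannot use future data, which is the property that makes the replay resemble a production environment.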

To be as fast as possible, Crucible launches a new process for each timeseries.

If you want to add an algorithm, simply create your algorithm in algorithms.py and add it to settings.py as well so Crucible can find it. Crucible comes loaded with a bunch of stock algorithms from an early Skyline release, but it's designed for you to write your own and test them.
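As a rough sketch of what adding an algorithm might look like: the function name `three_sigma_tail`, its single-argument signature, and the `ALGORITHMS` list of names are all assumptions made for illustration, so check algorithms.py and settings.py for the conventions Crucible actually uses.

```python
import statistics

# --- would live in algorithms.py (signature is an assumption) ---
def three_sigma_tail(timeseries):
    """Hypothetical detector: True if the newest value is more than
    three standard deviations from the mean of the earlier values."""
    values = [value for _, value in timeseries]
    if len(values) < 3:
        return False                      # not enough history to judge
    history, latest = values[:-1], values[-1]
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return latest != mean             # flat history: any change is unusual
    return abs(latest - mean) > 3 * stdev

# --- would live in settings.py (variable name is an assumption) ---
ALGORITHMS = ["three_sigma_tail"]

# Illustration of resolving the configured names to functions.
AVAILABLE = {three_sigma_tail.__name__: three_sigma_tail}
detectors = [AVAILABLE[name] for name in ALGORITHMS]

spiky = [[0, 1.0], [1, 1.1], [2, 0.9], [3, 1.0], [4, 5.0]]
calm = [[0, 1.0], [1, 1.1], [2, 0.9], [3, 1.0], [4, 1.05]]
print(detectors[0](spiky), detectors[0](calm))
```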

Dependencies

Standard Python data science suite. Everything is listed in algorithms.py.

  1. Install numpy, scipy, pandas, patsy, statsmodels, matplotlib.

  2. You may have trouble with SciPy. If you're on a Mac, try:

  • sudo port install gcc48
  • sudo ln -s /opt/local/bin/gfortran-mp-4.8 /opt/local/bin/gfortran
  • sudo pip install scipy

On Debian, apt-get works well for NumPy and SciPy. On CentOS, yum should do the trick. If not, hit the Googles, yo.

Instructions

Just call python src/crucible.py. Then check the /results folder for the results. Happy algorithming!

To add a timeseries:

Create a JSON array of the form [[timestamp, datapoint], [timestamp, datapoint]]. Put it in the /data folder. Done.
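For example, a timeseries file in that shape could be written like this. It is only a sketch: `save_timeseries` and the file name `example_metric.json` are hypothetical, and a temporary directory stands in for the /data folder so the snippet can run anywhere.

```python
import json
import os
import tempfile


def save_timeseries(points, path):
    """Write (timestamp, datapoint) pairs as a JSON array of two-element arrays."""
    cleaned = [[int(timestamp), float(value)] for timestamp, value in points]
    with open(path, "w") as handle:
        json.dump(cleaned, handle)
    return cleaned


data_dir = tempfile.mkdtemp()  # stand-in for Crucible's /data folder
path = os.path.join(data_dir, "example_metric.json")  # hypothetical file name
save_timeseries(
    [(1400000000, 1.0), (1400000060, 1.5), (1400000120, 42.0)],
    path,
)

# Read it back to confirm the on-disk shape.
with open(path) as handle:
    loaded = json.load(handle)
print(loaded)
```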

Graphite integration:

There's a small tool to easily grab Graphite data and analyze it. Just call python utils/graphite-grab.py "your_graphite.com/render/?from=-24hour&target=your.metric&format=json" and the script will grab Graphite data, format it, and put it into /data for you.
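For orientation, Graphite's JSON render output lists each datapoint as [value, timestamp], the reverse of the [timestamp, datapoint] order Crucible expects, and uses null for missing values. The sketch below shows one way such a payload could be converted. It is not the code in graphite-grab.py: `graphite_to_pairs` is a hypothetical helper, and both dropping nulls and using only the first target are assumptions.

```python
import json


def graphite_to_pairs(payload):
    """Convert a Graphite JSON render payload into [[timestamp, datapoint], ...].

    Only the first target is used, and null values are dropped
    (both choices are assumptions made for this illustration).
    """
    series = json.loads(payload)
    pairs = []
    for value, timestamp in series[0]["datapoints"]:
        if value is None:
            continue  # Graphite reports gaps as null
        pairs.append([timestamp, value])
    return pairs


# A small hand-written payload in Graphite's format=json shape.
sample = (
    '[{"target": "your.metric", "datapoints": '
    '[[1.0, 1400000000], [null, 1400000060], [3.5, 1400000120]]}]'
)
pairs = graphite_to_pairs(sample)
print(pairs)
```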

Contributions

It would be fantastic to have a robust library of canonical timeseries data. Please, if you have a timeseries that you think a good anomaly detection algorithm should be able to handle, share the love and add the timeseries to the suite!
