Giter VIP home page Giter VIP logo

hyperstream's Introduction

HyperStream logo

DOI

Build Status

HyperStream

Hyperstream is a large-scale, flexible and robust software package for processing streaming data.

Hyperstream overcomes the limitations of other computational engines and provides high-level interfaces to execute complex nesting, fusion, and prediction both in online and offline forms in streaming environments. Although developed specifically for SPHERE, Hyperstream is a general purpose tool that is well-suited for the design, development, and deployment of algorithms and predictive models in a wide space of sequential predictive problems.

This software has been designed from the outset to be domain-independent, in order to provide maximum value to the wider community. Key aspects of the software include the capability to create complex interlinked workflows, and a computational engine that is designed to be "compute-on-request", meaning that no unnecessary resources are used.

- NOTE: This software is in a stable but early beta, and hence the API may change significantly.

The system consists of the following 3 layers, from bottom up:

Tools

Tools are the computation elements. They take in input data in a standard format (dict of list of hyperstream.instance.Instance objects) and output data in a standard format (list of hyperstream.instance.Instance objects). Tools are version controlled. Minor version numbers should be used for updates that will not require recomputing streams, since the output should be identical (in expectation for stochastic streams). The output should be identical (again, in expectation for stochastic streams) regardless of whether the tool is run twice on time-ranges t1..t2 and t2..t3 or just once on the time-range t1..t3. Major version number changes will cause the stream to be recomputed.

Tool Versions

The tool versions form a major/minor/patch 3-tuple, e.g. v1.3.2. See https://pypi.python.org/pypi/semantic_version/ for details. In our setting the major version number is treated as a binary flag: 0 for development, 1 for production. Minor version numbers indicate changes that affect the output, or in the API. The patch number indicates changes that do not affect the output or API in any way (e.g. speedups).

Streams

Streams are objects that use a particular kernel for computation, with fixed parameters and filters defined that can reduce the amount of data that needs to be read from the database. The stream is physically manifested in the database (mongodb) for the time ranges that it has been computed on.

There are special data streams, for which a custom hyperstream.interface.Input or hyperstream.interface.Output objects can be defined, in order to work with custom databases or file-based storage.

Workflows

Workflows define a graph of streams. Usually, the first stream will be a special "raw" stream that pulls in data from a custom data source. Workflows can have multiple time ranges, which will cause the streams to be computed on all of the ranges given.

Installation

git clone [email protected]:IRC-SPHERE/HyperStream.git
cd HyperStream
virtuenv venv
. venv/bin/activate
pip install -r requirements.txt
python -c 'from hyperstream import HyperStream'

Running tests

Run the following command

nosetests

Note that for the MQTT logging test to succeed, you will need to have an MQTT broker running (e.g. Mosquitto). For example:

docker run -ti -p 1883:1883 -p 9001:9001 toke/mosquitto

or on OSX you will need pidof and mosquitto:

brew install pidof
brew install mosquitto
brew services start mosquitto

Running the examples

- TODO

License

This code is released under the MIT license.

Acknowledgements

This work has been funded by the UK Engineering and Physical Sciences Research Council (EPSRC) under Grant EP/K031910/1 - "SPHERE Interdisciplinary Research Collaboration".

hyperstream's People

Contributors

meeliskull avatar tdiethe avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.