
datastream.io's Introduction

datastream.io

An open-source framework for real-time anomaly detection using Python, Elasticsearch and Kibana.

Installation

The recommended installation method is to use pip within a Python 3.x virtualenv.

virtualenv --python=python3 dsio-env
source dsio-env/bin/activate
pip install -e git+https://github.com/MentatInnovations/datastream.io#egg=dsio

Usage

You can use dsio through the command line or import it in your Python code. You can visualize your data streams using the built-in Bokeh server or you can restream them to Elasticsearch and visualize them with Kibana. In either case, dsio will generate an appropriate dashboard for your stream. Also, if you invoke dsio through a Jupyter notebook, it will embed the streaming Bokeh dashboard within the same notebook.

[Screenshot: streaming Bokeh dashboard embedded in a Jupyter notebook]
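You can also drive dsio directly from Python, e.g. inside a notebook. The following is a hypothetical sketch: the import paths and the keyword arguments of restream_dataframe are assumptions, so consult examples/dsio-notebook-example.ipynb for the actual API.

import pandas as pd

# Assumed import paths -- check dsio-notebook-example.ipynb for the real ones.
from dsio.main import restream_dataframe
from dsio.anomaly_detectors import Gaussian1D

df = pd.read_csv('data/cardata_sample.csv')

# Assumed keyword names; when run inside Jupyter, the streaming Bokeh dashboard
# is embedded in the notebook instead of opening in a new browser tab.
restream_dataframe(df, detector=Gaussian1D, sensors=['engine_speed'], speed=5)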

Examples

For this section, it is best to run commands from inside the examples directory. If you have installed dsio via pip as shown above, you will need to run the following command:

cd dsio-env/src/dsio/examples

If instead you cloned the GitHub repo, then cd dsio/examples will do.

You can use the example csv datasets or provide your own. If the dataset includes a time dimension, dsio will attempt to detect it automatically. Alternatively, you can use the --timefield argument to manually configure the field that designates the time dimension. If no such field exists, dsio will assume the data is a time series starting from now with 1sec intervals between samples.

dsio data/cardata_sample.csv

The above command will load the cardata sample CSV and use the default Gaussian1D anomaly detector to score every numeric column. It will then generate an appropriate Bokeh dashboard and restream the data. A browser window should open, pointing to the generated dashboard.

[Screenshot: generated Bokeh dashboard]

You can experiment with different datasets and anomaly detectors. E.g.

dsio --detector percentile1d path_to_my_dataset/my_dataset.csv
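Assuming your dataset has a column named timestamp (a hypothetical column name), you could also point dsio at the time dimension explicitly:

dsio --timefield timestamp path_to_my_dataset/my_dataset.csv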

You can select specific columns using the --sensors argument and you can increase or decrease the streaming speed using the --speed argument.

dsio --sensors accelerator_pedal_position engine_speed --detector gaussian1d --speed 5 data/cardata_sample.csv

Elasticsearch & Kibana (optional)

To restream to an Elasticsearch instance that you're running locally and generate a Kibana dashboard, use the --es-uri and --kibana-uri arguments.

dsio --es-uri http://localhost:9200/ --kibana-uri http://localhost:5601/app/kibana data/cardata_sample.csv

If you are using localhost and the default Kibana and ES ports, you can use the shorthand:

dsio --es data/cardata_sample.csv

[Screenshot: generated Kibana dashboard]

If you don't have access to Elasticsearch and Kibana 5.x instances, you can easily start them up on your machine using the docker-compose.yaml file within the examples directory. Docker and docker-compose need to be installed for this to work.

docker-compose up -d

Check that Elasticsearch and Kibana are up.

docker-compose ps

Once you're done you can bring them down.

docker-compose down

Keep in mind that docker-compose commands need to be run in the directory where the docker-compose.yaml file resides (e.g. dsio-env/src/dsio/examples).

Defining your own anomaly detectors

You can use dsio with your own hand-coded anomaly detectors. These should inherit from the AnomalyDetector abstract base class and implement at least the train, update and score methods. You can find an example 99th-percentile anomaly detector in the examples directory. Load the Python modules that contain your detectors using the --modules argument and select the target detector by providing its class name to the --detector argument (case insensitive).

dsio --modules detector.py --detector GreaterThanMaxRolling data/cardata_sample.csv
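As a rough illustration (this is not the shipped detector.py example), a minimal custom detector might look like the following; the dsio.anomaly_detectors import path is an assumption, and the base class may require more than the three methods shown:

import numpy as np
from dsio.anomaly_detectors import AnomalyDetector  # assumed import path

class MovingMaxDetector(AnomalyDetector):
    """Hypothetical detector: flags values that exceed the maximum seen so far."""

    def train(self, x):
        # Fit on an initial batch of values.
        self.max_ = np.max(x)

    def update(self, x):
        # Fold a new streaming batch into the running maximum.
        self.max_ = max(self.max_, np.max(x))

    def score(self, x):
        # Higher means more anomalous: 1.0 above the running max, else 0.0.
        return (np.asarray(x) > self.max_).astype(float)

Saved as my_detector.py, it could then be selected with a command along the lines of:

dsio --modules my_detector.py --detector movingmaxdetector data/cardata_sample.csv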

Integration with scikit-learn

Naturally, we encourage people to use dsio in combination with sklearn: we have no wish to reinvent the wheel! However, sklearn currently supports regression, classification and clustering interfaces, but not anomaly detection as a standalone category. We are trying to correct that by introducing the AnomalyMixin: an interface for anomaly detection that follows sklearn design patterns. When you import an sklearn object, you can therefore simply define or override certain methods to make it compatible with dsio. We have provided an example for you here:

./datastream.io/examples/lof_anomaly_detector.py
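The idea, roughly sketched (this is not the shipped lof_anomaly_detector.py; the AnomalyMixin import path and the exact method set are assumptions):

import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from dsio.anomaly_detectors import AnomalyMixin  # assumed import path

class LOFAnomalyDetector(AnomalyMixin):
    """Hypothetical wrapper mapping an sklearn estimator onto train/update/score."""

    def __init__(self, **kwargs):
        # novelty=True is what allows decision_function to score unseen data.
        self.model = LocalOutlierFactor(novelty=True, **kwargs)

    def train(self, x):
        self.model.fit(np.asarray(x).reshape(-1, 1))

    def update(self, x):
        # LOF has no true incremental update; refitting on the latest batch is
        # a crude stand-in here.
        self.train(x)

    def score(self, x):
        # Negate so that larger values mean "more anomalous".
        return -self.model.decision_function(np.asarray(x).reshape(-1, 1))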

datastream.io's People

Contributors

canagnos, d-mo, harismichailidis, nicolasmota


datastream.io's Issues

It's not working

dsio --es-uri http://localhost:9200/ --kibana-uri http://localhost:5601/app/kibana data/cardata_sample.csv
Loading the data...
Done.

data found from 2013-05-24 23:51:37 to 2013-05-25 00:14:13
Converting to milliseconds ...
Done
Adding time offset of 1529302326.54 seconds
Setting speed to 1x
Done
Traceback (most recent call last):
  File "C:\Program Files\Python36\Scripts\dsio-script.py", line 11, in <module>
    load_entry_point('dsio', 'console_scripts', 'dsio')()
  File "c:\windows\system32\src\dsio\dsio\main.py", line 155, in main
    cols=int(args.cols)
  File "c:\windows\system32\src\dsio\dsio\main.py", line 70, in restream_dataframe
    port=bokeh_port, update_queue=update_queue
  File "c:\windows\system32\src\dsio\dsio\dashboard\bokeh.py", line 74, in generate_dashboard
    if io_loop._running: # Assume we're in a Jupyter notebook
AttributeError: 'AsyncIOMainLoop' object has no attribute '_running'

Local First Run Results in AttributeError Exception

Hello, when starting to test this out I was immediately faced with the issue below.

Done.

data found from 2018-03-25 14:14:30 to 2018-03-26 05:49:05
Converting to milliseconds ...
Done
Adding time offset of 0.06 seconds
Setting speed to 1x
Done
Traceback (most recent call last):
  File ".../tryingdsio/dsio-env/bin/dsio", line 9, in <module>
    load_entry_point('dsio==0.1.0', 'console_scripts', 'dsio')()
  File ".../tryingdsio/dsio-env/src/dsio/dsio/main.py", line 155, in main
    cols=int(args.cols)
  File ".../tryingdsio/dsio-env/src/dsio/dsio/main.py", line 70, in restream_dataframe
    port=bokeh_port, update_queue=update_queue
  File ".../tryingdsio/dsio-env/src/dsio/dsio/dashboard/bokeh.py", line 74, in generate_dashboard
    if io_loop._running: # Assume we're in a Jupyter notebook
AttributeError: 'AsyncIOMainLoop' object has no attribute '_running'

I'm not sure what the root cause of the issue is, but I have a non-breaking fix for myself that may also help other people. I will make a PR with the change.

Thank you!
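A note on this traceback: the working pip freeze in the requirements issue below pins tornado==4.5.3, and the missing _running attribute appears to be specific to newer tornado releases, so pinning tornado to the 4.x series is one possible workaround (an assumption, not the fix referenced above):

pip install tornado==4.5.3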

Output is not loading

When I execute restream_dataframe, the final output is not loaded. Can you help me with this issue?

Use correct library versions in requirements.txt

Datastream.io is not working because some of the library dependencies are not pinned to the correct version (tornado and elasticsearch in my case).
Here is the pip freeze (python3.5) of the fully working datastream.io:

bokeh==1.3.0
dateparser==0.7.1
-e git+https://github.com/MentatInnovations/datastream.io@a243b89ec3c4e06473b5004c498c472ffd37ead2#egg=dsio
elasticsearch==5.5.3
Jinja2==2.10.1
joblib==0.13.2
kibana-dashboard-api==0.1.2
MarkupSafe==1.1.1
numpy==1.17.0
packaging==19.0
pandas==0.24.2
Pillow==6.1.0
pyparsing==2.4.1.1
python-dateutil==2.8.0
pytz==2019.1
PyYAML==5.1.1
regex==2019.6.8
scikit-learn==0.21.2
scipy==1.3.0
six==1.12.0
tornado==4.5.3
tzlocal==2.0.0
urllib3==1.25.3

Review: utils_dashboard.py

config_json_cyber.py and config_json_car.py are two examples of scripts for generating dashboards. They do not actually read the header of a CSV; details are hard-coded. To be fixed. Also check dashboard.json and static/screenshots/dashboard.png for a target example.

dsio-notebook-example.ipynb running abnormally


[My env]
1. Windows 7 x64
2. Python 3.6.2
3. Anaconda 3

[Problem]
While I was running dsio-notebook-example.ipynb from Jupyter, the following happened:

[Screenshot: error output]

Can't find where to put novelty=True

AttributeError: decision_function is not available when novelty=False. Use novelty=True if you want to use LOF for novelty detection and compute decision_function for new unseen data. Note that the opposite LOF of the training samples is always available by considering the negative_outlier_factor_ attribute.
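For reference, novelty=True is a constructor argument of sklearn's LocalOutlierFactor, so it belongs wherever the LOF estimator is built (presumably in examples/lof_anomaly_detector.py, though that location is an assumption):

from sklearn.neighbors import LocalOutlierFactor

# decision_function only works on unseen data when novelty=True is set here
lof = LocalOutlierFactor(novelty=True)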

TypeError: search() got an unexpected keyword argument 'doc_type'

I'm using docker-compose and the exact versions of Elasticsearch and Kibana written in docker-compose.yml, but when I run dsio --es data/cardata_sample.csv I get the error below:
Traceback (most recent call last):
  File "/home/data1/Documents/dsio-env/bin/dsio", line 11, in <module>
    load_entry_point('dsio', 'console_scripts', 'dsio')()
  File "/home/data1/Documents/dsio-env/src/dsio/dsio/main.py", line 155, in main
    cols=int(args.cols)
  File "/home/data1/Documents/dsio-env/src/dsio/dsio/main.py", line 60, in restream_dataframe
    generate_kibana_dashboard(es_conn, sensors, index_name)
  File "/home/data1/Documents/dsio-env/src/dsio/dsio/dashboard/kibana.py", line 39, in generate_dashboard
    vis_list = visualizations.get_all() # list all visualizations
  File "/home/data1/Documents/dsio-env/lib/python3.6/site-packages/kibana_dashboard_api/visualizations.py", line 61, in get_all
    res = self.es.search(index=self.index, doc_type=self.doc_type, body={'query': {'match_all': {}}})
  File "/home/data1/Documents/dsio-env/lib/python3.6/site-packages/elasticsearch/client/utils.py", line 84, in _wrapped
    return func(*args, params=params, **kwargs)
TypeError: search() got an unexpected keyword argument 'doc_type'
Has anybody else had the same issue?
How can I solve this?
Thanks in advance.
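A note on this error: newer releases of the elasticsearch Python client no longer accept the doc_type argument that kibana-dashboard-api passes, while the working pip freeze in the requirements issue above pins elasticsearch==5.5.3. Installing a matching client version is therefore one possible workaround (an assumption):

pip install elasticsearch==5.5.3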

Google Colab

Hey, great package. If there is a way to get this to run on Google Colab, it would be very much appreciated. I have been struggling to convert it so that I can add it to the awesome Google Colab repo.

Thanks.

Derek

After executing "dsio data/cardata_sample.csv", cannot open the Bokeh dashboard


[env]
1. Windows 7
2. Python 3.6.2
3. Anaconda 3

[problem]
1. Executing "dsio data/cardata_sample.csv":
(tensorflow) E:\datastream\src\dsio\examples>dsio data/cardata_sample.csv
Loading the data...
Done.

data found from 2013-05-25 02:21:37 to 2013-05-25 02:44:13
Converting to milliseconds ...
Done
Adding time offset of 1528697700.11 seconds
Setting speed to 1x
Done

2. When opening the browser (localhost:5001), nothing can be seen.

Is the project still maintained?

Hi,

I hope it is 💯! Especially to have a real-time connector from Elasticsearch ... to apply anomaly detection to production data.
