
hawkular-datamining's Introduction

Hawkular


Hawkular is a modular systems-monitoring suite consisting of several sub-projects for storing metrics, alerting on incoming events, and more. Those projects are developed in their own GitHub repositories and integrated in this one.

Sub-projects are currently:

  • Hawkular-charts: Charts and other Angular visualization components used to graphically render data in Hawkular.


About this repository

In this repository we assemble the individual pieces, sub-projects, and the UI into the overall Hawkular instance.

In the root pom.xml you can set the particular versions of the components, but we cannot guarantee that every permutation of component versions works together.

Building

To build Hawkular, clone this repository and build from the top level.

$ git clone https://github.com/hawkular/hawkular.git
$ cd hawkular
$ mvn install

Once the build completes, .zip and .tgz archives will be available in the dist/target directory.

Tip
If you build with mvn install -Pdev, an uncompressed directory will be created in dist/target. A default user will be created; the username is jdoe and the password is password. This can be convenient when you are working on the project, as you won't have to unzip/untar and register a new user. The uncompressed directory can be found in dist/target/hawkular-${version}/hawkular-${version}/ and run with bin/standalone.sh, as you would normally start a WildFly server.

Please have a look at the developer documentation for more information.

License

Hawkular is released under the Apache License, Version 2.0, as described in the LICENSE document.

   Copyright 2015-2016 Red Hat, Inc.

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

hawkular-datamining's People

Contributors

jkremser, jmazzitelli, pavolloffay, ppalaga

hawkular-datamining's Issues

Specify end time of prediction

Currently it is possible to ask for predictions n steps ahead.

The API should offer the ability to ask for a prediction at any point in the future. The prediction time could be specified as a future timestamp or as a time offset in seconds.
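A future timestamp can be mapped onto the existing n-steps-ahead API by dividing the remaining time by the collection interval. A minimal sketch (the method and parameter names are illustrative, not the project's actual API):

```java
import java.util.concurrent.TimeUnit;

public class ForecastHorizon {

    /**
     * Map a future timestamp onto a number of steps ahead, given the metric's
     * collection interval. Timestamps are in milliseconds, the interval in seconds.
     */
    static int stepsAhead(long lastTimestampMs, long targetTimestampMs, long collectionIntervalSec) {
        long deltaMs = targetTimestampMs - lastTimestampMs;
        if (deltaMs <= 0) {
            throw new IllegalArgumentException("target timestamp must be in the future");
        }
        long intervalMs = TimeUnit.SECONDS.toMillis(collectionIntervalSec);
        // Round up so the forecast horizon covers the requested instant
        return (int) ((deltaMs + intervalMs - 1) / intervalMs);
    }

    public static void main(String[] args) {
        // 90 seconds ahead with a 60 s interval -> 2 steps
        System.out.println(stepsAhead(1_000_000L, 1_090_000L, 60));
    }
}
```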

Get number positivity

I have a rule that emits random numbers between -50 and 50. Is there any way to predict the sign (positive or negative) of the next value with at least 90% accuracy, based on the historical record?
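If the values are genuinely independent and uniform on [-50, 50], the history carries no information about the next sign, so no predictor can do better than roughly 50% accuracy, let alone 90%. A quick simulation (a standalone sketch, not part of this project) shows a majority-sign predictor staying near chance:

```java
import java.util.Random;

public class SignPrediction {

    /** Predict the sign of each next draw as the majority sign seen so far. */
    public static double majoritySignAccuracy(long seed, int n) {
        Random rnd = new Random(seed);
        int positivesSeen = 0, seen = 0, correct = 0;
        for (int i = 0; i < n; i++) {
            double value = rnd.nextDouble() * 100 - 50;  // uniform on [-50, 50)
            // Before the first draw there is no history; default to "positive"
            boolean predictPositive = seen == 0 || positivesSeen * 2 >= seen;
            if (predictPositive == (value >= 0)) {
                correct++;
            }
            seen++;
            if (value >= 0) {
                positivesSeen++;
            }
        }
        return (double) correct / n;
    }

    public static void main(String[] args) {
        // Accuracy hovers around 0.5 for i.i.d. uniform data
        System.out.println(majoritySignAccuracy(42L, 100_000));
    }
}
```

Only if the series were autocorrelated (e.g. trending) could a model such as exponential smoothing raise the accuracy above chance.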

Slim down distribution

At the moment the app is deployed and distributed as a WildFly server. The project only exposes a REST API, so it makes sense to use something "smaller", for example Spring Boot or WildFly Swarm.

There are issues with the calculation of CollectionInterval and timestamp

    /**
     * @param collectionInterval collection interval in seconds
     */
    void setCollectionInterval(Long collectionInterval);

As declared by this interface, the collection interval of a Metric is expressed in seconds.
However, subsequent code does not apply collectionInterval to timestamps consistently.

    @Override
    public List<DataPoint> forecast(int nAhead) {

        List<DataPoint> result = new ArrayList<>(nAhead);

        for (int i = 1; i <= nAhead; i++) {
            PredictionResult predictionResult = calculatePrediction(i, null, null);
            DataPoint predictedPoint = new DataPoint(predictionResult.value,
                    lastTimestamp + i * metricContext.getCollectionInterval() * 1000,
                    predictionResult.value + predictionIntervalMultiplier * predictionResult.sdOfResiduals,
                    predictionResult.value - predictionIntervalMultiplier * predictionResult.sdOfResiduals);

            result.add(predictedPoint);
        }

        return result;
    }

In this implementation, lastTimestamp + i * metricContext.getCollectionInterval() * 1000 is used to calculate the timestamp of the predicted point, indicating that timestamps are measured in milliseconds.

However, in the following overload the collectionInterval is added to the timestamp directly, without being converted to milliseconds.

    @Override
    public DataPoint forecast() {
        PredictionResult predictionResult = calculatePrediction(1, null, null);

        return new DataPoint(predictionResult.value, lastTimestamp + metricContext.getCollectionInterval(),
                predictionResult.value + predictionIntervalMultiplier * predictionResult.sdOfResiduals,
                predictionResult.value - predictionIntervalMultiplier * predictionResult.sdOfResiduals);
    }

I am puzzled by this inconsistency and hope someone can clarify it. If it is indeed a bug, I am willing to help with the fix.

Send AsyncResponse to Engine

  • Remove polling from RestPredictions. The async response should be sent over JMS and resumed in PredictionResultListener.
  • Add an async response object to predictionRequest
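The shape of the design can be sketched without the JAX-RS and JMS types: pending responses are parked in a map keyed by a correlation id, and the result listener resumes the matching one. The class and method names below are illustrative, not the project's actual code:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class PendingPredictions {

    // Pending responses keyed by correlation id; in the real service the value
    // would be a JAX-RS AsyncResponse rather than a CompletableFuture.
    private final Map<String, CompletableFuture<double[]>> pending = new ConcurrentHashMap<>();

    /** Called from the REST layer: park the response and send the request over JMS. */
    public CompletableFuture<double[]> submit(String correlationId) {
        CompletableFuture<double[]> response = new CompletableFuture<>();
        pending.put(correlationId, response);
        // ... send the prediction request to the engine via JMS here ...
        return response;
    }

    /** Called from the JMS result listener: resume the parked response. */
    public void onPredictionResult(String correlationId, double[] forecast) {
        CompletableFuture<double[]> response = pending.remove(correlationId);
        if (response != null) {
            response.complete(forecast);
        }
    }
}
```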

Metrics averaging before inserting into the model

Exponential smoothing models tend to work better on buckets when metrics are collected at high frequency (e.g. the heap-used metric). It is therefore better to average N values and use the result as input to the model. Another option is to apply a simple moving average and use its output as input to double exponential smoothing.
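The bucketing step can be sketched as follows (an illustrative helper, not taken from the project): average every N consecutive raw values and feed only the bucket means to the smoothing model.

```java
import java.util.ArrayList;
import java.util.List;

public class MetricBucketing {

    /** Average consecutive groups of bucketSize values; a trailing partial bucket is dropped. */
    static List<Double> bucketAverages(List<Double> raw, int bucketSize) {
        List<Double> averages = new ArrayList<>();
        for (int start = 0; start + bucketSize <= raw.size(); start += bucketSize) {
            double sum = 0;
            for (int i = start; i < start + bucketSize; i++) {
                sum += raw.get(i);
            }
            averages.add(sum / bucketSize);
        }
        return averages;
    }

    public static void main(String[] args) {
        System.out.println(bucketAverages(List.of(1.0, 3.0, 2.0, 4.0, 10.0), 2)); // [2.0, 3.0]
    }
}
```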

Rework ModelOptimizer

ModelOptimizer should be accessible directly from TimeSeriesModel (builder).

Persist model metadata

Model weights should probably be persisted, either on application shutdown or on every update.
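Either way, the state to persist is small: for double exponential smoothing it is essentially the level and trend estimates. A sketch using java.util.Properties (the field names are hypothetical, not the project's):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.Properties;

public class ModelState {
    double level;
    double trend;

    /** Write the model weights, e.g. from a shutdown hook. */
    void save(Writer out) throws IOException {
        Properties props = new Properties();
        props.setProperty("level", Double.toString(level));
        props.setProperty("trend", Double.toString(trend));
        props.store(out, "exponential smoothing state");
    }

    /** Restore the weights on startup. */
    void load(Reader in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        level = Double.parseDouble(props.getProperty("level"));
        trend = Double.parseDouble(props.getProperty("trend"));
    }
}
```

Persisting on every update is safer but adds I/O per data point; a shutdown hook is cheaper but loses state on a crash.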

Remove duplicate log messages

The log contains duplicate messages that should be eliminated.

This is probably caused by

12:16:49,140 ERROR [stderr] (ServerService Thread Pool -- 59) Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 

https://github.com/apache/spark/blob/master/core/src/main/resources/org/apache/spark/log4j-defaults.properties

Log example:

12:16:49,370 INFO  [org.apache.spark.SecurityManager] (ServerService Thread Pool -- 59) Changing view acls to: pavol
15/09/23 12:16:49 INFO SecurityManager: Changing view acls to: pavol
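One common remedy (an assumption, not verified against this deployment) is to ship a log4j.properties on the application classpath so Spark does not fall back to its bundled defaults and attach a second appender. For example, mirroring Spark's own default file:

```properties
# Placed on the classpath so Spark does not load its bundled log4j-defaults.properties
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %p %c: %m%n
```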
