hawkular / hawkular-datamining
Real-time time series prediction library with standalone server
Exponential smoothing models tend to work better on buckets for metrics that are collected with high frequency (e.g. the heap-used metric). Therefore it is better to average N values and use the result as input to the model. Another solution could be to use a simple moving average as input for double exponential smoothing.
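The bucketing idea above can be sketched as follows. This is an illustrative helper, not part of the hawkular-datamining API; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: average every N raw samples into one bucket before feeding the
// bucketed series to an exponential smoothing model. A trailing partial
// bucket (fewer than N samples) is dropped.
class Bucketing {

    public static List<Double> bucketAverage(List<Double> raw, int bucketSize) {
        List<Double> buckets = new ArrayList<>();
        for (int i = 0; i + bucketSize <= raw.size(); i += bucketSize) {
            double sum = 0;
            for (int j = i; j < i + bucketSize; j++) {
                sum += raw.get(j);
            }
            buckets.add(sum / bucketSize);
        }
        return buckets;
    }
}
```

For example, with a bucket size of 2 the series [1, 2, 3, 4] becomes [1.5, 3.5], which is then used as the model input.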
ModelOptimizer should be accessible directly from TimeSeriesModel (builder).
Listen on inventory events such as collection interval changes and metric creation/deletion.
The metrics integration module should contain a metrics listener and a sender that forwards predicted metrics to alerts.
/**
 * @param collectionInterval collection interval in seconds
 */
void setCollectionInterval(Long collectionInterval);
As declared by this interface, the collection interval of a Metric should be in seconds.
However, in subsequent use, timestamps are not adjusted by collectionInterval consistently.
@Override
public List<DataPoint> forecast(int nAhead) {
    List<DataPoint> result = new ArrayList<>(nAhead);
    for (int i = 1; i <= nAhead; i++) {
        PredictionResult predictionResult = calculatePrediction(i, null, null);
        DataPoint predictedPoint = new DataPoint(predictionResult.value,
                lastTimestamp + i * metricContext.getCollectionInterval() * 1000,
                predictionResult.value + predictionIntervalMultiplier * predictionResult.sdOfResiduals,
                predictionResult.value - predictionIntervalMultiplier * predictionResult.sdOfResiduals);
        result.add(predictedPoint);
    }
    return result;
}
In this implementation, lastTimestamp + i * metricContext.getCollectionInterval() * 1000
is used to calculate the timestamp of the prediction point, indicating that the timestamp is measured in milliseconds.
However, in the following method the collection interval is added to the timestamp directly, without being converted to milliseconds:
@Override
public DataPoint forecast() {
    PredictionResult predictionResult = calculatePrediction(1, null, null);
    return new DataPoint(predictionResult.value,
            lastTimestamp + metricContext.getCollectionInterval(),
            predictionResult.value + predictionIntervalMultiplier * predictionResult.sdOfResiduals,
            predictionResult.value - predictionIntervalMultiplier * predictionResult.sdOfResiduals);
}
I am puzzled by this discrepancy and hope someone can clarify it. If there is indeed an inconsistency, I am willing to help with the fix.
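If the javadoc is right that timestamps are milliseconds and the collection interval is seconds, then the one-step `forecast()` would need the same seconds-to-milliseconds conversion as `forecast(int nAhead)`. A minimal sketch of the consistent timestamp calculation (class and method names are illustrative, not from the project):

```java
// Sketch: compute the timestamp of the next predicted point, assuming
// lastTimestamp is in milliseconds and the collection interval is in
// seconds, as the setCollectionInterval javadoc states.
class ForecastTimestamp {

    public static long nextTimestamp(long lastTimestampMs, long collectionIntervalSeconds) {
        // seconds -> milliseconds, matching the nAhead variant's "* 1000"
        return lastTimestampMs + collectionIntervalSeconds * 1000L;
    }
}
```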
Avoid model overfitting
I have a rule that produces random numbers between -50 and 50. Is there any way to predict the sign (positive or negative) of the next value with at least 90% accuracy based on the historical record?
It's quite new, but it's worth investigating whether it has better results than the currently used StreamingLinearRegressionWithSGD:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/RandomForestRegressor.scala
Edit: I am not sure whether it can be used in a streaming manner.
This also involves a little refactoring (package names).
Model weights should probably be persisted, either on application shutdown or on every update.
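One way to handle the shutdown case is a JVM shutdown hook that serializes the current weights. This is a hedged sketch under that assumption; the class and method names are hypothetical, not the project's persistence layer:

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: serialize model weights to disk, and optionally register a JVM
// shutdown hook so the latest weights survive an application restart.
class WeightPersistence {

    public static void persist(double[] weights, Path target) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(target))) {
            out.writeObject(weights);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static double[] load(Path source) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(source))) {
            return (double[]) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void persistOnShutdown(double[] weights, Path target) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> persist(weights, target)));
    }
}
```

Persisting on every update is safer against crashes but adds I/O per data point; the shutdown-hook approach only covers orderly shutdowns.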
A model should be considered successfully initialized only when a specific number of historic values is available (e.g. 50 for ARIMA). This would prevent a model that is not fully initialized from sending automatic predictions to the bus.
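The gating idea can be sketched as a simple counter on the learning path; callers check it before publishing predictions. Names here are illustrative, not the project's API:

```java
// Sketch: a model reports itself as initialized only after it has learned
// a minimum number of historic points; publishers skip sending automatic
// predictions to the bus until then.
class InitializationGate {

    private final int minPoints; // e.g. 50 for an ARIMA-style model
    private int seen;

    public InitializationGate(int minPoints) {
        this.minPoints = minPoints;
    }

    public void learn(double value) {
        seen++;
        // model state would be updated here
    }

    public boolean isInitialized() {
        return seen >= minPoints;
    }
}
```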
The log contains duplicate messages, which should be removed.
This is probably caused by:
12:16:49,140 ERROR [stderr] (ServerService Thread Pool -- 59) Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Log example:
12:16:49,370 INFO [org.apache.spark.SecurityManager] (ServerService Thread Pool -- 59) Changing view acls to: pavol
15/09/23 12:16:49 INFO SecurityManager: Changing view acls to: pavol
Currently it is possible to ask for predictions n steps ahead.
The API should offer the ability to ask for a prediction at any time in the future. The prediction time could be specified as a future timestamp or as a time offset in seconds.
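A timestamp-based request could be mapped onto the existing n-steps-ahead API. This sketch assumes millisecond timestamps and a collection interval in seconds (consistent with the interface javadoc); the names are hypothetical:

```java
// Sketch: convert a target future timestamp into the number of prediction
// steps ahead, rounding up so the returned step covers the target time.
class ForecastAt {

    public static int stepsAhead(long lastTimestampMs, long targetTimestampMs,
                                 long collectionIntervalSeconds) {
        long intervalMs = collectionIntervalSeconds * 1000L;
        long delta = targetTimestampMs - lastTimestampMs;
        if (delta <= 0) {
            throw new IllegalArgumentException("target must be in the future");
        }
        return (int) ((delta + intervalMs - 1) / intervalMs); // ceiling division
    }
}
```

The result can then be passed straight to the existing `forecast(int nAhead)` method.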
Add predicted curve of heap usage metric to hawkular web console.
Predicted metrics could be exposed and collected back with Prometheus.
There are no public methods to get the parameters of an exponential smoothing model,
for example getAlpha, getBeta, getLevel, etc.
These parameter values are only visible via the model's toString() method.
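Exposing the parameters could look like the following sketch. The class name and the exact parameter set are illustrative assumptions, not the project's actual model class:

```java
// Sketch: plain accessors for the smoothing parameters and state that are
// currently only visible through toString().
class DoubleExponentialSmoothingState {

    private final double alpha; // level smoothing parameter
    private final double beta;  // trend smoothing parameter
    private final double level; // current level estimate
    private final double trend; // current trend estimate

    public DoubleExponentialSmoothingState(double alpha, double beta,
                                           double level, double trend) {
        this.alpha = alpha;
        this.beta = beta;
        this.level = level;
        this.trend = trend;
    }

    public double getAlpha() { return alpha; }
    public double getBeta()  { return beta; }
    public double getLevel() { return level; }
    public double getTrend() { return trend; }
}
```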
Calculate prediction intervals for forecasted points
Steps to reproduce
hawkular-datamining-dist/target/wildfly-9.0.0.Final/bin/standalone.sh --debug 9797
curl -X GET http://localhost:9080/hawkular/datamining
At the moment the app is deployed and distributed as a WildFly server. The project only exposes a REST API, so it makes sense to use something "smaller", for example Spring Boot or WildFly Swarm.
Optimize the SGD parameters to achieve better convergence.