hawkular / hawkular-datamining
Real-time time series prediction library with standalone server
Exponential smoothing models tend to work better on buckets for metrics that are collected with high frequency (e.g. the heap-used metric). Therefore it is better to average N values and use the result as input to the model. Another solution could be to use a simple moving average as input for double exponential smoothing.
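The bucketing idea above can be sketched as follows. This is an illustrative helper, not part of the hawkular-datamining API; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: average every N raw samples into one bucket before feeding the
// bucketed series to an exponential smoothing model. A trailing partial
// bucket (fewer than N samples) is dropped.
class Bucketing {

    public static List<Double> bucketAverage(List<Double> raw, int bucketSize) {
        List<Double> buckets = new ArrayList<>();
        for (int i = 0; i + bucketSize <= raw.size(); i += bucketSize) {
            double sum = 0;
            for (int j = i; j < i + bucketSize; j++) {
                sum += raw.get(j);
            }
            buckets.add(sum / bucketSize);
        }
        return buckets;
    }
}
```

For example, with a bucket size of 2 the series [1, 2, 3, 4] becomes [1.5, 3.5], which is then used as the model input.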
ModelOptimizer should be accessible directly from TimeSeriesModel (builder).
Listen on inventory events such as collection interval changes and metric creation/deletion.
The metrics integration module should contain a metrics listener and a sender that forwards predicted metrics to alerts.
/**
 * @param collectionInterval collection interval in seconds
 */
void setCollectionInterval(Long collectionInterval);
As declared by this interface, the collection interval of a Metric should be in seconds.
However, in subsequent use, timestamps are not adjusted by collectionInterval consistently.
@Override
public List<DataPoint> forecast(int nAhead) {
    List<DataPoint> result = new ArrayList<>(nAhead);
    for (int i = 1; i <= nAhead; i++) {
        PredictionResult predictionResult = calculatePrediction(i, null, null);
        DataPoint predictedPoint = new DataPoint(predictionResult.value,
                lastTimestamp + i * metricContext.getCollectionInterval() * 1000,
                predictionResult.value + predictionIntervalMultiplier * predictionResult.sdOfResiduals,
                predictionResult.value - predictionIntervalMultiplier * predictionResult.sdOfResiduals);
        result.add(predictedPoint);
    }
    return result;
}
In this implementation, lastTimestamp + i * metricContext.getCollectionInterval() * 1000
is used to calculate the timestamp of the prediction point, indicating that the timestamp is measured in milliseconds.
However, in the following method the collection interval is added to the timestamp directly, without being converted to milliseconds:
@Override
public DataPoint forecast() {
    PredictionResult predictionResult = calculatePrediction(1, null, null);
    return new DataPoint(predictionResult.value,
            lastTimestamp + metricContext.getCollectionInterval(),
            predictionResult.value + predictionIntervalMultiplier * predictionResult.sdOfResiduals,
            predictionResult.value - predictionIntervalMultiplier * predictionResult.sdOfResiduals);
}
I am puzzled by this discrepancy and hope someone can clarify it. If there is indeed an inconsistency, I am willing to help with the fix.
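If the javadoc is right that timestamps are milliseconds and the collection interval is seconds, then the one-step `forecast()` would need the same seconds-to-milliseconds conversion as `forecast(int nAhead)`. A minimal sketch of the consistent timestamp calculation (class and method names are illustrative, not from the project):

```java
// Sketch: compute the timestamp of the next predicted point, assuming
// lastTimestamp is in milliseconds and the collection interval is in
// seconds, as the setCollectionInterval javadoc states.
class ForecastTimestamp {

    public static long nextTimestamp(long lastTimestampMs, long collectionIntervalSeconds) {
        // seconds -> milliseconds, matching the nAhead variant's "* 1000"
        return lastTimestampMs + collectionIntervalSeconds * 1000L;
    }
}
```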
Avoid model overfitting
I have a rule that produces random numbers between -50 and 50. Is there any way to predict the sign (positive or negative) of the next value with at least 90% accuracy based on the historical record?
It's quite new, but it's worth investigating whether it has better results than the currently used StreamingLinearRegressionWithSGD:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/RandomForestRegressor.scala
Edit: I am not sure whether it can be used in a streaming manner.
This also involves a little refactoring (package names).
Model weights should probably be persisted, either on application shutdown or on every update.
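One way to handle the shutdown case is a JVM shutdown hook that serializes the current weights. This is a hedged sketch under that assumption; the class and method names are hypothetical, not the project's persistence layer:

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: serialize model weights to disk, and optionally register a JVM
// shutdown hook so the latest weights survive an application restart.
class WeightPersistence {

    public static void persist(double[] weights, Path target) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(target))) {
            out.writeObject(weights);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static double[] load(Path source) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(source))) {
            return (double[]) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void persistOnShutdown(double[] weights, Path target) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> persist(weights, target)));
    }
}
```

Persisting on every update is safer against crashes but adds I/O per data point; the shutdown-hook approach only covers orderly shutdowns.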
A model should be considered successfully initialized only when a specific number of historic values is available (e.g. 50 for ARIMA). This would prevent a model that is not fully initialized from sending automatic predictions to the bus.
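The gating idea can be sketched as a simple counter on the learning path; callers check it before publishing predictions. Names here are illustrative, not the project's API:

```java
// Sketch: a model reports itself as initialized only after it has learned
// a minimum number of historic points; publishers skip sending automatic
// predictions to the bus until then.
class InitializationGate {

    private final int minPoints; // e.g. 50 for an ARIMA-style model
    private int seen;

    public InitializationGate(int minPoints) {
        this.minPoints = minPoints;
    }

    public void learn(double value) {
        seen++;
        // model state would be updated here
    }

    public boolean isInitialized() {
        return seen >= minPoints;
    }
}
```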
The log contains duplicate messages, which should be removed.
This is probably caused by:
12:16:49,140 ERROR [stderr] (ServerService Thread Pool -- 59) Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Log example:
12:16:49,370 INFO [org.apache.spark.SecurityManager] (ServerService Thread Pool -- 59) Changing view acls to: pavol
15/09/23 12:16:49 INFO SecurityManager: Changing view acls to: pavol
Currently it is possible to ask for predictions n steps ahead.
The API should offer the ability to ask for a prediction at any time in the future. The prediction time could be specified as a future timestamp or as a time offset in seconds.
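A timestamp-based request could be mapped onto the existing n-steps-ahead API. This sketch assumes millisecond timestamps and a collection interval in seconds (consistent with the interface javadoc); the names are hypothetical:

```java
// Sketch: convert a target future timestamp into the number of prediction
// steps ahead, rounding up so the returned step covers the target time.
class ForecastAt {

    public static int stepsAhead(long lastTimestampMs, long targetTimestampMs,
                                 long collectionIntervalSeconds) {
        long intervalMs = collectionIntervalSeconds * 1000L;
        long delta = targetTimestampMs - lastTimestampMs;
        if (delta <= 0) {
            throw new IllegalArgumentException("target must be in the future");
        }
        return (int) ((delta + intervalMs - 1) / intervalMs); // ceiling division
    }
}
```

The result can then be passed straight to the existing `forecast(int nAhead)` method.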
Add predicted curve of heap usage metric to hawkular web console.
Predicted metrics could be exposed and collected back with Prometheus.
There are no public methods to get the parameters of an exponential smoothing model,
for example getAlpha, getBeta, getLevel, etc.
These parameter values are only visible via the model's toString() method.
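Exposing the parameters could look like the following sketch. The class name and the exact parameter set are illustrative assumptions, not the project's actual model class:

```java
// Sketch: plain accessors for the smoothing parameters and state that are
// currently only visible through toString().
class DoubleExponentialSmoothingState {

    private final double alpha; // level smoothing parameter
    private final double beta;  // trend smoothing parameter
    private final double level; // current level estimate
    private final double trend; // current trend estimate

    public DoubleExponentialSmoothingState(double alpha, double beta,
                                           double level, double trend) {
        this.alpha = alpha;
        this.beta = beta;
        this.level = level;
        this.trend = trend;
    }

    public double getAlpha() { return alpha; }
    public double getBeta()  { return beta; }
    public double getLevel() { return level; }
    public double getTrend() { return trend; }
}
```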
Calculate prediction intervals for forecasted points
Steps to reproduce
hawkular-datamining-dist/target/wildfly-9.0.0.Final/bin/standalone.sh --debug 9797
curl -X GET http://localhost:9080/hawkular/datamining
At the moment the app is deployed and distributed as a WildFly server. The project only exposes a REST API, so it makes sense to use something "smaller", for example Spring Boot or WildFly Swarm.
Optimize the SGD parameters to achieve better convergence.