
anomalous-acm's Introduction

Anomalous time-series R Package

It is becoming increasingly common for organizations to collect very large amounts of data over time, and to need to detect unusual or anomalous time series. For example, Yahoo has banks of mail servers that are monitored over time. Many measurements on server performance are collected every hour for each of thousands of servers. A common use-case is to identify servers that are behaving unusually. Methods in this package compute a vector of features on each time series, measuring characteristics of the series. For example, the features may include lag correlation, strength of seasonality, spectral entropy, etc. Then a robust principal component decomposition is used on the features, and various bivariate outlier detection methods are applied to the first two principal components. This enables the most unusual series, based on their feature vectors, to be identified. The bivariate outlier detection methods used are based on highest density regions and alpha-hulls. For demo purposes, this package contains both synthetic and real data from Yahoo.
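The feature, decomposition, and outlier-scoring steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the three features are picked for the example, ordinary `prcomp()` stands in for the robust decomposition, and distance from the componentwise median stands in for the highest density region and alpha-hull methods.

```r
set.seed(1)

# 100 synthetic series of length 30, one per column
z <- matrix(rnorm(3000), ncol = 100)
# Make series 7 unusual by adding a strong linear trend
z[, 7] <- z[, 7] + seq(0, 15, length.out = nrow(z))

# A few simple features per series (illustrative, not the tsmeasures() set)
features <- t(apply(z, 2, function(x) c(
  lag1  = acf(x, lag.max = 1, plot = FALSE)$acf[2],  # lag-1 autocorrelation
  sd    = sd(x),
  range = diff(range(x))
)))

# Ordinary PCA on scaled features (assumption: stand-in for the robust version)
pc <- prcomp(features, center = TRUE, scale. = TRUE)$x[, 1:2]

# Crude outlier score: Euclidean distance from the componentwise median
# (assumption: stand-in for the HDR and alpha-hull methods)
centre <- apply(pc, 2, median)
score <- sqrt(rowSums(sweep(pc, 2, centre)^2))

# Indices of the three most unusual series
most_unusual <- order(score, decreasing = TRUE)[1:3]
```

Here `most_unusual` holds the column indices of the series whose feature vectors lie furthest from the bulk of the data in the first two principal components.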

A cut-down version of this package under a GPL licence is available from http://github.com/robjhyndman/anomalous.

Installation

You can install the package using

# install.packages("devtools")
devtools::install_github("robjhyndman/anomalous-acm")

Simple Example

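  library(anomalousACM)
  # 100 random series of 30 observations each, with frequency 4
  z <- ts(matrix(rnorm(3000),ncol=100),freq=4)
  # Compute a vector of features for each series
  y <- tsmeasures(z)
  biplot.features(y)
  # Identify the most unusual series from their features
  anomaly(y)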

License

This package is free and open source software, licensed under ACM.

anomalous-acm's People

Contributors

cbergmeir, earowang, nlaptev, robjhyndman

anomalous-acm's Issues

Demo error: plot(dat0[, hdr])

Plot top 3 anomalous time series by bivariate kernel density

plot(dat0[, hdr])
Error in plot(dat0[, hdr]) :
  error in evaluating the argument 'x' in selecting a method for function 'plot': Error in `[.default`(dat0, , hdr) : invalid subscript type 'list'

dat0 looks ok

summary(hdr)
       Length Class  Mode
index   3     -none- numeric
scores 88     -none- numeric

head(hdr)
$index
[1] 25 22 42

$scores
            [,1]        [,2]
[1,] -3.41964341 -1.89456155
[2,]  0.87844942  2.12278092
[3,] -2.23081484 -1.13704932
[4,]  0.35507648 -1.51430496
...

The first plots looked ok.
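The output above shows that hdr is a list with an index element and a scores element, which would explain the "invalid subscript type 'list'" message. A minimal sketch of the difference follows, assuming dat0 is a matrix-like object with one series per column. Both objects below are stand-ins built for illustration, not the demo's data.

```r
set.seed(1)

# Stand-in for dat0: 30 observations of 50 series, one per column
dat0 <- ts(matrix(rnorm(30 * 50), ncol = 50))

# Stand-in for hdr, mirroring the structure printed above
hdr <- list(index = c(25, 22, 42), scores = matrix(rnorm(88), ncol = 2))

# Subsetting with the whole list fails; capture the message instead of stopping
res <- tryCatch(dat0[, hdr], error = function(e) conditionMessage(e))

# Subsetting with the integer vector inside the list selects the three series
top3 <- dat0[, hdr$index]
```

Under that assumption, passing `dat0[, hdr$index]` to `plot()` should draw the three selected series rather than raising the error.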

sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] anomalousACM_0.1.0 alphahull_2.0 splancs_2.01-37 sp_1.1-0 spatstat_1.42-0
[6] sgeostat_1.0-25 tripack_1.3-6 ggplot2_1.0.1 ForeCA_0.2.2 ifultools_2.0-1
[11] MASS_7.3-39 splus2R_1.2-0 devtools_1.8.0

loaded via a namespace (and not attached):
[1] pcaPP_1.9-60 Rcpp_0.11.6 sapa_2.0-1 git2r_0.10.1 plyr_1.8.2
[6] ks_1.9.4 bitops_1.0-6 tools_3.2.0 digest_0.6.8 goftest_1.0-2
[11] memoise_0.2.1 gtable_0.1.2 nlme_3.1-120 lattice_0.20-31 mgcv_1.8-6
[16] Matrix_1.2-0 RcppRoll_0.2.2 mvtnorm_1.0-2 proto_0.3-10 httr_0.6.1
[21] stringr_1.0.0 rversions_1.0.0 grid_3.2.0 hdrcde_3.1 rgl_0.95.1247
[26] XML_3.98-1.2 polyclip_1.3-2 reshape2_1.4.1 deldir_0.1-9 magrittr_1.5
[31] tensor_1.5 scales_0.2.4 misc3d_0.8-4 abind_1.4-3 colorspace_1.2-6
[36] KernSmooth_2.23-14 stringi_0.4-1 RCurl_1.95-4.6 munsell_0.4.2

Find outliers with more dimensions than 2

Hi,
Is there a way to detect outliers taking into account more than the first 2 PCs? For example, PC1 + PC2 + PC3 + PC4, if I find that I need all of them to cover enough of the cumulative variance of my original dataset. Would it be possible to modify your source code to make this work?
Thanks

I am getting this error during installation

curl: (56) OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104
Error in download.file(url, destfile, method, mode = "wb", ...) :
'curl' call had nonzero exit status
Warning in download.packages(x$name, destdir = dest_dir, repos = x$repos, :
download of package ‘rgl’ failed
Installation failed: subscript out of bounds
'/home/test/anaconda3/lib/R/bin/R' --no-site-file --no-environ --no-save
--no-restore --quiet CMD INSTALL '/tmp/RtmpNgn3Bc/devtools2f8643039054/ks'
--library='/home/test/anaconda3/lib/R/library' --install-tests

ERROR: dependency ‘rgl’ is not available for package ‘ks’

* removing ‘/home/test/anaconda3/lib/R/library/ks’
Installation failed: Command failed (1)
'/home/test/anaconda3/lib/R/bin/R' --no-site-file --no-environ --no-save
--no-restore --quiet CMD INSTALL
'/tmp/RtmpNgn3Bc/devtools2f86109209dc/hdrcde'
--library='/home/test/anaconda3/lib/R/library' --install-tests

ERROR: dependency ‘ks’ is not available for package ‘hdrcde’

* removing ‘/home/test/anaconda3/lib/R/library/hdrcde’
Installation failed: Command failed (1)
'/home/test/anaconda3/lib/R/bin/R' --no-site-file --no-environ --no-save
--no-restore --quiet CMD INSTALL
'/tmp/RtmpNgn3Bc/devtools2f867c367d45/robjhyndman-anomalous-acm-79d50aa'
--library='/home/test/anaconda3/lib/R/library' --install-tests

ERROR: dependencies ‘alphahull’, ‘hdrcde’ are not available for package ‘anomalousACM’

* removing ‘/home/test/anaconda3/lib/R/library/anomalousACM’
Installation failed: Command failed (1)

I get a similar error while installing the cut-down version.
My OS is Ubuntu 17.04.

Calculation of trend wrong for series without seasonality?

In line 184 of tsmeasures.R, you have:

trend0 <- fitted(mgcv::gam(contx ~ s(tt)))
remainder <- contx - trend0
deseason <- contx - trend0
v.adj <- var(trend0, na.rm = TRUE)

It seems to me that it needs to be:
v.adj <- var(remainder, na.rm = TRUE)

instead. This would be in line with the seasonal case and with a slide deck that I read on the topic (http://robjhyndman.com/seminars/big-time-series/). Also, I was getting trends of 1 for series that don't seem to have a trend, and trend of 0 with series that seem to have one.
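To illustrate why the variance of the remainder is the natural quantity here, the following sketch computes a trend strength for a non-seasonal series. It assumes the definition 1 - var(remainder) / var(series), clipped to [0, 1], which may differ from the exact denominator used in tsmeasures.R. `loess()` is used as a base-R stand-in for the `mgcv::gam()` smoother, and `trend_strength` is a hypothetical helper name.

```r
# Hypothetical helper: trend strength of a non-seasonal series.
# Assumption: strength = 1 - var(remainder) / var(series), clipped to [0, 1].
trend_strength <- function(x) {
  tt <- seq_along(x)
  trend0 <- fitted(loess(x ~ tt))        # stand-in for the gam() smoother
  remainder <- x - trend0
  v.adj <- var(remainder, na.rm = TRUE)  # variance left after removing the trend
  max(0, min(1, 1 - v.adj / var(x, na.rm = TRUE)))
}

set.seed(42)
noise   <- rnorm(100)                        # no trend
trended <- 0.5 * seq_len(100) + rnorm(100)   # strong linear trend

strengths <- c(noise = trend_strength(noise), trended = trend_strength(trended))
```

With the remainder in the numerator, a series dominated by its trend scores close to 1 and pure noise scores close to 0. Using the variance of the fitted trend instead would reverse that ordering, which matches the behaviour reported above.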

install instructions

Could you add installation instructions to the readme? Is it devtools::install_github(), etc.?

Thanks for your work! This package is awesome

KLscore computation error for shorter time series

Hi,

When using the anomalous package on shorter time series (i.e. shorter than 2*frequency), the package gave me an error stating that it cannot compute the KL score: "I cannot compute KLscore when the length is too small." When I debugged the code, I found the relevant constraint in [1], which does not accommodate shorter time series that violate this condition. Ideally, it should log a warning and continue with the rest of the computation. Instead, according to [2], it stops the program abruptly, without returning the computed values of the other features (i.e. those other than the KL score).

Thanks !

[1] https://github.com/robjhyndman/anomalous/blob/master/R/tsmeasures.R#L204
[2] https://github.com/robjhyndman/anomalous/blob/master/R/tsmeasures.R#L205
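The warn-and-continue behaviour requested above could look roughly like the following. This is a generic sketch, not the package's code: `try_feature` and `needs_long_series` are hypothetical names, and the length check only imitates the kind of constraint described in the issue.

```r
# Hypothetical wrapper: run one feature function and, on error,
# emit a warning and return NA instead of stopping the whole computation
try_feature <- function(f, x, name) {
  tryCatch(f(x), error = function(e) {
    warning(sprintf("%s skipped: %s", name, conditionMessage(e)), call. = FALSE)
    NA_real_
  })
}

# Stand-in for a feature with a minimum-length requirement
needs_long_series <- function(x) {
  if (length(x) < 2 * frequency(x)) stop("series too short")
  mean(diff(x)^2)
}

set.seed(1)
x <- ts(rnorm(6), frequency = 4)  # shorter than 2 * frequency

feats <- c(
  variance = try_feature(var, x, "variance"),
  long     = try_feature(needs_long_series, x, "long")
)
```

The feature that cannot be computed comes back as NA with a warning, while the remaining features are still returned.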

package not building in OSX

Hi, any ideas why the installation isn't working?
Below is the output I get. The "plain" anomalous package works, by the way.
Thanks

> devtools::install_github('robjhyndman/anomalous-acm')
Downloading GitHub repo robjhyndman/anomalous-acm@master
Installing anomalousACM
Skipping 1 packages not available: spatstat
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save --no-restore CMD INSTALL  \
  '/private/var/folders/t3/f4r4rwmn7xl80skj6h35ftnh0000gn/T/RtmpiKihGU/devtools2cb837be8a52/robjhyndman-anomalous-acm-10373a7'  \
  --library='/Library/Frameworks/R.framework/Versions/3.2/Resources/library' --install-tests 

* installing *source* package ‘anomalousACM’ ...
** R
** data
*** moving datasets to lazyload DB
** demo
** inst
** tests
** byte-compile and prepare package for lazy loading
Warning: package ‘ggplot2’ was built under R version 3.2.3
Error : package ‘spatstat’ required by ‘alphahull’ could not be found
ERROR: lazy loading failed for package ‘anomalousACM’
* removing ‘/Library/Frameworks/R.framework/Versions/3.2/Resources/library/anomalousACM’
Error: Command failed (1)

Error in running the example

Hi there

I am trying to run the example from the README file and got the following errors:

> z <- ts(matrix(rnorm(3000),ncol=100),freq=4)
> y <- tsmeasures(z)
> biplot.features(y)
Error in pcaPP::PCAproj(naomit.x, k = 2, scale = sd, center = mean) : 
  object 'C_pcaProj_up' not found
> anomaly(y)
Error in pcaPP::PCAproj(naomit.x, k = 2, center = mean, scale = sd) : 
  object 'C_pcaProj_up' not found

Google indeed gave me quite strange results on that (all pointing to fixing Win 10 bugs etc.)

Error when running with constant time series and normalization

Hi,

I noticed that the tsmeasures function will terminate with an error if run on constant time series as follows:

> tsmeasures(cbind(rep(3,50), rep(4,50)))
Error in acf(x, plot = FALSE, na.action = na.exclude) :
  'lag.max' must be at least 0
In addition: Warning message:
In acf(x, plot = FALSE, na.action = na.exclude) :
  NAs introduced by coercion

The problem is that in the following line in the tsmeasures function:

if (normalise) {
  x <- as.ts(scale(x, center = TRUE, scale = TRUE)) # Normalise data to mean = 0, sd = 1
}

if the series is constant, scale will divide by 0, so that you get a series consisting of NaN. Then, later on, when acf() is executed on this NaN series, it throws the error.

A solution would probably be to check during normalisation whether the series is constant and, if it is, to skip the scaling.

i.e., instead of:
x <- as.ts(scale(x, center = TRUE, scale = TRUE))

you could use something like:

x <- as.ts(scale(x, center = TRUE, scale = !(abs(max(x) - min(x)) < 1e-9)))
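When x holds several series, the check arguably needs to be made per column, because `max(x) - min(x)` over the whole matrix can be large even if individual columns are constant. A minimal sketch of that variant follows. `safe_scale` is a hypothetical helper and the tolerance is an assumed value.

```r
# Hypothetical helper: centre every column, but divide only non-constant
# columns by their standard deviation
safe_scale <- function(x, tol = 1e-9) {
  x <- as.matrix(x)
  sds <- apply(x, 2, sd, na.rm = TRUE)
  # Dividing by 1 leaves a (near-)constant column at zero after centring
  sds[is.na(sds) | sds < tol] <- 1
  scale(x, center = TRUE, scale = sds)
}

set.seed(1)
x <- cbind(rep(3, 50), rep(4, 50), rnorm(50))
y <- as.ts(safe_scale(x))
```

The two constant columns become all zeros instead of NaN, and the third column is standardised as usual.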

How to rank all samples in order of anomalous level when using ahull

I want to evaluate the results of ahull in terms of AUC, which requires ranking all samples by how anomalous they are. How can I achieve this? I tried setting the parameter n to the number of samples and ordered=TRUE in the function 'anomaly', but an error occurred: 'Error in findnextoutlier(scores, nextout$alpha, tmp.idx[1:(i - 1)]) : Too hard. Please reduce the number of outliers requeste'. (There are no errors when I set 'method=hdr', but I want to use ahull.) Can anybody help? Thank you very much!
