fdaoutlier

Outlier Detection Tools for Functional Data Analysis

`fdaoutlier` is a collection of outlier detection tools for functional data analysis. Methods implemented include directional outlyingness, MS-plot, total variation depth, and sequential transformations, among others.

Installation

You can install the current version of fdaoutlier from CRAN with:

install.packages("fdaoutlier")

or the latest development version from GitHub with:

devtools::install_github("otsegun/fdaoutlier")

Example

Generate some functional data with magnitude outliers:

library(fdaoutlier)
simdata <- simulation_model1(plot = TRUE, seed = 1)

dim(simdata$data)
#> [1] 100  50

Next, apply the MS-plot of Dai & Genton (2018):

ms <- msplot(simdata$data)

ms$outliers
#> [1]  4  7 17 26 29 55 62 66 76
simdata$true_outliers
#> [1]  4  7 17 55 66
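
To see which flagged curves are false positives, the two index vectors can be compared with base R set operations. The sketch below hard-codes the values printed above so that it runs without the package; `detected` and `truth` are illustrative names, not objects created by fdaoutlier.

```r
# Indices copied from the printed output of ms$outliers and simdata$true_outliers
detected <- c(4, 7, 17, 26, 29, 55, 62, 66, 76)
truth <- c(4, 7, 17, 55, 66)

# Curves flagged by the MS-plot that are not true outliers
false_positives <- setdiff(detected, truth)
false_positives
#> [1] 26 29 62 76

# Check whether every true outlier was detected
all_found <- all(truth %in% detected)
all_found
#> [1] TRUE
```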

Methods Implemented

  1. MS-Plot (Dai & Genton, 2018)
  2. TVDMSS (Huang & Sun, 2019)
  3. Extremal depth (Narisetty & Nair, 2016)
  4. Extreme rank length depth (Myllymäki et al., 2017; Dai et al., 2020)
  5. Directional quantile (Myllymäki et al., 2017; Dai et al., 2020)
  6. Fast band depth and modified band depth (Sun et al., 2012)
  7. Directional Outlyingness (Dai & Genton, 2019)
  8. Sequential transformation (Dai et al., 2020)

Bugs and Feature Requests

Kindly open an issue using GitHub Issues.

fdaoutlier's People

Contributors

otsegun

fdaoutlier's Issues

`tvdmss` trips error when no shape outlier is found

Error tripped:

Error in functional_boxplot(dts, depth_values = tvd, emp_factor = emp_factor_tvd, : Argument 'central_region' must be greater than 0 and less than 1.

I believe the error originates in the first if statement:

if (length(shape_boxstats$out) != 0) {
  shape_outliers <- which(mss %in% shape_boxstats$out[shape_boxstats$out < mean(mss)])
  dts <- dts[-shape_outliers, ]
  tvd <- tvd[-shape_outliers]
  index <- index[-shape_outliers]
}

The issue is that an MSS value may be outlying positively (above the median), which makes the initial if statement TRUE. But then shape_outliers is integer(0) (empty), because no outlying value lies below mean(mss). Since -integer(0) is still integer(0), indexing dts and tvd with -shape_outliers gives back an empty data set instead of leaving them unchanged.

Finally, this causes an issue when calling functional_boxplot because now central_region = 0.
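
The indexing behaviour described above can be reproduced in isolation. This is a minimal sketch on toy data; the names `dts_toy` and `empty_idx` are made up for illustration and are not taken from the package source.

```r
# Toy stand-in for the data matrix: 3 curves observed at 2 points
dts_toy <- matrix(1:6, nrow = 3)

# Nothing matches, so which() returns integer(0)
empty_idx <- which(c(FALSE, FALSE, FALSE))

# -integer(0) is still integer(0), so this selects zero rows
# rather than "dropping nothing"
dropped <- dts_toy[-empty_idx, , drop = FALSE]
nrow(dropped)
#> [1] 0

# Guarding on the length of the index keeps the data intact
kept <- if (length(empty_idx) > 0) dts_toy[-empty_idx, , drop = FALSE] else dts_toy
nrow(kept)
#> [1] 3
```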

I think this could be resolved easily by adding another condition to the if statement like so:

if (length(shape_boxstats$out) != 0 & shape_boxstats$out < mean(mss)) {
  ...
}

Edit: that solution^ did not work. I think the second part of the if statement needs to be another if statement inside the first.

if (length(shape_boxstats$out) != 0) {
  if (any(shape_boxstats$out < mean(mss))) {
    ...
  }
}

Edit2: sloppy coding by me today... added any for the inner if statement, considering shape_boxstats$out could be a vector of length > 1.
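
An equivalent way to express the nested check is to compute the candidate indices first and subset only when there is something to remove. The following is a sketch under stated assumptions, not the package's code: `drop_low_shape_outliers` is a hypothetical helper, and its `out` argument is assumed to play the role of shape_boxstats$out.

```r
# Hypothetical helper (not part of fdaoutlier): remove curves whose MSS value
# is flagged as outlying below the mean; leave the inputs untouched otherwise.
drop_low_shape_outliers <- function(dts, tvd, index, mss, out) {
  low <- which(mss %in% out[out < mean(mss)])
  if (length(low) > 0) {
    dts <- dts[-low, , drop = FALSE]
    tvd <- tvd[-low]
    index <- index[-low]
  }
  list(dts = dts, tvd = tvd, index = index, removed = low)
}

# Toy input: the only flagged MSS value lies above the mean, so nothing is removed
dts <- matrix(seq_len(10), nrow = 5)
res_high <- drop_low_shape_outliers(dts, tvd = runif(5), index = 1:5,
                                    mss = c(1, 1.1, 0.9, 1, 10), out = 10)
nrow(res_high$dts)
#> [1] 5

# Toy input: the flagged MSS value lies below the mean, so curve 5 is removed
res_low <- drop_low_shape_outliers(dts, tvd = runif(5), index = 1:5,
                                   mss = c(1, 1.1, 0.9, 1, -10), out = -10)
res_low$index
#> [1] 1 2 3 4
```

Checking `length(low)` directly also avoids passing a condition of length greater than one to `if`, which requires a single TRUE or FALSE.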

Large p produces the warning: In qf(0.993, dimension, m - dimension + 1) : NaNs produced

Hi Segun,

First things first, thank you for the great work!

I found that the approximation of the degrees of freedom for large dimension provides values smaller than dimension+1. Then the quantile of the F distribution cannot be computed. Attached is R code with a visualization of the problem.

To solve the issue, I modified the function croux_hesbroeck_asymptotic.R to use the asymptotic m value when this happens. I added the following at the end of the function, assuming that for large p and n the asymptotic value is a good estimator.

if (m < dimension) {
  # For large dimension and n, we assume that m_asy is a good estimator
  m <- m_asy
}
cutoff <- qf(0.993, dimension, m - dimension + 1)

I will create a pull request.
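
The warning can be reproduced directly with `qf`, which returns NaN when its second degrees-of-freedom argument is not positive. The sketch below uses made-up values for `dimension`, `m`, and `m_asy` purely for illustration; they are not computed with the package's formulas.

```r
dimension <- 50
m <- 30        # illustrative value smaller than the dimension
m_asy <- 400   # illustrative asymptotic value

# The second df is m - dimension + 1 = -19, so qf() warns and returns NaN
bad_cutoff <- suppressWarnings(qf(0.993, dimension, m - dimension + 1))
is.nan(bad_cutoff)
#> [1] TRUE

# Falling back to the asymptotic value keeps the second df positive
if (m < dimension) m <- m_asy
cutoff <- qf(0.993, dimension, m - dimension + 1)
is.finite(cutoff)
#> [1] TRUE
```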
