rdimtools's Introduction

Rdimtools

Rdimtools is an R package for dimension reduction (DR), including feature selection and manifold learning, and for intrinsic dimension estimation (IDE). We aim to build one of the most comprehensive toolboxes available online; the current version delivers 145 DR algorithms and 17 IDE methods.

The philosophy is simple: the more we have at hand, the better we can play.

Our logo reflects the foundational nature of multivariate data analysis: like the blind men examining an elephant, we piece together an idea of what the data look like from the partial information each algorithm provides.

Installation

You can install a release version from CRAN:

install.packages("Rdimtools")

or the development version from GitHub:

## install.packages("devtools")
devtools::install_github("kisungyou/Rdimtools")

Minimal Example : Dimension Reduction

Here is an example of dimension reduction on the famous iris dataset. Principal Component Analysis (do.pca), Laplacian Score (do.lscore), and Diffusion Maps (do.dm) are compared; they represent the families of linear reduction, feature selection, and nonlinear reduction algorithms, respectively.

# load the library
library(Rdimtools)

# load the data
X   = as.matrix(iris[,1:4])
lab = as.factor(iris[,5])

# run 3 algorithms mentioned above
mypca = do.pca(X, ndim=2)
mylap = do.lscore(X, ndim=2)
mydfm = do.dm(X, ndim=2, bandwidth=10)

# visualize
par(mfrow=c(1,3))
plot(mypca$Y, pch=19, col=lab, xlab="axis 1", ylab="axis 2", main="PCA")
plot(mylap$Y, pch=19, col=lab, xlab="axis 1", ylab="axis 2", main="Laplacian Score")
plot(mydfm$Y, pch=19, col=lab, xlab="axis 1", ylab="axis 2", main="Diffusion Maps")

Minimal Example : Dimension Estimation

Swiss Roll is a classic example of a 2-dimensional manifold embedded in $\mathbb{R}^3$ and one of 11 famous model-based samples available from the aux.gensamples() function. Given the ground truth that $d=2$, let’s apply several methods for intrinsic dimension estimation.

# generate sample data
set.seed(100)
roll = aux.gensamples(dname="swiss")

# we will compare 5 methods (out of 17 methods from version 1.0.0)
vecd = rep(0,5)
vecd[1] = est.Ustat(roll)$estdim       # convergence rate of U-statistic on manifold
vecd[2] = est.correlation(roll)$estdim # correlation dimension
vecd[3] = est.made(roll)$estdim        # manifold-adaptive dimension estimation
vecd[4] = est.mle1(roll)$estdim        # MLE with Poisson process
vecd[5] = est.twonn(roll)$estdim       # minimal neighborhood information

# let's visualize
plot(1:5, vecd, type="b", ylim=c(1.5,2.5), 
     main="true dimension is d=2",
     xaxt="n",xlab="",ylab="estimated dimension")
xtick = seq(1,5,by=1)
axis(side=1, at=xtick, labels = FALSE)
text(x=xtick,  par("usr")[3], 
     labels = c("Ustat","correlation","made","mle1","twonn"), pos=1, xpd = TRUE)

We can observe that all 5 methods we tested estimated the intrinsic dimension around $d=2$. It should be noted that the estimated dimension may not be integer-valued due to characteristics of each method.

Acknowledgements

The logo icon is made by Freepik from www.flaticon.com. The rotating Swiss Roll image is taken from Dinoj Surendran’s website.

rdimtools's People

Contributors

kisungyou, rcannood, zeehio

rdimtools's Issues

polynomial kernel, d=2, c=1

polynomial kernel of Rdimtools and kernlab return different values. Isn't this a bug?

e.g.,

(x = matrix(c(1:6), 2, 3))

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

library(Rdimtools)
(x_kpoly_Rdim_d2 = aux.kernelcov(x, ktype = c("polynomial", d=2, c=1)))

$K
[,1] [,2]
[1,] 37 46
[2,] 46 58

$Kcenter
[,1] [,2]
[1,] 0.75 -0.75
[2,] -0.75 0.75

library(kernlab)
linear_kern <- polydot(degree = 2, scale = 1, offset=1)
(x_kpoly_d2_kernlab = kernelMatrix(kernel = linear_kern, x=x))

An object of class "kernelMatrix"
[,1] [,2]
[1,] 1296 2025
[2,] 2025 3249

Memory issue with do.lmds

Hi,

Thank you for your very interesting package. Using it, I encountered a memory issue executing this code:

library(magrittr)
library(dplyr)
library(Rdimtools)

n <- 1000000

test <- data.frame(cle = c(paste0("N", runif(n, 1, 400) %>% round(0))),
                   deb = runif(n, 1, 5) %>% round(0),
                   fin = runif(n, 6, 12) %>% round(0),
                   stringsAsFactors = F)

test2 <- do.lmds(test[, c("deb", "fin")], npoints=round(nrow(test[, c("deb", "fin")])/100))
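
A rough size estimate helps to see why this call can exhaust memory. The sketch below assumes, without confirmation from the source, that a dense double-precision matrix of landmark-to-point distances is held in memory at some stage; the variable names are illustrative only.

```r
# Sizes taken from the report above
n       <- 1000000           # number of rows in `test`
npoints <- round(n / 100)    # number of landmark points requested

bytes_per_double <- 8

# Assumed dense npoints x n distance matrix (an assumption, not the
# documented behaviour of do.lmds)
landmark_gib <- npoints * n * bytes_per_double / 2^30

# For comparison: a full n x n distance matrix, as classical MDS would need
full_gib <- n^2 * bytes_per_double / 2^30

print(round(landmark_gib, 1))  # about 74.5 GiB
print(round(full_gib, 1))      # about 7450.6 GiB
```

Under this assumption, even the landmark version needs tens of gigabytes, so reducing npoints or subsampling the rows first would be the obvious things to try.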

Error message upon trying to install Rdimtools package

Hi, I have been having trouble installing the Rdimtools package. These error messages have been displayed after each step. Is there a simple solution that I am overlooking?

1,
devtools::install_github("kisungyou/Rdimtools")

ERROR: package installation failed
Error: Failed to install 'Rdimtools' from GitHub:
System command 'R' failed, exit status: 1, stdout & stderr were printed
In addition: Warning messages:
1: In i.p(...) : installation of package ‘Rmpfr’ had non-zero exit status
2: In i.p(...) : installation of package ‘CVXR’ had non-zero exit status

2,
install.packages("Rmpfr")

configure: error: Header file mpfr.h not found; maybe use --with-mpfr-include=INCLUDE_PATH
ERROR: configuration failed for package ‘Rmpfr’

  • removing ‘/home/people/krw19kxd/R/x86_64-pc-linux-gnu-library/4.1/Rmpfr’
    Warning in install.packages :
    installation of package ‘Rmpfr’ had non-zero exit status

3,
install.packages("CVXR")

Warning in install.packages :
installation of package ‘Rmpfr’ had non-zero exit status
ERROR: dependency ‘Rmpfr’ is not available for package ‘CVXR’

  • removing ‘/home/people/krw19kxd/R/x86_64-pc-linux-gnu-library/4.1/CVXR’
    Warning in install.packages :
    installation of package ‘CVXR’ had non-zero exit status

4,
install.packages("mpfr.h")

install.packages("mpfr.h")
Installing package into ‘/home/people/krw19kxd/R/x86_64-pc-linux-gnu-library/4.1’
(as ‘lib’ is unspecified)
Warning in install.packages :
package ‘mpfr.h’ is not available for this version of R

do.olda stopped working

library("Rdimtools")
#> ** ------------------------------------------------------- **
#> ** Rdimtools
#> **  - Dimension Reduction and Estimation Toolbox
#> **
#> ** Version    : 1.0.8       (2021)
#> ** Maintainer : Kisung You  ([email protected])
#> ** Website    : https://kisungyou.com/Rdimtools/
#> **
#> ** Please share any bugs or suggestions to the maintainer.
#> ** ------------------------------------------------------- **

# use iris data
data(iris)
set.seed(100)
subid = sample(1:150, 50)
X     = as.matrix(iris[subid,1:4])
label = as.factor(iris[subid,5])

## compare with LDA
out1 = do.lda(X, label)
out2 = do.olda(X, label)
#> Error: $ operator is invalid for atomic vectors

Created on 2021-07-14 by the reprex package (v2.0.0)

Can't print full pca

I can't return the last dimension of the PCA; p-1 seems to work.

  m <- as.matrix(mtcars)
  Rdimtools::do.pca(m, ndim = ncol(m))
#> Error in dt_pca(X, myndim, myprep, mycor): * do.pca : 'ndim' should be in [1,ncol(X)).

Created on 2021-02-24 by the reprex package (v0.3.0)

do.olpp is not returning an orthogonal basis.

library(Rdimtools)
#> ** ------------------------------------------------------- **
#> ** Rdimtools
#> **  - Dimension Reduction and Estimation Toolbox
#> **
#> ** Version    : 1.0.8       (2021)
#> ** Maintainer : Kisung You  ([email protected])
#> ** Website    : https://kisungyou.com/Rdimtools/
#> **
#> ** Please share any bugs or suggestions to the maintainer.
#> ** ------------------------------------------------------- **
#?Rdimtools::do.olpp

## use iris data
data(iris)
set.seed(100)
subid = sample(1:150, 50)
X     = as.matrix(iris[subid,1:4])
label = as.factor(iris[subid,5])

##  connecting 10% and 25% of data for graph construction each.
output1 <- do.olpp(X,ndim=2,type=c("proportion",0.10))
output2 <- do.olpp(X,ndim=2,type=c("proportion",0.25))

## Is basis orthogonal?
bas1 <- output1$projection
bas2 <- output2$projection
t(bas1) %*% bas1
#>           [,1]      [,2]
#> [1,] 1.0000000 0.9667698
#> [2,] 0.9667698 1.0000000
t(bas2) %*% bas2
#>          [,1]     [,2]
#> [1,] 1.000000 0.937701
#> [2,] 0.937701 1.000000
print("both are normal, but neither is orthogonal.")
#> [1] "both are normal, but neither is orthogonal."

Created on 2021-06-28 by the reprex package (v2.0.0)
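
The check above can be wrapped in a small helper. The sketch below uses base R only; is_orthonormal is a hypothetical helper name, and the matrix B is made-up data whose unit-norm columns have the same inner product as the first result above. Re-orthonormalizing with a QR decomposition is shown as one possible workaround, not as what do.olpp does or should do.

```r
# Hypothetical helper: TRUE when the columns of B are orthonormal,
# i.e. t(B) %*% B equals the identity up to a tolerance
is_orthonormal <- function(B, tol = 1e-8) {
  G <- crossprod(B)
  max(abs(G - diag(ncol(B)))) < tol
}

# Made-up 4 x 2 basis: unit-norm columns with inner product 0.9667698
r <- 0.9667698
B <- cbind(c(1, 0, 0, 0),
           c(r, sqrt(1 - r^2), 0, 0))

is_orthonormal(B)            # FALSE: normalized but not orthogonal

# Orthonormal basis spanning the same column space, via QR
Q <- qr.Q(qr(B))
is_orthonormal(Q)            # TRUE
```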

d should be allowed to equal p.

do.pca() still doesn't allow d to be equal to p, which is fully valid. It currently silently drops a dimension.

dat <- mtcars[, 1:2]
Rdimtools::do.pca(X = as.matrix(dat), ndim = 2)
#> $Y
#>             [,1]
#>  [1,] -18.831637
#>  [2,] -18.831637
#>  [3,] -21.074684
#>  [4,] -19.218921
#>  [5,] -16.104483
#>  [6,] -16.023824
#>  [7,] -11.844353
#>  [8,] -22.623822
#>  [9,] -21.074684
#> [10,] -17.088856
#> [11,] -15.733360
#> [12,] -13.877597
#> [13,] -14.748987
#> [14,] -12.715744
#> [15,]  -8.068329
#> [16,]  -8.068329
#> [17,] -12.231638
#> [18,] -30.369513
#> [19,] -28.433090
#> [20,] -31.821830
#> [21,] -19.816009
#> [22,] -13.006207
#> [23,] -12.715744
#> [24,] -10.876142
#> [25,] -16.588589
#> [26,] -25.431635
#> [27,] -24.172960
#> [28,] -28.433090
#> [29,] -13.296670
#> [30,] -17.572962
#> [31,] -12.522101
#> [32,] -19.719188
#> 
#> $vars
#> [1] 38.69375
#> 
#> $projection
#>            [,1]
#> [1,] -0.9682113
#> [2,]  0.2501336
#> 
#> $algorithm
#> [1] "linear:PCA"
#> 
#> attr(,"class")
#> [1] "Rdimtools"

Created on 2023-03-20 with reprex v2.0.2
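
For comparison, base R's stats::prcomp returns all p components for the same data, which illustrates that d = p is a valid request: the result is a rotation of the centered data that preserves total variance. This is a sketch with prcomp, not a description of how do.pca is implemented.

```r
# Same two columns of mtcars as in the report
dat <- as.matrix(mtcars[, 1:2])

# PCA with centering and no scaling, keeping every component
pc <- prcomp(dat, center = TRUE, scale. = FALSE)

dim(pc$x)                    # 32 rows, 2 columns: d equals p

# The full projection matrix is orthogonal
crossprod(pc$rotation)

# Total variance is preserved when all components are kept
total_var_input  <- sum(apply(dat, 2, var))
total_var_scores <- sum(pc$sdev^2)
all.equal(total_var_input, total_var_scores)
```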

do.olpp() erroring

library(Rdimtools)
#> Warning: package 'Rdimtools' was built under R version 4.0.5
#> ** ------------------------------------------------------- **
#> ** Rdimtools
#> **  - Dimension Reduction and Estimation Toolbox
#> **
#> ** Version    : 1.0.4       (2021)
#> ** Maintainer : Kisung You  ([email protected])
#> ** Website    : kyoustat.com/Rdimtools
#> **
#> ** Please share any bugs or suggestions to the maintainer.
#> ** ------------------------------------------------------- **

## Not run: 
## use iris data
data(iris)
set.seed(100)
subid = sample(1:150, 50)
X     = as.matrix(iris[subid,1:4])
label = as.factor(iris[subid,5])

##  connecting 10% and 25% of data for graph construction each.
output1 <- do.olpp(X,ndim=2,type=c("proportion",0.10))
#> Error in dt_pca(X, myndim, myprep, mycor): * do.pca : 'ndim' should be in [1,ncol(X)).
output2 <- do.olpp(X,ndim=2,type=c("proportion",0.25))
#> Error in dt_pca(X, myndim, myprep, mycor): * do.pca : 'ndim' should be in [1,ncol(X)).

ncol(X)
#> [1] 4

Created on 2021-04-10 by the reprex package (v1.0.0)

do.iltsa() returns weird outlier values

This is an excellent library that has helped me a ton, thank you for writing it!

I'm having an issue with do.iltsa() that I've spent a couple of days on and haven't been able to figure out. It keeps returning extreme outlier values after reduction even when the input data is just a swiss roll.

Here's a reproducible example:

library(Rdimtools)
library(ggplot2)

swissroll <- aux.gensamples(
  n = 500,
  dname = "swiss"
) 

swiss.iltsa <- do.iltsa(
  swissroll,
  ndim = 2
)

swiss.iltsa.toplot <- as.data.frame(swiss.iltsa$Y)

ggplot(swiss.iltsa.toplot, aes(V1, V2)) + geom_point()

Which gives a plot similar to this:

[image: ggplot2 scatter plot of swiss.iltsa$Y]

I'm expecting something much closer to the results in the LTSA paper. Am I doing something wrong? I'm on a fresh install of RStudio with R version 4.0.2.

Wish: preprocess type "scale"

Currently, Rdimtools offers centering, decorrelating and whitening, and there is no built-in way to scale the data to a common variance (1) without decorrelating them. This is what I would often like to do, therefore a type="scale" possibility would be very much appreciated!

Best, Ulrike
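
Until such an option exists, one possible workaround is to scale the data with base R before calling an Rdimtools function. The sketch below uses base::scale on the iris measurements and checks that each column ends up with unit variance while the correlations between columns are unchanged. How the scaled matrix interacts with each function's own preprocessing argument is not covered here.

```r
X <- as.matrix(iris[, 1:4])

# Center each column and divide by its standard deviation
Xs <- scale(X, center = TRUE, scale = TRUE)

# Every column now has standard deviation 1
col_sds <- apply(Xs, 2, sd)
print(col_sds)

# Scaling does not decorrelate: the correlation matrix is unchanged
same_cor <- isTRUE(all.equal(cor(X), cor(Xs)))
print(same_cor)
```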

question about selecting landmark points by MaxMin

Hello, I'm kind of confused by the landmark point selection.
It seems the strategy is not exactly the original MaxMin? It looks more like a "MinSum" strategy (testdists += pD(testpt,targetpt);). Or could you explain why it's equivalent to MaxMin?

Thanks a lot!


int aux_landmarkMaxMin(arma::mat& pD, arma::vec& plandmark, arma::vec& seqnp){
  // 4-1. basic setting
  const int nlandmark = plandmark.n_elem;
  const int ntestpts  = seqnp.n_elem;

  // 4-2. we should be careful ; -1 for both vectors
  vec veclandmark = plandmark - 1;
  vec vecseqnp    = seqnp - 1;

  // 4-3. main iteration
  int currentidx      = 0;
  double currentdists = 123456789;
  for (int i=0;i<ntestpts;i++){
    int testpt       = vecseqnp(i);
    double testdists = 0;
    for (int j=0;j<nlandmark;j++){
      int targetpt = veclandmark(j);
      testdists += pD(testpt,targetpt);
    }
    if (testdists<currentdists){
      currentidx   = testpt;
      currentdists = testdists;
    }
  }
  currentidx += 1;

  // 4-4. return output
  return(currentidx);
}
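
To make the difference in the question concrete, here is a small R sketch of the two selection rules. Both function names are hypothetical and neither is taken from the package: the first picks the candidate whose nearest landmark is farthest away (the usual MaxMin rule), the second picks the candidate with the smallest summed distance to the landmarks, which is what the loop above appears to compute.

```r
# MaxMin: maximize the distance to the closest existing landmark
next_landmark_maxmin <- function(D, landmarks, candidates) {
  dsub <- D[candidates, landmarks, drop = FALSE]
  mind <- apply(dsub, 1, min)
  candidates[which.max(mind)]
}

# "MinSum": minimize the summed distance to the existing landmarks
next_landmark_minsum <- function(D, landmarks, candidates) {
  dsub <- D[candidates, landmarks, drop = FALSE]
  candidates[which.min(rowSums(dsub))]
}

# Toy data: four points on a line at 0, 1, 2 and 10
pts <- matrix(c(0, 1, 2, 10), ncol = 1)
D   <- as.matrix(dist(pts))

# Point 1 is already a landmark; points 2..4 are candidates
next_landmark_maxmin(D, landmarks = 1L, candidates = 2:4)  # 4, the farthest point
next_landmark_minsum(D, landmarks = 1L, candidates = 2:4)  # 2, the nearest point
```

On this toy example the two rules choose opposite ends of the line, so they are not equivalent in general.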

Unable to install

Hi, I am running into dependency issues with this package that I cannot resolve. Kindly assist me. Thanks.
Installing the dependencies separately does not work either.

The downloaded source packages are in
	'/tmp/RtmpRvEiPi/downloaded_packages'
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Warning messages:
1: In install.packages("Rdimtools", dependencies = TRUE) :
  installation of package 'gmp' had non-zero exit status
2: In install.packages("Rdimtools", dependencies = TRUE) :
  installation of package 'Rmpfr' had non-zero exit status
3: In install.packages("Rdimtools", dependencies = TRUE) :
  installation of package 'CVXR' had non-zero exit status
4: In install.packages("Rdimtools", dependencies = TRUE) :
  installation of package 'Rdimtools' had non-zero exit status

Issue with installation

Hi.

I have tried to install the Rdimtools package on my R (version 3.4.3), but I get this error:
"Error: package or namespace load failed for ‘Rdimtools’:
object ‘admm.rpca’ is not exported by 'namespace:ADMM'"

It seemed okay 2 weeks ago, but when I tried to run the package again this week, it would not install.

I have also tried devtools::install_github("kisungyou/Rdimtools"), but that did not work either.

I would appreciate any advice on this matter.

Thanks

Better description for lbd1 in do.disr

First of all, a big thank you for this amazing package!

I have been researching Diversity-Induced Self-Representation (DISR) FS and I think the description for lbd1 and lbd2 in DISR could be clearer.

It currently says:

#' @param lbd1 nonnegative number to control the degree of self-representation.
#' @param lbd2 nonnegative number to control the degree of feature similarity.

which suggests that the function being minimised is basically lbd1 * [term for self-rep error] + lbd2 * [term for similarity].

But reading the paper (and the Rcpp code to check it matches), it looks like lbd1 controls the regularisation of the regularised linear model (used to estimate self-representation) and lbd2 is the weight of the similarity term in the objective function (used to quantify diversity).

Basically the objective function actually has the form [linear model error + lbd1 * regularisation term] + lbd2 * [term for similarity], so lbd1 doesn't control the degree of self-representation -- that's all in lbd2, which controls the trade-off between self-rep and diversity.

I think a more accurate description would be:

#' @param lbd1 nonnegative number to control the degree of regularisation of the self-representation.
#' @param lbd2 nonnegative number to control the degree of feature diversity. `lbd2=1` gives equal weight to self-representation and diversity.
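
The structure described in this report can be written schematically as follows. The symbols $W$, $E$, $R$ and $S$ are placeholders introduced here for the self-representation coefficients, the linear model error, the regularisation term and the similarity term; their exact definitions are in the DISR paper and are not reproduced here. $\lambda_1$ and $\lambda_2$ stand for lbd1 and lbd2.

```latex
\min_{W} \; \bigl[\, E(W) + \lambda_1 \, R(W) \,\bigr] \; + \; \lambda_2 \, S(W)
```

Written this way, $\lambda_1$ acts only inside the bracketed self-representation part, while $\lambda_2$ sets the trade-off between that part and the similarity term.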

Default npoints for do.lisomap

Thank you for providing Rdimtools!
The default npoints for do.lisomap causes trouble if the number of rows is not a multiple of 5. It would be good to include rounding so that the default is always an integer, e.g. max(round(nrow(X)/5), ndim + 1)

Best, Ulrike
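
The suggested default can be tried out in isolation. In the sketch below, default_npoints is a hypothetical helper that takes the number of rows and ndim directly. It implements only the expression proposed above and is not the package's current code.

```r
# Hypothetical helper implementing the proposed default
default_npoints <- function(n, ndim) {
  max(round(n / 5), ndim + 1)
}

n <- 32
n / 5                          # 6.4, not an integer
default_npoints(n, ndim = 2)   # 6

# For small inputs the ndim + 1 lower bound takes over
default_npoints(7, ndim = 2)   # 3
```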
