gdkrmr / dimred

A Framework for Dimensionality Reduction in R

Home Page: https://www.guido-kraemer.com/software/dimred/

License: GNU General Public License v3.0

R 86.63% Shell 1.40% Makefile 0.17% TeX 11.80%

dimensionality-reduction framework high-dimensional-data manifold-learning quality-control r visualization

dimred's Issues

Huge Sparse Matrix

I have a huge data set with 98% of the data missing. I use a sparse matrix, and the data set fits easily into memory; as a full data frame it would use hundreds of GB. Could you please let embed() accept a sparse matrix?

spurious error when testing

Sometimes I get the following error when testing:

✖ | 14 1     | the dimRedData class
────────────────────────────────────────────────────────────────────────────────
test_dimRedData.R:31: failure: misc functions
nrow(Iris) not equal to 150.
target is NULL, current is numeric
────────────────────────────────────────────────────────────────────────────────

This happens when using devtools::test() and R CMD check --run-donttest --run-dontrun --timings.

Non-standard parameters not passed through for embed()

The documentation for embed() suggests that additional parameters can be passed via ..., but they seem to be ignored:

library(dimRed)
#> Loading required package: DRR
#> Loading required package: kernlab
#> Loading required package: CVST
#> Loading required package: Matrix
#> 
#> Attaching package: 'dimRed'
#> The following object is masked from 'package:stats':
#> 
#>     embed
#> The following object is masked from 'package:base':
#> 
#>     as.data.frame

sr <- loadDataSet("Swiss Roll", n = 2000, sigma = 0.05)
test <- embed(sr, "Isomap", knn = 50, eps = 1, ndim = 2, get_geod = FALSE)
#> Warning in matchPars(methodObject, list(...)): Parameter matching: eps is not a
#> standard parameter, ignoring.
#> 2020-04-28 17:11:55: Isomap START
#> 2020-04-28 17:11:55: constructing knn graph
#> 2020-04-28 17:11:55: calculating geodesic distances
#> 2020-04-28 17:11:59: Classical Scaling

Created on 2020-04-28 by the reprex package (v0.3.0)
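
The warning in this report comes from a parameter-matching step. As a minimal sketch of how such matching typically behaves, the hypothetical helper `match_pars_sketch()` below (an illustration only, not dimRed's actual `matchPars()`) overrides standard parameters by name and drops any other name with a warning instead of forwarding it. The parameter list `std` is also made up for the example.

```r
# Hypothetical illustration of name-based parameter matching;
# this is NOT dimRed's implementation of matchPars().
match_pars_sketch <- function(stdpars, pars) {
  # Names the method does not know about are reported and dropped.
  unknown <- setdiff(names(pars), names(stdpars))
  for (nm in unknown) {
    warning("Parameter matching: ", nm,
            " is not a standard parameter, ignoring.", call. = FALSE)
  }
  # Known names override the defaults.
  known <- intersect(names(pars), names(stdpars))
  stdpars[known] <- pars[known]
  stdpars
}

# Made-up standard parameters for the example.
std <- list(knn = 50, ndim = 2, get_geod = FALSE)

# `eps` is not in `std`, so it is dropped (the warning is suppressed here).
res <- suppressWarnings(match_pars_sketch(std, list(knn = 10, eps = 1)))
str(res)
```

Under this kind of design, extra arguments reach the underlying method only if they are declared as standard parameters first.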

PCA_L1 fails for ndim=1

The PCA_L1 method appears not to work when ndim is set to 1 (... last one for now ;))

To reproduce:

library(dimRed)

## Loading required package: DRR

## Loading required package: kernlab

## Loading required package: CVST

## Loading required package: Matrix

## 
## Attaching package: 'dimRed'

## The following object is masked from 'package:stats':
## 
##     embed

## The following object is masked from 'package:base':
## 
##     as.data.frame

set.seed(1)

embed(matrix(rnorm(1E5), 100), 'PCA_L1', ndim = 1)

## Error in dimnames(rot) <- list(orgnames, newnames): 'dimnames' applied to non-array

System Information:

sessionInfo()

## R version 3.5.2 (2018-12-20)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Arch Linux
## 
## Matrix products: default
## BLAS: /usr/lib/libblas.so.3.8.0
## LAPACK: /usr/lib/liblapack.so.3.8.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] dimRed_0.2.2   DRR_0.0.3      CVST_0.2-2     Matrix_1.2-15 
## [5] kernlab_0.9-27 colorout_1.2-0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.0         lattice_0.20-38    digest_0.6.18     
##  [4] grid_3.5.2         magrittr_1.5       evaluate_0.12     
##  [7] stringi_1.2.4      pcaL1_1.5.2        rmarkdown_1.11    
## [10] tools_3.5.2        stringr_1.3.1      xfun_0.4          
## [13] yaml_2.2.0         compiler_3.5.2     BiocManager_1.30.4
## [16] htmltools_0.3.6    knitr_1.21
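
The message "'dimnames' applied to non-array" is what base R reports when dimnames are assigned to a plain vector. One plausible cause (an assumption, since the package source is not shown here) is that selecting a single component drops the matrix to a vector. The sketch below reproduces that behavior in base R; the object names are illustrative.

```r
set.seed(1)
loadings <- matrix(rnorm(12), nrow = 4)   # 4 variables, 3 components

# Selecting one column without drop = FALSE returns a plain vector ...
rot_vec <- loadings[, 1]
# ... while drop = FALSE keeps a 4 x 1 matrix.
rot_mat <- loadings[, 1, drop = FALSE]

# Assigning dimnames fails on the vector; the error message is captured.
msg <- tryCatch(
  {
    dimnames(rot_vec) <- list(paste0("V", 1:4), "PC1")
    "ok"
  },
  error = function(e) conditionMessage(e)
)

# The same assignment works on the one-column matrix.
dimnames(rot_mat) <- list(paste0("V", 1:4), "PC1")
print(msg)
print(dim(rot_mat))
```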

TODO

Non-Method Features to add:

Methods to add:

(Semi) supervised methods:

Please propose more

`install_github()` fails

I see:

> install_github("gdkrmr/dimRed")
Using GitHub PAT from envvar GITHUB_PAT
Downloading GitHub repo gdkrmr/dimRed@master
from URL https://api.github.com/repos/gdkrmr/dimRed/zipball/master
Installing dimRed
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save  \
  --no-restore --quiet CMD INSTALL  \
  '/private/tmp/Rtmpt5ZdO6/devtools53646e44af92/gdkrmr-dimRed-97564ff'  \
  --library='/Users/hadley/R' --with-keep.source --install-tests --no-multiarch 

* installing *source* package ‘dimRed’ ...
** R
Error in .install_package_code_files(".", instdir) : 
files in 'Collate' field missing from '/private/tmp/Rtmpt5ZdO6/devtools53646e44af92/gdkrmr-dimRed-97564ff/R':
  get_info.R
ERROR: unable to collate and parse R files for package ‘dimRed’
* removing ‘/Users/hadley/R/dimRed’
* restoring previous ‘/Users/hadley/R/dimRed’

HLLE fails for ndim=1

The HLLE method appears not to work when ndim is set to 1.

To reproduce:

library(dimRed)

## Loading required package: DRR

## Loading required package: kernlab

## Loading required package: CVST

## Loading required package: Matrix

## 
## Attaching package: 'dimRed'

## The following object is masked from 'package:stats':
## 
##     embed

## The following object is masked from 'package:base':
## 
##     as.data.frame

set.seed(1)

embed(matrix(rnorm(1E5), 100), 'HLLE', ndim = 1, knn = 10)

## 2019-01-26 23:42:42: Finding nearest neighbors

## 2019-01-26 23:42:42: Calculating Hessian

## 1/100

## Error in combn(seq_len(pars$ndim), 2): n < m

System Information:

sessionInfo()

## R version 3.5.2 (2018-12-20)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Arch Linux
## 
## Matrix products: default
## BLAS: /usr/lib/libblas.so.3.8.0
## LAPACK: /usr/lib/liblapack.so.3.8.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] dimRed_0.2.2   DRR_0.0.3      CVST_0.2-2     Matrix_1.2-15 
## [5] kernlab_0.9-27 nvimcom_0.9-75 colorout_1.2-0
## 
## loaded via a namespace (and not attached):
##  [1] RANN_2.6.1         Rcpp_1.0.0         lattice_0.20-38   
##  [4] digest_0.6.18      RSpectra_0.13-1    grid_3.5.2        
##  [7] magrittr_1.5       evaluate_0.12      stringi_1.2.4     
## [10] rmarkdown_1.11     tools_3.5.2        stringr_1.3.1     
## [13] xfun_0.4           yaml_2.2.0         compiler_3.5.2    
## [16] BiocManager_1.30.4 htmltools_0.3.6    knitr_1.21
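
The error comes from `combn(seq_len(pars$ndim), 2)`, which enumerates pairs of embedding dimensions; with a single dimension there are no pairs to enumerate. A minimal sketch of a guard follows, using a hypothetical helper `pairs_of_dims()` that is not taken from dimRed's source.

```r
# Hypothetical guard around combn(); not dimRed's code.
pairs_of_dims <- function(ndim) {
  if (ndim < 2) {
    # A single dimension has no pairs: return an empty 2 x 0 matrix.
    return(matrix(integer(0), nrow = 2, ncol = 0))
  }
  combn(seq_len(ndim), 2)
}

# The unguarded call fails for ndim = 1, as in the report.
failed <- inherits(try(combn(seq_len(1), 2), silent = TRUE), "try-error")

print(failed)
print(ncol(pairs_of_dims(1)))   # number of pairs for one dimension
print(ncol(pairs_of_dims(3)))   # number of pairs for three dimensions
```

Whether an empty set of pairs is the right behavior for the Hessian estimate itself is a separate question that the maintainer would need to decide.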

standardize function names

Function names are currently a mess; I will standardize them at some point, with a deprecation period.

Autoencoder TODOs

TODOs for the Autoencoder:

write tests for the other activation function types
better stopping criteria
more than one training step, continue training a dimRedResult object
handle the installation of tensorflow and keras (this should be handled by the user -> IGNORE)

could not find function "dimRedMethodList"

I'm creating a package that imports dimRed and I'm getting an error ("Error in eval(expr, envir, enclos) : could not find function "dimRedMethodList") when invoking this code:

#' @importFrom dimRed Isomap dimRedData embed

foo <- function(x, training, ...) {
  imap <- embed(dimRedData(training), 
                "Isomap", knn = x$options$knn, 
                ndim = x$num, .mute = x$options$.mute)
}

I know that this isn't reproducible but it otherwise works when using Isomap directly instead of embed(,"Isomap"). I've tried importing dimRedMethodList too but had the same error. Loading the package prior to invoking this function also works.

LandMark ISOMAP

Hi, the ISOMAP function runs fast. But is there a method to automatically select landmark points in your ISOMAP implementation? Many thanks.

Inverse function for tsne and umap

Hi,
I am trying to find the reconstruction error for umap and tsne. I get this error.

ir <- loadDataSet("3D S Curve")
ir.umap <- embed(ir, "UMAP", ndim = ndims(ir))
ir.tsne <- embed(ir, "tSNE", ndim = ndims(ir))
rmse <- data.frame(
rmse_umap = reconstruction_error(ir.umap),
rmse_tsne = reconstruction_error(ir.tsne)
)
matplot(rmse, type = "l")
plot(ir)
plot(ir.umap)
plot(ir.tsne)

This gives me an error:
Error in .local(object, ...): object does not have an inverse function
Traceback:

data.frame(rmse_umap = reconstruction_error(ir.umap), rmse_tsne = reconstruction_error(ir.tsne))
reconstruction_error(ir.umap)
reconstruction_error(ir.umap)
.local(object, ...)
getData(inverse(object, getData(getDimRedData(object))[, seq_len(n[i]),
. drop = FALSE]))
inverse(object, getData(getDimRedData(object))[, seq_len(n[i]),
. drop = FALSE])
inverse(object, getData(getDimRedData(object))[, seq_len(n[i]),
. drop = FALSE])
.local(object, ...)
stop("object does not have an inverse function")

Please let me know where I am going wrong and how to fix this issue. Thanks!

NMF fails again

no applicable method for 'predict' applied to an object of class "c('NMFfit', 'NMF')"
seems like an import problem again.

✖ |  8 1     | NNMF [2.0 s]
────────────────────────────────────────────────────────────────────────────────
test_NNMF.R:68: error: other arguments
no applicable method for 'predict' applied to an object of class "c('NMFfit', 'NMF')"
1: embed(input_trn, "NNMF", seed = 13, nrun = 10, ndim = 3, method = "KL", options = list(.pbackend = NULL)) at /home/gkraemer/progs/R/dimRed/tests/testthat/test_NNMF.R:68
2: embed(input_trn, "NNMF", seed = 13, nrun = 10, ndim = 3, method = "KL", options = list(.pbackend = NULL))
3: .local(.data, ...)
4: do.call(methodObject@fun, args) at /home/gkraemer/progs/R/dimRed/R/embed.R:135
5: (function (data, pars, keep.org.data = TRUE) 
   {
       chckpkg("NMF")
       chckpkg("MASS")
       meta <- data@meta
       orgdata <- if (keep.org.data) 
           data@data
       else NULL
       data <- data@data
       if (!is.matrix(data)) 
           data <- as.matrix(data)
       data <- t(data)
       if (pars$ndim > nrow(data)) 
           stop("`ndim` should be less than the number of columns.", call. = FALSE)
       if (length(pars$method) != 1) 
           stop("only supply one `method`", call. = FALSE)
       args <- list(x = quote(data), rank = pars$ndim, method = pars$method, nrun = pars$nrun, 
           seed = pars$seed)
       if (length(pars$options) > 0) 
           args <- c(args, pars$options)
       nmf_result <- do.call(NMF::nmf, args)
       w <- NMF::basis(nmf_result)
       h <- t(NMF::coef(nmf_result))
       colnames(w) <- paste0("NNMF", 1:ncol(w))
       other.data <- list(w = w)
       colnames(h) <- paste0("NNMF", 1:ncol(h))
       appl <- function(x) {
           appl.meta <- if (inherits(x, "dimRedData")) 
               x@meta
           else data.frame()
           dat <- if (inherits(x, "dimRedData")) 
               x@data
           else x
           if (!is.matrix(dat)) 
               dat <- as.matrix(dat)
           if (ncol(dat) != nrow(w)) 
               stop("x must have the same number of columns ", "as the original data (", 
                   nrow(w), ")", call. = FALSE)
           res <- dat %*% t(MASS::ginv(w))
           colnames(res) <- paste0("NNMF", 1:ncol(res))
           scores <- new("dimRedData", data = res, meta = appl.meta)
           return(scores)
       }
       inv <- function(x) {
           appl.meta <- if (inherits(x, "dimRedData")) 
               x@meta
           else data.frame()
           proj <- if (inherits(x, "dimRedData")) 
               x@data
           else x
           if (ncol(proj) > ncol(w)) 
               stop("x must have less or equal number of dimensions ", "as the original data")
           res <- tcrossprod(proj, w)
           colnames(res) <- colnames(data)
           res <- new("dimRedData", data = res, meta = appl.meta)
           return(res)
       }
       inv <- function(x) {
           appl.meta <- if (inherits(x, "dimRedData")) 
               x@meta
           else data.frame()
           proj <- if (inherits(x, "dimRedData")) 
               x@data
           else x
           if (ncol(proj) > ncol(data)) 
               stop("x must have less or equal number of dimensions ", "as the original data")
           reproj <- proj %*% other.data$H
           reproj <- new("dimRedData", data = reproj, meta = appl.meta)
           return(reproj)
       }
       res <- new("dimRedResult", data = new("dimRedData", data = h, meta = meta), org.data = orgdata, 
           apply = appl, inverse = inv, has.org.data = keep.org.data, has.apply = TRUE, 
           has.inverse = TRUE, method = "NNMF", pars = pars, other.data = other.data)
       return(res)
   })(data = <S4 object of class structure("dimRedData", package = "dimRed")>, keep.org.data = TRUE, 
       pars = structure(list(ndim = 3, method = "KL", nrun = 10, seed = 13, options = structure(list(
           .pbackend = NULL), .Names = ".pbackend")), .Names = c("ndim", "method", "nrun", 
       "seed", "options")))
6: do.call(NMF::nmf, args) at /home/gkraemer/progs/R/dimRed/R/nnmf.R:93
7: (structure(function (x, rank, method, ...) 
   standardGeneric("nmf"), generic = structure("nmf", package = "NMF"), package = "NMF", group = list(), valueClass = character(0), signature = c("x", 
   "rank", "method"), default = `\001NULL\001`, skeleton = (function (x, rank, method, 
       ...) 
   stop("invalid call in method dispatch to 'nmf' (no default method)", domain = NA))(x, 
       rank, method, ...), class = structure("standardGeneric", package = "methods")))(x = data, 
       rank = 3, method = "KL", nrun = 10, seed = 13, .pbackend = NULL)
8: (structure(function (x, rank, method, ...) 
   standardGeneric("nmf"), generic = structure("nmf", package = "NMF"), package = "NMF", group = list(), valueClass = character(0), signature = c("x", 
   "rank", "method"), default = `\001NULL\001`, skeleton = (function (x, rank, method, 
       ...) 
   stop("invalid call in method dispatch to 'nmf' (no default method)", domain = NA))(x, 
       rank, method, ...), class = structure("standardGeneric", package = "methods")))(x = data, 
       rank = 3, method = "KL", nrun = 10, seed = 13, .pbackend = NULL)
9: nmf(x, rank, method = strategy, ...)
10: nmf(x, rank, method = strategy, ...)
...
15: (function (n, RNGobj) 
   {
       if (verbose) {
           if (verbose > 1) {
               cat("\n## Run: ", n, "/", nrun, "\n", sep = "")
           }
           else {
               cat("", n)
           }
       }
       if (verbose > 2) 
           message("# Setting up loop RNG ... ", appendLF = FALSE)
       setRNG(RNGobj, verbose = verbose > 3)
       if (verbose > 2) 
           message("OK")
       if (n == 1 && .checkRandomness) {
           .RNGinit <- getRNG()
       }
       res <- nmf(x, rank, method, nrun = 1, seed = seed, model = model, .options = .options, 
           ...)
       if (n == 1 && .checkRandomness && rng.equal(.RNGinit)) {
           warning("NMF::nmf - You are running multiple non-random NMF runs with a fixed seed", 
               immediate. = TRUE)
       }
       if (!keep.all) {
           resList <- list(residuals = NA, .callback = NULL)
           err <- residuals(res)
           best <- best.static$residuals
           if (is.na(best) || err < best) {
               if (verbose) {
                   if (verbose > 1L) 
                     cat("## Updating best fit [deviance =", err, "]\n", sep = "")
                   else cat("*")
               }
               best.static$fit <<- res
               best.static$residuals <<- err
               resList$residuals <- err
           }
           best.static$consensus <<- best.static$consensus + connectivity(res, no.attrib = TRUE)
           if (!is.null(.callback)) {
               resList$.callback <- tryCatch(.callback(res, n), error = function(e) e)
           }
           res <- resList
       }
       if (opt.gc && n%%opt.gc == 0) {
           if (verbose > 1) 
               message("# Call garbage collection NOW")
           else if (verbose) 
               cat("%")
           gc(verbose = verbose > 3)
       }
       if (verbose > 1) 
           cat("## DONE\n")
       res
   })(dots[[1L]][[1L]], dots[[2L]][[1L]])
16: connectivity(res, no.attrib = TRUE)
17: connectivity(res, no.attrib = TRUE)
18: .local(object, ...)
19: callNextMethod(object = object, what = "samples")
20: eval(call, callEnv)
21: eval(call, callEnv)
22: .nextMethod(object = object, what = "samples")
23: predict(object, ...)
24: predict(object, ...)
─────────────────────────────────────────

Wish: improved documentation

The written documentation is very scarce (probably a consequence of using Roxygen ...). More detail would be helpful.

Technically, due to the documentation file names, the "see also" links are absolutely horrible; they would be much more helpful in their obvious short version. Can that be changed?

Best, Ulrike

Wish: quality methods for coRanking matrices

It would be great if the quality methods worked directly on coRanking matrices.

Best, Ulrike

NMF issues

On my computer, running testthat::test(".") results in an endless loop with 100% CPU utilization. I think NMF is the culprit but cannot pin it down with absolute certainty. The tests run just fine on CRAN, though.

AUC_lnK_R_NX doesn't do what the documentation states it does

Thanks for providing dimRed.

The documentation regarding AUC_lnK_R_NX is quite misleading, as you also seem to be aware of in your code. You currently use normalized inverse position weights, instead of the claimed logarithmic ones.

It would be good if the documentation were adapted to the code, or vice versa.

Best, Ulrike
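
For context, the two descriptions may be closer than they look. On a logarithmic axis, d(ln K) = dK / K, so an area under the curve over ln K can be approximated by weighting each value by 1/K and normalizing by the sum of the weights. The sketch below assumes that definition; the helper `auc_ln_k()` and the sample values are illustrative and not taken from dimRed.

```r
# Illustrative only: area under R_NX(K) on a log-scaled K axis,
# approximated with normalized 1/K weights (assumed definition).
auc_ln_k <- function(r_nx) {
  k <- seq_along(r_nx)
  sum(r_nx / k) / sum(1 / k)
}

# Made-up quality values that decrease with neighborhood size K.
r_nx <- c(0.9, 0.8, 0.6, 0.5, 0.3)

print(auc_ln_k(r_nx))  # small K dominates
print(mean(r_nx))      # unweighted mean for comparison
```

If the package uses these weights, the documentation could state the formula explicitly so that readers do not expect a different weighting.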

Unable to work with dimRed on MNIST dataset

Hi,
I am trying to apply the quality metrics from the dimRed package to the MNIST dataset, but I am unable to load the dataset and obtain a dimRedData object. Please help. Thanks!

kPCA reproducibility

I'm not able to reproduce the kernel PCA results in comparison to the underlying function. Here's an example:

library(kernlab)
library(dimRed)

set.seed(131)
tr_dat <- matrix(rnorm(100*6), ncol = 6)
te_dat <- matrix(rnorm(20*6), ncol = 6)
colnames(tr_dat) <- paste0("X", 1:6)
colnames(te_dat) <- paste0("X", 1:6)

k_name <- "rbfdot"
k_par <- list(sigma = .2)

## test values

kpca_obj <- kPCA(stdpars = list(ndim = 3, kernel = k_name, kpar = k_par))
kpca_obj <- kpca_obj@fun(dimRedData(tr_dat), kpca_obj@stdpars)
kpca_pred <- kpca_obj@apply(te_dat)@data

## expected values

kpca_obj_exp <- kpca(tr_dat, 
                     kernel = k_name,
                     kpar = k_par)
kpca_pred_exp <- predict(kpca_obj_exp, tr_dat)[, 1:3]
colnames(kpca_pred_exp) <- paste0("kPCA", 1:3)

I get

> head(kpca_pred)
          kPCA1      kPCA2       kPCA3
[1,] -0.1754955 -2.8205993  0.51416167
[2,]  1.1112348  1.7925091 -0.02363246
[3,]  1.9973353 -0.9198911  0.14218226
[4,]  3.0105551  1.4249128 -2.79424169
[5,] -3.2053340 -2.0046749 -0.79662181
[6,]  1.5522026  3.6696689 -2.54760691
> head(kpca_pred_exp)
         kPCA1      kPCA2      kPCA3
[1,]  2.614505  2.9551241  2.1230302
[2,] -1.827209  2.4680460 -2.3203690
[3,]  2.956935 -1.2295952 -2.9909752
[4,] -3.740879 -0.8210545 -4.0988922
[5,] -1.015746 -1.7453619 -0.5225218
[6,] -2.357748  2.1721046 -2.2195350
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.3

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dimRed_0.0.3.9001 DRR_0.0.2         CVST_0.2-1        Matrix_1.2-7.1    kernlab_0.9-25   

loaded via a namespace (and not attached):
[1] tools_3.3.2     grid_3.3.2      lattice_0.20-34

BTW, what's the best way to access the objects generated in the fun code from the base object? I'd like to get ahold of the PCA rotation matrix or the kPCA object res. Does that get computed once on the first call?

Thanks,

Max

Sparse matrix error

I find it misleading that embed() accepts a sparse matrix without raising an error. After some investigation, I see that as.matrix() is called on my sparse matrix. I think it would be reasonable to throw an error to prevent a memory explosion. Moreover, calling as.matrix() assumes that a user may pass something other than a matrix, and the result can be unexpected. That is dangerous and, in most cases, useless.
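
A minimal sketch of the kind of guard requested here, using the Matrix package. The helper `as_dense_checked()` and its `max_bytes` threshold are hypothetical and not part of dimRed; the size estimate assumes 8 bytes per double in the dense result.

```r
library(Matrix)

# Hypothetical helper: refuse to densify a sparse matrix when the dense
# copy would exceed `max_bytes` (an illustrative threshold).
as_dense_checked <- function(x, max_bytes = 1e9) {
  if (methods::is(x, "sparseMatrix")) {
    dense_bytes <- 8 * as.numeric(nrow(x)) * as.numeric(ncol(x))
    if (dense_bytes > max_bytes) {
      stop("refusing to densify a sparse matrix: would need about ",
           format(dense_bytes, scientific = TRUE), " bytes", call. = FALSE)
    }
  }
  as.matrix(x)
}

# A small sparse matrix is converted as usual.
small <- rsparsematrix(100, 50, density = 0.02)
dense <- as_dense_checked(small)

# A 1e6 x 1e6 sparse matrix with one non-zero entry is rejected.
big <- sparseMatrix(i = 1, j = 1, x = 1, dims = c(1e6, 1e6))
blocked <- inherits(try(as_dense_checked(big), silent = TRUE), "try-error")
print(dim(dense))
print(blocked)
```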

NNLM got removed from CRAN

It seems that NNLM is unmaintained and got removed from CRAN (@topepo ):

https://cran.r-project.org/web/packages/NNLM/index.html

Questions:

What to do about the immediate problem? I guess I will get an email from the CRAN maintainers soon.
General robustness: Is there a way to make these dependencies optional and still pass CRAN tests?
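
On the second question, a common pattern is to list such packages under Suggests in DESCRIPTION, check for them at run time with `requireNamespace()`, and skip the related tests with `testthat::skip_if_not_installed()`. The sketch below shows the run-time check; `require_pkg()` is a hypothetical name used for illustration, and the missing package name is deliberately made up.

```r
# Hypothetical run-time check for an optional (Suggests) dependency.
require_pkg <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("package '", pkg, "' is needed for this method; please install it",
         call. = FALSE)
  }
  invisible(TRUE)
}

# An installed package passes the check.
ok <- require_pkg("stats")

# A package that is not installed produces an informative error.
missing_pkg <- inherits(
  try(require_pkg("aPackageThatDoesNotExist123"), silent = TRUE),
  "try-error"
)
print(ok)
print(missing_pkg)
```

With this pattern, a method whose backend disappears from CRAN fails only when a user calls it, not when the package is installed or checked.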

Changing @stdpars$knn not reflected by UMAP embedding when using "umap-learn"

With the reference UMAP implementation (umap-learn 0.3.9, py27_0, conda-forge) installed, dimRed (0.2.3, R-) appears to use only the default knn as specified in umap@stdpars.

library(dimRed)

dat <- loadDataSet("3D S Curve", n = 300)

## use the S4 Class directly:
umap <- UMAP()

umap@stdpars
# $knn
# [1] 15
# 
# $ndim
# [1] 2
# 
# $d
# [1] "euclidean"
# 
# $method
# [1] "umap-learn"

emb <- umap@fun(dat, umap@stdpars)
plot(emb)

umap@stdpars$knn <- 30
umap@stdpars
# $knn
# [1] 30
# 
# $ndim
# [1] 2
# 
# $d
# [1] "euclidean"
# 
# $method
# [1] "umap-learn"

emb <- umap@fun(dat, umap@stdpars)
plot(emb) # same plot although it should be different because of change in knn

emb2 <- embed(dat, "UMAP", .mute = NULL, knn = 2, method="naive")
plot(emb2, type = "2vars")

emb2 <- embed(dat, "UMAP", .mute = NULL, knn = 200, method="naive")
plot(emb2, type = "2vars") # same here

sessionInfo()
# R version 3.6.0 (2019-04-26)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 18.04 (Bionic Beaver)
# 
# Matrix products: default
# BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
# LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
# 
# locale:
#  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_AG.UTF-8       
#  [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_AG.UTF-8    LC_MESSAGES=en_US.UTF-8   
#  [7] LC_PAPER=en_AG.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
# [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_AG.UTF-8 LC_IDENTIFICATION=C       
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] dimRed_0.2.3   DRR_0.0.3      CVST_0.2-2     Matrix_1.2-17  kernlab_0.9-27
# 
# loaded via a namespace (and not attached):
#  [1] compiler_3.6.0  magrittr_1.5    tools_3.6.0     yaml_2.2.0      reticulate_1.12 Rcpp_1.0.1     
#  [7] RSpectra_0.14-0 grid_3.6.0      jsonlite_1.6    umap_0.2.2.0    lattice_0.20-38

release schedule

When do you think that you'll do another release? I have a recipes version going to CRAN before the end of the year and I wasn't sure if I could include the NNMF or autoencoder features from dimRed in that version.

Nonnegative Matrix Factorization

Any plans on including this? I might get motivated enough to submit a PR. If so, would you prefer any particular package (NMF or NNLM)?

DiffusionMaps fails for ndim=1

The DiffusionMaps method appears not to work when ndim is set to 1.

To reproduce:

library(dimRed)
set.seed(1)

embed(matrix(rnorm(1E5), 100), 'DiffusionMaps', ndim=1)

## Performing eigendecomposition
## Computing Diffusion Coordinates
## Elapsed time: 0.009 seconds

## Warning in seq_len(ncol(outdata)): first element used of 'length.out'
## argument

## Error in seq_len(ncol(outdata)): argument must be coercible to non-negative integer

System info

sessionInfo()

## R version 3.5.2 (2018-12-20)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Arch Linux
## 
## Matrix products: default
## BLAS: /usr/lib/libblas.so.3.8.0
## LAPACK: /usr/lib/liblapack.so.3.8.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] dimRed_0.2.2   DRR_0.0.3      CVST_0.2-2     Matrix_1.2-15 
## [5] kernlab_0.9-27 nvimcom_0.9-75 colorout_1.2-0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.0           lattice_0.20-38      digest_0.6.18       
##  [4] grid_3.5.2           magrittr_1.5         evaluate_0.12       
##  [7] stringi_1.2.4        scatterplot3d_0.3-41 rmarkdown_1.11      
## [10] tools_3.5.2          stringr_1.3.1        igraph_1.2.2        
## [13] xfun_0.4             yaml_2.2.0           compiler_3.5.2      
## [16] pkgconfig_2.0.2      BiocManager_1.30.4   htmltools_0.3.6     
## [19] diffusionMap_1.1-0.1 knitr_1.21
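
The failing call `seq_len(ncol(outdata))` suggests that `outdata` is a plain vector when ndim is 1, because `ncol()` returns NULL for vectors. This is an assumption; the package source is not shown here. The base R sketch below shows the behavior and two ways around it.

```r
coords <- matrix(rnorm(20), nrow = 10)   # 10 points, 2 coordinates
one_dim <- coords[, 1]                   # dimension dropped: plain vector

print(is.null(ncol(one_dim)))            # ncol() is NULL for a vector
print(NCOL(one_dim))                     # NCOL() treats it as one column
print(ncol(coords[, 1, drop = FALSE]))   # drop = FALSE keeps the matrix
```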

Update TensorFlow to new API

TensorFlow 2.0 has a new API.

rmse_by_ndim

Include something like this and add some parameters to inverse(...)

library(dimRed)

ir <- loadDataSet("Iris")
ir.drr <- embed(ir, "DRR", ndim = ndims(ir))
ir.pca <- embed(ir, "PCA", ndim = ndims(ir))

get_rmse_by_ndim <- function (x, n = ndims(x)) {
  res <- numeric(n)
  org <- getData(getOrgData(x))
  for (i in seq_len(n)) {
    rec <- getData(inverse(x, getData(getDimRedData(x))[, seq_len(i), drop = FALSE]))
    res[i] <- sqrt(mean((org - rec) ^ 2))
  }
  res
}

rmse <- data.frame(
  rmse_drr = get_rmse_by_ndim(ir.drr),
  rmse_pca = get_rmse_by_ndim(ir.pca)
)

matplot(rmse, type = "l")
plot(ir)
plot(ir.drr)
plot(ir.pca)
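
The same idea can be checked without dimRed. The base R sketch below computes the reconstruction RMSE for the first i principal components of the iris measurements with `prcomp()`. It illustrates the quantity only and is not the proposed implementation; `rmse_by_ndim()` is a name chosen for this example.

```r
x <- as.matrix(iris[, 1:4])
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# RMSE between the data and its reconstruction from the first i components.
rmse_by_ndim <- function(pca, x) {
  vapply(seq_len(ncol(pca$x)), function(i) {
    idx <- seq_len(i)
    # Map the scores back to the original space and undo the centering.
    rec <- pca$x[, idx, drop = FALSE] %*% t(pca$rotation[, idx, drop = FALSE])
    rec <- sweep(rec, 2, pca$center, "+")
    sqrt(mean((x - rec)^2))
  }, numeric(1))
}

rmse <- rmse_by_ndim(pca, x)
print(rmse)
```

The error shrinks as components are added and is zero up to rounding when all four are used. That makes this a convenient sanity check for any method that provides an inverse.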

make embed default to PCA

Current master does not do that!

Vignette Contribution

Dear Guido Kraemer,

Thanks for the package! I am thinking of contributing a vignette that helps users quickly understand how to use the package (more on the usage side than illustrating different methods). Is it a good idea?

Regards,
Srikanth KS

data depth methods

You might consider adding some of Tukey's data depth methods. R has a few packages that you could wrap, including ddalpha (this paper gives a pretty good description of it).

do over for classes

don't nest dimRedResult class.

prospective projections/predictions

Can you add functions or classes that will allow the model to be estimated from one data set and then applied to any other data set? This wouldn't work for every method (e.g. MDS) but would be extremely useful.

For example, with PCA:

set.seed(12)
for_mod <- sample(1:nrow(USArrests), 40)

pca_mod <- prcomp(~ Murder + Assault + Rape, data = USArrests[for_mod, ], scale = TRUE)

## now apply the projection onto any data set:
pca_mod_data   <- predict(pca_mod, USArrests[ for_mod, ])
pca_other_data <- predict(pca_mod, USArrests[-for_mod, ])

Thanks
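
For PCA, what `predict()` does in the example above can be written out by hand: new rows are centered and scaled with the training statistics, then multiplied by the rotation matrix. A minimal base R sketch follows, using the matrix interface of `prcomp()` instead of the formula interface; the variable names are chosen for this example.

```r
set.seed(12)
vars <- c("Murder", "Assault", "Rape")
for_mod <- sample(seq_len(nrow(USArrests)), 40)

train <- as.matrix(USArrests[for_mod, vars])
new   <- as.matrix(USArrests[-for_mod, vars])

pca_mod <- prcomp(train, scale. = TRUE)

# Apply the training center, scale, and rotation to the held-out rows.
manual <- scale(new, center = pca_mod$center, scale = pca_mod$scale) %*%
  pca_mod$rotation

# Compare with the built-in prediction.
print(all.equal(manual, predict(pca_mod, new)))
```

Any method that can store such a mapping from training could expose it in the same way; methods without an explicit mapping, such as MDS, cannot.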

Argument `.keep.org.data` not working

Hi there,

Please see the reprex below. The examples are taken from the documentation of embed().

library(dimRed)
#> Loading required package: DRR
#> Loading required package: kernlab
#> Loading required package: CVST
#> Loading required package: Matrix
#> 
#> Attaching package: 'dimRed'
#> The following object is masked from 'package:stats':
#> 
#>     embed
#> The following object is masked from 'package:base':
#> 
#>     as.data.frame

as.data.frame(
  embed(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
        iris, "PCA", .keep.org.data = FALSE)
)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': invalid class "dimRedResult" object: invalid object for slot "org.data" in class "dimRedResult": got class "NULL", should be or extend class "matrix"

as.data.frame(embed(iris[, 1:4], "PCA", .keep.org.data = FALSE))
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': invalid class "dimRedResult" object: invalid object for slot "org.data" in class "dimRedResult": got class "NULL", should be or extend class "matrix"

Created on 2022-08-28 by the reprex package (v2.0.1)
