dimred's People

Contributors

gdkrmr, khughitt, topepo

dimred's Issues

Huge Sparse Matrix

I have a huge data set with 98% of the data missing. I use a sparse matrix, and the data set fits easily into memory. As a full data frame it would use hundreds of GB. Could you please let embed() accept a sparse matrix?

spurious error when testing

Sometimes I get the following error when testing:

✖ | 14 1     | the dimRedData class
────────────────────────────────────────────────────────────────────────────────
test_dimRedData.R:31: failure: misc functions
nrow(Iris) not equal to 150.
target is NULL, current is numeric
────────────────────────────────────────────────────────────────────────────────

This happens when using devtools::test() and R CMD check --run-donttest --run-dontrun --timings

Non-standard parameters not passed through for embed()

The documentation for embed() suggests that additional parameters can be passed via ..., but they seem to be ignored:

library(dimRed)
#> Loading required package: DRR
#> Loading required package: kernlab
#> Loading required package: CVST
#> Loading required package: Matrix
#> 
#> Attaching package: 'dimRed'
#> The following object is masked from 'package:stats':
#> 
#>     embed
#> The following object is masked from 'package:base':
#> 
#>     as.data.frame

sr <- loadDataSet("Swiss Roll", n = 2000, sigma = 0.05)
test <- embed(sr, "Isomap", knn = 50, eps = 1, ndim = 2, get_geod = FALSE)
#> Warning in matchPars(methodObject, list(...)): Parameter matching: eps is not a
#> standard parameter, ignoring.
#> 2020-04-28 17:11:55: Isomap START
#> 2020-04-28 17:11:55: constructing knn graph
#> 2020-04-28 17:11:55: calculating geodesic distances
#> 2020-04-28 17:11:59: Classical Scaling

Created on 2020-04-28 by the reprex package (v0.3.0)
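
The warning above suggests that arguments passed via ... are matched against a method's standard parameters and that unknown names are dropped. A minimal sketch of that kind of matching follows, assuming a plain named list of defaults. The helper `match_pars()` is hypothetical and is not dimRed's internal matchPars().

```r
# Hypothetical helper for illustration only; NOT dimRed's matchPars().
# Merges user-supplied parameters into a list of standard parameters and
# warns about (and drops) any name that is not a standard parameter.
match_pars <- function(std_pars, user_pars) {
  unknown <- setdiff(names(user_pars), names(std_pars))
  if (length(unknown) > 0) {
    warning("Parameter matching: ", paste(unknown, collapse = ", "),
            " is not a standard parameter, ignoring.", call. = FALSE)
  }
  known <- user_pars[intersect(names(user_pars), names(std_pars))]
  utils::modifyList(std_pars, known)
}

# Assumed defaults, chosen only for this example.
std <- list(knn = 50, ndim = 2, get_geod = FALSE)

# `eps` is not among the defaults, so it is dropped (with a warning).
pars <- suppressWarnings(match_pars(std, list(knn = 20, eps = 1)))
str(pars)
```

Under a scheme like this, a parameter such as eps never reaches the underlying function unless it is first added to the standard parameter list.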

PCA_L1 fails for ndim=1

The PCA_L1 method appears not to work when ndim is set to 1 (... last one for now ;))

To reproduce:

library(dimRed)
## Loading required package: DRR

## Loading required package: kernlab

## Loading required package: CVST

## Loading required package: Matrix

## 
## Attaching package: 'dimRed'

## The following object is masked from 'package:stats':
## 
##     embed

## The following object is masked from 'package:base':
## 
##     as.data.frame
set.seed(1)

embed(matrix(rnorm(1E5), 100), 'PCA_L1', ndim = 1)
## Error in dimnames(rot) <- list(orgnames, newnames): 'dimnames' applied to non-array

System Information:

sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Arch Linux
## 
## Matrix products: default
## BLAS: /usr/lib/libblas.so.3.8.0
## LAPACK: /usr/lib/liblapack.so.3.8.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] dimRed_0.2.2   DRR_0.0.3      CVST_0.2-2     Matrix_1.2-15 
## [5] kernlab_0.9-27 colorout_1.2-0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.0         lattice_0.20-38    digest_0.6.18     
##  [4] grid_3.5.2         magrittr_1.5       evaluate_0.12     
##  [7] stringi_1.2.4      pcaL1_1.5.2        rmarkdown_1.11    
## [10] tools_3.5.2        stringr_1.3.1      xfun_0.4          
## [13] yaml_2.2.0         compiler_3.5.2     BiocManager_1.30.4
## [16] htmltools_0.3.6    knitr_1.21
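
The error message is the one R raises when dimnames are assigned to a plain vector. One possible cause, assumed here rather than confirmed from the dimRed source, is that a one-column matrix is reduced to a vector by subsetting without drop = FALSE. A base R sketch of that behaviour, with made-up variable names:

```r
set.seed(1)
loadings <- matrix(rnorm(20), nrow = 5)   # 5 variables, 4 components

# Selecting a single column drops the dimensions by default.
rot_vec <- loadings[, seq_len(1)]                # plain numeric vector
rot_mat <- loadings[, seq_len(1), drop = FALSE]  # still a 5 x 1 matrix

# Assigning dimnames to the vector fails; capture the message instead of stopping.
msg <- tryCatch({
  dimnames(rot_vec) <- list(paste0("V", 1:5), "PC1")
  "no error"
}, error = function(e) conditionMessage(e))

# The same assignment works on the one-column matrix.
dimnames(rot_mat) <- list(paste0("V", 1:5), "PC1")
print(msg)
print(dim(rot_mat))
```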

TODO

Non-Method Features to add:

  • Rotation Matrix for PCA and ICA
  • mean and std vectors from PCA, ICA and PCA L1
  • Eigenvalues
  • kernel Matrices for kPCA
  • update documentation for .mute in embed
  • add TravisCI script
  • the possibility to use distance or kernel matrices as input.
  • the possibility to use (semi) supervised methods
  • a simple way to register new methods
  • ...

Methods to add:

(Semi) supervised methods:

  • tukey data depth
  • cca
  • opls
  • pls
  • kcca
  • kopls
  • kpls

Please propose more

`install_github()` fails

I see:

> install_github("gdkrmr/dimRed")
Using GitHub PAT from envvar GITHUB_PAT
Downloading GitHub repo gdkrmr/dimRed@master
from URL https://api.github.com/repos/gdkrmr/dimRed/zipball/master
Installing dimRed
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save  \
  --no-restore --quiet CMD INSTALL  \
  '/private/tmp/Rtmpt5ZdO6/devtools53646e44af92/gdkrmr-dimRed-97564ff'  \
  --library='/Users/hadley/R' --with-keep.source --install-tests --no-multiarch 

* installing *source* package ‘dimRed’ ...
** R
Error in .install_package_code_files(".", instdir) : 
files in 'Collate' field missing from '/private/tmp/Rtmpt5ZdO6/devtools53646e44af92/gdkrmr-dimRed-97564ff/R':
  get_info.R
ERROR: unable to collate and parse R files for package ‘dimRed’
* removing ‘/Users/hadley/R/dimRed’
* restoring previous ‘/Users/hadley/R/dimRed’

HLLE fails for ndim=1

The HLLE method appears not to work when ndim is set to 1.

To reproduce:

library(dimRed)
## Loading required package: DRR

## Loading required package: kernlab

## Loading required package: CVST

## Loading required package: Matrix

## 
## Attaching package: 'dimRed'

## The following object is masked from 'package:stats':
## 
##     embed

## The following object is masked from 'package:base':
## 
##     as.data.frame
set.seed(1)

embed(matrix(rnorm(1E5), 100), 'HLLE', ndim = 1, knn = 10)
## 2019-01-26 23:42:42: Finding nearest neighbors

## 2019-01-26 23:42:42: Calculating Hessian

## 1/100

## Error in combn(seq_len(pars$ndim), 2): n < m

System Information:

sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Arch Linux
## 
## Matrix products: default
## BLAS: /usr/lib/libblas.so.3.8.0
## LAPACK: /usr/lib/liblapack.so.3.8.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] dimRed_0.2.2   DRR_0.0.3      CVST_0.2-2     Matrix_1.2-15 
## [5] kernlab_0.9-27 nvimcom_0.9-75 colorout_1.2-0
## 
## loaded via a namespace (and not attached):
##  [1] RANN_2.6.1         Rcpp_1.0.0         lattice_0.20-38   
##  [4] digest_0.6.18      RSpectra_0.13-1    grid_3.5.2        
##  [7] magrittr_1.5       evaluate_0.12      stringi_1.2.4     
## [10] rmarkdown_1.11     tools_3.5.2        stringr_1.3.1     
## [13] xfun_0.4           yaml_2.2.0         compiler_3.5.2    
## [16] BiocManager_1.30.4 htmltools_0.3.6    knitr_1.21
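
The error comes from combn(seq_len(pars$ndim), 2): with ndim = 1 there is only one index, so no pairs can be formed. A small sketch of a guard, assuming the pairs are only needed for cross terms between dimensions. The helper `index_pairs()` is hypothetical and not part of dimRed.

```r
# Hypothetical helper: all pairs of dimension indices, or an empty
# 2 x 0 matrix when fewer than two dimensions are requested.
index_pairs <- function(ndim) {
  idx <- seq_len(ndim)
  if (length(idx) < 2L) {
    return(matrix(integer(0), nrow = 2L, ncol = 0L))
  }
  utils::combn(idx, 2L)
}

ncol(index_pairs(1))  # no pairs for a single dimension
ncol(index_pairs(3))  # pairs (1,2), (1,3), (2,3)
```

Code that loops over the columns of the result then simply does nothing for ndim = 1 instead of raising an error.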

standardize function names

function names are a mess currently, I will standardize them at some point with a deprecation period.

Autoencoder TODOs

TODOs for the Autoencoder:

  • write tests for the other activation function types
  • better stopping criteria
  • more than one training step, continue training a dimRedResult object
  • handle the installation of tensorflow and keras (this should be handled by the user -> IGNORE)

could not find function "dimRedMethodList"

I'm creating a package that imports dimRed and I'm getting an error ("Error in eval(expr, envir, enclos) : could not find function "dimRedMethodList") when invoking this code:

#' @importFrom dimRed Isomap dimRedData embed

foo <- function(x, training, ...) {
  imap <- embed(dimRedData(training), 
                "Isomap", knn = x$options$knn, 
                ndim = x$num, .mute = x$options$.mute)
}

I know that this isn't reproducible but it otherwise works when using Isomap directly instead of embed(,"Isomap"). I've tried importing dimRedMethodList too but had the same error. Loading the package prior to invoking this function also works.

LandMark ISOMAP

Hi, the ISOMAP function runs fast. But is there any way to automatically select landmark points in your ISOMAP? Many thanks.

Inverse function for tsne and umap

Hi,
I am trying to find the reconstruction error for umap and tsne. I get this error.

ir <- loadDataSet("3D S Curve")
ir.umap <- embed(ir, "UMAP", ndim = ndims(ir))
ir.tsne <- embed(ir, "tSNE", ndim = ndims(ir))
rmse <- data.frame(
rmse_umap = reconstruction_error(ir.umap),
rmse_tsne = reconstruction_error(ir.tsne)
)
matplot(rmse, type = "l")
plot(ir)
plot(ir.umap)
plot(ir.tsne)

This gives me an error:
Error in .local(object, ...): object does not have an inverse function
Traceback:

data.frame(rmse_umap = reconstruction_error(ir.umap), rmse_tsne = reconstruction_error(ir.tsne))
reconstruction_error(ir.umap)
reconstruction_error(ir.umap)
.local(object, ...)
getData(inverse(object, getData(getDimRedData(object))[, seq_len(n[i]),
. drop = FALSE]))
inverse(object, getData(getDimRedData(object))[, seq_len(n[i]),
. drop = FALSE])
inverse(object, getData(getDimRedData(object))[, seq_len(n[i]),
. drop = FALSE])
.local(object, ...)
stop("object does not have an inverse function")

Please let me know where I am going wrong and how to fix this issue. Thanks!

NMF fails again

no applicable method for 'predict' applied to an object of class "c('NMFfit', 'NMF')"
seems like an import problem again.

|  8 1     | NNMF [2.0 s]
────────────────────────────────────────────────────────────────────────────────
test_NNMF.R:68: error: other arguments
no applicable method for 'predict' applied to an object of class "c('NMFfit', 'NMF')"
1: embed(input_trn, "NNMF", seed = 13, nrun = 10, ndim = 3, method = "KL", options = list(.pbackend = NULL)) at /home/gkraemer/progs/R/dimRed/tests/testthat/test_NNMF.R:68
2: embed(input_trn, "NNMF", seed = 13, nrun = 10, ndim = 3, method = "KL", options = list(.pbackend = NULL))
3: .local(.data, ...)
4: do.call(methodObject@fun, args) at /home/gkraemer/progs/R/dimRed/R/embed.R:135
5: (function (data, pars, keep.org.data = TRUE) 
   {
       chckpkg("NMF")
       chckpkg("MASS")
       meta <- data@meta
       orgdata <- if (keep.org.data) 
           data@data
       else NULL
       data <- data@data
       if (!is.matrix(data)) 
           data <- as.matrix(data)
       data <- t(data)
       if (pars$ndim > nrow(data)) 
           stop("`ndim` should be less than the number of columns.", call. = FALSE)
       if (length(pars$method) != 1) 
           stop("only supply one `method`", call. = FALSE)
       args <- list(x = quote(data), rank = pars$ndim, method = pars$method, nrun = pars$nrun, 
           seed = pars$seed)
       if (length(pars$options) > 0) 
           args <- c(args, pars$options)
       nmf_result <- do.call(NMF::nmf, args)
       w <- NMF::basis(nmf_result)
       h <- t(NMF::coef(nmf_result))
       colnames(w) <- paste0("NNMF", 1:ncol(w))
       other.data <- list(w = w)
       colnames(h) <- paste0("NNMF", 1:ncol(h))
       appl <- function(x) {
           appl.meta <- if (inherits(x, "dimRedData")) 
               x@meta
           else data.frame()
           dat <- if (inherits(x, "dimRedData")) 
               x@data
           else x
           if (!is.matrix(dat)) 
               dat <- as.matrix(dat)
           if (ncol(dat) != nrow(w)) 
               stop("x must have the same number of columns ", "as the original data (", 
                   nrow(w), ")", call. = FALSE)
           res <- dat %*% t(MASS::ginv(w))
           colnames(res) <- paste0("NNMF", 1:ncol(res))
           scores <- new("dimRedData", data = res, meta = appl.meta)
           return(scores)
       }
       inv <- function(x) {
           appl.meta <- if (inherits(x, "dimRedData")) 
               x@meta
           else data.frame()
           proj <- if (inherits(x, "dimRedData")) 
               x@data
           else x
           if (ncol(proj) > ncol(w)) 
               stop("x must have less or equal number of dimensions ", "as the original data")
           res <- tcrossprod(proj, w)
           colnames(res) <- colnames(data)
           res <- new("dimRedData", data = res, meta = appl.meta)
           return(res)
       }
       inv <- function(x) {
           appl.meta <- if (inherits(x, "dimRedData")) 
               x@meta
           else data.frame()
           proj <- if (inherits(x, "dimRedData")) 
               x@data
           else x
           if (ncol(proj) > ncol(data)) 
               stop("x must have less or equal number of dimensions ", "as the original data")
           reproj <- proj %*% other.data$H
           reproj <- new("dimRedData", data = reproj, meta = appl.meta)
           return(reproj)
       }
       res <- new("dimRedResult", data = new("dimRedData", data = h, meta = meta), org.data = orgdata, 
           apply = appl, inverse = inv, has.org.data = keep.org.data, has.apply = TRUE, 
           has.inverse = TRUE, method = "NNMF", pars = pars, other.data = other.data)
       return(res)
   })(data = <S4 object of class structure("dimRedData", package = "dimRed")>, keep.org.data = TRUE, 
       pars = structure(list(ndim = 3, method = "KL", nrun = 10, seed = 13, options = structure(list(
           .pbackend = NULL), .Names = ".pbackend")), .Names = c("ndim", "method", "nrun", 
       "seed", "options")))
6: do.call(NMF::nmf, args) at /home/gkraemer/progs/R/dimRed/R/nnmf.R:93
7: (structure(function (x, rank, method, ...) 
   standardGeneric("nmf"), generic = structure("nmf", package = "NMF"), package = "NMF", group = list(), valueClass = character(0), signature = c("x", 
   "rank", "method"), default = `\001NULL\001`, skeleton = (function (x, rank, method, 
       ...) 
   stop("invalid call in method dispatch to 'nmf' (no default method)", domain = NA))(x, 
       rank, method, ...), class = structure("standardGeneric", package = "methods")))(x = data, 
       rank = 3, method = "KL", nrun = 10, seed = 13, .pbackend = NULL)
8: (structure(function (x, rank, method, ...) 
   standardGeneric("nmf"), generic = structure("nmf", package = "NMF"), package = "NMF", group = list(), valueClass = character(0), signature = c("x", 
   "rank", "method"), default = `\001NULL\001`, skeleton = (function (x, rank, method, 
       ...) 
   stop("invalid call in method dispatch to 'nmf' (no default method)", domain = NA))(x, 
       rank, method, ...), class = structure("standardGeneric", package = "methods")))(x = data, 
       rank = 3, method = "KL", nrun = 10, seed = 13, .pbackend = NULL)
9: nmf(x, rank, method = strategy, ...)
10: nmf(x, rank, method = strategy, ...)
...
15: (function (n, RNGobj) 
   {
       if (verbose) {
           if (verbose > 1) {
               cat("\n## Run: ", n, "/", nrun, "\n", sep = "")
           }
           else {
               cat("", n)
           }
       }
       if (verbose > 2) 
           message("# Setting up loop RNG ... ", appendLF = FALSE)
       setRNG(RNGobj, verbose = verbose > 3)
       if (verbose > 2) 
           message("OK")
       if (n == 1 && .checkRandomness) {
           .RNGinit <- getRNG()
       }
       res <- nmf(x, rank, method, nrun = 1, seed = seed, model = model, .options = .options, 
           ...)
       if (n == 1 && .checkRandomness && rng.equal(.RNGinit)) {
           warning("NMF::nmf - You are running multiple non-random NMF runs with a fixed seed", 
               immediate. = TRUE)
       }
       if (!keep.all) {
           resList <- list(residuals = NA, .callback = NULL)
           err <- residuals(res)
           best <- best.static$residuals
           if (is.na(best) || err < best) {
               if (verbose) {
                   if (verbose > 1L) 
                     cat("## Updating best fit [deviance =", err, "]\n", sep = "")
                   else cat("*")
               }
               best.static$fit <<- res
               best.static$residuals <<- err
               resList$residuals <- err
           }
           best.static$consensus <<- best.static$consensus + connectivity(res, no.attrib = TRUE)
           if (!is.null(.callback)) {
               resList$.callback <- tryCatch(.callback(res, n), error = function(e) e)
           }
           res <- resList
       }
       if (opt.gc && n%%opt.gc == 0) {
           if (verbose > 1) 
               message("# Call garbage collection NOW")
           else if (verbose) 
               cat("%")
           gc(verbose = verbose > 3)
       }
       if (verbose > 1) 
           cat("## DONE\n")
       res
   })(dots[[1L]][[1L]], dots[[2L]][[1L]])
16: connectivity(res, no.attrib = TRUE)
17: connectivity(res, no.attrib = TRUE)
18: .local(object, ...)
19: callNextMethod(object = object, what = "samples")
20: eval(call, callEnv)
21: eval(call, callEnv)
22: .nextMethod(object = object, what = "samples")
23: predict(object, ...)
24: predict(object, ...)
─────────────────────────────────────────

Wish: improved documentation

The written documentation is very scarce (probably a consequence of using Roxygen ...). More detail would be helpful.

Technically, due to the documentation file names, the "see also" links are absolutely horrible; they would be much more helpful in their obvious short version. Can that be changed?

Best, Ulrike

NMF issues

On my computer, running testthat::test(".") results in an endless loop with 100% CPU utilization. I think NMF is the culprit but cannot pin it down with absolute certainty. The tests run just fine on CRAN, though.

AUC_lnK_R_NX doesn't do what the documentation states it does

Thanks for providing dimRed.

The documentation regarding AUC_lnK_R_NX is quite misleading, as you also seem to be aware of in your code. You currently use normalized inverse position weights, instead of the claimed logarithmic ones.

Would be good if documentation were adapted to code or vice versa.

Best, Ulrike
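
To make the term concrete: "normalized inverse position weights" can be read as a weighted mean of R_NX(K) with weights 1/K that are normalized to sum to one. The sketch below assumes that reading. The function name and the example values are made up and are not taken from dimRed.

```r
# Hypothetical illustration of normalized inverse position weights:
# each value r_nx[K] gets weight 1/K, and the weights are normalized
# so that they sum to one.
weighted_auc_inverse_k <- function(r_nx) {
  k <- seq_along(r_nx)
  sum(r_nx / k) / sum(1 / k)
}

r_nx <- c(0.9, 0.8, 0.6, 0.5)   # made-up quality values for K = 1..4
weighted_auc_inverse_k(r_nx)    # small K counts more than in mean(r_nx)
```

Note that weighting by 1/K is the discrete analogue of weighting uniformly along a logarithmic K axis (since d ln K = dK / K), which may be what the documentation intends by "logarithmic"; that is a guess, not something the source confirms.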

kPCA reproducibility

I'm not able to reproduce the kernel PCA results in comparison to the underlying function. Here's an example:

library(kernlab)
library(dimRed)

set.seed(131)
tr_dat <- matrix(rnorm(100*6), ncol = 6)
te_dat <- matrix(rnorm(20*6), ncol = 6)
colnames(tr_dat) <- paste0("X", 1:6)
colnames(te_dat) <- paste0("X", 1:6)

k_name <- "rbfdot"
k_par <- list(sigma = .2)

## test values

kpca_obj <- kPCA(stdpars = list(ndim = 3, kernel = k_name, kpar = k_par))
kpca_obj <- kpca_obj@fun(dimRedData(tr_dat), kpca_obj@stdpars)
kpca_pred <- kpca_obj@apply(te_dat)@data

## expected values

kpca_obj_exp <- kpca(tr_dat, 
                     kernel = k_name,
                     kpar = k_par)
kpca_pred_exp <- predict(kpca_obj_exp, tr_dat)[, 1:3]
colnames(kpca_pred_exp) <- paste0("kPCA", 1:3)

I get

> head(kpca_pred)
          kPCA1      kPCA2       kPCA3
[1,] -0.1754955 -2.8205993  0.51416167
[2,]  1.1112348  1.7925091 -0.02363246
[3,]  1.9973353 -0.9198911  0.14218226
[4,]  3.0105551  1.4249128 -2.79424169
[5,] -3.2053340 -2.0046749 -0.79662181
[6,]  1.5522026  3.6696689 -2.54760691
> head(kpca_pred_exp)
         kPCA1      kPCA2      kPCA3
[1,]  2.614505  2.9551241  2.1230302
[2,] -1.827209  2.4680460 -2.3203690
[3,]  2.956935 -1.2295952 -2.9909752
[4,] -3.740879 -0.8210545 -4.0988922
[5,] -1.015746 -1.7453619 -0.5225218
[6,] -2.357748  2.1721046 -2.2195350
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.3

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dimRed_0.0.3.9001 DRR_0.0.2         CVST_0.2-1        Matrix_1.2-7.1    kernlab_0.9-25   

loaded via a namespace (and not attached):
[1] tools_3.3.2     grid_3.3.2      lattice_0.20-34

BTW what's the best way to access the objects generated in the fun code from the base object? I'd like to get ahold of the PCA rotation matrix or the kPCA object res. That gets computed once on the first call?

Thanks,

Max

Sparse matrix error

I find it misleading that calling embed() on a sparse matrix does not produce an error. After some investigation, I see that as.matrix() is called on my sparse matrix. I think it would be reasonable to throw an error to prevent a memory explosion. Moreover, calling as.matrix() assumes that the user may pass something other than a matrix, and the result can be unexpected. This is dangerous and, in most cases, useless.
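
A minimal sketch of the kind of guard requested here, assuming the Matrix package and 8 bytes per dense double entry. The helper `as_dense_checked()` and its `max_bytes` threshold are hypothetical and not part of dimRed.

```r
library(Matrix)

# Hypothetical helper: convert to a dense matrix only if the dense
# result would stay below `max_bytes`; otherwise stop with an error.
as_dense_checked <- function(x, max_bytes = 1e9) {
  if (methods::is(x, "sparseMatrix")) {
    dense_bytes <- prod(as.numeric(dim(x))) * 8  # 8 bytes per double
    if (dense_bytes > max_bytes) {
      stop("dense conversion would need about ",
           format(dense_bytes, scientific = TRUE),
           " bytes; refusing to call as.matrix()", call. = FALSE)
    }
  }
  as.matrix(x)
}

# A small sparse matrix is converted as usual.
small <- sparseMatrix(i = c(1, 3), j = c(2, 4), x = c(1.5, 2.5), dims = c(3, 4))
dense <- as_dense_checked(small)

# A large, almost empty matrix is rejected before any memory is allocated.
huge <- sparseMatrix(i = 1, j = 1, x = 1, dims = c(1e5, 1e5))
msg <- tryCatch(as_dense_checked(huge), error = function(e) conditionMessage(e))
print(msg)
```

The threshold is a design choice: an explicit limit keeps small sparse inputs working while turning the silent memory blow-up into a clear error.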

Changing @stdpars$knn not reflected by UMAP embedding when using "umap-learn"

With the reference UMAP implementation (umap-learn 0.3.9, py27_0, conda-forge) installed, dimRed (0.2.3, R-) appears to use only the default knn as specified in umap@stdpars.

library(dimRed)

dat <- loadDataSet("3D S Curve", n = 300)

## use the S4 Class directly:
umap <- UMAP()

umap@stdpars
# $knn
# [1] 15
# 
# $ndim
# [1] 2
# 
# $d
# [1] "euclidean"
# 
# $method
# [1] "umap-learn"

emb <- umap@fun(dat, umap@stdpars)
plot(emb)

umap@stdpars$knn <- 30
umap@stdpars
# $knn
# [1] 30
# 
# $ndim
# [1] 2
# 
# $d
# [1] "euclidean"
# 
# $method
# [1] "umap-learn"

emb <- umap@fun(dat, umap@stdpars)
plot(emb) # same plot although it should be different because of change in knn

emb2 <- embed(dat, "UMAP", .mute = NULL, knn = 2, method="naive")
plot(emb2, type = "2vars")

emb2 <- embed(dat, "UMAP", .mute = NULL, knn = 200, method="naive")
plot(emb2, type = "2vars") # same here

sessionInfo()
# R version 3.6.0 (2019-04-26)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 18.04 (Bionic Beaver)
# 
# Matrix products: default
# BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
# LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
# 
# locale:
#  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_AG.UTF-8       
#  [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_AG.UTF-8    LC_MESSAGES=en_US.UTF-8   
#  [7] LC_PAPER=en_AG.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
# [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_AG.UTF-8 LC_IDENTIFICATION=C       
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] dimRed_0.2.3   DRR_0.0.3      CVST_0.2-2     Matrix_1.2-17  kernlab_0.9-27
# 
# loaded via a namespace (and not attached):
#  [1] compiler_3.6.0  magrittr_1.5    tools_3.6.0     yaml_2.2.0      reticulate_1.12 Rcpp_1.0.1     
#  [7] RSpectra_0.14-0 grid_3.6.0      jsonlite_1.6    umap_0.2.2.0    lattice_0.20-38

release schedule

When do you think that you'll do another release? I have a recipes version going to CRAN before the end of the year and I wasn't sure if I could include the NNMF or autoencoder features from dimRed in that version.

Nonnegative Matrix Factorization

Any plans on including this? I might get motivated enough to submit a PR. If so, would you prefer any particular package (NMF or NNLM)?

DiffusionMaps fails for ndim=1

The DiffusionMaps method appears not to work when ndim is set to 1.

To reproduce:

library(dimRed)
set.seed(1)

embed(matrix(rnorm(1E5), 100), 'DiffusionMaps', ndim=1)
## Performing eigendecomposition
## Computing Diffusion Coordinates
## Elapsed time: 0.009 seconds

## Warning in seq_len(ncol(outdata)): first element used of 'length.out'
## argument

## Error in seq_len(ncol(outdata)): argument must be coercible to non-negative integer

System info

sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Arch Linux
## 
## Matrix products: default
## BLAS: /usr/lib/libblas.so.3.8.0
## LAPACK: /usr/lib/liblapack.so.3.8.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] dimRed_0.2.2   DRR_0.0.3      CVST_0.2-2     Matrix_1.2-15 
## [5] kernlab_0.9-27 nvimcom_0.9-75 colorout_1.2-0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.0           lattice_0.20-38      digest_0.6.18       
##  [4] grid_3.5.2           magrittr_1.5         evaluate_0.12       
##  [7] stringi_1.2.4        scatterplot3d_0.3-41 rmarkdown_1.11      
## [10] tools_3.5.2          stringr_1.3.1        igraph_1.2.2        
## [13] xfun_0.4             yaml_2.2.0           compiler_3.5.2      
## [16] pkgconfig_2.0.2      BiocManager_1.30.4   htmltools_0.3.6     
## [19] diffusionMap_1.1-0.1 knitr_1.21

rmse_by_ndim

Include something like this and add some parameters to inverse(...)

library(dimRed)

ir <- loadDataSet("Iris")
ir.drr <- embed(ir, "DRR", ndim = ndims(ir))
ir.pca <- embed(ir, "PCA", ndim = ndims(ir))

get_rmse_by_ndim <- function (x, n = ndims(x)) {
  res <- numeric(n)
  org <- getData(getOrgData(x))
  for (i in seq_len(n)) {
    rec <- getData(inverse(x, getData(getDimRedData(x))[, seq_len(i), drop = FALSE]))
    res[i] <- sqrt(mean((org - rec) ^ 2))
  }
  res
}

rmse <- data.frame(
  rmse_drr = get_rmse_by_ndim(ir.drr),
  rmse_pca = get_rmse_by_ndim(ir.pca)
)

matplot(rmse, type = "l")
plot(ir)
plot(ir.drr)
plot(ir.pca)

Vignette Contribution

Dear Guido Kraemer,

Thanks for the package! I am thinking of contributing a vignette that would help users quickly understand how to use the package (more on the usage side than illustrating different methods). Is that a good idea?

Regards,
Srikanth KS

data depth methods

You might consider adding some of Tukey's data depth methods. R has a few packages that you could wrap, including ddalpha (this paper gives a pretty good description of it).

prospective projections/predictions

Can you add functions or classes that allow the model to be estimated from one data set and then applied to any other data set? This wouldn't work for every method (e.g. MDS) but would be extremely useful.

For example, with PCA:

set.seed(12)
for_mod <- sample(1:nrow(USArrests), 40)

pca_mod <- prcomp(~ Murder + Assault + Rape, data = USArrests[for_mod, ], scale = TRUE)

## now apply the projection onto any data set:
pca_mod_data   <- predict(pca_mod, USArrests[ for_mod, ])
pca_other_data <- predict(pca_mod, USArrests[-for_mod, ])

Thanks

Argument `.keep.org.data` not working

Hi there,

Please see below the repro. The examples are taken from documentation of embed.

library(dimRed)
#> Loading required package: DRR
#> Loading required package: kernlab
#> Loading required package: CVST
#> Loading required package: Matrix
#> 
#> Attaching package: 'dimRed'
#> The following object is masked from 'package:stats':
#> 
#>     embed
#> The following object is masked from 'package:base':
#> 
#>     as.data.frame

as.data.frame(
  embed(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
        iris, "PCA", .keep.org.data = FALSE)
)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': invalid class "dimRedResult" object: invalid object for slot "org.data" in class "dimRedResult": got class "NULL", should be or extend class "matrix"

as.data.frame(embed(iris[, 1:4], "PCA", .keep.org.data = FALSE))
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': invalid class "dimRedResult" object: invalid object for slot "org.data" in class "dimRedResult": got class "NULL", should be or extend class "matrix"

Created on 2022-08-28 by the reprex package (v2.0.1)
