ropensci-archive / umapr

:no_entry: ARCHIVED :no_entry: Wraps UMAP Algorithm for Dimension Reduction

License: Other

R 97.89% Shell 2.11%

unconf18 umap reticulate r r-package rstats unconf

umapr's Introduction

umapr

This repository has been archived. The former README is now in README-NOT.md.

umapr's Issues

add note saying umapr is unconf18 project?

I think it would be helpful to indicate near top of README that umapr is/was an unconf18 project (with all that implies e.g. exploration). Given existence of umap by Tomasz Konopka (which I know you linked to 👍) under current development, it would be good to be clear about this so that others don't perceive you're about to race to CRAN as is, for example.

Testing error: TypeError: __init__() got an unexpected keyword argument 'bandwidth'

After a successful install, I still cannot get it running; I always get this error while testing.

embedding <- umap(df)
Error in py_call_impl(callable, dots$args, dots$keywords) :
"TypeError: __init__() got an unexpected keyword argument 'bandwidth'"
Any idea how to resolve this issue?
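
The message suggests that the Python constructor was handed a keyword (`bandwidth`) it does not accept, which can happen when the installed Python library and the R wrapper disagree about argument names. A minimal sketch of one defensive approach is to drop arguments the target does not list before calling it. Here `call_with_supported_args` and `fake_backend` are hypothetical names, and `formals()` only inspects R functions, so this illustrates the idea rather than a fix for a Python callable reached through reticulate.

```r
# Hypothetical helper: call `fun` with only the arguments it declares.
call_with_supported_args <- function(fun, args) {
  supported <- names(formals(fun))
  unsupported <- setdiff(names(args), supported)
  if (length(unsupported) > 0) {
    warning("Dropping unsupported arguments: ",
            paste(unsupported, collapse = ", "), call. = FALSE)
  }
  do.call(fun, args[names(args) %in% supported])
}

# Stand-in for a backend that no longer accepts `bandwidth` (not the real UMAP API).
fake_backend <- function(n_neighbors = 15L, metric = "euclidean") {
  list(n_neighbors = n_neighbors, metric = metric)
}

result <- suppressWarnings(
  call_with_supported_args(fake_backend, list(n_neighbors = 5L, bandwidth = 1))
)
```

Silently dropping arguments can hide real mistakes, which is why the sketch warns instead of staying quiet.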

Update DESCRIPTION file

Add authors, license, title

Write installation instructions

Name the UMAP columns something useful

UMAP1 and UMAP2?
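
A minimal sketch of the naming suggested here, assuming the embedding comes back as a plain numeric matrix with one column per component. The matrix below is random stand-in data, not real UMAP output.

```r
# Stand-in for an embedding returned by a UMAP backend (random values only).
set.seed(1)
embedding <- matrix(rnorm(10), ncol = 2)

# Name the columns UMAP1, UMAP2, ... to match the number of components.
colnames(embedding) <- paste0("UMAP", seq_len(ncol(embedding)))
embedding_df <- as.data.frame(embedding)
```

Building the names from `ncol()` keeps the scheme working if more than two components are requested.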

Transfer to personal account?

@seaaan et al 👋 according to our recently created package curation policy this repo should now be transferred to either one of your personal accounts, or the ropensci-archive organization.

If you don't answer within one month, I'll transfer the repository to ropensci-archive, after which you could still email me to transfer the repo to a personal account.

Thank you!

integer conversion thoughts

Some arguments are converted to integers with as.integer if the Python code expects an integer, allowing R users to supply, e.g., 1 instead of 1L. However, if the user supplies, e.g., 1.5, this will be converted to 1L, where it might be more appropriate to throw an error.
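
One way to get that behaviour could look like the sketch below. `as_strict_integer` is a hypothetical helper name, not something the package is known to contain; it accepts whole numbers such as 1 or 2 and rejects values such as 1.5 instead of truncating them.

```r
# Hypothetical helper: convert a whole number to integer, error on anything else.
as_strict_integer <- function(x, arg = "x") {
  if (!is.numeric(x) || length(x) != 1L || !is.finite(x)) {
    stop(sprintf("`%s` must be a single finite number.", arg), call. = FALSE)
  }
  if (x != round(x)) {
    stop(sprintf("`%s` must be a whole number, not %s.", arg, format(x)),
         call. = FALSE)
  }
  if (abs(x) > .Machine$integer.max) {
    stop(sprintf("`%s` is too large to store as an integer.", arg), call. = FALSE)
  }
  as.integer(x)
}

n_neighbors <- as_strict_integer(2, "n_neighbors")                     # accepted
rejected <- try(as_strict_integer(1.5, "n_neighbors"), silent = TRUE)  # error
```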

R6 class for umap

Create an R6 class for umap

Named arguments instead of `...`

Name all the arguments accepted by UMAP and pass them explicitly instead of using ... to allow for tab completion
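
A rough sketch of what explicit arguments could look like. The argument names are the ones that appear in other issues on this page (`n_neighbors`, `n_components`, `metric`, `random_state`); the function name `umap_args` and the default values are assumptions made for illustration only.

```r
# Hypothetical: collect explicitly named arguments instead of accepting `...`.
# Defaults are illustrative and may not match the Python library.
umap_args <- function(n_neighbors = 15L, n_components = 2L,
                      metric = "euclidean", random_state = NULL) {
  args <- list(
    n_neighbors  = as.integer(n_neighbors),
    n_components = as.integer(n_components),
    metric       = metric
  )
  # Only forward the seed when the caller supplied one.
  if (!is.null(random_state)) {
    args$random_state <- as.integer(random_state)
  }
  args
}

args <- umap_args(n_neighbors = 5, random_state = 4)
```

Because every argument is named in the signature, editors can offer tab completion, and the resulting list could be handed to the backend with `do.call()`.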

Merge together input data frame with the output of `fit_transform`

So you could have the input data frame with two additional columns so the user doesn't need to merge them manually.

However, user will have to supply only numeric columns to the umap function so any categorical labels will be gone. This could possibly be solved by allowing the user to pass in a data frame containing all of their data and then the numeric indices of the columns to subset to. So e.g. you would call umap(iris, 1:4, ...) and then we'd run fit_transform(iris[ , 1:4]) and combine that result with iris not iris[,1:4]. This seems the most useful to me.
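
The `umap(iris, 1:4, ...)` idea could be sketched roughly as follows. `embed_and_bind` is a hypothetical name, and the embedding step is a PCA stand-in (not UMAP) so the example runs without Python.

```r
# Hypothetical wrapper: embed only the chosen columns, return the full input
# data frame with the embedding columns appended.
embed_and_bind <- function(data, cols, embed_fun) {
  embedding <- embed_fun(as.matrix(data[, cols, drop = FALSE]))
  colnames(embedding) <- paste0("UMAP", seq_len(ncol(embedding)))
  cbind(data, as.data.frame(embedding))
}

# Stand-in embedding: first two principal components, NOT a real UMAP fit.
pca_2d <- function(m) prcomp(m)$x[, 1:2, drop = FALSE]

result <- embed_and_bind(iris, 1:4, pca_2d)
```

The categorical `Species` column survives because the subsetting happens inside the function rather than before the call.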

umapr seed issue: cannot be used to seed a numpy.random.RandomState instance

Hi,

I am using the R wrapper for UMAP with all the requirements satisfied, but I cannot set the seed in the umapr function.
For example, with umapr::umap(iris[,1:4], n_neighbors = 5, random_state = 4), I always run into the following error:

Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: 4.0 cannot be used to seed a numpy.random.RandomState instance

If I remove the random_state parameter, it works fine but is not reproducible. Could you please suggest what the cause might be and how to resolve the error?

Thank you,
Himanshu
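
The `4.0` in the error hints that the seed reached Python as a float. In R a bare `4` is a double, and reticulate passes doubles to Python as floats and R integers as Python ints. The check below shows the difference on the R side; passing `random_state = 4L` (as another issue on this page does with `123L`) would be the thing to try, though that is a suggestion rather than a confirmed fix.

```r
seed_double <- 4    # what `random_state = 4` supplies
seed_int    <- 4L   # integer literal

double_is_int  <- is.integer(seed_double)              # FALSE
literal_is_int <- is.integer(seed_int)                 # TRUE
coerced_is_int <- is.integer(as.integer(seed_double))  # TRUE
```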

n_component ==1

Hi, I am trying to use the package to create a UMAP-based unicategorical analysis (similar to ONEsense), but when I change:

umap.defaults$n_component == 1,

I get an error after the algorithm tries to create the initial embedding:

(Error in result[, 1:d, drop = FALSE] : incorrect number of dimensions)

I think that for the Python version this works fine. Am I doing something wrong? Can you help me out?

Thx

Make pkgdown site

Could not find function "throw" when running umap()

Code from vignette:

rm(list = ls(all = TRUE))
# devtools::install_github("ropenscilabs/umapr", force = TRUE)
library(umapr)
library(tidyverse)
df <- as.matrix(iris[ , 1:4])
embedding <- umap(df)

Error in throw(e) : could not find function "throw"

sessionInfo():

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.3.0    stringr_1.3.1    dplyr_0.7.6      purrr_0.2.5      readr_1.1.1      tidyr_0.8.1      tibble_1.4.2     ggplot2_3.0.0   
 [9] tidyverse_1.2.1  umapr_0.0.0.9000

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.19     cellranger_1.1.0 pillar_1.3.0     compiler_3.5.1   plyr_1.8.4       bindr_0.1.1      tools_3.5.1      lubridate_1.7.4 
 [9] jsonlite_1.5     nlme_3.1-137     gtable_0.2.0     lattice_0.20-35  pkgconfig_2.0.2  rlang_0.2.2      Matrix_1.2-14    cli_1.0.1       
[17] rstudioapi_0.8   yaml_2.2.0       haven_1.1.2      bindrcpp_0.2.2   withr_2.1.2      xml2_1.2.0       httr_1.3.1       hms_0.4.2       
[25] grid_3.5.1       tidyselect_0.2.4 reticulate_1.10  glue_1.3.0       R6_2.3.0         readxl_1.1.0     modelr_0.1.2     magrittr_1.5    
[33] backports_1.1.2  scales_1.0.0     rvest_0.3.2      assertthat_0.2.0 colorspace_1.3-2 stringi_1.2.4    lazyeval_0.2.1   munsell_0.5.0   
[41] broom_0.5.0      crayon_1.3.4

Implement in C++ to remove need for Python/reticulate

Consider not returning the input object attached to the embeddings

Doing this on large datasets with many variables requires returning a huge object which considerably slows it down and uses more memory.

Could make this optional behavior.
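
A small sketch of the optional behaviour, with a hypothetical `include_input` flag and `finish_embedding` helper. The data are random stand-ins chosen only to show the size difference.

```r
# Hypothetical final step of a wrapper: optionally attach the input data.
finish_embedding <- function(data, embedding, include_input = TRUE) {
  embedding <- as.data.frame(embedding)
  if (include_input) cbind(data, embedding) else embedding
}

set.seed(1)
wide <- as.data.frame(matrix(rnorm(1000 * 200), nrow = 1000))  # 200 variables
emb  <- data.frame(UMAP1 = rnorm(1000), UMAP2 = rnorm(1000))   # stand-in embedding

with_input    <- finish_embedding(wide, emb)
without_input <- finish_embedding(wide, emb, include_input = FALSE)

# Compare memory footprints of the two return values.
size_with    <- as.numeric(object.size(with_input))
size_without <- as.numeric(object.size(without_input))
```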

CRAN

Are you planning to put this on CRAN?

Throw meaningful error messages in `.onLoad`

Is python available? Is umap module available?
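
One possible shape for those checks is sketched below. `check_umap_backend` is a hypothetical function; the default checkers rely on `reticulate::py_available()` and `reticulate::py_module_available()`, and they are passed in as arguments so the logic can be exercised without Python installed.

```r
# Hypothetical check that could be called from .onLoad or at first use.
check_umap_backend <- function(
    python_ok = function() reticulate::py_available(initialize = TRUE),
    module_ok = function() reticulate::py_module_available("umap")) {
  if (!python_ok()) {
    stop("Python is not available. Install Python and make sure reticulate can find it.",
         call. = FALSE)
  }
  if (!module_ok()) {
    stop("The Python module 'umap' was not found. ",
         "Try installing it with `pip install umap-learn`.", call. = FALSE)
  }
  invisible(TRUE)
}

# Exercise both failure paths with fake checkers (no Python required).
no_python <- try(check_umap_backend(function() FALSE, function() TRUE), silent = TRUE)
no_module <- try(check_umap_backend(function() TRUE, function() FALSE), silent = TRUE)
all_good  <- check_umap_backend(function() TRUE, function() TRUE)
```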

umap error

library(umapr)
library(tidyverse)

df <- as.matrix(iris[ , 1:4])

embedding <- umap(df)

The following error is generated by RStudio when I run the above code:

Thanks.

Accept data frame in addition to matrix

Coerce data frame to matrix to allow user to provide a data frame instead of a matrix
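
A minimal sketch of that coercion, using a hypothetical helper `as_numeric_matrix`. It refuses non-numeric columns rather than letting `as.matrix()` silently turn everything into character.

```r
# Hypothetical input check: accept a numeric matrix or an all-numeric data frame.
as_numeric_matrix <- function(x) {
  if (is.data.frame(x)) {
    non_numeric <- names(x)[!vapply(x, is.numeric, logical(1))]
    if (length(non_numeric) > 0) {
      stop("Non-numeric columns: ", paste(non_numeric, collapse = ", "),
           call. = FALSE)
    }
    x <- as.matrix(x)
  }
  if (!is.matrix(x) || !is.numeric(x)) {
    stop("`x` must be a numeric matrix or a data frame of numeric columns.",
         call. = FALSE)
  }
  x
}

m   <- as_numeric_matrix(iris[, 1:4])              # data frame in, matrix out
bad <- try(as_numeric_matrix(iris), silent = TRUE) # Species is a factor: error
```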

How to extract n_neighbors for a sample?

Hi,

I want to know if it is possible to get the n neighbors for a sample of interest.

# reproducible data frame
set.seed(100)
df <- matrix(rnorm(1:100), nrow = 5, ncol = 20)
df <- as.data.frame(df)
rownames(df) <- paste0('row_', rownames(df))
colnames(df) <- paste0('col_', colnames(df))

# umap
ump <- umap(d = t(df), n_neighbors = 3, n_components = 2, metric = "correlation", random_state = 123L)
ump <- ump %>% 
  dplyr::select(UMAP1, UMAP2)
ump

             UMAP1      UMAP2
col_V1    1.248055 -3.2721519
col_V2   -2.732246  0.8900975
col_V3    1.382807 -3.7631097
col_V4   -1.719460  0.7712852
col_V5  -10.159772 -1.7583050
col_V6   -1.908897  0.4350810
col_V7  -11.244816 -1.9610249
col_V8    1.015908 -3.0224593
col_V9  -11.866040 -2.0123785
col_V10  -3.039619  1.3534858
col_V11  -8.347816 -1.2767988
col_V12  -8.580935 -0.3583812
col_V13   1.011420 -3.5656300
col_V14 -10.902308 -2.0594218
col_V15  -8.433976 -0.9522216
col_V16  -8.201790 -0.6048836
col_V17  -1.972616  1.2094533
col_V18  -9.591088 -1.5964160
col_V19 -11.605387 -2.2752790
col_V20  -2.681429  1.3110994

How do I extract, say, the 3 closest neighbors of col_V15?
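
One way to approach this is to measure distances in the embedded space with base R. The sketch below uses Euclidean distance on the UMAP coordinates, which is not the same thing as the neighbor graph UMAP builds internally with `metric = "correlation"`. `nearest_neighbors` is a hypothetical helper, and the rows are a subset of the coordinates printed above.

```r
# Hypothetical helper: k nearest rows to `target` by Euclidean distance.
nearest_neighbors <- function(embedding, target, k = 3) {
  d <- as.matrix(dist(embedding))             # pairwise distances between rows
  others <- d[target, colnames(d) != target]  # drop the target itself
  head(names(sort(others)), k)
}

# Subset of the embedding shown in the issue.
ump <- data.frame(
  UMAP1 = c(1.248055, -8.347816, -8.580935, -8.433976, -8.201790),
  UMAP2 = c(-3.2721519, -1.2767988, -0.3583812, -0.9522216, -0.6048836),
  row.names = c("col_V1", "col_V11", "col_V12", "col_V15", "col_V16")
)

neighbors <- nearest_neighbors(ump, "col_V15", k = 3)
```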

presentation about project

Come up with what to say and any visuals to show for the end of the conference.

Compare timing of umap to tsne, pca

Compare on datasets of a range of sizes (see arxiv paper for examples) using:

Rtsne with PCA first
Rtsne with no PCA first
PCA alone
Others?

https://twitter.com/jimhester_/status/996063591433416704 for possible timing package

This could be written as a vignette or something else
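
A starting point for such a comparison might look like the sketch below. It only times `prcomp()` from base R with `system.time()` on random data of a few sizes; the Rtsne and UMAP runs, and any dedicated timing package, would have to be added. `time_pca` and the chosen sizes are placeholders.

```r
# Hypothetical timing helper: elapsed seconds for PCA on random data.
time_pca <- function(n_rows, n_cols = 50) {
  m <- matrix(rnorm(n_rows * n_cols), nrow = n_rows)
  unname(system.time(prcomp(m))["elapsed"])
}

set.seed(1)
sizes <- c(100, 1000, 5000)
timings <- data.frame(
  n_rows    = sizes,
  elapsed_s = vapply(sizes, time_pca, numeric(1))
)
```

Timings from a single run are noisy, so a real comparison would repeat each measurement several times.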

Error in py_call_impl

Hello,

I'm trying to get umapr to work on the iris example, but every time I run it, I get the following error:

library(umapr)
library(tidyverse)
df <- as.matrix(iris[ , 1:4])
embedding <- umap(df)

> Error in py_call_impl(callable, dots$args, dots$keywords) : 
  TypeError: __init__() got an unexpected keyword argument 'alpha'

I followed the python install instructions from the umap GitHub and I'm able to import umap just fine in python.

Here's my session info:

R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X Yosemite 10.10.4

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.2.0    stringr_1.2.0    dplyr_0.7.4      purrr_0.2.4      readr_1.1.1      tidyr_0.7.2     
 [7] tibble_1.3.4     ggplot2_2.2.1    tidyverse_1.2.1  umapr_0.0.0.9000

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14     cellranger_1.1.0 plyr_1.8.4       bindr_0.1        tools_3.3.3      lubridate_1.7.1 
 [7] jsonlite_1.5     nlme_3.1-131     gtable_0.2.0     lattice_0.20-35  pkgconfig_2.0.1  rlang_0.1.4     
[13] psych_1.7.8      cli_1.0.0        rstudioapi_0.7   yaml_2.1.14      parallel_3.3.3   haven_1.1.0     
[19] bindrcpp_0.2     xml2_1.1.1       httr_1.3.1       hms_0.3          grid_3.3.3       reticulate_1.4  
[25] glue_1.2.0       R6_2.2.2         readxl_1.0.0     foreign_0.8-69   modelr_0.1.1     reshape2_1.4.3  
[31] magrittr_1.5     scales_0.5.0     rvest_0.3.2      assertthat_0.2.0 mnormt_1.5-5     colorspace_1.3-2
[37] stringi_1.1.5    lazyeval_0.2.1   munsell_0.4.3    broom_0.4.2      crayon_1.3.4

Run `as.integer` on input to n_neighbors argument

Fails if you pass in `2` because R treats it as a double, which reaches Python as a float.