This repository has been archived. The former README is now in README-NOT.md.
ropensci-archive / umapr
:no_entry: ARCHIVED :no_entry: Wraps UMAP Algorithm for Dimension Reduction
License: Other
I think it would be helpful to indicate near the top of the README that umapr is/was an unconf18 project (with all that implies, e.g. exploration). Given the existence of umap by Tomasz Konopka (which I know you linked to), which is under current development, it would be good to be clear about this so that others don't perceive you're about to race to CRAN as is, for example.
After a successful install, I still cannot get it running. I always get this error while testing:
embedding <- umap(df)
Error in py_call_impl(callable, dots$args, dots$keywords) :
"TypeError: __init__() got an unexpected keyword argument 'bandwidth'"
Any idea how to resolve this issue?
Add authors, license, title
UMAP1 and UMAP2?
@seaaan et al, according to our recently created package curation policy, this repo should now be transferred to either one of your personal accounts or the ropensci-archive organization.
If you don't answer within one month, I'll transfer the repository to ropensci-archive, after which you could still email me to transfer the repo to a personal account.
Thank you!
Some arguments are converted to integers with as.integer if the Python code expects an integer, allowing R users to supply, e.g., 1 instead of 1L. However, if the user supplies, e.g., 1.5, this will be converted to 1L, where it might be more appropriate to throw an error.
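One way the stricter behaviour could look is sketched below. The helper name as_strict_integer and its arg parameter are hypothetical and not part of umapr; the sketch only assumes that whole numbers such as 1 or 2 should be accepted and anything fractional should raise an error instead of being truncated.

```r
# Hypothetical helper (not part of umapr): accept whole numbers supplied as
# doubles, but refuse fractional values instead of silently truncating them.
as_strict_integer <- function(x, arg = "value") {
  ok <- is.numeric(x) && length(x) == 1 && is.finite(x) &&
    x == round(x) && abs(x) <= .Machine$integer.max
  if (!ok) {
    stop(
      sprintf("`%s` must be a single whole number, got %s",
              arg, paste(deparse(x), collapse = " ")),
      call. = FALSE
    )
  }
  as.integer(x)
}

n_ok <- as_strict_integer(2, "n_components")   # returns 2L

# A fractional value now fails loudly rather than becoming 1L.
n_bad <- tryCatch(
  as_strict_integer(1.5, "n_components"),
  error = function(e) conditionMessage(e)
)
```

With this kind of check, a user can still write n_components = 2 without the L suffix, while 1.5 is reported as a mistake.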
Create an R6 class for umap
Name all the arguments accepted by UMAP and pass them explicitly instead of using ... to allow for tab completion
So you could have the input data frame with two additional columns so the user doesn't need to merge them manually.
However, the user will have to supply only numeric columns to the umap function, so any categorical labels will be gone. This could possibly be solved by allowing the user to pass in a data frame containing all of their data and then the numeric indices of the columns to subset to. So, e.g., you would call umap(iris, 1:4, ...) and then we'd run fit_transform(iris[ , 1:4]) and combine that result with iris, not iris[ , 1:4]. This seems the most useful to me.
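A minimal sketch of that interface follows. The function embed_and_bind and its embed_fn argument are hypothetical names introduced for illustration, and stats::prcomp is used as a stand-in for the Python fit_transform call so that the example runs without Python. The column names UMAP1 and UMAP2 follow the output shown elsewhere in these issues.

```r
# Hypothetical wrapper: embed only the selected columns, then bind the
# two embedding columns back onto the full input data frame.
embed_and_bind <- function(data, cols, embed_fn, ...) {
  numeric_part <- as.matrix(data[, cols, drop = FALSE])
  stopifnot(is.numeric(numeric_part))

  # embed_fn stands in for the call into Python's fit_transform.
  embedding <- embed_fn(numeric_part, ...)
  embedding <- as.data.frame(embedding[, 1:2, drop = FALSE])
  names(embedding) <- c("UMAP1", "UMAP2")

  # Combine with the original data, not just the numeric subset,
  # so categorical labels such as Species are kept.
  cbind(data, embedding)
}

# Stand-in embedding (assumption): first two principal components.
pca_2d <- function(m) stats::prcomp(m)$x[, 1:2]

result <- embed_and_bind(iris, 1:4, pca_2d)
head(result)
```

Because the labels survive, the result can be passed straight to a plotting call that colours points by Species.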
Hi,
I am using the R wrapper function for UMAP with all the requirements satisfied, but unfortunately I cannot set the seed in the umapr function.
E.g., with umapr::umap(iris[,1:4], n_neighbors = 5, random_state = 4), I always run into the following error:
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: 4.0 cannot be used to seed a numpy.random.RandomState instance
However, if I remove the random_state parameter, it works fine but is not reproducible. Could you please suggest what the reason could be, and how to resolve the error?
Thank you,
Himanshu
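A likely cause, assuming random_state is handed to Python unchanged: the error message shows 4.0, which suggests the seed arrived as a Python float. In R, a bare 4 is a double, and reticulate converts doubles to Python floats and R integers to Python ints. The snippet below only demonstrates the type difference. The commented call is an untested suggestion.

```r
seed_double  <- 4    # a plain R number is stored as a double
seed_integer <- 4L   # the L suffix creates an integer

typeof(seed_double)   # "double"
typeof(seed_integer)  # "integer"

# Untested suggestion for this package version: pass the seed as an integer.
# umapr::umap(iris[, 1:4], n_neighbors = 5, random_state = seed_integer)
```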
Hi, I am trying to use the package to create a UMAP-based unicategorical analysis (similar to ONEsense), but when I change:
umap.defaults$n_component == 1,
I get an error after the algorithm tries to create the initial embedding:
(Error in result[, 1:d, drop = FALSE] : incorrect number of dimensions)
I think that for the Python version this works fine. Am I doing something wrong? Can you help me out?
Thx
Ed
Code from vignette:
rm(list = ls(all = TRUE))
# devtools::install_github("ropenscilabs/umapr", force = TRUE)
library(umapr)
library(tidyverse)
df <- as.matrix(iris[ , 1:4])
embedding <- umap(df)
Error in throw(e) : could not find function "throw"
sessionInfo():
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.3.0 stringr_1.3.1 dplyr_0.7.6 purrr_0.2.5 readr_1.1.1 tidyr_0.8.1 tibble_1.4.2 ggplot2_3.0.0
[9] tidyverse_1.2.1 umapr_0.0.0.9000
loaded via a namespace (and not attached):
[1] Rcpp_0.12.19 cellranger_1.1.0 pillar_1.3.0 compiler_3.5.1 plyr_1.8.4 bindr_0.1.1 tools_3.5.1 lubridate_1.7.4
[9] jsonlite_1.5 nlme_3.1-137 gtable_0.2.0 lattice_0.20-35 pkgconfig_2.0.2 rlang_0.2.2 Matrix_1.2-14 cli_1.0.1
[17] rstudioapi_0.8 yaml_2.2.0 haven_1.1.2 bindrcpp_0.2.2 withr_2.1.2 xml2_1.2.0 httr_1.3.1 hms_0.4.2
[25] grid_3.5.1 tidyselect_0.2.4 reticulate_1.10 glue_1.3.0 R6_2.3.0 readxl_1.1.0 modelr_0.1.2 magrittr_1.5
[33] backports_1.1.2 scales_1.0.0 rvest_0.3.2 assertthat_0.2.0 colorspace_1.3-2 stringi_1.2.4 lazyeval_0.2.1 munsell_0.5.0
[41] broom_0.5.0 crayon_1.3.4
Doing this on large datasets with many variables requires returning a huge object which considerably slows it down and uses more memory.
Could make this optional behavior.
Are you planning to put this on CRAN?
Is Python available? Is the umap module available?
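Both questions could be answered up front with reticulate, as in the sketch below. The function name check_umap_backend is hypothetical; reticulate::py_available() and reticulate::py_module_available() are existing reticulate functions. The sketch assumes that a missing reticulate installation should simply report FALSE for everything.

```r
# Hypothetical pre-flight check (not part of umapr): report whether
# reticulate, Python, and the Python module can be found.
check_umap_backend <- function(module = "umap") {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(c(reticulate = FALSE, python = FALSE, module = FALSE))
  }
  python_ok <- isTRUE(tryCatch(
    reticulate::py_available(initialize = TRUE),
    error = function(e) FALSE
  ))
  module_ok <- python_ok && isTRUE(tryCatch(
    reticulate::py_module_available(module),
    error = function(e) FALSE
  ))
  c(reticulate = TRUE, python = python_ok, module = module_ok)
}

status <- check_umap_backend()
if (!all(status)) {
  message("Missing: ", paste(names(status)[!status], collapse = ", "))
}
```

A wrapper could call such a check before touching Python and stop with a readable message instead of a py_call_impl error.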
Coerce data frame to matrix to allow user to provide a data frame instead of a matrix
Hi,
I want to know if it is possible to get the n neighbors for a sample of interest.
# reproducible data frame
set.seed(100)
df <- matrix(rnorm(100), nrow = 5, ncol = 20)
df <- as.data.frame(df)
rownames(df) <- paste0('row_', rownames(df))
colnames(df) <- paste0('col_', colnames(df))
# umap
ump <- umap(d = t(df), n_neighbors = 3, n_components = 2, metric = "correlation", random_state = 123L)
ump <- ump %>%
dplyr::select(UMAP1, UMAP2)
ump
UMAP1 UMAP2
col_V1 1.248055 -3.2721519
col_V2 -2.732246 0.8900975
col_V3 1.382807 -3.7631097
col_V4 -1.719460 0.7712852
col_V5 -10.159772 -1.7583050
col_V6 -1.908897 0.4350810
col_V7 -11.244816 -1.9610249
col_V8 1.015908 -3.0224593
col_V9 -11.866040 -2.0123785
col_V10 -3.039619 1.3534858
col_V11 -8.347816 -1.2767988
col_V12 -8.580935 -0.3583812
col_V13 1.011420 -3.5656300
col_V14 -10.902308 -2.0594218
col_V15 -8.433976 -0.9522216
col_V16 -8.201790 -0.6048836
col_V17 -1.972616 1.2094533
col_V18 -9.591088 -1.5964160
col_V19 -11.605387 -2.2752790
col_V20 -2.681429 1.3110994
How do I extract, say, the 3 closest neighbors to col_V15, for example?
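One possible approach is to rank points by Euclidean distance in the two-dimensional embedding. This is a sketch, and the helper name nearest_in_embedding is hypothetical. Note that these are neighbors in the embedded space, which need not match the neighbors UMAP found in the original space with the correlation metric. The data below are a subset of the rows printed above.

```r
# Hypothetical helper: k nearest rows to `target` by Euclidean distance
# in the UMAP1/UMAP2 plane.
nearest_in_embedding <- function(embedding, target, k = 3) {
  coords <- as.matrix(embedding[, c("UMAP1", "UMAP2")])
  stopifnot(target %in% rownames(coords))
  d <- as.matrix(dist(coords))[target, ]
  d <- d[names(d) != target]   # drop the point itself (distance 0)
  head(sort(d), k)
}

# Subset of the embedding printed above.
ump <- data.frame(
  UMAP1 = c(1.248055, -8.347816, -8.580935, -8.433976, -8.201790, -9.591088),
  UMAP2 = c(-3.2721519, -1.2767988, -0.3583812, -0.9522216, -0.6048836, -1.5964160),
  row.names = c("col_V1", "col_V11", "col_V12", "col_V15", "col_V16", "col_V18")
)

nn <- nearest_in_embedding(ump, "col_V15", k = 3)
names(nn)   # "col_V11" "col_V16" "col_V12"
```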
Come up with what to say and any visuals to show for the end of the conference.
Compare on datasets of a range of sizes (see arxiv paper for examples) using:
https://twitter.com/jimhester_/status/996063591433416704 for possible timing package
This could be written as a vignette or something else
Hello,
I'm trying to get umapr to work on the iris example, but every time I run it, I get the following error:
library(umapr)
library(tidyverse)
df <- as.matrix(iris[ , 1:4])
embedding <- umap(df)
> Error in py_call_impl(callable, dots$args, dots$keywords) :
TypeError: __init__() got an unexpected keyword argument 'alpha'
I followed the Python install instructions from the umap GitHub and I'm able to import umap just fine in Python.
Here's my session info:
R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X Yosemite 10.10.4
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.2.0 stringr_1.2.0 dplyr_0.7.4 purrr_0.2.4 readr_1.1.1 tidyr_0.7.2
[7] tibble_1.3.4 ggplot2_2.2.1 tidyverse_1.2.1 umapr_0.0.0.9000
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 cellranger_1.1.0 plyr_1.8.4 bindr_0.1 tools_3.3.3 lubridate_1.7.1
[7] jsonlite_1.5 nlme_3.1-131 gtable_0.2.0 lattice_0.20-35 pkgconfig_2.0.1 rlang_0.1.4
[13] psych_1.7.8 cli_1.0.0 rstudioapi_0.7 yaml_2.1.14 parallel_3.3.3 haven_1.1.0
[19] bindrcpp_0.2 xml2_1.1.1 httr_1.3.1 hms_0.3 grid_3.3.3 reticulate_1.4
[25] glue_1.2.0 R6_2.2.2 readxl_1.0.0 foreign_0.8-69 modelr_0.1.1 reshape2_1.4.3
[31] magrittr_1.5 scales_0.5.0 rvest_0.3.2 assertthat_0.2.0 mnormt_1.5-5 colorspace_1.3-2
[37] stringi_1.1.5 lazyeval_0.2.1 munsell_0.4.3 broom_0.4.2 crayon_1.3.4
Fails if you pass in 2 because it thinks it's a float