grf-labs / sufrep
Sufficient Representation for Categorical Variables https://arxiv.org/abs/1908.09874v1
License: GNU General Public License v3.0
The example on the homepage shows this output:
train.df <- means_encoder(X = X, G = G)
print(head(train.df))
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 0.5855 0.2239 -1.4361 0.1464 0.014645 -0.21528
# [2,] 0.7095 -1.1562 -0.6293 0.1892 0.177011 0.03227
# [3,] -0.1093 0.4224 0.2435 -0.3276 -0.009544 0.06307
# [4,] -0.4535 -1.3248 1.0584 0.1464 0.014645 -0.21528
# [5,] 0.6059 0.1411 0.8313 0.4915 0.159208 0.17173
# [6,] -1.8180 -0.5360 0.1052 0.5474 -0.056997 -0.16276
But I get (using the seed set on the homepage):
# "Means" encoding
means_encoder <- make_encoder(X = X, G = G, method = "means")
train.df <- means_encoder(X = X, G = G)
print(head(train.df))
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 0.5855288 0.2239254 -1.4361457 0.1036831 -0.1872252 -0.19094847
# [2,] 0.7094660 -1.1562233 -0.6292596 0.1036831 -0.1872252 -0.19094847
# [3,] -0.1093033 0.4224185 0.2435218 0.4277212 0.2087704 0.02461108
# [4,] -0.4534972 -1.3247553 1.0583622 0.1957134 -0.2072658 0.13467584
# [5,] 0.6058875 0.1410843 0.8313488 0.1957134 -0.2072658 0.13467584
# [6,] -1.8179560 -0.5360480 0.1052118 0.1957134 -0.2072658 0.13467584
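In both outputs the first three columns (the continuous X) agree, and only the encoded columns differ. One possible explanation, not confirmed anywhere in this thread, is that the grouping variable G was drawn differently. R 3.6.0 changed the default algorithm behind sample() from "Rounding" to "Rejection", so the same seed can give different sample() draws while rnorm() draws stay the same. The sketch below assumes G is generated with sample() after X, as in the other reports on this page, and needs R >= 3.6.0. It only checks the hypothesis and does not call sufrep.

```r
# Remember the current RNG settings so they can be restored afterwards
old_kind <- RNGkind()

# Draw X and G with the current (R >= 3.6.0 default) sampler
set.seed(12345)
n <- 100
p <- 3
X_new <- matrix(rnorm(n * p), n, p)
G_new <- sample(5, size = n, replace = TRUE)

# Draw them again with the pre-3.6.0 sampler
# (R warns that "Rounding" is non-uniform, so the warning is silenced here)
suppressWarnings(RNGkind(sample.kind = "Rounding"))
set.seed(12345)
X_old <- matrix(rnorm(n * p), n, p)
G_old <- sample(5, size = n, replace = TRUE)

# Restore the original sampler
suppressWarnings(RNGkind(sample.kind = old_kind[3]))

# rnorm() is unaffected by sample.kind, so the X matrices should match
print(identical(X_new, X_old))

# Compare the first few group labels under the two samplers
print(rbind(new = head(G_new), old = head(G_old)))
```

If the two sets of labels differ, calling suppressWarnings(RNGkind(sample.kind = "Rounding")) before set.seed() may reproduce the homepage output, assuming that output was generated under an older R version.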
Hi grf-labs team,
I am having trouble installing sufrep. I keep getting the following error message:
devtools::install_github("grf-labs/sufrep")
Downloading GitHub repo grf-labs/sufrep@HEAD
√ checking for file 'C:\Users\hhsie\AppData\Local\Temp\Rtmpii4XUC\remotes31b048262f83\grf-labs-sufrep-317be9e/DESCRIPTION' ...
- preparing 'sufrep': (1.3s)
√ checking DESCRIPTION meta-information ...
- checking for LF line-endings in source and make files and shell scripts
- checking for empty or unneeded directories
- building 'sufrep_0.1.0.tar.gz'
Installing package into ‘C:/Users/hhsie/Dropbox/My PC (DESKTOP-A078MCC)/Documents/R/win-library/4.0’
(as ‘lib’ is unspecified)
- installing source package 'sufrep' ...
** using staged installation
** R
** data
*** moving datasets to lazyload DB
** byte-compile and prepare package for lazy loading
** help
No man pages found in package 'sufrep'
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
*** arch - i386
*** arch - x64
ERROR: loading failed for 'i386'
- removing 'C:/Users/hhsie/Dropbox/My PC (DESKTOP-A078MCC)/Documents/R/win-library/4.0/sufrep'
Error: Failed to install 'sufrep' from GitHub:
(converted from warning) installation of package ‘C:/Users/hhsie/AppData/Local/Temp/Rtmpii4XUC/file31b046463905/sufrep_0.1.0.tar.gz’ had non-zero exit status
Any suggestions for what I could do?
Thanks a lot for developing all these great tools.
Hans
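The log fails at the load test for the 32-bit (i386) architecture, which runs before the x64 test. A workaround often used on Windows for this kind of failure is to skip the multi-architecture build, so the package is installed only for the architecture of the running R session. This is a suggestion under that assumption, not a confirmed fix for this report. INSTALL_opts is an argument of utils::install.packages(), which install_github() forwards to.

```r
# Install only for the architecture of the running R session,
# skipping the i386 load test that fails in the log above
devtools::install_github(
  "grf-labs/sufrep",
  INSTALL_opts = "--no-multiarch"
)
```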
@erikcs @halflearned
It seems that make_encoder() only allows one categorical variable at a time. For example, the code below produces an error. It would be a nice improvement to support multiple categorical variables, or at least to document the code to use in this case.
library(sufrep)
set.seed(12345)
n <- 100
p <- 3
X <- matrix(rnorm(n * p), n, p)
G <- data.frame(as.factor(sample(5, size = n, replace = TRUE)), as.factor(sample(5, size = n, replace = TRUE)))
# One-hot encoding
onehot_encoder <- make_encoder(X = X, G = G, method = "one_hot")
train.df <- onehot_encoder(X = X, G = G)
print(head(train.df))
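Until multiple categorical variables are supported, one possible workaround is to encode each factor column separately and bind the results together. The sketch below does this for one-hot encoding with base R only. The helper one_hot_all() and the column names g1 and g2 are hypothetical and not part of sufrep, and the result is not guaranteed to match the column layout that make_encoder() produces.

```r
set.seed(12345)
n <- 100
p <- 3
X <- matrix(rnorm(n * p), n, p)
G <- data.frame(
  g1 = as.factor(sample(5, size = n, replace = TRUE)),
  g2 = as.factor(sample(5, size = n, replace = TRUE))
)

# Hypothetical helper (not part of sufrep): one-hot encode every factor
# column of G with base R and append the indicator columns to X
one_hot_all <- function(X, G) {
  blocks <- lapply(names(G), function(nm) {
    f <- G[[nm]]
    # Without an intercept, model.matrix() gives one 0/1 column per level
    m <- model.matrix(~ f - 1)
    colnames(m) <- paste0(nm, "_", levels(f))
    m
  })
  cbind(X, do.call(cbind, blocks))
}

train.df <- one_hot_all(X = X, G = G)
print(dim(train.df))
```

The same loop could call a per-column encoder from make_encoder() instead of model.matrix(), provided each column is passed as a single factor.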
The package does not work when a categorical variable has only two levels. Of course, the package is mainly intended for variables with more than two levels, but I have many categorical variables, so I wrote a function that applies the encoding to each of them. Because some of them have only two levels, I get errors. See the code below.
@erikcs @halflearned
library(sufrep)
set.seed(12345)
n <- 100
p <- 3
X <- matrix(rnorm(n * p), n, p)
G <- as.factor(sample(2, size = n, replace = TRUE))
onehot_encoder <- make_encoder(X = X, G = G, method = "one_hot")
train.df <- onehot_encoder(X = X, G = G)
print(head(train.df))
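When looping over many categorical variables, one way to avoid the error is to treat two-level factors separately, since a single 0/1 indicator already represents them fully. The sketch below uses base R only. The data frame and its column names g2 and g5 are made up for illustration, and nothing here is part of sufrep.

```r
set.seed(12345)
n <- 100
G <- data.frame(
  g2 = as.factor(sample(2, size = n, replace = TRUE)),
  g5 = as.factor(sample(5, size = n, replace = TRUE))
)

# Count the levels of each factor column
n_levels <- vapply(G, nlevels, integer(1))
binary <- names(G)[n_levels == 2]
multi <- names(G)[n_levels > 2]

# A two-level factor needs only one indicator column (1 = second level)
binary_cols <- vapply(
  G[binary],
  function(f) as.numeric(f == levels(f)[2]),
  numeric(n)
)

print(binary)
print(multi)
print(head(binary_cols))

# The columns named in `multi` could then be passed to make_encoder()
# one at a time, and the results bound together with binary_cols
```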