diogoferrari / hdpglm
Hierarchical Dirichlet Process Generalized Linear Models
Home Page: http://www.diogoferrari.com/hdpGLM/
License: Other
This may be an issue with R more than with the function (i.e. not fixable), but it's worth noting. Code and session info are at the bottom.
When using plot.hdpGLM on a model with 1,000 iterations, the function works great. However, if the number of iterations is increased to 10,000, the function fails unpredictably, likely because it cannot allocate enough memory. Occasionally R catches this and gives the standard error (Cannot allocate vector of size xxx GB), but often the R session crashes. This occurs even when the terms argument is used to generate only one plot.
Additionally, this occurs even when the model object is the only object created in the R session and after a call to gc() to free up memory. No additional processes were running on the machine at the time of failure (i.e. only the GNOME desktop and background processes).
Being a memory issue, ultimately solving the problem means using a higher-memory machine. However, if possible, it would be nice for the function to fail before it risks aborting the R session.
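As a rough illustration of what "failing early" could look like on the user side, here is a minimal sketch that refuses to plot when the object exceeds a size limit. The wrapper name plot_with_size_guard and the max_mb threshold are hypothetical and not part of hdpGLM, and using the object's size as a proxy for the memory the plot will need is only an assumption.

```r
# Hypothetical wrapper, not part of hdpGLM: stop before plotting very large objects.
plot_with_size_guard <- function(model, max_mb = 500, ...) {
  # object.size() reports the memory used by the R object itself, in bytes
  size_mb <- as.numeric(utils::object.size(model)) / 1024^2
  if (size_mb > max_mb) {
    stop(sprintf("Object is %.0f MB (limit %.0f MB); not plotting.", size_mb, max_mb),
         call. = FALSE)
  }
  plot(model, ...)
}

# Stand-in for a large model object: one million doubles
big_object <- numeric(1e6)

# The guard raises an ordinary R error instead of attempting the plot
guard_result <- tryCatch(
  plot_with_size_guard(big_object, max_mb = 1),
  error = function(e) conditionMessage(e)
)
```

A check like this cannot predict how much memory the plotting code will actually allocate, so the threshold would have to be tuned by trial on the machine in question.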
Code used and session info:
# model
model_res <- hdpGLM(data = analysis_df,
                    formula1 = opp_vote_pct ~ mean_dist_ren + turnout + m_opp_vote_pct
                        + dm_opp_vote_pct + dp_opp_vote_pct + kremlin_dist,
                    formula2 = ~ govt_dep + higher_ed + pop_total,
                    context.id = 'region',
                    mcmc = list(burn.in = 1000, n.iter = 10000),
                    K = 30,
                    family = 'gaussian')
# aborts R session
plot(model_res, separate = TRUE)
# also aborts R session
plot(model_res, terms = 'mean_dist_ren')
# session info
#session info
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.10
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.3.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 lme4_1.1-19 Matrix_1.2-14 hdpGLM_1.0.0.0000 forcats_0.3.0 stringr_1.3.1 dplyr_0.7.8
[8] purrr_0.2.5 readr_1.3.1 tidyr_0.8.2 tibble_2.0.0 ggplot2_3.1.0 tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 lubridate_1.7.4 lattice_0.20-35 formula.tools_1.7.1 assertthat_0.2.0 digest_0.6.18
[7] R6_2.3.0 cellranger_1.1.0 plyr_1.8.4 ggridges_0.5.1 backports_1.1.3 acepack_1.4.1
[13] coda_0.19-2 httr_1.4.0 pillar_1.3.1 rlang_0.3.1 lazyeval_0.2.1 readxl_1.2.0
[19] minqa_1.2.4 rstudioapi_0.9.0 data.table_1.11.8 nloptr_1.2.1 rpart_4.1-13 checkmate_1.9.0
[25] labeling_0.3 splines_3.5.1 foreign_0.8-71 htmlwidgets_1.3 munsell_0.5.0 broom_0.5.1
[31] compiler_3.5.1 modelr_0.1.2 xfun_0.4 pkgconfig_2.0.2 base64enc_0.1-3 htmltools_0.3.6
[37] nnet_7.3-12 tidyselect_0.2.5 gridExtra_2.3 htmlTable_1.13.1 Hmisc_4.1-1 crayon_1.3.4
[43] withr_2.1.2 MASS_7.3-50 grid_3.5.1 nlme_3.1-137 jsonlite_1.6 gtable_0.2.0
[49] magrittr_1.5 scales_1.0.0 cli_1.0.1 stringi_1.2.4 latticeExtra_0.6-28 xml2_1.2.0
[55] generics_0.0.2 Formula_1.2-3 RColorBrewer_1.1-2 tools_3.5.1 glue_1.3.0 hms_0.4.2
[61] survival_2.42-6 yaml_2.2.0 colorspace_1.3-2 cluster_2.0.7-1 operator.tools_1.6.3 rvest_0.3.2
[67] knitr_1.21
A nice enhancement would be for users to be able to supply their own labels for context IDs in the plot. Currently, the context.id argument just pulls the context labels as they are in the data, which aren't always formatted nicely (or, in my case, are in a language other than English). For now, this can only be accomplished by recoding the labels in the data and rerunning the model.
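For reference, the recoding workaround described above might look like the following base R sketch. The data frame, the region codes, and the English labels are all made up for illustration; the relabelled column would then be passed as context.id when refitting.

```r
# Toy data standing in for the real analysis data; all values are illustrative
toy_df <- data.frame(
  region = c("r01", "r02", "r01"),
  y = c(0.2, 0.5, 0.3),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from the raw context codes to display labels
label_map <- c(r01 = "Region One", r02 = "Region Two")

# New column holding the display labels; index the lookup by the raw codes
toy_df$region_label <- unname(label_map[toy_df$region])
```

The model would then need to be rerun with context.id = 'region_label', which is exactly the cost this request is hoping to avoid.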
Small issue, but the error message dplyr provides here is very unhelpful.
I accidentally saved a data frame with class grouped_df. When it is provided as the data argument to hdpGLM, it returns this error:
Error in filter_impl(.data, quo) : Result must have length 187, not 3298
Given that dplyr::ungroup() is relatively easy to forget while cleaning data (for me at least), it might be helpful to at least have a clearer or earlier error message, especially since dplyr throws this same error message when attempting to use undeclared variables, which could lead users in the wrong direction.
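Until something like that exists, a user-side check before fitting is one option. This is a minimal sketch assuming dplyr is installed; the helper name ensure_ungrouped is hypothetical and not part of hdpGLM or dplyr.

```r
library(dplyr)

# Hypothetical helper: warn about, and remove, dplyr grouping before modelling
ensure_ungrouped <- function(data) {
  if (inherits(data, "grouped_df")) {
    warning("`data` is a grouped_df; calling dplyr::ungroup() before fitting.",
            call. = FALSE)
    data <- dplyr::ungroup(data)
  }
  data
}

# Toy data standing in for the real analysis data frame
toy_df <- data.frame(region = c("a", "a", "b"), y = c(1, 2, 3))
grouped <- dplyr::group_by(toy_df, region)

# The result no longer carries the grouped_df class
clean <- suppressWarnings(ensure_ungrouped(grouped))
```

The cleaned data frame can then be passed as the data argument in the usual way.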
Model objects (i.e. objects created by a call to hdpGLM()) that are saved as .rds or .RData no longer seem to work with the hdpGLM methods after being loaded into a new R session.
After loading the saved model object, both plot.hdpGLM and summary.hdpGLM throw the following error:
Error in as.data.frame.default(value, stringsAsFactors = FALSE) :
cannot coerce class ‘"mcmc"’ to a data.frame
This occurs despite the fact that the object is correctly read as class hdpGLM.
When following the link, I get "Page not found":
http://www.diogoferrari.com/hdpGLM/