jonasrieger / ldaprototype
Determine a Prototype from a number of runs of Latent Dirichlet Allocation.
License: GNU General Public License v3.0
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
Random number generation:
RNG: L'Ecuyer-CMRG
Normal: Inversion
Sample: Rejection
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=de_DE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ldaPrototype_0.3.0
loaded via a namespace (and not attached):
[1] parallelMap_1.5.0 Rcpp_1.0.6 NLP_0.2-1 tosca_0.3-1 pillar_1.6.1 compiler_4.1.0 prettyunits_1.1.1
[8] viridis_0.6.1 tools_4.1.0 progress_1.2.2 dendextend_1.15.1 lubridate_1.7.10 lifecycle_1.0.0 tibble_3.1.2
[15] gtable_0.3.0 checkmate_2.0.0 viridisLite_0.4.0 pkgconfig_2.0.3 rlang_0.4.11 DBI_1.1.1 parallel_4.1.0
[22] gridExtra_2.3 lda_1.4.2 xml2_1.3.2 dplyr_1.0.7 generics_0.1.0 vctrs_0.3.8 fs_1.5.0
[29] hms_1.1.0 grid_4.1.0 tidyselect_1.1.1 glue_1.4.2 data.table_1.14.0 R6_2.5.0 fansi_0.5.0
[36] ggplot2_3.3.4 purrr_0.3.4 magrittr_2.0.1 BBmisc_1.11 backports_1.2.1 scales_1.1.1 ellipsis_0.3.2
[43] assertthat_0.2.1 colorspace_2.0-1 utf8_1.2.1 munsell_0.5.0 slam_0.1-48 tm_0.7-8 crayon_1.4.1
Under the setting given above, the following error message appears when, e.g., LDARep is executed:
Error in (function (fun, ..., more.args = list(), simplify = FALSE, use.names = FALSE, : object '.Random.seed' not found.
As a workaround, call set.seed beforehand. The function itself should actually handle this case with the following code:
if (!exists(".Random.seed", envir = globalenv())) {
runif(1)
}
oldseed = .Random.seed
seeds = sample(9999999, n)
.Random.seed <<- oldseed
I don't currently know exactly why this isn't working.
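The save-and-restore pattern quoted above can be sketched as a standalone function. This is a minimal illustration, not the package's implementation and not a confirmed fix for the reported error, whose cause is unknown here. The name draw_seeds is hypothetical; the sketch reads and writes .Random.seed explicitly in the global environment via get and assign rather than relying on lookup and <<-.

```r
# Hypothetical helper (not part of ldaPrototype): draw n integer seeds
# without advancing the caller's RNG stream.
draw_seeds <- function(n) {
  # Initialise the RNG if no state exists yet in the global environment.
  if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    runif(1)
  }
  oldseed <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
  # Restore the saved state when the function exits, even on error.
  on.exit(assign(".Random.seed", oldseed, envir = globalenv()), add = TRUE)
  sample(9999999, n)
}

seeds <- draw_seeds(4)
```

Because the state is restored on exit, a random draw made after draw_seeds returns is the same as it would have been without the call.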
In reference to the JOSS review, a few paper suggestions:
mallet is another package for estimating LDA that might be mentioned along with lda and topicmodels.
The docs object expects (for technical reasons) that all words occur with frequency 1. If a word occurs several times, it appears several times, each with frequency 1.
In the quanteda package there are dfm objects that also allow values greater than 1. If you do your preprocessing in quanteda and want to use quanteda::dfm2lda to convert your object into the necessary structure, you need one more step to fulfill the requirements for the docs object. Just execute the following line:
docs = lapply(docs, function(x) rbind(rep(x[1,], x[2,]), 1))
This replicates words with multiple occurrences and protects you from the error message all(sapply(docs, function(x) all(x[2, ] == 1))) is not TRUE in LDARep and similar functions.
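To make the effect of that line concrete, here is a small sketch on made-up data. The toy list below is an assumption standing in for the output of quanteda::dfm2lda: each document is a two-row matrix with word indices in row 1 and counts in row 2. The sketch uses 1L instead of 1 so that the result stays an integer matrix; otherwise it is the same expansion.

```r
# Toy input (invented for illustration): row 1 = word index, row 2 = count.
docs <- list(
  doc1 = rbind(c(0L, 3L), c(2L, 1L)),  # word 0 twice, word 3 once
  doc2 = rbind(c(1L, 2L), c(1L, 3L))   # word 1 once, word 2 three times
)

# Repeat each word index by its count and set every frequency to 1.
docs <- lapply(docs, function(x) rbind(rep(x[1, ], x[2, ]), 1L))

# The condition checked by LDARep and similar functions now holds.
ok <- all(sapply(docs, function(x) all(x[2, ] == 1)))
```

After the expansion, doc1 has three columns (word indices 0, 0, 3) and doc2 has four (1, 2, 2, 2), each with frequency 1.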
Relating to JOSS review here
There is no example usage of how to use the software for an analysis problem. A great place to put this would be in the README, showing basic usage. If you want to cover more ground than one would typically put in a README, a vignette is a good place. But without this, I'm not sure where to start and thus can't check functionality.
I see:
> test_check("ldaPrototype")
── Warning (test_LDABatch.R:146:3): is.LDABatch ────────────────────────────────
Parameter(s) num.iterations are duplicated. Take last one(s).
Killed
Can you please take a look? I'm planning to submit testthat to CRAN in about a month.
{parallelMap} is called when running LDARep, a core function of the package. But because it is in "Suggests", it isn't installed by default on package install. So if someone calls install.packages("ldaPrototype") and doesn't have {parallelMap} already installed, running LDARep or a function that calls it will result in an error:
Error in loadNamespace(name) : there is no package called ‘parallelMap’
I'd recommend moving parallelMap to Imports.
FWIW, this shouldn't impact the JOSS review IMO. But it would make the package more useful. (My first call to LDAPrototype resulted in the above error.)
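If a package does keep a dependency in "Suggests", a common alternative is to check for it at the point of use and fail with an actionable message. The sketch below shows that pattern with base R's requireNamespace; the helper name check_suggested is hypothetical and this is not how ldaPrototype handles the case.

```r
# Hypothetical guard (not part of ldaPrototype): stop with a clear message
# when a suggested package is not installed.
check_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      sprintf("Package '%s' is required for this function. ", pkg),
      sprintf("Install it with install.packages(\"%s\").", pkg),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# 'stats' ships with R, so this check passes.
check_suggested("stats")
```

Moving the package to Imports, as recommended above, avoids the need for such a guard altogether.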
Related to JOSS review here
The community guidelines do not have a clear statement or instructions for contributing. Adding a sentence to the bottom of the README would fix that right up.
Hi. When running revdep checks, ldaPrototype produced the error below. I haven't looked at the code, so I don't know what N is, but it looks like N-2 < x, which is unexpected. Maybe you're able to see how this could happen.
...
The following object is masked from 'package:stats':
cutree
>
> test_check("ldaPrototype")
── 1. Error: (unknown) (@test_jaccardTopics.R#8) ──────────────────────────────
wrong sign in 'by' argument
Backtrace:
1. ldaPrototype::jaccardTopics(mtopics, pm.backend = "socket")
2. ldaPrototype:::jaccardTopics.parallel(...)
3. base::lapply(...)
4. ldaPrototype:::FUN(X[[i]], ...)
6. base::seq.default(x, N - 2, max(ncpus, 2))
══ testthat results ═══════════════════════════════════════════════════════════
[ OK: 243 | SKIPPED: 0 | WARNINGS: 2 | FAILED: 1 ]
1. Error: (unknown) (@test_jaccardTopics.R#8)
Error: testthat unit tests failed
Execution halted
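The error in the backtrace can be reproduced in isolation: seq raises "wrong sign in 'by' argument" when the start value exceeds the end value and the step is positive. The sketch below shows this along with one possible guard. The helper safe_seq is hypothetical and only illustrates the failure mode; whether returning an empty sequence is the right behavior inside jaccardTopics.parallel is an assumption.

```r
# Reproduce the failure: from > to with a positive step.
err <- tryCatch(seq(5, 3, 2), error = function(e) conditionMessage(e))

# Hypothetical guard: return an empty sequence instead of failing.
safe_seq <- function(from, to, by) {
  if (from > to) {
    return(numeric(0))
  }
  seq(from, to, by)
}

empty <- safe_seq(5, 3, 2)    # nothing to iterate over
regular <- safe_seq(1, 5, 2)  # behaves like seq(1, 5, 2)
```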
Regarding the JOSS review, I'd recommend documenting the default parameters passed to lda.collapsed.gibbs.sampler in the help files for the functions that call it. This is particularly important for those which don't have defaults or have different defaults in the original package.
It is a stylistic choice, but I'd also give some consideration to removing the default for K. Users rarely change defaults, and I think one reason other packages don't offer a default for K is to signal that it is something the user really has to engage with.
Following the code in the README breaks down at Step 3.1 because sims isn't an object that has been created. (By the way, I enjoyed the README; it was nice that you highlighted the aggregate function and then broke down the components.)
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
Random number generation:
RNG: L'Ecuyer-CMRG
Normal: Inversion
Sample: Rejection
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=de_DE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ldaPrototype_0.3.0
loaded via a namespace (and not attached):
[1] parallelMap_1.5.0 Rcpp_1.0.6 NLP_0.2-1 tosca_0.3-1 pillar_1.6.1 compiler_4.1.0 prettyunits_1.1.1
[8] viridis_0.6.1 tools_4.1.0 progress_1.2.2 dendextend_1.15.1 lubridate_1.7.10 lifecycle_1.0.0 tibble_3.1.2
[15] gtable_0.3.0 checkmate_2.0.0 viridisLite_0.4.0 pkgconfig_2.0.3 rlang_0.4.11 DBI_1.1.1 parallel_4.1.0
[22] gridExtra_2.3 lda_1.4.2 xml2_1.3.2 dplyr_1.0.7 generics_0.1.0 vctrs_0.3.8 fs_1.5.0
[29] hms_1.1.0 grid_4.1.0 tidyselect_1.1.1 glue_1.4.2 data.table_1.14.0 R6_2.5.0 fansi_0.5.0
[36] ggplot2_3.3.4 purrr_0.3.4 magrittr_2.0.1 BBmisc_1.11 backports_1.2.1 scales_1.1.1 ellipsis_0.3.2
[43] assertthat_0.2.1 colorspace_2.0-1 utf8_1.2.1 munsell_0.5.0 slam_0.1-48 tm_0.7-8 crayon_1.4.1
Under the setting given above, the following warning message appears when, e.g., LDARep is executed:
1: In sprintf(...) : one argument not used by format 'Exporting objects to package env on master for mode: %s'
If you are running LDARep locally, however, the following message should appear:
Exporting objects to package env on master for mode: local
This warning results from parallelMap::parallelExport, specifically from the line
showInfoMessage("Exporting objects to package env on master for mode: %s", mode, collapse(objnames))
There is only one conversion specification, %s, but two arguments, which results in the warning.
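The mismatch is easy to see with plain sprintf: a format string needs one conversion specification per argument. The sketch below builds a message with two %s specifications so that both values are consumed. The wording of the message and the use of base paste in place of collapse are assumptions for illustration; this is not a patch to parallelMap.

```r
mode <- "local"
objnames <- c("a", "b")

# Two conversion specifications for two arguments: no unused argument.
msg <- sprintf(
  "Exporting objects '%s' to package env on master for mode: %s",
  paste(objnames, collapse = ","),
  mode
)
```

With a single %s and the same two arguments, only the first argument is formatted, and recent R versions warn that one argument was not used by the format.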
This will not be fixed, because the development of parallelMap is retired and this is unwanted behavior that does not necessarily need to be corrected. Instead, the ldaPrototype package will replace parallelMap with the future package in the long run.