This R package, plda, implements the following Markov chain Monte Carlo (MCMC) algorithms for the latent Dirichlet allocation (LDA) model:
- Augmented Collapsed Gibbs Sampler (ACGS, Griffiths and Steyvers 2004, George and Doss 2015) algorithm
- Grouped Gibbs Sampler (GGS, Doss and George 2022) algorithm
- Partially Collapsed Gibbs Sampler (PCGS, Magnusson et al. 2018) algorithm
This package provides sequential (non-parallel) implementations of all three algorithms. GGS and PCGS are amenable to parallelization; for parallel implementations of these two algorithms, see LDAGroupedGibbsSampler.
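
To make the idea of a sequential sampler concrete, here is a minimal sketch of the plain collapsed Gibbs sampler of Griffiths and Steyvers (2004) in base R. It is not the plda implementation or its API: the function name `collapsed_gibbs_lda`, its arguments, and the hyperparameter names `alpha` and `eta` are assumptions made for illustration. The augmentation step of ACGS and the GGS and PCGS updates are not shown.

```r
# Illustrative collapsed Gibbs sampler for LDA (hypothetical helper, not part of plda).
# docs:  list of integer vectors; each vector holds word ids in 1..V for one document.
# K:     number of topics; V: vocabulary size.
# alpha: symmetric Dirichlet hyperparameter on document-topic proportions (assumed name).
# eta:   symmetric Dirichlet hyperparameter on topic-word distributions (assumed name).
collapsed_gibbs_lda <- function(docs, K, V, alpha = 0.1, eta = 0.01,
                                n_iter = 50, seed = 1) {
  set.seed(seed)
  D <- length(docs)
  n_dk <- matrix(0L, D, K)  # tokens in document d assigned to topic k
  n_kw <- matrix(0L, K, V)  # tokens of word w assigned to topic k
  n_k  <- integer(K)        # total tokens assigned to topic k

  # Random initial topic assignment for every token, then fill the count tables.
  z <- lapply(docs, function(w) sample.int(K, length(w), replace = TRUE))
  for (d in seq_len(D)) {
    for (i in seq_along(docs[[d]])) {
      k <- z[[d]][i]
      w <- docs[[d]][i]
      n_dk[d, k] <- n_dk[d, k] + 1L
      n_kw[k, w] <- n_kw[k, w] + 1L
      n_k[k]     <- n_k[k] + 1L
    }
  }

  for (iter in seq_len(n_iter)) {
    for (d in seq_len(D)) {
      for (i in seq_along(docs[[d]])) {
        k <- z[[d]][i]
        w <- docs[[d]][i]
        # Remove the current token from the counts.
        n_dk[d, k] <- n_dk[d, k] - 1L
        n_kw[k, w] <- n_kw[k, w] - 1L
        n_k[k]     <- n_k[k] - 1L
        # Full conditional of the token's topic, up to a normalizing constant.
        p <- (n_dk[d, ] + alpha) * (n_kw[, w] + eta) / (n_k + V * eta)
        k <- sample.int(K, 1L, prob = p)
        # Record the new assignment and restore the counts.
        z[[d]][i]  <- k
        n_dk[d, k] <- n_dk[d, k] + 1L
        n_kw[k, w] <- n_kw[k, w] + 1L
        n_k[k]     <- n_k[k] + 1L
      }
    }
  }

  # Point estimates computed from the final state of the chain only.
  theta <- (n_dk + alpha) / rowSums(n_dk + alpha)  # D x K document-topic proportions
  phi   <- (n_kw + eta) / rowSums(n_kw + eta)      # K x V topic-word distributions
  list(z = z, theta = theta, phi = phi, n_dk = n_dk, n_kw = n_kw)
}

# Toy corpus: three documents over a five-word vocabulary.
docs <- list(c(1L, 2L, 1L, 3L), c(4L, 5L, 4L), c(1L, 5L))
fit <- collapsed_gibbs_lda(docs, K = 2, V = 5, n_iter = 20)
```

Each token update reads and writes the shared count tables, so the sweep is inherently sequential. This is one reason samplers that keep the topic-word distributions explicit are easier to parallelize across documents.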
For package documentation, run `help("plda")` in the R console.
All major functions and datasets are documented and linked to the package index. Raw data files for each dataset are available in the data-raw folder. To load raw data, see demo/load_raw_data.R.
Magnusson, M., Jonsson, L., Villani, M., & Broman, D. (2018). Sparse partially collapsed MCMC for parallel inference in topic models. Journal of Computational and Graphical Statistics, 27(2), 449-463.
Doss, H., & George, C. (2022). Theoretical and Empirical Evaluation of a Grouped Gibbs Sampler for Parallel Computation in the Latent Dirichlet Allocation Model. In preparation.
Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences, 101(suppl_1), 5228-5235.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993-1022.
George, C. P. (2015). Latent Dirichlet Allocation: Hyperparameter Selection and Applications to Electronic Discovery. Ph.D. thesis, University of Florida.