BeyondLDA

  1. NumPy-based PLSA (Probabilistic Latent Semantic Analysis): plsa.py.
  2. LDA (Latent Dirichlet Allocation) vs. hLDA (Hierarchical Latent Dirichlet Allocation), with visualization: LDA_vs_hLDA.ipynb.

PLSA (plsa.py)

A pure NumPy implementation of PLSA

I wrote this while taking the Text Mining course taught by Prof. ChengXiang Zhai of the UIUC MSCS department, where we were asked to build a PLSA model in NumPy. PLSA learns both the document-topic distribution and the topic-word distribution. It fits them with the EM algorithm, which maximizes the log-likelihood: the E-step applies Bayes' rule to compute topic posteriors, and the M-step re-estimates both distributions from those posteriors. The code comments in plsa.py explain:

  • How to initialize the parameters
  • How to do the E-step
  • How to do the M-step
  • How to calculate the log-likelihood
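
The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in plsa.py: the function name `plsa`, its parameters, the random initialization, and the toy count matrix are all assumptions made for the example.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a (n_docs, n_words) term-count matrix (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero and log(0)

    # Initialization: random non-negative values, normalized so each row sums to 1.
    doc_topic = rng.random((n_docs, n_topics))      # P(z | d)
    doc_topic /= doc_topic.sum(axis=1, keepdims=True)
    topic_word = rng.random((n_topics, n_words))    # P(w | z)
    topic_word /= topic_word.sum(axis=1, keepdims=True)

    log_likelihoods = []
    for _ in range(n_iter):
        # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z) (Bayes' rule).
        joint = doc_topic[:, :, None] * topic_word[None, :, :]   # shape (D, K, W)
        posterior = joint / (joint.sum(axis=1, keepdims=True) + eps)

        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * posterior                # shape (D, K, W)
        doc_topic = weighted.sum(axis=2)
        doc_topic /= doc_topic.sum(axis=1, keepdims=True) + eps
        topic_word = weighted.sum(axis=0)
        topic_word /= topic_word.sum(axis=1, keepdims=True) + eps

        # Log-likelihood: sum over d, w of c(d, w) * log sum_z P(z | d) P(w | z).
        p_word_given_doc = doc_topic @ topic_word
        log_likelihoods.append(float((counts * np.log(p_word_given_doc + eps)).sum()))

    return doc_topic, topic_word, log_likelihoods

# Toy corpus (made up): 4 documents over a 4-word vocabulary.
counts = np.array([[4, 3, 0, 0],
                   [5, 2, 1, 0],
                   [0, 0, 3, 4],
                   [0, 1, 2, 5]], dtype=float)
doc_topic, topic_word, log_likelihoods = plsa(counts, n_topics=2)
```

The (D, K, W) posterior array keeps the sketch short but grows quickly with corpus size, so a real implementation would likely iterate over documents or use sparse counts. Tracking the log-likelihood per iteration gives a simple convergence check, since EM should never decrease it.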

LDA and hLDA (LDA_vs_hLDA.ipynb)

Summary and visual comparison between LDA and hLDA

This came out of a campus analytics challenge on topic mining in the U.S., where we were asked to deliver something beyond LDA.
The corpus is a NASA dataset with some tokens masked to make the task harder.

Word Cloud


The notebook starts by using the gensim API to lemmatize the text and add bigram tokens, then trains an LDA model and visualizes it with pyLDAvis and IPython widgets for interactive results.

pyLDAvis


There is currently no popular Python package that implements hLDA, so hlda_sampler.py draws on joewand's GitHub. It is a Gibbs sampler for hLDA inference, based on the Mallet implementation, which uses a fixed depth for the nCRP (nested Chinese restaurant process) tree. The most distinctive feature of hLDA is its hierarchical topic tree: higher branches hold more general topics and lower branches hold more specialized ones.
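
To make the fixed-depth nCRP tree concrete, here is a minimal sketch of how a document's path can be drawn from the nCRP prior alone. It is not the Gibbs sampler in hlda_sampler.py, which also conditions on the words in each document. The names `Node` and `sample_path` and the concentration parameter `gamma` are assumptions made for the example.

```python
import random

class Node:
    """A topic node in the tree; `customers` counts the documents passing through it."""
    def __init__(self, level):
        self.level = level
        self.customers = 0
        self.children = []

def sample_path(root, depth, gamma, rng):
    """Draw one root-to-leaf path of fixed length `depth` from the nCRP prior."""
    root.customers += 1
    path = [root]
    node = root
    for level in range(1, depth):
        # Chinese restaurant process at this node: an existing child is chosen
        # in proportion to its customer count, a new child in proportion to gamma.
        weights = [child.customers for child in node.children] + [gamma]
        index = rng.choices(range(len(weights)), weights=weights)[0]
        if index == len(node.children):
            node.children.append(Node(level))
        node = node.children[index]
        node.customers += 1
        path.append(node)
    return path

rng = random.Random(0)
root = Node(level=0)
paths = [sample_path(root, depth=3, gamma=1.0, rng=rng) for _ in range(20)]
```

Every path shares the root, so the root topic sees all documents and tends toward general vocabulary, while deeper nodes see fewer documents and specialize. A larger `gamma` makes new branches more likely and yields a wider tree.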

hlda
