WHAI

This is the demo code for "WHAI: Weibull Hybrid Autoencoding Inference for Deep Topic Modeling"

Introduction

This code implements Weibull Hybrid Autoencoding Inference (WHAI), from the ICLR 2018 paper "WHAI: Weibull Hybrid Autoencoding Inference for Deep Topic Modeling".

To train an inference network jointly with a deep generative topic model, making it both scalable to big corpora and fast in out-of-sample prediction, we develop Weibull hybrid autoencoding inference (WHAI) for deep latent Dirichlet allocation, which infers posterior samples via a hybrid of stochastic-gradient MCMC and autoencoding variational Bayes. The effectiveness and efficiency of WHAI are illustrated with experiments on big corpora.
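One property that makes a Weibull distribution convenient for autoencoding variational Bayes is that a sample can be written as a deterministic transform of uniform noise. The following is a minimal sketch of that transform using only the Python standard library. It is not taken from this repository, and the names `sample_weibull` and `weibull_mean` are hypothetical.

```python
import math
import random


def sample_weibull(shape, scale, rng=random):
    """Draw one Weibull(shape, scale) sample by transforming uniform noise."""
    eps = rng.random()  # eps ~ Uniform[0, 1)
    # Inverse-CDF transform: x = scale * (-ln(1 - eps)) ** (1 / shape)
    return scale * (-math.log1p(-eps)) ** (1.0 / shape)


def weibull_mean(shape, scale):
    """Closed-form mean, scale * Gamma(1 + 1/shape), used as a sanity check."""
    return scale * math.gamma(1.0 + 1.0 / shape)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_weibull(2.0, 1.5, rng) for _ in range(100000)]
    # The empirical mean should be close to the closed-form mean.
    print(sum(draws) / len(draws), weibull_mean(2.0, 1.5))
```

Because the randomness enters only through `eps`, the sample is a smooth function of `shape` and `scale`, which is what allows gradients to pass through it in a framework with automatic differentiation.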

This source code is made publicly available for reproducibility. It is not optimized for speed and is minimally documented, but it is fully functional.

How to use

The folder includes the following files:

joint_main_online.py: The main script that runs our model.

model_layer1.py: The module implementing the one-layer model.

model_layer2.py: The module implementing the two-layer model.

model_layer3.py: The module implementing the three-layer model.

PGBN_sampler.py: The module implementing the sampling steps used in our model.

libCrt_Multi_Sample.so: A shared library that samples from the CRT distribution (written in C; Linux only).

libMulti_Sample.so: A shared library that samples from the multinomial distribution (written in C; Linux only).

perplexity.m: A function that calculates the perplexity according to Equation (11) in our paper.
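For readers unfamiliar with the CRT (Chinese restaurant table) distribution named in the file list above, here is a minimal pure-Python sketch of how a CRT draw can be generated as a sum of Bernoulli variables. It only illustrates the distribution. It does not reflect the interface of `libCrt_Multi_Sample.so`, and the names `sample_crt` and `crt_mean` are hypothetical.

```python
import random


def sample_crt(n, r, rng=random):
    """Draw l ~ CRT(n, r): the number of occupied tables after n customers
    are seated by a Chinese restaurant process with concentration r > 0."""
    tables = 0
    for i in range(n):
        # Customer i + 1 opens a new table with probability r / (r + i).
        if rng.random() < r / (r + i):
            tables += 1
    return tables


def crt_mean(n, r):
    """Exact expectation: the sum of r / (r + i) for i = 0, ..., n - 1."""
    return sum(r / (r + i) for i in range(n))


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_crt(50, 2.0, rng) for _ in range(5000)]
    # The empirical mean should be close to the exact expectation.
    print(sum(draws) / len(draws), crt_mean(50, 2.0))
```

A loop like this is slow in Python when the counts are large, which is one plausible reason to provide the sampler as a compiled C library.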

The code runs on Linux and requires the Theano package. If you have any questions, please contact us.

License

Please note that this code should be used at your own risk. There is no implied guarantee that it will not do anything stupid. Permission is granted to use and modify the code.

Citing WHAI

Please cite our ICLR paper in your publications if it helps your research:

@inproceedings{WHAI_ICLR2018,
  Author = {H. Zhang and B. Chen and D. Guo and M. Zhou},
  Title = {WHAI: Weibull Hybrid Autoencoding Inference for Deep Topic Modeling},
  booktitle={ICLR},
  Year  = {2018}
}

Data

The original 20news and MNIST datasets are included in the data folder. If you want to calculate the per-heldout-word perplexity, you may need to partition the data according to the description in Section 3.1.

The RCV1 dataset is provided by Cong et al. (2017).

The WIKI dataset was downloaded using the scripts provided by Hoffman et al. (2010).
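The partition mentioned above for per-heldout-word perplexity can be sketched as follows. This is an illustration under stated assumptions, not the procedure from Section 3.1 or the contents of perplexity.m. The split fraction `train_fraction` is a hypothetical parameter, the function names are hypothetical, and the perplexity shown is the generic exponentiated negative average log-likelihood of held-out tokens given per-document word probabilities. How those probabilities are obtained from the model is outside this sketch.

```python
import math
import random


def split_tokens(counts, train_fraction, rng=random):
    """Split one document's word counts, token by token, into
    training counts and held-out counts.

    counts: list of non-negative ints, one entry per vocabulary word.
    """
    train = [0] * len(counts)
    heldout = [0] * len(counts)
    for v, c in enumerate(counts):
        for _ in range(c):
            # Each token goes to the training set with probability train_fraction.
            if rng.random() < train_fraction:
                train[v] += 1
            else:
                heldout[v] += 1
    return train, heldout


def heldout_perplexity(heldout_docs, word_probs):
    """Compute exp(-(1/N) * sum of y * ln p) over all held-out tokens.

    heldout_docs[n][v]: held-out count of word v in document n.
    word_probs[n][v]:   predicted probability of word v in document n
                        (each row is assumed to sum to 1).
    """
    log_lik = 0.0
    n_tokens = 0
    for y_n, p_n in zip(heldout_docs, word_probs):
        for y, p in zip(y_n, p_n):
            if y > 0:
                log_lik += y * math.log(p)
                n_tokens += y
    return math.exp(-log_lik / n_tokens)
```

As a sanity check on the definition, uniform predictions over a vocabulary of V words give a perplexity of V regardless of the held-out counts.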

Contact

Contact Bo Chen [email protected] or Hao Zhang [email protected]

Copyright (c), 2018, Hao Zhang [email protected]
