mvsLDA

A C implementation of supervised latent Dirichlet allocation (sLDA) with collapsed Gibbs sampling estimation. Response variables (teaching signals) are assumed to be generated from a multivariate normal distribution with diagonal covariance.

[Blei and McAuliffe 2008] https://papers.nips.cc/paper/3328-supervised-topic-models.pdf

Hyperparameters are updated by fixed-point iteration [Minka 2000]. http://research.microsoft.com/en-us/um/people/minka/papers/dirichlet/minka-dirichlet.pdf

Build

  • Requirements : The GNU Scientific Library (GSL)

  • Edit the Makefile as needed, then run the "make" command.

Usage

mvslda [-I maxiter] [-K n_classes] [-A alpha] [-B beta] [-Y nresp] [-S random_seed] doc resp model

-I (int) : The number of Gibbs iterations.

-K (int) : The number of topics.

-A (float) : The initial value of the alpha parameters, i.e. the asymmetric Dirichlet prior over the document-topic distributions. These values are optimized during the Gibbs iterations.

-B (float) : The initial value of the beta parameter, i.e. the symmetric Dirichlet prior over the topic-word distributions. This value is optimized during the Gibbs iterations.

-Y (int) : The number of response variables to use for training.

-S (int) : A seed for the random number generator (fixes the initial state of the Gibbs iterations).

doc : The input file of word counts for each document.

resp : The input file of response variables.

model : The prefix for the output model files.

Input file format

doc

Each line represents one document.

For each document, the count of each unique word is written as "(WordID):(Count)", where WordID is the word's index in the vocabulary (1-based, so the first word has index 1, not 0) and Count is the number of times the word appears in the document. Pairs are separated by spaces.

E.g.)

12:1 353:1 416:3 636:2 670:1 713:1

38:2 72:2 109:2 265:1

18:1 40:1 98:1 251:1 265:3 411:1 743:1

...
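A reader for one line of this format might look like the C sketch below. The function `parse_doc_line` and its return convention are hypothetical, not part of mvslda; it converts the 1-based WordIDs to 0-based indices, which is a common choice for C arrays but an assumption here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parse one line of the doc file ("12:1 353:1 ...")
   into parallel arrays of 0-based word ids and counts. Returns the number
   of pairs read, or -1 on malformed input or if more than max_pairs
   pairs are present. */
int parse_doc_line(const char *line, int *word_ids, int *counts, int max_pairs)
{
    assert(line != NULL && word_ids != NULL && counts != NULL);
    int n = 0;
    const char *p = line;
    char *end;
    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '\0' || *p == '\n' || *p == '\r')
            break; /* end of line */
        long id = strtol(p, &end, 10);
        if (end == p || *end != ':' || id < 1)
            return -1; /* expected a positive WordID followed by ':' */
        p = end + 1;
        long cnt = strtol(p, &end, 10);
        if (end == p || cnt < 0)
            return -1; /* expected a non-negative Count */
        p = end;
        if (n >= max_pairs)
            return -1;
        word_ids[n] = (int)(id - 1); /* the file is 1-based */
        counts[n] = (int)cnt;
        n++;
    }
    return n;
}
```

For the first example line, the call returns 6, with `word_ids[0] == 11` and `counts[2] == 3`.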

resp

Each line represents one document and contains its response values separated by spaces. Line i of resp holds the responses for the document on line i of doc.

E.g.)

-3.42 1.90

-8.80 3.74

-3.37 -2.07

...
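A matching line reader for resp can be sketched with `strtod`. As with the doc reader, the name `parse_resp_line` and its error convention are hypothetical; the sketch rejects lines with non-numeric trailing text or more values than the caller's buffer holds.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: parse one line of the resp file ("-3.42 1.90")
   into y. Returns the number of values read, or -1 if the line has
   non-numeric content or more than max_vals values. */
int parse_resp_line(const char *line, double *y, int max_vals)
{
    assert(line != NULL && y != NULL);
    int n = 0;
    const char *p = line;
    char *end;
    for (;;) {
        double v = strtod(p, &end); /* strtod skips leading whitespace */
        if (end == p)
            break; /* no further number on this line */
        if (n >= max_vals)
            return -1;
        y[n++] = v;
        p = end;
    }
    /* Only trailing whitespace (including the newline) may remain */
    while (isspace((unsigned char)*p))
        p++;
    return *p == '\0' ? n : -1;
}
```

Comparing the returned count across lines is a cheap way to catch a ragged resp file before training.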

Output files

  • [model].theta : Topic distributions for each document, as a (number of documents) x (number of topics) matrix.

  • [model].phi : Word distributions for each topic, as a (vocabulary size) x (number of topics) matrix.

  • [model].eta : Regression coefficients for each topic, as a (number of topics) x (number of response variables) matrix.

  • [model].hyper : Trace of the hyperparameter optimization. Each line shows the parameter values at one step of the Gibbs iterations: from left, the alpha parameters (one per topic), the beta parameter, and the gamma parameter.

  • [model].lik : Log-likelihood of the model at each iteration.
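Given the theta and eta matrices, fitted responses can be approximated by multiplying them. The sketch below assumes both files have been loaded into row-major arrays, uses each theta row as a stand-in for the document's empirical topic frequencies (the quantity the sLDA response model is defined on), and assumes eta has no separate intercept row. The function name `predict_responses` is hypothetical.

```c
#include <assert.h>

/* Hypothetical sketch: yhat (D x J) = theta (D x K) * eta (K x J),
   all row-major. yhat[d][j] = sum_k theta[d][k] * eta[k][j]. */
void predict_responses(int D, int K, int J, const double *theta,
                       const double *eta, double *yhat)
{
    assert(D >= 0 && K > 0 && J > 0);
    for (int d = 0; d < D; d++) {
        for (int j = 0; j < J; j++) {
            double s = 0.0;
            for (int k = 0; k < K; k++)
                s += theta[d * K + k] * eta[k * J + j];
            yhat[d * J + j] = s;
        }
    }
}
```

For example, a document with topic proportions (0.25, 0.75) and coefficient rows (2, 4) and (-2, 0) gets fitted responses (-1, 1).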
