BigARTM

The state-of-the-art platform for topic modeling.

What is BigARTM?

BigARTM is a tool for topic modeling based on a novel technique called Additive Regularization of Topic Models (ARTM). This technique builds multi-objective models by adding a weighted sum of regularizers to the optimization criterion. BigARTM is known to combine very different objectives well, including sparsing, smoothing, topic decorrelation and many others. Such a combination of regularizers significantly improves several quality measures at once, with almost no loss in perplexity.
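To make "adding a weighted sum of regularizers" concrete: ARTM maximizes the log-likelihood sum over d, w of n_dw * ln(sum over t of phi_wt * theta_td) plus sum over i of tau_i * R_i(Phi, Theta). In the regularized EM scheme from the ARTM literature, the M-step becomes phi_wt proportional to max(n_wt + phi_wt * dR/dphi_wt, 0), and likewise for theta_td. The NumPy sketch below assumes the simplest regularizer, R = tau * sum(ln phi_wt) with uniform weights, for which the correction is just "+ tau": tau > 0 smooths and tau < 0 sparses. It is an illustration of the technique on a dense matrix, not BigARTM's C++ implementation (which processes batches), and the function name `artm_em` is hypothetical.

```python
import numpy as np


def artm_em(n_dw, num_topics, tau_phi=0.0, tau_theta=0.0, passes=10, seed=0):
    """Regularized EM for a PLSA/ARTM-style model (illustrative sketch).

    n_dw: (D, W) document-word count matrix.
    tau_phi, tau_theta: coefficients of the assumed regularizer
    R = tau * sum(ln p), which adds tau to every expected count in the
    M-step (tau > 0 smooths, tau < 0 sparses).
    Returns phi (W, T) with columns p(w|t) and theta (T, D) with columns p(t|d).
    """
    rng = np.random.default_rng(seed)
    num_docs, num_words = n_dw.shape
    phi = rng.random((num_words, num_topics))
    phi /= phi.sum(axis=0, keepdims=True)
    theta = np.full((num_topics, num_docs), 1.0 / num_topics)

    for _ in range(passes):
        # E-step: p(t|d,w) is proportional to phi_wt * theta_td.
        p_wd = phi @ theta                            # model p(w|d), shape (W, D)
        ratio = n_dw.T / np.maximum(p_wd, 1e-30)      # n_dw / p(w|d), shape (W, D)
        n_wt = phi * (ratio @ theta.T)                # expected word-topic counts
        n_td = theta * (phi.T @ ratio)                # expected topic-document counts

        # M-step with additive regularization: add tau, clip at zero, normalize.
        phi = np.maximum(n_wt + tau_phi, 0.0)
        phi /= np.maximum(phi.sum(axis=0, keepdims=True), 1e-30)
        theta = np.maximum(n_td + tau_theta, 0.0)
        theta /= np.maximum(theta.sum(axis=0, keepdims=True), 1e-30)

    return phi, theta
```

Negative coefficients can zero out entries of phi and theta, which is how sparsing regularizers increase sparsity; the clipping at zero keeps both matrices valid (non-negative) distributions.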

References

Related Software Packages

How to Use

Installing

Download a binary release or build from source using CMake:

$ mkdir build && cd build
$ cmake ..
$ make install

Command-line interface

Check out the documentation for bigartm.

Examples:

  • Basic model (20 topics, written to a CSV file, inferred in 10 passes)
bigartm.exe -d docword.kos.txt -v vocab.kos.txt --write-model-readable model.txt
--passes 10 --batch-size 50 --topics 20
  • Basic model with fewer tokens (tokens with extreme document frequencies are filtered out)
bigartm.exe -d docword.kos.txt -v vocab.kos.txt --dictionary-max-df 50% --dictionary-min-df 2
--passes 10 --batch-size 50 --topics 20 --write-model-readable model.txt
  • Simple regularized model (increases sparsity up to 60-70%)
bigartm.exe -d docword.kos.txt -v vocab.kos.txt --dictionary-max-df 50% --dictionary-min-df 2
--passes 10 --batch-size 50 --topics 20 --write-model-readable model.txt
--regularizer "0.05 SparsePhi" "0.05 SparseTheta"
  • More advanced regularized model, with 10 sparse objective topics and 2 smooth background topics (the topic groups are quoted so that Unix shells do not treat ";" as a command separator)
bigartm.exe -d docword.kos.txt -v vocab.kos.txt --dictionary-max-df 50% --dictionary-min-df 2
--passes 10 --batch-size 50 --topics "obj:10;background:2" --write-model-readable model.txt
--regularizer "0.05 SparsePhi #obj"
--regularizer "0.05 SparseTheta #obj"
--regularizer "0.25 SmoothPhi #background"
--regularizer "0.25 SmoothTheta #background"
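When these commands are driven from a script, passing the arguments as a list sidesteps shell quoting entirely, which matters for values such as obj:10;background:2 or "0.05 SparsePhi #obj". The sketch below uses only the flags shown in the examples above; the helper name `bigartm_args` and its parameters are hypothetical, and the executable name defaults to `bigartm` (use `bigartm.exe` on Windows).

```python
import shlex


def bigartm_args(docword, vocab, topics, passes=10, batch_size=50,
                 regularizers=(), extra=(), output="model.txt", exe="bigartm"):
    """Build a bigartm command line as an argument list (hypothetical helper)."""
    args = [exe, "-d", docword, "-v", vocab,
            "--passes", str(passes), "--batch-size", str(batch_size),
            "--topics", str(topics), "--write-model-readable", output]
    args += list(extra)                  # e.g. dictionary filtering options
    for reg in regularizers:             # one --regularizer flag per entry
        args += ["--regularizer", reg]
    return args


# The "more advanced" example from the list above, as an argument list.
args = bigartm_args(
    "docword.kos.txt", "vocab.kos.txt", topics="obj:10;background:2",
    extra=["--dictionary-max-df", "50%", "--dictionary-min-df", "2"],
    regularizers=["0.05 SparsePhi #obj", "0.05 SparseTheta #obj",
                  "0.25 SmoothPhi #background", "0.25 SmoothTheta #background"],
)
print(shlex.join(args))  # shell-quoted form, handy for logging or copy-pasting
```

To actually run it, pass the list to `subprocess.run(args, check=True)`; no shell is involved, so no quoting is needed.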

Interactive Python interface

Check out the documentation for the ARTM Python interface in English and in Russian.

Refer to the tutorials for details on how to install and start using the Python interface.

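# A minimal example; the Python package is imported as `artm`
import artm

# Convert the UCI bag-of-words files (docword.kos.txt, vocab.kos.txt,
# assumed to be in the current directory) into batches in the 'kos' folder
batch_vectorizer = artm.BatchVectorizer(data_path='.',
                                        data_format='bow_uci',
                                        collection_name='kos',
                                        target_folder='kos')

# 15 topics, initialized from the dictionary gathered from the collection
model = artm.ARTM(num_topics=15, dictionary=batch_vectorizer.dictionary)
model.fit_offline(batch_vectorizer=batch_vectorizer, num_collection_passes=5)
print(model.phi_)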
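The `bow_uci` format used above (and by the docword.kos.txt / vocab.kos.txt files in the command-line examples) is the UCI Bag of Words layout: a docword file with three header lines (number of documents, vocabulary size, number of non-zero entries) followed by one "docID wordID count" triple per line with 1-based IDs, plus a vocab file with one token per line. The reader below is a hypothetical helper for inspecting such files; it builds a dense matrix, which is only sensible for small collections.

```python
import io

import numpy as np


def read_uci_docword(docword_file, vocab_file):
    """Parse UCI bag-of-words files into a dense (D, W) count matrix and a vocab list."""
    num_docs = int(docword_file.readline())
    num_words = int(docword_file.readline())
    nnz = int(docword_file.readline())
    n_dw = np.zeros((num_docs, num_words))
    for _ in range(nnz):
        doc_id, word_id, count = map(int, docword_file.readline().split())
        n_dw[doc_id - 1, word_id - 1] += count   # IDs in the file are 1-based
    vocab = [line.strip() for line in vocab_file if line.strip()]
    return n_dw, vocab


# Tiny in-memory example: 2 documents, 3 words, 4 non-zero entries.
example_docword = io.StringIO("2\n3\n4\n1 1 2\n1 3 1\n2 2 5\n2 3 1\n")
example_vocab = io.StringIO("apple\nbanana\ncherry\n")
counts, words = read_uci_docword(example_docword, example_vocab)
```

For real collections, `artm.BatchVectorizer` does this conversion itself and stores the result as batches.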

Low-level API

Contributing

Refer to the Developer's Guide.

To report a bug, use the issue tracker. To ask a question, use our mailing list. Feel free to open pull requests.

License

BigARTM is released under the New BSD License, which allows unlimited redistribution for any purpose (even commercial use) as long as its copyright notices and the license's disclaimers of warranty are maintained.
