Comments (4)
Hi, guys. Thank you for your amazing work on large scale LDA.
That said, I think model quality is as important as scalability, so I am very interested in improving it. It is exciting to know that an asymmetric Dirichlet prior could help. Would you please share some of your experience with this? I will try my best to contribute.
from lightlda.
Hi, guys,
I have finished a first attempt at adding this new feature in PR#22.
This PR supports asymmetric alpha through the following steps:
- Add two extra tables to Multiverso. One is a topic frequency table, a matrix that counts each topic's frequency. The other is a doc length table, a row that counts how many documents have length k.
- Initialize the two extra tables from the randomly initialized documents.
- Learn the alpha distribution from the two extra tables every 5 iterations.
- Build an alias table for the learned alpha distribution.
- Sample topics with the learned alpha distribution and the alias table. Meanwhile, update the counts in the topic frequency table if necessary.
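The steps above can be illustrated with a rough sketch. This is not the code in PR#22. It is a minimal Python illustration that assumes the two tables are histograms (`topic_freq[k][n]` is the number of documents in which topic k is assigned to exactly n tokens, and `doc_len_hist[n]` is the number of documents with n tokens) and that alpha is learned with Minka's fixed-point update for the Dirichlet-multinomial. The thread does not name the estimator, so both points are assumptions. All function names here are hypothetical, and the alias table uses the standard Vose construction.

```python
import random


def build_histograms(doc_topic_counts):
    """Build the two count tables from per-document topic counts (assumed layout)."""
    num_topics = len(doc_topic_counts[0])
    max_len = max(sum(row) for row in doc_topic_counts)
    topic_freq = [[0] * (max_len + 1) for _ in range(num_topics)]
    doc_len_hist = [0] * (max_len + 1)
    for row in doc_topic_counts:
        doc_len_hist[sum(row)] += 1
        for k, n in enumerate(row):
            topic_freq[k][n] += 1
    return topic_freq, doc_len_hist


def learn_alpha(topic_freq, doc_len_hist, alpha, num_iters=50):
    """Minka-style fixed-point update of an asymmetric alpha (assumed estimator)."""
    alpha = list(alpha)
    for _ in range(num_iters):
        alpha_sum = sum(alpha)
        # Denominator: sum over docs of psi(len + alpha_sum) - psi(alpha_sum),
        # accumulated through the doc length histogram.
        denom, diff = 0.0, 0.0
        for length in range(1, len(doc_len_hist)):
            diff += 1.0 / (alpha_sum + length - 1)
            denom += doc_len_hist[length] * diff
        if denom == 0.0:
            raise ValueError("need at least one non-empty document")
        new_alpha = []
        for k, hist in enumerate(topic_freq):
            # Numerator: sum over docs of psi(n_dk + alpha_k) - psi(alpha_k).
            numer, diff = 0.0, 0.0
            for n in range(1, len(hist)):
                diff += 1.0 / (alpha[k] + n - 1)
                numer += hist[n] * diff
            # Floor the value so an unused topic never reaches exactly zero.
            new_alpha.append(max(alpha[k] * numer / denom, 1e-12))
        alpha = new_alpha
    return alpha


def build_alias_table(weights):
    """Vose's alias method: O(K) construction, O(1) sampling."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]
    prob = [1.0] * n
    alias = list(range(n))
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        scaled[l] -= 1.0 - scaled[s]  # bucket l donates mass to fill bucket s
        (small if scaled[l] < 1.0 else large).append(l)
    return prob, alias


def sample_alias(prob, alias, rng):
    """Draw one topic index from the alias table."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

In this sketch, a corpus where one topic dominates yields a larger alpha for that topic, and `sample_alias` then draws from the normalized alpha in constant time. In a distributed setting the histograms would live in shared tables and be updated incrementally as topic assignments change, which the sketch leaves out.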
To use this new feature, just run with the extra option "-num_alpha_iterations".
Please note that there are two TODOs: evaluation in asymmetric prior mode, and inference with an asymmetric prior.
from lightlda.
Thanks, Jianyi! I will review the code.
from lightlda.
@feiga, I am sorry, I made a mistake when updating the topic frequency table. I have fixed it and committed the fix to PR#22.
from lightlda.