Comments (3)
The conditions are just there to ensure we compute the likelihood only ONCE per iteration.
- We didn't ignore it. We only compute the doc likelihood when sampling slice 0 of an iteration, but the computation covers the entire document set. Computing the doc likelihood only requires the doc-topic information; it is unrelated to the word-topic table, so we could compute it while sampling any slice.
- We compute the word likelihood when sampling block 0, which only covers the words contained in block 0. The result is therefore sometimes an approximation of the exact word likelihood, because the vocabulary of unique words in block 0 is not always identical to the whole vocabulary. It should not differ by much, though; it may only miss some very low-frequency words.
Computing the word likelihood depends only on the word-topic table, so once we have that parameter we can compute it. - This part depends only on the summary row, n_t. The condition here just makes sure we compute it once per iteration.
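The gating described above can be sketched as follows. This is a minimal illustration of the scheduling logic only, not LightLDA's actual code; the function and the returned bookkeeping are hypothetical:

```python
def run_iteration(num_blocks, num_slices):
    """Return which (block, slice) pairs trigger each likelihood term.

    Hypothetical sketch of the gating: doc likelihood on slice 0 of
    every block, word likelihood on every slice of block 0, so each
    term is computed exactly once per iteration.
    """
    doc_ll_points, word_ll_points = [], []
    for block in range(num_blocks):
        for slc in range(num_slices):
            # ... Metropolis-Hastings sampling for (block, slc) would go here ...
            if slc == 0:
                # Doc likelihood needs only doc-topic counts, so any
                # slice would do; slice 0 is picked to run it once.
                doc_ll_points.append((block, slc))
            if block == 0:
                # Word likelihood needs only the word-topic table rows
                # owned by this slice; block 0's vocabulary is roughly
                # the full vocabulary, so the sum is a close approximation.
                word_ll_points.append((block, slc))
    return doc_ll_points, word_ll_points
```

With 2 blocks and 3 slices, the doc likelihood fires at (0, 0) and (1, 0), while the word likelihood fires at (0, 0), (0, 1), and (0, 2), covering every slice exactly once.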
- Sorry, I'm not clear on what you mean by "In workers, all slices in every block may be executed loglikelihood under upper condition setting, and print computing log likelihood."
- The whole likelihood is the sum of the doc, word, and normalization terms.
The doc likelihood is a sum over all documents. Note that on each machine we only sample part of the dataset (say, 1000 documents) to compute it. You could compute it over the whole dataset, but that is time-consuming. If you want an estimate of the whole doc likelihood, multiplying the sampled result by a scaling coefficient gives an approximation.
The word likelihood is a sum over all words, which may be computed in different slices. Just sum the results from one process.
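The scaling trick mentioned above can be illustrated like this. The helper and its inputs are hypothetical, not part of LightLDA's API:

```python
def estimate_total_doc_likelihood(sampled_lls, total_docs):
    """Extrapolate the whole-corpus doc likelihood from a sampled subset.

    sampled_lls: per-document log-likelihoods of the sampled documents.
    total_docs:  number of documents in the whole (local) dataset.

    Multiplying the sampled sum by total_docs / len(sampled_lls) gives
    an approximation of the full sum without touching every document
    (hypothetical helper, assuming the sample is representative).
    """
    sampled_sum = sum(sampled_lls)
    scale = total_docs / len(sampled_lls)
    return sampled_sum * scale
```

For example, if 2 sampled documents out of 1000 have log-likelihoods -10.0 and -12.0, the estimate is (-22.0) * 500 = -11000.0.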
from lightlda.
Thank you very much Feiga
Sorry, my English is not very good. I mean that the slice is the basic unit of the corpus in training, and each slice prints a log-likelihood entry when trained.
When sampling slice 0 of an iteration, will it compute the doc likelihood over all documents in the block? Here we assume there is one block per worker.
Thanks, Lizhe
@tanglizhe1105 Sorry, I must have missed your message.
Yes, it computes the doc likelihood over the entire documents. See here