Comments (4)
Hi David, I think documentation is very important, so improvements are very welcome!
However in this case, I think you are wrong -- I always check all tutorial examples before each release:
- The serialize method resides in the corpora.IndexedCorpus class, a base class of other serializers.
- Dictionary objects can now act as simple mappings, so id2word=dictionary works (and is the preferred way of using id2word).
- The Dictionary.filterTokens and compactify methods are not explained in great detail in the tutorial itself, but you can always look at the API documentation. In fact, the documentation for these functions is longer than their Python code :-)
Your note about corpus parsing and reparsing is serious, though: it means it is not clear to users how the dictionary processing fits within corpus creation. That's a conceptual mistake, so the tutorial is apparently not doing a good job there. I will try to improve it.
EDIT: maybe the confusion comes from you using an older version of gensim? The documentation always reflects the latest release.
from gensim.
Hi Radim,
- Yes, I am using 0.7.7, so that must be the reason for 1 & 2.
If the documentation already reflects the API of 0.7.8, then more power to you. :)
However, you might consider making the older documentation available as well, if someone has to live with it for a while (e.g. because of installation policy, etc.).
As for 3., I didn't mean the methods filterTokens and compactify themselves; I think those are pretty straightforward. So having a short example on corpus reparsing is all I ask for. :)
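A short illustration of the reparsing step asked for here might look like the following. It is a plain-Python stand-in, not gensim code: build_token2id, to_bow and filter_and_compactify are hypothetical helpers that only mimic the idea behind a dictionary, bag-of-words vectors, and filtering followed by compactifying, so it runs regardless of the installed gensim version. The point it tries to show is that once token ids are renumbered, vectors built with the old ids are stale and the corpus has to be rebuilt from the texts.

```python
from collections import Counter


def build_token2id(texts):
    """Assign consecutive integer ids to tokens in order of first appearance."""
    token2id = {}
    for text in texts:
        for token in text:
            token2id.setdefault(token, len(token2id))
    return token2id


def to_bow(text, token2id):
    """Convert one tokenized document to sorted (token_id, count) pairs.

    Tokens missing from the mapping are silently skipped.
    """
    counts = Counter(token2id[t] for t in text if t in token2id)
    return sorted(counts.items())


def filter_and_compactify(token2id, bad_tokens):
    """Drop unwanted tokens, then renumber the survivors to close id gaps."""
    kept = [t for t in sorted(token2id, key=token2id.get) if t not in bad_tokens]
    return {token: new_id for new_id, token in enumerate(kept)}


# Toy tokenized documents (made up for this illustration).
texts = [
    ["human", "the", "interface"],
    ["the", "survey", "interface", "interface"],
]

# First pass: build the mapping and parse the corpus with it.
old_ids = build_token2id(texts)
old_corpus = [to_bow(text, old_ids) for text in texts]

# Remove a stop word and shrink the id space; surviving ids shift.
new_ids = filter_and_compactify(old_ids, bad_tokens={"the"})

# Second pass: the old vectors still use the old ids, so reparse the texts.
new_corpus = [to_bow(text, new_ids) for text in texts]

print(old_corpus)
print(new_corpus)
```

In this sketch, id 1 means "the" in the first pass and "interface" in the second, which is why reusing old_corpus with the new mapping would silently give wrong results.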
I am sorry, I closed this accidentally. I am still learning GitHub; I just wish the "Comment and close" button weren't the default. :/
Learning GitHub is a never ending process... I filed one site bug report just yesterday :-)
Full documentation (including HTML) is version controlled, and is a part of each gensim release. So you can access the relevant version a) from the source .tgz package of your release, in the docs dir, b) from GitHub, when you select Switch Tags from the main gensim page, c) from your local repo, by git checkout 0.7.7.
There are several questions about dictionaries and corpora on the mailing list now, not just yours, so apparently the tutorial on that part is insufficient. I'll try to improve it, but once you figure it out, please consider updating the docs yourself. I know gensim too well, so it's difficult to have a detached perspective on some things. I may see stuff as obvious and misunderstand problems.