
Comments (4)

piskvorky commented on May 14, 2024

Hi David, I think documentation is very important, so improvements are very welcome!

However, in this case I think you are mistaken: I always check all tutorial examples before each release.

  1. The serialize method resides in the corpora.IndexedCorpus class, a base class of the other serializers.
  2. Dictionary objects can now act as simple mappings, so id2word=dictionary works (and is the preferred way of using id2word).
  3. Dictionary.filterTokens and compactify methods are not explained in great detail in the tutorial itself, but you can always look at the API documentation. In fact, the documentation for these functions is longer than their Python code :-)
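Points 2 and 3 can be illustrated with a small stand-in. The sketch below is not gensim's Dictionary: `TinyDictionary`, `filter_tokens` and `describe` are hypothetical names chosen for illustration. It shows why an object implementing Python's `Mapping` protocol (id to token) can be passed wherever a plain `id2word` dict is expected, and what filtering followed by compactifying does to the ids.

```python
from collections.abc import Mapping


class TinyDictionary(Mapping):
    """Hypothetical minimal id -> token mapping (NOT gensim's Dictionary)."""

    def __init__(self, documents):
        self.token2id = {}
        for doc in documents:
            for token in doc:
                # assign the next free integer id to each unseen token
                self.token2id.setdefault(token, len(self.token2id))
        self.id2token = {i: t for t, i in self.token2id.items()}

    # The Mapping protocol: these three methods make the object usable as id2word.
    def __getitem__(self, token_id):
        return self.id2token[token_id]

    def __iter__(self):
        return iter(self.id2token)

    def __len__(self):
        return len(self.id2token)

    def filter_tokens(self, bad_ids):
        """Drop the given ids; this leaves gaps in the id sequence."""
        for token_id in bad_ids:
            token = self.id2token.pop(token_id)
            del self.token2id[token]

    def compactify(self):
        """Renumber the remaining ids to 0..len-1, closing the gaps."""
        old_ids = sorted(self.id2token)
        remap = {old: new for new, old in enumerate(old_ids)}
        self.id2token = {remap[old]: self.id2token[old] for old in old_ids}
        self.token2id = {t: i for i, t in self.id2token.items()}
        return remap


def describe(id2word):
    # Accepts any Mapping: a plain dict or a TinyDictionary.
    return [id2word[i] for i in sorted(id2word)]


docs = [["human", "computer", "interface"], ["computer", "system"]]
d = TinyDictionary(docs)
d.filter_tokens([d.token2id["human"]])  # removes id 0, leaving ids 1, 2, 3
d.compactify()                          # ids become 0, 1, 2 again
print(describe(d))
```

Note that after `compactify` every surviving token may carry a different id than before, which is why anything built with the old ids has to be regenerated.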

Your note about corpus parsing and reparsing is serious, though: it means it is not clear to users how dictionary processing fits into corpus creation. That's a conceptual gap, so the tutorial is apparently not doing a good job there; I will try to improve it.

EDIT: maybe the confusion comes from you using an older version of gensim? The documentation always reflects the latest release.


DavidNemeskey commented on May 14, 2024

Hi Radim,

  1. Yes, I am using 0.7.7, so that must be the reason for 1 & 2.
    If the documentation already reflects the API of 0.7.8, then more power to you. :)
    However, you might consider making the older documentation available as well, if someone has to live with it for a while (e.g. because of installation policy, etc.).

As for 3., I didn't mean the filterTokens and compactify methods themselves; those are pretty straightforward. A short example of corpus reparsing is all I ask for. :)
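The reparsing step can be sketched without gensim. The helper names below (`build_vocab`, `to_bow`, `filter_and_compact`) are hypothetical stand-ins, not gensim's API; the point is the order of operations: build the vocabulary in a first pass over the texts, filter and compactify it, then make a second pass over the same texts to produce sparse vectors with the final ids. Vectors produced before compactifying refer to the old ids and must be discarded.

```python
from collections import Counter


def build_vocab(texts):
    """Assign a contiguous integer id to every distinct token (first-seen order)."""
    token2id = {}
    for text in texts:
        for token in text:
            token2id.setdefault(token, len(token2id))
    return token2id


def to_bow(text, token2id):
    """Sparse (id, count) vector; tokens missing from the vocabulary are skipped."""
    counts = Counter(token2id[t] for t in text if t in token2id)
    return sorted(counts.items())


def filter_and_compact(token2id, texts, min_docs):
    """Keep tokens appearing in at least min_docs documents, then renumber 0..n-1."""
    doc_freq = Counter(t for text in texts for t in set(text))
    kept = [t for t, _ in sorted(token2id.items(), key=lambda kv: kv[1])
            if doc_freq[t] >= min_docs]
    return {t: new_id for new_id, t in enumerate(kept)}


texts = [
    ["survey", "graph", "trees", "graph"],
    ["graph", "trees"],
    ["minors", "graph"],
]

# Pass 1: build the full vocabulary (survey=0, graph=1, trees=2, minors=3).
token2id = build_vocab(texts)
stale = [to_bow(t, token2id) for t in texts]  # ids refer to the unfiltered vocab

# Drop rare tokens and compactify: "graph" and "trees" get new ids 0 and 1.
token2id = filter_and_compact(token2id, texts, min_docs=2)

# Pass 2: re-parse the same texts so the vectors use the new ids.
corpus = [to_bow(t, token2id) for t in texts]
print(corpus)
```

In gensim itself the filtering and renumbering are done by Dictionary methods (filterTokens and compactify in the 0.7.x releases discussed here); check the API documentation of your version for the exact names.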


DavidNemeskey commented on May 14, 2024

I am sorry, I closed this accidentally. I am still learning GitHub; I just wish the "Comment and close" button weren't the default. :/


piskvorky commented on May 14, 2024

Learning GitHub is a never ending process... I filed one site bug report just yesterday :-)

Full documentation (including HTML) is version controlled and is part of each gensim release. So you can access the docs for your version:

  a) from the source .tgz package of your release, in the docs dir;
  b) from GitHub, by selecting Switch Tags on the main gensim page;
  c) from your local repo, by running git checkout 0.7.7.

There are several questions about dictionaries and corpora on the mailing list now, not just yours, so apparently the tutorial on that part is insufficient. I'll try to improve it, but once you figure it out, please consider improving the docs yourself. I know gensim too well; it's difficult to have a detached perspective on some things. I may see stuff as obvious and misunderstand problems.
