nltk.github.com's Introduction

Natural Language Toolkit (NLTK)

NLTK -- the Natural Language Toolkit -- is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. NLTK requires Python version 3.8, 3.9, 3.10, 3.11 or 3.12.

For documentation, please visit nltk.org.

Contributing

Do you want to contribute to NLTK development? Great! Please read CONTRIBUTING.md for more details.

See also how to contribute to NLTK.

Donate

Have you found the toolkit helpful? Please support NLTK development by donating to the project via PayPal, using the link on the NLTK homepage.

Citing

If you publish work that uses NLTK, please cite the NLTK book, as follows:

Bird, Steven, Edward Loper and Ewan Klein (2009).
Natural Language Processing with Python.  O'Reilly Media Inc.

Copyright

Copyright (C) 2001-2023 NLTK Project

For license information, see LICENSE.txt.

AUTHORS.md contains a list of everyone who has contributed to NLTK.

Redistributing

  • NLTK source code is distributed under the Apache 2.0 License.
  • NLTK documentation is distributed under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States license.
  • NLTK corpora are provided under the terms given in the README file for each corpus; all are redistributable and available for non-commercial use.
  • NLTK may be freely redistributed, subject to the provisions of these licenses.

nltk.github.com's People

Contributors

ewan-klein, stevenbird, tomaarsen, xim

nltk.github.com's Issues

Unable to import nltk (model branch)

I get an ImportError on attempting to import nltk: "ImportError: No module named sparsefuncs"

I am using the model branch of NLTK (installed today with "pip install https://github.com/nltk/nltk/tarball/model".)

Traceback (most recent call last):
  File "../MyFile.py", line 3, in <module>
    import nltk
  File "/usr/local/lib/python2.7/dist-packages/nltk/__init__.py", line 117, in <module>
    from nltk.align import *
  File "/usr/local/lib/python2.7/dist-packages/nltk/align/__init__.py", line 15, in <module>
    from nltk.align.ibm1 import IBMModel1
  File "/usr/local/lib/python2.7/dist-packages/nltk/align/ibm1.py", line 18, in <module>
    from nltk.corpus import comtrans
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/__init__.py", line 66, in <module>
    from nltk.corpus.reader import *
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/__init__.py", line 62, in <module>
    from nltk.corpus.reader.chunked import *
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/chunked.py", line 21, in <module>
    from nltk.chunk import tagstr2tree
  File "/usr/local/lib/python2.7/dist-packages/nltk/chunk/__init__.py", line 157, in <module>
    from nltk.chunk.api import ChunkParserI
  File "/usr/local/lib/python2.7/dist-packages/nltk/chunk/api.py", line 13, in <module>
    from nltk.parse import ParserI
  File "/usr/local/lib/python2.7/dist-packages/nltk/parse/__init__.py", line 79, in <module>
    from nltk.parse.transitionparser import TransitionParser
  File "/usr/local/lib/python2.7/dist-packages/nltk/parse/transitionparser.py", line 17, in <module>
    from sklearn.datasets import load_svmlight_file
  File "/usr/lib/python2.7/dist-packages/sklearn/datasets/__init__.py", line 23, in <module>
    from .twenty_newsgroups import fetch_20newsgroups
  File "/usr/lib/python2.7/dist-packages/sklearn/datasets/twenty_newsgroups.py", line 53, in <module>
    from ..feature_extraction.text import CountVectorizer
  File "/usr/lib/python2.7/dist-packages/sklearn/feature_extraction/__init__.py", line 10, in <module>
    from . import text
  File "/usr/lib/python2.7/dist-packages/sklearn/feature_extraction/text.py", line 29, in <module>
    from ..preprocessing import normalize
  File "/usr/lib/python2.7/dist-packages/sklearn/preprocessing/__init__.py", line 6, in <module>
    from .data import Binarizer
  File "/usr/lib/python2.7/dist-packages/sklearn/preprocessing/data.py", line 20, in <module>
    from ..utils.sparsefuncs import inplace_csr_row_normalize_l1
ImportError: No module named sparsefuncs

In case they're relevant, I did get two warnings on my install of NLTK:
warning: no files found matching 'Makefile' under directory '*.txt'
warning: no previously-included files matching '*~' found anywhere in distribution
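
The traceback above fails inside scikit-learn (sklearn.preprocessing.data importing sklearn.utils.sparsefuncs), and it is reached only because nltk.parse.transitionparser imports sklearn at module load. One way to narrow this down is to check which of those modules can be located in the current environment. The following is a minimal sketch, not a fix; the helper name `module_available` is hypothetical and is not part of NLTK or scikit-learn.

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be located without importing the module itself.

    For a dotted name, the parent packages are imported as a side effect,
    so a missing or broken parent is reported as "not available".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    # Module names taken from the traceback above.
    for name in ("sklearn", "sklearn.utils", "sklearn.utils.sparsefuncs"):
        print(name, "->", "found" if module_available(name) else "missing")
```

If the parent packages are found but sklearn.utils.sparsefuncs is not, the installed scikit-learn is probably incomplete or has mixed versions on the path, which would point at the scikit-learn installation rather than at NLTK itself.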

Typo in corpus.reader howto

It says:

...see the API documentation for <cite>nltk.corpus.CorpusReader</cite>.</p>

But I think it should read:

...see the API documentation for <cite>nltk.corpus.reader</cite>.</p>

[off-topic] Organization permissions

This is not related to this repository but to the nltk organization on GitHub. In my opinion it is a security issue.

The current setting allows members of the organization to give third-party applications access to the organization's repositories, even when a member is authorizing an application only for use in their own repositories (not nltk's).
The default should be to deny such access and, where needed, grant access to specific third-party applications. To configure this, an owner should:

Don't encourage using sudo for installation. Encourage using virtualenv instead

Currently the best practice for installing Python packages is to create separate virtual environments for specific projects. However, http://www.nltk.org/install.html suggests running sudo pip install -U nltk, which IMO is not good advice. Instructions shouldn't contain sudo.

Maybe this was done on the assumption that telling users to create virtual environments can cause further complications, since many are still not familiar with the concept. However, would it at least be possible to suggest this as an alternative? I'm also not sure whether users would run into bugs and confusion if they use sudo with pip, since there seem to be some problems with this on the latest macOS.

An example is spaCy's getting started guide: https://spacy.io/usage/
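
As a rough illustration of the alternative suggested above, the standard library's venv module can create an isolated environment into which NLTK is then installed without sudo. This is only a sketch: the helper `create_env` and the directory name `nltk-env` are hypothetical, and the install step assumes network access and that pip is available in the new environment.

```python
import sys
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment and return the path to its interpreter."""
    env_dir = Path(env_dir)
    venv.create(env_dir, with_pip=with_pip)
    # Windows places executables in "Scripts"; other platforms use "bin".
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return env_dir / bin_dir / exe


if __name__ == "__main__":
    import subprocess

    python = create_env("nltk-env")  # hypothetical directory name
    # Install into the environment only; no sudo involved.
    subprocess.run([str(python), "-m", "pip", "install", "-U", "nltk"], check=True)
```

Because the packages land inside the environment's own directory, nothing is written to the system-wide site-packages, which is the main reason sudo becomes unnecessary.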

The title on www.nltk.org is a lie

http://www.nltk.org/ still purports to be "NLTK 3.0 documentation" even though the API docs now reflect docstrings from the latest (3.2.2) release. The title should either be updated with each release, or it should be modified to contain a less precise version number that is not incorrect (e.g. "NLTK 3 documentation")
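
The second option (a less precise version number) could be derived from the release string instead of being typed by hand. The sketch below is an assumption about how that might look; the issue does not say how the site's title is actually generated, and `docs_title` is a hypothetical helper.

```python
def docs_title(release, project="NLTK"):
    """Build a documentation title that names only the major version."""
    major = release.split(".", 1)[0]
    return f"{project} {major} documentation"


print(docs_title("3.2.2"))  # prints "NLTK 3 documentation"
```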

Invalid certificate on www.nltk.org

Visiting https://www.nltk.org results in a browser warning due to a certificate error:
[Screenshot, 2015-10-12 3:27:07 PM]

This is caused by the server presenting a certificate issued to github.com instead of nltk.org:
[Screenshot, 2015-10-12 3:25:47 PM]

I understand this is because www.nltk.org is hosted on Github. However, having to click through a serious security warning is an absolutely awful experience for first-time users. Please consider either getting a correct certificate or disabling HTTPS access to the page.
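
The mismatch described above can be reproduced outside a browser. The sketch below uses Python's standard ssl module: `handshake_error` attempts a verified TLS handshake and returns the verification message if it fails, while `cert_covers` is a simplified name check (exact match, or "*" as the whole leftmost label) against a certificate dict in the format returned by SSLSocket.getpeercert(). The helper names are hypothetical, the matching is deliberately simpler than what the ssl module does internally, and the sample certificate data is illustrative rather than captured from the real server.

```python
import socket
import ssl


def name_matches(pattern, hostname):
    """Compare one certificate DNS name with a hostname, label by label.

    A "*" is accepted only as the entire leftmost label.
    """
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] != "*" and p_labels[0] != h_labels[0]:
        return False
    return p_labels[1:] == h_labels[1:]


def cert_covers(cert, hostname):
    """Check a certificate dict as returned by SSLSocket.getpeercert()."""
    names = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    return any(name_matches(name, hostname) for name in names)


def handshake_error(hostname, port=443, timeout=10):
    """Return None if the TLS handshake verifies, else the verification message."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname):
                return None
    except ssl.SSLCertVerificationError as exc:
        return exc.verify_message


if __name__ == "__main__":
    # Illustrative certificate data, not captured from the real server.
    sample_cert = {"subjectAltName": (("DNS", "github.com"), ("DNS", "*.github.com"))}
    print(cert_covers(sample_cert, "www.nltk.org"))  # prints False
```

Calling `handshake_error("www.nltk.org")` requires network access; a non-None result would indicate that a verifying client rejects the certificate, which is what the browser warning reflects.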
