nltk / nltk_contrib
NLTK Contrib
Home Page: http://nltk.org/
License: Other
Natural Language Toolkit, Contrib Area (NLTK-Contrib)
www.nltk.org
Authors: Steven Bird <[email protected]>, Edward Loper <[email protected]>, Ewan Klein <[email protected]>
Copyright (C) 2001-2011 NLTK Project
For license information, see LICENSE.txt
Hello,
I found a bug in the function to_oo() in textgrid.py:

    def to_oo(self):
        """
        @return: A string in OoTextGrid file format.
        """
        oo_file = ""
        oo_file += "File type = \"ooTextFile\"\n"
        oo_file += "Object class = \"TextGrid\"\n\n"
        oo_file += "xmin = ", self.xmin, "\n"
        oo_file += "xmax = ", self.xmax, "\n"
        oo_file += "tiers? <exists>\n"
        oo_file += "size = ", self.size, "\n"
        oo_file += "item []:\n"

Calling it raises:

    TypeError: cannot concatenate 'str' and 'tuple' objects

The comma turns the right-hand side of += into a tuple. Could it be written as oo_file += "xmin = " + str(self.xmin) + "\n", and so on? (Note that str() is needed, since self.xmin is a number.)
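A minimal sketch of the fix, using %-formatting so the numeric attributes are converted to text in one step. The TextGridStub class below is a hypothetical stand-in carrying only the attributes that to_oo() touches, not the real nltk_contrib class:

```python
class TextGridStub(object):
    """Hypothetical stand-in for TextGrid, for illustration only."""

    def __init__(self, xmin, xmax, size):
        self.xmin = xmin
        self.xmax = xmax
        self.size = size

    def to_oo(self):
        """Return a string in ooTextGrid file format.

        %-formatting converts self.xmin etc. to text, avoiding both the
        tuple created by the comma and the str/number concatenation
        error that a bare + would raise.
        """
        oo_file = ""
        oo_file += "File type = \"ooTextFile\"\n"
        oo_file += "Object class = \"TextGrid\"\n\n"
        oo_file += "xmin = %s\n" % self.xmin
        oo_file += "xmax = %s\n" % self.xmax
        oo_file += "tiers? <exists>\n"
        oo_file += "size = %s\n" % self.size
        oo_file += "item []:\n"
        return oo_file
```

The same pattern applies to every line in the original that joins a string and a number with commas.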
I have a project that could benefit from the FUF module here, but it is written in Python 2. Are there any plans to migrate this project to Python 3?
It appears the langid demo in the misc package isn't up to date. It relies on the nltk.detect module, which no longer seems to exist. Moreover, it calls langs(...) on the udhr reader, which doesn't seem to exist either.
Would there be a chance to see it updated?
Thanks
Perhaps I'm doing something wrong, but it's worth checking. My input to the ReadabilityTool is Unicode text (already decoded from UTF-8), and I receive a TypeError when trying to run the tests on it.
Traceback (most recent call last):
File "/Users/uname/projects/news_genome/news_genome/features.py", line 137, in metrics
flesch_readability(story),
File "/Users/uname/projects/news_genome/news_genome/mlstripper.py", line 23, in wrapper
return fn(text,*args,**kwargs)
File "/Users/uname/projects/news_genome/news_genome/mlstripper.py", line 30, in wrapper
ret = fn(*args,**kwargs)
File "/Users/uname/projects/news_genome/news_genome/features.py", line 49, in flesch_readability
contrib_score = rt.FleschReadingEase(text)
File "/usr/local/lib/python2.7/site-packages/nltk_contrib/readability/readabilitytests.py", line 87, in FleschReadingEase
self.__analyzeText(text)
File "/usr/local/lib/python2.7/site-packages/nltk_contrib/readability/readabilitytests.py", line 49, in __analyzeText
words = t.getWords(text)
File "/usr/local/lib/python2.7/site-packages/nltk_contrib/readability/textanalyzer.py", line 50, in getWords
text = self._setEncoding(text)
File "/usr/local/lib/python2.7/site-packages/nltk_contrib/readability/textanalyzer.py", line 130, in _setEncoding
text = unicode(text, "utf8").encode("utf8")
TypeError: decoding Unicode is not supported
It appears the logic at line 130 in textanalyzer.py attempts to decode input that is already a Unicode object.
    def _setEncoding(self, text):
        try:
            text = unicode(text, "utf8").encode("utf8")
        except UnicodeError:
            try:
                text = unicode(text, "iso8859_1").encode("utf8")
            except UnicodeError:
                text = unicode(text, "ascii", "replace").encode("utf8")
        return text
Is there something I need to configure in order to make the module expect Unicode by default?
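One way to address this is to check the input's type before decoding, so already-decoded text passes through untouched. The sketch below shows the idea in modern Python (str/bytes) rather than the Python 2 API the module actually uses; the function name and the latin-1 fallback order mirror _setEncoding but are otherwise an assumption:

```python
def set_encoding(text):
    """Sketch: decode only when given bytes; text that is already a
    decoded string is returned unchanged, avoiding the
    'decoding Unicode is not supported' TypeError."""
    if isinstance(text, str):
        return text  # already decoded, nothing to do
    for enc in ("utf8", "iso8859_1"):
        try:
            return text.decode(enc)
        except UnicodeError:
            continue  # try the next candidate encoding
    # Last resort: decode as ASCII, replacing undecodable bytes.
    return text.decode("ascii", "replace")
```

Under Python 2 the equivalent check would be isinstance(text, unicode) before calling unicode(text, ...).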
See nltk/nltk#75 and http://code.google.com/p/nltk/issues/detail?id=432 - it seems that fix was never committed.
Hello, I have the following sentence:
"See you in July 18th, 2016".
When using your function "tag", it outputs:
"See you in July 18th, 2016"
I think it should include 'July 18th'. Is there a way to include it?
Also, weekdays cannot be identified; for example:
"See you on Monday" --> Monday is not recognized.
(migrated from nltk/nltk#149)
Hi
An enhanced version of bioreader in nltk_contrib [http://code.google.com/p/nltk/source/browse/trunk#trunk%2Fnltk_contrib%2Fnltk_contrib%2Fbioreader] directory is available at https://bitbucket.org/jagan/bioreader.
Code clean-up and implementation of coding standards are done.
Jaganadh G
Migrated from http://code.google.com/p/nltk/issues/detail?id=661
earlier comments
StevenBird1 said, at 2011-04-08T13:26:40.000Z:
Thanks. Would you please describe what the extra files are for? Also, please remember to use "new style" Python classes.
jaganadhg said, at 2011-04-08T14:43:05.000Z:
Dear Steven, the extra files are programs which I used for testing. I have now removed them from Bitbucket. I will implement the "new style" Python class soon; if possible, I will finish it by this weekend.
jaganadhg said, at 2011-04-08T15:33:53.000Z:
Dear Steven, I have just incorporated the "new style" Python class and also made some minor corrections to the API documentation. Jaganadh G
I find it unnecessarily difficult to install the nltk_contrib package, as it is not published on PyPI. I know it can still be installed with pip, but I want to list nltk_contrib in the dependency list in my setup.py file.
Please consider pushing a release to PyPI. This would make things less messy for people using this code as a submodule.
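In the meantime, a PEP 508 direct reference lets pip resolve the dependency straight from GitHub even without a PyPI release. The setup.py fragment below is a hypothetical sketch (the project name and version are placeholders), shown only to illustrate the syntax:

```python
# Hypothetical setup.py sketch: declare nltk_contrib as a direct
# git dependency until a release appears on PyPI.
from setuptools import setup

setup(
    name="my-project",   # placeholder project name
    version="0.1",       # placeholder version
    install_requires=[
        # PEP 508 "name @ url" direct reference
        "nltk_contrib @ git+https://github.com/nltk/nltk_contrib.git",
    ],
)
```

Note that PyPI itself rejects uploads whose dependencies use direct references, so this works for applications but not for packages that are themselves published.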
While calling the timex.ground method, I found this error.
>>> r = LazyCorpusLoader('muc_7/', MUCCorpusReader, 'data/..ne.eng.keys.')
>>> r.iob_sents()
[[('Like', 'O'), ('most', 'O'), ('of', 'O'), ('the', 'O'), ('two', 'O'), ('million', 'O'), ('infants', 'O'), ('under', 'O'), ('2', 'O'), ('who', 'O'), ('fly', 'O'), ('with', 'O'), ('their', 'O'), ('parents', 'O'), ('every', 'O'), ('year', 'O'), (',', 'O'), ('Danasia', 'B-PERSON'), ('was', 'O'), ('traveling', 'O'), ('for', 'O'), ('free', 'O'), (',', 'O'), ('seated', 'O'), ('on', 'O'), ('her', 'O'), ('mother', 'O'), ("'s", 'O'), ('lap', 'O'), ('.', 'O')], [('As', 'O'), ('the', 'O'), ('DC-9', 'O'), ('approached', 'O'), ('the', 'O'), ('airport', 'O'), ('on', 'O'), ('July', 'B-DATE'), ('2', 'I-DATE'), (',', 'I-DATE'), ('1994', 'I-DATE'), (',', 'O'), ('wind', 'O'), ('shear', 'O'), ('slammed', 'O'), ('the', 'O'), ('plane', 'O'), ('to', 'O'), ('the', 'O'), ('ground', 'O'), ('.', 'O')], ...]
>>> len(r.iob_sents())
[Tree('S', ['The', Tree('ORGANIZATION', ['Unicef']), 'Flyer', 'flight', 'suffered', 'a', 'setback', Tree('DATE', ['Dec', '.'])]), Tree('DATE', ['Dec', '.'])]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0.1rc4-py2.7.egg/nltk/util.py", line 966, in __len__
    return max(len(lst) for lst in self._lists)
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0.1rc4-py2.7.egg/nltk/util.py", line 966, in <genexpr>
    return max(len(lst) for lst in self._lists)
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0.1rc4-py2.7.egg/nltk/util.py", line 807, in __len__
    if len(self._offsets) <= len(self._list):
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0.1rc4-py2.7.egg/nltk/util.py", line 966, in __len__
    return max(len(lst) for lst in self._lists)
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0.1rc4-py2.7.egg/nltk/util.py", line 966, in <genexpr>
    return max(len(lst) for lst in self._lists)
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0.1rc4-py2.7.egg/nltk/util.py", line 807, in __len__
    if len(self._offsets) <= len(self._list):
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0.1rc4-py2.7.egg/nltk/util.py", line 966, in __len__
    return max(len(lst) for lst in self._lists)
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0.1rc4-py2.7.egg/nltk/util.py", line 966, in <genexpr>
    return max(len(lst) for lst in self._lists)
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0.1rc4-py2.7.egg/nltk/corpus/reader/util.py", line 379, in __len__
    for tok in self.iterate_from(self._offsets[-1]): pass
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0.1rc4-py2.7.egg/nltk/corpus/reader/util.py", line 401, in iterate_from
    for tok in piece.iterate_from(max(0, start_tok-offset)):
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0.1rc4-py2.7.egg/nltk/corpus/reader/util.py", line 298, in iterate_from
    tokens = self.read_block(self._stream)
  File "/usr/local/lib/python2.7/dist-packages/nltk_contrib/coref/muc.py", line 419, in _read_parsed_block
    return map(self._parse, self._read_block(stream))
  File "/usr/local/lib/python2.7/dist-packages/nltk_contrib/coref/muc.py", line 428, in _parse
    tree = mucstr2tree(doc, top_node='DOC')
  File "/usr/local/lib/python2.7/dist-packages/nltk_contrib/coref/muc.py", line 468, in mucstr2tree
    'text': _muc_read_text(match.group('text'), top_node),
  File "/usr/local/lib/python2.7/dist-packages/nltk_contrib/coref/muc.py", line 534, in _muc_read_text
    tree[-1].append(_muc_read_words(sent, 'S'))
  File "/usr/local/lib/python2.7/dist-packages/nltk_contrib/coref/muc.py", line 558, in _muc_read_words
    assert len(stack) == 1
AssertionError
Hi,
I wanted to use the readability tests from the nltk_contrib package. The code did not meet my requirements (such as choosing between Dutch and English), so I have rewritten it.
I have put it there so others can use the work as well. Maybe it could become part of the official nltk package? I would like to edit it so it meets the coding standards of nltk.
Migrated from http://code.google.com/p/nltk/issues/detail?id=677
earlier comments
alex.rudnick said, at 2011-05-24T06:34:59.000Z:
Thanks for submitting updates! Could you outline the changes you made? Improvements are always welcome! Are you interested in contributing this as part of nltk_contrib? (could it maybe be merged in to the existing readability package, or would you prefer it to be separate?)
izidor.matusov said, at 2011-05-24T06:58:19.000Z:
I made these changes: 1) The original readability tests module of nltk_contrib contains code which is not related to readability tests. That code was removed. 2) Repaired a few lines so the code actually runs in the current version of Python. 3) Removed support for Dutch 4) Polished interface 5) Rewritten code, so pylint does not nag so much. Yes, I am interested. It could be merged but there are many radical changes so it could be a problem for someone who uses the previous readability package.
Migrated from nltk/nltk#161
I ran into a problem trying to apply the readability tests to a block of text with some UTF-8 characters (fancy quotes).
Sample text: http://pastebin.com/eRKGMGYn
Test script: http://pastebin.com/aE2DaRvk
I'm not very familiar with nltk_contrib, so perhaps I'm just using it wrong, but it seems to fail regardless of whether I pass a byte string or a Unicode string to ReadabilityTool. I forked nltk_contrib and changed textanalyzer.py so that it takes Unicode instead of bytes, and that seems to have fixed the problem for me.
My fork: https://github.com/priceonomics/nltk_contrib
Can someone confirm the issue I'm seeing and whether my fix is appropriate? Feel free to merge it back if it's useful.