lynten / stanford-corenlp
Python wrapper for Stanford CoreNLP.
License: MIT License
I am able to annotate text successfully using stanford-corenlp as follows:
nlp = StanfordCoreNLP('http://localhost', port=9000)
sentence = '''Michael James editor of Publishers Weekly,
Bill Gates is the owner of Microsoft,
Obama is the owner of Microsoft,
Satish lives in Hyderabad'''
props = {'annotators': 'tokenize,ssplit,pos,lemma,ner,regexner,coref',
         'regexner.mapping': 'training.txt', 'pipelineLanguage': 'en'}
annotatedText = json.loads(nlp.annotate(sentence, properties=props))
I am trying to extract relations from annotatedText, but it returns nothing:
roles = """
(.*(
analyst|
owner|
lives|
editor|
librarian).*)|
researcher|
spokes(wo)?man|
writer|
,\sof\sthe?\s* # "X, of (the) Y"
"""
ROLES = re.compile(roles, re.VERBOSE)
for rel in nltk.sem.extract_rels('PERSON', 'ORGANIZATION', annotatedText, corpus='ace', pattern=ROLES):
print(nltk.sem.rtuple(rel))
Can you please help me extract relations from the Stanford-annotated text using NLTK's nltk.sem.extract_rels?
I'm trying to use StanfordCoreNLP and getting an error:
from stanfordcorenlp import StanfordCoreNLP
..
..
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/psutil/_psosx.py", line 330, in wrapper
return fun(self, *args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/psutil/_psosx.py", line 515, in connections
rawlist = cext.proc_connections(self.pid, families, types)
PermissionError: [Errno 1] Operation not permitted
..
..
During handling of the above exception, another exception occurred:
nlp = StanfordCoreNLP("/Users/NLP_models/stanford-corenlp-full-2018-01-31")
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/stanfordcorenlp/corenlp.py", line 79, in __init__
if port_candidate not in [conn.laddr[1] for conn in psutil.net_connections()]:
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/psutil/__init__.py", line 2108, in net_connections
return _psplatform.net_connections(kind)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/psutil/_psosx.py", line 249, in net_connections
cons = Process(pid).connections(kind)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/psutil/_psosx.py", line 335, in wrapper
raise AccessDenied(self.pid, self._name)
psutil._exceptions.AccessDenied: psutil.AccessDenied (pid=13041)
I tried to follow suggestions here:
ContinuumIO/anaconda-issues#6006
But it didn't work - any idea?
When trying to connect to an existing server that is unavailable, the code throws no exception and becomes unresponsive while trying to annotate, leading to timeout errors.
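A quick reachability check before constructing the client can avoid the hang. This is a standard-library sketch, and server_reachable is a hypothetical helper name, not part of the wrapper:

```python
import socket

def server_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError and socket.timeout
        return False

# Only construct the client if the server answers, so annotate() won't hang:
# if server_reachable('localhost', 9000):
#     nlp = StanfordCoreNLP('http://localhost', port=9000)
```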
How can I lemmatize all tokens in a corpus and eliminate stop words?
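One approach is to request the tokenize,ssplit,pos,lemma annotators and then walk the returned JSON, keeping each token's lemma and dropping stop words. A sketch; the stop-word list is a tiny illustrative set (use a fuller one in practice), and the doc dict is a hand-made fragment shaped like typical CoreNLP output:

```python
# Illustrative stop-word set; substitute a real list (e.g. NLTK's) in practice.
STOP_WORDS = {'the', 'a', 'an', 'of', 'is', 'in', 'to', 'and'}

def content_lemmas(annotated: dict) -> list:
    """Collect lowercased lemmas from CoreNLP JSON output, skipping stop words."""
    lemmas = []
    for sentence in annotated.get('sentences', []):
        for token in sentence['tokens']:
            lemma = token['lemma'].lower()
            if lemma not in STOP_WORDS:
                lemmas.append(lemma)
    return lemmas

# Hand-made fragment shaped like json.loads(nlp.annotate(...)):
doc = {'sentences': [{'tokens': [
    {'word': 'The', 'lemma': 'the'},
    {'word': 'cats', 'lemma': 'cat'},
    {'word': 'ran', 'lemma': 'run'},
]}]}
print(content_lemmas(doc))  # ['cat', 'run']
```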
It would be good to expose all the annotators/APIs from CoreNLP. If possible, add nlp.relation.
Other Python wrappers don't support the relation extractor either.
Any idea why it runs quite slowly, and how to solve it?
How can I add words to the relations model?
https://github.com/Lynten/stanford-corenlp/blob/master/stanfordcorenlp/corenlp.py#L176
r_dict = json.loads(r.text)
This should be wrapped in try/except, or r.ok / r.status_code should be checked first. The method has raised uncaught exceptions for me.
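A sketch of the defensive parsing this report suggests; parse_response is a hypothetical helper, not part of the wrapper:

```python
import json

def parse_response(status_code: int, text: str) -> dict:
    """Parse a server response defensively instead of calling json.loads blindly."""
    if status_code != 200:
        raise RuntimeError('CoreNLP server returned HTTP %d: %s'
                           % (status_code, text[:200]))
    try:
        return json.loads(text)
    except ValueError as exc:  # JSONDecodeError subclasses ValueError
        raise RuntimeError('CoreNLP server sent non-JSON output: %r'
                           % text[:200]) from exc

print(parse_response(200, '{"sentences": []}'))  # {'sentences': []}
```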
I was trying to do NER with my text
from stanfordcorenlp import StanfordCoreNLP
documents = pd.read_csv('some csv file')['documents'].values.tolist()
text = documents[0] ## text is a string
nlp = StanfordCoreNLP(r'my_path\stanford-corenlp-full-2018-02-27') ## latest version
print(nlp.ner(text))
But I keep getting this error
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Nice work. Is there a possibility of adding nlp.dcoref in the next release?
I'm trying to save the model so I can use it without having to download and read the data every time, but I get an error:
from stanfordcorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP("/NLP_models/stanford-corenlp-full-2018-01-31")
pickle.dump(nlp, open("/NLP_models/pickled_wrapped_nlp", "wb"))
And the error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can't pickle _thread.lock objects
What can I do?
Thanks
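The wrapper object holds a live Java subprocess and lock/socket handles, which pickle cannot serialize (hence "can't pickle _thread.lock objects"). A workaround is to pickle the annotation results rather than the client, and simply recreate the client on each run; a minimal sketch:

```python
import pickle

# Persist the annotation *results*, not the client object.
results = {'sentences': [], 'corefs': {}}  # e.g. json.loads(nlp.annotate(...))

blob = pickle.dumps(results)          # or pickle.dump(results, open(path, 'wb'))
restored = pickle.loads(blob)
print(restored == results)  # True
```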
Hi
When I ran the test with python 2.7, I got the following error:
Initializing native server...
java -Xmx4g -cp "/home/ehsan/Java/JavaLibraries/stanford-corenlp-full-2016-10-31/*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
The server is available.
Traceback (most recent call last):
File "test.py", line 9, in <module>
print('Tokenize:', nlp.word_tokenize(sentence))
File "/home/ehsan/Python/stanford-corenlp/stanfordcorenlp/corenlp.py", line 78, in word_tokenize
r_dict = self._request('ssplit,tokenize', sentence)
File "/home/ehsan/Python/stanford-corenlp/stanfordcorenlp/corenlp.py", line 114, in _request
r_dict = json.loads(r.text)
File "/home/ehsan/anaconda3/envs/py27-test-corenlp/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/home/ehsan/anaconda3/envs/py27-test-corenlp/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/ehsan/anaconda3/envs/py27-test-corenlp/lib/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Thanks for this great Python wrapper. I've been using it in a Translation Technology class here at National Taiwan University, Taipei for two semesters.
I was wondering if it is possible to specify the segmentation model either as a parameter in the StanfordCoreNLP() method, e.g.,
nlp = StanfordCoreNLP(path, lang='zh', model='pku')
or in an external "properties" file. Thank you!
How can I use stanfordcorenlp to replace all pronouns with their nouns in a sentence? For example, the sentence is: Fred Rogers lives in a house with pets. It is two stories, and he has a dog, a cat, a rabbit, three goldfish, and a monkey.
This needs to be converted to: Fred Rogers lives in a house with pets. House is two stories, and Fred Rogers has a dog, a cat, a rabbit, three goldfish, and a monkey.
I had used the below code
nlp = StanfordCoreNLP(r'G:\JavaLibraries\stanford-corenlp-full-2017-06-09', quiet=False)
props = {'annotators': 'coref', 'pipelineLanguage': 'en'}
text = 'Barack Obama was born in Hawaii. He is the president. Obama was elected in 2008.'
result = json.loads(nlp.annotate(text, properties=props))
mentions = result['corefs'].items()
However, I cannot understand how to iterate through mentions and perform the replacement.
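A sketch of one way to walk result['corefs'], assuming the usual CoreNLP JSON keys (text, isRepresentativeMention, type, sentNum, startIndex, endIndex, all 1-based). It builds a replacement plan; applying it to the actual words also requires the token lists in result['sentences']:

```python
def coref_replacements(corefs: dict) -> list:
    """For each coreference chain, map every pronominal mention to the chain's
    representative mention. Returns (sentNum, startIndex, endIndex, text)
    tuples using CoreNLP's 1-based token indices."""
    plan = []
    for chain in corefs.values():
        rep = next(m['text'] for m in chain if m['isRepresentativeMention'])
        for m in chain:
            if m['type'] == 'PRONOMINAL':
                plan.append((m['sentNum'], m['startIndex'], m['endIndex'], rep))
    return plan

# Hand-made fragment shaped like result['corefs']:
corefs = {'3': [
    {'text': 'Barack Obama', 'isRepresentativeMention': True,
     'type': 'PROPER', 'sentNum': 1, 'startIndex': 1, 'endIndex': 3},
    {'text': 'He', 'isRepresentativeMention': False,
     'type': 'PRONOMINAL', 'sentNum': 2, 'startIndex': 1, 'endIndex': 2},
]}
print(coref_replacements(corefs))  # [(2, 1, 2, 'Barack Obama')]
```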
Hi
Please do not use logging.basicConfig() to configure logging, since it will mix your logger with any other logger defined by a user of your module (see first comment of https://stackoverflow.com/a/35326281/141586)
In case someone is having conflicts like me (log messages showing twice, for example), set propagate to False in your logger object.
logger = logging.getLogger(__name__)
logger.propagate = False
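A minimal sketch of the recommended split: the library attaches only a NullHandler (never calling logging.basicConfig), while the application configures its own handler and disables propagation to avoid duplicated messages:

```python
import logging

# Library side: stay silent unless the application configures logging.
lib_logger = logging.getLogger('stanfordcorenlp')
lib_logger.addHandler(logging.NullHandler())

# Application side: attach an explicit handler, and stop propagation so the
# root logger does not emit the same records a second time.
app_logger = logging.getLogger('stanfordcorenlp')
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('%(name)s %(levelname)s %(message)s'))
app_logger.addHandler(handler)
app_logger.propagate = False
```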
I'm getting a file not found error as shown below:
Traceback (most recent call last): File "D:/Users/[user]/Documents/NLP/arabic_tagger/build_model.py", line 6, in <module> nlp = StanfordCoreNLP(corenlp_path, lang='ar', memory='4g')
File "C:\Users\[user]\AppData\Roaming\Python\Python36\site-packages\stanfordcorenlp\corenlp.py", line 46, in __init__
if not subprocess.call(['java', '-version'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT) == 0:
File "C:\Users\[user]\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 267, in call
with Popen(*popenargs, **kwargs) as p:
File "C:\Users\[user]\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "C:\Users\[user]\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 997, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
I have used this wrapper before and am using it in the same way as always:
corenlp_path = 'D:/Users/[user]/Desktop/StanfordCoreNLP/Full_CoreNLP_3.8.0'
nlp = StanfordCoreNLP(corenlp_path, lang='ar', memory='4g')
Just to be sure, I downloaded version 3.8.0 as well as the Arabic models and made sure they are in the specified path. I'm wondering if the FileNotFoundError is not referring to the CoreNLP path but to something else... subprocess.py is in the correct directory. So yeah... not sure what's wrong or what to do.
Thanks!
There are minor differences. I've taken care of the CoreNLP versions; both are 3.7.0. Could someone point out the possible reasons and fixes?
On Mac OSX connecting to the socket may cause hanging. Adding a short delay before the first check fixes it.
Is there any way to find the depth of a constituency tree?
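If you have the bracketed parse string from nlp.parse(), one simple measure of depth is the maximum parenthesis nesting level; a sketch:

```python
def parse_depth(bracketed: str) -> int:
    """Depth of a constituency tree given as a bracketed parse string,
    measured as the maximum parenthesis nesting level."""
    depth = best = 0
    for ch in bracketed:
        if ch == '(':
            depth += 1
            best = max(best, depth)
        elif ch == ')':
            depth -= 1
    return best

tree = '(ROOT (S (NP (NN Tokenization)) (VP (VBZ works))))'
print(parse_depth(tree))  # 4
```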
I just ran the example's Simple Usage:
/System/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 /Users/shenglong123/Desktop/bbb/cc.py
[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---
[main] INFO CoreNLP - setting default constituency parser
[main] INFO CoreNLP - warning: cannot find edu/stanford/nlp/models/srparser/englishSR.ser.gz
[main] INFO CoreNLP - using: edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz instead
[main] INFO CoreNLP - to use shift reduce parser download English models jar from:
[main] INFO CoreNLP - http://stanfordnlp.github.io/CoreNLP/download.html
[main] INFO CoreNLP - Threads: 8
[main] INFO CoreNLP - Starting server...
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0:0:0:0:0:0:0:0:9000
I waited half an hour, but it didn't print anything.
I ran it on both macOS and Ubuntu.
I also put stanford-corenlp-full-2017-06-09 at /Users/shenglong123/desktop/stanford-corenlp-full-2017-06-09.
It contains:
CoreNLP-to-HTML.xsl joda-time-2.9-sources.jar
LIBRARY-LICENSES joda-time.jar
LICENSE.txt jollyday-0.4.9-sources.jar
Makefile jollyday.jar
README.txt patterns
SemgrexDemo.java pom.xml
ShiftReduceDemo.java protobuf.jar
StanfordCoreNlpDemo.java slf4j-api.jar
StanfordDependenciesManual.pdf slf4j-simple.jar
build.xml stanford-corenlp-3.8.0-javadoc.jar
corenlp.sh stanford-corenlp-3.8.0-models.jar
ejml-0.23-src.zip stanford-corenlp-3.8.0-sources.jar
ejml-0.23.jar stanford-corenlp-3.8.0.jar
input.txt sutime
input.txt.xml tokensregex
javax.json-api-1.0-sources.jar xom-1.2.10-src.jar
javax.json.jar xom.jar
I would appreciate it if you could give me some advice, thanks!
There is only a .whl but no .tar.gz
I maintain an ebuild for Gentoo in my overlay, and I would prefer using the tar.gz from pypi and not GitHub.
Thanks!
Would it be possible for us to extract the NP or VP from the constituency parsing results?
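One way, without extra dependencies, is to read the bracketed string from nlp.parse() into nested lists and collect subtrees by label. A minimal sketch; read_sexpr, subtrees_with_label, and leaves are illustrative helpers (nltk.Tree.fromstring does the same job if NLTK is available):

```python
def read_sexpr(s):
    """Parse a bracketed constituency string into nested [label, children...] lists."""
    tokens = s.replace('(', ' ( ').replace(')', ' ) ').split()
    def walk(i):
        if tokens[i] == '(':
            node = [tokens[i + 1]]
            i += 2
            while tokens[i] != ')':
                child, i = walk(i)
                node.append(child)
            return node, i + 1
        return tokens[i], i + 1
    tree, _ = walk(0)
    return tree

def subtrees_with_label(tree, label):
    """Yield every subtree whose label matches (e.g. 'NP' or 'VP')."""
    if isinstance(tree, list):
        if tree[0] == label:
            yield tree
        for child in tree[1:]:
            yield from subtrees_with_label(child, label)

def leaves(tree):
    """Collect the leaf words of a nested-list tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in leaves(child)]

parse = '(ROOT (S (NP (DT The) (NN dog)) (VP (VBZ barks))))'
tree = read_sexpr(parse)
print([' '.join(leaves(t)) for t in subtrees_with_label(tree, 'NP')])  # ['The dog']
```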
Does it support POS tagging for pre-tokenized text? As in here:
https://nlp.stanford.edu/software/pos-tagger-faq.html#pretagged
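CoreNLP's tokenize.whitespace and ssplit.eolonly properties are the usual route for pre-tokenized input (space-separated tokens, one sentence per line). A hedged sketch of the properties to pass, untested with this particular wrapper:

```python
# Assumed props for pre-tokenized input; values are strings, as CoreNLP expects.
props = {
    'annotators': 'tokenize,ssplit,pos',
    'tokenize.whitespace': 'true',  # split tokens on whitespace only
    'ssplit.eolonly': 'true',       # one sentence per input line
    'pipelineLanguage': 'en',
}
# annotated = json.loads(nlp.annotate('This is pre-tokenized .', properties=props))
```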
stanford-corenlp-x.x.x-models.jar not exists. You should download and place it in the C:\Users\User\Desktop\user\stanford-parser-full-2018-02-27\ first.
But the file stanford-corenlp-3.9.1-models.jar is present in the directory. I have also added CoreNLP to the CLASSPATH. I haven't edited STANFORD_MODELS yet, though.
Following this thread:
#28
I'm trying to init a server:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---
[main] INFO CoreNLP - setting default constituency parser
[main] INFO CoreNLP - warning: cannot find edu/stanford/nlp/models/srparser/englishSR.ser.gz
[main] INFO CoreNLP - using: edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz instead
[main] INFO CoreNLP - to use shift reduce parser download English models jar from:
[main] INFO CoreNLP - http://stanfordnlp.github.io/CoreNLP/download.html
[main] INFO CoreNLP - Threads: 8
[main] INFO CoreNLP - Starting server...
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0:0:0:0:0:0:0:0:9000
Then I tried to execute the command
nlp = StanfordCoreNLP('http://localhost', port=9000)
But nothing happened; even simple Python commands such as
a = 2
print(a)
It's worth noting that when I open it in the browser I have access to the server, but I want to communicate with it from Python.
When I tried to run the demo code for ner:
print 'Named Entities:', nlp.ner(sentence)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/pypy2.7/dist-packages/stanfordcorenlp/corenlp.py", line 146, in ner
r_dict = self._request('ner', sentence)
File "/usr/local/lib/pypy2.7/dist-packages/stanfordcorenlp/corenlp.py", line 171, in _request
r_dict = json.loads(r.text)
File "/usr/lib/pypy/lib-python/2.7/json/__init__.py", line 347, in loads
return _default_decoder.decode(s)
File "/usr/lib/pypy/lib-python/2.7/json/decoder.py", line 363, in decode
obj, end = self.raw_decode(s, idx=WHITESPACE.match(s, 0).end())
File "/usr/lib/pypy/lib-python/2.7/json/decoder.py", line 381, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Does anyone know how to solve this problem?
I use StanfordCoreNLP version 3.7.0
I had the following error using the latest version:
2017-11-06 08:36:46,679 root INFO Cleanup...
Exception ignored in: <bound method StanfordCoreNLP.__del__ of <stanfordcorenlp.corenlp.StanfordCoreNLP object at 0x0000021304456F98>>
Traceback (most recent call last):
File "C:\Program Files\Anaconda3\lib\site-packages\stanfordcorenlp\corenlp.py", line 108, in __del__
File "C:\Program Files\Anaconda3\lib\logging\__init__.py", line 1838, in info
File "C:\Program Files\Anaconda3\lib\logging\__init__.py", line 1279, in info
File "C:\Program Files\Anaconda3\lib\logging\__init__.py", line 1415, in _log
File "C:\Program Files\Anaconda3\lib\logging\__init__.py", line 1425, in handle
File "C:\Program Files\Anaconda3\lib\logging\__init__.py", line 1487, in callHandlers
File "C:\Program Files\Anaconda3\lib\logging\__init__.py", line 855, in handle
File "C:\Program Files\Anaconda3\lib\logging\__init__.py", line 1047, in emit
File "C:\Program Files\Anaconda3\lib\logging\__init__.py", line 1037, in _open
NameError: name 'open' is not defined
According to https://bugs.python.org/issue26789, it would be better to remove the logging call from the __del__ method (in my case that works).
Note that I set the logger to save to a file.
Is there an API that helps process batches of sentences ?
As of now, I'm using the nlp.ner(document) API. So, I'm looking for something like nlp.batch_ner(list_of_documents).
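There is no batch API in the wrapper as far as I know, but the CoreNLP server is multithreaded, so issuing several requests concurrently can help throughput. batch_annotate below is a hypothetical helper (nlp.ner is the real wrapper method it would wrap; len stands in for it in the runnable example):

```python
from concurrent.futures import ThreadPoolExecutor

def batch_annotate(documents, annotate_fn, max_workers=4):
    """Apply annotate_fn to each document concurrently, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(annotate_fn, documents))

# With the wrapper: results = batch_annotate(list_of_documents, nlp.ner)
print(batch_annotate(['a', 'bb'], len))  # [1, 2]
```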
Hi, I'm trying to use this library for a project where I need coreference resolution. I was testing the test.py file and this kept showing indefinitely; how do I fix it?
Thanks in advance.
I have some proper nouns that need their own tags. Can I define the part of speech for certain words myself and add them?
Or load my own custom dictionary
Can I use a custom segmentation dictionary in python ?
When I use the ner function it fails; the other functions work.
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/stanfordcorenlp/corenlp.py", line 195, in ner
r_dict = self._request('ner', sentence)
File "/usr/local/lib/python3.5/dist-packages/stanfordcorenlp/corenlp.py", line 239, in _request
r_dict = json.loads(r.text)
File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I get an error when trying to use the quote and quoteattribution annotators.
Using props = {'annotators': 'tokenize,ssplit,pos,lemma,ner,quote'}, I get the following error:
'java.lang.IllegalArgumentException: annotator "quote" requires annotation "CanonicalEntityMentionIndexAnnotation". The usual requirements for this annotator are: tokenize,ssplit,pos,lemma,ner'
And with props = {'annotators': 'tokenize,ssplit,pos,lemma,ner,entitymentions,depparse,quote,quoteattribution', 'pipelineLanguage': 'en', 'outputFormat': 'xml'}, I get the following error:
'Could not handle incoming annotation'
Any suggestions on how to fix these?
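An untested sketch for the first error: CanonicalEntityMentionIndexAnnotation is, to my knowledge, produced by the coref annotator, so listing depparse and coref before quote may satisfy the requirement (in recent CoreNLP versions, quote attribution is folded into the quote annotator):

```python
# Assumed annotator ordering; verify against your CoreNLP version's docs.
props = {
    'annotators': 'tokenize,ssplit,pos,lemma,ner,depparse,coref,quote',
    'pipelineLanguage': 'en',
}
# result = json.loads(nlp.annotate(text, properties=props))
```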
Hi, CoreNLP is now at version 3.8.0 (https://stanfordnlp.github.io/CoreNLP/other-languages.html), can I just download this new version and will the interface work with it?
There is a timeout parameter in the __init__ method of the StanfordCoreNLP class, but it is never used. We should either implement the behavior or remove the parameter to avoid confusion.
Exception ignored in: <bound method StanfordCoreNLP.__del__ of <stanfordcorenlp.corenlp.StanfordCoreNLP object at 0x7f5d1e4856d8>>
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/stanfordcorenlp/corenlp.py", line 111, in __del__
File "/usr/lib/python3/dist-packages/psutil/__init__.py", line 349, in __init__
File "/usr/lib/python3/dist-packages/psutil/__init__.py", line 370, in _init
File "/usr/lib/python3/dist-packages/psutil/_pslinux.py", line 849, in __init__
File "/usr/lib/python3/dist-packages/psutil/_pslinux.py", line 151, in get_procfs_path
AttributeError: 'NoneType' object has no attribute 'PROCFS_PATH'
Why does it transform "(" and ")" into "-LRB-" and "-RRB-"?
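This is Penn Treebank convention: brackets are escaped so they cannot be confused with the parentheses of a bracketed parse tree. Mapping them back is a simple substitution; a sketch:

```python
# Penn Treebank bracket escapes and their original characters.
PTB_BRACKETS = {
    '-LRB-': '(', '-RRB-': ')',
    '-LSB-': '[', '-RSB-': ']',
    '-LCB-': '{', '-RCB-': '}',
}

def unescape_ptb(tokens):
    """Replace PTB bracket escapes in a token list, leaving other tokens as-is."""
    return [PTB_BRACKETS.get(t, t) for t in tokens]

print(unescape_ptb(['-LRB-', 'sic', '-RRB-']))  # ['(', 'sic', ')']
```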
Hi,
I am getting the following error when I try to create a StanfordCoreNLP object with:
nlp = StanfordCoreNLP(r'/Users/virk/Downloads/stanford-corenlp-full-2018-01-31')
The error is:
raise AccessDenied(self.pid, self._name)
psutil._exceptions.AccessDenied: psutil.AccessDenied (pid=34054)
Any ideas?
Sentence:
It's Lolita's chance of freedom," said Jared Goodman, PETA's director of animal law. "It's a huge step."
The server's output (and common sense) suggests that Goodman is the subject (nsubj) of the verb "said". However, the python version's output says that Goodman is the object of the verb "said". Could this discrepancy be fixed? Thank you.
nlp = StanfordCoreNLP(r'/usr/local/lib/python3.4/dist-packages/stanfordcorenlp/', lang='zh')
File "/usr/local/lib/python3.4/dist-packages/stanfordcorenlp/corenlp.py", line 46, in __init__
if not subprocess.call(['java', '-version'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT) == 0:
File "/usr/lib/python3.4/subprocess.py", line 537, in call
with Popen(*popenargs, **kwargs) as p:
File "/usr/lib/python3.4/subprocess.py", line 859, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.4/subprocess.py", line 1457, in _execute_child
raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'java'
I am trying to run the following sample code and I get a JSONDecodeError. It looks like the program cannot find any object to decode. I am using Anaconda Python 3.6.
from stanfordcorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP(r'C:/Users/lidan/Desktop/es/corenlp/stanford-corenlp-full-2017-06-09/')
sentence = "Guangdong University of Foreign Studies is located in Guangzhou."
print ('Tokenize:', nlp.word_tokenize(sentence))
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
print ('Tokenize:', nlp.word_tokenize(sentence))
File "C:\Users\lidan\Anaconda3\lib\site-packages\stanfordcorenlp\corenlp.py", line 132, in word_tokenize
r_dict = self._request('ssplit,tokenize', sentence)
File "C:\Users\lidan\Anaconda3\lib\site-packages\stanfordcorenlp\corenlp.py", line 171, in _request
r_dict = json.loads(r.text)
File "C:\Users\lidan\Anaconda3\lib\json\__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "C:\Users\lidan\Anaconda3\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\lidan\Anaconda3\lib\json\decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
JSONDecodeError: Expecting value
Getting an "unknown annotator" error from the Stanford jar.
Hello!
I can run the dependency parser with Stanford CoreNLP using pycorenlp (a Python wrapper). However, the challenge is converting the output to a tree. I want to process the tree (basically root-to-leaf paths), so it is important for me to convert it into a tree.
I could use the nltk.tree library, which would yield a tree, but its input must be in bracketed parse form, which is missing here.
Kindly help!
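If you can get dependency triples (relation, head index, dependent index), for example from this wrapper's nlp.dependency_parse where head 0 marks the root, you can build the tree and enumerate root-to-leaf paths directly. A sketch with hand-made triples:

```python
from collections import defaultdict

def root_to_leaf_paths(triples):
    """Build a tree from (relation, head, dependent) triples, where token
    indices are 1-based and head 0 marks the root, then enumerate every
    root-to-leaf path as a list of token indices."""
    children = defaultdict(list)
    root = None
    for rel, head, dep in triples:
        if head == 0:
            root = dep
        else:
            children[head].append(dep)
    paths = []
    def walk(node, path):
        path = path + [node]
        if not children[node]:
            paths.append(path)
        else:
            for child in children[node]:
                walk(child, path)
    walk(root, [])
    return paths

# "Dogs chase fast cats": chase(2) is root; dogs(1) and cats(4) depend on it,
# fast(3) depends on cats(4).
triples = [('ROOT', 0, 2), ('nsubj', 2, 1), ('obj', 2, 4), ('amod', 4, 3)]
print(root_to_leaf_paths(triples))  # [[2, 1], [2, 4, 3]]
```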
It is a great wrapper.
Can you make it run as a batch process? It is too slow to run this each time for a new sentence.
I need it to dependency-parse several sentences within seconds.
Please look into this issue.
I have installed Java and checked it with 'java -version'. However, I still run into this error:
---> 46 if not subprocess.call(['java', '-version'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT) == 0:
47 raise RuntimeError('Java not found.')