
stanford-corenlp-python's Introduction

Python interface to Stanford Core NLP tools v3.4.1

This is a Python wrapper for Stanford University's NLP group's Java-based CoreNLP tools. It can either be imported as a module or run as a JSON-RPC server. Because it uses many large trained models (requiring 3GB RAM on 64-bit machines and usually a few minutes loading time), most applications will probably want to run it as a server.

  • Python interface to Stanford CoreNLP tools: tagging, phrase-structure parsing, dependency parsing, named-entity recognition, and coreference resolution.
  • Runs a JSON-RPC server that wraps the Java process and outputs JSON.
  • Outputs parse trees which can be used by nltk.

It depends on pexpect, and it includes and uses code from jsonrpc and python-progressbar.

It runs the Stanford CoreNLP jar in a separate process, communicates with the Java process through its command-line interface, and makes assumptions about the parser's output in order to turn it into a Python dict and transfer it as JSON. The parser will break if the output changes significantly, but it has been tested against CoreNLP tools version 3.4.1, released 2014-08-27.
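At its core, the wrapper is a pexpect loop around CoreNLP's interactive shell: spawn the Java process, wait for the "NLP>" prompt, send a sentence, and capture everything printed before the next prompt. A minimal sketch of that pattern (jar names match the 2014-08-27 distribution; this is illustrative, not the repository's exact code):

import pexpect

# Class path for the unpacked stanford-corenlp-full-2014-08-27 distribution.
jars = ":".join([
    "stanford-corenlp-full-2014-08-27/stanford-corenlp-3.4.1.jar",
    "stanford-corenlp-full-2014-08-27/stanford-corenlp-3.4.1-models.jar",
    "stanford-corenlp-full-2014-08-27/xom.jar",
    "stanford-corenlp-full-2014-08-27/joda-time.jar",
])
child = pexpect.spawn("java -Xmx3g -cp %s "
                      "edu.stanford.nlp.pipeline.StanfordCoreNLP "
                      "-props default.properties" % jars)
child.expect("\nNLP> ", timeout=600)  # model loading takes minutes

child.sendline("Hello world.")        # one input per prompt
child.expect("\nNLP> ")
raw_output = child.before             # raw text the wrapper parses into a dict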

Download and Usage

To use this program you must download and unpack the compressed file containing Stanford's CoreNLP package. By default, corenlp.py looks for the Stanford CoreNLP folder as a subdirectory of the directory where the script is run. In other words:

sudo pip install pexpect unidecode
git clone git://github.com/dasmith/stanford-corenlp-python.git
cd stanford-corenlp-python
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2014-08-27.zip
unzip stanford-corenlp-full-2014-08-27.zip

Then launch the server:

python corenlp.py

Optionally, you can specify a host or port:

python corenlp.py -H 0.0.0.0 -p 3456

That will run a public JSON-RPC server on port 3456.

Assuming you are running on port 8080, the code in client.py shows an example parse:

import jsonrpc
from simplejson import loads
server = jsonrpc.ServerProxy(jsonrpc.JsonRpc20(),
                             jsonrpc.TransportTcpIp(addr=("127.0.0.1", 8080)))

result = loads(server.parse("Hello world!  It is so beautiful."))
print "Result", result

That returns a dictionary containing the keys sentences and coref. The key sentences contains a list of dictionaries, one per sentence; each contains parsetree, text, tuples (the dependencies), and words (tokens annotated with parts of speech, lemmas, character offsets, recognized named entities, etc.):

{u'sentences': [{u'parsetree': u'(ROOT (S (VP (NP (INTJ (UH Hello)) (NP (NN world)))) (. !)))',
                 u'text': u'Hello world!',
                 u'tuples': [[u'dep', u'world', u'Hello'],
                             [u'root', u'ROOT', u'world']],
                 u'words': [[u'Hello',
                             {u'CharacterOffsetBegin': u'0',
                              u'CharacterOffsetEnd': u'5',
                              u'Lemma': u'hello',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'UH'}],
                            [u'world',
                             {u'CharacterOffsetBegin': u'6',
                              u'CharacterOffsetEnd': u'11',
                              u'Lemma': u'world',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'NN'}],
                            [u'!',
                             {u'CharacterOffsetBegin': u'11',
                              u'CharacterOffsetEnd': u'12',
                              u'Lemma': u'!',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'.'}]]},
                {u'parsetree': u'(ROOT (S (NP (PRP It)) (VP (VBZ is) (ADJP (RB so) (JJ beautiful))) (. .)))',
                 u'text': u'It is so beautiful.',
                 u'tuples': [[u'nsubj', u'beautiful', u'It'],
                             [u'cop', u'beautiful', u'is'],
                             [u'advmod', u'beautiful', u'so'],
                             [u'root', u'ROOT', u'beautiful']],
                 u'words': [[u'It',
                             {u'CharacterOffsetBegin': u'14',
                              u'CharacterOffsetEnd': u'16',
                              u'Lemma': u'it',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'PRP'}],
                            [u'is',
                             {u'CharacterOffsetBegin': u'17',
                              u'CharacterOffsetEnd': u'19',
                              u'Lemma': u'be',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'VBZ'}],
                            [u'so',
                             {u'CharacterOffsetBegin': u'20',
                              u'CharacterOffsetEnd': u'22',
                              u'Lemma': u'so',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'RB'}],
                            [u'beautiful',
                             {u'CharacterOffsetBegin': u'23',
                              u'CharacterOffsetEnd': u'32',
                              u'Lemma': u'beautiful',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'JJ'}],
                            [u'.',
                             {u'CharacterOffsetBegin': u'32',
                              u'CharacterOffsetEnd': u'33',
                              u'Lemma': u'.',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'.'}]]}],
u'coref': [[[[u'It', 1, 0, 0, 1], [u'Hello world', 0, 1, 0, 2]]]]}
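
Walking this structure is plain dictionary and list access. For example, to print each token with its part of speech and named-entity tag (a sketch; result is the dictionary from the client example above):

for sentence in result['sentences']:
    print sentence['text']
    for token, attrs in sentence['words']:
        # each entry in 'words' is a [token, attribute-dict] pair
        print "  %s/%s (NER: %s)" % (
            token, attrs['PartOfSpeech'], attrs['NamedEntityTag'])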

To use it in a regular script (useful for debugging), load the module instead:

from corenlp import *
corenlp = StanfordCoreNLP()  # wait a few minutes...
corenlp.parse("Parse this sentence.")

The constructor, StanfordCoreNLP(), takes an optional argument corenlp_path, which specifies the path to the jar files. The default value is StanfordCoreNLP(corenlp_path="./stanford-corenlp-full-2014-08-27/").
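
As noted above, the parsetree strings can be used with nltk. A hedged sketch (in module mode parse() returns a JSON string, so it is decoded first; Tree.fromstring is the nltk 3.x spelling, while older nltk releases called it Tree.parse, which is the source of one of the issues below):

import json
from nltk.tree import Tree
from corenlp import StanfordCoreNLP

corenlp = StanfordCoreNLP()
result = json.loads(corenlp.parse("Parse this sentence."))
tree = Tree.fromstring(result['sentences'][0]['parsetree'])
print tree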

Coreference Resolution

The library supports coreference resolution, which means pronouns can be "dereferenced." If an entry in the coref list is [u'Hello world', 0, 1, 0, 2], the numbers mean (a small helper sketch follows this list):

  • 0 = the mention appears in the 0th sentence (i.e. "Hello world")
  • 1 = the token at index 1, "world", is the mention's headword
  • 0 = "Hello world" begins at token index 0 in the sentence
  • 2 = "Hello world" ends before token index 2 in the sentence
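
A small helper makes the encoding concrete (a sketch; mention is one element of a coref pair, such as [u'Hello world', 0, 1, 0, 2], and sentences is the list returned under the sentences key):

def mention_tokens(mention, sentences):
    text, sent_idx, head_idx, start, end = mention
    words = sentences[sent_idx]['words']      # list of [token, attrs] pairs
    span = [w[0] for w in words[start:end]]   # tokens start .. end-1
    head = words[head_idx][0]                 # the mention's headword
    return span, head

For the example above, mention_tokens([u'Hello world', 0, 1, 0, 2], result['sentences']) returns ([u'Hello', u'world'], u'world').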

Questions

Stanford CoreNLP tools require a large amount of free memory. Java 5+ uses about 50% more RAM on 64-bit machines than on 32-bit machines. Users of 32-bit machines can lower the memory requirement by changing -Xmx3g to -Xmx2g or even less. If pexpect times out while loading the models, check that you have enough free memory and that you can run the server on its own without the kernel killing the Java process:

java -cp stanford-corenlp-3.4.1.jar:stanford-corenlp-3.4.1-models.jar:xom.jar:joda-time.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -props default.properties

You can reach me, Dustin Smith, by sending a message on GitHub or through email (contact information is available on my webpage).

License & Contributors

This is free and open source software and has benefited from the contributions and feedback of others. Like Stanford's CoreNLP tools, it is covered under the GNU General Public License v2+, which in short means that modifications to this program must maintain the same free and open source distribution policy.

I gratefully welcome bug fixes and new features. If you have forked this repository, please submit a pull request so others can benefit from your contributions. This project has already benefited from contributions from these members of the open source community:

Thank you!

Related Projects

Maintainers of the CoreNLP library at Stanford keep an updated list of wrappers and extensions. See Brendan O'Connor's stanford_corenlp_pywrapper for a different approach more suited to batch processing.

stanford-corenlp-python's People

Contributors

abhaga, dasmith, emilmont, jcccf


stanford-corenlp-python's Issues

No Valid JSON error

I am new to these tools (JSON in particular). I am getting a parse error with error code -32700. How can I fix this?

Python 3 support

I could not find this documented, but as far as I can tell this module works only with Python 2. Is there any chance of using it with Python 3, or has anyone already forked such a version?

pexpect.exceptions.EOF: End Of File (EOF). Exception style platform.

python corenlp/corenlp.py -H ip -p 3456
Traceback (most recent call last):
  File "corenlp/corenlp.py", line 592, in <module>
    main()
  File "corenlp/corenlp.py", line 580, in main
    nlp = StanfordCoreNLP(options.corenlp, properties=options.properties, serving=True)
  File "corenlp/corenlp.py", line 435, in __init__
    self._spawn_corenlp()
  File "corenlp/corenlp.py", line 424, in _spawn_corenlp
    self.corenlp.expect("\nNLP> ")
  File "/usr/local/lib/python2.7/dist-packages/pexpect/spawnbase.py", line 315, in expect
    timeout, searchwindowsize, async)
  File "/usr/local/lib/python2.7/dist-packages/pexpect/spawnbase.py", line 339, in expect_list
    return exp.expect_loop(timeout)
  File "/usr/local/lib/python2.7/dist-packages/pexpect/expect.py", line 102, in expect_loop
    return self.eof(e)
  File "/usr/local/lib/python2.7/dist-packages/pexpect/expect.py", line 49, in eof
    raise EOF(msg)
pexpect.exceptions.EOF: End Of File (EOF). Exception style platform.
<pexpect.pty_spawn.spawn object at 0x7ff999081510>
command: /usr/bin/java
args: ['/usr/bin/java', '-Xmx3g', '-cp', 'stanford-corenlp-full-2014-08-27/stanford-corenlp-3.4.1.jar:stanford-corenlp-full-2014-08-27/stanford-corenlp-3.4.1-models.jar:stanford-corenlp-full-2014-08-27/xom.jar:stanford-corenlp-full-2014-08-27/joda-time.jar:stanford-corenlp-full-2014-08-27/jollyday.jar:stanford-corenlp-full-2014-08-27/ejml-0.23.jar', 'edu.stanford.nlp.pipeline.StanfordCoreNLP', '-props', '/root/corenlp-python/corenlp/default.properties']
searcher: None
buffer (last 100 chars): ''
before (last 100 chars): ' ner\r\nLoading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... '
after: <class 'pexpect.exceptions.EOF'>
match: None
match_index: None
exitstatus: None
flag_eof: True
pid: 5804
child_fd: 6
closed: False
timeout: 60
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 8192
ignorecase: False
searchwindowsize: 80
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1

jsonrpc import error: ValueError, err :

Hello. I've been trying to use corenlp as a wrapper for Stanford CoreNLP to do coreference resolution, but I'm having issues with the corenlp.py file. There was one error in the downloaded file:

Exception, err:

needs to be written as

Exception as err:

But when I correct this, the jsonrpc import still doesn't work, because a method within the imported module throws this error:

Traceback (most recent call last):
File "corenlp.py", line 24, in
import jsonrpc, pexpect
File "D:\NLP\NaturalLanguageProcessing\stanford-corenlp-python\jsonrpc.py", line 376
except ValueError, err:
^
SyntaxError: invalid syntax

Any help would be much appreciated; thanks in advance. It would also be a great help if you could suggest any known APIs for coreference resolution, or another wrapper for Stanford CoreNLP that supports coreference resolution.

Error in client.py

When I run client.py, it says that Tree has no attribute 'parse'.
Also, I do not understand how to extract the dependencies using this wrapper.

Sentiment Analysis Confidence Scores

Hello,

For sentiment analysis I'm able to obtain the score that corresponds to the class with the highest estimated probability, but I'm unable to produce the estimations themselves (e.g. [very_negative = 0.60, negative = 0.25, neutral = 0.10, positive = 0.025, very_positive = 0.025]). I'd like to filter probabilities below a certain confidence threshold.

Thank you.

jsonrpc.py randomly fails

I am processing large paragraphs using this Python interface. If it matters, I have set the encoding to UTF-8 because of some characters in the data, and the paragraphs/sentences are fairly large. When I execute a script that makes a request to the running CoreNLP server, it fails randomly, throwing the error:

jsonrpc.RPCParseError: <RPCFault -32700: 'Parse error.' (u'No valid JSON. (Unterminated string starting at: line 1 column 50 (char 49))')>

I use the word "randomly" because when it fails, simply trying 3-4 more times makes it work perfectly. This is a problem when I make calls to the server iteratively, as it throws the error at a random point in the loop and fails.

Does it have anything to do with the fact that

a) the paragraph/sentence size is fairly large (usually 200-400 words), or
b) I am using UTF-8 encoding.

Or is it something completely else?

Note: I am using Python 2.7.12 (if that matters)

Support for sentiment analysis

Hi, I was planning to use the Python wrapper, but I am not sure whether it supports sentiment analysis like the original Stanford CoreNLP. If it does, please share some documentation.

Error when processing Chinese text

After I start the server (with trained Chinese models and a properties file), I test the server with a Chinese sentence by replacing the example English sentence in client.py, i.e.

#result = nlp.parse(u"Hello world!  It is so beautiful.")
result = nlp.parse(u"今天天气真不错啊!")

Traceback (most recent call last):
File "client.py", line 17, in
result = nlp.parse(u"今天天气真不错啊!")
File "client.py", line 13, in parse
return json.loads(self.server.parse(text))
File "/home/kqc/github/stanford-corenlp-python/jsonrpc.py", line 934, in call
return self.__req(self.__name, args, kwargs)
File "/home/kqc/github/stanford-corenlp-python/jsonrpc.py", line 907, in __req
resp = self.__data_serializer.loads_response( resp_str )
File "/home/kqc/github/stanford-corenlp-python/jsonrpc.py", line 626, in loads_response
raise RPCInternalError(error_data)
jsonrpc.RPCInternalError: <RPCFault -32603: 'Internal error.' (None)>

Could you show me how to fix this?

Very long texts

I am trying to parse a text which is 1297 characters long, but it returns an empty sentence. If I use a different timeout value in client.py, say 200.0, then after that time passes the code raises a jsonrpc.RPCTransportError: timed out exception.

Could you tell me what I am supposed to modify in the code to make client.py work with longer texts?

Thanks,
michele.

[Errno 10061] No connection could be made because the target machine actively refused it

Hi,
when I type server = jsonrpc.ServerProxy(jsonrpc.JsonRpc20(), jsonrpc.TransportTcpIp(addr=("127.0.0.1", 8080))) and then result = loads(server.parse("Hello world. It is so beautiful")), this error appears:
Traceback (most recent call last):
File "<pyshell#27>", line 1, in
result = loads(server.parse("Hello world. It is so beautiful"))
RPCTransportError: [Errno 10061] No connection could be made because the target machine actively refused it

I turned off my firewall, but that did not solve the error.
What should I do?

Dependency Problem

The IDs you stripped from the dependencies in remove_id() should stay there. If two identical words occur in the same sentence and you strip the word IDs from the results, there is no way for us to easily disambiguate them (hence why Stanford explicitly put them there).
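
For context, CoreNLP's raw typed-dependency output carries 1-based token indices, e.g. nsubj(beautiful-4, It-1), and it is these suffixes that remove_id() strips. A sketch of parsing such a line while keeping the indices (illustrative; the exact dependency strings may vary by CoreNLP version):

import re

DEP_RE = re.compile(r"(\S+)\((\S+)-(\d+), (\S+)-(\d+)\)")

def parse_dependency(line):
    # "nsubj(beautiful-4, It-1)" -> ("nsubj", ("beautiful", 4), ("It", 1))
    m = DEP_RE.match(line)
    rel, gov, gov_i, dep, dep_i = m.groups()
    return rel, (gov, int(gov_i)), (dep, int(dep_i))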

parse returns a string rather than a dictionary.

I'm trying to follow the instructions:

from corenlp import *
corenlp = StanfordCoreNLP()
corenlp.parse("This is a test.")

When I do this it returns something like this:
'{"coref": [[[["This", 0, 0, 0, 1], ["a test", 0, 3, 2, 4]]]], "sentences": [{"parsetree": "(ROOT (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN test))) (. .)))", "text": "This is a test.", "dependencies": [["root", "ROOT", "test"], ["nsubj", "test", "This"], ["cop", "test", "is"], ["det", "test", "a"]], "words": [["This", {"NamedEntityTag": "O", "CharacterOffsetEnd": "4", "Lemma": "this", "PartOfSpeech": "DT", "CharacterOffsetBegin": "0"}], ["is", {"NamedEntityTag": "O", "CharacterOffsetEnd": "7", "Lemma": "be", "PartOfSpeech": "VBZ", "CharacterOffsetBegin": "5"}], ["a", {"NamedEntityTag": "O", "CharacterOffsetEnd": "9", "Lemma": "a", "PartOfSpeech": "DT", "CharacterOffsetBegin": "8"}], ["test", {"NamedEntityTag": "O", "CharacterOffsetEnd": "14", "Lemma": "test", "PartOfSpeech": "NN", "CharacterOffsetBegin": "10"}], [".", {"NamedEntityTag": "O", "CharacterOffsetEnd": "15", "Lemma": ".", "PartOfSpeech": ".", "CharacterOffsetBegin": "14"}]]}]}'

It is a dictionary wrapped in quotes, making it a string. I'm not sure what I'm doing wrong.
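
The return value is indeed a JSON-encoded string; as in the client example near the top of this page, decoding it yields the dictionary (a minimal sketch):

import json
from corenlp import StanfordCoreNLP

corenlp = StanfordCoreNLP()
result = json.loads(corenlp.parse("This is a test."))
print result['sentences'][0]['parsetree']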

I have an error and, if it's something you're aware of, I wondered if you could help me with a fix.

python corenlp.py
Traceback (most recent call last):
File "corenlp.py", line 257, in
nlp = StanfordCoreNLP()
File "corenlp.py", line 163, in init
self.corenlp = pexpect.spawn(start_corenlp)
File "/usr/local/lib/python2.7/dist-packages/pexpect/pty_spawn.py", line 198, in init
self._spawn(command, args, preexec_fn, dimensions)
File "/usr/local/lib/python2.7/dist-packages/pexpect/pty_spawn.py", line 271, in _spawn
'executable: %s.' % self.command)
pexpect.exceptions.ExceptionPexpect: The command was not found or was not executable: java.

How do I change models for NER?

How do I set the model:

ner.model.3class = /u/nlp/data/ner/goodClassifiers/all.3class.distsim.crf.ser.gz
ner.model.7class = /u/nlp/data/ner/goodClassifiers/muc.distsim.crf.ser.gz
ner.model.MISCclass = /u/nlp/data/ner/goodClassifiers/conll.distsim.crf.ser.gz
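
Properties of this kind belong in the file passed to CoreNLP with -props (the wrapper uses default.properties). A sketch of a properties file pointing NER at a specific classifier (hedged: the ner.model.* names above come from older releases, and CoreNLP 3.x consolidates them into a single ner.model property taking a comma-separated list, so check the documentation for your version):

annotators = tokenize, ssplit, pos, lemma, ner, parse, dcoref
ner.model = edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz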

Could you please explain what the result of coreference resolution means?

I tried the tools and got a result like:
Barack Obama was born in Hawaii. He is the president. Obama was elected in 2008.
"coref": [[[["He", 1, 0, 0, 1], ["Barack Obama", 0, 1, 0, 2]], [["the president", 1, 3, 2, 4], ["Barack Obama", 0, 1, 0, 2]], [["Obama", 2, 0, 0, 1], ["Barack Obama", 0, 1, 0, 2]]]]
Could you please explain what it means? Especially, what do the indices in the lists mean?
Thank you very much!

hardcoded lib and jar versions

I noticed some hardcoded lib and jar versions in the Python source code itself. Are these libraries only compatible with certain versions of CoreNLP, or are we expected to search through the code and change every reference to specific filenames and jars whenever we update our local CoreNLP?

How can I use the -nthreads argument?

I read on the CoreNLP page that multithreading is supported for the parser via the -nthreads k argument. How can I use this with the Python wrapper?

AttributeError: 'StanfordCoreNLP' object has no attribute 'parse_imperative'

Hi Dustin,

I am not sure if you are aware of the problem, when I try to run the corenlp.py, I get the following error

Starting the Stanford Core NLP parser.
Loading Models: 5/5 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| plays hard to get, smiles from time to time
NLP tools loaded.
Traceback (most recent call last):
  File "corenlp.py", line 295, in <module>
    server.register_function(nlp.parse_imperative)
AttributeError: 'StanfordCoreNLP' object has no attribute 'parse_imperative'

Commenting out line 295 solved the problem. I have quickly scanned the code and could not locate a parse_imperative method. I am not very experienced with Python, so maybe I have missed something.

I wanted you to know

Thanks for the great work! Keep it up.

jsonrpc.RPCInternalError: <RPCFault -32603: 'Internal error.' (None)>

I am trying to parse Arabic text with Python, and I got this error:
Traceback (most recent call last):
File "client.py", line 16, in
result = nlp.parse(u"ﻊﻗﻮﺘﻤﻟا ﻦﻣ .ﺕﺎﺑﺎﻐﻟﺎﺑ ﻯﺫﻷا ﺕﺎﻄﻗﺎﺴﺘﻟا ﻲﻓﻭ ﺓﺭاﺮﺤﻟا ﻲﻓ ﺕاﺮﻴﻐﺘﻟا ﻖﺤﻠﺗ")
File "client.py", line 13, in parse
return json.loads(self.server.parse(text))
File "/home/arezki/stanford-corenlp-python/jsonrpc.py", line 934, in call
return self.__req(self.__name, args, kwargs)
File "/home/arezki/stanford-corenlp-python/jsonrpc.py", line 907, in __req
resp = self.__data_serializer.loads_response( resp_str )
File "/home/arezki/stanford-corenlp-python/jsonrpc.py", line 626, in loads_response
raise RPCInternalError(error_data)
jsonrpc.RPCInternalError: <RPCFault -32603: 'Internal error.' (None)>

corenlp.py fails for 3.9.0

I'm aware that the repo targets stanford-corenlp-3.4.1, but I have 3.9.0 and changed the path and models in corenlp.py accordingly.

It then gets stuck on Loading models 4/5 and eventually throws a timeout error. Please look into this.

Instantiate StanfordCoreNLP with different annotators

I'm using the StanfordCoreNLP class to do NER on some text. Elsewhere in my program I only need POS tagging, but performance is needlessly slowed down by NER. I see that I can edit the default.properties file to remove the annotators I don't need, but that would change every instance of StanfordCoreNLP, which won't work.

Right now I'm thinking of modifying StanfordCoreNLP's __init__ to allow a custom props string to be passed, and creating several files containing the annotator lists I need (as sketched below). This might work for now, but I'd like to know if you see a better way, and whether you'd be interested in allowing StanfordCoreNLP instances to be created with an optional annotator list.
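
A sketch of the per-instance approach described above (pos.properties is a hypothetical file containing only the trimmed annotator list, and the properties keyword argument assumes the proposed constructor change has been made):

# pos.properties (hypothetical):
#   annotators = tokenize, ssplit, pos

from corenlp import StanfordCoreNLP

ner_nlp = StanfordCoreNLP()                             # full default pipeline
pos_nlp = StanfordCoreNLP(properties="pos.properties")  # hypothetical extension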

Error while launching the server, i.e. running the command python corenlp.py

This is the error:
Traceback (most recent call last):
File "", line 1, in
File "corenlp.py", line 176, in init
self.corenlp.expect("done.", timeout=200) # Loading PCFG (~3sec)
File "/Users/mihir.saxena/virtualenvironment/my_new_project/lib/python2.7/site-packages/pexpect/spawnbase.py", line 327, in expect
timeout, searchwindowsize, async_)
File "/Users/mihir.saxena/virtualenvironment/my_new_project/lib/python2.7/site-packages/pexpect/spawnbase.py", line 355, in expect_list
return exp.expect_loop(timeout)
File "/Users/mihir.saxena/virtualenvironment/my_new_project/lib/python2.7/site-packages/pexpect/expect.py", line 102, in expect_loop
return self.eof(e)
File "/Users/mihir.saxena/virtualenvironment/my_new_project/lib/python2.7/site-packages/pexpect/expect.py", line 49, in eof
raise EOF(msg)
pexpect.exceptions.EOF: End Of File (EOF). Empty string style platform.
<pexpect.pty_spawn.spawn object at 0x10ca092d0>
command: /usr/bin/java
args: ['/usr/bin/java', '-Xmx1800m', '-cp', './stanford-corenlp-full-2014-08-27/stanford-corenlp-3.4.1.jar:./stanford-corenlp-full-2014-08-27/stanford-corenlp-3.4.1-models.jar:./stanford-corenlp-full-2014-08-27/joda-time.jar:./stanford-corenlp-full-2014-08-27/xom.jar:./stanford-corenlp-full-2014-08-27/jollyday.jar', 'edu.stanford.nlp.pipeline.StanfordCoreNLP', '-props', 'default.properties']
buffer (last 100 chars): ''
before (last 100 chars): 'aders.java:185)\r\n\tat java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:496)\r\n\t... 34 more\r\n'
after: <class 'pexpect.exceptions.EOF'>
match: None
match_index: None
exitstatus: None
flag_eof: True
pid: 46580
child_fd: 6
closed: False
timeout: 30
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_re:
0: re.compile("done.")

I have verified that all the jar files are of the version specified in the corenlp.py code; earlier I had used the latest version and updated corenlp.py appropriately. In either case I get the same error. I am not able to figure it out; kindly look into this and suggest a solution.

Installation error due to hard coding in corenlp.py

In the StanfordCoreNLP class in corenlp.py, the jar versions are hard-coded, so jars from a newer release are not accepted, which produces an error while launching the server.

The lookup needs to be changed.

Python3.5.3 issues

Python 3 doesn't handle:
except ValueError, err:
^
SyntaxError: invalid syntax

Needs to be "as" format. Further issues with print statements.

I can push a version for py3 if you'd like. Just let me know.

-EV

RPCTransportError: argument must be an int, or have a fileno() method.

Hi Guys,

I am getting this error when I try to parse multiple sentences in parallel. Everything works fine if I parse sequentially.

parseResult = nlp.parse(sentences)
File "/Users/Vikram/Kiwi/django/app/app/app/coreNlpUtil.py", line 18, in parse
return json.loads(self.server.parse(text))
File "/Users/Vikram/Kiwi/django/app/app/app//jsonrpc.py", line 933, in call
return self.__req(self.__name, args, kwargs)
File "/Users/Vikram/Kiwi/django/app/app/app//jsonrpc.py", line 906, in __req
resp = self.__data_serializer.loads_response( resp_str )
File "/Users/Vikram/Kiwi/django/app/app/app/jsonrpc.py", line 594, in loads_response
* raise RPCParseError("No valid JSON. (%s)" % str(err))
RPCParseError: <RPCFault -32700: 'Parse error.' ('No valid JSON. (No JSON object could be decoded)')>
*

parseResult = nlp.parse(sentences)
File "/Users/Vikram/Kiwi/django/app/app/app/coreNlpUtil.py", line 18, in parse
return json.loads(self.server.parse(text))
File "/Users/Vikram/Kiwi/django/app/app/app/jsonrpc.py", line 933, in call
return self.__req(self.__name, args, kwargs)
File "/Users/Vikram/Kiwi/django/app/app/app/jsonrpc.py", line 905, in __req
raise RPCTransportError(err)
RPCTransportError: argument must be an int, or have a fileno() method.

How can I fix this issue?

~400ms latency problems

I noticed that a parse through the JSON-RPC server takes 400ms longer than using the Java interactive shell.

What's the best way to cut this down? Is it a Python issue?

Happy to work on this for a pull request.

RPCInternalError: <RPCFault -32603: 'Internal error.' (None)>

/Users/danielsampetethiyagu/github/image_caption_using_attention/coreNlpUtil.pyc in parseText(sentences)
22 def parseText(sentences):
23
---> 24 parseResult = nlp.parse(sentences)
25
26 if len(parseResult['sentences']) == 1:

/Users/danielsampetethiyagu/github/image_caption_using_attention/coreNlpUtil.pyc in parse(self, text)
16
17 def parse(self, text):
---> 18 return json.loads(self.server.parse(text))
19
20

/Users/danielsampetethiyagu/github/image_caption_using_attention/jsonrpc.py in call(self, *args, **kwargs)
932 return _method(self.__req, "%s.%s" % (self.__name, name))
933 def call(self, *args, **kwargs):
--> 934 return self.__req(self.__name, args, kwargs)
935
936 #=========================================

/Users/danielsampetethiyagu/github/image_caption_using_attention/jsonrpc.py in __req(self, methodname, args, kwargs, id)
905 except Exception,err:
906 raise RPCTransportError(err)
--> 907 resp = self.__data_serializer.loads_response( resp_str )
908 return resp[0]
909

/Users/danielsampetethiyagu/github/image_caption_using_attention/jsonrpc.py in loads_response(self, string)
624 raise RPCInvalidMethodParams(error_data)
625 elif data["error"]["code"] == INTERNAL_ERROR:
--> 626 raise RPCInternalError(error_data)
627 elif data["error"]["code"] == PROCEDURE_EXCEPTION:
628 raise RPCProcedureException(error_data)

RPCInternalError: <RPCFault -32603: 'Internal error.' (None)>

Windows run of "python corenlp.py" Error

Use Windows 7 machine,
Python 2.7.11
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

Traceback (most recent call last):
File "corenlp.py", line 257, in
nlp = StanfordCoreNLP()
File "corenlp.py", line 163, in init
self.corenlp = pexpect.spawn(start_corenlp)
AttributeError: 'module' object has no attribute 'spawn'

Import stanford-corenlp-python as a module

When I try importing the corenlp class from a Python script (exampleRun.py) that is not in the stanford-corenlp-python directory, like this:

from corenlp import *
corenlp = StanfordCoreNLP("path_to_stanford-corenlp-full-2014-08-27/")

the following error is raised from pexpect:

Loading Models: 0/5
Traceback (most recent call last):
File "/home/matteorr/Project1/exampleRun.py", line 4, in
corenlp = StanfordCoreNLP("/home/matteorr/stanford-corenlp-pyhton/stanford-corenlp-full-2014-08-27/")
File "/home/matteorr/stanford-corenlp-pyhton/corenlp.py", line 168, in init
self.corenlp.expect("done.", timeout=20) # Load pos tagger model (~5sec)
File "/usr/lib/python2.7/dist-packages/pexpect.py", line 1311, in expect
return self.expect_list(compiled_pattern_list, timeout, searchwindowsize)
File "/usr/lib/python2.7/dist-packages/pexpect.py", line 1325, in expect_list
return self.expect_loop(searcher_re(pattern_list), timeout, searchwindowsize)
File "/usr/lib/python2.7/dist-packages/pexpect.py", line 1396, in expect_loop
raise EOF (str(e) + '\n' + str(self))
pexpect.EOF: End Of File (EOF) in read_nonblocking(). Exception style platform.
<pexpect.spawn object at 0x7f7106fb3650>
version: 2.3 ($Revision: 399 $)
command: /usr/bin/java
args: ['/usr/bin/java', '-Xmx1800m', '-cp', '/home/matteorr/stanford-corenlp-pyhton/stanford-corenlp-full-2014-08-27/stanford-corenlp-3.4.1.jar:/home/matteorr/stanford-corenlp-pyhton/stanford-corenlp-full-2014-08-27/stanford-corenlp-3.4.1-models.jar:/home/matteorr/stanford-corenlp-pyhton/stanford-corenlp-full-2014-08-27/joda-time.jar:/home/matteorr/stanford-corenlp-pyhton/stanford-corenlp-full-2014-08-27/xom.jar:/home/matteorr/stanford-corenlp-pyhton/stanford-corenlp-full-2014-08-27/jollyday.jar', 'edu.stanford.nlp.pipeline.StanfordCoreNLP', '-props', 'default.properties']
searcher: searcher_re:
0: re.compile("done.")
buffer (last 100 chars):
before (last 100 chars): va:448)
at edu.stanford.nlp.util.StringUtils.argsToProperties(StringUtils.java:869)
... 2 more

after: <class 'pexpect.EOF'>
match: None
match_index: None
exitstatus: None
flag_eof: True
pid: 28392
child_fd: 3
closed: False
timeout: 30
delimiter: <class 'pexpect.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1

The same script, run in the same directory as corenlp, works fine.
Is this expected behavior, or is something wrong?

Thanks in advance for your help.
I apologize if this is not the correct place to post this issue.

Best regards,

matteorr

About connection refused

When I run
result = loads(server.parse("Hello world. It is so beautiful"))
I get a connection error.

Traceback (most recent call last):
File "", line 1, in
File "jsonrpc.py", line 934, in call
return self.__req(self.__name, args, kwargs)
File "jsonrpc.py", line 906, in __req
raise RPCTransportError(err)
jsonrpc.RPCTransportError: [Errno 111] Connection refused


Parsing Q

I'm not sure why, but when I pass 'Q' to the coreNLP server, it breaks down.

Here is the code I'm using:

>>> import jsonrpc
>>> server = jsonrpc.ServerProxy(jsonrpc.JsonRpc20(),jsonrpc.TransportTcpIp(addr=("127.0.0.1", 8080)))
>>> server.parse('Q')
u'{"sentences": []}'

Here is the server error:

NLP> 
========================================
Q
Annotation pipeline timing information:
PTBTokenizerAnnotator: 0.0 sec.
WordsToSentencesAnnotator: 0.0 sec.
POSTaggerAnnotator: 0.0 sec.
MorphaAnnotator: 0.0 sec.
NERCombinerAnnotator: 0.1 sec.
ParserAnnotator: 0.5 sec.
DeterministicCorefAnnotator: 0.0 sec.
TOTAL: 0.7 sec. for 11 tokens at 16.5 tokens/sec.
Pipeline setup: 13.4 sec.
Total time for StanfordCoreNLP pipeline: 75.1 sec.

I'm not sure if this is a feature or a bug.

weird UnicodeDecodeError in StanfordCoreNLP.parse()

Hi Dustin,

I just found a really weird error. While corenlp can parse '100 dollars' just fine, '100 yen' causes it to crash.

Python 2.7.3 (default, Feb 27 2014, 19:37:34) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import corenlp
>>> c = corenlp.StanfordCoreNLP()
Loading Models: 5/5                                                                                                                                                                                                                         
>>> c.parse('100 dollars')
'{"sentences": [{"parsetree": "(ROOT (X (NP (CD 100) (NNS dollars))))", "text": "100 dollars", "dependencies": [["root", "ROOT", "dollars"], ["num", "dollars", "100"]], "words": [["100", {"NormalizedNamedEntityTag": "$100.0", "Lemma": "100", "CharacterOffsetEnd": "3", "PartOfSpeech": "CD", "CharacterOffsetBegin": "0", "NamedEntityTag": "MONEY"}], ["dollars", {"NormalizedNamedEntityTag": "$100.0", "Lemma": "dollar", "CharacterOffsetEnd": "11", "PartOfSpeech": "NNS", "CharacterOffsetBegin": "4", "NamedEntityTag": "MONEY"}]]}]}'

>>> c.parse('100 yen')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/corenlp-3.4.1-py2.7.egg/corenlp.py", line 240, in parse
    response = self._parse(text)
  File "/usr/local/lib/python2.7/dist-packages/corenlp-3.4.1-py2.7.egg/corenlp.py", line 230, in _parse
    raise e
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 169: ordinal not in range(128)

Any ideas?

Attribute error in client.py

Hi, I have nltk version 3.0.3 and I am getting this error.

tree = Tree.parse(result['sentences'][0]['parsetree'])
AttributeError: type object 'Tree' has no attribute 'parse'

Multiple occurrences of a word not handled properly while creating tuples

If there are multiple occurrences of a word in a sentence, the lack of IDs makes it impossible to identify the source and target of a dependency correctly.

If you are open to accepting a patch for this, I can submit one. My idea is to keep the IDs in the "tuples" and store the dependents of a word in the "words" array.

Certain characters lead to Internal Error

I am trying to parse the sentence

WASHINGTON — Republicans on Thursday vowed a swift and forceful response to the executive action on immigration that President Obama is to announce in a prime-time address, accusing the president of exceeding the power of his office and promising a legislative fight when they take full control of Congress next year.

but I keep getting the error

Traceback (most recent call last):
  File "client.py", line 19, in <module>
    result = nlp.parse(text2)
  File "client.py", line 12, in parse
    return json.loads(self.server.parse(text))
  File "/Users/Pi_Joules/projects/kompact/stanford-corenlp-python/jsonrpc.py", line 934, in __call__
    return self.__req(self.__name, args, kwargs)
  File "/Users/Pi_Joules/projects/kompact/stanford-corenlp-python/jsonrpc.py", line 907, in __req
    resp = self.__data_serializer.loads_response( resp_str )
  File "/Users/Pi_Joules/projects/kompact/stanford-corenlp-python/jsonrpc.py", line 626, in     loads_response
    raise RPCInternalError(error_data)
jsonrpc.RPCInternalError: <RPCFault -32603: 'Internal error.' (None)>

The error doesn't appear, though, when I remove the em dash (—) in the first sentence. The same goes for curly single and double quotes like “”. Is there any way I can still parse these characters with this wrapper?

Thanks

Getting sentiment value via server implementation

Hi, I am interested in using the server implementation of your wrapper, but it doesn't seem to output the sentiment score, while the package implementation has a field for it. What is the cause of this difference?

How can corenlp handle non-ASCII strings?

I passed the word 'Víctor' to corenlp.parse. 'Víctor' contains a non-ASCII character, and I would like to get its lemma. But when I call corenlp.parse('Víctor'), it gives this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

How can I change the corenlp settings so that corenlp can handle non-ASCII strings?
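
One workaround within the wrapper's existing dependencies: unidecode (installed alongside pexpect in the setup instructions above) transliterates non-ASCII input to ASCII before parsing. This loses the original characters, so it is a sketch of a workaround, not a fix for the underlying decoding bug:

# -*- coding: utf-8 -*-
from unidecode import unidecode

text = unidecode(u'Víctor')    # -> 'Victor'
result = corenlp.parse(text)   # assumes corenlp = StanfordCoreNLP() as above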

corenlp.py does not load any models

Traceback (most recent call last):
File "D:\fahma\corefernce resolution\stanford-corenlp-python-master\corenlp.py", line 281, in
nlp = StanfordCoreNLP()
File "D:\fahma\corefernce resolution\stanford-corenlp-python-master\corenlp.py", line 173, in init
self.corenlp.expect("done.", timeout=20) # Load pos tagger model (~5sec)
File "C:\Python27\lib\site-packages\pexpect\spawnbase.py", line 341, in expect
timeout, searchwindowsize, async_)
File "C:\Python27\lib\site-packages\pexpect\spawnbase.py", line 369, in expect_list
return exp.expect_loop(timeout)
File "C:\Python27\lib\site-packages\pexpect\expect.py", line 117, in expect_loop
return self.eof(e)
File "C:\Python27\lib\site-packages\pexpect\expect.py", line 63, in eof
raise EOF(msg)
EOF: End Of File (EOF).
<pexpect.popen_spawn.PopenSpawn object at 0x021863B0>
searcher: searcher_re:
0: re.compile('done.')

RPCTransportError: timed out

Hi,
I was trying to use the client.py code to parse a long paragraph. It generates the following error message:
File "/home/mings/Toolkits/stanford-corenlp-python/jsonrpc.py", line 934, in __call__
return self.__req(self.__name, args, kwargs)
File "/home/mings/Toolkits/stanford-corenlp-python/jsonrpc.py", line 906, in __req
raise RPCTransportError(err)
jsonrpc.RPCTransportError: timed out

I find this not very consistent: sometimes it is able to parse, and sometimes it is not.

[EDIT]
I changed the default timeouts in jsonrpc.py to 20 secs, and it seems to work fine now.

corenlp.py does not go further after loading all 5 models

Traceback (most recent call last):
  File "corenlp.py", line 257, in <module>
    nlp = StanfordCoreNLP()
  File "corenlp.py", line 178, in __init__
    self.corenlp.expect("Entering interactive shell.")
  File "/home/whiskey/.local/lib/python2.7/site-packages/pexpect/spawnbase.py", line 341, in expect
    timeout, searchwindowsize, async_)
  File "/home/whiskey/.local/lib/python2.7/site-packages/pexpect/spawnbase.py", line 369, in expect_list
    return exp.expect_loop(timeout)
  File "/home/whiskey/.local/lib/python2.7/site-packages/pexpect/expect.py", line 116, in expect_loop
    return self.timeout(e)
  File "/home/whiskey/.local/lib/python2.7/site-packages/pexpect/expect.py", line 80, in timeout
    raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0x7f1cbb072050>
command: /usr/bin/java
args: ['/usr/bin/java', '-Xmx1800m', '-cp', './stanford-corenlp-full-2018-02-27/stanford-corenlp-3.9.1.jar:./stanford-corenlp-full-2018-02-27/stanford-corenlp-3.9.1-models.jar:./stanford-corenlp-full-2018-02-27/joda-time.jar:./stanford-corenlp-full-2018-02-27/xom.jar:./stanford-corenlp-full-2018-02-27/jollyday.jar', 'edu.stanford.nlp.pipeline.StanfordCoreNLP', '-props', 'default.properties']
buffer (last 100 chars): '[0.7 sec].\r\nAdding annotator dcoref\r\n'
before (last 100 chars): '[0.7 sec].\r\nAdding annotator dcoref\r\n'
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 7185
child_fd: 5
closed: False
timeout: 30
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_re:
    0: re.compile("Entering interactive shell.")
