Comments (11)
Oh, okay, \x00 is really the plain 0 byte; I overlooked that. It causes C string handling to terminate the string at that point, so this makes sense now. I'd then suggest using the UTF-8 encoding, which should be close to the original NewStringUTF-based implementation.
The lines
jclass strClass = (*env)->FindClass(env, "Ljava/lang/String;");
jmethodID ctorID = (*env)->GetMethodID(env, strClass, "<init>", "([BLjava/lang/String;)V");
can be removed because in JyNI these IDs are already cached at startup. I suggest this implementation:
jstring charsetName;
jobject strByteArray, jstr;
jbyte* strBytes;
Py_ssize_t len = PyString_GET_SIZE(src);
env(NULL);
charsetName = (*env)->NewStringUTF(env, "UTF-8");
strByteArray = (*env)->NewByteArray(env, len);
strBytes = (*env)->GetByteArrayElements(env, strByteArray, NULL);
memcpy(strBytes, PyString_AS_STRING(src), len);
(*env)->ReleaseByteArrayElements(env, strByteArray, strBytes, 0); //copy back and free buffer
jstr = (*env)->NewObject(env, stringClass,
string_fromBytesAndCharsetNameConstructor, strByteArray, charsetName);
//todo: check interned-regulations on jython-side
if (JyNI_HasJyAttribute(AS_JY_NO_GC(src), JyAttributeStringInterned))
jstr = (*env)->CallObjectMethod(env, jstr, string_intern);
return (*env)->CallStaticObjectMethod(env, pyPyClass, pyPy_newString, jstr);
We should move charsetName = (*env)->NewStringUTF(env, "UTF-8"); to a global constant, since this might be a heavily used function. The same goes for charsetName = (*env)->NewStringUTF(env, "UTF-16BE"); in the PyUnicode equivalent. But that's another todo.
from jyni.
Thanks for reporting this issue. However, a copy/paste-able form would have been convenient for reproduction, so I'll share it here:
import sys
sys.path.append('/usr/lib/python2.7/lib-dynload')
import mmap
print mmap
f = open('jython.jar', "rb")
mp = mmap.mmap(int(f.fileno()), 0, access=mmap.ACCESS_READ)
print repr(mp[:8])
print len(mp)
Added a length output, which at least yields the correct value.
Does the truncation always occur at a line break? That might be a starting point.
I cannot tell right now when I will find time to investigate this further.
Feel free to track down this issue in native code, then I would help as far as I can.
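For comparison, here is a small sketch of the reference behavior the reproduction script is measured against. It is written in Python 3 syntax for CPython (not the Python 2 / Jython syntax used above), and the payload is a made-up, header-like byte sequence rather than the real jython.jar; the helper name read_prefix_via_mmap is hypothetical. Slicing the mapping should return every byte, including \x00, and the length should match the file size.

```python
import mmap
import os
import tempfile


def read_prefix_via_mmap(payload, count):
    """Write payload to a temp file, map it read-only, and return
    (the first `count` bytes of the mapping, the mapped length)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        with open(path, "rb") as f:
            mp = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            try:
                return mp[:count], len(mp)
            finally:
                mp.close()
    finally:
        os.remove(path)


# Made-up sample: a ZIP-style signature followed by bytes that include \x00.
sample = b"PK\x03\x04\x14\x00\x08\x00rest of file"
prefix, length = read_prefix_via_mmap(sample, 8)
print(repr(prefix))
print(length)
```

If the same slice comes back shorter under JyNI while the length is still correct, the data is being lost during string conversion rather than in the mapping itself.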
from jyni.
I just tried it with a text file, and there it works fine. It seems to be unrelated to newline characters.
On the other hand, I tried it on a different binary file, where I could even provoke a crash:
Traceback (most recent call last):
File "/data/workspace/linux/JyNI/JyNI-Demo/src/issue_28.py", line 17, in <module>
print repr(mp[:60])
java.lang.NullPointerException
at org.python.core.__builtin__.repr(__builtin__.java:1089)
at org.python.core.BuiltinFunctions.__call__(__builtin__.java:105)
at org.python.core.PyObject.__call__(PyObject.java:460)
at org.python.pycode._pyx0.f$0(/data/workspace/linux/JyNI/JyNI-Demo/src/issue_28.py:20)
at org.python.pycode._pyx0.call_function(/data/workspace/linux/JyNI/JyNI-Demo/src/issue_28.py)
at org.python.core.PyTableCode.call(PyTableCode.java:171)
at org.python.core.PyCode.call(PyCode.java:18)
at org.python.core.Py.runCode(Py.java:1589)
at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296)
at org.python.util.jython.run(jython.java:366)
at org.python.util.jython.main(jython.java:142)
java.lang.NullPointerException: java.lang.NullPointerException
So this seems to depend somehow on the file content. print repr(mp[:60]) and any higher value cause the crash for that file, while e.g. print repr(mp[:50]) succeeds. There must be specific values in the data that blow up string conversion. Maybe this is a general issue with converting arbitrary data to strings, or an encoding issue. Maybe it actually boils down to a Jython or even a Java thing. Does non-string access to the data work okay to some extent?
from jyni.
Thanks for your help. I think the problem is due to \x00. I have changed the function JySync_Init_JyString_From_PyString in JyNI-C/JySync.c, and it seems to work fine.
This is my change:
jobject JySync_Init_JyString_From_PyString(PyObject* src, jclass subtype)
{
    jstring jstr;
    env(NULL);
    // Use the stored length so that embedded \x00 bytes do not end the string early.
    int size = ((PyStringObject *)(src))->ob_size;
    // FindClass expects the class name, not a type descriptor.
    jclass strClass = (*env)->FindClass(env, "java/lang/String");
    jmethodID ctorID = (*env)->GetMethodID(env, strClass, "<init>", "([BLjava/lang/String;)V");
    // Copy all bytes into a Java byte[] and decode them with the named charset.
    jbyteArray bytes = (*env)->NewByteArray(env, size);
    (*env)->SetByteArrayRegion(env, bytes, 0, size, (jbyte*) ((PyStringObject *)(src))->ob_sval);
    jstring encoding = (*env)->NewStringUTF(env, "GB2312");
    jstr = (*env)->NewObject(env, strClass, ctorID, bytes, encoding);
    if (JyNI_HasJyAttribute(AS_JY_NO_GC(src), JyAttributeStringInterned))
        jstr = (*env)->CallObjectMethod(env, jstr, string_intern);
    return (*env)->CallStaticObjectMethod(env, pyPyClass, pyPy_newString, jstr);
}
from jyni.
But the crash still appears.
>>> mp[13:14]
'J'
>>> mp[12:14]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
java.lang.NullPointerException
at org.python.core.PySystemState.displayhook(PySystemState.java:1489)
at org.python.core.PySystemStateFunctions.__call__(PySystemState.java:1905)
at org.python.core.PyObject.invoke(PyObject.java:3727)
at org.python.core.Py.printResult(Py.java:2268)
at org.python.pycode._pyx40.f$0(<stdin>:1)
at org.python.pycode._pyx40.call_function(<stdin>)
at org.python.core.PyTableCode.call(PyTableCode.java:171)
at org.python.core.PyCode.call(PyCode.java:18)
at org.python.core.Py.runCode(Py.java:1614)
at org.python.core.Py.exec(Py.java:1658)
at org.python.util.PythonInterpreter.exec(PythonInterpreter.java:276)
at org.python.util.InteractiveInterpreter.runcode(InteractiveInterpreter.java:131)
at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:116)
at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:62)
at org.python.util.InteractiveConsole.push(InteractiveConsole.java:187)
at org.python.util.InteractiveConsole._interact(InteractiveConsole.java:168)
at org.python.util.InteractiveConsole.interact(InteractiveConsole.java:126)
at org.python.util.jython.run(jython.java:419)
at org.python.util.jython.main(jython.java:142)
java.lang.NullPointerException: java.lang.NullPointerException
>>> mp[13:14]
'J'
>>> mp[10:11]
'n'
>>> mp[10:12]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
java.lang.NullPointerException
at org.python.core.PySystemState.displayhook(PySystemState.java:1489)
at org.python.core.PySystemStateFunctions.__call__(PySystemState.java:1905)
at org.python.core.PyObject.invoke(PyObject.java:3727)
at org.python.core.Py.printResult(Py.java:2268)
at org.python.pycode._pyx43.f$0(<stdin>:1)
at org.python.pycode._pyx43.call_function(<stdin>)
at org.python.core.PyTableCode.call(PyTableCode.java:171)
at org.python.core.PyCode.call(PyCode.java:18)
at org.python.core.Py.runCode(Py.java:1614)
at org.python.core.Py.exec(Py.java:1658)
at org.python.util.PythonInterpreter.exec(PythonInterpreter.java:276)
at org.python.util.InteractiveInterpreter.runcode(InteractiveInterpreter.java:131)
at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:116)
at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:62)
at org.python.util.InteractiveConsole.push(InteractiveConsole.java:187)
at org.python.util.InteractiveConsole._interact(InteractiveConsole.java:168)
at org.python.util.InteractiveConsole.interact(InteractiveConsole.java:126)
at org.python.util.jython.run(jython.java:419)
at org.python.util.jython.main(jython.java:142)
java.lang.NullPointerException: java.lang.NullPointerException
from jyni.
Okay, these seem to be two separate things. The idea to fix it in JySync_Init_JyString_From_PyString is promising. However, I don't see why changing to another encoding would solve it. As far as I understand it, we get bytes from a binary file that cannot be decoded into a string properly using UTF-8. How can you know that this wouldn't fail for GB2312 as well with some other input? Actually, I would think that an exception on attempting to turn arbitrary bytes into a string is justified. However, it does not occur with CPython, so they have some way to resolve it. We must look at how it is done there and port that solution to JyNI. Do they use GB2312 in CPython? (I doubt it, but haven't looked it up so far.)
For the NullPointerException we'll have to understand why PySystemState.displayhook gets a null argument...
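The encoding question can be made concrete with a short sketch. It uses Python 3 syntax, and the helper try_decode and the sample bytes are made up for illustration. With Python's strict codecs, the same two bytes are rejected by both UTF-8 and GB2312, whereas ISO-8859-1 (latin-1) accepts any byte sequence because it maps byte n to code point n. In CPython 2 the question does not arise, because str is a plain byte string and mmap slices are returned without any decoding step.

```python
def try_decode(data, encoding):
    """Return the decoded text, or None if `data` is not valid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Made-up sample of "arbitrary binary data"; not taken from the files in this thread.
raw = b"\xff\xff"
for name in ("utf-8", "gb2312", "latin-1"):
    print(name, repr(try_decode(raw, name)))
```

This only illustrates that a multi-byte charset can reject input; it does not show what Java's String constructors do with malformed bytes.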
from jyni.
I changed how the length of the C string is obtained when it is converted to a jstring. With the previous approach, the string was truncated at \x00 during the conversion. Switching the encoding to GB2312 is not what fixes the truncation. I copied the code from the internet, where it used GB2312, and since it worked I left it unchanged. I get the same results with UTF-8.
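The truncation described here can be reproduced without JNI. The sketch below (Python 3 syntax; the helper name length_seen_by_c and the sample bytes are hypothetical) uses ctypes to read a buffer the way a NUL-terminated C API does and compares that with the explicitly stored length.

```python
import ctypes


def length_seen_by_c(data):
    """Length of `data` as seen by code that stops at the first NUL byte."""
    # c_char_p.value returns the bytes up to, but not including, the first \x00.
    return len(ctypes.c_char_p(data).value)


# Made-up sample with embedded \x00 bytes.
raw = b"PK\x03\x04\x14\x00\x08\x00"
print(length_seen_by_c(raw))  # stops at the first \x00
print(len(raw))               # the stored length covers the whole buffer
```

Passing the stored length along with the pointer, as PyString_GET_SIZE or ob_size does in the C code, avoids depending on the terminator at all.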
from jyni.
@Stewori Did you analyze the cause of the NullPointerException? I couldn't find the reason.
from jyni.
Sorry, this is currently in conflict with my work on #22 in my local branch, causing additional errors. Sure, I could make a second clone for this, but anyway -- I prefer to work on one issue at a time. Given that #22 is tricky, it might take a while until I can follow up here. So far it's limited to rough comments, review and style hints, etc. from my side.
from jyni.
I found that PyFile in Jython uses a function to convert byte[] to String.
I then changed the function JySync_Init_JyString_From_PyString to do the same:
jobject JySync_Init_JyString_From_PyString(PyObject* src, jclass subtype)
{
    jobject strByteArray, jstr;
    jbyte* strBytes;
    Py_ssize_t len = PyString_GET_SIZE(src);
    env(NULL);
    // Copy all len bytes, including any embedded \x00, into a Java byte[].
    strByteArray = (*env)->NewByteArray(env, len);
    strBytes = (*env)->GetByteArrayElements(env, strByteArray, NULL);
    memcpy(strBytes, PyString_AS_STRING(src), len);
    (*env)->ReleaseByteArrayElements(env, strByteArray, strBytes, 0); //copy back and free buffer
    jclass strClass = (*env)->FindClass(env, "java/lang/String");
    // String(byte[] ascii, int hibyte, int offset, int count)
    jmethodID ctorID = (*env)->GetMethodID(env, strClass, "<init>", "([BIII)V");
    // Variadic call: pass the count as jint to match the I in the signature.
    jstr = (*env)->NewObject(env, strClass, ctorID, strByteArray, 0, 0, (jint) len);
    //todo: check interned-regulations on jython-side
    if (JyNI_HasJyAttribute(AS_JY_NO_GC(src), JyAttributeStringInterned))
        jstr = (*env)->CallObjectMethod(env, jstr, string_intern);
    return (*env)->CallStaticObjectMethod(env, pyPyClass, pyPy_newString, jstr);
}
from jyni.
Did you check that the conversion matches CPython behavior?
The constructor you suggest now would fill the upper 8 bits with zeros. This should be the same as creating an array of jchars and assigning the bytes to it in a loop (goodbye memcpy). Based on this, we should get the same result using the constructor String(char[] value, int offset, int count), while avoiding a deprecated constructor.
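The zero-fill conversion discussed here can be sketched and checked in a few lines. This is Python 3 syntax, and widen_bytes is a hypothetical helper that mirrors the byte-to-char loop, not JyNI code. Widening each byte to one character gives the same text as decoding with ISO-8859-1 (latin-1), and it round-trips all 256 byte values, so \x00 and bytes above 0x7f survive unchanged.

```python
def widen_bytes(data):
    """Zero-extend every byte to one character (upper 8 bits set to 0)."""
    return "".join(chr(b) for b in data)


all_bytes = bytes(range(256))
text = widen_bytes(all_bytes)

# Same result as latin-1 decoding, which maps byte n to code point n.
print(text == all_bytes.decode("latin-1"))
# Lossless: encoding back yields the original bytes, \x00 included.
print(text.encode("latin-1") == all_bytes)
```

Because no input can be rejected by this mapping, it is a plausible match for CPython 2, where str holds raw bytes; whether it matches in every case is what the question about CPython behavior asks to verify.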
With that taken into account, would you turn your fix into a PR?
If you turn it into a PR, these two lines should be adjusted to fit JyNI's approach to handling JNI:
jclass strClass = (*env)->FindClass(env, "java/lang/String");
jmethodID ctorID = (*env)->GetMethodID(env, strClass, "<init>", "([BIII)V");
The first lookup can be avoided by directly using stringClass, which is globally cached on JyNI startup. The second line should move to JyNI_JNI.c, see this post for a rough usage overview.
Otherwise I can refactor it into a commit as soon as I find time.
Thanks for finding this issue and thanks even more for solving it!
from jyni.