Comments (15)

vruusmann commented on July 19, 2024

Very nice - I can reproduce the StackOverflowError using your example files. Will investigate and fix it in the upcoming JPMML-SkLearn version that will be released either later today or tomorrow.

I suspect that Scikit-Learn has changed something about the encoding of random forest models. I've tested with Scikit-Learn versions 0.16.0 through 0.17.1. What's your Scikit-Learn version?

import sklearn
print(sklearn.__version__)

from jpmml-converter.

commented on July 19, 2024

Thank you very much. The version is 0.17.

vruusmann commented on July 19, 2024

This looks like a legitimate StackOverflowError, because the first member tree model in your random forest model is over 2000 levels deep. That's highly unusual.

How was your sklearn.ensemble.RandomForestRegressor instance parameterized? You should set the max_depth parameter to some sensible value, such as 100.
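A minimal sketch of capping the depth and then checking how deep the fitted member trees actually grew. It assumes synthetic data from `make_regression` in place of the reporter's (unshown) training data; `tree_.max_depth` is the depth each fitted tree actually reached.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the real training data (assumption)
X, y = make_regression(n_samples=500, n_features=5, noise=1.0, random_state=0)

# Cap the depth of every member tree at 100
forest = RandomForestRegressor(n_estimators=20, max_depth=100, random_state=0)
forest.fit(X, y)

# tree_.max_depth reports the depth actually reached by each fitted tree
depths = [est.tree_.max_depth for est in forest.estimators_]
print("deepest member tree:", max(depths))
```

If a converted model appears to contain trees far deeper than this cap, the problem is more likely in how the model was stored or read than in how it was trained.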

vruusmann commented on July 19, 2024

There's a related issue, where a StackOverflowError happens when converting a random forest model that has been trained using the Iris dataset. It should be impossible to train a 2000-level deep tree model using a dataset that contains only 150 training instances.

jpmml/sklearn2pmml#4
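The depth bound can be made explicit. Assuming the default min_samples_leaf=1 (larger values only tighten the bound), every split must send at least one training instance to each child. So if $n_d$ is the number of instances reaching depth $d$ along some root-to-leaf path of depth $D$, starting from $n_0 = 150$:

```latex
n_0 > n_1 > \cdots > n_D \ge 1
\;\Longrightarrow\; 1 \le n_D \le n_0 - D
\;\Longrightarrow\; D \le n_0 - 1 = 149
```

A tree trained on Iris therefore cannot be deeper than 149 levels, let alone 2000.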

commented on July 19, 2024

Thank you very much for your prompt response. I have set max_depth to 100 and am still getting the error. My Java version is 1.7.0_79.

commented on July 19, 2024

I have also tested it with Oracle Java 1.8.0_40.

commented on July 19, 2024

However, the error has changed to:

Exception in thread "main" java.lang.StackOverflowError
at sun.misc.FDBigInteger.leftShift(FDBigInteger.java:511)
at sun.misc.FDBigInteger.valueOfMulPow52(FDBigInteger.java:324)
at sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa(FloatingDecimal.java:714)
at sun.misc.FloatingDecimal$BinaryToASCIIBuffer.access$100(FloatingDecimal.java:259)
at sun.misc.FloatingDecimal.getBinaryToASCIIConverter(FloatingDecimal.java:1785)
at sun.misc.FloatingDecimal.getBinaryToASCIIConverter(FloatingDecimal.java:1738)
at sun.misc.FloatingDecimal.toJavaFormatString(FloatingDecimal.java:70)
at java.lang.Double.toString(Double.java:204)
at org.jpmml.converter.ValueUtil.formatValue(ValueUtil.java:118)
at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:81)
at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)
at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)
at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)
at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)
at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)

which is the same as
https://github.com/jpmml/sklearn2pmml/issues/4

Which Java version should I use?

vruusmann commented on July 19, 2024

You probably can't solve the issue simply by using a different Java version.

The problem is more fundamental: it appears to be an unpickling error (which manifests on some Java versions but not on others) or something similar. As a result, the unpickled Scikit-Learn data contains invalid cross-references, which make TreeModelUtil#encodeNode jump back and forth between two nodes until the JVM dies with a StackOverflowError.
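The failure mode can be illustrated in Python. This is a minimal sketch, not JPMML-SkLearn's actual Java code: the hypothetical `encode_node` walks Scikit-Learn-style `children_left`/`children_right` arrays (where -1 marks a leaf) and, unlike a plain recursive walk, tracks its ancestors so that an invalid cross-reference raises an error instead of recursing until the stack overflows.

```python
TREE_LEAF = -1  # Scikit-Learn marks leaf nodes with child index -1


def encode_node(children_left, children_right, node_id=0, ancestors=None):
    """Hypothetical helper: build a nested dict for the subtree at node_id.

    Raises ValueError when a child index points back at an ancestor,
    i.e. when the node arrays contain a cycle.
    """
    if ancestors is None:
        ancestors = set()
    if node_id in ancestors:
        raise ValueError(f"cycle detected: node {node_id} is its own ancestor")
    left, right = children_left[node_id], children_right[node_id]
    if left == TREE_LEAF:
        return {"id": node_id}
    ancestors = ancestors | {node_id}
    return {
        "id": node_id,
        "left": encode_node(children_left, children_right, left, ancestors),
        "right": encode_node(children_left, children_right, right, ancestors),
    }


# A valid three-node tree: root 0 with leaves 1 and 2
print(encode_node([1, -1, -1], [2, -1, -1]))

# A corrupted tree: node 1 points back at node 0
try:
    encode_node([1, 0, -1], [2, 0, -1])
except ValueError as exc:
    print(exc)
```

With the ancestor check removed, the second call would bounce between nodes 0 and 1 until Python raises RecursionError, the counterpart of the JVM's StackOverflowError.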

vruusmann commented on July 19, 2024

How were the example pickle files in the Model1.zip file generated? I am unable to unpickle them for closer inspection using either the sklearn.externals.joblib or the pickle module:

>>> from sklearn.externals import joblib
>>> forest = joblib.load("pp_model_1_forest.pkl")

Traceback (most recent call last):
  File "load_joblib.py", line 3, in <module>
    forest = joblib.load("pp_model_1_forest.pkl")
  File "/usr/lib/python3.4/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 459, in load
    obj = unpickler.load()
  File "/usr/lib64/python3.4/pickle.py", line 1038, in load
    dispatch[key[0]](self)
  File "/usr/lib64/python3.4/pickle.py", line 1384, in load_reduce
    value = func(*args)
  File "sklearn/tree/_tree.pyx", line 579, in sklearn.tree._tree.Tree.__cinit__ (sklearn/tree/_tree.c:6774)
ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'int'

and

>>> import pickle
>>> forest = pickle.load(open("pp_model_1_forest.pkl", "rb"))

Traceback (most recent call last):
  File "load_pickle.py", line 3, in <module>
    forest = pickle.load(open("pp_model_1_forest.pkl", "rb"))
_pickle.UnpicklingError: invalid load key, 'Z'.
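The second traceback is expected for a compressed dump: `joblib.dump(..., compress=9)` wraps the pickle stream in a compressed container, so plain `pickle.load` cannot read it and `joblib.load` must be used. A minimal sketch, assuming a recent standalone joblib release and a plain dict as a stand-in for a fitted estimator; the exact leading byte depends on the joblib version, so the error message may differ from the 'Z' seen above.

```python
import os
import pickle
import tempfile

import joblib  # standalone joblib; sklearn.externals.joblib was a vendored copy

model = {"coef": [0.5, 1.5], "intercept": 2.0}  # stand-in for a fitted estimator

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    joblib.dump(model, path, compress=9)  # compressed, not a plain pickle stream

    # Plain pickle cannot read the compressed container
    try:
        with open(path, "rb") as f:
            pickle.load(f)
        plain_pickle_ok = True
    except pickle.UnpicklingError as exc:
        plain_pickle_ok = False
        print("pickle.load failed:", exc)

    # joblib.load detects the compression and decompresses before unpickling
    restored = joblib.load(path)

print(plain_pickle_ok, restored == model)
```

The first traceback (the SIZE_t/int dtype mismatch) is a different problem: the file is read correctly, but the tree arrays inside it do not have the integer width this interpreter expects.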

commented on July 19, 2024

test.zip
I receive the same error when loading the pickle, even for the provided Iris example (see test.zip). I have also included the compiled jar file. So maybe the problem is in the joblib dump of the random forest, not in the converter?

from sklearn.externals import joblib

def store_pkl(obj, name):
    joblib.dump(obj, "pkl/" + name, compress=9)  # compressed dump, level 9

vruusmann commented on July 19, 2024

The JPMML-SkLearn library should be able to consume the following dumps:

  1. sklearn.externals.joblib
  2. joblib
  3. pickle

Option 1 is recommended by the Scikit-Learn documentation (e.g., see http://scikit-learn.org/stable/modules/model_persistence.html). However, this module may be outdated and/or out of sync with other modules.

You could try dumping the RF object manually using options 2 and 3, and use the JPMML-SkLearn command-line application to do the conversion.
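Dumping with options 2 and 3 might look like the following minimal sketch. It assumes synthetic training data and an uncompressed joblib dump; protocol=2 is chosen only because it is the highest pickle protocol Python 2.7 can read, which is an assumption about what the consuming side needs, not a documented JPMML-SkLearn requirement.

```python
import os
import pickle
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=4, random_state=0)
forest = RandomForestRegressor(n_estimators=10, max_depth=100, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    joblib_path = os.path.join(tmp, "forest_joblib.pkl")
    pickle_path = os.path.join(tmp, "forest_pickle.pkl")

    # Option 2: standalone joblib, uncompressed
    joblib.dump(forest, joblib_path)

    # Option 3: plain pickle
    with open(pickle_path, "wb") as f:
        pickle.dump(forest, f, protocol=2)

    # Read both files back to confirm they are usable
    from_joblib = joblib.load(joblib_path)
    with open(pickle_path, "rb") as f:
        from_pickle = pickle.load(f)

# Both copies should reproduce the original predictions exactly
same_joblib = np.array_equal(forest.predict(X), from_joblib.predict(X))
same_pickle = np.array_equal(forest.predict(X), from_pickle.predict(X))
print(same_joblib, same_pickle)
```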

commented on July 19, 2024

I have tested all methods for dumping the .pkl files, and I still get the StackOverflowError, even with the Iris data. The log is included in the attached file.
test.zip

I use Python 2.7 32bit (Anaconda).

This is the code that produced the files in the attached model 1.zip:

from sklearn.externals import joblib
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

def store_pkl(obj, name):
    joblib.dump(obj, "pkl/" + name, compress=9)

# pp_mapper, pp_X and pp_y are defined elsewhere (not shown)
pp_model_regression = LinearRegression()
pp_model_regression.fit(pp_X, pp_y)

pp_model_forest = RandomForestRegressor(max_depth=100, min_samples_leaf=5)
pp_model_forest.fit(pp_X, pp_y)

store_pkl(pp_mapper, "pp_mapper_1.pkl")
store_pkl(pp_model_regression, "pp_model_1_regression.pkl")
store_pkl(pp_model_forest, "pp_model_1_forest.pkl")

You should be able to load them with joblib. Could you please try again? I have tried different Java versions as well, so I am really confused.

vruusmann commented on July 19, 2024

> I use Python 2.7 32bit (Anaconda)

This could be a 32-bit vs. 64-bit compatibility issue.

I'm running a 64-bit OS, and the JPMML-SkLearn project has been tested against 64-bit versions of Python2(.7) and Python3(.4).

My unpickling error message (ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'int') fits perfectly into this picture, as for me SIZE_t is long, not int.
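To check which side of the 32-bit/64-bit divide a Python interpreter is on, a small sketch like the following may help. It assumes NumPy is installed and that SIZE_t is a pointer-sized integer (`npy_intp`), which is how I understand the Scikit-Learn tree code of that era defined it; a 32-bit interpreter would then write tree node indices as 32-bit integers, matching the 'int' in the error message.

```python
import platform
import struct

import numpy as np

# Width of a C pointer in this interpreter: 32 on a 32-bit Python, 64 on a 64-bit one
pointer_bits = struct.calcsize("P") * 8

# np.intp is NumPy's pointer-sized integer (npy_intp in C)
intp_bits = np.dtype(np.intp).itemsize * 8

print(platform.architecture()[0], pointer_bits, intp_bits)
```

Running this in the interpreter that creates the pickle files and in the one used for inspection shows immediately whether the two disagree.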

commented on July 19, 2024

Fixed! Thank you very much for all your help. The problem was an incompatibility between 32-bit Python and 64-bit Java.

vruusmann commented on July 19, 2024

Closing this issue in favour of the following one: jpmml/jpmml-sklearn#6
