jpmml-converter's Issues

PMML conversion: casting float to decimal causes loss of precision

Hi,
I am trying to generate a PMML file for an Isolation Forest model using sklearn2pmml. During conversion, the split thresholds get changed in the PMML file.
The tree is correct when printed from the Python pickle file, but the threshold values in the actual PMML file are different.

In Python Tree:

if AA.SUB (#1) <= 526225.988222:
  return [[ 0.36039911]]
else:  # if AA.SUB (#1) > 526225.988222
  if BB.SUB (#1) <= 5192.53104377:
    return [[ 0.41983035]]
  else:  # if BB.SUB (#1) > 5192.53104377
    return [[ 0.88258597]]

In PMML Tree:

<Node id="712">
  <SimplePredicate field="AA.SUB" operator="greaterThan" value="526226"/>
  <Node id="713" score="17.018322573802887">
    <SimplePredicate field="BB.SUB" operator="lessOrEqual" value="5192.5312"/>
  </Node>
</Node>

In the above case, 526225.988222 is changed to 526226, and 5192.53104377 to 5192.5312.

I have analyzed the source code of jpmml-converter and found that the way Float values are converted to Double (in ValueUtil.java) is wrong.

Can you please check whether this is the cause? If so, can you suggest a workaround or solution?
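
The changed values are consistent with the thresholds passing through a 32-bit float at some point. Below is a minimal Python sketch that reproduces the rounding with the standard library only. The helper name `to_float32` is made up for illustration, and the sketch makes no claim about what ValueUtil.java actually does.

```python
import struct


def to_float32(value):
    """Round a 64-bit Python float to the nearest 32-bit float, then widen it back."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Thresholds quoted in the Python tree above
for threshold in (526225.988222, 5192.53104377):
    print(threshold, "->", to_float32(threshold))
```

Near 526225 a 32-bit float can only represent multiples of 1/16, so the first threshold lands on 526226.0. The second lands on 5192.53125, which appears in the PMML document as 5192.5312.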

Support for transformer-only pipelines

Based on jpmml/jpmml-sklearn#86

At the moment it's impossible to generate transformer-only pipelines, because the ModelEncoder#encodePMML(Model) method applies a set of visitors that clean the soon-to-be-generated PMML document from all unused preprocessing instructions:
https://github.com/jpmml/jpmml-converter/blob/master/src/main/java/org/jpmml/converter/ModelEncoder.java#L53-L56

Possible solution: class ModelEncoder should provide a "transformer-only" conversion mode.

Error on a pipeline with OneHotEncoder and xgboost

Hello,

I trained a PMMLPipeline with OneHotEncoder and XGBClassifier using the following code snippet.

from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import OneHotEncoder
from sklearn2pmml import sklearn2pmml, PMMLPipeline
from xgboost.sklearn import XGBClassifier


mapper = DataFrameMapper(
    [(col, None) for col in numerical_cols] +
    [([col], OneHotEncoder(handle_unknown='ignore')) for col in categorical_cols]
)

pipeline = PMMLPipeline(
    steps=[
        ('mapper', mapper),
        ('classifier', XGBClassifier())
    ]
)

pipeline.fit(X, y)

The pipeline seemed to work, and I was able to use it to make predictions.
But I got an error when I tried to export the pipeline to a PMML file:
sklearn2pmml(pipeline, "testing.pmml", with_repr=True)

Standard error:
Exception in thread "main" org.jpmml.model.MissingAttributeException: Required attribute Value@value is not defined
	at org.dmg.pmml.Value.requireValue(Value.java:67)
	at org.jpmml.converter.PMMLUtil.getValues(PMMLUtil.java:139)
	at org.jpmml.converter.PMMLUtil.getValues(PMMLUtil.java:124)
	at org.jpmml.converter.CategoricalFeature.<init>(CategoricalFeature.java:35)
	at org.jpmml.converter.WildcardFeature.toCategoricalFeature(WildcardFeature.java:61)
	at sklearn.preprocessing.MultiOneHotEncoder.encodeFeatures(MultiOneHotEncoder.java:118)
	at sklearn.Transformer.encode(Transformer.java:69)
	at sklearn_pandas.DataFrameMapper.encodeFeatures(DataFrameMapper.java:67)
	at sklearn.Transformer.encode(Transformer.java:69)
	at sklearn.Composite.encodeFeatures(Composite.java:119)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:212)
	at com.sklearn2pmml.Main.run(Main.java:84)
	at com.sklearn2pmml.Main.main(Main.java:62)

Can someone give me some advice on what I might have done wrong? Thanks.

Support for `forecast::ets` models

It should be possible to represent ets objects using the TimeSeriesModel element:

library("fpp")
library("forecast")

livestock.forecast = forecast(livestock)
print(livestock.forecast)
print(livestock.forecast$model)

plot(livestock.forecast)

Streaming conversion mode

The JPMML-Converter library is currently stuck in the "file in -> PMML file out" mindset. End users keep asking for more output options, such as URL or plain streaming:
https://stackoverflow.com/questions/74656521/how-to-save-xgboost-lightgbm-model-to-postgresql-database-in-python-for-subseque/74659868#comment131843709_74659868

Conceptually, JPMML converter command-line applications could support UNIX-style piping, which may make integration with non-Java application environments easier:

$ java -jar pmml-converter-executable-${version} < input.file > output.file
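
From the Python side, such a piping mode could be driven roughly as sketched below. This is a hypothetical illustration only: the streaming mode is a proposal, the function name `convert_via_pipe` is made up, and a stand-in command that echoes stdin to stdout replaces the converter so that the sketch runs without any JAR file.

```python
import subprocess
import sys


def convert_via_pipe(command, model_bytes):
    """Feed model bytes to a command on stdin and return what it writes to stdout."""
    result = subprocess.run(
        command,
        input=model_bytes,
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Stand-in for the converter: copies stdin to stdout unchanged.
# A real call would pass a java command line instead, assuming the
# proposed streaming mode existed.
echo_command = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

output_bytes = convert_via_pipe(echo_command, b"<model bytes>")
```

With this shape, the caller decides where the bytes come from and where they go (a database column, an HTTP response, a file), which is the integration benefit the proposal is after.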

Support for transforming labels

The converter should support translating from one target value space to another (e.g. from integer indices to string labels), or reversing the order of class labels (for binary classification problems), as outlined in jpmml/r2pmml#46 (comment).

It's much easier to generate PMML code based on a transformed label than to try to "rewrite" an existing PMML document to achieve a similar effect.

SVM's classificationMethod is always "OneAgainstOne"

I am not sure this is the right place to ask this question.
When I export an SVC model (in a pipeline) with sklearn2pmml, I always get a PMML document with classificationMethod="OneAgainstOne", even though I explicitly set decision_function_shape to "ovr" in Python. The pipeline is defined as follows:

# create pipeline
model_pipeline = PMMLPipeline([
    ("mapper", DataFrameMapper([
        (feat_names, [ContinuousDomain(with_data=False)])
    ])),
    ("SVC", SVC(probability=True, random_state=2018, decision_function_shape="ovr"))
])

After checking the source code at converter/support_vector_machine/LibSVMUtil.java:116, I found that SupportVectorMachineModel.ClassificationMethod is initialized to ONE_AGAINST_ONE and is never updated based on the decision_function_shape set in Python.

Please correct me if I missed anything.
Many thanks for your help. You made a great project!

Which version of jpmml-converter corresponds to which version of Spark?

Hi, when I use jpmml-converter together with jpmml-sparkml, the following errors occur:
(1) jpmml-converter-1.3.9: Caused by: java.lang.ClassNotFoundException: org.jpmml.converter.HasNativeConfiguration
(2) jpmml-converter-1.4.6: Caused by: java.lang.IllegalArgumentException: Expected Apache Spark ML version 2.4, got version 2.3 (2.3.1)

So, is there a relationship between the jpmml-converter version and the Spark version?

Environment: spark-2.3.1

StackOverflowError

I am trying to convert a random forest model from PKL to PMML, and I get a stack overflow error. I can convert the regression version of the same model without any problem. Attached are the PKL files for the regression and random forest models, and the mapper.

Model 1.zip

Exception in thread "main" java.lang.StackOverflowError
	at java.lang.StrictMath.floorOrCeil(StrictMath.java:355)
	at java.lang.StrictMath.floor(StrictMath.java:340)
	at java.lang.Math.floor(Math.java:424)
	at sun.misc.FloatingDecimal.dtoa(FloatingDecimal.java:629)
	at sun.misc.FloatingDecimal.<init>(FloatingDecimal.java:468)
	at java.lang.Double.toString(Double.java:196)
	at org.jpmml.converter.PMMLUtil.formatValue(PMMLUtil.java:387)
	at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:82)
	at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:97)

Request Support for 'survival::coxph' models

Under PMML - General Regression, CoxRegression is listed as a model type. Would it be easy to add support for this modeling framework? For example:

test <- survival::coxph(survival::Surv(futime, fustat) ~ age + rx + ecog.ps, survival::ovarian, x = TRUE)
print(test)
library(r2pmml)
r2pmml(test, "test.pmml")

PMML version in xmlns tag does not match version tag

The PMML files generated by sklearn2pmml appear to have a version discrepancy in the header. Here is the header of a PMML file that I just generated:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_4" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.3">
	<Header>
		<Application name="JPMML-SkLearn" version="1.6.3"/>
		<Timestamp>2020-07-27T18:08:57Z</Timestamp>

Note the discrepancy between PMML-4_4 and version 4.3. Compare to the header of an older pmml file we generated a month ago:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_3" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.3">
	<Header>
		<Application name="JPMML-SkLearn" version="1.5.35"/>
		<Timestamp>2020-06-15T20:11:06Z</Timestamp>

I don't know what versions we were using last month, but I'm currently using sklearn2pmml version 0.60.0. The java version is:

openjdk version "1.8.0_152-release"
OpenJDK Runtime Environment (build 1.8.0_152-release-1056-b12)
OpenJDK 64-Bit Server VM (build 25.152-b12, mixed mode)

The problem is easy to fix for our purposes, but I thought I'd inform you of the bug.
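
A discrepancy like this can be detected automatically. Below is a minimal sketch using only the Python standard library. The function name `pmml_versions` is made up for illustration, and the sample is a shortened, closed-off variant of the header quoted above, not the full generated file.

```python
import re
import xml.etree.ElementTree as ET


def pmml_versions(xml_text):
    """Return (version encoded in the xmlns URI, value of the version attribute)."""
    root = ET.fromstring(xml_text)
    # ElementTree reports the root tag as "{namespace-uri}PMML"
    match = re.match(r"\{http://www\.dmg\.org/PMML-(\d+)_(\d+)\}PMML$", root.tag)
    if match is None:
        raise ValueError("Not a PMML document: " + root.tag)
    namespace_version = match.group(1) + "." + match.group(2)
    return namespace_version, root.get("version")


sample = (
    '<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.3">'
    '<Header><Application name="JPMML-SkLearn" version="1.6.3"/></Header>'
    "</PMML>"
)

namespace_version, attribute_version = pmml_versions(sample)
if namespace_version != attribute_version:
    print("Mismatch:", namespace_version, "vs", attribute_version)
```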

Can you provide a parameter to control the separator in output field names?

Hi @vruusmann, looking through the code, I see that the names of the prediction and probability output fields are hard-coded. Is it possible to provide a parameter to control the separator used in these names? For example, to change probability(1) to probability-1.

Below is the existing naming rule, which uses parentheses as the separator:

static
public OutputField createProbabilityField(DataType dataType, String value){
	return createProbabilityField(FieldName.create("probability(" + value + ")"), dataType, value);
}

Thanks a lot.
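
Until such a parameter exists, one possible workaround is to rename the fields after conversion. Below is a minimal sketch of the name rewrite itself. The function name `rename_probability_field` and its `separator` parameter are hypothetical and not part of any JPMML API.

```python
import re

# Matches names of the form probability(<value>)
_PROBABILITY_NAME = re.compile(r"^probability\((.*)\)$")


def rename_probability_field(name, separator="-"):
    """Turn probability(<value>) into probability<separator><value>; leave other names alone."""
    match = _PROBABILITY_NAME.match(name)
    if match is None:
        return name
    return "probability" + separator + match.group(1)


print(rename_probability_field("probability(1)"))
```

Any other place in the PMML document that refers to the renamed field would need the same rewrite, so this is a stopgap rather than a replacement for a converter-side option.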

Failing to prune XGBoost tree models

Hi, I want to use the sklearn2pmml() function to generate a PMML file.

I created the issue below, but I was not able to reopen it, so I am creating this new issue and copying the content here.
jpmml/jpmml-sklearn#160

I wrote code to create a pipeline, but converting it fails with this error:

RuntimeError: The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams

How can I solve it? My version is 0.73.1

The output is:

Standard output is empty
Standard error:
Jul 01, 2021 8:33:28 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Jul 01, 2021 8:33:28 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 219 ms.
Jul 01, 2021 8:33:28 PM org.jpmml.sklearn.Main run
INFO: Converting PKL to PMML..
Jul 01, 2021 8:33:30 PM org.jpmml.sklearn.Main run
SEVERE: Failed to convert PKL to PMML
java.lang.IllegalArgumentException
	at org.jpmml.converter.visitors.AbstractTreeModelTransformer.initScore(AbstractTreeModelTransformer.java:173)
	at org.jpmml.converter.visitors.TreeModelPruner.exitNode(TreeModelPruner.java:81)
	at org.jpmml.converter.visitors.AbstractTreeModelTransformer.popParent(AbstractTreeModelTransformer.java:61)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:120)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:90)
	at org.dmg.pmml.tree.TreeModel.accept(TreeModel.java:401)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:90)
	at org.dmg.pmml.mining.Segment.accept(Segment.java:235)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.mining.Segmentation.accept(Segmentation.java:185)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:69)
	at org.dmg.pmml.mining.MiningModel.accept(MiningModel.java:349)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:90)
	at org.dmg.pmml.mining.Segment.accept(Segment.java:235)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.mining.Segmentation.accept(Segmentation.java:185)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:69)
	at org.dmg.pmml.mining.MiningModel.accept(MiningModel.java:349)
	at org.jpmml.model.visitors.AbstractVisitor.applyTo(AbstractVisitor.java:320)
	at org.jpmml.xgboost.Learner.encodeMiningModel(Learner.java:354)
	at xgboost.sklearn.BoosterUtil.encodeBooster(BoosterUtil.java:63)
	at xgboost.sklearn.XGBClassifier.encodeModel(XGBClassifier.java:45)
	at xgboost.sklearn.XGBClassifier.encodeModel(XGBClassifier.java:27)
	at sklearn.Estimator.encode(Estimator.java:83)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:235)
	at org.jpmml.sklearn.Main.run(Main.java:226)
	at org.jpmml.sklearn.Main.main(Main.java:143)

Exception in thread "main" java.lang.IllegalArgumentException
	at org.jpmml.converter.visitors.AbstractTreeModelTransformer.initScore(AbstractTreeModelTransformer.java:173)
	at org.jpmml.converter.visitors.TreeModelPruner.exitNode(TreeModelPruner.java:81)
	at org.jpmml.converter.visitors.AbstractTreeModelTransformer.popParent(AbstractTreeModelTransformer.java:61)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:120)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:90)
	at org.dmg.pmml.tree.TreeModel.accept(TreeModel.java:401)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:90)
	at org.dmg.pmml.mining.Segment.accept(Segment.java:235)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.mining.Segmentation.accept(Segmentation.java:185)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:69)
	at org.dmg.pmml.mining.MiningModel.accept(MiningModel.java:349)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:90)
	at org.dmg.pmml.mining.Segment.accept(Segment.java:235)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.mining.Segmentation.accept(Segmentation.java:185)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:69)
	at org.dmg.pmml.mining.MiningModel.accept(MiningModel.java:349)
	at org.jpmml.model.visitors.AbstractVisitor.applyTo(AbstractVisitor.java:320)
	at org.jpmml.xgboost.Learner.encodeMiningModel(Learner.java:354)
	at xgboost.sklearn.BoosterUtil.encodeBooster(BoosterUtil.java:63)
	at xgboost.sklearn.XGBClassifier.encodeModel(XGBClassifier.java:45)
	at xgboost.sklearn.XGBClassifier.encodeModel(XGBClassifier.java:27)
	at sklearn.Estimator.encode(Estimator.java:83)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:235)
	at org.jpmml.sklearn.Main.run(Main.java:226)
	at org.jpmml.sklearn.Main.main(Main.java:143)

support tensorflow

Hi guys,
I wonder whether converting TensorFlow models into PMML is on your roadmap?

Thank you,
Eliano

Support for `kernlab::ksvm` models

The standard pmml package can convert ksvm objects that have been trained using the kernlab::ksvm function. Unfortunately, the converter implementation is rather limited, because it fails to handle ksvm objects that have been trained using alternative means.

For example, it is impossible to convert a ksvm object that was trained using the caret package:

library("caret")
library("kernlab")
library("pmml")

iris.ksvm = ksvm(Species ~ ., data = iris)
class(iris.ksvm)

ksvm.pmml = pmml(iris.ksvm, dataset = iris)

iris_x = iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]
iris_y = iris[, c("Species")]

iris.train = train(x = iris_x, y = iris_y, data = iris, method = "svmRadial")
class(iris.train$finalModel)

# Error in if (field$class[[1]][1] == "numeric") { :
#   argument is of length zero
train.pmml = pmml(iris.train$finalModel, dataset = iris)

Controlling scientific notation in PMML document

Hello,

I've been actively using the PySpark2PMML package to write RF spark models into PMML documents, and was just noticing that sometimes I get scientific notation in the output:

<ScoreDistribution value="0" recordCount="2.3252954E7"/>

Is there a way to control whether or not scientific notation is used in the output? I'd prefer that it isn't used, as my C++ parser isn't written to accept it. Thanks!

Patrick Hofmann
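
If the converter output cannot be changed, the notation can be normalized in a post-processing step. Below is a minimal sketch using Python's `decimal` module. The function names are made up for illustration, and the regular expression only targets `recordCount` attributes like the one quoted above.

```python
import re
from decimal import Decimal


def to_plain_decimal(text):
    """Format a numeric string without an exponent, e.g. 2.3252954E7 as 23252954."""
    return format(Decimal(text), "f")


def expand_record_counts(pmml_text):
    """Rewrite every recordCount attribute value in plain decimal notation."""
    return re.sub(
        r'recordCount="([^"]+)"',
        lambda m: 'recordCount="' + to_plain_decimal(m.group(1)) + '"',
        pmml_text,
    )


line = '<ScoreDistribution value="0" recordCount="2.3252954E7"/>'
print(expand_record_counts(line))
```

Using `Decimal` rather than `float` avoids introducing binary rounding while the digits are being moved around.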
