jpmml / jpmml-converter

Java library for authoring PMML

License: GNU Affero General Public License v3.0

Java 100.00%

jpmml-converter's Issues

PMML conversion: casting float to decimal causes loss of precision

Hi,
I am trying to generate a PMML file for an Isolation Forest model using sklearn2pmml. During the conversion, the variable thresholds get changed in the PMML file.
When I print the tree from the Python pickle file, the thresholds are correct, but in the actual PMML file the values are different.

In the Python tree:

if AA.SUB (#1) <= 526225.988222:
  return [[ 0.36039911]]
else:  # if AA.SUB (#1) > 526225.988222
  if BB.SUB (#1) <= 5192.53104377:
    return [[ 0.41983035]]
  else:  # if BB.SUB (#1) > 5192.53104377
    return [[ 0.88258597]]

In the PMML tree:

<Node id="712">
  <SimplePredicate field="AA.SUB" operator="greaterThan" value="526226"/>
  <Node id="713" score="17.018322573802887">
    <SimplePredicate field="BB.SUB" operator="lessOrEqual" value="5192.5312"/>
  </Node>
</Node>

In the above case, 526225.988222 gets changed into 526226, and 5192.53104377 into 5192.5312.

I have analyzed the source code of jpmml-converter and found that the way Float values are converted to Double (in ValueUtil.java) is wrong.

Can you please analyze this and check whether it is the cause? If so, can you please suggest a workaround or solution?
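A minimal standalone Java sketch (no JPMML classes involved) of the effect described above: narrowing the two Python thresholds from `double` to `float` and widening them back gives 526226.0 and 5192.53125, and `Float.toString` should render the second one as `5192.5312`, which matches the values in the PMML tree. It only reproduces the symptom; it does not show where, or why, the converter narrows the values.

```java
public class FloatPrecisionDemo {

    // Narrowing double -> float keeps only 24 significant bits (roughly 7 decimal digits).
    // Widening the float back to double is exact, so the discarded digits never come back.
    static double roundTripThroughFloat(double value) {
        float narrowed = (float) value;
        return (double) narrowed;
    }

    public static void main(String[] args) {
        // Thresholds as printed by the Python tree
        double[] thresholds = {526225.988222, 5192.53104377};
        for (double threshold : thresholds) {
            float narrowed = (float) threshold;
            System.out.println(threshold + " -> " + Float.toString(narrowed)
                    + " (as double: " + roundTripThroughFloat(threshold) + ")");
        }
    }
}
```

If the thresholds are meant to be compared against `float` inputs, such rounding is harmless only as long as the input values go through the same narrowing; otherwise it can change which branch a borderline record takes.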

Support for transformer-only pipelines

Based on jpmml/jpmml-sklearn#86

At the moment, it's impossible to generate transformer-only pipelines, because the ModelEncoder#encodePMML(Model) method applies a set of visitors that clean the soon-to-be-generated PMML document of all unused preprocessing instructions:
https://github.com/jpmml/jpmml-converter/blob/master/src/main/java/org/jpmml/converter/ModelEncoder.java#L53-L56

Possible solution: class ModelEncoder should provide a "transformer-only" conversion mode.

Constant elements should require a data type hint

Method PMMLUtil#createConstant(Object) must be replaced with method PMMLUtil#createConstant(Object, DataType).

See jpmml/jpmml-evaluator#107 (comment)
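The referenced discussion is not reproduced here, but the general reason a type hint matters can be shown in a few lines of standalone Java: the same literal `"1"` compares differently against an input of `"1.0"` depending on whether it is treated as a string or as a number. The class and method names are hypothetical and unrelated to `PMMLUtil`.

```java
public class ConstantTyping {

    // Treating the constant as a string: "1" and "1.0" are different values
    static boolean equalsAsString(String constant, String input) {
        return constant.equals(input);
    }

    // Treating the constant as a double: "1" and "1.0" parse to the same value
    static boolean equalsAsDouble(String constant, String input) {
        return Double.parseDouble(constant) == Double.parseDouble(input);
    }

    public static void main(String[] args) {
        System.out.println(equalsAsString("1", "1.0")); // false
        System.out.println(equalsAsDouble("1", "1.0")); // true
    }
}
```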

Error on a pipeline with OneHotEncoder and xgboost

Hello,

I trained a PMMLPipeline with OneHotEncoder and XGBClassifier using the following code snippet.

from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import OneHotEncoder
from sklearn2pmml import sklearn2pmml, PMMLPipeline
from xgboost.sklearn import XGBClassifier

# numerical_cols, categorical_cols, X and y are defined elsewhere (not shown)
mapper = DataFrameMapper(
    [(col, None) for col in numerical_cols] +
    [([col], OneHotEncoder(handle_unknown='ignore')) for col in categorical_cols]
)

pipeline = PMMLPipeline(
    steps=[
        ('mapper', mapper),
        ('classifier', XGBClassifier())
    ]
)

pipeline.fit(X, y)

The pipeline seemed to work, and I was able to use it to make predictions.
But I got an error when I tried to convert the pipeline into a PMML file:
sklearn2pmml(pipeline, "testing.pmml", with_repr=True)

Standard error:
Exception in thread "main" org.jpmml.model.MissingAttributeException: Required attribute Value@value is not defined
	at org.dmg.pmml.Value.requireValue(Value.java:67)
	at org.jpmml.converter.PMMLUtil.getValues(PMMLUtil.java:139)
	at org.jpmml.converter.PMMLUtil.getValues(PMMLUtil.java:124)
	at org.jpmml.converter.CategoricalFeature.<init>(CategoricalFeature.java:35)
	at org.jpmml.converter.WildcardFeature.toCategoricalFeature(WildcardFeature.java:61)
	at sklearn.preprocessing.MultiOneHotEncoder.encodeFeatures(MultiOneHotEncoder.java:118)
	at sklearn.Transformer.encode(Transformer.java:69)
	at sklearn_pandas.DataFrameMapper.encodeFeatures(DataFrameMapper.java:67)
	at sklearn.Transformer.encode(Transformer.java:69)
	at sklearn.Composite.encodeFeatures(Composite.java:119)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:212)
	at com.sklearn2pmml.Main.run(Main.java:84)
	at com.sklearn2pmml.Main.main(Main.java:62)

Can someone give me some advice on what I might have done wrong? Thanks.

Support for `forecast::ets` models

It should be possible to represent ets objects using the TimeSeriesModel element:

library("fpp")
library("forecast")

livestock.forecast = forecast(livestock)
print(livestock.forecast)
print(livestock.forecast$model)

plot(livestock.forecast)

Streaming conversion mode

The JPMML-Converter library is currently stuck in the " file in -> PMML file out" mindset. End users keep asking for more output options, such as URL or plain streaming:
https://stackoverflow.com/questions/74656521/how-to-save-xgboost-lightgbm-model-to-postgresql-database-in-python-for-subseque/74659868#comment131843709_74659868

Conceptually, JPMML converter command-line applications could support UNIX-style piping, which may make integration with non-Java application environments easier:

$ java -jar pmml-converter-executable-${version} < input.file > output.file
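A minimal sketch of the stdin/stdout convention, assuming a hypothetical `PipeFriendlyConverter` entry point where a missing argument or `-` means a standard stream. The `convert` method is a placeholder that just copies bytes; a real converter would parse the input model and write PMML instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PipeFriendlyConverter {

    // Hypothetical convention: no argument, or "-", selects standard input/output
    static boolean isStandardStream(String path) {
        return path == null || "-".equals(path);
    }

    static InputStream openInput(String path) throws IOException {
        return isStandardStream(path) ? System.in : Files.newInputStream(Paths.get(path));
    }

    static OutputStream openOutput(String path) throws IOException {
        return isStandardStream(path) ? System.out : Files.newOutputStream(Paths.get(path));
    }

    // Placeholder for the real conversion: copies the input through unchanged
    static void convert(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        String input = args.length > 0 ? args[0] : null;
        String output = args.length > 1 ? args[1] : null;
        InputStream in = openInput(input);
        OutputStream out = openOutput(output);
        try {
            convert(in, out);
        } finally {
            // Close only the streams we opened ourselves, never the standard ones
            if (in != System.in) {
                in.close();
            }
            if (out != System.out) {
                out.close();
            }
        }
    }
}
```

With such an entry point, both `java PipeFriendlyConverter < input.file > output.file` and `java PipeFriendlyConverter input.file output.file` would behave the same.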

Option to choose the carrier data format (XML vs. JSON vs. YAML) in all end user-facing converter tools

With the update to PMML schema version 4.4, it's time to shake things up some more!

In Scikit-Learn:

sklearn2pmml(pipeline, "MyPipeline.pmml.json", format = "json")

In R:

r2pmml(model, "MyModel.pmml.yaml", format = "yaml")
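One possible way to resolve the carrier format, sketched in plain Java under the assumption that an explicit `format` argument takes precedence and the file name suffix is only a fallback. `CarrierFormats` and `resolveFormat` are hypothetical names; the actual serialization to JSON or YAML is out of scope here.

```java
import java.util.Locale;

public class CarrierFormats {

    enum Format { XML, JSON, YAML }

    // An explicit format wins (unknown names throw IllegalArgumentException);
    // otherwise the file name suffix decides, defaulting to XML
    static Format resolveFormat(String fileName, String explicitFormat) {
        if (explicitFormat != null) {
            return Format.valueOf(explicitFormat.toUpperCase(Locale.ROOT));
        }
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".json")) {
            return Format.JSON;
        } else if (name.endsWith(".yaml") || name.endsWith(".yml")) {
            return Format.YAML;
        }
        return Format.XML;
    }

    public static void main(String[] args) {
        System.out.println(resolveFormat("MyPipeline.pmml.json", "json")); // JSON
        System.out.println(resolveFormat("MyModel.pmml.yaml", null));      // YAML
        System.out.println(resolveFormat("MyModel.pmml", null));           // XML
    }
}
```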

Support for transforming labels

Translating from one target value space to another (e.g. from integer indices to string labels), or reversing the order of class labels (for binary classification problems), as outlined in jpmml/r2pmml#46 (comment).

It's much easier to generate PMML code based on a transformed label than to try to "rewrite" an existing PMML document to achieve a similar effect.
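To illustrate the kind of label-space translation meant here, a small standalone sketch (hypothetical class and method names, no PMML involved) that maps integer class indices to string labels and reverses the label order of a binary problem:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LabelTransforms {

    // Translates integer class indices into the corresponding string labels
    static List<String> indicesToLabels(int[] indices, List<String> labels) {
        List<String> result = new ArrayList<>(indices.length);
        for (int index : indices) {
            result.add(labels.get(index));
        }
        return result;
    }

    // Reverses the order of class labels (e.g. to swap which binary class comes first)
    static List<String> reverseLabels(List<String> labels) {
        List<String> reversed = new ArrayList<>(labels);
        Collections.reverse(reversed);
        return reversed;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("no", "yes");
        System.out.println(indicesToLabels(new int[]{1, 0, 1}, labels)); // [yes, no, yes]
        System.out.println(reverseLabels(labels));                       // [yes, no]
    }
}
```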

SVM's classificationMethod is always "OneAgainstOne"

I'm not sure this is the right place to ask this question.
When I export an SVC model (inside a pipeline) with sklearn2pmml, I always get a PMML file with classificationMethod="OneAgainstOne", even though I explicitly set decision_function_shape to "ovr" in Python. The pipeline is defined as follows:

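from sklearn.svm import SVC
from sklearn_pandas import DataFrameMapper
from sklearn2pmml.decoration import ContinuousDomain
from sklearn2pmml.pipeline import PMMLPipeline

# create pipeline (feat_names is a list of feature column names, defined elsewhere)
model_pipeline = PMMLPipeline([
    ("mapper", DataFrameMapper([
        (feat_names, [ContinuousDomain(with_data=False)])
    ])),
    ("SVC", SVC(probability=True, random_state=2018, decision_function_shape="ovr"))
])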

After checking the source code in converter/support_vector_machine/LibSVMUtil.java:116, I found that SupportVectorMachineModel.ClassificationMethod is initialized as ONE_AGAINST_ONE and is never reset according to the decision_function_shape set in Python.

Please correct me if I missed anything.
Many thanks for your help. This is a great project!

Which version of jpmml-converter corresponds to which version of Spark?

Hi, when I use jpmml-converter together with jpmml-sparkml, the following errors arise:
(1) jpmml-converter-1.3.9: Caused by: java.lang.ClassNotFoundException: org.jpmml.converter.HasNativeConfiguration
(2) jpmml-converter-1.4.6: Caused by: java.lang.IllegalArgumentException: Expected Apache Spark ML version 2.4, got version 2.3 (2.3.1)

So, what is the relationship between jpmml-converter versions and Spark versions?

Environment: spark-2.3.1

StackOverflowError

I am trying to convert a random forest model from PKL to PMML, and I get a StackOverflowError. I can convert the regression version of the same model without any problem. Attached are the PKL files for the regression and random forest models, and the mapper.

Model 1.zip

Exception in thread "main" java.lang.StackOverflowError
at java.lang.StrictMath.floorOrCeil(StrictMath.java:355)
at java.lang.StrictMath.floor(StrictMath.java:340)
at java.lang.Math.floor(Math.java:424)
at sun.misc.FloatingDecimal.dtoa(FloatingDecimal.java:629)
at sun.misc.FloatingDecimal.<init>(FloatingDecimal.java:468)
at java.lang.Double.toString(Double.java:196)
at org.jpmml.converter.PMMLUtil.formatValue(PMMLUtil.java:387)
at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:82)
at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:97)
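The trace suggests that the tree is encoded by recursive calls to `TreeModelUtil.encodeNode`, roughly one stack frame per tree level, so a deep enough tree can exhaust the default thread stack. Two general remedies are a larger thread stack (the JVM `-Xss` option) or an iterative traversal. The sketch below shows the latter on the parallel `children_left`/`children_right` arrays that scikit-learn uses (with `-1` marking a leaf); it is a generic illustration with hypothetical names, not the converter's code.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class IterativeTreeWalk {

    // scikit-learn marks leaves with -1 in tree_.children_left / tree_.children_right
    static final int LEAF = -1;

    // Depth-first traversal with an explicit stack instead of recursion,
    // so tree depth is limited by heap memory, not by the thread stack.
    // Returns the maximum node depth (root = 0).
    static int maxDepth(int[] childrenLeft, int[] childrenRight) {
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[]{0, 0}); // {node id, depth}
        int maxDepth = 0;
        while (!stack.isEmpty()) {
            int[] entry = stack.pop();
            int node = entry[0];
            int depth = entry[1];
            maxDepth = Math.max(maxDepth, depth);
            if (childrenLeft[node] != LEAF) {
                // Push right first so that the left subtree is visited first
                stack.push(new int[]{childrenRight[node], depth + 1});
                stack.push(new int[]{childrenLeft[node], depth + 1});
            }
        }
        return maxDepth;
    }

    // Builds a maximally unbalanced tree with `splits` internal nodes:
    // node 2i splits into a leaf (2i + 1) and the next node (2i + 2); node 2 * splits is the last leaf
    static int[][] chainTree(int splits) {
        int size = 2 * splits + 1;
        int[] left = new int[size];
        int[] right = new int[size];
        Arrays.fill(left, LEAF);
        Arrays.fill(right, LEAF);
        for (int i = 0; i < splits; i++) {
            left[2 * i] = 2 * i + 1;
            right[2 * i] = 2 * i + 2;
        }
        return new int[][]{left, right};
    }

    public static void main(String[] args) {
        int[][] tree = chainTree(100_000);
        System.out.println(maxDepth(tree[0], tree[1])); // 100000
    }
}
```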

Reusable visitor for (re-)generating score distributions from leaf elements

See jpmml/jpmml-sparkml#93

Initial implementation: https://github.com/vruusmann/rf_feature_impact/blob/master/src/main/java/feature_impact/visitors/ScoreDistributionGenerator.java

Request Support for 'survival::coxph' models

The PMML General Regression specification lists CoxRegression as a model type. Would it be easy to add support for this modeling framework? For example:

test   <- survival::coxph(survival::Surv(futime,fustat) ~ age + rx + ecog.ps, survival::ovarian, x=TRUE)
print(test)
library(r2pmml)
r2pmml(test, "test.pmml")

PMML version in xmlns tag does not match version tag

The PMML files generated by sklearn2pmml appear to have a version discrepancy in the header. Here's the header of a PMML file that I just generated:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_4" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.3">
	<Header>
		<Application name="JPMML-SkLearn" version="1.6.3"/>
		<Timestamp>2020-07-27T18:08:57Z</Timestamp>

Note the discrepancy between PMML-4_4 and version 4.3. Compare this with the header of an older PMML file that we generated a month ago:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_3" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.3">
	<Header>
		<Application name="JPMML-SkLearn" version="1.5.35"/>
		<Timestamp>2020-06-15T20:11:06Z</Timestamp>

I don't know which versions we were using last month, but I'm currently using sklearn2pmml version 0.60.0. The Java version is:

openjdk version "1.8.0_152-release"
OpenJDK Runtime Environment (build 1.8.0_152-release-1056-b12)
OpenJDK 64-Bit Server VM (build 25.152-b12, mixed mode)

The problem is easy to fix for our purposes, but I thought I'd inform you of the bug.
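A quick way to detect this mismatch in generated files, sketched with the JDK's built-in DOM parser. It assumes the namespace URI follows the `http://www.dmg.org/PMML-<major>_<minor>` pattern seen in both headers above; `PmmlHeaderCheck` is a hypothetical helper name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class PmmlHeaderCheck {

    private static final String NAMESPACE_PREFIX = "http://www.dmg.org/PMML-";

    // Extracts "4.4" from "http://www.dmg.org/PMML-4_4"; returns null for other namespaces
    static String namespaceVersion(String namespaceUri) {
        if (namespaceUri == null || !namespaceUri.startsWith(NAMESPACE_PREFIX)) {
            return null;
        }
        return namespaceUri.substring(NAMESPACE_PREFIX.length()).replace('_', '.');
    }

    // True when the root element's namespace version agrees with its version attribute
    static boolean isConsistent(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Element root = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        return root.getAttribute("version").equals(namespaceVersion(root.getNamespaceURI()));
    }

    public static void main(String[] args) throws Exception {
        String header = "<PMML xmlns=\"http://www.dmg.org/PMML-4_4\" version=\"4.3\"/>";
        System.out.println(isConsistent(header)); // false
    }
}
```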

Can you provide a parameter to control the separator in output field names?

Hi @vruusmann, looking at the code, I see that the prediction and probability field names of the output are hard-coded. Is it possible to provide a parameter to control the separator in these names? For example, change probability(1) to probability-1.

Below is the existing naming rule, where the separator is a pair of parentheses:

static
public OutputField createProbabilityField(DataType dataType, String value){
	return createProbabilityField(FieldName.create("probability(" + value + ")"), dataType, value);
}

Thanks a lot.
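A sketch of how the hard-coded `"probability(" + value + ")"` rule could be made configurable. `ProbabilityFieldNamer` and `withSeparator` are hypothetical names, not part of the JPMML-Converter API, and plain strings are used instead of `FieldName` to keep the example self-contained.

```java
public class OutputFieldNaming {

    // Hypothetical naming strategy for probability output fields
    interface ProbabilityFieldNamer {
        String name(String value);
    }

    // Reproduces the current hard-coded rule
    static final ProbabilityFieldNamer PARENTHESES = value -> "probability(" + value + ")";

    // Alternative rule with a user-chosen separator
    static ProbabilityFieldNamer withSeparator(String separator) {
        return value -> "probability" + separator + value;
    }

    public static void main(String[] args) {
        System.out.println(PARENTHESES.name("1"));       // probability(1)
        System.out.println(withSeparator("-").name("1")); // probability-1
    }
}
```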

Failing to prune XGBoost tree models

Hi, I want to use the sklearn2pmml() function to convert a pipeline into a PMML file.

I created the issue below, but I was not able to reopen it, so I created this new issue and copied the content here.
jpmml/jpmml-sklearn#160

Here is my code to create a pipeline. However, I see an error:

RuntimeError: The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams

How can I solve it? My version is 0.73.1

The console output is:

Standard output is empty
Standard error:
Jul 01, 2021 8:33:28 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Jul 01, 2021 8:33:28 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 219 ms.
Jul 01, 2021 8:33:28 PM org.jpmml.sklearn.Main run
INFO: Converting PKL to PMML..
Jul 01, 2021 8:33:30 PM org.jpmml.sklearn.Main run
SEVERE: Failed to convert PKL to PMML
java.lang.IllegalArgumentException
	at org.jpmml.converter.visitors.AbstractTreeModelTransformer.initScore(AbstractTreeModelTransformer.java:173)
	at org.jpmml.converter.visitors.TreeModelPruner.exitNode(TreeModelPruner.java:81)
	at org.jpmml.converter.visitors.AbstractTreeModelTransformer.popParent(AbstractTreeModelTransformer.java:61)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:120)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:90)
	at org.dmg.pmml.tree.TreeModel.accept(TreeModel.java:401)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:90)
	at org.dmg.pmml.mining.Segment.accept(Segment.java:235)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.mining.Segmentation.accept(Segmentation.java:185)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:69)
	at org.dmg.pmml.mining.MiningModel.accept(MiningModel.java:349)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:90)
	at org.dmg.pmml.mining.Segment.accept(Segment.java:235)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
	at org.dmg.pmml.mining.Segmentation.accept(Segmentation.java:185)
	at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:69)
	at org.dmg.pmml.mining.MiningModel.accept(MiningModel.java:349)
	at org.jpmml.model.visitors.AbstractVisitor.applyTo(AbstractVisitor.java:320)
	at org.jpmml.xgboost.Learner.encodeMiningModel(Learner.java:354)
	at xgboost.sklearn.BoosterUtil.encodeBooster(BoosterUtil.java:63)
	at xgboost.sklearn.XGBClassifier.encodeModel(XGBClassifier.java:45)
	at xgboost.sklearn.XGBClassifier.encodeModel(XGBClassifier.java:27)
	at sklearn.Estimator.encode(Estimator.java:83)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:235)
	at org.jpmml.sklearn.Main.run(Main.java:226)
	at org.jpmml.sklearn.Main.main(Main.java:143)


Support for more complex workflows

Several transformers expect that the incoming feature is backed by a DataField element. This causes problems with more functional workflows, where the DataField element has been "replaced" with a DerivedField element.

See: https://groups.google.com/d/msg/jpmml/ellpOHvWyrk/7kskrINNAQAJ

Support TensorFlow

Hi guys,
Is the conversion of TensorFlow models into PMML on your roadmap?

Thank you,
Eliano

Support for `kernlab::ksvm` models

The standard pmml package can convert ksvm objects that have been trained using the kernlab::ksvm function. Unfortunately, the converter implementation is rather limited, because it fails to handle ksvm objects that have been trained by other means.

For example, it is impossible to convert a ksvm object that was trained using the caret package:

library("caret")
library("kernlab")
library("pmml")

iris.ksvm = ksvm(Species ~ ., data = iris)
class(iris.ksvm)

ksvm.pmml = pmml(iris.ksvm, dataset = iris)

iris_x = iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]
iris_y = iris[, c("Species")]

iris.train = train(x = iris_x, y = iris_y, data = iris, method = "svmRadial")
class(iris.train$finalModel)

# Error in if (field$class[[1]][1] == "numeric") { :
#   argument is of length zero
train.pmml = pmml(iris.train$finalModel, dataset = iris)

Controlling scientific notation in PMML document

Hello,

I've been actively using the PySpark2PMML package to write Spark RF models into PMML documents, and I noticed that I sometimes get scientific notation in the output:

<ScoreDistribution value="0" recordCount="2.3252954E7"/>

Is there a way to control whether or not scientific notation is used in the output? I'd prefer that it isn't used, as my C++ parser isn't written to accept it. Thanks!

Patrick Hofmann
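Without assuming that the converter offers such an option, one workaround is to rewrite numeric attribute values without an exponent before they reach the C++ parser. A minimal Java sketch using `BigDecimal`, which turns `2.3252954E7` into `23252954`; `PlainNumberFormat` is a hypothetical name.

```java
import java.math.BigDecimal;

public class PlainNumberFormat {

    // Renders a finite double without an exponent.
    // BigDecimal.valueOf goes through Double.toString, so no spurious binary digits appear;
    // stripTrailingZeros turns "2.0" into "2". NaN and infinities are rejected by BigDecimal.valueOf.
    static String toPlainString(double value) {
        return BigDecimal.valueOf(value).stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(Double.toString(2.3252954E7)); // 2.3252954E7
        System.out.println(toPlainString(2.3252954E7));   // 23252954
    }
}
```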

Ability to show/hide default attribute values

See jpmml/jpmml-xgboost#38
