jpmml / jpmml-lightgbm

Java library and command-line application for converting LightGBM models to PMML

License: GNU Affero General Public License v3.0

Java 90.71% Python 8.18% R 1.12%

jpmml-lightgbm's Introduction

Java API for producing and scoring models in Predictive Model Markup Language (PMML).

IMPORTANT

This is a legacy codebase.

Starting from March 2014, this project has been superseded by the [JPMML-Model](https://github.com/jpmml/jpmml-model) and [JPMML-Evaluator](https://github.com/jpmml/jpmml-evaluator) projects.

Features

Class model

- Full support for PMML 3.0, 3.1, 3.2, 4.0 and 4.1 schemas:
  - Class hierarchy.
  - Schema version annotations.
- Fluent API:
  - Value constructors.
- SAX Locator information.
- [Visitor pattern](http://en.wikipedia.org/wiki/Visitor_pattern):
  - Validation agents.
  - Optimization and transformation agents.

Evaluation engine

- Full support for [DataDictionary](http://www.dmg.org/v4-1/DataDictionary.html) and [MiningSchema](http://www.dmg.org/v4-1/MiningSchema.html) elements:
  - Complete data type system.
  - Complete operational type system. For example, continuous integers, categorical integers and ordinal integers are handled differently in equality check and comparison operations.
  - Detection and treatment of outlier, missing and invalid values.
- Full support for [transformations](http://www.dmg.org/v4-1/Transformations.html) and [functions](http://www.dmg.org/v4-1/Functions.html):
  - Built-in functions.
  - User defined functions (PMML, Java).
- Full support for [Targets](http://www.dmg.org/v4-1/Targets.html) and [Output](http://www.dmg.org/v4-1/Output.html) elements.
- Fully supported model elements:
  - [Association rules](http://www.dmg.org/v4-1/AssociationRules.html)
  - [Cluster model](http://www.dmg.org/v4-1/ClusteringModel.html)
  - [General regression](http://www.dmg.org/v4-1/GeneralRegression.html)
  - [Naive Bayes](http://www.dmg.org/v4-1/NaiveBayes.html)
  - [k-Nearest neighbors](http://www.dmg.org/v4-1/KNN.html)
  - [Neural network](http://www.dmg.org/v4-1/NeuralNetwork.html)
  - [Regression](http://www.dmg.org/v4-1/Regression.html)
  - [Rule set](http://www.dmg.org/v4-1/RuleSet.html)
  - [Scorecard](http://www.dmg.org/v4-1/Scorecard.html)
  - [Support Vector Machine](http://www.dmg.org/v4-1/SupportVectorMachine.html)
  - [Tree model](http://www.dmg.org/v4-1/TreeModel.html)
  - [Ensemble model](http://www.dmg.org/v4-1/MultipleModels.html)
- Fully interoperable with popular open source software:
  - [R](http://www.r-project.org/) and [Rattle](http://rattle.togaware.com/)
  - [KNIME](http://www.knime.org/)
  - [RapidMiner](http://rapid-i.com/content/view/181/190/)

Installation

JPMML library JAR files (together with accompanying Java source and Javadocs JAR files) are released via [Maven Central Repository](http://repo1.maven.org/maven2/org/jpmml/). Please join the [JPMML mailing list](https://groups.google.com/forum/#!forum/jpmml) for release announcements.

The current version is 1.0.22 (17 February, 2014).

Class model

<!-- Class model classes -->
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-model</artifactId>
	<version>${jpmml.version}</version>
</dependency>
<!-- Class model annotations -->
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-schema</artifactId>
	<version>${jpmml.version}</version>
</dependency>

Evaluation engine

<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-evaluator</artifactId>
	<version>${jpmml.version}</version>
</dependency>

Usage

Class model

The class model consists of two types of classes. There is a small number of manually crafted classes that are used for structuring the class hierarchy. They are permanently stored in the Java sources directory /pmml-model/src/main/java. Additionally, there is a much greater number of automatically generated classes that represent actual PMML elements. They can be found in the generated Java sources directory /pmml-model/target/generated-sources/xjc after a successful build operation.

All class model classes descend from class org.dmg.pmml.PMMLObject. Additional class hierarchy levels, if any, represent common behaviour and/or features. For example, all model classes descend from class org.dmg.pmml.Model.

There is not much documentation accompanying class model classes. The application developer should consult the [PMML specification](http://www.dmg.org/v4-1/GeneralStructure.html) about individual PMML elements and attributes.

Example applications

Copying an existing org.dmg.pmml.PMML instance from one file to another file: [CopyExample.java](https://github.com/jpmml/jpmml-example/tree/master/src/main/java/org/jpmml/example/CopyExample.java)
Building a new org.dmg.pmml.PMML instance for the "golfing" decision tree model: [TreeModelBuilderExample.java](https://github.com/jpmml/jpmml-example/blob/master/src/main/java/org/jpmml/example/TreeModelBuilderExample.java)

Evaluation engine

A model evaluator class can be instantiated directly when the contents of the PMML document are known:

PMML pmml = ...;

ModelEvaluator<TreeModel> modelEvaluator = new TreeModelEvaluator(pmml);

Otherwise, a PMML manager class should be instantiated first, which will inspect the contents of the PMML document and instantiate the right model evaluator class later:

PMML pmml = ...;

PMMLManager pmmlManager = new PMMLManager(pmml);
 
ModelEvaluator<?> modelEvaluator = (ModelEvaluator<?>)pmmlManager.getModelManager(null, ModelEvaluatorFactory.getInstance());

Model evaluator classes follow functional programming principles. Model evaluator instances are cheap enough to be created and discarded as needed (i.e. not worth the pooling effort).

It is advisable for application code to work against the org.jpmml.evaluator.Evaluator interface:

Evaluator evaluator = (Evaluator)modelEvaluator;

An evaluator instance can be queried for the definition of active (i.e. independent), predicted (i.e. primary dependent) and output (i.e. secondary dependent) fields:

List<FieldName> activeFields = evaluator.getActiveFields();
List<FieldName> predictedFields = evaluator.getPredictedFields();
List<FieldName> outputFields = evaluator.getOutputFields();

The PMML scoring operation must be invoked with valid arguments. Otherwise, the behaviour of the model evaluator class is unspecified.

The preparation of field values:

Map<FieldName, FieldValue> arguments = new LinkedHashMap<FieldName, FieldValue>();

List<FieldName> activeFields = evaluator.getActiveFields();
for(FieldName activeField : activeFields){
	// The raw (ie. user-supplied) value could be any Java primitive value
	Object rawValue = ...;

	// The raw value is passed through: 1) outlier treatment, 2) missing value treatment, 3) invalid value treatment and 4) type conversion
	FieldValue activeValue = evaluator.prepare(activeField, rawValue);

	arguments.put(activeField, activeValue);
}

The scoring:

Map<FieldName, ?> results = evaluator.evaluate(arguments);

Typically, a model has exactly one predicted field, which is called the target field:

FieldName targetName = evaluator.getTargetField();
Object targetValue = results.get(targetName);

The target value is either a Java primitive value (as a wrapper object) or an instance of org.jpmml.evaluator.Computable:

if(targetValue instanceof Computable){
	Computable computable = (Computable)targetValue;

	Object primitiveValue = computable.getResult();
}

The target value may implement interfaces that descend from interface org.jpmml.evaluator.ResultFeature:

// Test for "entityId" result feature
if(targetValue instanceof HasEntityId){
	HasEntityId hasEntityId = (HasEntityId)targetValue;
	HasEntityRegistry<?> hasEntityRegistry = (HasEntityRegistry<?>)evaluator;
	BiMap<String, ? extends Entity> entities = hasEntityRegistry.getEntityRegistry();
	Entity winner = entities.get(hasEntityId.getEntityId());

	// Test for "probability" result feature
	if(targetValue instanceof HasProbability){
		HasProbability hasProbability = (HasProbability)targetValue;
		Double winnerProbability = hasProbability.getProbability(winner.getId());
	}
}

Example applications

Evaluating a PMML file interactively: [EvaluationExample.java](https://github.com/jpmml/jpmml-example/tree/master/src/main/java/org/jpmml/example/EvaluationExample.java)
Evaluating a PMML file non-interactively with CSV file input: [CsvEvaluationExample.java](https://github.com/jpmml/jpmml-example/tree/master/src/main/java/org/jpmml/example/CsvEvaluationExample.java)

Additional information

Please contact [[email protected]] (mailto:[email protected])

jpmml-lightgbm's Issues

There are some mistakes. Who can help me

lightgbm shuubiasahi$ java -jar target/converter-executable-1.2-SNAPSHOT.jar --lgbm-input /Users/shuubiasahi/Documents/python/credit-tfgan/xml/lightgbm.txt --pmml-output /Users/shuubiasahi/Documents/python/credit-tfgan/xml/lightgbm.pmml
Exception in thread "main" java.lang.NumberFormatException: null
at java.lang.Integer.parseInt(Integer.java:542)
at java.lang.Integer.parseInt(Integer.java:615)
at org.jpmml.lightgbm.Section.getInt(Section.java:46)
at org.jpmml.lightgbm.Tree.load(Tree.java:76)
at org.jpmml.lightgbm.GBDT.load(GBDT.java:108)
at org.jpmml.lightgbm.LightGBMUtil.loadGBDT(LightGBMUtil.java:61)
at org.jpmml.lightgbm.LightGBMUtil.loadGBDT(LightGBMUtil.java:53)
at org.jpmml.lightgbm.Main.run(Main.java:122)
at org.jpmml.lightgbm.Main.main(Main.java:115)
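
A note on reading this trace, offered as a sketch rather than a diagnosis: Integer.parseInt throws NumberFormatException when it is handed null, so the trace suggests that Section.getInt looked up a key that the model file does not contain. Which key is missing cannot be told from the trace alone. The class below is a hypothetical stand-in (SectionSketch and its key=value parsing are assumptions for illustration, not the library's code) showing how such a lookup could fail with the name of the missing key instead:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a "key=value" section of a LightGBM model text file.
class SectionSketch {

	private final Map<String, String> entries = new LinkedHashMap<>();

	SectionSketch(String... lines){
		for(String line : lines){
			int sep = line.indexOf('=');
			if(sep > 0){
				entries.put(line.substring(0, sep), line.substring(sep + 1));
			}
		}
	}

	// Fails with a message that names the missing key,
	// instead of passing null on to Integer.parseInt(String).
	int getInt(String key){
		String value = entries.get(key);
		if(value == null){
			throw new IllegalArgumentException("Missing key \"" + key + "\"");
		}
		return Integer.parseInt(value.trim());
	}
}
```

With a message of this kind, a model file written by a newer or older LightGBM version than the converter expects would at least point at the attribute that differs.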

NumberFormatException happens

I tried the example you have provided with boston dataset. When I converted the model file to PMML, I got the following errors:

Exception in thread "main" java.lang.NumberFormatException: null
	at java.lang.Integer.parseInt(Integer.java:454)
	at java.lang.Integer.parseInt(Integer.java:527)
	at org.jpmml.lightgbm.Section.getInt(Section.java:46)
	at org.jpmml.lightgbm.Tree.load(Tree.java:74)
	at org.jpmml.lightgbm.GBDT.load(GBDT.java:97)
	at org.jpmml.lightgbm.LightGBMUtil.loadGBDT(LightGBMUtil.java:58)
	at org.jpmml.lightgbm.LightGBMUtil.loadGBDT(LightGBMUtil.java:50)
	at org.jpmml.lightgbm.Main.run(Main.java:122)
	at org.jpmml.lightgbm.Main.main(Main.java:115)

How do I deal with it? I have attached my model file:
lightgbm.txt

Thanks!

An error occurred when the dataset contains string features

My dataset contains some string features, and I used StringIndexer + OneHotEncoder to encode them. When I put the StringIndexers, OneHotEncoders, VectorAssembler, and LightGBM in a pipeline, and fit the pipeline, everything is ok.

But when I tried to save the pipeline as PMML, an error occurred. The error log is: Py4JJavaError: An error occurred while calling o353581.buildFile.
: java.lang.IllegalArgumentException: Field userinfov4_worktype has data type string

userinfov4_worktype is one of the string features.
I used jpmml-lightgbm-1.2.13

Fail to convert lightgbm to pmml (NumberFormatException)

I tried to run java -jar jpmml-lightgbm-executable-1.2-SNAPSHOT.jar --lgbm-input lightgbm.txt --pmml-output output.pmml to convert a LightGBM model to PMML format, but encountered:

Exception in thread "main" java.lang.NumberFormatException: null
	at java.lang.Integer.parseInt(Integer.java:542)
	at java.lang.Integer.parseInt(Integer.java:615)
	at org.jpmml.lightgbm.Section.getInt(Section.java:51)
	at org.jpmml.lightgbm.Tree.load(Tree.java:75)
	at org.jpmml.lightgbm.GBDT.load(GBDT.java:111)
	at org.jpmml.lightgbm.LightGBMUtil.loadGBDT(LightGBMUtil.java:59)
	at org.jpmml.lightgbm.LightGBMUtil.loadGBDT(LightGBMUtil.java:51)
	at org.jpmml.lightgbm.Main.run(Main.java:124)
	at org.jpmml.lightgbm.Main.main(Main.java:117)

Attached is the lightgbm model file.
lightgbm.txt

lightgbm version is 2.0.2.

Can someone please kindly help? Thanks much!

Why the converting process is so slow and resource-costing?

I use the jpmml-lightgbm tool to convert a lightgbm model to a pmml model.
The model is simple with num_trees=2 and num_leaves=3, but the converting process is so slow and resource-costing.
Why???
The origin model file of lightgbm is as follows:
lightgbm_model.txt

Error: Could not find or load main class org.jpmml.lightgbm.Main

Error: Could not find or load main class org.jpmml.lightgbm.Main
I am using Java 1.8 and don't know why this happens. Could anyone help with this? I would appreciate it very much. Thanks

Exception in thread "main" java.lang.IllegalArgumentException: Right branch is not selectable

Hey,

Can someone help with this


Exception in thread "main" java.lang.IllegalArgumentException: Right branch is not selectable
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:210)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:235)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:235)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:235)
	at org.jpmml.lightgbm.Tree.encodeTreeModel(Tree.java:96)
	at org.jpmml.lightgbm.ObjectiveFunction.createMiningModel(ObjectiveFunction.java:72)
	at org.jpmml.lightgbm.BinomialLogisticRegression.encodeMiningModel(BinomialLogisticRegression.java:47)
	at org.jpmml.lightgbm.GBDT.encodeMiningModel(GBDT.java:394)
	at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:383)
	at org.jpmml.lightgbm.Main.run(Main.java:132)
	at org.jpmml.lightgbm.Main.main(Main.java:118)

I started getting this error abruptly.

LightGBM version: 3.1.0
jpmml version: 1.2

After updating jpmml to the latest version (1.3.11), I get this error instead:


Sep 20, 2021 11:37:02 PM org.jpmml.lightgbm.Main run
INFO: Loading GBDT..
Sep 20, 2021 11:37:03 PM org.jpmml.lightgbm.Main run
INFO: Loaded GBDT in 485 ms.
Sep 20, 2021 11:37:03 PM org.jpmml.lightgbm.Main run
INFO: Converting GBDT to PMML..
Sep 20, 2021 11:37:03 PM org.jpmml.lightgbm.Main run
SEVERE: Failed to convert GBDT to PMML
java.lang.IllegalArgumentException: Expected all values to be of the same data type, got 2 different data types ([integer, string])
	at org.jpmml.converter.TypeUtil.getDataType(TypeUtil.java:129)
	at org.jpmml.converter.TypeUtil.getDataType(TypeUtil.java:85)
	at org.jpmml.lightgbm.GBDT.encodeSchema(GBDT.java:233)
	at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:384)
	at org.jpmml.lightgbm.Main.run(Main.java:158)
	at org.jpmml.lightgbm.Main.main(Main.java:127)

Exception in thread "main" java.lang.IllegalArgumentException: Expected all values to be of the same data type, got 2 different data types ([integer, string])
	at org.jpmml.converter.TypeUtil.getDataType(TypeUtil.java:129)
	at org.jpmml.converter.TypeUtil.getDataType(TypeUtil.java:85)
	at org.jpmml.lightgbm.GBDT.encodeSchema(GBDT.java:233)
	at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:384)
	at org.jpmml.lightgbm.Main.run(Main.java:158)
	at org.jpmml.lightgbm.Main.main(Main.java:127)

Inconsistent predicted results between Python and Java

We built a binary classification model in Python using lightgbm.train() (https://github.com/microsoft/LightGBM/blob/master/python-package/lightgbm/engine.py#L18) and saved the .pmml file for the Java side to use. Our model includes both continuous and categorical features. It seems the predicted score is different for the same feature input in Python and Java.

The Python side prediction is using model.predict() (https://github.com/microsoft/LightGBM/blob/master/python-package/lightgbm/basic.py#L473). And the Java side predicted score is from getProbability() (https://github.com/jpmml/jpmml-evaluator/blob/master/pmml-evaluator/src/main/java/org/jpmml/evaluator/ProbabilityDistribution.java#L35).

If we get the predicted probability for the same target label, the two are very different. For example, the Python score is 0.003362 and the Java score is 0.09655096497772561.

But if we only include numerical features, the predicted scores are exactly the same.

This is how we cast the categorical features in Python:
df['col'] = df['col'].astype('category')
And this is what the categorical features look like in the PMML:
<DataField name="col" optype="categorical" dataType="string">

Please advise if there's anything we can do to make the predicted scores consistent when including the categorical features. Thanks!

How to cancel the limits of the value ranges.

"schema" : {
"inputFields" : [ {
"id" : "build_year",
"dataType" : "double",
"opType" : "continuous",
"values" : [ "[0.0, 7.611347717403621]" ]
}

I am using "openscoring" to deploy my lightgbm model. When the value of "build_year" is greater than 7.611347717403621 or less than 0.0, the response of the post request is "400". How do I let the deployed model accept the values that are not within the range?

error with lgb's continue training model file

here is my training code:

gbm = lgb.train(params,
                lgb_valid,
                num_boost_round=5,
                evals_result=result,
                init_model = old_model,
                verbose_eval=5,
                learning_rates=lambda iter: 0.05 * (0.99 ** iter),
                valid_sets=lgb_valid,
                feature_name=list(old_feats)#,categorical_feature = cates
               )
gbm.save_model('lgb_continue.txt')

When I try to convert the "lgb_continue.txt" to PMML, something goes wrong:

Exception in thread "main" java.lang.NullPointerException
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:223)
        ...
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
        at org.jpmml.lightgbm.Tree.encodeTreeModel(Tree.java:96)
        at org.jpmml.lightgbm.ObjectiveFunction.createMiningModel(ObjectiveFunction.java:66)
        at org.jpmml.lightgbm.BinomialLogisticRegression.encodeMiningModel(BinomialLogisticRegression.java:47)
        at org.jpmml.lightgbm.GBDT.encodeMiningModel(GBDT.java:298)
        at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:287)
        at org.jpmml.lightgbm.Main.run(Main.java:131)
        at org.jpmml.lightgbm.Main.main(Main.java:117)

If I don't use continued training, everything works fine.

conversion incorrect when category value contains comma

I have a variable with the following category value:

"AMD ARUBA (DRM 2.50.0 / 4.15.0-91-generic, LLVM 9.0.0)"

However during pmml conversion it becomes

  <DataField name="setup" optype="categorical" dataType="string">
  	<Value value="&quot;AMD ARUBA (DRM 2.50.0 / 4.15.0-91-generic"/>
  	<Value value="LLVM 9.0.0)&quot;"/>

It seems this comes from line 628 of https://github.com/jpmml/jpmml-lightgbm/blob/3e64f29a93f51126a5a590d27be987515c780f4c/src/main/java/org/jpmml/lightgbm/GBDT.java

Could you kindly fix it?
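
As an illustration of the kind of fix being requested, here is a minimal sketch of splitting a list on commas while leaving commas inside double quotes alone. QuotedSplitSketch is a hypothetical helper written for this note; it is not the code at the referenced line, and it assumes that category values containing commas are always wrapped in double quotes as in the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: comma splitting that respects double-quoted values.
class QuotedSplitSketch {

	// Splits on commas that are outside double quotes; quotes are kept in the tokens.
	static List<String> split(String text){
		List<String> tokens = new ArrayList<>();
		StringBuilder sb = new StringBuilder();
		boolean inQuotes = false;
		for(int i = 0; i < text.length(); i++){
			char c = text.charAt(i);
			if(c == '\\' && inQuotes && i + 1 < text.length()){
				// Keep escaped characters (e.g. \") verbatim
				sb.append(c).append(text.charAt(++i));
			} else if(c == '"'){
				inQuotes = !inQuotes;
				sb.append(c);
			} else if(c == ',' && !inQuotes){
				tokens.add(sb.toString().trim());
				sb.setLength(0);
			} else {
				sb.append(c);
			}
		}
		tokens.add(sb.toString().trim());
		return tokens;
	}
}
```

Splitting the quoted value from this report together with a second quoted value yields two tokens, with the comma before "LLVM" preserved inside the first one.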

Support for pipeline

Hi Vilu,

Does this package support pipeline like sklearn2pmml?

Thanks,
Bohan

Failing when featureInfo is None

The PMML encoder fails when some of the features are absent or when their featureInfo is None in the LightGBM output model. Specifically, it raises an exception when it encounters featureInfo as None in https://github.com/jpmml/jpmml-lightgbm/blob/master/src/main/java/org/jpmml/lightgbm/GBDT.java#L138

An easy way to reproduce this is to use a feature matrix where an entire column is None. This scenario happens when an entire column is dropped because of some threshold or criterion.

How to use java6 to load pmml model of LightGBM?

I made a PMML model file using the jpmml-lightgbm tool. I have to use jpmml-evaluator-1.1.14, which is compatible with Java 6, to load the model.

But it threw an exception as follows:

Exception in thread "main" java.lang.IllegalArgumentException: http://www.dmg.org/PMML-4_3
	at org.jpmml.schema.Version.forNamespaceURI(Version.java:47)
	at org.jpmml.model.PMMLFilter.updateSource(PMMLFilter.java:111)
	at org.jpmml.model.PMMLFilter.startPrefixMapping(PMMLFilter.java:41)
 	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startNamespaceMapping(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source)

So I changed the root element of the PMML file from the first form below (4.3) to the second form (4.2):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
  <PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  ........

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
  <PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  ........

And the testcase passed!

So, are there any hidden problems with this informal method?
Also, is there a more formal way to load the PMML model of LightGBM in a JDK 6 environment?
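
For reference, the manual edit described above can be sketched in code. PmmlVersionSketch is a hypothetical helper written for this note, not part of any JPMML library. It only relabels the root element, so any construct in the document that an older evaluator does not understand would still fail to load.

```java
// Hypothetical helper that repeats the manual edit from this report.
class PmmlVersionSketch {

	// Rewrites the namespace and version attribute of the <PMML> start tag only;
	// the XML declaration and all nested elements are left untouched.
	static String downgradeTo42(String pmml){
		int start = pmml.indexOf("<PMML");
		int end = (start < 0) ? -1 : pmml.indexOf('>', start);
		if(end < 0){
			return pmml;
		}
		String rootTag = pmml.substring(start, end + 1)
			.replace("http://www.dmg.org/PMML-4_3", "http://www.dmg.org/PMML-4_2")
			.replace("version=\"4.3\"", "version=\"4.2\"");
		return pmml.substring(0, start) + rootTag + pmml.substring(end + 1);
	}
}
```

Restricting the replacement to the start tag avoids touching unrelated attributes that happen to carry the same version string.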

Support `-inf` and `inf` as continuous feature bounds

java.lang.NumberFormatException: For input string: "-inf"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at java.lang.Double.valueOf(Double.java:502)
at org.jpmml.lightgbm.LightGBMUtil.parseInterval(LightGBMUtil.java:214)
at org.jpmml.lightgbm.GBDT.encodeSchema(GBDT.java:277)
at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:383)
at org.jpmml.lightgbm.Main.run(Main.java:167)
at org.jpmml.lightgbm.Main.main(Main.java:136)

Exception in thread "main" java.lang.NumberFormatException: For input string: "-inf"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at java.lang.Double.valueOf(Double.java:502)
at org.jpmml.lightgbm.LightGBMUtil.parseInterval(LightGBMUtil.java:214)
at org.jpmml.lightgbm.GBDT.encodeSchema(GBDT.java:277)
at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:383)
at org.jpmml.lightgbm.Main.run(Main.java:167)
at org.jpmml.lightgbm.Main.main(Main.java:136)

Hi,
Most of the converted LightGBM models contain inf values in their feature ranges, for example:
[-inf:603.33333333333337] [-inf:194.71428571428572] [0.0023538984627008803:151.26666666666668] [0:inf] [-inf:inf]
If the model contains such values, the errors above occur.
How can this problem be solved?
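
The trace shows Double.valueOf rejecting the token "-inf": Java's parser accepts "Infinity" and "-Infinity", but not "inf". A minimal sketch of a tolerant parser for the [low:high] notation above follows. IntervalSketch and its method names are hypothetical and are not the library's LightGBMUtil.parseInterval.

```java
// Hypothetical parser for "[low:high]" ranges that may contain inf tokens.
class IntervalSketch {

	// Maps LightGBM-style infinity tokens to Java doubles;
	// every other token goes to Double.parseDouble.
	static double parseBound(String token){
		String s = token.trim();
		switch(s){
			case "inf":
			case "+inf":
				return Double.POSITIVE_INFINITY;
			case "-inf":
				return Double.NEGATIVE_INFINITY;
			default:
				return Double.parseDouble(s);
		}
	}

	// Parses "[low:high]" into a two-element array {low, high}.
	static double[] parseInterval(String text){
		String s = text.trim();
		if(!s.startsWith("[") || !s.endsWith("]")){
			throw new IllegalArgumentException(text);
		}
		String[] parts = s.substring(1, s.length() - 1).split(":");
		if(parts.length != 2){
			throw new IllegalArgumentException(text);
		}
		return new double[]{parseBound(parts[0]), parseBound(parts[1])};
	}
}
```

An unbounded side could then be mapped to an open-ended interval in the PMML, or the interval could be omitted entirely when both sides are infinite. Which of these the converter should do is a design decision this sketch does not settle.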

Error with NullPointerException

Hi,
When I run "java -jar target/jpmml-lightgbm-executable-1.3-SNAPSHOT.jar --lgbm-input model.txt --pmml-output lightgbm.pmml", there was an error with the following information:

Exception in thread "main" java.lang.NullPointerException
	at org.jpmml.lightgbm.LightGBMUtil.parseStringArray(LightGBMUtil.java:100)
	at org.jpmml.lightgbm.LightGBMUtil.parseIntArray(LightGBMUtil.java:111)
	at org.jpmml.lightgbm.Section.getIntArray(Section.java:62)
	at org.jpmml.lightgbm.Tree.load(Tree.java:78)
	at org.jpmml.lightgbm.GBDT.load(GBDT.java:119)
	at org.jpmml.lightgbm.LightGBMUtil.loadGBDT(LightGBMUtil.java:53)
	at org.jpmml.lightgbm.LightGBMUtil.loadGBDT(LightGBMUtil.java:45)
	at org.jpmml.lightgbm.Main.run(Main.java:141)
	at org.jpmml.lightgbm.Main.main(Main.java:134)

so, how to solve it?

fail for : java.lang.NumberFormatException: For input string: "inf"

fail for

java.lang.NumberFormatException: For input string: "inf"
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
        at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
        at java.lang.Double.parseDouble(Double.java:538)
        at java.lang.Double.valueOf(Double.java:502)
        at org.jpmml.lightgbm.LightGBMUtil.parseInterval(LightGBMUtil.java:213)

Support for boolean features

Encountered the following exception, when training a binary classifier with a sparse dataset that contains a boolean column ("Audit/Deductions"):

java.lang.IllegalArgumentException
        at org.jpmml.converter.PredicateManager.createArray(PredicateManager.java:73)
        at org.jpmml.converter.PredicateManager.createSimpleSetPredicate(PredicateManager.java:44)
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:217)
        at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:235)

The boolean value space contains exactly two scalar values. It's odd that the values.size() == 1 check didn't fire in PredicateManager#createSimpleSetPredicate(...), which suggests that the values variable is either an empty collection or a two-valued one.

Use `long` to represent `unsigned int` attributes

The cat_threshold attribute may take values that exceed the numeric range of the int data type:
https://groups.google.com/d/msg/jpmml/10uOILNhXY8/wpWrAJjrCQAJ
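
A minimal sketch of the idea, using only the JDK: an unsigned 32-bit value such as 4294967295 does not fit into a Java int, but it parses cleanly into a long. UnsignedSketch is a hypothetical helper, and the bitset layout in contains(...) (32 categories per word, lowest category in the lowest bit) is an assumption about how cat_threshold is encoded, stated here for illustration only.

```java
// Hypothetical helper for reading unsigned 32-bit attributes into long values.
class UnsignedSketch {

	// Parses an unsigned 32-bit decimal token into a long (0 .. 4294967295).
	static long parseUnsignedInt(String token){
		long value = Long.parseLong(token.trim());
		if(value < 0L || value > 0xFFFFFFFFL){
			throw new IllegalArgumentException("Not an unsigned 32-bit value: " + token);
		}
		return value;
	}

	// Assumption: each word is a 32-bit chunk of a bitset,
	// with the lowest category index stored in the lowest bit.
	static boolean contains(long[] words, int category){
		if(category < 0){
			return false;
		}
		int word = category / 32;
		int bit = category % 32;
		return word < words.length && ((words[word] >>> bit) & 1L) == 1L;
	}
}
```

The JDK also offers Integer.parseUnsignedInt and Integer.toUnsignedLong, which keep the bits in an int and widen on demand; using long throughout is simply the more readable option for this note.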

how to transform a lightgbm model into a regression model

Excuse me, how can I transform a LightGBM model into a regression model?

Results from the LightGBM predict method, are not the same as after evaluate method.

I'm creating a lightgbm.basic.Booster from a LightGBM model that was previously trained and saved to a file. After that, I make a prediction using the "predict" method.
From the same saved model file I create a PMML file using the "jpmml-lightgbm" library and make a prediction using the "evaluate" method.
When I compare the results, they are not the same.
What could be the reason for this behavior?

ref #27 #22

#22 #27
Hello, I also encountered this problem. I counted the pandas_categorical entries and the number is right, but the conversion still goes out of bounds. Where could the redundant number come from?

Broken tests

I tried to build this library according to the manual. It builds fine but fails to pass the classifier tests:

org.jpmml.lightgbm.ClassificationTest.txt

I built it with the -DskipTests flag for now, but this is probably something you would want to fix.

OS: macOS 10.13

P.S. Since the difference is in the 15th digit or thereabouts, you could check for approximate equality with a small tolerance such as 1e-10 rather than for exact equality.
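
A minimal sketch of the tolerance check suggested in the P.S., using plain Java. ToleranceSketch is a hypothetical helper and is not how the project's test suite actually compares values. It accepts a difference that is within the tolerance either in absolute terms or relative to the larger of the two magnitudes.

```java
// Hypothetical helper for comparing floating-point predictions with a tolerance.
class ToleranceSketch {

	// True if the values are equal, or differ by at most the tolerance
	// in absolute terms or relative to the larger magnitude.
	static boolean approxEquals(double expected, double actual, double tolerance){
		if(Double.compare(expected, actual) == 0){
			return true;
		}
		if(Double.isInfinite(expected) || Double.isInfinite(actual)){
			// Unequal values where at least one is infinite never match
			return false;
		}
		double diff = Math.abs(expected - actual);
		double scale = Math.max(Math.abs(expected), Math.abs(actual));
		return diff <= tolerance || diff <= tolerance * scale;
	}
}
```

The relative part matters for large predictions, where a difference in the 15th significant digit can exceed a fixed absolute tolerance.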

categorical features size when some are not in trees

I am getting the following exception when converting an LGBM model that was trained with the categorical_feature parameter:

Exception in thread "main" java.lang.IllegalArgumentException
        at org.jpmml.lightgbm.GBDT.encodeSchema(GBDT.java:296)
        at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:381)
        at org.jpmml.lightgbm.Main.run(Main.java:132)
        at org.jpmml.lightgbm.Main.main(Main.java:118)

From looking at the code, I think the reason is that pandasCategoryIndex is incremented only if the feature appears in one of the trees. In my case some of them do not, so the size of pandas_categorical (which has all of the supplied categorical features) doesn't match the calculated size (which is based on appearance in the trees).

Loading pandas categorical breaks when a name contains square bracket character "]"

The code doesn't differentiate between a syntactic "]" and a "]" that is part of a name, and throws:

Exception in thread "main" java.lang.IllegalArgumentException: ...
at org.jpmml.lightgbm.GBDT.loadPandasCategorical(GBDT.java:460)
...

presumably in the second iteration of the while loop.
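
One way to tell the two apart, sketched under the assumption that names in the pandas_categorical value are double-quoted JSON-style strings: scan for the matching bracket while skipping anything inside quotes. BracketSketch is a hypothetical helper and not the code in GBDT.loadPandasCategorical.

```java
// Hypothetical scanner that finds a closing bracket outside of quoted names.
class BracketSketch {

	// Returns the index of the "]" that closes the "[" at position open,
	// ignoring brackets inside double-quoted strings; -1 if there is none.
	static int findClosingBracket(String text, int open){
		if(open < 0 || open >= text.length() || text.charAt(open) != '['){
			throw new IllegalArgumentException("No '[' at index " + open);
		}
		int depth = 0;
		boolean inQuotes = false;
		for(int i = open; i < text.length(); i++){
			char c = text.charAt(i);
			if(inQuotes){
				if(c == '\\'){
					i++; // Skip the escaped character
				} else if(c == '"'){
					inQuotes = false;
				}
			} else if(c == '"'){
				inQuotes = true;
			} else if(c == '['){
				depth++;
			} else if(c == ']'){
				if(--depth == 0){
					return i;
				}
			}
		}
		return -1;
	}
}
```

Since the value looks like JSON, handing it to a JSON parser would be the more robust route. The scanner above only illustrates why a plain search for "]" goes wrong.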

NullPointerException when converting a binary classifier with converter-executable-1.0-SNAPSHOT.jar

I get the error below when trying to convert a saved booster.

The LGBM model file is here: https://gist.github.com/travisbrady/3408dedf1a8217a32eb859337b354648

$ java -jar converter-executable-1.0-SNAPSHOT.jar --lgbm-input models/clf_2017_03_12data.lgb.txt --pmml-output lightgbm.pmml
Exception in thread "main" java.lang.NullPointerException
	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1838)
	at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
	at java.lang.Double.parseDouble(Double.java:538)
	at org.jpmml.lightgbm.Section.getDouble(Section.java:54)
	at org.jpmml.lightgbm.GBDT.load(GBDT.java:76)
	at org.jpmml.lightgbm.LightGBMUtil.loadGBDT(LightGBMUtil.java:56)
	at org.jpmml.lightgbm.LightGBMUtil.loadGBDT(LightGBMUtil.java:48)
	at org.jpmml.lightgbm.Main.run(Main.java:110)
	at org.jpmml.lightgbm.Main.main(Main.java:103)

Interestingly, the PMML converter that comes with LightGBM runs without crashing but generates a file without a single tree in it.

Other details:
Model created with lgb Python bindings

$ java -version
java version "1.8.0_71"
Java(TM) SE Runtime Environment (build 1.8.0_71-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.71-b15, mixed mode)

IllegalArgumentException: Out of range: 12045138254372

The following error appears when trying to convert a model.txt file to a PMML file:

Exception in thread "main" java.lang.IllegalArgumentException: Out of range: 12045138254372
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:202)
	at com.google.common.primitives.Ints.checkedCast(Ints.java:88)
	at org.jpmml.converter.ValueUtil.asInt(ValueUtil.java:80)
	at org.jpmml.converter.ValueUtil.asInteger(ValueUtil.java:88)
	at org.jpmml.lightgbm.LightGBMUtil$2.apply(LightGBMUtil.java:332)
	at org.jpmml.lightgbm.LightGBMUtil$2.apply(LightGBMUtil.java:324)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
	at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:226)
	at org.jpmml.lightgbm.Main.run(Main.java:131)
	at org.jpmml.lightgbm.Main.main(Main.java:117)

the following is the model file
model.txt

I deleted the BRANCHID field which contains value '12045138254372', and the model file has been converted successfully.

jpmml-lightgbm seems broken in current model LightGBM generated

OS: Debian 8.4
LightGBM Python API version: 2.0.6 (installed from pip)
jpmml-lightgbm version: current master

The traceback:

Exception in thread "main" java.lang.IllegalArgumentException
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:153)
	at org.jpmml.lightgbm.Tree.encodeTreeModel(Tree.java:77)
	at org.jpmml.lightgbm.ObjectiveFunction.createMiningModel(ObjectiveFunction.java:67)
	at org.jpmml.lightgbm.BinomialLogisticRegression.encodeMiningModel(BinomialLogisticRegression.java:49)
	at org.jpmml.lightgbm.GBDT.encodeMiningModel(GBDT.java:196)
	at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:188)
	at org.jpmml.lightgbm.Main.run(Main.java:113)
	at org.jpmml.lightgbm.Main.main(Main.java:103)

It looks like the same problem as in pmml.py from LightGBM's repo (microsoft/LightGBM#877). Tree.java didn't deal with num_cat in the original model when iterating over a tree, so decision_type loads the wrong data.

A single tree in my model looks like this:

Tree=0
num_leaves=31
num_cat=0
split_feature=130 137 25 136 43 41 34 136 137 136 44 137 136 21 137 25 35 1 60 136 137 132 138 34 24 2 137 136 137 136
split_gain=5812.6865850533359 4750.7712912423303 4724.2992261658801 4667.4854093973117 2357.4763379315846 2483.3486637981259 6945.2259999360776 2243.4213386228075 2374.0770696816035 1467.3039153231657 1285.9200319876691 1236.355555283842 921.04117468325421 891.75068549756679 882.8234877595678 843.49713683005211 757.18833604357701 602.65069390989947 556.37066702117988 551.82002711453242 1869.1694134594436 759.48473026501597 842.03145439751813 539.22499822972213 529.71317166717029 524.1214251666097 440.46416312502413 416.64349749434041 640.79000547653413 407.67497246622224
threshold=841.5 1020.5 10150 313.5 9.9999996826552254e-21 4.8500749999999995 32907.5 946.5 1894.5 1331 9.8333333333333339 804.5 82.5 362504 1145.5 4565.5 2378.5 9.9999996826552254e-21 1277.5 107.5 616.5 971.5 -27974418597.5 83611.5 1275.5 9.9999996826552254e-21 120.5 196.5 2573.5 145.5
decision_type=2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
left_child=4 2 3 10 5 9 -7 27 15 19 29 -11 -6 -12 -14 -9 -5 -8 -4 -1 26 -22 -23 -15 -17 -10 -21 -3 -29 -2
right_child=1 7 18 16 12 6 17 8 25 11 13 -13 14 23 -16 24 -18 -19 -20 20 21 22 -24 -25 -26 -27 -28 28 -30 -31
leaf_value=-0.18077119270056649 -0.15993369880754066 -0.18453940579902467 0.13216132565084632 0.027432727681506765 -0.19213865580343734 -0.16818294237615483 -0.14436090440678417 -0.10215443431535599 -0.16384608642321105 0.093979443397465959 0.11306532831797048 -0.1286364873173152 -0.14502337944082802 -0.11760013470900144 -0.18429276208529438 0.13068826105671855 -0.085921384051446295 0.16820182122156815 -0.12460733169972585 0.10546737370562512 -0.17406932486954349 -0.16393200539917041 0.07552986625072354 0.023529412115321439 -0.035882059505199085 -0.13294451392918474 -0.078587338110272617 -0.1493108119203401 -0.17757694880208788 -0.11940408491334077
leaf_count=361208 114021 321333 2901 6875 1214601 12157 266 5245 198186 1362 796 3733 23741 12034 646046 1235 3587 3409 382 567 97429 1353 1038 1190 2001 24690 6286 36347 273308 10874
internal_value=0 -1.6756035684553279 -1.3503995807677192 -1.4025586268301011 -1.8564467494467376 -1.7326803161977709 -0.95351187468418397 -1.7331740776603333 -1.5647073570283156 -1.7587615439261188 -1.4995788791707159 -0.6912659470068695 -1.8885516146356272 -0.92524964336661908 -1.8290083265276871 -0.52611720316000476 -0.1143184859491493 1.4557823129251701 1.022844958879074 -1.7703860597032151 -1.6439961377293224 -1.7133640552995393 -0.59974905897114184 -1.0490018148820326 0.276885043263288 -1.6042283601643963 -0.63359112797315043 -1.7949438024177955 -1.7425909479905055 -1.5640498018335403
internal_count=3388201 1015005 152660 149377 2373196 488808 15832 862345 231357 472976 138915 5095 1884388 14020 669787 8481 10462 3675 3283 467881 106673 99820 2391 13224 3236 222876 6853 630988 309655 124895
shrinkage=0.1
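
For readers unfamiliar with this layout, the arrays above can be read as a flat binary tree. The following is a minimal sketch, not the converter's actual code. It assumes that a non-negative child value is the index of another internal node, that a negative child value `c` refers to leaf `~c` (that is, `-c - 1`), and that every split is a plain numerical `value <= threshold` test. Missing-value and categorical handling (encoded in `decision_type`) is deliberately ignored, and the three-leaf tree is a made-up toy, not the tree above.

```python
def parse_tree_section(text):
    """Parse the 'key=value' lines of one Tree section into a dict of token lists."""
    tree = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition("=")
        tree[key.strip()] = value.split()
    return tree


def predict_tree(tree, row):
    """Walk from the root (node 0) to a leaf and return that leaf's value."""
    split_feature = [int(v) for v in tree["split_feature"]]
    threshold = [float(v) for v in tree["threshold"]]
    left_child = [int(v) for v in tree["left_child"]]
    right_child = [int(v) for v in tree["right_child"]]
    leaf_value = [float(v) for v in tree["leaf_value"]]

    node = 0
    while node >= 0:  # negative values mark leaves (assumption stated above)
        if row[split_feature[node]] <= threshold[node]:
            node = left_child[node]
        else:
            node = right_child[node]
    return leaf_value[~node]


# Hypothetical toy tree with two splits and three leaves.
TOY_TREE = """
Tree=0
num_leaves=3
split_feature=0 1
threshold=0.5 2.5
left_child=1 -1
right_child=-3 -2
leaf_value=0.1 0.2 0.3
shrinkage=0.1
"""

tree = parse_tree_section(TOY_TREE)
print(predict_tree(tree, [0.0, 1.0]))
print(predict_tree(tree, [0.0, 3.0]))
print(predict_tree(tree, [1.0, 0.0]))
```

The same two functions should work on the real section above if the row has at least 139 features, subject to the simplifications already listed.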

Support tweedie and gamma distributions

A model that was trained on a dense dataset makes incorrect predictions for sparse datasets

Hi,
I found that the predictions produced by the Python LightGBM model and by the PMML file are different.
This happens when the training data did not contain missing values, but the data being scored does.

Here is the example to show this case.

NPE for a combination of categorical features and empty trees

The real model contains some invalid trees; for example, a tree that does not contain any split_feature values. An NPE then occurs when converting the GBDT to PMML, specifically in "org.jpmml.lightgbm.Tree.isBinary(int feature)" at line 319.
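
A quick way to check a model file for such trees before conversion is to scan its text for Tree sections that have no `split_feature` line. This is a minimal sketch under the assumption that each section starts with a `Tree=<n>` line and ends at the first blank line; the helper name `find_empty_trees` and the sample text are hypothetical.

```python
import re


def find_empty_trees(model_text):
    """Return the ids of Tree sections that do not declare split_feature."""
    empty = []
    # Everything before the first 'Tree=' line is the model header; drop it.
    sections = re.split(r"^Tree=", model_text, flags=re.MULTILINE)[1:]
    for section in sections:
        lines = section.splitlines()
        tree_id = int(lines[0])
        keys = set()
        for line in lines[1:]:
            if not line.strip():
                break  # a blank line closes the section (assumption)
            keys.add(line.partition("=")[0])
        if "split_feature" not in keys:
            empty.append(tree_id)
    return empty


# Made-up two-tree model body: tree 1 is a constant (single-leaf) tree.
SAMPLE = """Tree=0
num_leaves=2
split_feature=0
threshold=0.5
left_child=-1
right_child=-2
leaf_value=0.1 0.2

Tree=1
num_leaves=1
leaf_value=0.05

end of trees
"""

print(find_empty_trees(SAMPLE))
```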

support cross_entropy as objective for lightgbm

Hello,

Thanks for the great work on this project.
I was wondering whether support for the cross entropy objective is on your roadmap.
I have a use case where I need to use numeric probability labels in [0, 1]. I got the following error message. Could you help take a look? Thanks!

Jun 30, 2021 3:56:41 AM org.jpmml.lightgbm.Main run
INFO: Loading GBDT..
Jun 30, 2021 3:56:41 AM org.jpmml.lightgbm.Main run
SEVERE: Failed to load GBDT
java.lang.IllegalArgumentException: cross_entropy
        at org.jpmml.lightgbm.GBDT.loadObjectiveFunction(GBDT.java:529)
        at org.jpmml.lightgbm.GBDT.load(GBDT.java:103)
        at org.jpmml.lightgbm.LightGBMUtil.loadGBDT(LightGBMUtil.java:51)
        at org.jpmml.lightgbm.LightGBMUtil.loadGBDT(LightGBMUtil.java:43)
        at org.jpmml.lightgbm.Main.run(Main.java:137)
        at org.jpmml.lightgbm.Main.main(Main.java:127)

Exception in thread "main" java.lang.IllegalArgumentException: cross_entropy
        at org.jpmml.lightgbm.GBDT.loadObjectiveFunction(GBDT.java:529)
        at org.jpmml.lightgbm.GBDT.load(GBDT.java:103)
        at org.jpmml.lightgbm.LightGBMUtil.loadGBDT(LightGBMUtil.java:51)
        at org.jpmml.lightgbm.LightGBMUtil.loadGBDT(LightGBMUtil.java:43)
        at org.jpmml.lightgbm.Main.run(Main.java:137)
        at org.jpmml.lightgbm.Main.main(Main.java:127)
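
The exception message is simply the objective name that the loader did not recognize. To see which objective a model file declares before running the converter, the header can be inspected directly. This is a minimal sketch assuming the header contains a line of the form `objective=<name> [options]`; the function name `read_objective` and the sample header are hypothetical.

```python
def read_objective(model_text):
    """Return the objective name from the model header, or None if absent."""
    for line in model_text.splitlines():
        if line.startswith("objective="):
            tokens = line[len("objective="):].split()
            return tokens[0] if tokens else None
        if line.startswith("Tree="):
            break  # the header is over; no objective line was found
    return None


# Made-up header fragments for illustration.
print(read_objective("tree\nnum_class=1\nobjective=cross_entropy\n\nTree=0\n"))
print(read_objective("tree\nobjective=binary sigmoid:1\n"))
```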

Support lightgbm (boosting_type="rf") ?

Here is my code,

import numpy as np
import pandas as pd
import lightgbm as lgb  # version 2.3.1
from sklearn2pmml import sklearn2pmml, make_pmml_pipeline # 0.52.0

df_ = pd.DataFrame({"aaaaaaaaaaaaaaaaaa": np.random.rand(10000)})
for i in range(20):
    df_["var_" + str(i)] = np.random.rand(10000)
for i in range(30, 100):
    df_["var_" + str(i)] = np.random.randint(0, 20, 10000)

df_.iloc[-2000:] = np.nan
df_["target"] = np.random.randint(0, 2, 10000)

y = df_["target"]
X = df_.drop("target", axis=1)

model1 = lgb.sklearn.LGBMClassifier(
    **{
        "boosting_type": "gbdt",
        "max_depth": 3,
        "learning_rate": 0.05,
        "n_estimators": 10,
        # "bagging_fraction": 0.8,
        # "bagging_freq": 1,
        # "subsample": 0.8,
        # "subsample_freq": 1,
    }
)
model2 = lgb.sklearn.LGBMClassifier(
    **{
        "boosting_type": "rf",
        "max_depth": 3,
        "learning_rate": 0.05,
        "n_estimators": 10,
        "bagging_fraction": 0.8,
        "bagging_freq": 1,
        "subsample": 0.8,
        "subsample_freq": 1,
    }
)

model1.fit(X, y)
model2.fit(X, y)

df_["model1_p1"] = model1.predict_proba(X)[:, 1]
df_["model2_p1"] = model2.predict_proba(X)[:, 1]

df_.to_csv("input.csv", index=False, encoding="utf-8")

sklearn2pmml(make_pmml_pipeline(
    model1, active_fields=X.columns.tolist(), target_fields="target"), "model1.pmml")
sklearn2pmml(make_pmml_pipeline(
    model2, active_fields=X.columns.tolist(), target_fields="target"), "model2.pmml")

java -cp pmml-evaluator-example-executable-1.4.12.jar org.jpmml.evaluator.EvaluationExample --model model1.pmml --input input.csv --output output1.csv --missing-values "" --separator ","

probability(1) == model1_p1

java -cp pmml-evaluator-example-executable-1.4.12.jar org.jpmml.evaluator.EvaluationExample --model model2.pmml --input input.csv --output output2.csv --missing-values "" --separator ","

probability(1) != model2_p1 :( ???

Integration tests fail on Mac OS

Hi!

I got the following error during installation. I fixed it by editing ClassificationAuditInvalid for those 3 predictions. I guess a difference in the 16th decimal place is not critical, but the project does not build unless this test passes.

[INFO] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.62 s - in org.jpmml.lightgbm.RegressionTest
[INFO] Running org.jpmml.lightgbm.ClassificationTest
Conflict{id=478, arguments={Age=50, Employment=Private, Education=HSgrad, Marital=-999, Occupation=Repair, Income=59745.14, Gender=Male, Deductions=FALSE, Hours=40, Adjusted=1}, difference=not equal: value differences={probability(1)=(0.12367296373967719, 0.12367296373967716)}}
Conflict{id=831, arguments={Age=26, Employment=Consultant, Education=College, Marital=-999, Occupation=Repair, Income=120415.46, Gender=Male, Deductions=FALSE, Hours=30, Adjusted=0}, difference=not equal: value differences={probability(1)=(0.0280894042937836, 0.028089404293783607)}}
Conflict{id=864, arguments={Age=34, Employment=Private, Education=HSgrad, Marital=Divorced, Occupation=Clerical, Income=-999, Gender=Male, Deductions=FALSE, Hours=40, Adjusted=0}, difference=not equal: value differences={probability(1)=(0.027595431517656793, 0.0275954315176568)}}
[ERROR] Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.461 s <<< FAILURE! - in org.jpmml.lightgbm.ClassificationTest
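
To put the three conflicts in perspective, the pairs can be compared with a tolerance instead of exact equality. This is a minimal sketch; the tolerance of 1e-13 is an arbitrary choice made for illustration and is not the setting used by the project's test harness.

```python
import math

# Value pairs copied from the Conflict lines above, keyed by row id.
PAIRS = {
    478: (0.12367296373967719, 0.12367296373967716),
    831: (0.0280894042937836, 0.028089404293783607),
    864: (0.027595431517656793, 0.0275954315176568),
}


def within_tolerance(first, second, rel_tol=1e-13, abs_tol=1e-13):
    """Compare two probabilities while ignoring last-digit rounding noise."""
    return math.isclose(first, second, rel_tol=rel_tol, abs_tol=abs_tol)


for row_id, (first, second) in PAIRS.items():
    print(row_id, abs(first - second), within_tolerance(first, second))
```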

MAC OS - 10.15.2
Python 3.7.3
lightgbm 2.3.1
maven 3.6.3
java version "13.0.1" 2019-10-15
Java(TM) SE Runtime Environment (build 13.0.1+9)
Java HotSpot(TM) 64-Bit Server VM (build 13.0.1+9, mixed mode, sharing)

One more small thing:
lgbm.fit(boston.data, boston.target, feature_name = boston.feature_names)
boston.feature_names is of type 'numpy.ndarray', and it has to be converted to a list before being passed to the fit function (at least for my Python and module versions).

Fail to convert lightgbm to pmml (IndexOutOfBoundsException)

When I run java -jar jpmml-lightgbm-executable-1.2-SNAPSHOT.jar --lgbm-input aa.txt --pmml-output lightgbm.pmml to convert a LightGBM model to PMML format, I encounter the following error:

Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.get(ArrayList.java:433)
at org.jpmml.lightgbm.Tree.selectValues(Tree.java:244)
at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:155)
at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:191)
at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:191)
at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:191)
at org.jpmml.lightgbm.Tree.encodeTreeModel(Tree.java:94)
at org.jpmml.lightgbm.ObjectiveFunction.createMiningModel(ObjectiveFunction.java:66)
at org.jpmml.lightgbm.BinomialLogisticRegression.encodeMiningModel(BinomialLogisticRegression.java:48)
at org.jpmml.lightgbm.GBDT.encodeMiningModel(GBDT.java:285)
at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:274)
at org.jpmml.lightgbm.Main.run(Main.java:131)
at org.jpmml.lightgbm.Main.main(Main.java:117)

Attached is the lightgbm model file.
aa.txt

lightgbm version is 2.2.3

Can someone please kindly help? Thanks much!

Support for empty (constant) boosters

ERROR - Convert lightgbm model to pmml error! b'Exception in thread "main" java.lang.NullPointerException
org.jpmml.lightgbm.LightGBMUtil.parseStringArray(LightGBMUtil.java:100)
org.jpmml.lightgbm.LightGBMUtil.parseIntArray(LightGBMUtil.java:111)
org.jpmml.lightgbm.Section.getIntArray(Section.java:62)
org.jpmml.lightgbm.Tree.load(Tree.java:79)
org.jpmml.lightgbm.GBDT.load(GBDT.java:119)
org.jpmml.lightgbm.LightGBMUtil.loadGBDT(LightGBMUtil.java:53)
org.jpmml.lightgbm.LightGBMUtil.loadGBDT(LightGBMUtil.java:45)
org.jpmml.lightgbm.Main.run(Main.java:125)
org.jpmml.lightgbm.Main.main(Main.java:118)

possible to add some settings?

1\ Is it possible to set a default value for missing or invalid values of a categorical feature? The system may raise an error when it meets a value like test1=4 (not in (0,1,2,3)).

<DataField name="test1" optype="categorical" dataType="integer">
	<Value value="0"/>
	<Value value="1"/>
	<Value value="2"/>
	<Value value="3"/>
</DataField>

2\ As shown below, is it possible to turn off the margin setting? The system may raise an error when it meets a value like 200 (>100).

<DataField name="test2" optype="continuous" dataType="double">
	<Interval closure="closedClosed" leftMargin="0.0" rightMargin="100.0" />
</DataField>
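
Both questions come down to how a scoring engine sorts an input into valid, invalid or missing before the model sees it. The following is a simplified sketch of that decision for the two DataField declarations above; it is an illustration of the idea, not JPMML code, and the function name `classify_value` is hypothetical.

```python
def classify_value(value, valid_values=None, interval=None):
    """Classify one input as 'missing', 'invalid' or 'valid'.

    valid_values: the declared <Value> entries of a categorical field.
    interval: (leftMargin, rightMargin) of a closedClosed <Interval>.
    """
    if value is None:
        return "missing"
    if valid_values is not None and value not in valid_values:
        return "invalid"
    if interval is not None:
        low, high = interval
        if not (low <= value <= high):
            return "invalid"
    return "valid"


print(classify_value(4, valid_values={0, 1, 2, 3}))   # test1=4
print(classify_value(200.0, interval=(0.0, 100.0)))   # test2=200
print(classify_value(50.0, interval=(0.0, 100.0)))
```

What happens to an invalid value afterwards (an error, or treatment as missing) is a separate per-field setting in the mining schema.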

Error converting model output txt to PMML

Got the following error when converting txt to PMML

Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 122, Size: 1
	at java.util.ArrayList.rangeCheck(Unknown Source)
	at java.util.ArrayList.get(Unknown Source)
	at org.jpmml.lightgbm.GBDT.encodeSchema(GBDT.java:233)
	at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:384)
	at org.jpmml.lightgbm.Main.run(Main.java:132)
	at org.jpmml.lightgbm.Main.main(Main.java:118)

For security reasons I couldn't attach the model txt file. But could you explain what the error means? I am trying to see if I can give you a toy example.

Detect and normalize poorly encoded categorical splits

When categorical features are binarized (e.g. using LabelBinarizer) instead of integer-encoded (e.g. using LabelEncoder), splits on categorical values are encoded as double comparisons against the reference value 1.0000000180025095E-35 (a tiny positive value just above 0):

<Node id="8" score="0.0745191789134865" recordCount="39">
            <SimplePredicate field="lookup(Employment)" operator="lessOrEqual" value="1.0000000180025095E-35"/>
</Node>

It would be much more transparent and space efficient to encode the same as integer comparisons against 0 and 1 reference values:

<Node id="8" score="0.0745191789134865" recordCount="39">
            <SimplePredicate field="lookup(Employment)" operator="equal" value="0"/>
</Node>

and/or:

<Node id="8" score="0.0745191789134865" recordCount="39">
            <SimplePredicate field="lookup(Employment)" operator="notEqual" value="1"/>
</Node>
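
The proposed rewrite relies on a simple equivalence: for a field that only takes the values 0 and 1, `x <= t` with `0 < t < 1` holds exactly when `x == 0`. Here is a minimal sketch of such a normalization rule, assuming the set of field values is known; the function name and the tuple representation of a predicate are hypothetical and not part of the converter.

```python
# Reference value copied from the SimplePredicate above.
ZERO_THRESHOLD = 1.0000000180025095e-35


def normalize_binary_split(operator, value, field_values):
    """Rewrite a threshold test on a 0/1 field as an equality test.

    Predicates on non-binary fields, or with thresholds outside (0, 1),
    are returned unchanged.
    """
    if set(field_values) == {0, 1} and 0 < value < 1:
        if operator == "lessOrEqual":
            return ("equal", 0)
        if operator == "greaterThan":
            return ("equal", 1)
    return (operator, value)


print(normalize_binary_split("lessOrEqual", ZERO_THRESHOLD, [0, 1]))
print(normalize_binary_split("lessOrEqual", 841.5, [0, 1, 2000]))
```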

Converting lightgbm.txt to lightgbm.pmml failed.

It's a real LightGBM model file, but the conversion to PMML failed.
Some values in a section are null, such as "left_child", so an NPE occurs in parseStringArray of LightGBMUtil at line 100.

The model was built with LightGBM version 3.0.0.

How can I resolve this?

Exception during converting process

I ran the conversion on my small LightGBM model and it failed with an exception. How can I fix it?

lgbm_model.txt

Imported the project, cannot find "PandasCategoricalParser"

"PandasCategoricalParser" is used in "GBDT"; where does it come from?

Example to load the jpmml-lightgbm file and use it to predict

Hi @vruusmann,

The first thing I want to say is thank you for this project.
I found it really helpful for me when I apply lightgbm for a java application.

It would be even better to have an example that loads the LightGBM PMML file and uses the model (in Java) to make predictions for an input data file.

Could you please point me to some more documentation about using the output (the LightGBM PMML file)?

Any help would be appreciated.
Thanks.
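
One option that needs no Java code is the JPMML-Evaluator example application, whose command line is quoted in the `boosting_type="rf"` report on this page. The following sketch only assembles and runs that same command from Python. The jar name, class name and flags are copied from that report and may differ in other versions; the helper names and the file names in the usage line are hypothetical.

```python
import subprocess


def build_evaluation_command(jar, model, input_csv, output_csv):
    """Assemble the EvaluationExample command line as an argument list."""
    return [
        "java", "-cp", jar, "org.jpmml.evaluator.EvaluationExample",
        "--model", model,
        "--input", input_csv,
        "--output", output_csv,
        "--missing-values", "",
        "--separator", ",",
    ]


def run_evaluation(jar, model, input_csv, output_csv):
    """Run the command; raises CalledProcessError if Java exits with an error."""
    command = build_evaluation_command(jar, model, input_csv, output_csv)
    subprocess.run(command, check=True)


command = build_evaluation_command(
    "pmml-evaluator-example-executable-1.4.12.jar",
    "lightgbm.pmml", "input.csv", "output.csv",
)
print(" ".join(command))
```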

categorical features support

It seems that categorical features aren't supported.
Are there any plans for this?

Mapped Value in Datafield

I am getting features in the PMML file as:
DataField name="screen_size" optype="categorical" dataType="integer"
Value value="1"
Value value="3"
Value value="4"
Value value="5"

whereas the original screen_size feature values look like
"1440x900"
" 412x732"
"360x640"
"414x736 "

A similar thing is happening for all features.

Exception in thread "main" java.lang.IllegalArgumentException: Right branch is not selectable

Hey,

Can someone help with this

Command run: java -jar jpmml-lightgbm-executable-1.2.jar --lgbm-input lightgbm_2021092012 --pmml-output test.pmml


Exception in thread "main" java.lang.IllegalArgumentException: Right branch is not selectable
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:210)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:235)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:236)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:235)
	at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:235)
	at org.jpmml.lightgbm.Tree.encodeTreeModel(Tree.java:96)
	at org.jpmml.lightgbm.ObjectiveFunction.createMiningModel(ObjectiveFunction.java:72)
	at org.jpmml.lightgbm.BinomialLogisticRegression.encodeMiningModel(BinomialLogisticRegression.java:47)
	at org.jpmml.lightgbm.GBDT.encodeMiningModel(GBDT.java:394)
	at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:383)
	at org.jpmml.lightgbm.Main.run(Main.java:132)
	at org.jpmml.lightgbm.Main.main(Main.java:118)

getting this error abruptly

lightGBM version: 3.1.0
jpmml version: 1.2

After updating jpmml to the latest version, i.e. 1.3.11, I am getting this error:


Sep 20, 2021 11:37:02 PM org.jpmml.lightgbm.Main run
INFO: Loading GBDT..
Sep 20, 2021 11:37:03 PM org.jpmml.lightgbm.Main run
INFO: Loaded GBDT in 485 ms.
Sep 20, 2021 11:37:03 PM org.jpmml.lightgbm.Main run
INFO: Converting GBDT to PMML..
Sep 20, 2021 11:37:03 PM org.jpmml.lightgbm.Main run
SEVERE: Failed to convert GBDT to PMML
java.lang.IllegalArgumentException: Expected all values to be of the same data type, got 2 different data types ([integer, string])
	at org.jpmml.converter.TypeUtil.getDataType(TypeUtil.java:129)
	at org.jpmml.converter.TypeUtil.getDataType(TypeUtil.java:85)
	at org.jpmml.lightgbm.GBDT.encodeSchema(GBDT.java:233)
	at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:384)
	at org.jpmml.lightgbm.Main.run(Main.java:158)
	at org.jpmml.lightgbm.Main.main(Main.java:127)

Exception in thread "main" java.lang.IllegalArgumentException: Expected all values to be of the same data type, got 2 different data types ([integer, string])
	at org.jpmml.converter.TypeUtil.getDataType(TypeUtil.java:129)
	at org.jpmml.converter.TypeUtil.getDataType(TypeUtil.java:85)
	at org.jpmml.lightgbm.GBDT.encodeSchema(GBDT.java:233)
	at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:384)
	at org.jpmml.lightgbm.Main.run(Main.java:158)
	at org.jpmml.lightgbm.Main.main(Main.java:127)
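
The message says that the values of a single field mix integers and strings. One place where such values can live is the `pandas_categorical:` line at the end of a model file written from a pandas DataFrame. Whether that line is the cause here is an assumption, since the stack trace only shows `GBDT.encodeSchema`. This is a minimal sketch that lists the categorical features whose category values have more than one type; the function name and the sample text are hypothetical.

```python
import json

PREFIX = "pandas_categorical:"


def find_mixed_type_categories(model_text):
    """Map categorical feature position -> sorted type names, for mixed-type features only."""
    categories = None
    for line in model_text.splitlines():
        if line.startswith(PREFIX):
            categories = json.loads(line[len(PREFIX):])
            break
    if not categories:  # line absent, 'null', or an empty list
        return {}
    mixed = {}
    for index, values in enumerate(categories):
        type_names = sorted({type(v).__name__ for v in values})
        if len(type_names) > 1:
            mixed[index] = type_names
    return mixed


# Made-up tail of a model file: the first feature mixes strings and an integer.
SAMPLE_TAIL = 'end of parameters\n\npandas_categorical:[["a", "b", 3], [1, 2]]\n'
print(find_mixed_type_categories(SAMPLE_TAIL))
```

If a feature is reported, casting that column to a single type before training is one way to avoid the mix.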

Command-line option for promoting `NaN` values from invalid to missing status

See #33

Both LightGBM and XGBoost treat NaN as missing values. This is in conflict with (J)PMML conventions, which treat NaN as invalid values.

The solution would be to generate MiningField@invalidValueTreatment="asMissing" attributes, which would cause the model to promote all invalid values (including NaN) to missing values.
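
Until such an option exists, the same effect can be approximated by post-processing an already generated PMML file. This is a minimal sketch using only the Python standard library, not the converter's implementation. It sets the attribute on every active MiningField and leaves target fields alone; the function name and the toy PMML fragment (which is not a complete, schema-valid document) are hypothetical.

```python
import xml.etree.ElementTree as ET


def promote_invalid_to_missing(pmml_text):
    """Return PMML text with invalidValueTreatment="asMissing" on active fields."""
    root = ET.fromstring(pmml_text)
    # Keep the document's default namespace when serializing.
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}")[0])
    for element in root.iter():
        if element.tag.split("}")[-1] != "MiningField":
            continue
        if element.get("usageType", "active") != "active":
            continue  # skip target and other non-input fields
        element.set("invalidValueTreatment", "asMissing")
    return ET.tostring(root, encoding="unicode")


# Toy fragment for illustration only.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <TreeModel functionName="regression">
    <MiningSchema>
      <MiningField name="y" usageType="target"/>
      <MiningField name="x1"/>
    </MiningSchema>
  </TreeModel>
</PMML>"""

print(promote_invalid_to_missing(SAMPLE_PMML))
```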

Lightgbm categorical_feature predicate wrong

Hi, Mr. Ruusmann. I used jpmml-lightgbm 1.1.4 and successfully got a PMML file from a pipeline, but I found the following error in the PMML, which makes verification fail.

A label-encoded categorical feature is used in this SimpleSetPredicate. As PMML itself demands, the array type should be "int". However, the values in the array are actually the original string values from before label encoding.
I think there must be a mistake.
Did I use your tool incorrectly, or is there a bug?
I read your Java code, and I think the bug may be in the encodeFeatures part. I'm not very sure, since I'm not good at Java. Hope this helps.

not support objective='quantile'

I tried to fit a quantile LightGBM model using the PMML API, but it did not work.
The output was:

Standard output is empty
Standard error:
Dec 26, 2019 4:19:51 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Dec 26, 2019 4:19:51 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 194 ms.
Dec 26, 2019 4:19:51 PM org.jpmml.sklearn.Main run
INFO: Converting..
Dec 26, 2019 4:19:51 PM org.jpmml.sklearn.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: quantile
        at org.jpmml.lightgbm.GBDT.loadObjectiveFunction(GBDT.java:521)
        at org.jpmml.lightgbm.GBDT.load(GBDT.java:105)
        at org.jpmml.lightgbm.LightGBMUtil.loadGBDT(LightGBMUtil.java:53)
        at lightgbm.sklearn.Booster.loadGBDT(Booster.java:54)
        at lightgbm.sklearn.Booster.getGBDT(Booster.java:42)
        at lightgbm.sklearn.BoosterUtil.getGBDT(BoosterUtil.java:68)
        at lightgbm.sklearn.BoosterUtil.getNumberOfFeatures(BoosterUtil.java:37)
        at lightgbm.sklearn.LGBMRegressor.getNumberOfFeatures(LGBMRegressor.java:34)
        at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:227)
        at org.jpmml.sklearn.Main.run(Main.java:145)
        at org.jpmml.sklearn.Main.main(Main.java:94)

Exception in thread "main" java.lang.IllegalArgumentException: quantile
        at org.jpmml.lightgbm.GBDT.loadObjectiveFunction(GBDT.java:521)
        at org.jpmml.lightgbm.GBDT.load(GBDT.java:105)
        at org.jpmml.lightgbm.LightGBMUtil.loadGBDT(LightGBMUtil.java:53)
        at lightgbm.sklearn.Booster.loadGBDT(Booster.java:54)
        at lightgbm.sklearn.Booster.getGBDT(Booster.java:42)
        at lightgbm.sklearn.BoosterUtil.getGBDT(BoosterUtil.java:68)
        at lightgbm.sklearn.BoosterUtil.getNumberOfFeatures(BoosterUtil.java:37)
        at lightgbm.sklearn.LGBMRegressor.getNumberOfFeatures(LGBMRegressor.java:34)
        at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:227)
        at org.jpmml.sklearn.Main.run(Main.java:145)
        at org.jpmml.sklearn.Main.main(Main.java:94)

Traceback (most recent call last):
  File "train.py", line 48, in <module>
    train(debug=True, boost_type='lgb', is_quantile=1, is_pmml=True)
  File "train.py", line 39, in train
    lgb_q.train_and_save(is_pmml=is_pmml)
  File "/home/jupyter/code/AdPutting/model.py", line 104, in train_and_save
    self.regressor.set_params(alpha=0.5)
  File "/home/jupyter/code/AdPutting/model.py", line 98, in __train_pmml
    pipeline.fit(self.train[self.feats], self.train['y'], regressor__categorical_feature=cat_indices)
  File "/home/jupyter/anaconda3/envs/py37/lib/python3.7/site-packages/sklearn2pmml/__init__.py", line 265, in sklearn2pmml
    raise RuntimeError("The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams")
RuntimeError: The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams

Possible to support LGBMRanker?

Is it possible to support https://github.com/Microsoft/LightGBM/blob/master/python-package/lightgbm/sklearn.py#L639 ?
If so I'd be happy to contribute, just not sure if it's even doable.

thanks
