
jpmml-h2o's Introduction

Java API for producing and scoring models in Predictive Model Markup Language (PMML).

IMPORTANT

This is a legacy codebase.

Starting from March 2014, this project has been superseded by the [JPMML-Model](https://github.com/jpmml/jpmml-model) and [JPMML-Evaluator](https://github.com/jpmml/jpmml-evaluator) projects.

Features

Class model

  • Full support for PMML 3.0, 3.1, 3.2, 4.0 and 4.1 schemas:
    • Class hierarchy.
    • Schema version annotations.
  • Fluent API:
    • Value constructors.
  • SAX Locator information
  • [Visitor pattern](http://en.wikipedia.org/wiki/Visitor_pattern):
    • Validation agents.
    • Optimization and transformation agents.
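
The visitor-based agents listed above can be pictured with a minimal, self-contained sketch. All names here (`VisitorSketch`, `Node`, `Visitor`, `findUnnamed`) are hypothetical stand-ins invented for illustration; they are not the JPMML class model types, and the real visitor API may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature class model; NOT the org.dmg.pmml classes
final class VisitorSketch {

	interface Visitor {
		// Return false to skip the children of this node
		boolean visit(Node node);
	}

	static class Node {
		final String name;
		final List<Node> children = new ArrayList<>();

		Node(String name){
			this.name = name;
		}

		// Appends a child and returns this node, to allow chaining
		Node add(Node child){
			children.add(child);
			return this;
		}

		// Depth-first traversal; the visitor sees the parent before its children
		void accept(Visitor visitor){
			if(visitor.visit(this)){
				for(Node child : children){
					child.accept(visitor);
				}
			}
		}
	}

	// A toy "validation agent": collects all nodes that have an empty name
	static List<Node> findUnnamed(Node root){
		List<Node> unnamed = new ArrayList<>();
		root.accept(node -> {
			if(node.name.isEmpty()){
				unnamed.add(node);
			}
			return true;
		});
		return unnamed;
	}

	// A toy "inspection agent": counts all nodes in the tree
	static int count(Node root){
		int[] counter = {0};
		root.accept(node -> {
			counter[0]++;
			return true;
		});
		return counter[0];
	}
}
```

The point of the design is that new agents can be added as new visitors without touching the node classes themselves.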

Evaluation engine

Installation

JPMML library JAR files (together with accompanying Java source and Javadocs JAR files) are released via the [Maven Central Repository](http://repo1.maven.org/maven2/org/jpmml/). Please join the [JPMML mailing list](https://groups.google.com/forum/#!forum/jpmml) for release announcements.

The current version is 1.0.22 (17 February, 2014).

Class model

<!-- Class model classes -->
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-model</artifactId>
	<version>${jpmml.version}</version>
</dependency>
<!-- Class model annotations -->
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-schema</artifactId>
	<version>${jpmml.version}</version>
</dependency>

Evaluation engine

<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-evaluator</artifactId>
	<version>${jpmml.version}</version>
</dependency>

Usage

Class model

The class model consists of two types of classes. There is a small number of manually crafted classes that are used for structuring the class hierarchy. They are permanently stored in the Java sources directory /pmml-model/src/main/java. Additionally, there is a much greater number of automatically generated classes that represent actual PMML elements. They can be found in the generated Java sources directory /pmml-model/target/generated-sources/xjc after a successful build operation.

All class model classes descend from class org.dmg.pmml.PMMLObject. Additional class hierarchy levels, if any, represent common behaviour and/or features. For example, all model classes descend from class org.dmg.pmml.Model.

There is not much documentation accompanying class model classes. The application developer should consult the [PMML specification](http://www.dmg.org/v4-1/GeneralStructure.html) about individual PMML elements and attributes.

Example applications

Evaluation engine

A model evaluator class can be instantiated directly when the contents of the PMML document are known:

PMML pmml = ...;

ModelEvaluator<TreeModel> modelEvaluator = new TreeModelEvaluator(pmml);

Otherwise, a PMML manager class should be instantiated first, which will inspect the contents of the PMML document and instantiate the right model evaluator class later:

PMML pmml = ...;

PMMLManager pmmlManager = new PMMLManager(pmml);
 
ModelEvaluator<?> modelEvaluator = (ModelEvaluator<?>)pmmlManager.getModelManager(null, ModelEvaluatorFactory.getInstance());

Model evaluator classes follow functional programming principles. Model evaluator instances are cheap enough to be created and discarded as needed (i.e. not worth the pooling effort).
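
The manager-based lookup described above amounts to dispatching on the model type found in the document. The following is only a sketch of that idea, assuming a simple registry keyed by class; `EvaluatorFactorySketch` and every type inside it are hypothetical and do not reproduce how `PMMLManager` or `ModelEvaluatorFactory` are actually implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of type-based evaluator lookup; NOT the JPMML implementation
final class EvaluatorFactorySketch {

	interface Model {}

	static final class TreeModel implements Model {}

	static final class RegressionModel implements Model {}

	static final class UnknownModel implements Model {}

	interface Evaluator {
		String describe();
	}

	// Maps a model class to a function that builds a matching evaluator
	private static final Map<Class<? extends Model>, Function<Model, Evaluator>> REGISTRY = new HashMap<>();

	static {
		REGISTRY.put(TreeModel.class, model -> () -> "tree evaluator");
		REGISTRY.put(RegressionModel.class, model -> () -> "regression evaluator");
	}

	// Creates a fresh evaluator per call; nothing is pooled or cached
	static Evaluator newEvaluator(Model model){
		Function<Model, Evaluator> constructor = REGISTRY.get(model.getClass());
		if(constructor == null){
			throw new IllegalArgumentException("Unsupported model type: " + model.getClass().getSimpleName());
		}
		return constructor.apply(model);
	}
}
```

Because each call returns a new, stateless evaluator, the sketch also mirrors the "create and discard as needed" usage style.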

It is advisable for application code to work against the org.jpmml.evaluator.Evaluator interface:

Evaluator evaluator = (Evaluator)modelEvaluator;

An evaluator instance can be queried for the definition of active (i.e. independent), predicted (i.e. primary dependent) and output (i.e. secondary dependent) fields:

List<FieldName> activeFields = evaluator.getActiveFields();
List<FieldName> predictedFields = evaluator.getPredictedFields();
List<FieldName> outputFields = evaluator.getOutputFields();

The PMML scoring operation must be invoked with valid arguments. Otherwise, the behaviour of the model evaluator class is unspecified.

The preparation of field values:

Map<FieldName, FieldValue> arguments = new LinkedHashMap<FieldName, FieldValue>();

List<FieldName> activeFields = evaluator.getActiveFields();
for(FieldName activeField : activeFields){
	// The raw (ie. user-supplied) value could be any Java primitive value
	Object rawValue = ...;

	// The raw value is passed through: 1) outlier treatment, 2) missing value treatment, 3) invalid value treatment and 4) type conversion
	FieldValue activeValue = evaluator.prepare(activeField, rawValue);

	arguments.put(activeField, activeValue);
}
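
To make the preparation steps more concrete, here is a simplified, self-contained sketch for a single continuous field. It is not the library's `prepare` method: the class name `PreparationSketch`, the parameters `low`, `high` and `missingReplacement`, the choice of clamping as the outlier treatment, and rejection as the invalid value treatment are all assumptions made for illustration. The order of the steps is chosen for convenience and need not match the library.

```java
// Hypothetical, simplified preparation of one continuous field; NOT the JPMML implementation
final class PreparationSketch {

	static Double prepare(Object rawValue, double low, double high, Double missingReplacement){
		// Missing value treatment: substitute the replacement value (which may itself be null)
		if(rawValue == null){
			return missingReplacement;
		}

		// Type conversion: accept numbers directly, parse everything else from its string form
		double value;
		if(rawValue instanceof Number){
			value = ((Number)rawValue).doubleValue();
		} else {
			try {
				value = Double.parseDouble(rawValue.toString().trim());
			} catch(NumberFormatException nfe){
				// Invalid value treatment (assumed here): reject the value
				throw new IllegalArgumentException("Invalid value: " + rawValue, nfe);
			}
		}

		// Invalid value treatment (assumed here): NaN is rejected as well
		if(Double.isNaN(value)){
			throw new IllegalArgumentException("Invalid value: " + rawValue);
		}

		// Outlier treatment (assumed here): clamp to the interval [low, high]
		return Math.max(low, Math.min(high, value));
	}
}
```

For example, with the interval [0, 10], the string "3.5" becomes 3.5, the integer 42 is clamped to 10.0, and a `null` input yields whatever replacement value was configured.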

The scoring:

Map<FieldName, ?> results = evaluator.evaluate(arguments);

Typically, a model has exactly one predicted field, which is called the target field:

FieldName targetName = evaluator.getTargetField();
Object targetValue = results.get(targetName);

The target value is either a Java primitive value (as a wrapper object) or an instance of org.jpmml.evaluator.Computable:

if(targetValue instanceof Computable){
	Computable computable = (Computable)targetValue;

	Object primitiveValue = computable.getResult();
}
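
The same unwrapping can be applied to a whole result map in one pass. The helper below is a sketch only: it declares its own `Computable` interface as a stand-in for `org.jpmml.evaluator.Computable` so that it compiles without the library, and it uses `String` keys instead of `FieldName`; `ResultDecoderSketch`, `decode` and `decodeAll` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for unwrapping evaluation results
final class ResultDecoderSketch {

	// Local stand-in for org.jpmml.evaluator.Computable, assumed to expose getResult()
	interface Computable {
		Object getResult();
	}

	// Unwraps a Computable to its result; passes any other value through unchanged
	static Object decode(Object value){
		if(value instanceof Computable){
			return ((Computable)value).getResult();
		}
		return value;
	}

	// Decodes every entry while preserving the iteration order of the input map
	static Map<String, Object> decodeAll(Map<String, ?> results){
		Map<String, Object> decoded = new LinkedHashMap<>();
		for(Map.Entry<String, ?> entry : results.entrySet()){
			decoded.put(entry.getKey(), decode(entry.getValue()));
		}
		return decoded;
	}
}
```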

The target value may implement interfaces that descend from interface org.jpmml.evaluator.ResultFeature:

// Test for "entityId" result feature
if(targetValue instanceof HasEntityId){
	HasEntityId hasEntityId = (HasEntityId)targetValue;
	HasEntityRegistry<?> hasEntityRegistry = (HasEntityRegistry<?>)evaluator;
	BiMap<String, ? extends Entity> entities = hasEntityRegistry.getEntityRegistry();
	Entity winner = entities.get(hasEntityId.getEntityId());

	// Test for "probability" result feature
	if(targetValue instanceof HasProbability){
		HasProbability hasProbability = (HasProbability)targetValue;
		Double winnerProbability = hasProbability.getProbability(winner.getId());
	}
}
Example applications

Additional information

Please contact [[email protected]] (mailto:[email protected])

jpmml-h2o's People

Contributors

vruusmann avatar


jpmml-h2o's Issues

Score mismatch between PMML file and H2O Mojo prediction

Hi,
I am seeing an inconsistency between the score predicted from the PMML file and the score predicted from the H2O MOJO file for exactly the same feature map. I am using the Flow UI to get the predicted score from the H2O MOJO file, and the difference is large: for example, 0.005 with H2O versus 0.90 with PMML for exactly the same feature values. Could you please help?

Throws error when I change the max_depth > 5 and ntrees = 100 for a GBM

When I change the max_depth for a GBM model in H2O, export the mojo, and try to convert it with the tool, it throws the following error:

Exception in thread "main" java.lang.IllegalArgumentException
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:87)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeTreeModel(SharedTreeMojoModelConverter.java:68)
        at org.jpmml.h2o.GbmMojoModelConverter.lambda$encodeModel$0(GbmMojoModelConverter.java:73)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
        at org.jpmml.h2o.GbmMojoModelConverter.encodeModel(GbmMojoModelConverter.java:74)
        at org.jpmml.h2o.GbmMojoModelConverter.encodeModel(GbmMojoModelConverter.java:48)
        at org.jpmml.h2o.Converter.encodePMML(Converter.java:71)
        at org.jpmml.h2o.Main.run(Main.java:107)
        at org.jpmml.h2o.Main.main(Main.java:88)

The error happens when I set max_depth to any value above 5 with ntrees = 100.


H2O version 3.19.0.4274

Handling of missing values

I don't see support for missing values in mojo-pmml conversion.
While executing the pmml using jpmml-evaluator, the following error is thrown:
"Exception in thread "main" org.jpmml.evaluator.InvalidResultException : Field "X" cannot accept input value NaN".

This error doesn't appear for the jpmml-lightgbm package, since missing values are defined as 'NaN' in the pmml file.
Could the missing value handling be added to this jpmml-h2o package?
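
Until the converter handles this, one possible client-side workaround is to turn NaN inputs into `null` before preparing the arguments. This is a sketch under the assumption that the evaluator treats a `null` raw value as missing; the helper `nanToNull` and the class `NaNToMissingSketch` are hypothetical names, and `String` keys are used instead of `FieldName` to keep the example free of library dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical pre-processing step; assumes that null is understood as "missing" downstream
final class NaNToMissingSketch {

	// Returns a copy of the row in which every NaN (Double or Float) is replaced with null
	static Map<String, Object> nanToNull(Map<String, ?> row){
		Map<String, Object> result = new LinkedHashMap<>();
		for(Map.Entry<String, ?> entry : row.entrySet()){
			Object value = entry.getValue();
			boolean isNaN =
				(value instanceof Double && ((Double)value).isNaN()) ||
				(value instanceof Float && ((Float)value).isNaN());
			result.put(entry.getKey(), isNaN ? null : value);
		}
		return result;
	}
}
```

Whether the resulting missing value is then routed down the same tree branch as in H2O depends on the generated PMML, so predictions should still be checked against the MOJO.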

Output PMML file too big in size

Hi @vruusmann,
I've converted an H2O DRF model (9 MB), but the output PMML is 200 MB and occupies almost 1 GB of RAM.
There seems to be no compact option to reduce size.

Do you have any idea what can be done?
Thanks.

Support of categorical variables

My model is an H2O Gradient Boosting Machine learner trained in the Knime IDE with the H2O Machine Learning extension. I save it to a MOJO file and then use your project, following all the steps. The model uses categorical and numerical variables; the categorical ones are in String format, and I use Knime's Domain Calculator to treat them as categorical.

I'm getting this error stack:

Exception in thread "main" java.lang.IllegalArgumentException: Field nivel_1 has data type string
        at org.jpmml.converter.PMMLEncoder.toContinuous(PMMLEncoder.java:209)
        at org.jpmml.converter.CategoricalFeature.toContinuousFeature(CategoricalFeature.java:56)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:195)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:257)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:221)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:257)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:221)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:257)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:221)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:221)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:221)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeNode(SharedTreeMojoModelConverter.java:221)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeTreeModel(SharedTreeMojoModelConverter.java:98)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.lambda$encodeTreeModels$0(SharedTreeMojoModelConverter.java:74)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
        at org.jpmml.h2o.SharedTreeMojoModelConverter.encodeTreeModels(SharedTreeMojoModelConverter.java:75)
        at org.jpmml.h2o.GbmMojoModelConverter.encodeModel(GbmMojoModelConverter.java:58)
        at org.jpmml.h2o.Converter.encodePMML(Converter.java:87)
        at org.jpmml.h2o.example.Main.run(Main.java:123)
        at org.jpmml.h2o.example.Main.main(Main.java:93)

Does your project work with categorical variables stored as Strings? If not, what should the type of categorical variables be?

Thanks in advance!
Julio Paciello
from Paraguay

release 1.0.10 is not compatible with h2o-3.28.x

Release 1.0.10 is not compatible with h2o-3.28.x. The PMML conversion fails with an error message like:

Exception in thread"main" ... MOJO version imcompatibility - the model MOJO version (1.10) is higher than the current h2o version (1.00) supports ...
    hex.genmodel.ModelMojoReader.checkMaxSupportedMojoVersion(ModelMojoReader.java:296)
    ...
    at org.jpmml.h2o.Main.main(Main.java:88)

Rebuilding with an updated h2o dependency version does not solve the issue.

Throws an error when converting a Poisson GBM to PMML

When trying to convert a poisson GBM built in H2O and exported as a MOJO to PMML format, I get the following error:
jpmml-h2o

I managed to solve this error by adding the following piece of code to the "\jpmml-h2o-master\src\main\java\org\jpmml\h2o\GbmMojoModelConverter.java" file:

if((DistributionFamily.poisson).equals(model._family)){
	ContinuousLabel continuousLabel = (ContinuousLabel)label;

	MiningModel miningModel = new MiningModel(MiningFunction.REGRESSION, ModelUtil.createMiningSchema(continuousLabel))
		.setSegmentation(MiningModelUtil.createSegmentation(MultipleModelMethod.SUM, treeModels))
		.setTargets(ModelUtil.createRescaleTargets(null, (double)model._init_f, continuousLabel));

	return miningModel;
} else

Once I added this piece of code (by copying the gaussian section and changing gaussian to poisson) the converter worked successfully, however, I'd like you to confirm whether this is okay and whether it can perhaps be added to the tool?

Thanks for your time.

Regards,
Paulo

Downgrading PMML 4.4 to PMML 4.2

Hi there,

I have a GBM model and converted it to PMML using the jpmml-h2o library. The problem is that I am trying to import this model file into software that only accepts PMML version 4.2 or lower. I was wondering if there is a way to downgrade the PMML version.

Regards.
Valentina

Detect feature promotion from (high cardinality-) categorical to pseudo-numeric

Hi Villu,

Thanks for your assistance with the previous issue that I raised, it's greatly appreciated!

I have however stumbled across a new issue and was wondering whether you could perhaps take a look at it? I'm getting the following error when trying to convert a Tweedie GBM to PMML:

image

It seems like one of the inputs to the model (MAKE) is causing an issue. However, the same input was used for the Poisson model that I referred to you previously, and there were no issues with it once you made allowance for Poisson models in your code.

Your assistance with this would be greatly appreciated.

Thanks for your time.

Regards,
Paulo

openscoring can't read output .pmml

I am testing out this tool and openscoring for a future real use case.
I was able to get openscoring working using the PMML that came with the repos.
But when I create a PMML file from H2O using jpmml-h2o, the output PMML can't be read by openscoring, and I can't figure out why. Any help pointing to a solution would be appreciated.
Thank you!
Warning: Couldn't read data from file "xgboost_test1.pmml", this makes an empty
Warning: POST.

{ "message" : "Bad Request" }

Support for `quantile` distribution in GBM

INFO 2021-10-29 16:26:52 [main] org.jpmml.h2o.Main [TID: N/A]- Loading MOJO..
INFO 2021-10-29 16:26:52 [main] org.jpmml.h2o.Main [TID: N/A]- Loaded MOJO in 305 ms.
INFO 2021-10-29 16:26:52 [main] org.jpmml.h2o.Main [TID: N/A]- Converting MOJO to PMML..
ERROR 2021-10-29 16:26:53 [main] org.jpmml.h2o.Main [TID: N/A]- Failed to convert MOJO to PMML
java.lang.IllegalArgumentException: Distribution family quantile is not supported
at org.jpmml.h2o.GbmMojoModelConverter.encodeModel(GbmMojoModelConverter.java:111)
at org.jpmml.h2o.GbmMojoModelConverter.encodeModel(GbmMojoModelConverter.java:45)
at org.jpmml.h2o.Converter.encodePMML(Converter.java:88)
at org.jpmml.h2o.Main.run(Main.java:120)
at org.jpmml.h2o.Main.main(Main.java:90)

Capturing variable importances

The MojoModel#_modelDescriptor field is a model descriptor. Among other things, it exposes variable importances information via the ModelDescriptor#variableImportances() method.

A user has requested that varimp information should be collected and stored in the generated PMML document.

Missing number of values in the array

Hi,
I tried to convert an H2O random forest model. If the variable uses an IsIn operator, I'm expecting to get a simple set predicate similar to the figure below. It should contain the number of values (n=?) in the array and quotes around the values.

image

However, I'm getting these results (no n=? option and no quotes).
image

Please let me know your thoughts on how to fix this issue.
Let me know if you need additional info.
Thank you
