jpmml-evaluator's Issues

Duplicate value Exception for Tree Model using Iris Data --- execution

Hi,
I am new to this. When I execute the command below, the following exception occurs.
The exception is thrown for a tree model exported from R to PMML.

I used the following command line:
D:\JPMML\jpmml-evaluator-master\pmml-evaluator-example>java -cp target/example-1.2-SNAPSHOT.jar org.jpmml.evaluator.EvaluationExample --model D:\JPMML\Test\pmml\IrisTree.pmml --input D:\JPMML\Test\csv\Iris.csv --output D:\JPMML\Test\output\TreeOutput.csv

Exception in thread "main" org.jpmml.evaluator.DuplicateValueException: class
at org.jpmml.evaluator.EvaluationContext.declare(EvaluationContext.java:91)
at org.jpmml.evaluator.OutputUtil.evaluate(OutputUtil.java:330)
at org.jpmml.evaluator.TreeModelEvaluator.evaluate(TreeModelEvaluator.java:93)
at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:406)
at org.jpmml.evaluator.EvaluationExample.execute(EvaluationExample.java:261)
at org.jpmml.evaluator.Example.execute(Example.java:60)
at org.jpmml.evaluator.EvaluationExample.main(EvaluationExample.java:127)

Boolean input variables not recognised by evaluator function.

I have the following R code for generating 2 csv and 2 pmml files based on the iris dataset:

data(iris)
library(pmml)

# build a model for Sepal.Length based on remaining variables
model.glm <- glm(Sepal.Length ~ ., data=iris)
saveXML(pmml(model.glm), "iris.glm.pmml")

# write csv file for testing
write.csv(iris, 'iris.csv', quote=FALSE, row.names=FALSE)

# set remaining variables to booleans
iris$Sepal.Width <- as.logical(iris$Sepal.Width > 3)
iris$Petal.Length <- as.logical(iris$Petal.Length > 4)
iris$Petal.Width <- as.logical(iris$Petal.Width > 1)
iris$Species   <- as.logical(iris$Species=='setosa')

# rebuild model for Sepal.Length
model.glm <- glm(Sepal.Length ~ ., data=iris)
saveXML(pmml(model.glm), "iris.glm.bool.pmml")

# write csv file for testing
write.csv(iris, 'iris.bool.csv', quote=FALSE, row.names=FALSE)

The problem becomes apparent when doing predictions. The files iris.csv and iris.glm.pmml produce the desired output. The files iris.bool.csv and iris.glm.bool.pmml produce the same value for every record, regardless of the input data.

FunctionUtil.evaluate order

We have an issue in production where we have a DefineFunction in the PMML.

Looking at FunctionUtil.evaluate, it first uses reflection to look for a user-defined function before trying the one defined in the PMML. The problem is that reflection is a bit too slow for us in production. It would be great to either have some way to supply our own FunctionRegistry or to change the order in which the functions are resolved in FunctionUtil.
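
The requested lookup order can be illustrated with a minimal sketch. This is not the code of FunctionUtil; the class FunctionLookupSketch and its methods are hypothetical, and it only assumes that document-defined functions are available in a map and that the reflective step boils down to a Class.forName call. Names declared in the document are checked first, and the outcome of each reflective lookup (including misses) is cached so that it is paid at most once per name.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; not part of JPMML-Evaluator.
final class FunctionLookupSketch {

    // Caches the outcome of Class.forName per name, including misses.
    private static final Map<String, Optional<Class<?>>> CLASS_CACHE = new ConcurrentHashMap<>();

    // Performs the reflective lookup at most once per name.
    static Optional<Class<?>> loadCached(String name) {
        return CLASS_CACHE.computeIfAbsent(name, key -> {
            try {
                return Optional.<Class<?>>of(Class.forName(key));
            } catch (ClassNotFoundException | LinkageError e) {
                return Optional.empty();
            }
        });
    }

    // Returns "defined" when the name is declared in the document, "class" when
    // it resolves to a class on the classpath, and "unknown" otherwise.
    static String resolve(String name, Map<String, ?> definedFunctions) {
        if (definedFunctions.containsKey(name)) {
            return "defined"; // no reflection needed for document-defined functions
        }
        return loadCached(name).isPresent() ? "class" : "unknown";
    }

    public static void main(String[] args) {
        Map<String, Object> defined = Collections.singletonMap("myFunction", new Object());
        System.out.println(resolve("myFunction", defined));
        System.out.println(resolve("java.lang.String", defined));
        System.out.println(resolve("no.such.Function", defined));
    }
}
```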

Null in Result

Hi villu,

ProbabilityDistribution prob = (ProbabilityDistribution) results.get(evaluator.getTargetField().getName());

This returns null, and I don't know what's going on.
I have tried to match my PMML with the example that you showed me, but it still fails.
Can you please look into it and guide me?

Thanks

Performance issues while running evaluator for multiple threads #1

My .pmml file contains ~900 input fields of type double.
I'm running an application in a multi-threaded environment that evaluates the model with 30 threads.
Line 208 of org.jpmml.evaluator.TypeUtil is return (Double.parseDouble(value) + 0d); and Double.parseDouble ends up in a synchronized method, which blocks the other 29 threads and hurts overall performance.
Ref: http://dalelane.co.uk/blog/?p=2936
As a workaround, I added the class from
https://gist.github.com/dalelane/7720269
and changed line 208 to
return (DoubleParser.parseDouble(value) + 0d);
which solved the issue.

I suggest doing the same if required.

InvalidFeatureException from spark context

Hi,

When I try to use the evaluator from a spark context, it will not create the model manager because of pmml validation problems.

Exception in thread "main" org.jpmml.evaluator.InvalidFeatureException (at or around line 8): DataDictionary
        at org.jpmml.evaluator.CacheUtil.getValue(CacheUtil.java:58)
        at org.jpmml.evaluator.ModelEvaluator.<init>(ModelEvaluator.java:113)
        at org.jpmml.evaluator.TreeModelEvaluator.<init>(TreeModelEvaluator.java:54)
        at org.jpmml.evaluator.ModelEvaluatorFactory.newModelManager(ModelEvaluatorFactory.java:101)
        at org.jpmml.evaluator.ModelEvaluatorFactory.newModelManager(ModelEvaluatorFactory.java:45)
        at org.jpmml.evaluator.ModelManagerFactory.newModelManager(ModelManagerFactory.java:66)
        at org.jpmml.evaluator.ModelManagerFactory.newModelManager(ModelManagerFactory.java:46)
        at com.example.Main.main(Main.java:23)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassCastException: org.dmg.pmml.DataField cannot be cast to org.dmg.pmml.Indexable
        at org.jpmml.evaluator.IndexableUtil.ensureKey(IndexableUtil.java:78)
        at org.jpmml.evaluator.IndexableUtil.buildMap(IndexableUtil.java:64)
        at org.jpmml.evaluator.ModelEvaluator$1.load(ModelEvaluator.java:538)
        at org.jpmml.evaluator.ModelEvaluator$1.load(ModelEvaluator.java:534)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3542)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2323)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3953)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
        at org.jpmml.evaluator.CacheUtil.getValue(CacheUtil.java:50)
        ... 16 more

Here is the Java class I am submitting:

package com.example;

import org.dmg.pmml.PMML;
import org.jpmml.evaluator.Evaluator;
import org.jpmml.evaluator.ModelEvaluatorFactory;
import org.jpmml.model.ImportFilter;
import org.jpmml.model.JAXBUtil;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.bind.JAXBException;
import javax.xml.transform.Source;

public final class Main {
    public static void main(String[] args) {
        System.out.println("hello world");

        try {
            Source transformedSource = ImportFilter.apply(new InputSource(Main.class.getResourceAsStream("/DecisionTreeIris.pmml")));
            PMML pmml = JAXBUtil.unmarshalPMML(transformedSource);
            ModelEvaluatorFactory modelEvaluatorFactory = ModelEvaluatorFactory.newInstance();
            Evaluator evaluator = modelEvaluatorFactory.newModelManager(pmml);
            evaluator.verify();
        } catch (SAXException | JAXBException e) {
            // could not parse pmml as xml
            throw new RuntimeException(e);
        }
    }
}

Submitting with spark-submit --class com.example.Main /path/to/example-assembly.jar.

It does not throw the error when I run the assembled jar like java -jar /path/to/example-assembly.jar.

DecisionTreeIris.pmml is from here.

Thanks for the project. Any help is appreciated.

TypeUtil.getDataType(Object value) does not recognize BigDecimal

An exception is thrown when you pass a BigDecimal into the ModelEvaluator method prepare(activeField, rawValue). This is because TypeUtil.getDataType(Object value) does not check for BigDecimal values and an EvaluationException is thrown.

BigDecimal values are recognized as superior to Doubles/Floats for financial calculations and considered 'best practice'. It is recommended that the JPMML framework handles them without having to convert to a Double first.

Adding the following to the getDataType(Object value) method should resolve this issue:

if(value instanceof BigDecimal){
    return DataType.DOUBLE;
} else

In addition: improving the message provided within the EvaluationException would also help with diagnosis of future issues. For example, the message 'the class java.math.BigDecimal is not a supported type' would improve the usability of the framework.
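
Both suggestions can be shown together in a small standalone sketch. It does not reproduce TypeUtil; the enum SketchDataType and the class DataTypeSketch are hypothetical stand-ins. It maps BigDecimal to a double type and names the offending class in the error message. Note that treating a BigDecimal as a double can lose precision once the value is converted.

```java
import java.math.BigDecimal;

// Hypothetical sketch; not the JPMML-Evaluator TypeUtil class.
final class DataTypeSketch {

    // Stand-in for a PMML data type enumeration.
    enum SketchDataType { STRING, INTEGER, FLOAT, DOUBLE, BOOLEAN }

    // Maps a Java value to a data type, or fails with a descriptive message.
    static SketchDataType getDataType(Object value) {
        if (value instanceof String) {
            return SketchDataType.STRING;
        } else if (value instanceof Integer) {
            return SketchDataType.INTEGER;
        } else if (value instanceof Float) {
            return SketchDataType.FLOAT;
        } else if (value instanceof Double || value instanceof BigDecimal) {
            // BigDecimal is treated as a double here, as proposed in the issue
            return SketchDataType.DOUBLE;
        } else if (value instanceof Boolean) {
            return SketchDataType.BOOLEAN;
        }
        String className = (value == null) ? "null" : value.getClass().getName();
        throw new IllegalArgumentException("The class " + className + " is not a supported type");
    }

    public static void main(String[] args) {
        System.out.println(getDataType(new BigDecimal("19.99")));
    }
}
```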

Question: multipleModelMethod="max" when multiple classes have max

Hi @vruusmann,

Looking through the implementation of multipleModelMethod="max" for classification, particularly: https://github.com/jpmml/jpmml-evaluator/blob/master/pmml-evaluator/src/main/java/org/jpmml/evaluator/ProbabilityAggregator.java#L207

Suppose we have a case with three segments that are predicting three classes and we have the following probabilities:

{a: 0.8, b: 0.1, c: 0.1},
{a: 0.5, b: 0.1, c: 0.4},
{a: 0.1, b: 0.1, c: 0.8}

Then using the max I would expect the average of the first and third model:

{a: 0.45, b: 0.1, c: 0.45}
  1. Is that your interpretation of the spec? max: consider the model(s) that have contributed the chosen probability for the winning category. Return their average probabilities;
  2. Will the implementation linked to above return that?
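
The expected result above can be reproduced with a small standalone sketch. It is plain Java rather than the library's ProbabilityAggregator, the names MaxAggregationSketch, distribution and aggregateMax are made up for illustration, and it follows the reading in this question: find the highest probability across all segments, keep the segments that contributed it, and average their distributions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "max" interpretation described in the question.
final class MaxAggregationSketch {

    // Builds a distribution over the three classes a, b and c.
    static Map<String, Double> distribution(double a, double b, double c) {
        Map<String, Double> result = new LinkedHashMap<>();
        result.put("a", a);
        result.put("b", b);
        result.put("c", c);
        return result;
    }

    // Averages the distributions of every segment that contains the highest
    // probability seen across all segments.
    static Map<String, Double> aggregateMax(List<Map<String, Double>> segments) {
        double globalMax = Double.NEGATIVE_INFINITY;
        for (Map<String, Double> segment : segments) {
            for (double probability : segment.values()) {
                globalMax = Math.max(globalMax, probability);
            }
        }

        // Exact equality is enough here because the maximum is one of the stored values
        List<Map<String, Double>> contributors = new ArrayList<>();
        for (Map<String, Double> segment : segments) {
            if (segment.containsValue(globalMax)) {
                contributors.add(segment);
            }
        }

        Map<String, Double> average = new LinkedHashMap<>();
        for (Map<String, Double> segment : contributors) {
            for (Map.Entry<String, Double> entry : segment.entrySet()) {
                average.merge(entry.getKey(), entry.getValue() / contributors.size(), Double::sum);
            }
        }
        return average;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> segments = Arrays.asList(
            distribution(0.8, 0.1, 0.1),
            distribution(0.5, 0.1, 0.4),
            distribution(0.1, 0.1, 0.8));
        // The first and third segments both contribute the maximum of 0.8
        System.out.println(aggregateMax(segments));
    }
}
```

Under this reading the second segment is ignored, and the averaged probabilities for a and c are tied, so the sketch alone does not say which class wins.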

Getting wrong svm model result

#20

Following your advice in the issue linked above, we have generated a PMML file, but the output does not match the desired output.

PMML snippet:

Output file: we are getting the output (Predicted_Cluster) as 1->1, 2->3 and 3->2.

Please advise on the above.

TreeModel prediction mismatch between KNIME and JPMML

Hello,

I trained two models in KNIME: a Neural Network and a Decision Tree.

I'm comparing the results in KNIME and in Java.

For the Neural Network, I'm getting the same results.

For the Decision Tree model, all observations are predicted as false.

I tried to read the PMML model inside KNIME and the results are not getting it.

Can you help me?

Incompatible Google Guava library dependency

When running the following code

ModelEvaluator<RegressionModel> modelEvaluator = new RegressionModelEvaluator(model);
Evaluator evaluator = (Evaluator) modelEvaluator;

I get the following error:
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.from(Lcom/google/common/cache/CacheBuilderSpec;)Lcom/google/common/cache/CacheBuilder;
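
A NoSuchMethodError like this typically indicates that a different version of the library was loaded at run time than the one the code was compiled against. One way to check is to print where a class was actually loaded from. The helper below is a generic sketch that uses only standard Java APIs; the class name ClassOriginSketch is made up for illustration.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; uses only standard Java APIs.
final class ClassOriginSketch {

    // Returns the location (usually a JAR file URL) a class was loaded from.
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // JDK classes and some dynamically defined classes have no code source
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name, for example com.google.common.cache.CacheBuilder
        String className = (args.length > 0) ? args[0] : "java.lang.String";
        System.out.println(className + " loaded from " + locationOf(Class.forName(className)));
    }
}
```

Running it on the same classpath as the failing application, with com.google.common.cache.CacheBuilder as the argument, shows which Guava JAR takes precedence.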

Migrating active field preparation code from 1.2.X API to 1.3.X API

When I run the following code

Object rawValue = 1.0;
FieldValue activeValue = input.prepare(rawValue);

The following error always occurs:

Exception in thread "main" org.jpmml.evaluator.InvalidResultException
    at org.jpmml.evaluator.FieldValueUtil.performInvalidValueTreatment(FieldValueUtil.java:190)
    at org.jpmml.evaluator.FieldValueUtil.prepareInputValue(FieldValueUtil.java:94)
    at org.jpmml.evaluator.InputField.prepare(InputField.java:64)
    at cn.pmml.test1.PMMLTest.arguments(PMMLTest.java:87)
    at cn.pmml.test1.PMMLTest.main(PMMLTest.java:68)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

I have tried many ways to fix this, but all of them failed.

Attempt to invoke virtual method 'boolean org.dmg.pmml.PMML.hasModels()' on a null object reference

I am currently attempting to evaluate a .pmml model created with sklearn2pmml. However, whenever I attempt to run the code ModelEvaluator<NearestNeighborModel> modelEvaluator = new NearestNeighborModelEvaluator(pmml);, I get the following error:

java.lang.NullPointerException: Attempt to invoke virtual method 'boolean org.dmg.pmml.PMML.hasModels()' on a null object reference
  at org.jpmml.evaluator.ModelEvaluator.selectModel(ModelEvaluator.java:584)
  at org.jpmml.evaluator.nearest_neighbor.NearestNeighborModelEvaluator.<init>(NearestNeighborModelEvaluator.java:105)
  at com.mygdx.game.DrawView.pitchAngle(DrawView.java:295)
  at com.mygdx.game.StartGdxGame.render(StartGdxGame.java:113)
  at com.badlogic.gdx.backends.android.AndroidGraphics.onDrawFrame(AndroidGraphics.java:459)
  at android.opengl.GLSurfaceView$GLThread.guardedRun(GLSurfaceView.java:1522)
  at android.opengl.GLSurfaceView$GLThread.run(GLSurfaceView.java:1239)

I have double checked and the code can find and has access to the .pmml-file and a model does exist in the .pmml file in the form <NearestNeighborModel functionName="regression" numberOfNeighbors="400" continuousScoringMethod="average">.

Is there any other reason for the error? Did I maybe compile the .pmml incorrectly?

logistic regression fail in 1.1.7

In 1.1.7, when we try to consume a logistic regression under RegressionModel, we encounter the error message below.

We also tried linear regression and regression with more than two categories; both work fine. We also switched back to 1.1.3, where the logistic regression works fine as well.

Exception in thread "main" org.jpmml.manager.InvalidFeatureException (at or around line 33): RegressionModel
at org.jpmml.evaluator.RegressionModelEvaluator.evaluateClassification(RegressionModelEvaluator.java:130)
at org.jpmml.evaluator.RegressionModelEvaluator.evaluate(RegressionModelEvaluator.java:71)
at org.jpmml.evaluator.MiningModelEvaluator.evaluateSegmentation(MiningModelEvaluator.java:425)
at org.jpmml.evaluator.MiningModelEvaluator.evaluateClassification(MiningModelEvaluator.java:211)
at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:108)
at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:86)
at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:68)
at org.jpmml.evaluator.CsvEvaluationExample.evaluateAll(CsvEvaluationExample.java:226)
at org.jpmml.evaluator.CsvEvaluationExample.execute(CsvEvaluationExample.java:97)
at org.jpmml.evaluator.Example.execute(Example.java:45)
at org.jpmml.evaluator.CsvEvaluationExample.main(CsvEvaluationExample.java:72)

different results in evaluation

I get different results when evaluating with predict in R compared to using the PMML published via jpmml-xgboost and openscoring.

Are you interested in a sample data set and the R code?

Getting wrong results for my data using svm pmml model

Hi,
Using the JPMML evaluator to execute an SVM PMML model works fine for the Audit data set, but for my own data I am getting wrong results. The data set has 4 fields, one of which is the target field, containing three categories.
I used the command line below in my console.

java -cp target/example-1.2-SNAPSHOT.jar org.jpmml.evaluator.EvaluationExample --model model.pmml --input input.tsv --output output.tsv

Please help me with the above query.
Thanks in advance...

why PMMLEvaluationContext is invisible?

Hello,
In this commit (a309d50), "Restricted the visibility of EvaluationContext constructors", you removed the "public" access modifier, so I can no longer instantiate PMMLEvaluationContext.

  1. Why was that done?
  2. In my code, I use PMMLEvaluationContext to process DataTransformation with a non-model PMML file. In the latest version, how can I do DataTransformation(PMML file only contain ) in another way?

Lost task 0.0 in stage 1.0 (TID 2, byd0158): org.jpmml.evaluator.InvalidFeatureException (at or around line 5759): Target

16/12/13 14:58:12 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 2, byd0158): org.jpmml.evaluator.InvalidFeatureException (at or around line 5759): Target
at org.jpmml.evaluator.IndexableUtil.ensureKey(IndexableUtil.java:81)
at org.jpmml.evaluator.IndexableUtil.buildMap(IndexableUtil.java:64)
at org.jpmml.evaluator.ModelEvaluator$7.load(ModelEvaluator.java:586)
at org.jpmml.evaluator.ModelEvaluator$7.load(ModelEvaluator.java:582)
at com.shaded.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3542)
at com.shaded.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2323)
at com.shaded.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286)
at com.shaded.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
at com.shaded.google.common.cache.LocalCache.get(LocalCache.java:3953)
at com.shaded.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957)
at com.shaded.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875)
at org.jpmml.evaluator.CacheUtil.getValue(CacheUtil.java:50)
at org.jpmml.evaluator.ModelEvaluator.<init>(ModelEvaluator.java:139)
at org.jpmml.evaluator.MiningModelEvaluator.<init>(MiningModelEvaluator.java:79)
at org.jpmml.evaluator.ModelEvaluatorFactory.newModelManager(ModelEvaluatorFactory.java:66)
at org.jpmml.evaluator.MiningModelEvaluator.createSegmentHandler(MiningModelEvaluator.java:559)
at org.jpmml.evaluator.MiningModelEvaluator.evaluateSegmentation(MiningModelEvaluator.java:355)
at org.jpmml.evaluator.MiningModelEvaluator.evaluateClassification(MiningModelEvaluator.java:223)
at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:190)
at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:167)
at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:162)
at org.jpmml.spark.PMMLTransformer$2.apply(PMMLTransformer.java:128)
at org.jpmml.spark.PMMLTransformer$2.apply(PMMLTransformer.java:113)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.evalExpr2$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51)
at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

jpmml-evaluator requires terminal classification TreeModel Nodes to have score attributes even if they have ScoreDistributions

From the PMML spec (versions 2.0 and up):

When a Node is selected as the final Node and if this Node has no score attribute, then the highest recordCount in the ScoreDistribution determines which value is selected as the predicted class. If a Node contains a sequence of ScoreDistribution elements such that there is more than one entry where recordCount_i is an upper bound, then the first entry is selected.

Note: If a Node has an attribute score then this attribute value overrides the computation of a predicted value from the ScoreDistribution.

The above suggests that it should be OK for a terminal Node in a TreeModel to omit the score attribute so long as it contains at least one ScoreDistribution element and, further, that including a score attribute may in fact weaken the contribution of the ScoreDistributions (though it is of course always possible to add a score attribute that accurately reflects the behavior specified in the above).

Note that, when using multipleModelMethod="average" for a series of TreeModels, jpmml-evaluator (as of 1.1.17) appears to completely ignore the score attributes (i.e. you can set them all to "foo"), instead relying entirely on the ScoreDistributions to make its prediction. It seems odd to be required to provide an attribute that isn't going to be used at all.
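
The selection rule quoted from the spec can be written down as a short sketch. It is not the TreeModelEvaluator code; NodeScoreSketch and its nested ScoreDistribution class are simplified, hypothetical stand-ins for the PMML elements. A score attribute, when present, wins; otherwise the entry with the highest recordCount is chosen, and the first entry wins on ties.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the spec rule; not the JPMML-Evaluator implementation.
final class NodeScoreSketch {

    // Simplified stand-in for a PMML ScoreDistribution element.
    static final class ScoreDistribution {
        final String value;
        final double recordCount;

        ScoreDistribution(String value, double recordCount) {
            this.value = value;
            this.recordCount = recordCount;
        }
    }

    // Returns the score attribute if set; otherwise the value of the entry with
    // the highest recordCount, keeping the first entry when counts are equal.
    static String predict(String score, List<ScoreDistribution> distributions) {
        if (score != null) {
            return score; // the score attribute overrides the ScoreDistribution
        }
        ScoreDistribution best = null;
        for (ScoreDistribution candidate : distributions) {
            // Strict comparison keeps the earlier entry on ties
            if (best == null || candidate.recordCount > best.recordCount) {
                best = candidate;
            }
        }
        if (best == null) {
            throw new IllegalStateException("Node has neither a score attribute nor a ScoreDistribution");
        }
        return best.value;
    }

    public static void main(String[] args) {
        List<ScoreDistribution> distributions = Arrays.asList(
            new ScoreDistribution("setosa", 10),
            new ScoreDistribution("versicolor", 40),
            new ScoreDistribution("virginica", 40));
        System.out.println(predict(null, distributions));
        System.out.println(predict("foo", distributions));
    }
}
```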

Impose soft limit on the maximum number of input fields

People are working with models that specify tens to hundreds of THOUSANDS of input fields:

Evaluator evaluator = ...;
List<InputField> inputFields = evaluator.getInputFields();
System.out.println(inputFields.size()); // Prints 100'000

For example: http://stats.stackexchange.com/questions/152891/bad-performance-of-pmml-evaluator and http://stackoverflow.com/questions/42074491/evaluate-method-takes-long-time-pmml-models-using-jpmml

Understandably, such "structurally valid but conceptually/functionally invalid" models cannot be made to perform, neither by the JPMML-Evaluator library nor by any other PMML scoring engine.

By default, the JPMML-Evaluator library should simply refuse to deal with them:

if(inputFields.size() > 1000){
  throw new EvaluationException("The model specifies unreasonably large number of input fields, which is indicative of bad data science/engineering process. Please refactor the model");
}

However, the limit should be programmatically customizable. If people want to do stupid things, then they should have technical means to do so.
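
One possible shape for a customizable limit is sketched below. Everything in it is an assumption made for illustration: the class InputFieldLimitSketch, the system property name sketch.maxInputFields and the exception type are hypothetical and are not existing JPMML-Evaluator settings.

```java
// Hypothetical sketch; not an existing JPMML-Evaluator feature.
final class InputFieldLimitSketch {

    // Default soft limit, taken from the example threshold above.
    static final int DEFAULT_LIMIT = 1000;

    // Hypothetical system property name used to override the default.
    static final String LIMIT_PROPERTY = "sketch.maxInputFields";

    // Reads the limit from the system property, falling back to the default.
    static int resolveLimit() {
        return Integer.getInteger(LIMIT_PROPERTY, DEFAULT_LIMIT);
    }

    // Rejects models whose input field count exceeds the given limit.
    static void checkInputFieldCount(int inputFieldCount, int limit) {
        if (inputFieldCount > limit) {
            throw new IllegalStateException("The model specifies " + inputFieldCount
                + " input fields, which exceeds the configured limit of " + limit);
        }
    }

    public static void main(String[] args) {
        int limit = resolveLimit();
        checkInputFieldCount(900, limit);
        System.out.println("900 input fields passed the check with limit " + limit);
    }
}
```

With this shape, raising the limit would be a matter of starting the JVM with -Dsketch.maxInputFields=200000, without code changes.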

Maven pulling snapshot dependencies

Hi, I'm using this library as a dependency in a maven project.

            <dependency>
                <groupId>org.jpmml</groupId>
                <artifactId>pmml-evaluator</artifactId>
                <version>1.3.3</version>
            </dependency>

When I compile the project, I get

[WARNING] The POM for com.google.guava:guava:jar:19.0-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for org.apache.commons:commons-math3:jar:3.5-SNAPSHOT is missing, no dependency information available

It looks like this is due to the version 'constraint' of guava for example <version>[14.0, 19.0]</version>.

The documentation regarding version ranges says that snapshots can be taken into account when resolving them: https://docs.oracle.com/middleware/1221/core/MAVEN/maven_version.htm#MAVEN8903

Why do you use version constraints instead of picking a single version? And is there a way to get rid of this SNAPSHOT resolution?

Does JPMML support TransformationDictionary in PMML 4.3 for pre-processing

I'm using a neural network (NN) and providing an API for prediction queries.

I am expecting normal input params like age=26,gender=m.

So I have to do some pre-processing before feeding these into the nn-evaluator.

Does JPMML support TransformationDictionary?

If yes, in which package? and how?
If no, any plan scheduled?

org.jpmml.evaluator.InvalidFeatureException: DataField

Hi,
I keep getting this exception for my LogisticRegression model and my LinearRegression model. This is my XML. Please guide me as to what the problem is.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_2">
    <DataDictionary numberOfFields="9">
        <DataField name="Attribute0" optype="continuous" dataType="double"/>
        <DataField name="Attribute1" optype="continuous" dataType="double"/>
        <DataField name="Attribute2" optype="continuous" dataType="double"/>
        <DataField name="Attribute3" optype="continuous" dataType="double"/>
        <DataField name="Attribute4" optype="continuous" dataType="double"/>
        <DataField name="Attribute5" optype="continuous" dataType="double"/>
        <DataField name="Attribute6" optype="continuous" dataType="double"/>
        <DataField name="Attribute7" optype="continuous" dataType="double"/>
        <DataField name="Attribute8" optype="continuous" dataType="double"/>
    </DataDictionary>
    <RegressionModel functionName="classification" algorithmName="logisticRegression" normalizationMethod="logit">
        <MiningSchema>
            <MiningField name="Attribute0"/>
            <MiningField name="Attribute1"/>
            <MiningField name="Attribute2"/>
            <MiningField name="Attribute3"/>
            <MiningField name="Attribute4"/>
            <MiningField name="Attribute5"/>
            <MiningField name="Attribute6"/>
            <MiningField name="Attribute7"/>
            <MiningField name="Attribute8" usageType="target"/>
        </MiningSchema>
        <RegressionTable intercept="0.0" targetCategory="1"/>
        <RegressionTable intercept="-8.397856251858588" targetCategory="0">
            <NumericPredictor name="Attribute0" coefficient="0.1230185712966992"/>
            <NumericPredictor name="Attribute1" coefficient="0.03514316177407176"/>
            <NumericPredictor name="Attribute2" coefficient="-0.013282878621280676"/>
            <NumericPredictor name="Attribute3" coefficient="6.631624570875322E-4"/>
            <NumericPredictor name="Attribute4" coefficient="-0.0011962985482762522"/>
            <NumericPredictor name="Attribute5" coefficient="0.08961636497438935"/>
            <NumericPredictor name="Attribute6" coefficient="0.943894934066085"/>
            <NumericPredictor name="Attribute7" coefficient="0.014842809237409734"/>
        </RegressionTable>
    </RegressionModel>
</PMML>

This is .xml file for LinearRegression:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_2">
    <DataDictionary numberOfFields="9">
        <DataField name="Attribute0" optype="continuous" dataType="double"/>
        <DataField name="Attribute1" optype="continuous" dataType="double"/>
        <DataField name="Attribute2" optype="continuous" dataType="double"/>
        <DataField name="Attribute3" optype="continuous" dataType="double"/>
        <DataField name="Attribute4" optype="continuous" dataType="double"/>
        <DataField name="Attribute5" optype="continuous" dataType="double"/>
        <DataField name="Attribute6" optype="continuous" dataType="double"/>
        <DataField name="Attribute7" optype="continuous" dataType="double"/>
        <DataField name="Attribute8" optype="continuous" dataType="double"/>
    </DataDictionary>
    <RegressionModel functionName="regression" algorithmName="LinearRegression" normalizationMethod="logit">
        <MiningSchema>
            <MiningField name="Attribute0"/>
            <MiningField name="Attribute1"/>
            <MiningField name="Attribute2"/>
            <MiningField name="Attribute3"/>
            <MiningField name="Attribute4"/>
            <MiningField name="Attribute5"/>
            <MiningField name="Attribute6"/>
            <MiningField name="Attribute7"/>
            <MiningField name="Attribute8" usageType="target"/>
        </MiningSchema>
        <RegressionTable intercept="-8.397856251858588" targetCategory="0">
            <NumericPredictor name="Attribute0" coefficient="0.1230185712966992"/>
            <NumericPredictor name="Attribute1" coefficient="0.03514316177407176"/>
            <NumericPredictor name="Attribute2" coefficient="-0.013282878621280676"/>
            <NumericPredictor name="Attribute3" coefficient="6.631624570875322E-4"/>
            <NumericPredictor name="Attribute4" coefficient="-0.0011962985482762522"/>
            <NumericPredictor name="Attribute5" coefficient="0.08961636497438935"/>
            <NumericPredictor name="Attribute6" coefficient="0.943894934066085"/>
            <NumericPredictor name="Attribute7" coefficient="0.014842809237409734"/>
        </RegressionTable>
    </RegressionModel>
</PMML>

Here is the stack trace
org.jpmml.evaluator.InvalidFeatureException: DataField
at org.jpmml.evaluator.RegressionModelEvaluator.evaluateClassification(RegressionModelEvaluator.java:119)
at org.jpmml.evaluator.RegressionModelEvaluator.evaluate(RegressionModelEvaluator.java:69)
at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:406)
at com.norkorm.blake.pmml.LogisticRegressionPMMLTest.makePredictions(LogisticRegressionPMMLTest.java:250)
at com.norkorm.blake.pmml.LogisticRegressionPMMLTest.testLogisticPMML(LogisticRegressionPMMLTest.java:217)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at junit.framework.TestCase.runTest(TestCase.java:176)
at junit.framework.TestCase.runBare(TestCase.java:141)
at junit.framework.TestResult$1.protect(TestResult.java:122)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at junit.framework.TestResult.run(TestResult.java:125)
at junit.framework.TestCase.run(TestCase.java:129)
at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:131)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)

org.dmg.pmml.MiningField.getOptype()Lorg/dmg/pmml/OpType

16/12/12 19:04:26 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 2, byd0158): java.lang.NoSuchMethodError: org.dmg.pmml.MiningField.getOptype()Lorg/dmg/pmml/OpType;
at org.jpmml.evaluator.ArgumentUtil.isOutlier(ArgumentUtil.java:153)
at org.jpmml.evaluator.ArgumentUtil.prepare(ArgumentUtil.java:69)
at org.jpmml.evaluator.ModelEvaluator.prepare(ModelEvaluator.java:110)
at org.jpmml.spark.PMMLTransformer$2.apply(PMMLTransformer.java:120)
at org.jpmml.spark.PMMLTransformer$2.apply(PMMLTransformer.java:110)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply119245_186$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51)
at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Getting exception for clustering

Hi,
I am new to executing clustering PMML models with the JPMML evaluator.
I get an exception when I run the command below.

java -cp target/example-1.2-SNAPSHOT.jar org.jpmml.evaluator.EvaluationExample --model D:\analytics\Test\pmml\AuditKMeans.pmml --input D:\analytics\Test\csv\AuditData_Test.csv --output D:\analytics\Test\output\Audit_KmeansRes.csv

Exception in thread "main" java.lang.IllegalArgumentException: Missing active field(s): [Age, Income, Deductions, Hours]
at org.jpmml.evaluator.EvaluationExample.execute(EvaluationExample.java:217)
at org.jpmml.evaluator.Example.execute(Example.java:60)
at org.jpmml.evaluator.EvaluationExample.main(EvaluationExample.java:127)

I have attached input and model files.
Sample.zip

Please help me sort out the above exception.

WARNUNG: CSV evaluation failed: Mark invalid

Hi,

I got a very simple message:

Apr 20, 2016 5:21:58 PM org.openscoring.client.CsvEvaluator run
WARNUNG: CSV evaluation failed: Mark invalid

from the evaluation:

java -cp $jpmml/target/client-executable-1.2-SNAPSHOT.jar org.openscoring.client.CsvEvaluator --model http://localhost:8080/openscoring/model/460012_p_aktiv --input ~/test.csv --output ~/test_output.csv

Any idea?

Does it have to do with my missing (jpmml-xgboost)

Generate DMatrix file

mpg.dmatrix = genDMatrix(mpg_y, mpg_X, "xgboost.svm")

part? I realised that I don't need xgboost.svm in order to get the PMML file.

I simply used

xgboost(param=param,
data = data.matrix(training[,feature.names]),
label=training$aktiv_target,
nrounds=trounds_tmp,
base_score = base,
missing=NA
)

So I used xgboost's implicit transformation of the data.

Performance issues with 'at org.jpmml.evaluator.ModelEvaluator.getInputFields(ModelEvaluator.java:207)' for multiple threads #2

With around 900 input fields of type double in my model, most of the threads spend about 28% of their execution time in the method 'org.jpmml.evaluator.ModelEvaluator.getInputFields(ModelEvaluator.java:207)', which is called on every evaluation in every thread.

The same method is called while creating the arguments in each thread. I created a single shared inputFields list there, which solved that part of the issue. However, evaluate calls the method again, which still affects performance.

Could inputFields be passed to the evaluate method along with the arguments? This could save 28% of the execution time, although it would require changing the arguments everywhere.

Thread Dump:

    at org.jpmml.evaluator.ModelEvaluator.createInputFields(ModelEvaluator.java:397)
    at org.jpmml.evaluator.ModelEvaluator.getInputFields(ModelEvaluator.java:207)
    at org.jpmml.evaluator.mining.MiningModelEvaluator.createSegmentHandler(MiningModelEvaluator.java:600)
    at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateSegmentation(MiningModelEvaluator.java:367)
    at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateClassification(MiningModelEvaluator.java:240)
    at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:207)
    at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:185)
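
The workaround described above (build the field list once and share it across threads) can be sketched in plain Java. This is only an illustration under stated assumptions: CachedFieldsSketch, createFields and getFields are hypothetical names, the list of strings stands in for the real input fields, and nothing here is the JPMML-Evaluator API or its actual fix.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CachedFieldsSketch {

    // Counts how often the expensive list is built (for illustration only)
    static final AtomicInteger BUILD_COUNT = new AtomicInteger();

    private final int fieldCount;

    // Shared, lazily initialised result; volatile so all threads see the published list
    private volatile List<String> cachedFields;

    public CachedFieldsSketch(int fieldCount) {
        this.fieldCount = fieldCount;
    }

    // Hypothetical stand-in for the costly per-call field creation
    private List<String> createFields() {
        BUILD_COUNT.incrementAndGet();
        List<String> fields = new ArrayList<>(fieldCount);
        for (int i = 0; i < fieldCount; i++) {
            fields.add("x" + i);
        }
        return Collections.unmodifiableList(fields);
    }

    // Double-checked locking: the list is built at most once per instance
    public List<String> getFields() {
        List<String> result = cachedFields;
        if (result == null) {
            synchronized (this) {
                result = cachedFields;
                if (result == null) {
                    result = createFields();
                    cachedFields = result;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        CachedFieldsSketch model = new CachedFieldsSketch(900);
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    model.getFields();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("builds: " + BUILD_COUNT.get());
    }
}
```

Because the cached list is immutable, worker threads can read it without further synchronisation; only the first call pays the construction cost.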

JPMML Compilation Error

Description - Duplicate methods named spliterator with the parameters () and () are inherited from the types Collection and Iterable
Resource - SparseArrayUtil.java
Path - /pmml-evaluator/src/main/java/org/jpmml/evaluator

While compiling the project in Eclipse, I get the above error. I downloaded the latest code yesterday.

Generalized Regression Model: Output not a valid pmf (probability mass function)

Hello,

My issue pertains to GeneralRegressionModelEvaluator.java and specifically to Generalized Linear Model.

Consider a two-class (here, +1 and -1) classification problem. The generalized linear model estimates Pr(class = class1) and Pr(class = class2) for any data point given its feature vector. This is done by modeling the distributions with a logit function. Since we're estimating a pmf, we must have Pr(class = class1) + Pr(class = class2) = 1.

The loop starting at line 337 is supposed to do the same thing: iterate over the different classes/categories and compute the probability of each. Everything goes well for class1, but when the code does the computation for class2 (the last category), it assigns value = 0 in line 417 and passes that through the logit function. This always gives the last category a probability of 0.5, no matter how many categories there are.

For a two-category problem, if the probability computed for category 1 in the first iteration of the for loop starting at line 337 is value1, then the probability of the other class should simply be (1 - value1). The code does not achieve this; it always assigns the last category a probability of 0.5.

If I'm right, a quick fix could be to set the probability of the last category to 1 - sum(probabilities of all the other categories).
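
The arithmetic behind this report can be checked with a few lines of plain Java. This is a minimal sketch of the reasoning only, not the evaluator's code: BinaryLogitSketch, inverseLogit and binaryProbabilities are hypothetical names, and the complement rule is the fix proposed above for the two-category case.

```java
public class BinaryLogitSketch {

    // Inverse logit (logistic function): maps a linear predictor to a probability
    static double inverseLogit(double eta) {
        return 1.0 / (1.0 + Math.exp(-eta));
    }

    // Returns {Pr(class1), Pr(class2)}, with class2 computed as the complement
    // so that the two probabilities form a valid pmf
    static double[] binaryProbabilities(double eta) {
        double p1 = inverseLogit(eta);
        return new double[] {p1, 1.0 - p1};
    }

    public static void main(String[] args) {
        // A linear predictor of 0 always maps to 0.5, whatever the other class got
        System.out.println(inverseLogit(0.0)); // prints 0.5

        double[] p = binaryProbabilities(2.0);
        System.out.println("Pr(class1) = " + p[0] + ", Pr(class2) = " + p[1]);
    }
}
```

With the complement rule the two probabilities sum to 1 by construction, whereas feeding 0 through the inverse logit for the last category yields 0.5 regardless of value1.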

Thanks
Akshay

installation

I do realise this might sound like a stupid question, but I am not used to Java or mvn. I've already spent more than an hour trying to install the evaluator. I first tried mvn get with the central repository, then git cloning and mvn build; neither has got me anywhere. Please advise, any help is highly appreciated.

  1. approach:

    mvn org.apache.maven.plugins:maven-dependency-plugin:2.8:get -Dartifact=org.jpmml:pmml-evaluator:1.2.5:jar -DoutputDirectory=.
    I've tried a lot of variants of this command searching around and looking over tutorials. But I am still not sure, where to go from here.
    When I try
    java -jar target/pmml-evaluator-1.2.5-sources.jar
    it tells me about a missing manifest file. I've tried including this in the pom file provided in the central repository including the option -DpomFile=pom.xml, but it's complaining about the execution ids.

  2. approach:

    git clone https://github.com/jpmml/jpmml-evaluator
    cd jpmml-evaluator/
    mvn build pom.xml

[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.jpmml:pmml-evaluator:jar:1.2-SNAPSHOT
[WARNING] 'parent.relativePath' of POM org.jpmml:jpmml-evaluator:1.2-SNAPSHOT (/Users/<>/target/jpmml-evaluator/pom.xml) points at org.jpmml:pmml-evaluator instead of org.sonatype.oss:oss-parent, please verify your project structure @ org.jpmml:jpmml-evaluator:1.2-SNAPSHOT, /Users/benjamin/target/jpmml-evaluator/pom.xml, line 5, column 10
...
[much more of this]
...
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] JPMML-Evaluator
[INFO] JPMML evaluator
[INFO] JPMML evaluator example
[INFO] JPMML KNIME integration tests
[INFO] JPMML RapidMiner integration tests
[INFO] JPMML R/Rattle integration tests
[INFO] JPMML evaluator code coverage
[INFO] JPMML extension
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building JPMML-Evaluator 1.2-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] JPMML-Evaluator .................................... FAILURE [ 0.388 s]
[INFO] JPMML evaluator .................................... SKIPPED
[INFO] JPMML evaluator example ............................ SKIPPED
[INFO] JPMML KNIME integration tests ...................... SKIPPED
[INFO] JPMML RapidMiner integration tests ................. SKIPPED
[INFO] JPMML R/Rattle integration tests ................... SKIPPED
[INFO] JPMML evaluator code coverage ...................... SKIPPED
[INFO] JPMML extension .................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.158 s
[INFO] Finished at: 2015-10-05T16:24:08+01:00
[INFO] Final Memory: 5M/65M
[INFO] ------------------------------------------------------------------------
[ERROR] Unknown lifecycle phase "pom.xml". You must specify a valid lifecycle phase or a goal in the format <plugin-prefix>:<goal> or <plugin-group-id>:<plugin-artifact-id>[:<plugin-version>]:<goal>. Available lifecycle phases are: validate, initialize, generate-sources, process-sources, generate-resources, process-resources, compile, process-classes, generate-test-sources, process-test-sources, generate-test-resources, process-test-resources, test-compile, process-test-classes, test, prepare-package, package, pre-integration-test, integration-test, post-integration-test, verify, install, deploy, pre-clean, clean, post-clean, pre-site, site, post-site, site-deploy. -> [Help 1]
...

InvalidFeatureException: MiningField

Hi Villu,
I have also attached my file

I am generating PMML for a NeuralNetwork model, but when I use the evaluator it keeps throwing this exception.

org.jpmml.evaluator.InvalidFeatureException: MiningField
	at org.jpmml.evaluator.IndexableUtil.buildMap(IndexableUtil.java:72)
	at org.jpmml.evaluator.IndexableUtil.buildMap(IndexableUtil.java:61)
	at org.jpmml.evaluator.ModelEvaluator$4.load(ModelEvaluator.java:688)
	at org.jpmml.evaluator.ModelEvaluator$4.load(ModelEvaluator.java:684)
	at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3628)
	at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2336)
	at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2295)
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2208)
	at com.google.common.cache.LocalCache.get(LocalCache.java:4053)
	at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4057)
	at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4986)
	at org.jpmml.evaluator.CacheUtil.getValue(CacheUtil.java:51)
	at org.jpmml.evaluator.ModelEvaluator.<init>(ModelEvaluator.java:128)
	at org.jpmml.evaluator.neural_network.NeuralNetworkEvaluator.<init>(NeuralNetworkEvaluator.java:90)
	at org.jpmml.evaluator.neural_network.NeuralNetworkEvaluator.<init>(NeuralNetworkEvaluator.java:86)
	at com.baesystems.ai.analytics.smile.pmml.NeuralNetworkPMMLTest.createEvaluator(NeuralNetworkPMMLTest.java:130)
	at com.baesystems.ai.analytics.smile.pmml.NeuralNetworkPMMLTest.testLeastMeanSqaures(NeuralNetworkPMMLTest.java:123)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:678)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)

jpmml is not a member of package org

I'm using Scala and SBT. In my build.sbt, I added this line:

libraryDependencies += "org.jpmml" % "jpmml-evaluator" % "1.3.3"

But I still get the error "jpmml is not a member of package org" when importing.

For more information: Scala version is 2.11.8

Evaluate error

Sorry to trouble you again.
JPMML works well when I use LogisticRegression, but fails with other models such as random forest.
The model comes from sklearn, and I use your awesome tool sklearn2pmml.

The error is:

Exception in thread "main" org.jpmml.evaluator.EvaluationException
    at org.jpmml.evaluator.CategoricalValue.compareToString(CategoricalValue.java:39)
    at org.jpmml.evaluator.FieldValue.compareTo(FieldValue.java:139)
    at org.jpmml.evaluator.PredicateUtil.evaluateSimplePredicate(PredicateUtil.java:131)
    at org.jpmml.evaluator.PredicateUtil.evaluatePredicate(PredicateUtil.java:63)
    at org.jpmml.evaluator.PredicateUtil.evaluate(PredicateUtil.java:51)
    at org.jpmml.evaluator.tree.TreeModelEvaluator.evaluateNode(TreeModelEvaluator.java:201)
    at org.jpmml.evaluator.tree.TreeModelEvaluator.handleTrue(TreeModelEvaluator.java:218)
    at org.jpmml.evaluator.tree.TreeModelEvaluator.evaluateTree(TreeModelEvaluator.java:162)
    at org.jpmml.evaluator.tree.TreeModelEvaluator.evaluateClassification(TreeModelEvaluator.java:137)
    at org.jpmml.evaluator.tree.TreeModelEvaluator.evaluate(TreeModelEvaluator.java:106)
    at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateSegmentation(MiningModelEvaluator.java:407)
    at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateClassification(MiningModelEvaluator.java:240)
    at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:207)
    at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:185)
    at com.ctrip.hotelbi.jpmml.Score.gettingProbability(Score.java:32)
    at com.ctrip.hotelbi.jpmml.Score.gettingProbability(Score.java:53)
    at com.ctrip.hotelbi.jpmml.PMMLTest.main(PMMLTest.java:41)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

The code is:

public class Process {
    private String[] data;
    private Evaluator evaluator;

    public Process() {
    }

    public Process(String[] data, Evaluator evaluator) {
        this.data = data;
        this.evaluator = evaluator;
    }

    public Map<FieldName, FieldValue> testData() {
        /**
         * Prepare test data
         * @return input data for prediction
         */
        Map<FieldName, FieldValue> arguments = new LinkedHashMap<>();
        List<InputField> inputs = this.evaluator.getActiveFields();
        for (InputField input : inputs) {
            FieldName activeName = input.getName();
            int i = inputs.indexOf(input);
            FieldValue activeValue = null;
            try {
                if (input.getDataType().equals(DataType.DOUBLE)) {
                    activeValue = input.prepare(Double.parseDouble(this.data[i]));
                }else activeValue = FieldValueUtil.create( this.data[i] );

            }catch (Exception e){
                activeValue = FieldValueUtil.create(0.0);
                e.printStackTrace();
            }
            arguments.put(activeName, activeValue);


        }

        return arguments;
    }
}

public class Score extends Process{
    private String[] data;
    private Evaluator evaluator;

    public Score(String[] data, Evaluator evaluator) {
        super(data, evaluator);
    }

    public ArrayList<?> gettingProbability(Evaluator evaluator){
        /**
         Predict all target label probabilities
         @param evaluator pmml model
         @return probability score of each label
         */
        Map<FieldName, FieldValue> testData = super.testData();

        ArrayList<Object> score = new ArrayList();

        System.out.println(testData.size());
        Map<FieldName,?> finalResults = evaluator.evaluate(testData);


        for(FieldName t : finalResults.keySet()){

            if (finalResults.get(t) instanceof Double) {
                score.add((Double) finalResults.get(t));
            }else{
                score.add(finalResults.get(t));
            }
        }
        return score;
    }

    public Double gettingProbability(Evaluator evaluator,int targetLabelIndex){
        /**
         Predict target label probability
         @param evaluator pmml model
         @param targetLabelIndex the index of target label that you want to predict
         @return probability score of each label
         */
        ArrayList<?> scoreArray = this.gettingProbability(evaluator);
        Double targetScore = (Double) scoreArray.get(targetLabelIndex);
        return targetScore;

    }
}

public class PMMLTest {
    public static void main(String[] args) throws IOException, JAXBException, SAXException {
        //Loading data
        CSVReader reader = new CSVReader(new FileReader("d:\\Users\\shuangyangwang\\Desktop\\JPMML\\Iris1.csv"));
        List<String[]> data = reader.readAll();
        data.remove(0);
        reader.close();

        //Loading model

        InputStream is = new FileInputStream("d:\\Users\\shuangyangwang\\Desktop\\Test\\ExtraTreesClassifier.pmml");
        PMML model = PMMLUtil.unmarshal(is);
        is.close();

        ModelEvaluatorFactory mef = ModelEvaluatorFactory.newInstance();
        ModelEvaluator<?> modelEvaluator = mef.newModelEvaluator(model);
        Evaluator evaluator = (Evaluator) modelEvaluator;
        evaluator.verify();

        //Predicting probability
        List<ArrayList<?>> listArray = new ArrayList<>();
        for (String[] s : data) {
//            PreprocessData ppd = new PreprocessData(s, evaluator);
//            Map<FieldName, FieldValue> testData = ppd.testData();
            Score scoreE = new Score(s, evaluator);
            //ArrayList<Double> result = (ArrayList<Double>) scoreE.gettingProbability(evaluator);
            Double score = scoreE.gettingProbability( evaluator ,1);
            System.out.println(score);
            //listArray.add(result);
        }

    }
}

I really don't know what is wrong with it. Please give me some suggestions.
Thank you very much.

Evaluator#getActiveFields() should include a synthetic InputField if the model needs to calculate residual values

This issue is based on the following JPMML mailing list thread: https://groups.google.com/forum/#!topic/jpmml/1IsR9zTm4KY

Technically, it is possible to detect if the model contains a residual-type output field, and if so, add an extra value to the argument data record:

List<OutputField> outputFields = evaluator.getOutputFields();
for(OutputField outputField : outputFields){
  if((ResultFeature.RESIDUAL).equals(outputField.getResultFeature())){
    TargetField targetField = Iterables.getOnlyElement(evaluator.getTargetFields()); // Get the sole target field
    arguments.put(targetField.getName(), userArguments.get(targetField.getName()));
  }
}

However, this assumes great familiarity with the PMML specification and the JPMML-Evaluator way of doing things, which is an unreasonable expectation (also, the above code might not work if the residual value is calculated at some deeper model nesting level).

Extended support for the `clusterAffinity` output feature

Hi,

The following code (using the 1.2.5 release):

final Map<FieldName, ?> results = kMeansModel.evaluate(params);
for (final Entry<FieldName, ?> resultEntry : results.entrySet())
{
    System.out.printf("%s = %s%n", resultEntry.getKey(), resultEntry.getValue());
}

returns this:

null = ClusterAffinityDistribution{result=5, distance_entries=[1=46.498128117308376, 2=47.12002804402491, 3=49.17335819210169, 4=43.117652229258695, 5=39.95722874558617, 6=45.533022467040844, 7=46.711182656888525], entityId=5}
predictedValue = 5
clusterAffinity_1 = 39.95722874558617
clusterAffinity_2 = 39.95722874558617
clusterAffinity_3 = 39.95722874558617
clusterAffinity_4 = 39.95722874558617
clusterAffinity_5 = 39.95722874558617
clusterAffinity_6 = 39.95722874558617
clusterAffinity_7 = 39.95722874558617

Shouldn't the clusterAffinity_? fields have the same values as the distance entries in the first line?

Regards,
Juraj.

Clustering pmml model execution

Hi,
I am new to PMML execution using the JPMML evaluator.
When I tried to execute a clustering PMML model (KNIME) for the Iris data from the DMG site, I got the following exception.
Exception in thread "main" java.lang.NullPointerException
at org.jpmml.evaluator.BatchUtil.formatRecords(BatchUtil.java:190)
at org.jpmml.evaluator.EvaluationExample.execute(EvaluationExample.java:295)
at org.jpmml.evaluator.Example.execute(Example.java:60)
at org.jpmml.evaluator.EvaluationExample.main(EvaluationExample.java:127)

I used the line below for PMML execution in my local command prompt.
java -cp target/example-1.2-SNAPSHOT.jar org.jpmml.evaluator.EvaluationExample --model model.pmml --input input.tsv --output output.tsv

Please help me.

org.jpmml.evaluator.EvaluationException

I am working on a RulesInduction model, and JPMML keeps complaining about the file whose contents I have copied into this issue. I am not able to figure out what the problem is. Please help me with it.

Exception:

org.jpmml.evaluator.EvaluationException
    at org.jpmml.evaluator.CategoricalValue.compareToString(CategoricalValue.java:39)
    at org.jpmml.evaluator.FieldValue.compareTo(FieldValue.java:143)
    at org.jpmml.evaluator.PredicateUtil.evaluateSimplePredicate(PredicateUtil.java:131)
    at org.jpmml.evaluator.PredicateUtil.evaluatePredicate(PredicateUtil.java:63)
    at org.jpmml.evaluator.PredicateUtil.evaluate(PredicateUtil.java:51)
    at org.jpmml.evaluator.PredicateUtil.evaluateCompoundPredicateInternal(PredicateUtil.java:200)
    at org.jpmml.evaluator.PredicateUtil.evaluateCompoundPredicate(PredicateUtil.java:168)
    at org.jpmml.evaluator.PredicateUtil.evaluatePredicate(PredicateUtil.java:71)
    at org.jpmml.evaluator.PredicateUtil.evaluate(PredicateUtil.java:51)
    at org.jpmml.evaluator.RuleSetModelEvaluator.evaluateRule(RuleSetModelEvaluator.java:190)
    at org.jpmml.evaluator.RuleSetModelEvaluator.evaluateRules(RuleSetModelEvaluator.java:216)
    at org.jpmml.evaluator.RuleSetModelEvaluator.evaluateClassification(RuleSetModelEvaluator.java:109)
    at org.jpmml.evaluator.RuleSetModelEvaluator.evaluate(RuleSetModelEvaluator.java:84)
    at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:406)
    at com.norkom.blake.pmml.IrepRuleTest.makePredictions(IrepRuleTest.java:177)
    at com.norkom.blake.pmml.IrepRuleTest.testRules(IrepRuleTest.java:106)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at junit.framework.TestCase.runTest(TestCase.java:176)
    at junit.framework.TestCase.runBare(TestCase.java:141)
    at junit.framework.TestResult$1.protect(TestResult.java:122)
    at junit.framework.TestResult.runProtected(TestResult.java:142)
    at junit.framework.TestResult.run(TestResult.java:125)
    at junit.framework.TestCase.run(TestCase.java:129)
    at junit.framework.TestSuite.runTest(TestSuite.java:255)
    at junit.framework.TestSuite.run(TestSuite.java:250)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)

PMML file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_2">
    <DataDictionary numberOfFields="13">
        <DataField name="TransactionType" optype="categorical" dataType="string">
            <Value value="ATM"/>
            <Value value="Point of Sale"/>
            <Value value="Point of Sale BGC"/>
            <Value value="Term Deposit Post Office"/>
        </DataField>
        <DataField name="Amount" optype="continuous" dataType="double"/>
        <DataField name="CreditOrDebit" optype="categorical" dataType="string">
            <Value value="Credit"/>
            <Value value="Debit"/>
        </DataField>
        <DataField name="Currency" optype="categorical" dataType="string">
            <Value value="DOLLAR"/>
            <Value value="EUR"/>
        </DataField>
        <DataField name="POSAmount3Days.acc.day.present" optype="continuous" dataType="double"/>
        <DataField name="POSAmount3Days.acc.day.total" optype="continuous" dataType="double"/>
        <DataField name="POSAmount4hr.acc.hour4" optype="continuous" dataType="double"/>
        <DataField name="POSAmount60Mins.acc.minute60" optype="continuous" dataType="double"/>
        <DataField name="POSCount3Days.cnt.day.present" optype="continuous" dataType="double"/>
        <DataField name="POSCount3Days.cnt.day.total" optype="continuous" dataType="double"/>
        <DataField name="POSCount4hr.cnt.hour4" optype="continuous" dataType="double"/>
        <DataField name="POSCount60Mins.cnt.minute60" optype="continuous" dataType="double"/>
        <DataField name="Fraud" optype="categorical" dataType="double">
            <Value value="0.0"/>
            <Value value="1.0"/>
        </DataField>
    </DataDictionary>
    <RuleSetModel modelName="RulesSetModel" functionName="classification">
        <MiningSchema>
            <MiningField name="TransactionType"/>
            <MiningField name="Amount"/>
            <MiningField name="CreditOrDebit"/>
            <MiningField name="Currency"/>
            <MiningField name="POSAmount3Days.acc.day.present"/>
            <MiningField name="POSAmount3Days.acc.day.total"/>
            <MiningField name="POSAmount4hr.acc.hour4"/>
            <MiningField name="POSAmount60Mins.acc.minute60"/>
            <MiningField name="POSCount3Days.cnt.day.present"/>
            <MiningField name="POSCount3Days.cnt.day.total"/>
            <MiningField name="POSCount4hr.cnt.hour4"/>
            <MiningField name="POSCount60Mins.cnt.minute60"/>
            <MiningField name="Fraud" usageType="target"/>
        </MiningSchema>
        <RuleSet recordCount="5152.0" nbCorrect="5033.0" defaultScore="0" defaultConfidence="0.0">
            <RuleSelectionMethod criterion="firstHit"/>
            <SimpleRule id="Rule0" score="1.0" recordCount="95.0" nbCorrect="89.0" confidence="0.9325842696629213">
                <CompoundPredicate booleanOperator="and">
                    <SimplePredicate field="POSAmount4hr.acc.hour4" operator="greaterOrEqual" value="104.1"/>
                    <SimplePredicate field="POSAmount4hr.acc.hour4" operator="greaterOrEqual" value="182.63"/>
                </CompoundPredicate>
                <ScoreDistribution value="0.0" recordCount="6.0"/>
                <ScoreDistribution value="1.0" recordCount="89.0"/>
            </SimpleRule>
            <SimpleRule id="Rule1" score="1.0" recordCount="8.0" nbCorrect="8.0" confidence="1.0">
                <CompoundPredicate booleanOperator="and">
                    <SimplePredicate field="POSAmount4hr.acc.hour4" operator="greaterOrEqual" value="80.0"/>
                    <SimplePredicate field="Amount" operator="greaterOrEqual" value="104.1"/>
                    <SimplePredicate field="Amount" operator="lessOrEqual" value="104.16"/>
                </CompoundPredicate>
                <ScoreDistribution value="0.0" recordCount="0.0"/>
                <ScoreDistribution value="1.0" recordCount="8.0"/>
            </SimpleRule>
            <SimpleRule id="Rule2" score="1.0" recordCount="16.0" nbCorrect="13.0" confidence="0.7692307692307693">
                <CompoundPredicate booleanOperator="and">
                    <SimplePredicate field="POSAmount60Mins.acc.minute60" operator="greaterOrEqual" value="37.64"/>
                    <SimplePredicate field="POSAmount3Days.acc.day.present" operator="greaterOrEqual" value="148.57"/>
                    <SimplePredicate field="TransactionType" operator="greaterOrEqual" value="13.0"/>
                    <SimplePredicate field="POSAmount3Days.acc.day.present" operator="greaterOrEqual" value="261.19"/>
                </CompoundPredicate>
                <ScoreDistribution value="0.0" recordCount="3.0"/>
                <ScoreDistribution value="1.0" recordCount="13.0"/>
            </SimpleRule>
            <SimpleRule id="Rule3" score="1.0" recordCount="8.0" nbCorrect="6.0" confidence="0.6666666666666666">
                <CompoundPredicate booleanOperator="and">
                    <SimplePredicate field="POSAmount4hr.acc.hour4" operator="greaterOrEqual" value="90.95"/>
                    <SimplePredicate field="Amount" operator="greaterOrEqual" value="147.57"/>
                </CompoundPredicate>
                <ScoreDistribution value="0.0" recordCount="2.0"/>
                <ScoreDistribution value="1.0" recordCount="6.0"/>
            </SimpleRule>
            <SimpleRule id="Rule4" score="1.0" recordCount="4.0" nbCorrect="3.0" confidence="0.6666666666666666">
                <CompoundPredicate booleanOperator="and">
                    <SimplePredicate field="POSAmount4hr.acc.hour4" operator="greaterOrEqual" value="90.95"/>
                    <SimplePredicate field="POSAmount60Mins.acc.minute60" operator="lessOrEqual" value="90.95"/>
                </CompoundPredicate>
                <ScoreDistribution value="0.0" recordCount="1.0"/>
                <ScoreDistribution value="1.0" recordCount="3.0"/>
            </SimpleRule>
        </RuleSet>
    </RuleSetModel>
</PMML>

Facing org.jpmml.evaluator.TypeCheckException: Expected FLOAT, but got DOUBLE (3.4)

When I use the JPMML evaluator to load a PMML file created with sklearn2pmml and make a prediction, I get this error:

org.jpmml.evaluator.TypeCheckException: Expected FLOAT, but got DOUBLE (3.4)
	at org.jpmml.evaluator.TypeUtil.toFloat(TypeUtil.java:419)
	at org.jpmml.evaluator.TypeUtil.cast(TypeUtil.java:333)

I am using version 1.3.5 of the evaluator. Below is the mapper used to create the PMML file; no transformation was specified:

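from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import Imputer  # removed in scikit-learn 0.22 (newer releases: sklearn.impute.SimpleImputer)
from sklearn_pandas import DataFrameMapper
from sklearn2pmml import PMMLPipeline  # import path assumed for sklearn2pmml releases of that period
from sklearn2pmml.decoration import ContinuousDomain

iris_pipeline = PMMLPipeline([
  ("mapper", DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), Imputer()])
  ])),
  ("classifier", RandomForestClassifier(n_estimators = 100))
])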

MiningModelEvaluator: multipleModelMethod weightedAverage does not support the probability OutputField feature

My scenario is this: I trained a random forest model and exported it as a PMML file.
I use multipleModelMethod=weightedAverage and want to output the probability of each class of the label.
The PMML file looks like this:

[Screenshot of the PMML file: 2015-10-17 7 54 13]

Then it throws an exception:

org.jpmml.evaluator.TypeCheckException: Expected org.jpmml.evaluator.HasProbability, but got org.jpmml.evaluator.ClassificationMap ({0=0.6526508348685987, 1=0.3473491651314011})
at org.jpmml.evaluator.OutputUtil.asResultFeature(OutputUtil.java:848)
at org.jpmml.evaluator.OutputUtil.getProbability(OutputUtil.java:478)
at org.jpmml.evaluator.OutputUtil.evaluate(OutputUtil.java:182)
at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:117)
at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:85)
at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:79)
at com.alipay.mymdp.model.component.impl.pmml.engine.PmmlComponentEngine.execute(PmmlComponentEngine.java:49)
at com.alipay.mymdp.model.component.impl.pmml.engine.PmmlComponentEngine.executePmmlComponentEngine(PmmlComponentEngine.java:35)
at com.alipay.mymdp.model.component.impl.pmml.engine.TestPmmlComponentEngine.testRF2PmmlCom
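
For context, the output the reporter expects can be sketched as follows. This is a minimal Python illustration, assuming that weightedAverage for a classification ensemble averages the per-segment class probabilities using the segment weights. It is not JPMML-Evaluator code; the function name, the weights, and the per-tree distributions are hypothetical.

```python
def weighted_average_probabilities(segments):
    """Combine per-segment class probabilities by weighted average.

    `segments` is a list of (weight, {class_label: probability}) pairs.
    """
    total_weight = sum(weight for weight, _ in segments)
    if total_weight <= 0:
        raise ValueError("segment weights must sum to a positive value")
    # Collect every class label that any segment reports.
    labels = sorted({label for _, probs in segments for label in probs})
    return {
        label: sum(weight * probs.get(label, 0.0) for weight, probs in segments) / total_weight
        for label in labels
    }


# Hypothetical three-tree ensemble for a binary target with classes "0" and "1".
segments = [
    (1.0, {"0": 0.9, "1": 0.1}),
    (1.0, {"0": 0.5, "1": 0.5}),
    (2.0, {"0": 0.6, "1": 0.4}),
]
probabilities = weighted_average_probabilities(segments)
# The winning label is the one with the highest averaged probability.
predicted_label = max(probabilities, key=probabilities.get)
```

With these made-up inputs the averaged distribution is about 0.65 for class 0 and 0.35 for class 1. That is the kind of per-class value a probability OutputField would be expected to expose.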

Add a changelog

Hello,

could you please add (and maintain) a changelog?

Cheers,
Thomas

Openscoring not supporting ensemble.GradientBoostingClassifier

Hello Vilu,

I've trained an ensemble.GradientBoostingClassifier classifier and deployed it to Openscoring, but I keep getting 400 responses to my requests.

Using the same pipeline to generate the PMML (with sklearn2pmml) and sending the same input works well with simpler models (like linear_model.LogisticRegression()).

Is GradientBoostingClassifier supported by sklearn2pmml but not by Openscoring?

Thanks!

LoadingCache may lead to OOM. Does JPMML support the model iteration scenario?

Hi, I have run into a problem. In my scenario the model is replaced by a new iteration every day, but JPMML uses LoadingCache as its cache, and LoadingCache removes entries lazily. As a result the JVM memory usage grows very large and can even end in an OOM.
Proposed solution: build the cache with both weakKeys() and weakValues():

private static LoadingCache<MiningModel, BiMap<String, Segment>> entityCache = CacheUtil.buildLoadingCache(new CacheLoader<MiningModel, BiMap<String, Segment>>(){

		@Override
		public BiMap<String, Segment> load(MiningModel miningModel){
			Segmentation segmentation = miningModel.getSegmentation();

			return EntityUtil.buildBiMap(segmentation.getSegments());
		}
	});
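
The effect of weak keys can be illustrated outside of Guava. The sketch below is a Python analogy, not the JPMML code: it uses the standard library's weakref.WeakKeyDictionary in place of a Guava cache built with weakKeys(), and the MiningModel class and get_segments helper are hypothetical stand-ins. It shows that once the last strong reference to a model is dropped, its cache entry no longer keeps memory alive.

```python
import gc
import weakref


class MiningModel:
    """Hypothetical stand-in for a loaded model object."""

    def __init__(self, segments):
        self.segments = segments


# Entries disappear automatically once the key (the model) is garbage collected.
entity_cache = weakref.WeakKeyDictionary()


def get_segments(model):
    """Return the id-to-segment mapping for a model, computing it on first use."""
    try:
        return entity_cache[model]
    except KeyError:
        mapping = {str(index): segment for index, segment in enumerate(model.segments, start=1)}
        entity_cache[model] = mapping
        return mapping


model = MiningModel(["segment-a", "segment-b"])
first_lookup = get_segments(model)
second_lookup = get_segments(model)  # served from the cache
entries_while_alive = len(entity_cache)

# Simulate the daily model replacement: drop the only strong reference.
del model
gc.collect()  # makes the cleanup explicit on interpreters without reference counting
entries_after_release = len(entity_cache)
```

One caveat of this design: the cached value must not hold a strong reference back to its key, otherwise the entry can never be collected.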

Missing Value Penalty in Tree Model

Looking at the following from http://dmg.org/pmml/v4-3/TreeModel.html#xsdType_MISSING-VALUE-STRATEGY

missingValuePenalty:

This optional attribute of TreeModel allows computed confidences to be reduced by a specified factor each time certain kinds of missing value handling are invoked during the scoring of a case. For each Node where either surrogate rules or the defaultChild strategy had to be used to select a child, the final confidences are multiplied by this factor. Note that this is based on the number of Nodes, not on the overall number of missing values that were encountered (with operator surrogate, multiple missing values can be encountered within a single Node). For example, if two Nodes with missing values were encountered to get to the final prediction, confidence is multiplied by the two missingValuePenalty values.
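
A literal reading of the quoted passage can be sketched as a small worked example. This is Python for illustration only, not the TreeModelEvaluator logic; the function name and the path encoding are hypothetical. Each visited node contributes at most one penalty factor, regardless of how many missing values were seen inside it.

```python
def penalized_confidence(confidence, missing_value_penalty, path):
    """Apply missingValuePenalty once per node that needed missing value handling.

    `path` lists, for each visited node, how its child was selected:
    "predicate", "surrogate", or "defaultChild".
    """
    penalized_nodes = sum(1 for how in path if how in ("surrogate", "defaultChild"))
    return confidence * missing_value_penalty ** penalized_nodes


# Hypothetical scoring path: two of the four nodes needed missing value handling.
path = ["predicate", "surrogate", "predicate", "defaultChild"]
result = penalized_confidence(0.9, 0.8, path)  # 0.9 * 0.8 * 0.8
unpenalized = penalized_confidence(0.9, 0.8, ["predicate", "predicate"])
```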

It sounds like the value of missingLevels in https://github.com/jpmml/jpmml-evaluator/blob/master/pmml-evaluator/src/main/java/org/jpmml/evaluator/tree/TreeModelEvaluator.java should be the number of nodes that evaluate to Unknown, and that nodes which recover from a missing value by using a surrogate should not count.

That seems to be contrary to the logic here: https://github.com/jpmml/jpmml-evaluator/blob/master/pmml-evaluator/src/main/java/org/jpmml/evaluator/tree/TreeModelEvaluator.java#L193-L195

Am I reading the code wrong, or misinterpreting PMML?

RuleSet Model doesn't support defaultScore attribute

According to http://www.dmg.org/v4-2-1/RuleSet.html#RuleSet it should be possible to define a default score for a rule set which is returned when none of the rules fire. However, OpenScoring returns a server error in this scenario.
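
The expected behaviour, firstHit selection with a fallback to the default score, can be sketched in a few lines. This is a Python illustration of the semantics described above, not JPMML or Openscoring code; first_hit and the (predicate, score) rule encoding are hypothetical. The sample data mirrors the trivial model in this report, where the only rule never fires.

```python
def first_hit(rules, record, default_score):
    """Return the score of the first rule whose predicate holds, else the default."""
    for predicate, score in rules:
        if predicate(record):
            return score
    # No rule fired: fall back to the RuleSet's defaultScore.
    return default_score


# "foobar" is a constant derived field that is always true.
record = {"foobar": True}
# RULE1 fires only when foobar equals false, so it never matches this record.
rules = [(lambda r: r["foobar"] is False, "Something")]

result = first_hit(rules, record, default_score="True")
```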

My PMML model:

<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">

  <DataDictionary numberOfFields="1">
    <DataField name="$Result" displayName="$Result" optype="categorical" dataType="string"/>   
  </DataDictionary>

  <RuleSetModel modelName="Trivial" functionName="classification" algorithmName="RuleSet">

    <MiningSchema>
      <MiningField name="$Result" usageType="target"/>
    </MiningSchema>

    <LocalTransformations>
      <DerivedField name="foobar" displayName="foobar" optype="categorical" dataType="boolean">
        <Constant>true</Constant>
      </DerivedField>
    </LocalTransformations>

    <RuleSet defaultScore="True" defaultConfidence="0.0">
      <RuleSelectionMethod criterion="firstHit"/>

      <SimpleRule id="RULE1" score="Something">
        <SimplePredicate field="foobar" operator="equal" value="false"/>
      </SimpleRule>
    </RuleSet>

  </RuleSetModel>
</PMML>

JSON request:

{
    "id": "example-001", 
    "arguments": {}
}

The result:

$ curl -X POST --data-binary @trivial-example-request.json -H "Content-type: application/json" http://localhost:8080/openscoring/model/trivial
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 500 </title>
</head>
<body>
<h2>HTTP ERROR: 500</h2>
<p>Problem accessing /openscoring/model/trivial. Reason:
<pre>    Internal Server Error</pre></p>
<hr /><i><small>Powered by Jetty://</small></i>
</body>
</html>
