jpmml / jpmml-evaluator-python Goto Github PK

View Code? Open in Web Editor NEW

20.0 20.0 9.0 56.2 MB

PMML evaluator library for Python

License: GNU Affero General Public License v3.0

Python 76.38% Java 23.62%

jpmml-evaluator-python's Introduction

Java API for producing and scoring models in Predictive Model Markup Language (PMML).

IMPORTANT

This is a legacy codebase.

Starting from March 2014, this project has been superseded by [JPMML-Model] (https://github.com/jpmml/jpmml-model) and [JPMML-Evaluator] (https://github.com/jpmml/jpmml-evaluator) projects.

Features

Class model

Full support for PMML 3.0, 3.1, 3.2, 4.0 and 4.1 schemas:
- Class hierarchy.
- Schema version annotations.
Fluent API:
- Value constructors.
SAX Locator information
[Visitor pattern] (http://en.wikipedia.org/wiki/Visitor_pattern):
- Validation agents.
- Optimization and transformation agents.

Evaluation engine

Full support for [DataDictionary] (http://www.dmg.org/v4-1/DataDictionary.html) and [MiningSchema] (http://www.dmg.org/v4-1/MiningSchema.html) elements:
- Complete data type system.
- Complete operational type system. For example, continuous integers, categorical integers and ordinal integers are handled differently in equality check and comparison operations.
- Detection and treatment of outlier, missing and invalid values.
Full support for [transformations] (http://www.dmg.org/v4-1/Transformations.html) and [functions] (http://www.dmg.org/v4-1/Functions.html):
- Built-in functions.
- User defined functions (PMML, Java).
Full support for [Targets] (http://www.dmg.org/v4-1/Targets.html) and [Output] (http://www.dmg.org/v4-1/Output.html) elements.
Fully supported model elements:
- [Association rules] (http://www.dmg.org/v4-1/AssociationRules.html)
- [Cluster model] (http://www.dmg.org/v4-1/ClusteringModel.html)
- [General regression] (http://www.dmg.org/v4-1/GeneralRegression.html)
- [Naive Bayes] (http://www.dmg.org/v4-1/NaiveBayes.html)
- [k-Nearest neighbors] (http://www.dmg.org/v4-1/KNN.html)
- [Neural network] (http://www.dmg.org/v4-1/NeuralNetwork.html)
- [Regression] (http://www.dmg.org/v4-1/Regression.html)
- [Rule set] (http://www.dmg.org/v4-1/RuleSet.html)
- [Scorecard] (http://www.dmg.org/v4-1/Scorecard.html)
- [Support Vector Machine] (http://www.dmg.org/v4-1/SupportVectorMachine.html)
- [Tree model] (http://www.dmg.org/v4-1/TreeModel.html)
- [Ensemble model] (http://www.dmg.org/v4-1/MultipleModels.html)
Fully interoperable with popular open source software:
- [R] (http://www.r-project.org/) and [Rattle] (http://rattle.togaware.com/)
- [KNIME] (http://www.knime.org/)
- [RapidMiner] (http://rapid-i.com/content/view/181/190/)

Installation

JPMML library JAR files (together with accompanying Java source and Javadocs JAR files) are released via [Maven Central Repository] (http://repo1.maven.org/maven2/org/jpmml/). Please join the [JPMML mailing list] (https://groups.google.com/forum/#!forum/jpmml) for release announcements.

The current version is 1.0.22 (17 February, 2014).

Class model

<!-- Class model classes -->
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-model</artifactId>
	<version>${jpmml.version}</version>
</dependency>
<!-- Class model annotations -->
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-schema</artifactId>
	<version>${jpmml.version}</version>
</dependency>

Evaluation engine

<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-evaluator</artifactId>
	<version>${jpmml.version}</version>
</dependency>

Usage

Class model

The class model consists of two types of classes. There is a small number of manually crafted classes that are used for structuring the class hierarchy. They are permanently stored in the Java sources directory /pmml-model/src/main/java. Additionally, there is a much greater number of automatically generated classes that represent actual PMML elements. They can be found in the generated Java sources directory /pmml-model/target/generated-sources/xjc after a successful build operation.

All class model classes descend from class org.dmg.pmml.PMMLObject. Additional class hierarchy levels, if any, represent common behaviour and/or features. For example, all model classes descend from class org.dmg.pmml.Model.

There is not much documentation accompanying class model classes. The application developer should consult with the [PMML specification] (http://www.dmg.org/v4-1/GeneralStructure.html) about individual PMML elements and attributes.

Example applications

Copying an existing org.dmg.pmml.PMML instance from one file to another file: [CopyExample.java] (https://github.com/jpmml/jpmml-example/tree/master/src/main/java/org/jpmml/example/CopyExample.java)
Building a new org.dmg.pmml.PMML instance for the "golfing" decision tree model: [TreeModelBuilderExample.java] (https://github.com/jpmml/jpmml-example/blob/master/src/main/java/org/jpmml/example/TreeModelBuilderExample.java)

Evaluation engine

A model evaluator class can be instantiated directly when the contents of the PMML document is known:

PMML pmml = ...;

ModelEvaluator<TreeModel> modelEvaluator = new TreeModelEvaluator(pmml);

Otherwise, a PMML manager class should be instantiated first, which will inspect the contents of the PMML document and instantiate the right model evaluator class later:

PMML pmml = ...;

PMMLManager pmmlManager = new PMMLManager(pmml);
 
ModelEvaluator<?> modelEvaluator = (ModelEvaluator<?>)pmmlManager.getModelManager(null, ModelEvaluatorFactory.getInstance());

Model evaluator classes follow functional programming principles. Model evaluator instances are cheap enough to be created and discarded as needed (ie. not worth the pooling effort).

It is advisable for application code to work against the org.jpmml.evaluator.Evaluator interface:

Evaluator evaluator = (Evaluator)modelEvaluator;

An evaluator instance can be queried for the definition of active (ie. independent), predicted (ie. primary dependent) and output (ie. secondary dependent) fields:

List<FieldName> activeFields = evaluator.getActiveFields();
List<FieldName> predictedFields = evaluator.getPredictedFields();
List<FieldName> outputFields = evaluator.getOutputFields();

The PMML scoring operation must be invoked with valid arguments. Otherwise, the behaviour of the model evaluator class is unspecified.

The preparation of field values:

Map<FieldName, FieldValue> arguments = new LinkedHashMap<FieldName, FieldValue>();

List<FieldName> activeFields = evaluator.getActiveFields();
for(FieldName activeField : activeFields){
	// The raw (ie. user-supplied) value could be any Java primitive value
	Object rawValue = ...;

	// The raw value is passed through: 1) outlier treatment, 2) missing value treatment, 3) invalid value treatment and 4) type conversion
	FieldValue activeValue = evaluator.prepare(activeField, rawValue);

	arguments.put(activeField, activeValue);
}

The scoring:

Map<FieldName, ?> results = evaluator.evaluate(arguments);

Typically, a model has exactly one predicted field, which is called the target field:

FieldName targetName = evaluator.getTargetField();
Object targetValue = results.get(targetName);

The target value is either a Java primitive value (as a wrapper object) or an instance of org.jpmml.evaluator.Computable:

if(targetValue instanceof Computable){
	Computable computable = (Computable)targetValue;

	Object primitiveValue = computable.getResult();
}

The target value may implement interfaces that descend from interface org.jpmml.evaluator.ResultFeature:

// Test for "entityId" result feature
if(targetValue instanceof HasEntityId){
	HasEntityId hasEntityId = (HasEntityId)targetValue;
	HasEntityRegistry<?> hasEntityRegistry = (HasEntityRegistry<?>)evaluator;
	BiMap<String, ? extends Entity> entities = hasEntityRegistry.getEntityRegistry();
	Entity winner = entities.get(hasEntityId.getEntityId());

	// Test for "probability" result feature
	if(targetValue instanceof HasProbability){
		HasProbability hasProbability = (HasProbability)targetValue;
		Double winnerProbability = hasProbability.getProbability(winner.getId());
	}
}

Example applications

Evaluating a PMML file interactively: [EvaluationExample.java] (https://github.com/jpmml/jpmml-example/tree/master/src/main/java/org/jpmml/example/EvaluationExample.java)
Evaluating a PMML file non-interactively with CSV file input: [CsvEvaluationExample.java] (https://github.com/jpmml/jpmml-example/tree/master/src/main/java/org/jpmml/example/CsvEvaluationExample.java)

Additional information

Please contact [[email protected]] (mailto:[email protected])

jpmml-evaluator-python's People

Contributors

Stargazers

Watchers

Forkers

hydrawalker mdruger ainour reginababo wutonghua wushicanasl rory602 onlookerliu pwittchen

jpmml-evaluator-python's Issues

Function "lessOrEqual" cannot accept missing value at position 0

The following error occurs when making a prediction using jpmml-evaluator in Python.

It seems that the prompt cannot handle missing value, but it is indeed predictable in my pipeline

My raw data does contain missing values, and I processed it using ExpressionTransformer. Do I need to add additional processing so that the missing values can be supported by evaluator?

Py4JJavaError: An error occurred while calling z:org.jpmml.evaluator.python.PythonUtil.evaluateAll.
: org.jpmml.evaluator.MissingArgumentException: Function "lessOrEqual" cannot accept missing value at position 0
	at org.jpmml.evaluator.functions.AbstractFunction.getRequiredArgument(AbstractFunction.java:101)
	at org.jpmml.evaluator.functions.AbstractFunction.getArgument(AbstractFunction.java:76)
	at org.jpmml.evaluator.functions.BinaryFunction.evaluate(BinaryFunction.java:43)
	at org.jpmml.evaluator.ExpressionUtil.evaluateFunction(ExpressionUtil.java:463)
	at org.jpmml.evaluator.ExpressionUtil.evaluateApply(ExpressionUtil.java:426)
	at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:167)
	at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
	at org.jpmml.evaluator.ExpressionUtil.evaluateApply(ExpressionUtil.java:345)
	at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:167)
	at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
	at org.jpmml.evaluator.ExpressionUtil.evaluateExpressionContainer(ExpressionUtil.java:61)
	at org.jpmml.evaluator.ExpressionUtil.evaluateTypedExpressionContainer(ExpressionUtil.java:66)
	at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:86)
	at org.jpmml.evaluator.ModelEvaluationContext.resolve(ModelEvaluationContext.java:100)
	at org.jpmml.evaluator.EvaluationContext.evaluate(EvaluationContext.java:94)
	at org.jpmml.evaluator.regression.RegressionModelEvaluator.evaluateRegressionTable(RegressionModelEvaluator.java:263)
	at org.jpmml.evaluator.regression.RegressionModelEvaluator.evaluateClassification(RegressionModelEvaluator.java:158)
	at org.jpmml.evaluator.ModelEvaluator.evaluateInternal(ModelEvaluator.java:446)
	at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:300)
	at org.jpmml.evaluator.python.PythonUtil.evaluate(PythonUtil.java:92)
	at org.jpmml.evaluator.python.PythonUtil.evaluateAll(PythonUtil.java:58)
	at org.jpmml.evaluator.python.PythonUtil.evaluateAll(PythonUtil.java:48)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)


During handling of the above exception, another exception occurred:

JavaError                                 Traceback (most recent call last)
<ipython-input-383-f4ba64f4738e> in <module>
----> 1 evaluator.evaluateAll(x_original.head())

~/.local/lib/python3.7/site-packages/jpmml_evaluator/__init__.py in evaluateAll(self, arguments_df, nan_as_missing)
    129                         result_records = self.backend.staticInvoke("org.jpmml.evaluator.python.PythonUtil", "evaluateAll", self.javaEvaluator, argument_records)
    130                 except Exception as e:
--> 131                         raise self.backend.toJavaError(e)
    132                 result_records = self.backend.loads(result_records)
    133                 return DataFrame.from_records(result_records)

JavaError: org.jpmml.evaluator.MissingArgumentException: Function "lessOrEqual" cannot accept missing value at position 0

My code for handling characteristics ExpressionTransformer looks like the following,

(['d1_under15s_called_opst_phone_cnt_rt'],ExpressionTransformer("-0.0841221664964877 if X[0] <= 0.0 else 0.19635799403181606 if X[0] <= 0.333 else 0.21592881028435115 if X[0] <= 0.5 else 0.33643433273608353 if X[0] <= 0.667 else 0.3116312513399116 if X[0] <= 1.0 else -0.25089876669559863"))

Why can't it handle missing values?

py4j.protocol.Py4JNetworkError: Answer from Java side is empty

from jpmml_evaluator import make_evaluator
from jpmml_evaluator.py4j import launch_gateway, Py4JBackend
gateway = launch_gateway()
backend = Py4JBackend(gateway)
make_evaluator(backend, "GBDT+LR.pmml", reporting = True).verify()

ERROR:py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
py4j.protocol.Py4JError: An error occurred while calling o0.setLocatable

how to solve this?

Atomic data exchange between Python and Java

Current data flow:

Python arguments -> Java arguments (converting a Python dict to Java map). See the JavaGateway.dict2map(dict) abstract method: https://github.com/jpmml/jpmml-evaluator-python/blob/0.4.2/jpmml_evaluator/__init__.py#L15-L16
Java arguments -> Java results (evaluating a Java map to another Java map)
Java results -> Python results (converting a Java map to a Python dict). See the JavaGateway.map2dict(map) abstract method: https://github.com/jpmml/jpmml-evaluator-python/blob/0.4.2/jpmml_evaluator/__init__.py#L18-L19

It appear to be the case that steps 1 and 3 are rate-limiting when dealing with larger data batches. A possible solution would be to avoid dict/map conversions in the Python layer altogether.

Refactored data flow:

Python dict is passed to inner Java layer in Pickle data format.
Java application unpacks Python dict pickle, performs the evaluation, and packs the results into a Python dict pickle.
Python dict is passed to outer Python layer in Pickle data format.

This approach could be used for passing single data records (a single dict), or passing batches of data records (list of dicts, Pandas' data frame).

The Pickle data format can be read and written using the awesome Pickle library.

AttributeError: 'Timestamp' object has no attribute '_get_object_id'

Hi,

I'm trying to run the pmml file I created, using pypmml package.
when running it on date features I get this error

 command_part = REFERENCE_TYPE + parameter._get_object_id()
 AttributeError: 'Timestamp' object has no attribute '_get_object_id'

I attached the pmml file as well.

Thanks.
pmml_model_xgb_dates.zip

Choosing a default backend depending on the system architecture

The default backend is JPype, which implements Python-to-Java intercommunication using JNI. The latter requires native library support to work.

The one architecture/platform where this default fails is Mac ARM (Mac M1, M2).

For example, see here:
jpmml/sklearn2pmml#407 (comment)

The solution would be to perform some architecture auto-detection, and fall back to the most universal (but the least performant) Py4J backend.

I don't have access to Mac ARM to see the exact error (type, message and the full stack trace). Would appreciate if anybody would share this information with me.

Advice for debugging erroneous input and/or PMML documents

Hi ,

I'm trying to run pmml file from jpmm_evaluator and it gives me an error as below :

File "PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\jpmml_evaluator\__init__.py", line 156, in evaluate raise self.backend.toJavaError(e) jpmml_evaluator.JavaError: java.lang.NumberFormatException: empty String

I sent all information with arguments ( no Empty string) but Is there any way to detect which field or fields cause this error or which part of pmml file ?

Reflect Java exception hierarchy in Python

The PyJNIus backend is catching Java exceptions and converting them to generic pyjnius.JavaException error type. There is no way to obtain a handle to the original "live" Java exception, and perform additional queries on it (eg. determining its sub-type as one of UnsupportedMarkupException, InvalidMarkupException or EvaluationException, which signal three main exception categories).

The proposed fix is to either try to reconstruct a Java-like exception object, or provide some utility methods for extracting most critical information from the pyjnius.JavaException.

How to handle NaN fields

In my data, there are many NaN fields. when I try to use the evaluator on a dataframe

results_df = evaluator.evaluateAll(df)
print(results_df)

I got the following error.
JavaException: JVM exception occurred: Field "abc" cannot accept user input value NaN org.jpmml.evaluator.InvalidResultException

How should I handle NaN fields?

Index of evaluateAll output DF does not match that of input

The output dataframe is created with default RangeIndex. A quick fix is to create the output like this instead

results_df = DataFrame(
    data = results_dict["data"], 
    columns = results_dict["columns"], 
    index=arguments_df.index
)

Using PMML with SkLearn's train-test split workflow

Hi, I'm using sklearn2pmml and jpmml_evaluator for saving and loading sklearn pipeline
My

from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
col_trans = ColumnTransformer(transformers=[
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["num_iid","weekday","nick","cid","activity_type","activity_countdown"])
], remainder="passthrough")
gbr = GradientBoostingRegressor(n_estimators=500)
model_pipeline = PMMLPipeline([
    ("preprocess", col_trans),
    ("gbregressor", gbr)
])
model_pipeline.fit(X_train, Y_train)
sklearn2pmml(model_pipeline,'sklearn_pipeline.pmml', with_repr = True)

and then read the pmml file:

from jpmml_evaluator import make_evaluator
evaluator = make_evaluator('sklearn_pipeline.pmml')
evaluator.verify()

and I got an error:

JavaError: org.jpmml.evaluator.EvaluationException: No PMML data type for Java data type null

then I tried to evaluate training data:

evaluator.evaluateAll(X_train)

It produced all None predictions

In addtion, when I perform:

str([x.getName() for x in evaluator.getInputFields()])

I found that the names of remaining columns in ColumnTransformer were missing, they became x3 x4 instead

Setting JAVA_HOME required although java is installed in PATH

I'm having trouble running this on travis for unit tests.

Java 8 is installed and in the PATH.
Not setting JAVA_HOME directly results in a python KeyError
Setting it to the base of the JDK or JRE on travis results in looking at the wrong path, see build log here: https://travis-ci.org/fact-project/aict-tools/jobs/550851228

Reporting of PMML

When I try to extract report of pmml with below link and follow all instraction , It gives me the results final probability values only. Is there something I miss?

https://openscoring.io/blog/2019/02/26/jpmml_evaluator_api_tracing_reporting_predictions/

My result:
{'Adjusted': 0,
'probability(0)': 0.8911293745040894,
'probability(1)': 0.10887060314416885}

Question: Can I use sklearn2pmml plugin in jpmml evaluator for Python?

Hi,

I have the following question:
Can I use sklearn2pmml plugin in jpmml evaluator for Python? I know, that in jpmml evaluator for Java I can simply add plugin as a jar dependency for Maven, but in case of Python I suppose, I need to add it to the java classpath in some way. Is it possible to do that in Python? Can you give me any example?

Regards,
Piotr

Problems when inputting values for date/datetime fields

Hello Villu

I'm sorry that I still need your help to troubleshoot a problem predicted by pmml

Last week, I successfully converted my Python model to pmml.

When I used pypmml to call the prediction, I found that the prediction value was inaccurate. Of course, I followed your instructions and installed JPMML-Evaluator-Python

However, when I used JPMML-Evaluator-Python, it didn't work properly and I just reported an error

Here is my code, written according to the readme prompt

from jpmml_evaluator import make_evaluator
from jpmml_evaluator.py4j import launch_gateway, Py4JBackend

# Launch the gateway
gateway = launch_gateway()

# Construct a Py4J backend based on the newly launched gateway
backend = Py4JBackend(gateway)

evaluator = make_evaluator(backend, "pipeline_test.pmml")
evaluator.evaluateAll(x_oot_1)

Here is the error code

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
~/.local/lib/python3.7/site-packages/jpmml_evaluator/__init__.py in evaluateAll(self, arguments_df, nan_as_missing)
    128                 try:
--> 129                         result_records = self.backend.staticInvoke("org.jpmml.evaluator.python.PythonUtil", "evaluateAll", self.javaEvaluator, argument_records)
    130                 except Exception as e:

~/.local/lib/python3.7/site-packages/jpmml_evaluator/py4j.py in staticInvoke(self, className, methodName, *args)
     24                 javaMember = javaClass.__getattr__(methodName)
---> 25                 return javaMember(*args)
     26 

~/.local/lib/python3.7/site-packages/py4j/java_gateway.py in __call__(self, *args)
   1322         return_value = get_return_value(
-> 1323             answer, self.gateway_client, self.target_id, self.name)
   1324 

~/.local/lib/python3.7/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:

Py4JJavaError: An error occurred while calling z:org.jpmml.evaluator.python.PythonUtil.evaluateAll.
: org.jpmml.evaluator.TypeCheckException: Expected date value, got double value
	at org.jpmml.evaluator.TypeUtil.toDate(TypeUtil.java:784)
	at org.jpmml.evaluator.TypeUtil.cast(TypeUtil.java:508)
	at org.jpmml.evaluator.TypeUtil.parseOrCast(TypeUtil.java:69)
	at org.jpmml.evaluator.ScalarValue.<init>(ScalarValue.java:33)
	at org.jpmml.evaluator.DiscreteValue.<init>(DiscreteValue.java:30)
	at org.jpmml.evaluator.OrdinalValue.<init>(OrdinalValue.java:38)
	at org.jpmml.evaluator.OrdinalValue.create(OrdinalValue.java:122)
	at org.jpmml.evaluator.FieldValue.create(FieldValue.java:364)
	at org.jpmml.evaluator.FieldValue.cast(FieldValue.java:109)
	at org.jpmml.evaluator.ExpressionUtil.evaluateTypedExpressionContainer(ExpressionUtil.java:72)
	at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:86)
	at org.jpmml.evaluator.ModelEvaluationContext.resolve(ModelEvaluationContext.java:100)
	at org.jpmml.evaluator.EvaluationContext.evaluate(EvaluationContext.java:94)
	at org.jpmml.evaluator.ExpressionUtil.evaluateFieldRef(ExpressionUtil.java:226)
	at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:143)
	at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
	at org.jpmml.evaluator.ExpressionUtil.evaluateApply(ExpressionUtil.java:405)
	at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:167)
	at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
	at org.jpmml.evaluator.ExpressionUtil.evaluateExpressionContainer(ExpressionUtil.java:61)
	at org.jpmml.evaluator.ExpressionUtil.evaluateTypedExpressionContainer(ExpressionUtil.java:66)
	at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:86)
	at org.jpmml.evaluator.ModelEvaluationContext.resolve(ModelEvaluationContext.java:100)
	at org.jpmml.evaluator.EvaluationContext.evaluate(EvaluationContext.java:94)
	at org.jpmml.evaluator.ExpressionUtil.evaluateFieldRef(ExpressionUtil.java:226)
	at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:143)
	at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
	at org.jpmml.evaluator.ExpressionUtil.evaluateApply(ExpressionUtil.java:405)
	at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:167)
	at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
	at org.jpmml.evaluator.ExpressionUtil.evaluateApply(ExpressionUtil.java:345)
	at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:167)
	at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
	at org.jpmml.evaluator.ExpressionUtil.evaluateExpressionContainer(ExpressionUtil.java:61)
	at org.jpmml.evaluator.ExpressionUtil.evaluateTypedExpressionContainer(ExpressionUtil.java:66)
	at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:86)
	at org.jpmml.evaluator.ModelEvaluationContext.resolve(ModelEvaluationContext.java:100)
	at org.jpmml.evaluator.EvaluationContext.evaluate(EvaluationContext.java:94)
	at org.jpmml.evaluator.ModelEvaluationContext.resolve(ModelEvaluationContext.java:142)
	at org.jpmml.evaluator.EvaluationContext.evaluate(EvaluationContext.java:94)
	at org.jpmml.evaluator.PredicateUtil.evaluateSimplePredicate(PredicateUtil.java:101)
	at org.jpmml.evaluator.PredicateUtil.evaluatePredicate(PredicateUtil.java:73)
	at org.jpmml.evaluator.PredicateUtil.evaluate(PredicateUtil.java:63)
	at org.jpmml.evaluator.PredicateUtil.evaluatePredicateContainer(PredicateUtil.java:53)
	at org.jpmml.evaluator.tree.SimpleTreeModelEvaluator.evaluateTree(SimpleTreeModelEvaluator.java:122)
	at org.jpmml.evaluator.tree.SimpleTreeModelEvaluator.evaluateAny(SimpleTreeModelEvaluator.java:90)
	at org.jpmml.evaluator.tree.SimpleTreeModelEvaluator.evaluateRegression(SimpleTreeModelEvaluator.java:77)
	at org.jpmml.evaluator.ModelEvaluator.evaluateInternal(ModelEvaluator.java:443)
	at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateSegmentation(MiningModelEvaluator.java:595)
	at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateRegression(MiningModelEvaluator.java:231)
	at org.jpmml.evaluator.ModelEvaluator.evaluateInternal(ModelEvaluator.java:443)
	at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateInternal(MiningModelEvaluator.java:224)
	at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateSegmentation(MiningModelEvaluator.java:595)
	at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateClassification(MiningModelEvaluator.java:303)
	at org.jpmml.evaluator.ModelEvaluator.evaluateInternal(ModelEvaluator.java:446)
	at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateInternal(MiningModelEvaluator.java:224)
	at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:300)
	at org.jpmml.evaluator.python.PythonUtil.evaluate(PythonUtil.java:92)
	at org.jpmml.evaluator.python.PythonUtil.evaluateAll(PythonUtil.java:58)
	at org.jpmml.evaluator.python.PythonUtil.evaluateAll(PythonUtil.java:48)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)


During handling of the above exception, another exception occurred:

JavaError                                 Traceback (most recent call last)
<ipython-input-178-5a1bd5bd787f> in <module>
----> 1 evaluator.evaluateAll(x_oot_1)

~/.local/lib/python3.7/site-packages/jpmml_evaluator/__init__.py in evaluateAll(self, arguments_df, nan_as_missing)
    129                         result_records = self.backend.staticInvoke("org.jpmml.evaluator.python.PythonUtil", "evaluateAll", self.javaEvaluator, argument_records)
    130                 except Exception as e:
--> 131                         raise self.backend.toJavaError(e)
    132                 result_records = self.backend.loads(result_records)
    133                 return DataFrame.from_records(result_records)

JavaError: org.jpmml.evaluator.TypeCheckException: Expected date value, got double value

I tried to analyze the problem by myself, and it seemed that the data format was wrong

JavaError: org.jpmml.evaluator.TypeCheckException: Expected date value, got double value

However, none of the columns in my input need date format, nor does it need date format itself. I used pipepline before to predict with the same data is OK (I don't know if you still remember, Detailed requirements I mentioned in [sklearn2pmml # 356] (jpmml/sklearn2pmml#356))

I also checked my pmml file and it looks correct as well, none of the 60 features required are date columns

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_4" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.4">
	<Header>
		<Application name="SkLearn2PMML package" version="0.86.0"/>
		<Timestamp>2022-10-28T06:27:47Z</Timestamp>
	</Header>
	<DataDictionary>
		<DataField name="my_single_target" optype="categorical" dataType="integer">
			<Value value="0"/>
			<Value value="1"/>
		</DataField>
		<DataField name="d3_daytime_opst_phone_called_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="m1_accu_cnt" optype="continuous" dataType="double"/>
		<DataField name="d1_under15s_opst_phone_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d15_creditcard_call_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d1_daytime_voice_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d15_under15s_called_opst_mbl_cnt" optype="continuous" dataType="double"/>
		<DataField name="d3_dur_under_10s_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d1_accu_rm_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d3_accu_rm_called_dur" optype="continuous" dataType="double"/>
		<DataField name="d1_once_called_opst_phone_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d3_once_called_opst_mbl_cnt" optype="continuous" dataType="double"/>
		<DataField name="d1_once_called_opst_mbl_cnt" optype="continuous" dataType="double"/>
		<DataField name="d3_called_dur_under_30s_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d15_nighttime_mbl_innet_cnt" optype="continuous" dataType="double"/>
		<DataField name="d3_daytime_voice_called_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d3_called_dur_under_10s_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d3_dur_over_180s_cnt" optype="continuous" dataType="double"/>
		<DataField name="m1_accu_called_cnt" optype="continuous" dataType="double"/>
		<DataField name="d1_daytime_voice_called_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d7_loc_called_inact_day_cnt" optype="continuous" dataType="double"/>
		<DataField name="d3_under15s_called_opst_mbl_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="m3_accu_called_cnt" optype="continuous" dataType="double"/>
		<DataField name="d1_called_dur_under_30s_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d15_rm_inact_day_cnt" optype="continuous" dataType="double"/>
		<DataField name="d3_daytime_opst_mbl_called_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d3_called_opst_phone_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d3_under15s_called_opst_phone_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="m3_accu_cnt" optype="continuous" dataType="double"/>
		<DataField name="d7_under15s_called_opst_phone_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d3_under15s_opst_phone_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="m3_accu_loc_cnt" optype="continuous" dataType="double"/>
		<DataField name="d15_called_dur_over_180s_cnt" optype="continuous" dataType="double"/>
		<DataField name="d3_accu_loc_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d1_once_opst_phone_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d3_daytime_opst_phone_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d1_dur_under_30s_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d7_under15s_called_opst_mbl_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d1_accu_loc_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d1_daytime_opst_phone_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d3_under15s_opst_mbl_cnt" optype="continuous" dataType="double"/>
		<DataField name="d1_accu_called_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d15_bankloan_call_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d1_dur_under_10s_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d7_once_called_opst_mbl_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d7_dur_over_180s_cnt" optype="continuous" dataType="double"/>
		<DataField name="modify_date" optype="continuous" dataType="double"/>
		<DataField name="day_id" optype="continuous" dataType="double"/>
		<DataField name="d7_once_called_opst_mbl_cnt" optype="continuous" dataType="double"/>
		<DataField name="d3_once_opst_phone_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d1_daytime_opst_phone_called_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d15_mbl_innet_day" optype="continuous" dataType="double"/>
		<DataField name="d3_once_called_opst_phone_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d15_express_call_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d15_insurance_call_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d1_daytime_opst_mbl_called_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="d1_accu_loc_called_dur" optype="continuous" dataType="double"/>
		<DataField name="d1_called_opst_phone_cnt_rt" optype="continuous" dataType="double"/>
		<DataField name="m3_accu_loc_called_cnt" optype="continuous" dataType="double"/>
		<DataField name="d7_called_dur_over_180s_cnt" optype="continuous" dataType="double"/>
		<DataField name="d3_nighttime_mbl_innet_cnt" optype="continuous" dataType="double"/>
		<DataField name="channel_type_cd_3" optype="categorical" dataType="string"/>
	</DataDictionary>

So I can't tell what the problem is.

The only thing I can think of is maybe the problem is not in the input but in the output?

Because I am in order to avoid an error (similar to ), added that one line of code

pipeline_test.target_fields = ["my_single_target"]

I don't know whether this is the cause of the problem, in a word, could you help me to make a simple analysis

Getting subprocess.CalledProcessError: Command '['which', 'javac']' returned non-zero exit status 1 when calling make_evaluator with jnius

Hi,

I have the following problem.

Here's my code:

        jnius_configure_classpath(user_classpath=[
            get_project_root_dir() + '/sklearn2pmml_config/',
            get_project_root_dir() + '/sklearn2pmml_plugin/target/plugin-1.0-SNAPSHOT.jar']
        )

        backend = PyJNIusBackend()

        evaluator = make_evaluator(backend, pmml_file_path).verify()
        inputs = backend.map2dict(input_data)
        results = evaluator.evaluate(inputs)

It throws an error:

subprocess.CalledProcessError: Command '['which', 'javac']' returned non-zero exit status 1.

during make_evaluator(...) invocation.

Complete stacktrace is provided below.

I debugged this code and I've noticed that error occurs in pyjnius.py file on line 15 in newObject function when calling from jnius import autoclass.

When I type echo $JAVA_HOME, I have the following output:

//home/piotrwittchen/.sdkman/candidates/java/current

current links to //home/piotrwittchen/.sdkman/candidates/java/8.0.252-zulu

When I type whereis javac, I have the following output:

javac: /home/piotrwittchen/.sdkman/candidates/java/8.0.252-zulu/bin/javac

I have no idea what can be wrong in my setup.

When I enter python and type:

>>> from jnius import autoclass

then I don't get any errors and I can perform simple hello world example without problems:

>>> autoclass('java.lang.System').out.println('Hello world')

It seems that there's an issue related to pyjnius integration inside jpmml-evaluator-python.

My stacktrace:

File "/home/piotrwittchen/.virtualenvs/vodka/lib/python3.6/site-packages/jpmml_evaluator/__init__.py", line 149, in make_evaluator
    evaluatorBuilder = LoadingModelEvaluatorBuilder(backend) \
  File "/home/piotrwittchen/.virtualenvs/vodka/lib/python3.6/site-packages/jpmml_evaluator/__init__.py", line 119, in __init__
    javaModelEvaluatorBuilder = backend.newObject("org.jpmml.evaluator.LoadingModelEvaluatorBuilder")
  File "/home/piotrwittchen/.virtualenvs/vodka/lib/python3.6/site-packages/jpmml_evaluator/pyjnius.py", line 15, in newObject
    from jnius import autoclass
  File "/home/piotrwittchen/.virtualenvs/vodka/lib/python3.6/site-packages/jnius/__init__.py", line 42, in <module>
    from .reflect import *  # noqa
  File "/home/piotrwittchen/.virtualenvs/vodka/lib/python3.6/site-packages/jnius/reflect.py", line 20, in <module>
    class Class(with_metaclass(MetaJavaClass, JavaClass)):
  File "/home/piotrwittchen/.virtualenvs/vodka/lib/python3.6/site-packages/six.py", line 856, in __new__
    return meta(name, resolved_bases, d)
  File "jnius/jnius_export_class.pxi", line 119, in jnius.MetaJavaClass.__new__
  File "jnius/jnius_export_class.pxi", line 179, in jnius.MetaJavaClass.resolve_class
  File "jnius/jnius_env.pxi", line 11, in jnius.get_jnienv
  File "jnius/jnius_jvm_dlopen.pxi", line 134, in jnius.get_platform_jnienv
  File "jnius/jnius_jvm_dlopen.pxi", line 84, in jnius.create_jnienv
  File "jnius/jnius_jvm_dlopen.pxi", line 47, in jnius.find_java_home
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['which', 'javac']' returned non-zero exit status 1.

py4j.protocol.Py4JNetworkError: Answer from Java side is empty

I copied the code in readme except replace the pmml file and csv file , and then raise a error.

Code:

from jpmml_evaluator import launch_gateway

gateway = launch_gateway()

from jpmml_evaluator import Evaluator, LoadingModelEvaluatorBuilder

evaluatorBuilder = LoadingModelEvaluatorBuilder(gateway) \
	.setLocatable(True) \
	.loadFile("model.pmml")

evaluator = evaluatorBuilder.build() \
	.verify()


import pandas

arguments_df = pandas.read_csv("Churn.csv", sep = ",")

results_df = evaluator.evaluateAll(arguments_df)
print(results_df)

gateway.shutdown()

Error:

E:\MachineLearning\venv\Scripts\python.exe E:/MachineLearning/GeekWeek/main.py
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "E:\MachineLearning\venv\lib\site-packages\py4j\java_gateway.py", line 1188, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\MachineLearning\venv\lib\site-packages\py4j\java_gateway.py", line 1014, in send_command
    response = connection.send_command(command)
  File "E:\MachineLearning\venv\lib\site-packages\py4j\java_gateway.py", line 1193, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
  File "E:/MachineLearning/GeekWeek/main.py", line 8, in <module>
    .setLocatable(True) \
  File "E:\MachineLearning\venv\lib\site-packages\jpmml_evaluator\__init__.py", line 110, in setLocatable
    self.javaModelEvaluatorBuilder.setLocatable(locatable)
  File "E:\MachineLearning\venv\lib\site-packages\py4j\java_gateway.py", line 1286, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "E:\MachineLearning\venv\lib\site-packages\py4j\protocol.py", line 336, in get_return_value
    format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling o0.setLocatable

Process finished with exit code 1

Is there a way to turn off `too many input fields` exception?

I have a model with ~3,000 signals. When I try to load the model, it returns this error.
JavaException: JVM exception occurred: Model has too many input fields org.jpmml.evaluator.InvalidElementException
Is there a way to get around this?

Thanks,
shun

PyJNIus backend can't handle `None` dict values

Hi Villu.
My pandas dataframe has np.nan in it.
The PMML has missing handling in the mining field

The same data evaluates from a CSV file using org.jpmml.evaluator.example.EvaluationExample
I get this error:

~/.local/lib/python3.8/site-packages/jpmml_evaluator/__init__.py in evaluateAll(self, arguments_df)
     89                 result_records = []
     90                 for argument_record in argument_records:
---> 91                         result_record = self.evaluate(argument_record)
     92                         result_records.append(result_record)
     93                 return DataFrame.from_records(result_records)

~/.local/lib/python3.8/site-packages/jpmml_evaluator/__init__.py in evaluate(self, arguments)
     80                 javaArguments = self.backend.dict2map(arguments)
     81                 javaArguments = self.backend.staticInvoke("org.jpmml.evaluator.EvaluatorUtil", "encodeKeys", javaArguments)
---> 82                 javaResults = self.javaEvaluator.evaluate(javaArguments)
     83                 javaResults = self.backend.staticInvoke("org.jpmml.evaluator.EvaluatorUtil", "decodeAll", javaResults)
     84                 results = self.backend.map2dict(javaResults)

jnius/jnius_export_class.pxi in jnius.JavaMultipleMethod.__call__()

jnius/jnius_export_class.pxi in jnius.JavaMethod.__call__()

jnius/jnius_export_class.pxi in jnius.JavaMethod.call_method()

jnius/jnius_jvm_dlopen.pxi in jnius.create_jnienv()

JavaException: JVM exception occurred: Field "xxxxx" cannot accept user input value NaN org.jpmml.evaluator.InvalidResultException

Convert PMML serialized model to Sklearn

Hi Team

Is it possilbe to convert PMML serialized model to scikit-learn model instance using jpmml-evaluator-python library

Considering jnius for the jni communication

The major difference between py4j and jnius is that py4j does a network socket based communication with it's own gateway server. While jnius does JNI communication with a JVM.

A while back I had the same usecase of running PMMLs in python, and evaluated them both. Redid the evaluation when I saw this library:

from jpmml_evaluator import launch_gateway, _package_classpath
from jpmml_evaluator import Evaluator, LoadingModelEvaluatorBuilder

import jnius_config
jnius_config.set_classpath(*_package_classpath())

py4j_gateway = None
py4j_evaluator = None
jnius_evaluator = None


def py4j_createjvm():
    global py4j_gateway
    if py4j_gateway is None:
        py4j_gateway = launch_gateway()

def py4j_loadmodel():
    global py4j_evaluator
    evaluatorBuilder = LoadingModelEvaluatorBuilder(py4j_gateway) \
        .setLocatable(True) \
        .loadFile("jpmml_evaluator/tests/resources/DecisionTreeIris.pmml")

    py4j_evaluator = evaluatorBuilder.build().verify()

def py4j_fieldquery():
    inputFields = py4j_evaluator.getInputFields()
    vals = [inputField.getName() for inputField in inputFields]
    # print("Input fields: ", vals)

    targetFields = py4j_evaluator.getTargetFields()
    vals = [targetField.getName() for targetField in targetFields]
    # print("Target field(s): ", vals)

    outputFields = py4j_evaluator.getOutputFields()
    vals = [outputField.getName() for outputField in outputFields]
    # print("Output fields: ", vals)

def py4j_score(val=0):
    arguments = {
        "Sepal_Length" : 5.1 + val,
        "Sepal_Width" : 3.5 + val,
        "Petal_Length" : 1.4 + val,
        "Petal_Width" : 0.2 + val,
    }

    results = py4j_evaluator.evaluate(arguments)
    # print(results)


def jnius_createjvm():
    import jnius

def jnius_loadmodel():
    global jnius_evaluator
    import jnius
    jLoadingModelEvaluatorBuilder = jnius.autoclass("org.jpmml.evaluator.LoadingModelEvaluatorBuilder")
    jFile = jnius.autoclass('java.io.File')
    jString = jnius.autoclass("java.lang.String")

    evaluatorBuilder = jLoadingModelEvaluatorBuilder()
    evaluatorBuilder.setLocatable(True)
    evaluatorBuilder.load(jFile(jString("jpmml_evaluator/tests/resources/DecisionTreeIris.pmml")))
    jnius_evaluator = evaluatorBuilder.build()

def jnius_fieldquery():
    import jnius
    inputFields = jnius_evaluator.getInputFields()
    vals = [inputFields.get(i).getName().getValue() for i in range(inputFields.size())]
    # print("Input fields: ", vals)

    targetFields = jnius_evaluator.getTargetFields()
    vals = [targetFields.get(i).getName().getValue() for i in range(targetFields.size())]
    # print("Target field(s): ", vals)

    outputFields = jnius_evaluator.getOutputFields()
    vals = [outputFields.get(i).getName().getValue() for i in range(outputFields.size())]
    # print("Output fields: ", vals)

def jnius_score(val=0):
    import jnius
    jModelEvaluationContext = jnius.autoclass('org.jpmml.evaluator.ModelEvaluationContext')
    jEvaluatorUtil = jnius.autoclass("org.jpmml.evaluator.EvaluatorUtil")
    jDouble = jnius.autoclass('java.lang.Double')
    jString = jnius.autoclass("java.lang.String")

    arguments = {
        "Sepal_Length" : 5.1 + val,
        "Sepal_Width" : 3.5 + val,
        "Petal_Length" : 1.4 + val,
        "Petal_Width" : 0.2 + val,
    }

    model_inputs = jModelEvaluationContext(jnius_evaluator)

    input_fields = jnius_evaluator.getInputFields()
    for i in range(input_fields.size()):
        field_name = input_fields.get(i).getName()
        model_inputs.declare(field_name, jDouble(arguments[field_name.getValue()]))

    results1 = jnius_evaluator.evaluate(model_inputs)
    results2 = jEvaluatorUtil.decode(results1)
    results = {k: results2.get(jString(k)) for k in results2.keySet().toArray()}
    # print(results)

Create JVM

In [2]: %time py4j_createjvm()
CPU times: user 2.2 ms, sys: 6.49 ms, total: 8.7 ms
Wall time: 180 ms

In [3]: %time jnius_createjvm()
CPU times: user 70.3 ms, sys: 22.6 ms, total: 92.9 ms
Wall time: 93.9 ms

Conclusion: In JVM creation, py4j is slower than jnius. (As this is one time activity - this is not too important)

Load Model

In [5]: %time py4j_loadmodel()
CPU times: user 5.39 ms, sys: 3.48 ms, total: 8.87 ms
Wall time: 1.04 s

In [6]: %timeit py4j_loadmodel()
2.82 ms ± 627 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [7]: %time jnius_loadmodel()
CPU times: user 3.86 s, sys: 196 ms, total: 4.06 s
Wall time: 1.45 s

In [8]: %timeit jnius_loadmodel()
1.03 ms ± 226 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Conclusion: py4j is slower than jnius on the loading part

Field Queries

In [9]: %time py4j_fieldquery()
CPU times: user 4.6 ms, sys: 2.51 ms, total: 7.1 ms
Wall time: 17 ms

In [10]: %timeit py4j_fieldquery()
3.52 µs ± 352 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [11]: %time jnius_fieldquery()
CPU times: user 9.85 ms, sys: 1.3 ms, total: 11.1 ms
Wall time: 8.27 ms

In [12]: %timeit jnius_fieldquery()
217 µs ± 9.66 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Conclusion: jnius is faster on the first load, but py4j is faster for subsequent runs of the same.

Scoring

In [13]: %time py4j_score()
CPU times: user 6.97 ms, sys: 3.67 ms, total: 10.6 ms
Wall time: 77.9 ms

In [14]: %timeit py4j_score()
4.15 ms ± 589 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [15]: %time jnius_score()
CPU times: user 93 ms, sys: 4.61 ms, total: 97.6 ms
Wall time: 72.5 ms

In [16]: %timeit jnius_score()
527 µs ± 12.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [22]: val = 0.3
    ...: %time py4j_score(val)
    ...: %time jnius_score(val)
    ...:
CPU times: user 1.93 ms, sys: 1.33 ms, total: 3.26 ms
Wall time: 6.35 ms
CPU times: user 1.02 ms, sys: 40 µs, total: 1.06 ms
Wall time: 1.05 ms

In [23]: val = 0.4
    ...: %time py4j_score(val)
    ...: %time jnius_score(val)
    ...:
CPU times: user 1.87 ms, sys: 1.13 ms, total: 3 ms
Wall time: 5 ms
CPU times: user 1.08 ms, sys: 570 µs, total: 1.65 ms
Wall time: 1.09 ms

Conclusion: Both py4j and jnius have similar first scoring times. But jnius tends to be 5 times faster than py4j for subsequent runs.

Notes:

This is just a preliminary test. The pyjnius code above is not very optimized as it runs autoclass in eery function, converts Map -> set -> array and so on. Just put it together for a rough benchmark
Im not sure how much time you've spend on the py4j optimization either (so not sure how optimized it is)
As the PMML becomes more complex, the overheads involved here would reduce probably as more time would be spent inside Java and lesser in the communication (And the java time for both would be same as they both use same libraries)
Jnius uses JNI and py4j uses socket, the communication with JNI is expected to be much faster.

Impact:

PMML is typically run on batch data (bunch of records - jpmml-evaluator-python's evaluateAll()) or is run in a single record execution (jpmml-evaluator-python's evaluate())

In batch evaluation: Having a 5 times slower PMML evaluator may not be a major deal unless you go towards huge data where running things on a million records can take a much longer time (jnius with 1ms/record would take ~5mins for 300k records, while py4j would take ~25mins)
In single evaluation: In APIs/realtime contexts every millisec is helpful

PS: This was done on 23dd82f

Using Python equivalent of the basic usage of jpmml-evaluator from Java

Here's an example of basic usage of jpmml-evaluator for Java: https://github.com/jpmml/jpmml-evaluator#basic-usage.
How can I write equivalent code in Python?

Especially this part:

	Map<String, ?> inputRecord = readRecord();
	if(inputRecord == null){
		break;
	}

	Map<FieldName, FieldValue> arguments = new LinkedHashMap<>();

	// Mapping the record field-by-field from data source schema to PMML schema
	for(InputField inputField : inputFields){
		FieldName inputName = inputField.getName();

		Object rawValue = inputRecord.get(inputName.getValue());

		// Transforming an arbitrary user-supplied value to a known-good PMML value
		FieldValue inputValue = inputField.prepare(rawValue);

		arguments.put(inputName, inputValue);
	}

	// Evaluating the model with known-good arguments
	Map<FieldName, ?> results = evaluator.evaluate(arguments);

in order to create arguments map or dictionary in case of Python. I couldn't find anything like this in the documentation.

jpmml / jpmml-evaluator-python Goto Github PK

jpmml-evaluator-python's Introduction

IMPORTANT

Features

Class model

Evaluation engine

Installation

Class model

Evaluation engine

Usage

Class model

Example applications

Evaluation engine

Example applications

Additional information

jpmml-evaluator-python's People

Contributors

Stargazers

Watchers

Forkers

jpmml-evaluator-python's Issues

Create JVM

Load Model

Field Queries

Scoring

Impact:

Recommend Projects

Recommend Topics

Recommend Org