
jpmml-r's Introduction

Java API for producing and scoring models in Predictive Model Markup Language (PMML).

IMPORTANT

This is a legacy codebase.

Starting from March 2014, this project has been superseded by the [JPMML-Model](https://github.com/jpmml/jpmml-model) and [JPMML-Evaluator](https://github.com/jpmml/jpmml-evaluator) projects.

Features

Class model

  • Full support for PMML 3.0, 3.1, 3.2, 4.0 and 4.1 schemas:
    • Class hierarchy.
    • Schema version annotations.
  • Fluent API:
    • Value constructors.
  • SAX Locator information
  • [Visitor pattern](http://en.wikipedia.org/wiki/Visitor_pattern):
    • Validation agents.
    • Optimization and transformation agents.

Evaluation engine

Installation

JPMML library JAR files (together with accompanying Java source and Javadocs JAR files) are released via the [Maven Central Repository](http://repo1.maven.org/maven2/org/jpmml/). Please join the [JPMML mailing list](https://groups.google.com/forum/#!forum/jpmml) for release announcements.

The current version is 1.0.22 (17 February, 2014).

Class model

<!-- Class model classes -->
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-model</artifactId>
	<version>${jpmml.version}</version>
</dependency>
<!-- Class model annotations -->
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-schema</artifactId>
	<version>${jpmml.version}</version>
</dependency>

Evaluation engine

<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-evaluator</artifactId>
	<version>${jpmml.version}</version>
</dependency>

Usage

Class model

The class model consists of two types of classes. There is a small number of manually crafted classes that are used for structuring the class hierarchy. They are permanently stored in the Java sources directory /pmml-model/src/main/java. Additionally, there is a much greater number of automatically generated classes that represent actual PMML elements. They can be found in the generated Java sources directory /pmml-model/target/generated-sources/xjc after a successful build operation.

All class model classes descend from class org.dmg.pmml.PMMLObject. Additional class hierarchy levels, if any, represent common behaviour and/or features. For example, all model classes descend from class org.dmg.pmml.Model.

There is not much documentation accompanying class model classes. The application developer should consult the [PMML specification](http://www.dmg.org/v4-1/GeneralStructure.html) about individual PMML elements and attributes.

Example applications

Evaluation engine

A model evaluator class can be instantiated directly when the contents of the PMML document are known:

PMML pmml = ...;

ModelEvaluator<TreeModel> modelEvaluator = new TreeModelEvaluator(pmml);

Otherwise, a PMML manager class should be instantiated first, which will inspect the contents of the PMML document and instantiate the right model evaluator class later:

PMML pmml = ...;

PMMLManager pmmlManager = new PMMLManager(pmml);
 
ModelEvaluator<?> modelEvaluator = (ModelEvaluator<?>)pmmlManager.getModelManager(null, ModelEvaluatorFactory.getInstance());

Model evaluator classes follow functional programming principles. Model evaluator instances are cheap enough to be created and discarded as needed (i.e. not worth the pooling effort).

It is advisable for application code to work against the org.jpmml.evaluator.Evaluator interface:

Evaluator evaluator = (Evaluator)modelEvaluator;

An evaluator instance can be queried for the definition of active (i.e. independent), predicted (i.e. primary dependent) and output (i.e. secondary dependent) fields:

List<FieldName> activeFields = evaluator.getActiveFields();
List<FieldName> predictedFields = evaluator.getPredictedFields();
List<FieldName> outputFields = evaluator.getOutputFields();

The PMML scoring operation must be invoked with valid arguments. Otherwise, the behaviour of the model evaluator class is unspecified.

The preparation of field values:

Map<FieldName, FieldValue> arguments = new LinkedHashMap<FieldName, FieldValue>();

List<FieldName> activeFields = evaluator.getActiveFields();
for(FieldName activeField : activeFields){
	// The raw (i.e. user-supplied) value could be any Java primitive value
	Object rawValue = ...;

	// The raw value is passed through: 1) outlier treatment, 2) missing value treatment, 3) invalid value treatment and 4) type conversion
	FieldValue activeValue = evaluator.prepare(activeField, rawValue);

	arguments.put(activeField, activeValue);
}
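The four treatments named in the comment above can be pictured with a plain-Java stand-in. This is only a sketch for a single numeric field, not JPMML code: the class `PrepareSketch`, its constants and its `prepare` method are hypothetical, the clamping range and replacement value are made-up settings, and the order of the steps is chosen for simplicity and may differ from what the library does.

```java
/** Hypothetical stand-in for value preparation of one numeric field; not JPMML code. */
class PrepareSketch {

    // Assumed field settings (illustrative values only)
    static final double LOW = 0.0;
    static final double HIGH = 100.0;
    static final double MISSING_REPLACEMENT = 50.0;

    /** Turns a raw user-supplied value into a double that is safe to score. */
    static double prepare(Object rawValue) {
        // Missing value treatment: substitute a replacement value
        if (rawValue == null) {
            return MISSING_REPLACEMENT;
        }

        // Type conversion: accept numbers and numeric strings
        double value;
        if (rawValue instanceof Number) {
            value = ((Number) rawValue).doubleValue();
        } else {
            try {
                value = Double.parseDouble(rawValue.toString().trim());
            } catch (NumberFormatException e) {
                // Invalid value treatment: here, handle like a missing value
                return MISSING_REPLACEMENT;
            }
        }
        if (Double.isNaN(value)) {
            return MISSING_REPLACEMENT;
        }

        // Outlier treatment: clamp to the closed interval [LOW, HIGH]
        return Math.max(LOW, Math.min(HIGH, value));
    }

    public static void main(String[] args) {
        Object[] rawValues = {null, 150, "-3", "abc", "42.5"};
        for (Object rawValue : rawValues) {
            System.out.println(rawValue + " -> " + prepare(rawValue));
        }
    }
}
```

In the real API the result is a `FieldValue` rather than a `double`, and the treatments are driven by the PMML document rather than by constants.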

The scoring:

Map<FieldName, ?> results = evaluator.evaluate(arguments);

Typically, a model has exactly one predicted field, which is called the target field:

FieldName targetName = evaluator.getTargetField();
Object targetValue = results.get(targetName);

The target value is either a Java primitive value (as a wrapper object) or an instance of org.jpmml.evaluator.Computable:

if(targetValue instanceof Computable){
	Computable computable = (Computable)targetValue;

	Object primitiveValue = computable.getResult();
}

The target value may implement interfaces that descend from interface org.jpmml.evaluator.ResultFeature:

// Test for "entityId" result feature
if(targetValue instanceof HasEntityId){
	HasEntityId hasEntityId = (HasEntityId)targetValue;
	HasEntityRegistry<?> hasEntityRegistry = (HasEntityRegistry<?>)evaluator;
	BiMap<String, ? extends Entity> entities = hasEntityRegistry.getEntityRegistry();
	Entity winner = entities.get(hasEntityId.getEntityId());

	// Test for "probability" result feature
	if(targetValue instanceof HasProbability){
		HasProbability hasProbability = (HasProbability)targetValue;
		Double winnerProbability = hasProbability.getProbability(winner.getId());
	}
}

Example applications

Additional information

Please contact [[email protected]] (mailto:[email protected])

jpmml-r's Issues

Java RExpParser exception

Hi,
thanks a lot for your work!
I noticed a problem when working with JPMML-R: if I have a feature matrix in which the feature names contain certain characters (such as &), the package throws an exception related to RExpParser. In contrast, the JPMML-Sklearn package is not affected by this behaviour: it creates an XML file in which the character "&" in the names is correctly replaced by "&amp;".
Do you think this is a problem? If so, can you fix it?
Best,
Simon
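For reference, the escaping that the reporter describes in the JPMML-Sklearn output can be sketched in plain Java. This only illustrates how reserved characters are written in XML attribute values; it does not explain the RExpParser exception, and the class `XmlEscapeSketch`, the method `escapeXml` and the feature name `R&D spend` are made up for the example.

```java
/** Hypothetical helper showing XML escaping of a feature name; not JPMML code. */
class XmlEscapeSketch {

    /** Replaces the five characters that XML reserves with their predefined entities. */
    static String escapeXml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String featureName = "R&D spend";
        // prints: <DataField name="R&amp;D spend" optype="continuous" dataType="double"/>
        System.out.println("<DataField name=\"" + escapeXml(featureName)
            + "\" optype=\"continuous\" dataType=\"double\"/>");
    }
}
```

An XML library normally performs this step when it serializes the document, so application code rarely needs to do it by hand.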

Dropped Accuracy

Hi!
I was able to export an xgb model from R to PMML format using r2pmml. However, when the model was imported into Java, accuracy dropped: I get around 54% accuracy in R but only around 30% with the PMML model.
Do you know what could be causing this change?
Thanks in advance!

jpmml-r .jar file issue

Hi,

I'm trying to convert my gbm model to PMML by first saving it in RDS format and then converting it to PMML with Java on the command line. This is the code I'm using:

saveRDS(model, "model.rds") ## Using R to build a model

java -jar target/converter-executable-1.1-SNAPSHOT.jar --rds-input model.rds --pmml-output model.pmml ## Using Java to convert to PMML in the terminal (Mac)

However, this is the error I get: "Error: Unable to access jarfile target/converter-executable-1.1-SNAPSHOT.jar"

I'm using Java version "1.8.0_65" 64-bit. Any help appreciated.

"The value for field Species is not defined" error when using xgboost; can anyone help?

R code

library(xgboost)
model2 <- xgboost(
data = as.matrix(iris[, 1:4]),
label = as.numeric(iris[, 5]) - 1,
max_depth = 2, eta = 1, nthread = 2, nrounds = 2,
objective = "multi:softprob", num_class = 3
)
as.matrix(iris[, 1:4])
#Save the tree information in an external file:
xgb.dump(model2, "model2.dumped.trees")
#Convert to PMML:
model.pmml<-pmml(model2,
input_feature_names = colnames(as.matrix(iris[, 1:4])),
missing_value_replacement=
output_label_name = "Species",
output_categories = c(1, 2, 3), xgb_dump_file = "model2.dumped.trees"
)
save_pmml(model.pmml, name="iris.xgb.pmml")

It seems "Species" has been defined in the PMML.

Support for decision engineering

When using "formula interface", it should be possible to apply transformations also on the label. For example, log(y) ~ . should be expanded into a two-step workflow, where the model first computes "raw" y value, and then applies the logarithmic transformation to it.

Random forest survival (ranger)

I see in your description that you cover random forest classification and regression from the ranger package, but not survival. I tried it and it did not work for ranger survival models. Would it need a lot of modifications, or can I do it with minor changes to the package?

Add support for the tweedie distribution in GLM models

Hi,

I need to convert a GLM with a Tweedie link function from R to PMML. As far as I can see, the current version only supports binomial or gaussian. Is there a way to do this, or can you guide me to where the link function parameter is specified in PMML? I have checked on pmml.org and they support the Tweedie distribution.

Regards
Deepankar Arora

ClassCastException while converting rds to pmml

Hi,

When we tried to convert rf.rds to rf.pmml, we got the following error:

Feb 07, 2017 9:53:23 PM org.jpmml.rexp.Main run
SEVERE: Failed to convert
java.lang.ClassCastException: org.jpmml.rexp.RDoubleVector cannot be cast to org.jpmml.rexp.RIntegerVector
        at org.jpmml.rexp.RandomForestConverter.encodeClassification(RandomForestConverter.java:259)
        at org.jpmml.rexp.RandomForestConverter.encodeModel(RandomForestConverter.java:95)
        at org.jpmml.rexp.RandomForestConverter.encodeModel(RandomForestConverter.java:50)
        at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:78)
        at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
        at org.jpmml.rexp.Main.run(Main.java:149)
        at org.jpmml.rexp.Main.main(Main.java:97)

Exception in thread "main" java.lang.ClassCastException: org.jpmml.rexp.RDoubleVector cannot be cast to org.jpmml.rexp.RIntegerVector
        at org.jpmml.rexp.RandomForestConverter.encodeClassification(RandomForestConverter.java:259)
        at org.jpmml.rexp.RandomForestConverter.encodeModel(RandomForestConverter.java:95)
        at org.jpmml.rexp.RandomForestConverter.encodeModel(RandomForestConverter.java:50)
        at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:78)
        at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
        at org.jpmml.rexp.Main.run(Main.java:149)
        at org.jpmml.rexp.Main.main(Main.java:97)

Kindly look into this issue.

Support for type cast functions

The expression parser component should support the following cast functions:

  • as.integer. Casts the argument to continuous integer.
  • as.numeric. Casts the argument to continuous double.
  • as.character. Casts the argument to categorical string.
  • as.factor. Casts the argument from continuous operational type to categorical operational type, while preserving the data type. For example, casting a continuous integer to categorical integer.
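The four casts listed above can be read as a mapping over (operational type, data type) pairs. The following is a minimal sketch of that mapping under assumed names: `CastSketch`, `OpType`, `DataType`, `FieldType` and `cast` are hypothetical and are not taken from the JPMML-R expression parser.

```java
import java.util.Locale;

/** Hypothetical model of the requested cast functions; not JPMML-R code. */
class CastSketch {

    enum OpType { CONTINUOUS, CATEGORICAL }

    enum DataType { INTEGER, DOUBLE, STRING }

    /** A field type is an operational type paired with a data type. */
    static final class FieldType {
        final OpType opType;
        final DataType dataType;

        FieldType(OpType opType, DataType dataType) {
            this.opType = opType;
            this.dataType = dataType;
        }

        @Override
        public String toString() {
            return opType.name().toLowerCase(Locale.ROOT) + " " + dataType.name().toLowerCase(Locale.ROOT);
        }
    }

    /** Returns the type that results from applying an R cast function to an argument. */
    static FieldType cast(String function, FieldType argument) {
        switch (function) {
            case "as.integer":
                return new FieldType(OpType.CONTINUOUS, DataType.INTEGER);
            case "as.numeric":
                return new FieldType(OpType.CONTINUOUS, DataType.DOUBLE);
            case "as.character":
                return new FieldType(OpType.CATEGORICAL, DataType.STRING);
            case "as.factor":
                // Only the operational type changes; the data type is preserved
                return new FieldType(OpType.CATEGORICAL, argument.dataType);
            default:
                throw new IllegalArgumentException("Unsupported cast function: " + function);
        }
    }

    public static void main(String[] args) {
        FieldType continuousInteger = new FieldType(OpType.CONTINUOUS, DataType.INTEGER);
        // prints: categorical integer
        System.out.println(cast("as.factor", continuousInteger));
    }
}
```

Of the four, only `as.factor` depends on the type of its argument, which is why `cast` takes the argument type as a parameter.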

R pmml from glm scorecard

I tried to generate a scorecard from a glm model, but it didn't work.
It worked perfectly fine on a tree model like rpart.

Here is my script:

library(r2pmml)
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)

mylogit <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")
mylogit_SC<-r2pmml::as.scorecard(mylogit)
pmml_mylogit<-r2pmml(mylogit_SC,"pmml_mylogit_sc.pmml")

I got these error messages:

org.jpmml.rexp.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException
	at org.jpmml.rexp.RExpUtil.getFactorLevels(RExpUtil.java:77)
	at org.jpmml.rexp.GLMConverter.encodeSchema(GLMConverter.java:59)
	at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:69)
	at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
	at org.jpmml.rexp.Main.run(Main.java:149)
	at org.jpmml.rexp.Main.main(Main.java:97)

Exception in thread "main" java.lang.IllegalArgumentException
	at org.jpmml.rexp.RExpUtil.getFactorLevels(RExpUtil.java:77)
	at org.jpmml.rexp.GLMConverter.encodeSchema(GLMConverter.java:59)
	at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:69)
	at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
	at org.jpmml.rexp.Main.run(Main.java:149)
	at org.jpmml.rexp.Main.main(Main.java:97)
Error in .convert(tempfile, file, converter, converter_classpath, verbose) : 
  1
In addition: Warning message:
running command '"java" -cp "C:/Program Files/R/R-3.4.2/library/r2pmml/java/guava-25.1-jre.jar;C
