
jpmml-python's Introduction

Java API for producing and scoring models in Predictive Model Markup Language (PMML).

IMPORTANT

This is a legacy codebase.

Starting from March 2014, this project has been superseded by the [JPMML-Model](https://github.com/jpmml/jpmml-model) and [JPMML-Evaluator](https://github.com/jpmml/jpmml-evaluator) projects.

Features

Class model

  • Full support for PMML 3.0, 3.1, 3.2, 4.0 and 4.1 schemas:
    • Class hierarchy.
    • Schema version annotations.
  • Fluent API:
    • Value constructors.
  • SAX Locator information
  • [Visitor pattern](http://en.wikipedia.org/wiki/Visitor_pattern):
    • Validation agents.
    • Optimization and transformation agents.

Evaluation engine

Installation

JPMML library JAR files (together with accompanying Java source and Javadocs JAR files) are released via the [Maven Central Repository](http://repo1.maven.org/maven2/org/jpmml/). Please join the [JPMML mailing list](https://groups.google.com/forum/#!forum/jpmml) for release announcements.

The current version is 1.0.22 (17 February, 2014).

Class model

<!-- Class model classes -->
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-model</artifactId>
	<version>${jpmml.version}</version>
</dependency>
<!-- Class model annotations -->
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-schema</artifactId>
	<version>${jpmml.version}</version>
</dependency>

Evaluation engine

<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-evaluator</artifactId>
	<version>${jpmml.version}</version>
</dependency>

Usage

Class model

The class model consists of two types of classes. There is a small number of manually crafted classes that are used for structuring the class hierarchy. They are permanently stored in the Java sources directory /pmml-model/src/main/java. Additionally, there is a much greater number of automatically generated classes that represent actual PMML elements. They can be found in the generated Java sources directory /pmml-model/target/generated-sources/xjc after a successful build operation.

All class model classes descend from class org.dmg.pmml.PMMLObject. Additional class hierarchy levels, if any, represent common behaviour and/or features. For example, all model classes descend from class org.dmg.pmml.Model.

There is not much documentation accompanying class model classes. The application developer should consult the [PMML specification](http://www.dmg.org/v4-1/GeneralStructure.html) about individual PMML elements and attributes.

Example applications

Evaluation engine

A model evaluator class can be instantiated directly when the contents of the PMML document are known:

PMML pmml = ...;

ModelEvaluator<TreeModel> modelEvaluator = new TreeModelEvaluator(pmml);

Otherwise, a PMML manager class should be instantiated first, which will inspect the contents of the PMML document and instantiate the right model evaluator class later:

PMML pmml = ...;

PMMLManager pmmlManager = new PMMLManager(pmml);
 
ModelEvaluator<?> modelEvaluator = (ModelEvaluator<?>)pmmlManager.getModelManager(null, ModelEvaluatorFactory.getInstance());

Model evaluator classes follow functional programming principles. Model evaluator instances are cheap enough to be created and discarded as needed (i.e., not worth the pooling effort).

It is advisable for application code to work against the org.jpmml.evaluator.Evaluator interface:

Evaluator evaluator = (Evaluator)modelEvaluator;

An evaluator instance can be queried for the definition of active (i.e. independent), predicted (i.e. primary dependent) and output (i.e. secondary dependent) fields:

List<FieldName> activeFields = evaluator.getActiveFields();
List<FieldName> predictedFields = evaluator.getPredictedFields();
List<FieldName> outputFields = evaluator.getOutputFields();

The PMML scoring operation must be invoked with valid arguments. Otherwise, the behaviour of the model evaluator class is unspecified.

The preparation of field values:

Map<FieldName, FieldValue> arguments = new LinkedHashMap<FieldName, FieldValue>();

List<FieldName> activeFields = evaluator.getActiveFields();
for(FieldName activeField : activeFields){
	// The raw (ie. user-supplied) value could be any Java primitive value
	Object rawValue = ...;

	// The raw value is passed through: 1) outlier treatment, 2) missing value treatment, 3) invalid value treatment and 4) type conversion
	FieldValue activeValue = evaluator.prepare(activeField, rawValue);

	arguments.put(activeField, activeValue);
}

The scoring:

Map<FieldName, ?> results = evaluator.evaluate(arguments);

Typically, a model has exactly one predicted field, which is called the target field:

FieldName targetName = evaluator.getTargetField();
Object targetValue = results.get(targetName);

The target value is either a Java primitive value (as a wrapper object) or an instance of org.jpmml.evaluator.Computable:

if(targetValue instanceof Computable){
	Computable computable = (Computable)targetValue;

	Object primitiveValue = computable.getResult();
}

The target value may implement interfaces that descend from interface org.jpmml.evaluator.ResultFeature:

// Test for "entityId" result feature
if(targetValue instanceof HasEntityId){
	HasEntityId hasEntityId = (HasEntityId)targetValue;
	HasEntityRegistry<?> hasEntityRegistry = (HasEntityRegistry<?>)evaluator;
	BiMap<String, ? extends Entity> entities = hasEntityRegistry.getEntityRegistry();
	Entity winner = entities.get(hasEntityId.getEntityId());

	// Test for "probability" result feature
	if(targetValue instanceof HasProbability){
		HasProbability hasProbability = (HasProbability)targetValue;
		Double winnerProbability = hasProbability.getProbability(winner.getId());
	}
}
Example applications

Additional information

Please contact [[email protected]] (mailto:[email protected])


jpmml-python's Issues

Missing `Series._data` attribute

I am unable to create a pmml extract of my sklearn pipeline.

This is how I create my pipeline:

from category_encoders import OrdinalEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier
import sklearn2pmml

cat_vars = ['X1', 'X2', .. ]
categorical_transformer = Pipeline(steps=[
    ('WOEEnc', OrdinalEncoder(handle_missing='missing'))])

preprocessor = ColumnTransformer(remainder='passthrough',
    transformers=[
       ('cat', categorical_transformer, cat_vars)])

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', XGBClassifier(**params))])
clf.fit(X, y)

sklearn2pmml.sklearn2pmml(sklearn2pmml.make_pmml_pipeline(clf), 'Exp_model.pmml', debug=True)

Stacktrace:

python: 3.7.9
sklearn: 0.24.1
sklearn2pmml: 0.69.0
joblib: 0.17.0
sklearn_pandas: 2.1.0
pandas: 1.2.1
numpy: 1.20.0
openjdk: 13.0.1
Executing command:
java -cp /Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/gson-2.8.6.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/guava-21.0.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/h2o-genmodel-3.32.0.4.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/h2o-logger-3.32.0.4.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/h2o-tree-api-0.3.17.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/istack-commons-runtime-3.0.11.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/jakarta.activation-1.2.2.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/jakarta.xml.bind-api-2.3.3.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/jaxb-runtime-2.3.3.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/jcommander-1.72.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/jpmml-converter-1.4.6.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/jpmml-h2o-1.1.4.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/jpmml-lightgbm-1.3.6.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/jpmml-python-1.0.11.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/jpmml-sklearn-1.6.15.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/jpmml-xgboost-1.5.0.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/pickle-1.1.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/pmml-model-1.5.11.jar:/Users/myUser//anaconda3/envs/dev/lib/p
ython3.7/site-packages/sklearn2pmml/resources/pmml-model-metro-1.5.11.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/serpent-1.30.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/slf4j-api-1.7.30.jar:/Users/myUser//anaconda3/envs/dev/lib/python3.7/site-packages/sklearn2pmml/resources/slf4j-jdk14-1.7.30.jar org.jpmml.sklearn.Main --pkl-pipeline-input /var/folders/kr/730vknb91nz9yr07hdyvwvlm0000gn/T/pipeline-1f8er7vh.pkl.z --pmml-output Documents/IBM/Projects/EID/Exp_model_joblib.pmml
Standard output is empty
Standard error:
Mar 26, 2021 9:48:37 AM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Mar 26, 2021 9:48:37 AM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 120 ms.
Mar 26, 2021 9:48:37 AM org.jpmml.sklearn.Main run
INFO: Converting PKL to PMML..
Mar 26, 2021 9:48:37 AM sklearn2pmml.pipeline.PMMLPipeline initTargetFields
WARNING: Attribute 'sklearn2pmml.pipeline.PMMLPipeline.target_fields' is not set. Assuming y as the name of the target field
Mar 26, 2021 9:48:37 AM org.jpmml.sklearn.Main run
SEVERE: Failed to convert PKL to PMML
java.lang.IllegalArgumentException: Attribute 'pandas.core.series.Series._data' not set
	at org.jpmml.python.PythonObject.get(PythonObject.java:69)
	at pandas.core.Series.getData(Series.java:30)
	at category_encoders.OrdinalEncoder.getCategoryMapping(OrdinalEncoder.java:115)
	at category_encoders.OrdinalEncoder.encodeFeatures(OrdinalEncoder.java:62)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.Composite.encodeFeatures(Composite.java:119)
	at sklearn.pipeline.PipelineTransformer.encodeFeatures(PipelineTransformer.java:65)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.compose.ColumnTransformer.encodeFeatures(ColumnTransformer.java:63)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.Composite.encodeFeatures(Composite.java:119)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:212)
	at org.jpmml.sklearn.Main.run(Main.java:233)
	at org.jpmml.sklearn.Main.main(Main.java:151)

Exception in thread "main" java.lang.IllegalArgumentException: Attribute 'pandas.core.series.Series._data' not set
	at org.jpmml.python.PythonObject.get(PythonObject.java:69)
	at pandas.core.Series.getData(Series.java:30)
	at category_encoders.OrdinalEncoder.getCategoryMapping(OrdinalEncoder.java:115)
	at category_encoders.OrdinalEncoder.encodeFeatures(OrdinalEncoder.java:62)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.Composite.encodeFeatures(Composite.java:119)
	at sklearn.pipeline.PipelineTransformer.encodeFeatures(PipelineTransformer.java:65)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.compose.ColumnTransformer.encodeFeatures(ColumnTransformer.java:63)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.Composite.encodeFeatures(Composite.java:119)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:212)
	at org.jpmml.sklearn.Main.run(Main.java:233)
	at org.jpmml.sklearn.Main.main(Main.java:151)

Scipy in ExpressionTransformer

Hello Villu,
How can I use SciPy functions in ExpressionTransformer? Or where can I read about which libraries ExpressionTransformer supports?

('feature', ExpressionTransformer('scipy.special.logit(X[0])'))

The conversion failed with "No module name scipy" (version 0.74.1).

(I know, that I can use numpy.log(X[0]/(1-X[0])), but maybe scipy will work for me?)
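
The numpy workaround mentioned above rests on the identity logit(p) = log(p / (1 - p)). The following stdlib-only sketch checks that identity by round-tripping through the inverse function. The helper names `logit` and `expit` are illustrative; they mirror the SciPy function names but are defined here from scratch and are not part of sklearn2pmml.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p in the open interval (0, 1)."""
    return math.log(p / (1.0 - p))

def expit(x: float) -> float:
    """Inverse of logit (the logistic sigmoid)."""
    return 1.0 / (1.0 + math.exp(-x))

for p in (0.1, 0.5, 0.9):
    # Round-tripping through expit should recover p up to rounding error
    assert math.isclose(expit(logit(p)), p)
```

This only demonstrates that the two formulas agree numerically; it says nothing about which functions the expression translator accepts.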

It seems like dynamic string slicing is not allowed in ExpressionTransformer

Hello Villu,

Not sure if this is an issue per se, but it seems like the following expression is not supported for PMML conversion:

ExpressionTransformer("X[1][0:X[0]]") - Where X[1] is a string and X[0] is an integer I want to use to slice X[1].

Is there any way to achieve this with the current functionality? I have tested different options and they all seem to not work for one reason or another.

The expression works fine when testing in Python, however, when using sklearn2pmml.sklearn2pmml it throws an error. Please find an example of the code:

prep_pipe = pipeline.Pipeline([
    # Previous transformations
    , ("trim_string", proc.ExpressionTransformer("X[1][0:X[0]]"))
    ])

fitted_pipe = prep_pipe.fit(df[column_list])
fitted_pipe_pmml = sklearn2pmml.make_pmml_pipeline(fitted_pipe)
sklearn2pmml.sklearn2pmml(fitted_pipe_pmml, path)

This gives the following error:

Exception in thread "main" java.lang.IllegalArgumentException: Python expression 'X[1][0:X[0]]' is either invalid or not supported
	at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:35)
	at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:22)
	at sklearn2pmml.preprocessing.ExpressionTransformer.encodeFeatures(ExpressionTransformer.java:73)
	at sklearn.Transformer.encode(Transformer.java:69)
	at sklearn.Composite.encodeFeatures(Composite.java:119)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:212)
	at com.sklearn2pmml.Main.run(Main.java:84)
	at com.sklearn2pmml.Main.main(Main.java:62)
Caused by: org.jpmml.python.ParseException: Encountered unexpected token: "X" <NAME>
    at line 1, column 8.

Was expecting one of:

    "+"
    "-"
    "]"
    <INT>

	at org.jpmml.python.ExpressionTranslator.generateParseException(ExpressionTranslator.java:2152)
	at org.jpmml.python.ExpressionTranslator.jj_consume_token(ExpressionTranslator.java:2015)
	at org.jpmml.python.ExpressionTranslator.StringSlicingExpression(ExpressionTranslator.java:958)
	at org.jpmml.python.ExpressionTranslator.PrimaryExpression(ExpressionTranslator.java:628)
	at org.jpmml.python.ExpressionTranslator.UnaryExpression(ExpressionTranslator.java:588)
	at org.jpmml.python.ExpressionTranslator.MultiplicativeExpression(ExpressionTranslator.java:529)
	at org.jpmml.python.ExpressionTranslator.AdditiveExpression(ExpressionTranslator.java:485)
	at org.jpmml.python.ExpressionTranslator.ComparisonExpression(ExpressionTranslator.java:425)
	at org.jpmml.python.ExpressionTranslator.NegationExpression(ExpressionTranslator.java:380)
	at org.jpmml.python.ExpressionTranslator.LogicalAndExpression(ExpressionTranslator.java:350)
	at org.jpmml.python.ExpressionTranslator.LogicalOrExpression(ExpressionTranslator.java:329)
	at org.jpmml.python.ExpressionTranslator.IfElseExpression(ExpressionTranslator.java:310)
	at org.jpmml.python.ExpressionTranslator.Expression(ExpressionTranslator.java:303)
	at org.jpmml.python.ExpressionTranslator.translateExpressionInternal(ExpressionTranslator.java:297)
	at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:33)
	... 7 more

Thank you!
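
Reading the parser message above, the token after `[0:` was expected to be `+`, `-`, `]` or `<INT>`, which suggests that slice bounds have to be integer literals. This is an interpretation of the error text, not a statement about the translator's actual grammar. Under that assumption, a small pre-check with Python's own `ast` module can flag expressions whose slice bounds are computed at runtime. The function name `slice_bounds_are_literal` is hypothetical.

```python
import ast

def slice_bounds_are_literal(expr: str) -> bool:
    """Return True if every slice bound in `expr` is an (optionally signed) integer literal."""

    def is_int_literal(node) -> bool:
        if node is None:
            # Omitted bound, as in s[:3]
            return True
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.UAdd, ast.USub)):
            node = node.operand
        return isinstance(node, ast.Constant) and type(node.value) is int

    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, ast.Slice):
            if not (is_int_literal(node.lower) and is_int_literal(node.upper)):
                return False
    return True

print(slice_bounds_are_literal("X[1][0:5]"))     # literal bounds
print(slice_bounds_are_literal("X[1][0:X[0]]"))  # upper bound is computed from X[0]
```

The check uses Python's parser only, so it cannot confirm that an expression will convert; it can only point out the dynamic-bound pattern that triggered the error in this issue.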

Compatibility with Joblib 1.2(.0)

I'm sorry for bumping this old and closed issue, but strangely I now get the same error, which has never occurred to me before. To make sure, I ran the LightGBM example from your blog entry. It resulted in the same error.

Standard output is empty
Standard error:
Exception in thread "main" net.razorvine.pickle.InvalidOpcodeException: invalid pickle opcode: 0
	at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:366)
	at org.jpmml.python.CustomUnpickler.dispatch(CustomUnpickler.java:31)
	at org.jpmml.python.PickleUtil$1.dispatch(PickleUtil.java:64)
	at net.razorvine.pickle.Unpickler.load(Unpickler.java:109)
	at org.jpmml.python.PickleUtil.unpickle(PickleUtil.java:85)
	at com.sklearn2pmml.Main.run(Main.java:78)
	at com.sklearn2pmml.Main.main(Main.java:66)

Here is my environment:

python-3.10.5.amd64

sklearn2pmml         0.86.3
scikit-learn         1.1.2
pandas               1.5.0
sklearn-pandas       2.2.0
joblib               1.2.0
numpy                1.23.3

java version "1.8.0_141"
Java(TM) SE Runtime Environment (build 1.8.0_141-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)

Is there anything that I can do to investigate further? Thank you

EDIT:
Reverting joblib to 1.1.0 solves the problem.

Originally posted by @denmase in jpmml/sklearn2pmml#8 (comment)

FunctionTransformer has an unsupported value (Python class numpy.core._multiarray_umath.log)

I'm attempting to build a simple LinearRegression pipeline that performs some preprocessing. The code is roughly,

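import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import FunctionTransformer
from sklearn_pandas import DataFrameMapper
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.decoration import ContinuousDomain
from sklearn2pmml.pipeline import PMMLPipeline

# `data` is the poster's pandas DataFrame (its loading code was not shown)
X = data.filter(items=['Width'])
y = data['Weight']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)


mapper = DataFrameMapper([
    (["Width"], [ContinuousDomain(), FunctionTransformer(np.log)])
])

model_pipeline = PMMLPipeline([
    ("mapper", mapper),
    ("model", LinearRegression())
])

clf = model_pipeline.fit(X_train, y_train)
sklearn2pmml(clf, 'model.pmml', with_repr=True, debug=True)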

However I get the following error,

python: 3.8.8
sklearn: 0.24.1
sklearn2pmml: 0.69.0
joblib: 1.0.1
sklearn_pandas: 2.1.0
pandas: 1.2.3
numpy: 1.20.2
openjdk: 1.8.0_282
Executing command:
java -cp /opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/gson-2.8.6.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/guava-21.0.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/h2o-genmodel-3.32.0.4.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/h2o-logger-3.32.0.4.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/h2o-tree-api-0.3.17.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/istack-commons-runtime-3.0.11.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/jakarta.activation-1.2.2.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/jakarta.xml.bind-api-2.3.3.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/jaxb-runtime-2.3.3.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/jcommander-1.72.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/jpmml-converter-1.4.6.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/jpmml-h2o-1.1.4.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/jpmml-lightgbm-1.3.6.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/jpmml-python-1.0.11.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/jpmml-sklearn-1.6.15.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/jpmml-xgboost-1.5.0.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/pickle-1.1.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/pmml-model-1.5.11.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/pmml-model-metro-1.5.11.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/serpent-1.30.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/slf4j-api-1.7.30.jar:/opt/conda/lib/python3.8/site-packages/sklearn2pmml/resources/slf4j-jdk14-1.7.30.jar org.jpmml.sklearn.Main --pkl-pipeline-input /tmp/pipeline-q4tmq2kn.pkl.z --pmml-output fish-weight-model.pmml
Standard output is empty
Standard error:
Apr 05, 2021 5:09:27 AM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Apr 05, 2021 5:09:27 AM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 19 ms.
Apr 05, 2021 5:09:27 AM org.jpmml.sklearn.Main run
INFO: Converting PKL to PMML..
Apr 05, 2021 5:09:27 AM org.jpmml.sklearn.Main run
SEVERE: Failed to convert PKL to PMML
java.lang.IllegalArgumentException: Attribute 'sklearn.preprocessing._function_transformer.FunctionTransformer.func' has an unsupported value (Python class numpy.core._multiarray_umath.log)
	at org.jpmml.python.CastFunction.apply(CastFunction.java:45)
	at org.jpmml.python.PythonObject.get(PythonObject.java:91)
	at org.jpmml.python.PythonObject.getOptional(PythonObject.java:101)
	at sklearn.preprocessing.FunctionTransformer.getFunc(FunctionTransformer.java:68)
	at sklearn.preprocessing.FunctionTransformer.encodeFeatures(FunctionTransformer.java:44)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:73)
	at sklearn.Initializer.encodeFeatures(Initializer.java:48)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.Composite.encodeFeatures(Composite.java:119)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:212)
	at org.jpmml.sklearn.Main.run(Main.java:233)
	at org.jpmml.sklearn.Main.main(Main.java:151)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDictConstructor to org.jpmml.python.Identifiable
	at java.lang.Class.cast(Class.java:3369)
	at org.jpmml.python.CastFunction.apply(CastFunction.java:43)
	... 12 more

Exception in thread "main" java.lang.IllegalArgumentException: Attribute 'sklearn.preprocessing._function_transformer.FunctionTransformer.func' has an unsupported value (Python class numpy.core._multiarray_umath.log)
	at org.jpmml.python.CastFunction.apply(CastFunction.java:45)
	at org.jpmml.python.PythonObject.get(PythonObject.java:91)
	at org.jpmml.python.PythonObject.getOptional(PythonObject.java:101)
	at sklearn.preprocessing.FunctionTransformer.getFunc(FunctionTransformer.java:68)
	at sklearn.preprocessing.FunctionTransformer.encodeFeatures(FunctionTransformer.java:44)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:73)
	at sklearn.Initializer.encodeFeatures(Initializer.java:48)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.Composite.encodeFeatures(Composite.java:119)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:212)
	at org.jpmml.sklearn.Main.run(Main.java:233)
	at org.jpmml.sklearn.Main.main(Main.java:151)
Caused by: java.lang.ClassCastException: Cannot cast net.razorvine.pickle.objects.ClassDictConstructor to org.jpmml.python.Identifiable
	at java.lang.Class.cast(Class.java:3369)
	at org.jpmml.python.CastFunction.apply(CastFunction.java:43)
	... 12 more

Any help would be appreciated.

Support for string length function

Hi,

I want to have a feature based on the length of my data.

In pandas it is something like this:

data['feature_length'] = data['feature'].apply(lambda a: len(a) == 19)

In the sklearn2pmml/sklearn_pandas pipeline I used the code below:

recorder.features = recorder.features + [(
    [feature], 
    [      
#         CastTransformer(str),
        CategoricalDomain(dtype=str),
            SimpleImputer(missing_values=np.nan, strategy='constant', fill_value='Miss'),
#             SubstringTransformer(15, 20),
          Alias(ExpressionTransformer("0 if len(X[0]) ==19  else 1"),name='feature_Len'),
            Alias(CastTransformer(int), name="feature_Len")

    ], {'alias': "ki_hfcustomerext_mobileappretail_ach_vset_ne_"}
)]

and got an error:

SEVERE: Failed to convert PKL to PMML
java.lang.IllegalArgumentException: len
	at org.jpmml.python.ExpressionTranslator.translateFunction(ExpressionTranslator.java:158)
	at org.jpmml.python.ExpressionTranslator.FunctionInvocationExpression(ExpressionTranslator.java:634)
	at org.jpmml.python.ExpressionTranslator.PrimaryExpression(ExpressionTranslator.java:533)

Which command can I use to fix this error?

Expression translator should support multi-dimensional array indexing syntax

Hi, I have a scenario where I need to use an array as an input column to my pipeline.
I've reduced the issue I'm having to a minimal example:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.compose import ColumnTransformer
from sklearn2pmml.preprocessing import ExpressionTransformer

df = pd.DataFrame({'c1': [1, 2, 3], 'c2': [[1,2], [1,2], [3,1]]})

pipeline = make_pipeline(
    ColumnTransformer(
        transformers=[
            (f'get_item_0_from_c2_array', ExpressionTransformer('X["c2"][0]'), ['c2'])
        ]
    ),
    LogisticRegression(),
)
pipeline.fit(df, [0, 0, 1])
pipeline.predict(df)

The above pipeline works fine in my Jupyter notebook, but converting it to PMML gives an error:

import sklearn2pmml

pmml_pipeline = sklearn2pmml.PMMLPipeline(steps=[
    ('pipeline',pipeline)
])

sklearn2pmml.sklearn2pmml(pmml_pipeline, './pipeline.pmml', debug=True)

Gives the error:

java.lang.IllegalArgumentException: Python expression 'X["c2"][0]' is either invalid or not supported
	at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:36)
	at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:23)
	at sklearn2pmml.preprocessing.ExpressionTransformer.encodeFeatures(ExpressionTransformer.java:51)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.compose.ColumnTransformer.encodeFeatures(ColumnTransformer.java:63)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.Composite.encodeFeatures(Composite.java:119)
	at sklearn.Composite.encodeModel(Composite.java:135)
	at sklearn.pipeline.PipelineClassifier.encodeModel(PipelineClassifier.java:86)
	at sklearn.Estimator.encode(Estimator.java:103)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:233)
	at org.jpmml.sklearn.Main.run(Main.java:217)
	at org.jpmml.sklearn.Main.main(Main.java:143)
Caused by: org.jpmml.python.ParseException: Encountered unexpected token: "]" "]"
    at line 1, column 10.

Was expecting one of:

    ":"

	at org.jpmml.python.ExpressionTranslator.generateParseException(ExpressionTranslator.java:2110)
	at org.jpmml.python.ExpressionTranslator.jj_consume_token(ExpressionTranslator.java:1973)
	at org.jpmml.python.ExpressionTranslator.StringSlicingExpression(ExpressionTranslator.java:956)
	at org.jpmml.python.ExpressionTranslator.PrimaryExpression(ExpressionTranslator.java:637)
	at org.jpmml.python.ExpressionTranslator.UnaryExpression(ExpressionTranslator.java:597)
	at org.jpmml.python.ExpressionTranslator.MultiplicativeExpression(ExpressionTranslator.java:538)
	at org.jpmml.python.ExpressionTranslator.AdditiveExpression(ExpressionTranslator.java:494)
	at org.jpmml.python.ExpressionTranslator.ComparisonExpression(ExpressionTranslator.java:434)
	at org.jpmml.python.ExpressionTranslator.NegationExpression(ExpressionTranslator.java:389)
	at org.jpmml.python.ExpressionTranslator.LogicalAndExpression(ExpressionTranslator.java:359)
	at org.jpmml.python.ExpressionTranslator.LogicalOrExpression(ExpressionTranslator.java:338)
	at org.jpmml.python.ExpressionTranslator.IfElseExpression(ExpressionTranslator.java:319)
	at org.jpmml.python.ExpressionTranslator.Expression(ExpressionTranslator.java:312)
	at org.jpmml.python.ExpressionTranslator.translateExpressionInternal(ExpressionTranslator.java:306)
	at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:34)
	... 12 more

[Method feasibility]: Whether to support the use of pandas module python to convert into *.pmml file

Hello,
I'm trying to use jpmml/jpmml-python and I would like to ask how to use Java to implement a function based on the pandas module.

The function uses pandas.DataFrame, like this:

import numpy
import pandas as pd
def psi(data_dict):
    data = {'actucal':data_dict['actucal'].values(), 'expected':data_dict['expected'].values()}
    df = pd.DataFrame(data, index = data_dict['actucal'].keys())
    df['ind'] = (df['actucal'] - df['expected']) * numpy.log(df['actucal'] / df['expected'])
    psi = sum(df['ind'])
    return psi

Perhaps I can use the command line to convert it, like this:

java -jar target/jpmml-python-executable-*.jar  #--{some_parameters}  file_name.pkl  --pmml-output file_name.pmml

Thank you in advance for taking the time to answer my question!
@vruusmann
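
For reference, the quantity computed by the function above is the sum over bins of (actual - expected) * ln(actual / expected). The sketch below restates that formula without pandas so the arithmetic is easy to check by hand. It assumes two plain dictionaries keyed by bin label with strictly positive values (a different input shape from the `data_dict` in the question), and it makes no claim about whether jpmml-python can convert such a function.

```python
import math

def psi(actual: dict, expected: dict) -> float:
    """Sum of (a - e) * ln(a / e) over the bins listed in `actual`."""
    return sum(
        (actual[key] - expected[key]) * math.log(actual[key] / expected[key])
        for key in actual
    )

# Hypothetical two-bin distributions, used only to exercise the formula
actual = {"low": 0.6, "high": 0.4}
expected = {"low": 0.5, "high": 0.5}
print(psi(actual, expected))
```

Each term is non-negative, because the difference and the logarithm always share a sign, so the result is zero only when the two distributions match bin for bin.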

Feature request: power ufunc

Hi Villu.
I notice in UFuncUtil.java that PMMLFunctions.POW already exists and is used for square.
Is it now possible to pass 2 params and implement this?

If elif in ExpressionTransformer

Hi,
I am creating a DataFrameMapper in which ExpressionTransformer is used for one of the columns. I want to fill in a value based on multiple conditional statements, and for that I want to write if/elif statements.

Currently, I am using the following syntax, but sklearn2pmml is throwing an error.
ExpressionTransformer('100*X[1]/X[0] if X[0]>0 and X[0]<=90 and X[1]>0 and X[1]<=90 else X[1]+900 if X[1]>90 else 0')
Below is the error:

/usr/lib/python3.8/subprocess.py:848: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdout = io.open(c2pread, 'rb', bufsize)
/usr/lib/python3.8/subprocess.py:853: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stderr = io.open(errread, 'rb', bufsize)
Standard output is empty
Standard error:
Mar 10, 2022 12:44:11 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Mar 10, 2022 12:44:12 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 100 ms.
Mar 10, 2022 12:44:12 PM org.jpmml.sklearn.Main run
INFO: Converting..
Mar 10, 2022 12:44:12 PM org.jpmml.sklearn.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: Python expression '100*X[1]/X[0] if X[0]>0 and X[0]<=90 and X[1]>0 and X[1]<=90 else X[1]+900 if X[1]>90 else 0' is either invalid or not supported
	at org.jpmml.sklearn.ExpressionTranslator.translate(ExpressionTranslator.java:76)
	at org.jpmml.sklearn.ExpressionTranslator.translate(ExpressionTranslator.java:63)
	at sklearn2pmml.preprocessing.ExpressionTransformer.encodeFeatures(ExpressionTransformer.java:47)
	at sklearn2pmml.decoration.Alias.encodeFeatures(Alias.java:56)
	at sklearn.Transformer.updateAndEncodeFeatures(Transformer.java:118)
	at sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:73)
	at sklearn.Initializer.encodeFeatures(Initializer.java:44)
	at sklearn.Transformer.updateAndEncodeFeatures(Transformer.java:118)
	at sklearn.Composite.encodeFeatures(Composite.java:129)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:209)
	at org.jpmml.sklearn.Main.run(Main.java:228)
	at org.jpmml.sklearn.Main.main(Main.java:148)
Caused by: org.jpmml.sklearn.ParseException: Encountered unexpected token: "if" "if"
    at line 1, column 76.

Was expecting one of:

    "!="
    "%"
    "*"
    "+"
    "-"
    "/"
    "<"
    "<="
    "=="
    ">"
    ">="
    "and"
    "or"
    <EOF>

	at org.jpmml.sklearn.ExpressionTranslator.generateParseException(ExpressionTranslator.java:1558)
	at org.jpmml.sklearn.ExpressionTranslator.jj_consume_token(ExpressionTranslator.java:1426)
	at org.jpmml.sklearn.ExpressionTranslator.translateExpressionInternal(ExpressionTranslator.java:215)
	at org.jpmml.sklearn.ExpressionTranslator.translate(ExpressionTranslator.java:74)
	... 11 more

```

Support for bitwise logical operators

I'm writing a feature-generation preprocessing step that uses ExpressionTransformer to check whether a value is close to a round multiple of 10:

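from sklearn_pandas import DataFrameMapper
from sklearn2pmml.decoration import Alias
from sklearn2pmml.preprocessing import ExpressionTransformer

# Flag amounts whose remainder modulo 10 is within 0.1 of a multiple of 10
feature_5 = DataFrameMapper([
    (["Amount_Usd"],
     [Alias(ExpressionTransformer("1 if ((X[0] % 10.) <= 0.1) | ((X[0] % 10.) >= 9.9) else 0"),
            name="feature_5", prefit=True)])
])
feature_5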

DataFrameMapper(drop_cols=[],
                features=[(['Amount_Usd'],
                           [Alias(name='feature_5', prefit=True,
                                  transformer=ExpressionTransformer(expr='1 if '
                                                                         '((X[0] '
                                                                         '% '
                                                                         '10.) '
                                                                         '<= '
                                                                         '0.1) '
                                                                         '| '
                                                                         '((X[0] '
                                                                         '% '
                                                                         '10.) '
                                                                         '>= '
                                                                         '9.9) '
                                                                         'else '
                                                                         '0'))])])

The expression is valid Python, but the translation of | to a logical OR seems to fail:

> SEVERE: Failed to convert PKL to PMML
org.jpmml.python.TokenMgrException: Lexical error at line 1, column 28.  Encountered: "|" (124), after : ""
	at org.jpmml.python.ExpressionTranslatorTokenManager.getNextToken(ExpressionTranslatorTokenManager.java:619)
	at org.jpmml.python.ExpressionTranslator.jj_scan_token(ExpressionTranslator.java:1967)
	at org.jpmml.python.ExpressionTranslator.jj_3R_TrailerFunctionInvocationExpression_745_9_59(ExpressionTranslator.java:1182)
	at org.jpmml.python.ExpressionTranslator.jj_3R_PrimaryExpression_634_58_44(ExpressionTranslator.java:1386)
	at org.jpmml.python.ExpressionTranslator.jj_3R_PrimaryExpression_634_58_35(ExpressionTranslator.java:1377)
	at org.jpmml.python.ExpressionTranslator.jj_3R_PrimaryExpression_634_17_28(ExpressionTranslator.java:1581)
	at org.jpmml.python.ExpressionTranslator.jj_3R_PrimaryExpression_623_9_26(ExpressionTranslator.java:1633)
	at org.jpmml.python.ExpressionTranslator.jj_3R_UnaryExpression_605_17_25(ExpressionTranslator.java:1651)
	at org.jpmml.python.ExpressionTranslator.jj_3R_UnaryExpression_600_9_20(ExpressionTranslator.java:1697)
	at org.jpmml.python.ExpressionTranslator.jj_3R_MultiplicativeExpression_587_9_15(ExpressionTranslator.java:1732)
	at org.jpmml.python.ExpressionTranslator.jj_3R_AdditiveExpression_563_9_12(ExpressionTranslator.java:1124)
	at org.jpmml.python.ExpressionTranslator.jj_3_1(ExpressionTranslator.java:1174)
	at org.jpmml.python.ExpressionTranslator.jj_2_1(ExpressionTranslator.java:1069)
	at org.jpmml.python.ExpressionTranslator.ComparisonExpression(ExpressionTranslator.java:398)
	at org.jpmml.python.ExpressionTranslator.NegationExpression(ExpressionTranslator.java:387)
	at org.jpmml.python.ExpressionTranslator.LogicalAndExpression(ExpressionTranslator.java:357)
	at org.jpmml.python.ExpressionTranslator.LogicalOrExpression(ExpressionTranslator.java:336)
	at org.jpmml.python.ExpressionTranslator.IfElseExpression(ExpressionTranslator.java:317)
	at org.jpmml.python.ExpressionTranslator.Expression(ExpressionTranslator.java:310)
	at org.jpmml.python.ExpressionTranslator.IfElseExpression(ExpressionTranslator.java:321)
	at org.jpmml.python.ExpressionTranslator.Expression(ExpressionTranslator.java:310)
	at org.jpmml.python.ExpressionTranslator.translateExpressionInternal(ExpressionTranslator.java:304)
	at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:34)
	at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:23)
	at sklearn2pmml.preprocessing.ExpressionTransformer.encodeFeatures(ExpressionTransformer.java:52)
	at sklearn2pmml.decoration.Alias.encodeFeatures(Alias.java:56)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:73)
	at sklearn.Initializer.encodeFeatures(Initializer.java:48)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.pipeline.FeatureUnion.encodeFeatures(FeatureUnion.java:45)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.Composite.encodeFeatures(Composite.java:119)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:211)
	at org.jpmml.sklearn.Main.run(Main.java:226)
	at org.jpmml.sklearn.Main.main(Main.java:143)
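
A possible workaround, sketched under the assumption that the expression translator accepts the boolean keyword `or` (I have not verified this against any particular sklearn2pmml version): for scalar boolean operands, `|` and `or` give the same result, so the expression can be rewritten without the bitwise operator. The helper `evaluate` below is hypothetical and only mimics how the expression sees its input as `X[0]` in plain Python.

```python
# Original expression (bitwise "|") and a hypothetical rewrite using the keyword "or".
EXPR_BITWISE = "1 if ((X[0] % 10.) <= 0.1) | ((X[0] % 10.) >= 9.9) else 0"
EXPR_KEYWORD = "1 if ((X[0] % 10.) <= 0.1) or ((X[0] % 10.) >= 9.9) else 0"


def evaluate(expr, value):
    """Evaluate an expression string in plain Python with X bound to [value]."""
    return eval(expr, {"__builtins__": {}}, {"X": [value]})


# Sample amounts: some close to a multiple of 10, some not.
samples = [0.0, 9.95, 10.05, 15.0, 20.0, 29.99, 123.4]
results = [(evaluate(EXPR_BITWISE, v), evaluate(EXPR_KEYWORD, v)) for v in samples]

# Both spellings agree on every sample.
for value, (bitwise, keyword) in zip(samples, results):
    print(value, bitwise, keyword)
```

If the keyword form converts, only the expression string passed to ExpressionTransformer needs to change; the surrounding Alias and DataFrameMapper stay as they are.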

Nested ifelse is not working in Expression Transformer

I am using a nested if-else expression in an ExpressionTransformer. While generating the PMML, I get the following error:

'Python Expression is either invalid or not supported'

Please suggest how a nested if-else can be used in an ExpressionTransformer.
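
One thing that may be worth trying, as a sketch only: Python conditional expressions chain to the right, so `a if c1 else b if c2 else c` means the same as `a if c1 else (b if c2 else c)`. Whether a given translator version accepts the parenthesized form is an assumption I have not verified. The snippet below only shows, in plain Python, that the two spellings are equivalent; the helper `evaluate` and the sample rows are hypothetical.

```python
# A chained conditional expression and the same logic with explicit parentheses.
EXPR_CHAINED = (
    "100*X[1]/X[0] if X[0]>0 and X[0]<=90 and X[1]>0 and X[1]<=90 "
    "else X[1]+900 if X[1]>90 else 0"
)
EXPR_PARENS = (
    "100*X[1]/X[0] if X[0]>0 and X[0]<=90 and X[1]>0 and X[1]<=90 "
    "else (X[1]+900 if X[1]>90 else 0)"
)


def evaluate(expr, row):
    """Evaluate an expression string in plain Python with X bound to the row."""
    return eval(expr, {"__builtins__": {}}, {"X": row})


# Rows chosen to reach each of the three branches.
rows = [[45, 30], [0, 95], [0, 50], [100, 91]]
chained = [evaluate(EXPR_CHAINED, r) for r in rows]
parenthesized = [evaluate(EXPR_PARENS, r) for r in rows]
print(chained == parenthesized)
```

If the parenthesized form is rejected as well, upgrading the converter may be the remaining option, since later stack traces in these threads show the if-else rule recursing into itself.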

Function 'builtins.int' is not supported?

Hello Villu,

I am having problems with the sklearn2pmml conversion:

Standard output is empty
Standard error:
Exception in thread "main" java.lang.IllegalArgumentException: Function 'builtins.int' is not supported
	at org.jpmml.python.FunctionUtil.encodePythonFunction(FunctionUtil.java:103)
	at org.jpmml.python.FunctionUtil.encodeFunction(FunctionUtil.java:72)
	at org.jpmml.python.ExpressionTranslator.translateFunction(ExpressionTranslator.java:186)
	at org.jpmml.python.ExpressionTranslator.FunctionInvocationExpression(ExpressionTranslator.java:849)
	at org.jpmml.python.ExpressionTranslator.PrimaryExpression(ExpressionTranslator.java:646)
	at org.jpmml.python.ExpressionTranslator.UnaryExpression(ExpressionTranslator.java:594)
	at org.jpmml.python.ExpressionTranslator.MultiplicativeExpression(ExpressionTranslator.java:539)
	at org.jpmml.python.ExpressionTranslator.AdditiveExpression(ExpressionTranslator.java:495)
	at org.jpmml.python.ExpressionTranslator.ComparisonExpression(ExpressionTranslator.java:435)
	at org.jpmml.python.ExpressionTranslator.NegationExpression(ExpressionTranslator.java:390)
	at org.jpmml.python.ExpressionTranslator.LogicalAndExpression(ExpressionTranslator.java:373)
	at org.jpmml.python.ExpressionTranslator.LogicalOrExpression(ExpressionTranslator.java:339)
	at org.jpmml.python.ExpressionTranslator.IfElseExpression(ExpressionTranslator.java:320)
	at org.jpmml.python.ExpressionTranslator.Expression(ExpressionTranslator.java:313)
	at org.jpmml.python.ExpressionTranslator.IfElseExpression(ExpressionTranslator.java:324)
	at org.jpmml.python.ExpressionTranslator.Expression(ExpressionTranslator.java:313)
	at org.jpmml.python.ExpressionTranslator.translateExpressionInternal(ExpressionTranslator.java:307)
	at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:33)
	at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:22)
	at sklearn2pmml.preprocessing.ExpressionTransformer.encodeFeatures(ExpressionTransformer.java:73)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.Composite.encodeFeatures(Composite.java:119)
	at sklearn.pipeline.PipelineTransformer.encodeFeatures(PipelineTransformer.java:65)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.pipeline.FeatureUnion.encodeFeatures(FeatureUnion.java:45)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn_pandas.DataFrameMapper.encodeFeatures(DataFrameMapper.java:67)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.Composite.encodeFeatures(Composite.java:119)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:153)
	at com.sklearn2pmml.Main.run(Main.java:91)
	at com.sklearn2pmml.Main.main(Main.java:66)

It seems that the int call in the following code is what fails to convert:

from sklearn.pipeline import make_pipeline
from sklearn2pmml.preprocessing import CastTransformer, DaysSinceYearTransformer, ExpressionTransformer

def make_modify_date_pipeline():
    # Reformat a 'YYYYMMDD...' string as 'YYYY-MM-DD', falling back to '2022-12-30'
    expr = "X[0][:4] + '-' + X[0][4:6] + '-' + X[0][6:8] if len(X[0]) > 0 and int(X[0][0:8]) < 20221230 else '2022-12-30'"
    return make_pipeline(
        ExpressionTransformer(expr),
        CastTransformer(dtype = "datetime64[D]"),
        DaysSinceYearTransformer(year = 2022),
    )

Of course, we have discussed this before, and you recommended CastTransformer:

you should be using the good old CastTransformer instead.

I have upgraded to the latest sklearn2pmml version. Do you mean that I should change the sklearn version? (That is not possible, because I am working in the company's notebook environment, where changing the sklearn version is not allowed.)

So, is there any other way to complete this operation? I wrote it this way because I cannot compare a str with an int, so int is needed. In pure Python I would have many ways to solve this, but in a pipeline I don't know how to handle it!
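
One way to sidestep int altogether, sketched under two assumptions: the first eight characters are always zero-padded digits (YYYYMMDD), and the translator supports comparing strings with `<` (not verified here). For equal-length digit strings, lexicographic order matches numeric order, so the cutoff can be written as the string '20221230'. The helpers below are hypothetical and run in plain Python only.

```python
# Hypothetical rewrite: compare the date prefix as a string instead of calling int().
EXPR_STR = (
    "X[0][:4] + '-' + X[0][4:6] + '-' + X[0][6:8] "
    "if len(X[0]) > 0 and X[0][0:8] < '20221230' else '2022-12-30'"
)


def evaluate(expr, value):
    """Evaluate an expression string in plain Python with X bound to [value]."""
    return eval(expr, {"__builtins__": {}, "len": len}, {"X": [value]})


def reference(value):
    """The original logic, written directly in Python with int()."""
    if len(value) > 0 and int(value[0:8]) < 20221230:
        return value[:4] + "-" + value[4:6] + "-" + value[6:8]
    return "2022-12-30"


samples = ["20221229", "20221230", "20221231", "20210101123000", "19991231", ""]
for s in samples:
    print(repr(s), evaluate(EXPR_STR, s), reference(s))
```

The string comparison stops being equivalent if the prefix can be shorter than eight characters or can contain non-digit characters, so it is worth checking the input data first.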
