jpmml-sparkml-xgboost's Introduction

Java API for producing and scoring models in Predictive Model Markup Language (PMML).

IMPORTANT

This is a legacy codebase.

Starting from March 2014, this project has been superseded by the [JPMML-Model](https://github.com/jpmml/jpmml-model) and [JPMML-Evaluator](https://github.com/jpmml/jpmml-evaluator) projects.

Features

Class model

  • Full support for PMML 3.0, 3.1, 3.2, 4.0 and 4.1 schemas:
    • Class hierarchy.
    • Schema version annotations.
  • Fluent API:
    • Value constructors.
  • SAX Locator information
  • [Visitor pattern](http://en.wikipedia.org/wiki/Visitor_pattern):
    • Validation agents.
    • Optimization and transformation agents.
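
The visitor-based agents listed above can be pictured with a small, self-contained sketch. The types below (`Node`, `Visitor`, `MissingNameValidator`) are hypothetical stand-ins and not part of the JPMML class model; the sketch only shows how a validation agent could walk an object tree and collect the problems it finds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT JPMML classes
interface Visitor {
	// Return false to skip the children of this node
	boolean visit(Node node);
}

final class Node {
	final String name;
	final List<Node> children = new ArrayList<>();

	Node(String name, Node... children){
		this.name = name;
		for(Node child : children){
			this.children.add(child);
		}
	}

	// Depth-first traversal: the visitor sees a node before its children
	void accept(Visitor visitor){
		if(visitor.visit(this)){
			for(Node child : children){
				child.accept(visitor);
			}
		}
	}
}

// A validation agent: records every node that has no name
final class MissingNameValidator implements Visitor {
	final List<Node> invalidNodes = new ArrayList<>();

	@Override
	public boolean visit(Node node){
		if(node.name == null || node.name.isEmpty()){
			invalidNodes.add(node);
		}
		return true;
	}
}

final class VisitorDemo {
	public static void main(String[] args){
		Node root = new Node("root", new Node("a"), new Node("", new Node("b")));

		MissingNameValidator validator = new MissingNameValidator();
		root.accept(validator);

		System.out.println("Invalid nodes: " + validator.invalidNodes.size());
	}
}
```

An optimization or transformation agent would follow the same shape, except that its `visit` method would modify the visited node instead of only recording it.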

Evaluation engine

Installation

JPMML library JAR files (together with accompanying Java source and Javadocs JAR files) are released via the [Maven Central Repository](http://repo1.maven.org/maven2/org/jpmml/). Please join the [JPMML mailing list](https://groups.google.com/forum/#!forum/jpmml) for release announcements.

The current version is 1.0.22 (17 February, 2014).

Class model

<!-- Class model classes -->
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-model</artifactId>
	<version>${jpmml.version}</version>
</dependency>
<!-- Class model annotations -->
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-schema</artifactId>
	<version>${jpmml.version}</version>
</dependency>

Evaluation engine

<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-evaluator</artifactId>
	<version>${jpmml.version}</version>
</dependency>

Usage

Class model

The class model consists of two types of classes. There is a small number of manually crafted classes that are used for structuring the class hierarchy. They are permanently stored in the Java sources directory /pmml-model/src/main/java. Additionally, there is a much greater number of automatically generated classes that represent actual PMML elements. They can be found in the generated Java sources directory /pmml-model/target/generated-sources/xjc after a successful build operation.

All class model classes descend from class org.dmg.pmml.PMMLObject. Additional class hierarchy levels, if any, represent common behaviour and/or features. For example, all model classes descend from class org.dmg.pmml.Model.

There is not much documentation accompanying class model classes. The application developer should consult the [PMML specification](http://www.dmg.org/v4-1/GeneralStructure.html) about individual PMML elements and attributes.

Example applications

Evaluation engine

A model evaluator class can be instantiated directly when the contents of the PMML document are known:

PMML pmml = ...;

ModelEvaluator<TreeModel> modelEvaluator = new TreeModelEvaluator(pmml);

Otherwise, a PMML manager class should be instantiated first, which will inspect the contents of the PMML document and instantiate the right model evaluator class later:

PMML pmml = ...;

PMMLManager pmmlManager = new PMMLManager(pmml);
 
ModelEvaluator<?> modelEvaluator = (ModelEvaluator<?>)pmmlManager.getModelManager(null, ModelEvaluatorFactory.getInstance());

Model evaluator classes follow functional programming principles. Model evaluator instances are cheap enough to be created and discarded as needed (i.e. not worth the pooling effort).

It is advisable for application code to work against the org.jpmml.evaluator.Evaluator interface:

Evaluator evaluator = (Evaluator)modelEvaluator;

An evaluator instance can be queried for the definition of active (i.e. independent), predicted (i.e. primary dependent) and output (i.e. secondary dependent) fields:

List<FieldName> activeFields = evaluator.getActiveFields();
List<FieldName> predictedFields = evaluator.getPredictedFields();
List<FieldName> outputFields = evaluator.getOutputFields();

The PMML scoring operation must be invoked with valid arguments. Otherwise, the behaviour of the model evaluator class is unspecified.

The preparation of field values:

Map<FieldName, FieldValue> arguments = new LinkedHashMap<FieldName, FieldValue>();

List<FieldName> activeFields = evaluator.getActiveFields();
for(FieldName activeField : activeFields){
	// The raw (ie. user-supplied) value could be any Java primitive value
	Object rawValue = ...;

	// The raw value is passed through: 1) outlier treatment, 2) missing value treatment, 3) invalid value treatment and 4) type conversion
	FieldValue activeValue = evaluator.prepare(activeField, rawValue);

	arguments.put(activeField, activeValue);
}

The scoring:

Map<FieldName, ?> results = evaluator.evaluate(arguments);

Typically, a model has exactly one predicted field, which is called the target field:

FieldName targetName = evaluator.getTargetField();
Object targetValue = results.get(targetName);

The target value is either a Java primitive value (as a wrapper object) or an instance of org.jpmml.evaluator.Computable:

if(targetValue instanceof Computable){
	Computable computable = (Computable)targetValue;

	Object primitiveValue = computable.getResult();
}

The target value may implement interfaces that descend from interface org.jpmml.evaluator.ResultFeature:

// Test for "entityId" result feature
if(targetValue instanceof HasEntityId){
	HasEntityId hasEntityId = (HasEntityId)targetValue;
	HasEntityRegistry<?> hasEntityRegistry = (HasEntityRegistry<?>)evaluator;
	BiMap<String, ? extends Entity> entities = hasEntityRegistry.getEntityRegistry();
	Entity winner = entities.get(hasEntityId.getEntityId());

	// Test for "probability" result feature
	if(targetValue instanceof HasProbability){
		HasProbability hasProbability = (HasProbability)targetValue;
		Double winnerProbability = hasProbability.getProbability(winner.getId());
	}
}
Example applications

Additional information

Please contact [[email protected]] (mailto:[email protected])

jpmml-sparkml-xgboost's Issues

LOAD PMML ERROR

Hello, I'm using jpmml-xgboost-1.3.11, and I converted a pipelineModel (containing an XGBclassificationModel) to PMML successfully. But when I try to load the PMML using Java, I get an error.

I found that it is caused by the following part of the PMML file:
(image)
There are four OutputField lines; the error occurs because of the last two lines (where dataType=Float).
When I delete the last two lines, the PMML loads successfully.
The same error occurs when loading a PMML model with an LGBclassificationModel, but loading LR/GBDT/RF models works fine.
So how can I resolve this error? Please help.

【pyspark-xgboost-model2pmml】

I run this code in pyspark:

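from pyspark2pmml import PMMLBuilder

# Enable the compact tree option and verify against a 1% sample of the training data
pmmlBuilder = PMMLBuilder(sc, train_data, model) \
    .putOption(None, sc._jvm.org.jpmml.sparkml.model.HasTreeOptions.OPTION_COMPACT, True) \
    .verify(train_data.sample(False, 0.01))
pmmlBuilder.buildFile("./xgb.pmml")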

get this error:

File "/hdata1/yarn/nm/usercache/ai_user/appcache/application_1619767076230_493407/container_e12_1619767076230_493407_01_000001/ANACONDA/pyspark_py36/lib/python3.6/site-packages/pyspark2pmml/__init__.py", line 27, in buildFile
    javaFile = self.javaPmmlBuilder.buildFile(javaFile)
  File "/hdata1/yarn/nm/usercache/ai_user/appcache/application_1619767076230_493407/container_e12_1619767076230_493407_01_000001/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/hdata1/yarn/nm/usercache/ai_user/appcache/application_1619767076230_493407/container_e12_1619767076230_493407_01_000001/pyspark.zip/pyspark/sql/utils.py", line 137, in deco
  File "<string>", line 3, in raise_from
pyspark.sql.utils.IllegalArgumentException: Transformer class ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel is not supported.

pyspark-3.0.0,
xgboost4j-1.0.0.jar,
xgboost4j-spark-1.0.0.jar,
jpmml-sparkml-1.6.4.jar,
jpmml-sparkml-executable-1.6.4.jar,
jpmml-sparkml-xgboost-1.0-SNAPSHOT.jar

Jpmml Predicts different than SparkML for XGBoost

When I score a record with the JPMML-converted XGBoost model, it gives a different result than the Spark XGBoost model for the same record.

"org.apache.spark" %% "spark-core" % "2.2.0",
"org.apache.spark" %% "spark-sql" % "2.2.0",
"org.apache.spark" %% "spark-mllib" % "2.2.0" exclude("org.jpmml", "pmml-model"),
"ml.dmlc" % "xgboost4j" % "0.7",
"ml.dmlc" % "xgboost4j-spark" % "0.7",
"org.jpmml" % "jpmml-sparkml" % "1.3.2",
"org.jpmml" % "jpmml-sparkml-xgboost" % "1.0-SNAPSHOT",
"org.jpmml" % "jpmml-xgboost" % "1.2-SNAPSHOT",
"org.jpmml" % "pmml-evaluator" % "1.3.8"

import java.io.ByteArrayOutputStream
import java.util
import javax.xml.bind.JAXBException
import ml.dmlc.xgboost4j.scala.spark.{XGBoostClassificationModel, XGBoostEstimator}
import org.apache.spark.ml.estimator.veon.RareMerger
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{OneHotEncoder, SQLTransformer, StringIndexer, VectorAssembler}
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.sql.SparkSession
import org.dmg.pmml.{FieldName, PMML}
import org.jpmml.evaluator.{CategoricalValue, ContinuousValue, FieldValue, ModelEvaluatorFactory}
import org.jpmml.model.MetroJAXBUtil
import org.jpmml.sparkml.ConverterUtil
import scala.collection.mutable

object JPMMLPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("Trusted Payments Mleap").config("spark.driver.memory","12g").getOrCreate()
    var train = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("Train.csv").repartition(8)
    var test = spark.read
      .format("csv").option("header", "true").option("inferSchema", "true").load("Test.csv").repartition(8)
    val categorical_columns = Array("List of Categories")
    var indexers = mutable.ListBuffer[PipelineStage]()
    categorical_columns foreach(categoryColumn =>  {
      val indexer = new StringIndexer()
        .setInputCol(categoryColumn )
        .setOutputCol(categoryColumn + "_INDEX")
        .setHandleInvalid("keep")
      indexers += indexer
    })
    var encoderColumns = mutable.ListBuffer[String]()
    var encoders = mutable.ListBuffer[PipelineStage]()
    categorical_columns foreach(categoryColumn =>  {
      val encoder = new OneHotEncoder()
        .setInputCol(categoryColumn+"_INDEX")
        .setOutputCol(categoryColumn + "_VEC")
        .setDropLast(false)
      encoderColumns += categoryColumn + "_VEC"
      encoders += encoder
    })
    val features = train.columns.filter(!_.contains("label")).filter(!categorical_columns.contains(_))
    val assembler = new VectorAssembler().setInputCols(features ++ encoderColumns).setOutputCol("features")

    val pipeline = new Pipeline().setStages((indexers ++ encoders ++ Array(assembler)).toArray)
    val trainModel = pipeline.fit(train)
    val paramMap = List(
      "eta" -> 0.1,
      "max_depth" -> 7,
      "objective" -> "binary:logistic",
      "num_round" ->10,
      "eval_metric" -> "auc",
      "nworkers" -> 8).toMap
    val xgboostEstimator = new XGBoostEstimator(paramMap)
    val pipelineXGBoost = new Pipeline().setStages(Array(trainModel, xgboostEstimator))
    val cvModel = pipelineXGBoost.fit(train)
    println(cvModel.stages(1).asInstanceOf[XGBoostClassificationModel].booster.getModelDump(null,true,"text")(0))
    val pmml = ConverterUtil.toPMML(train.schema, cvModel)
    val evalTest = test.limit(1)
    evalTest.show(1, true)
    val modelResult = cvModel.transform(evalTest)
    modelResult.select("probabilities", "prediction").show(1,false)
    val modelEvaluatorFactory = ModelEvaluatorFactory.newInstance
    val jpmmlEvaluator = modelEvaluatorFactory.newModelEvaluator(pmml)
    val arguments = new util.LinkedHashMap[FieldName, FieldValue]()

    val sparkToJpmmlMap :Map[org.apache.spark.sql.types.DataType, org.dmg.pmml.DataType]= Map(
      org.apache.spark.sql.types.IntegerType -> org.dmg.pmml.DataType.INTEGER,
      org.apache.spark.sql.types.DoubleType -> org.dmg.pmml.DataType.DOUBLE,
      org.apache.spark.sql.types.StringType -> org.dmg.pmml.DataType.STRING)
    val evalRecord = evalTest.take(1)(0)
    evalRecord.schema.zipWithIndex.foreach { case (field, i) =>
      if(categorical_columns.contains(field.name)){
        arguments.put(new FieldName(field.name), CategoricalValue.create(sparkToJpmmlMap(field.dataType), evalRecord.get(i)))
      }else{
        arguments.put(new FieldName(field.name), ContinuousValue.create(sparkToJpmmlMap(field.dataType), evalRecord.get(i)))
      }
    }
    val pmmlResults = jpmmlEvaluator.evaluate(arguments)
    println(pmmlResults)
  }
}

Exported PMML file has duplicate OutputField elements, which cause PMML loading errors in Spark

Tail of the xgb.pmml content:

            <Segment id="2">
                <True/>
                <RegressionModel functionName="classification" normalizationMethod="logit" x-mathContext="float">
                    <MiningSchema>
                        <MiningField name="label" usageType="target"/>
                        <MiningField name="xgbValue"/>
                    </MiningSchema>
                    <Output>
                        <OutputField name="pmml(prediction)" optype="categorical" dataType="double" feature="predictedValue" isFinalResult="false"/>
                        <OutputField name="prediction" optype="categorical" dataType="double" feature="transformedValue">
                            <MapValues outputColumn="data:output" dataType="double">
                                <FieldColumnPair field="pmml(prediction)" column="data:input"/>
                                <InlineTable>
                                    <row>
                                        <data:input>0</data:input>
                                        <data:output>0</data:output>
                                    </row>
                                    <row>
                                        <data:input>1</data:input>
                                        <data:output>1</data:output>
                                    </row>
                                </InlineTable>
                            </MapValues>
                        </OutputField>
                        <OutputField name="probability(0)" optype="continuous" dataType="double" feature="probability" value="0"/>
                        <OutputField name="probability(1)" optype="continuous" dataType="double" feature="probability" value="1"/>
                        <OutputField name="probability(0)" optype="continuous" dataType="float" feature="probability" value="0"/>
                        <OutputField name="probability(1)" optype="continuous" dataType="float" feature="probability" value="1"/>
                    </Output>
                    <RegressionTable intercept="0.0" targetCategory="1">
                        <NumericPredictor name="xgbValue" coefficient="1.0"/>
                    </RegressionTable>
                    <RegressionTable intercept="0.0" targetCategory="0"/>
                </RegressionModel>
            </Segment>

I found that the OutputField elements have duplicate names. Loading the file throws the following error:

org.jpmml.evaluator.InvalidElementException: Element OutputField is not valid
	at org.jpmml.evaluator.IndexableUtil.buildMap(IndexableUtil.java:61)
	at org.jpmml.evaluator.IndexableUtil.buildMap(IndexableUtil.java:50)
	at org.jpmml.evaluator.ModelEvaluator$8.load(ModelEvaluator.java:1200)
	at org.jpmml.evaluator.ModelEvaluator$8.load(ModelEvaluator.java:1196)
	at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
	at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
	at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
	at com.google.common.cache.LocalCache.get(LocalCache.java:3952)
	at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
	at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
	at org.jpmml.evaluator.CacheUtil.getValue(CacheUtil.java:51)
	at org.jpmml.evaluator.ModelEvaluator.<init>(ModelEvaluator.java:183)
	at org.jpmml.evaluator.regression.RegressionModelEvaluator.<init>(RegressionModelEvaluator.java:64)
	at org.jpmml.evaluator.ModelEvaluatorFactory.createModelEvaluator(ModelEvaluatorFactory.java:118)

There must be some error in the PMML export procedure. It's really strange.
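
The duplicate names described above can be checked for mechanically before a file is handed to the evaluator. The following is a minimal sketch that uses only the JDK's DOM parser; the class and method names (`DuplicateOutputFields`, `findDuplicates`) are made up for illustration and are not part of any JPMML library. It reports every OutputField name that occurs more than once among the direct children of a single Output element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of any JPMML library
final class DuplicateOutputFields {

	// Returns the names that occur more than once inside a single Output element
	static Set<String> findDuplicates(String pmmlXml){
		Document document;

		try {
			DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
			document = builder.parse(new ByteArrayInputStream(pmmlXml.getBytes(StandardCharsets.UTF_8)));
		} catch(Exception e){
			throw new IllegalArgumentException("Not a parseable XML document", e);
		}

		Set<String> duplicates = new LinkedHashSet<>();

		NodeList outputs = document.getElementsByTagName("Output");
		for(int i = 0; i < outputs.getLength(); i++){
			Set<String> seen = new HashSet<>();

			// Only direct children count; nested models carry their own Output element
			NodeList children = outputs.item(i).getChildNodes();
			for(int j = 0; j < children.getLength(); j++){
				Node child = children.item(j);

				if(child instanceof Element && "OutputField".equals(child.getNodeName())){
					String name = ((Element)child).getAttribute("name");

					// Set#add returns false when the name was already seen
					if(!seen.add(name)){
						duplicates.add(name);
					}
				}
			}
		}

		return duplicates;
	}

	public static void main(String[] args){
		String xml =
			"<Output>" +
			"<OutputField name=\"probability(0)\" dataType=\"double\"/>" +
			"<OutputField name=\"probability(1)\" dataType=\"double\"/>" +
			"<OutputField name=\"probability(0)\" dataType=\"float\"/>" +
			"<OutputField name=\"probability(1)\" dataType=\"float\"/>" +
			"</Output>";

		System.out.println(findDuplicates(xml));
	}
}
```

This only detects the symptom. It does not explain why the converter emitted the probability fields twice, which would still need to be traced in the export step.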

XGBoost model PMML file only has one tree

Hi vruusmann,
First of all, thanks for your great work!
Recently I have been trying to save an XGBoost-on-Spark model in PMML format with this plugin, but I can only get one tree's description in the PMML file. Scoring with the model loaded from such a file gives worse metrics.
Am I missing something?
Best wishes!

support for spark 3.x

Hi,
I found that jpmml-sparkml-xgboost only supports Spark 2.x. I'm not sure whether JPMML already supports XGBoost on Spark 3.x without importing jpmml-sparkml-xgboost?

I tried to rebuild jpmml-sparkml-xgboost by modifying pom.xml to use the latest versions, as below:

		<dependency>
			<groupId>ml.dmlc</groupId>
			<artifactId>xgboost4j-spark_2.12</artifactId>
			<version>1.5.2</version>
			<scope>provided</scope>
		</dependency>
		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-core_2.12</artifactId>
			<version>3.1.1</version>
		</dependency>
		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-mllib_2.12</artifactId>
			<version>3.1.1</version>
		</dependency>
		<dependency>
			<groupId>org.jpmml</groupId>
			<artifactId>jpmml-sparkml</artifactId>
			<version>1.7.3</version>
			<scope>provided</scope>
		</dependency>
		<dependency>
			<groupId>org.jpmml</groupId>
			<artifactId>pmml-xgboost</artifactId>
			<version>1.6.3</version>
			<scope>provided</scope>
		</dependency>
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>4.12</version>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>org.jpmml</groupId>
			<artifactId>pmml-evaluator</artifactId>
			<version>1.6.3</version>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>org.jpmml</groupId>
			<artifactId>pmml-evaluator-test</artifactId>
			<version>1.4.11</version>
			<scope>test</scope>
		</dependency>

And I got errors when I built the project:

[ERROR] /Users/lcx/Desktop/project/jpmml-sparkml-xgboost/src/main/java/org/jpmml/sparkml/xgboost/XGBoostRegressionModelConverter.java:[28,8]
org.jpmml.sparkml.xgboost.XGBoostRegressionModelConverter is not abstract and does not override abstract method getNativeConfiguration() in org.jpmml.converter.HasNativeConfiguration.

[ERROR] /Users/lcx/Desktop/project/jpmml-sparkml-xgboost/src/main/java/org/jpmml/sparkml/xgboost/XGBoostClassificationModelConverter.java:[33,8]
org.jpmml.sparkml.xgboost.XGBoostClassificationModelConverter is not abstract and does not override abstract method getNativeConfiguration() in org.jpmml.converter.HasNativeConfiguration.

How to solve this problem

Exception in thread "main" java.lang.IllegalArgumentException: Transformer class ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel is not supported
	at org.jpmml.sparkml.ConverterFactory.newConverter(ConverterFactory.java:58)
	at org.jpmml.sparkml.PMMLBuilder.build(PMMLBuilder.java:105)
	at org.jpmml.sparkml.PMMLBuilder.buildByteArray(PMMLBuilder.java:248)
	at org.jpmml.sparkml.PMMLBuilder.buildByteArray(PMMLBuilder.java:244)
	at com.pactera.scala.MyXgBx$.main(MyXgBx.scala:37)

Support for `missing` attribute

Hi vruusmann,

Sorry to disturb you again. I've had a headache over the inconsistency problem for several months. After checking the xgboost4j docs, I saw that after version 0.9 they made some fixes to the missing value problem, so I upgraded xgboost4j-spark to 1.2.0 with Spark 3. But I still get the inconsistency problem.

(image)

You can see I only have one categorical feature, hour, which doesn't contain missing values. But if I remove the categorical feature and use only numeric features, the predictions are consistent.

Do you have any clues?

XGBoostClassifier to PMML failed

  • hi vr, model here:

 val XGBooster = new XGBoostClassifier(
      Map("eta" -> 0.05f,
        "max_depth" -> 8,
        "objective" -> "binary:logistic",
        "eval_metric" -> "auc",
        "baseScore" -> 0.6,
        "maxBin" -> 16,
        "num_round" -> 5,   //500!!!!!
        "num_workers" -> 100
      )
    ).setFeaturesCol("features").setLabelCol("label")

    val pipeline = new Pipeline().setStages(Array(featureAssembler, XGBooster))
    val xgbmodel = pipeline.fit(train)

  1. Using the README:
    val pmmlBytes = ConverterUtil.toPMMLByteArray(train.schema, xgbmodel)
scala>     val pmmlBytes = ConverterUtil.toPMMLByteArray(train.schema, xgbmodel)
java.lang.UnsupportedOperationException: Replace "org.jpmml.sparkml.ConverterUtil.toPMMLByteArray(schema, pipelineModel)" with "new org.jpmml.sparkml.PMMLBuilder(schema, pipelineModel).buildByteArray()"
  at org.jpmml.sparkml.ConverterUtil.toPMMLByteArray(ConverterUtil.java:51)
  at org.jpmml.sparkml.ConverterUtil.toPMMLByteArray(ConverterUtil.java:46)
  ... 75 elided
  2. Following the message above, I changed it to:
    val pmml = new PMMLBuilder(train.schema, xgbmodel).buildByteArray()
 scala>     val pmml = new PMMLBuilder(train.schema, xgbmodel).buildByteArray()
java.lang.IllegalArgumentException: Transformer class ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel is not supported
  at org.jpmml.sparkml.ConverterFactory.newConverter(ConverterFactory.java:58)
  at org.jpmml.sparkml.PMMLBuilder.build(PMMLBuilder.java:105)
  at org.jpmml.sparkml.PMMLBuilder.buildByteArray(PMMLBuilder.java:249)
  at org.jpmml.sparkml.PMMLBuilder.buildByteArray(PMMLBuilder.java:245)
  ... 75 elided

confusing.....

Caused by: java.lang.ClassNotFoundException: ml.dmlc.xgboost4j.scala.ObjectiveTrait

I want to use XGBoost to generate PMML in IDEA.

env:

spark 2.3.3
xgboost-spark-0.90
pmml-evaluator 1.5.15
jpmml-xgboost 1.3.15
jpmml-sparkml 1.4.18

I have read the other issues with the same question, but I have already added xgboost-spark-0.90 to my Maven dependencies.

var mlmodel = new XGBoostClassifier(Map("eta" -> 0.05f,
        "max_depth" -> 8,
        "objective" -> "binary:logistic",
        "eval_metric" -> "auc",
        "baseScore" -> 0.6,
        "maxBin" -> 16,
        "num_round" -> 5,   //500!!!!!
        "num_workers" -> 100
      )).setFeaturesCol("features").setLabelCol("label")
mlmodel = mlmodel.set(mlmodel.numRound, 11)

Running it gives this error:

Exception in thread "main" java.lang.NoClassDefFoundError: ml/dmlc/xgboost4j/scala/ObjectiveTrait
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
	at java.lang.Class.privateGetPublicMethods(Class.java:2902)
	at java.lang.Class.getMethods(Class.java:1615)
	at org.apache.spark.ml.param.Params$class.params(params.scala:675)
	at org.apache.spark.ml.PipelineStage.params$lzycompute(Pipeline.scala:42)
	at org.apache.spark.ml.PipelineStage.params(Pipeline.scala:42)
	at ml.dmlc.xgboost4j.scala.spark.params.ParamMapFuncs$$anonfun$XGBoostToMLlibParams$2.apply(GeneralParams.scala:254)
	at ml.dmlc.xgboost4j.scala.spark.params.ParamMapFuncs$$anonfun$XGBoostToMLlibParams$2.apply(GeneralParams.scala:245)
	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
	at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
	at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
	at ml.dmlc.xgboost4j.scala.spark.params.ParamMapFuncs$class.XGBoostToMLlibParams(GeneralParams.scala:245)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.XGBoostToMLlibParams(XGBoostClassifier.scala:44)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.<init>(XGBoostClassifier.scala:57)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.<init>(XGBoostClassifier.scala:54)
	at OnlineService.spark.ModelBuild$.getProcessTransformer(ModelBuild.scala:68)
	at OnlineService.spark.Model2PMML$.run(Model2PMML.scala:136)
	at OnlineService.spark.Model2PMML$.main(Model2PMML.scala:25)
	at OnlineService.spark.Model2PMML.main(Model2PMML.scala)
Caused by: java.lang.ClassNotFoundException: ml.dmlc.xgboost4j.scala.ObjectiveTrait
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 21 more

Hoping for your reply, thank you very much.
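
A `NoClassDefFoundError` caused by a `ClassNotFoundException`, as in the trace above, means the named class was not visible to the class loader at run time. A quick way to narrow this down is to probe for the classes directly from the same run configuration. The following is a minimal sketch; the helper name `ClasspathCheck` is made up for illustration, and the two class names are taken from the stack trace. Which JAR is expected to contain `ObjectiveTrait` is not stated in the report, so that has to be checked against the XGBoost4J artifacts actually in use.

```java
// Hypothetical diagnostic helper for illustration
final class ClasspathCheck {

	// Returns true if the named class can be located by this class's class loader.
	// The class is not initialized, so its static initializers do not run.
	static boolean isPresent(String className){
		try {
			Class.forName(className, false, ClasspathCheck.class.getClassLoader());
			return true;
		} catch(ClassNotFoundException | LinkageError e){
			return false;
		}
	}

	public static void main(String[] args){
		// Class names taken from the stack trace above
		String[] classNames = {
			"ml.dmlc.xgboost4j.scala.ObjectiveTrait",
			"ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier"
		};

		for(String className : classNames){
			System.out.println(className + " -> " + (isPresent(className) ? "found" : "MISSING"));
		}
	}
}
```

If the second class is found but the first is missing, the Spark wrapper is on the classpath while one of the libraries it depends on is not.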

Exception in thread "main" java.lang.InstantiationError: org.dmg.pmml.tree.Node

I want to save the pipeline model trained by xgboost4j-spark as PMML using jpmml-sparkml-xgboost, but it reports the following error. May I ask why?

Exception in thread "main" java.lang.InstantiationError: org.dmg.pmml.tree.Node
	at org.jpmml.xgboost.RegTree.encodeTreeModel(RegTree.java:92)
	at org.jpmml.xgboost.ObjFunction.createMiningModel(ObjFunction.java:68)
	at org.jpmml.xgboost.MultinomialLogisticRegression.encodeMiningModel(MultinomialLogisticRegression.java:55)
	at org.jpmml.xgboost.GBTree.encodeMiningModel(GBTree.java:77)
	at org.jpmml.xgboost.Learner.encodeMiningModel(Learner.java:151)
	at org.jpmml.sparkml.xgboost.BoosterUtil.encodeBooster(BoosterUtil.java:67)
	at org.jpmml.sparkml.xgboost.XGBoostClassificationModelConverter.encodeModel(XGBoostClassificationModelConverter.java:22)
	at org.jpmml.sparkml.xgboost.XGBoostClassificationModelConverter.encodeModel(XGBoostClassificationModelConverter.java:10)
	at org.jpmml.sparkml.ModelConverter.registerModel(ModelConverter.java:172)
	at org.jpmml.sparkml.PMMLBuilder.build(PMMLBuilder.java:116)
	at example.XGboostTest$.main(XGboostTest.scala:106)
	at example.XGboostTest.main(XGboostTest.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:892)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
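
In Java, an `InstantiationError` is thrown when code tries to instantiate an abstract class or an interface with `new`, which normally only happens when the class definition changed incompatibly after the calling code was compiled. One plausible but unconfirmed explanation for the trace above is therefore that a different version of the PMML class model is loaded at run time than the one the converter was built against. The sketch below prints where a class was loaded from; the helper name `ClassOrigin` is made up for illustration.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper for illustration
final class ClassOrigin {

	// Describes where a class was loaded from; JDK classes usually have no code source
	static String locate(Class<?> clazz){
		CodeSource codeSource = clazz.getProtectionDomain().getCodeSource();

		if(codeSource == null || codeSource.getLocation() == null){
			return "(no code source, e.g. a JDK class)";
		}

		return codeSource.getLocation().toString();
	}

	public static void main(String[] args) throws ClassNotFoundException {
		// Pass fully qualified class names as arguments, e.g. org.dmg.pmml.tree.Node
		for(String className : args){
			System.out.println(className + " <- " + locate(Class.forName(className)));
		}
	}
}
```

Running this inside the same Spark application for `org.dmg.pmml.tree.Node` would show which JAR actually supplies the class, and that can then be compared with the version the converter expects.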

java.lang.IllegalArgumentException: Transformer class ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel is not supported

I ran my code on Spark on YARN, but got the following error message:

Exception in thread "main" java.lang.IllegalArgumentException: Transformer class ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel is not supported
    at org.jpmml.sparkml.ConverterFactory.newConverter(ConverterFactory.java:53)
    at org.jpmml.sparkml.PMMLBuilder.build(PMMLBuilder.java:109)
    at org.jpmml.sparkml.PMMLBuilder.buildByteArray(PMMLBuilder.java:278)
    at org.jpmml.sparkml.PMMLBuilder.buildByteArray(PMMLBuilder.java:274)
    ....

The code for building the PMML is as follows:

 val pipelineModel = pipeline.fit(trainDF)
 val pmmlBytes = new PMMLBuilder(trainDF.schema, pipelineModel).buildByteArray()

I used Spark 2.2.0, Scala 2.11.8, jpmml-sparkml 1.4.5 and xgboost4j-spark 0.72.
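
The exception message suggests that the converter lookup found no entry for the XGBoost model class. The sketch below shows how a registry keyed by class name produces this kind of failure when an extension has not contributed its mapping. `ConverterRegistry`, `register` and `lookup` are hypothetical names, and this is not the JPMML-SparkML implementation; the converter class name is the one that appears in the build errors quoted elsewhere on this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry for illustration; not the JPMML-SparkML ConverterFactory
final class ConverterRegistry {

	private final Map<String, String> converters = new HashMap<>();

	// Associates a transformer class name with a converter class name
	void register(String transformerClass, String converterClass){
		converters.put(transformerClass, converterClass);
	}

	// Fails in the same style as the reported error when nothing is registered
	String lookup(String transformerClass){
		String converterClass = converters.get(transformerClass);

		if(converterClass == null){
			throw new IllegalArgumentException("Transformer class " + transformerClass + " is not supported");
		}

		return converterClass;
	}

	public static void main(String[] args){
		ConverterRegistry registry = new ConverterRegistry();
		String xgb = "ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel";

		try {
			registry.lookup(xgb);
		} catch(IllegalArgumentException e){
			System.out.println("Before registration: " + e.getMessage());
		}

		// Simulates an extension JAR contributing its converter mapping
		registry.register(xgb, "org.jpmml.sparkml.xgboost.XGBoostClassificationModelConverter");
		System.out.println("After registration: " + registry.lookup(xgb));
	}
}
```

Under that reading, the first thing to verify in a YARN deployment would be that the jpmml-sparkml-xgboost JAR is actually shipped to the driver together with jpmml-sparkml.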

compile error

Hi, I tried to compile this repo but got this error:

...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:08 h
[INFO] Finished at: 2020-01-19T12:30:21+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project jpmml-sparkml-xgboost: Could not resolve dependencies for project org.jpmml:jpmml-sparkml-xgboost:jar:1.0-SNAPSHOT: The following artifacts could not be resolved: ml.dmlc:xgboost4j-spark:jar:0.72, ml.dmlc:xgboost4j:jar:0.72, com.typesafe.akka:akka-actor_2.11:jar:2.3.11, org.scala-lang:scala-compiler:jar:2.11.8: Could not transfer artifact ml.dmlc:xgboost4j-spark:jar:0.72 from/to central (https://repo.maven.apache.org/maven2): GET request of: ml/dmlc/xgboost4j-spark/0.72/xgboost4j-spark-0.72.jar from central failed: Read timed out -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException

The error seems to be a dependency resolution problem, so I added the following dependency to pom.xml:

<dependency>
  <groupId>ml.dmlc</groupId>
  <artifactId>xgboost4j-spark</artifactId>
  <version>latest_version_num</version>
</dependency>

but still got this error:

[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR : 
[INFO] -------------------------------------------------------------
[ERROR] /Users/mvnpackages/jpmml-sparkml-xgboost/src/main/java/org/jpmml/sparkml/xgboost/HasXGBoostOptions.java:[21,25] cannot find symbol
  symbol:   class HasOptions
  location: package org.jpmml.sparkml
[ERROR] /Users/mvnpackages/jpmml-sparkml-xgboost/src/main/java/org/jpmml/sparkml/xgboost/HasXGBoostOptions.java:[23,44] cannot find symbol
  symbol: class HasOptions
[INFO] 2 errors 
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  1.680 s
[INFO] Finished at: 2020-01-19T13:26:29+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) on project jpmml-sparkml-xgboost: Compilation failure: Compilation failure: 
[ERROR] /Users/mvnpackages/jpmml-sparkml-xgboost/src/main/java/org/jpmml/sparkml/xgboost/HasXGBoostOptions.java:[21,25] cannot find symbol
[ERROR]   symbol:   class HasOptions
[ERROR]   location: package org.jpmml.sparkml
[ERROR] /Users/mvnpackages/jpmml-sparkml-xgboost/src/main/java/org/jpmml/sparkml/xgboost/HasXGBoostOptions.java:[23,44] cannot find symbol
[ERROR]   symbol: class HasOptions
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

How to fix this?

XGBoostClassificationModel with continuous label may fail to encode

Hi,

I'm trying to convert a tree-label XGBoostClassificationModel to PMML.

jpmml-sparkml-xgboost: 0feb4d4e45e94cae868e108015f85bf64d2dea0a
jpmml-sparkml: 1.2.12
jpmml-xgboost: 1.3.2
jpmml-converter: 1.3.2
xgboost4j-spark: 0.7

String labels work well, but with double labels the label.size < 3 check in jpmml.converter.mining.MiningModelUtils throws an IllegalArgumentException.

I then found the bug in jpmml.sparkml.ModelConverter: for a continuous label, numClasses cannot be detected for the XGBoostClassificationModel. I added the code below, and it works well, avoiding the exception above.

if (model instanceof XGBoostClassificationModel) {
    XGBoostClassificationModel xgbModel = (XGBoostClassificationModel) model;
    numClasses = xgbModel.numClasses();
}

Hope it helps.
