Comments (6)
I just pushed commit bc19ebc that affects class TreeModelEvaluator.
The return type of method TreeModelEvaluator#evaluate(ModelEvaluationContext) depends on the function type:
- Classification-type models return an instance of NodeClassificationMap. If you analyze the method NodeClassificationMap#getResult(), then it is easy to see that a non-null score attribute takes priority over the value attribute of the highest-probability ScoreDistribution element.
- Regression-type models return an instance of NodeScore. The method NodeScore#getResult() returns the value of the score attribute (after it has been converted to double data type and post-processed as specified by the Target element).
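The priority rule for classification-type nodes can be sketched in plain Java. This is a minimal illustration only, not the JPMML source: the class NodeResultSketch and its getResult method are hypothetical stand-ins, and the ScoreDistribution elements are modelled as a simple map from value to probability.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a node's scoring data; not the JPMML class.
final class NodeResultSketch {

    private NodeResultSketch() {
    }

    /**
     * Returns the predicted class label of a classification-type node.
     *
     * @param score value of the Node's score attribute, or null if absent
     * @param distribution ScoreDistribution value mapped to its probability
     */
    static String getResult(String score, Map<String, Double> distribution) {
        // A non-null score attribute takes priority.
        if (score != null) {
            return score;
        }
        // Otherwise fall back to the highest-probability entry.
        String best = null;
        double bestProbability = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : distribution.entrySet()) {
            if (entry.getValue() > bestProbability) {
                best = entry.getKey();
                bestProbability = entry.getValue();
            }
        }
        return best; // null when there is neither a score nor a distribution
    }

    public static void main(String[] args) {
        Map<String, Double> distribution = new LinkedHashMap<>();
        distribution.put("yes", 0.3);
        distribution.put("no", 0.7);

        System.out.println(getResult("yes", distribution)); // prints "yes"
        System.out.println(getResult(null, distribution));  // prints "no"
    }
}
```

Note that in the first call the score "yes" wins even though "no" has the higher probability, which is exactly the situation discussed in this thread.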
The score attribute is required. The PMML specification says the following: "it is not possible that the scoring process ends in a Node which does not have a score attribute".
Regarding the multipleModelMethod="average" issue: are you working with a classification- or regression-type tree ensemble? Is it possible to attach a sample PMML file?
from jpmml-evaluator.
I've also left a comment over at https://groups.google.com/d/msg/jpmml/Du0QMIYyvko/BAq8n9rBgK4J concerning a separate but related question.
Please see attached model and test file at https://groups.google.com/d/msg/jpmml/Du0QMIYyvko/-bnXhyYblFUJ
I'm working with a classification-type tree ensemble. Regression-type tree ensembles do use score with multipleModelMethod="average" as appropriate.
I see the line you've quoted from the PMML spec and can only conclude that the spec is somewhat internally inconsistent. It does claim that the score attribute is required at final Nodes, but also that ScoreDistribution is used to choose the predictedValue if and only if the score attribute is not provided.
from jpmml-evaluator.
Thank you for the extra input.
As you probably noticed, classification-type ensemble models perform aggregation using the org.jpmml.evaluator.HasProbability interface. During aggregation, there is no distinction between "winner" and "loser" class labels, so the score attribute can be safely ignored.
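To illustrate why the score attribute plays no role here, the following sketch averages per-class probabilities across ensemble members. It is an assumption-laden illustration rather than the JPMML implementation: the class ProbabilityAveragingSketch is hypothetical, each member's result is modelled as a plain map from class label to probability, and a label missing from a member is treated as having probability zero.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of probability averaging; not the JPMML aggregation code.
final class ProbabilityAveragingSketch {

    private ProbabilityAveragingSketch() {
    }

    /** Averages the probability of every class label over all members. */
    static Map<String, Double> average(List<Map<String, Double>> members) {
        // Sum the probabilities per label; every label is treated alike,
        // regardless of which label a member would have declared the winner.
        Map<String, Double> sums = new LinkedHashMap<>();
        for (Map<String, Double> member : members) {
            for (Map.Entry<String, Double> entry : member.entrySet()) {
                sums.merge(entry.getKey(), entry.getValue(), Double::sum);
            }
        }
        // Divide each sum by the number of members.
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : sums.entrySet()) {
            result.put(entry.getKey(), entry.getValue() / members.size());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> first = new LinkedHashMap<>();
        first.put("yes", 0.75);
        first.put("no", 0.25);

        Map<String, Double> second = new LinkedHashMap<>();
        second.put("yes", 0.5);
        second.put("no", 0.5);

        List<Map<String, Double>> members = new ArrayList<>();
        members.add(first);
        members.add(second);

        System.out.println(average(members)); // prints {yes=0.625, no=0.375}
    }
}
```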
For classification-type tree models, it is safe to say that a PMML document is inconsistent if the value of the score attribute does not have a matching ScoreDistribution element. This inconsistency can be discovered using static analysis. In my opinion, it would be too wasteful to perform consistency checks on every NodeClassificationMap instance at runtime.
Static analyzers can be implemented using the Visitor design pattern. Simply create a subclass of org.jpmml.evaluator.visitors.FeatureInspector and apply it to your PMML class model object right after it is unmarshalled from the PMML document.
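The shape of such a check can be sketched with the Visitor pattern in self-contained Java. Everything below is a hypothetical simplification: TreeNode, NodeVisitor and ScoreInspector are illustrative stand-ins and do not reproduce the JPMML class model or the FeatureInspector API. A real analyzer would subclass the JPMML visitor base class instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-ins for the PMML class model.
final class ScoreConsistencySketch {

    private ScoreConsistencySketch() {
    }

    interface NodeVisitor {
        void visit(TreeNode node);
    }

    static final class TreeNode {
        final String id;
        final String score;                    // score attribute, may be null
        final List<String> distributionValues; // values of ScoreDistribution children
        final List<TreeNode> children = new ArrayList<>();

        TreeNode(String id, String score, List<String> distributionValues) {
            this.id = id;
            this.score = score;
            this.distributionValues = distributionValues;
        }

        TreeNode addChild(TreeNode child) {
            children.add(child);
            return this;
        }

        // Visits this node first, then all descendants depth-first.
        void accept(NodeVisitor visitor) {
            visitor.visit(this);
            for (TreeNode child : children) {
                child.accept(visitor);
            }
        }
    }

    // Collects nodes whose score has no matching ScoreDistribution value.
    static final class ScoreInspector implements NodeVisitor {
        final List<String> problems = new ArrayList<>();

        @Override
        public void visit(TreeNode node) {
            // Nothing to compare when either side is absent.
            if (node.score == null || node.distributionValues.isEmpty()) {
                return;
            }
            if (!node.distributionValues.contains(node.score)) {
                problems.add("Node " + node.id + ": score '" + node.score
                    + "' has no matching ScoreDistribution element");
            }
        }
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode("1", null, Collections.<String>emptyList())
            .addChild(new TreeNode("2", "yes", Arrays.asList("yes", "no")))
            .addChild(new TreeNode("3", "maybe", Arrays.asList("yes", "no")));

        ScoreInspector inspector = new ScoreInspector();
        root.accept(inspector);

        // Reports node 3 only.
        for (String problem : inspector.problems) {
            System.out.println(problem);
        }
    }
}
```

Running the check once after loading keeps the cost out of the per-record evaluation path, which is the trade-off argued for above.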
As for the quality of the PMML specification, it is good and unambiguous enough 99% of the time. The remaining 1% represents various edge and corner cases that surface only when the spec is implemented in actual application code. I hope that the JPMML implementation of such cases agrees with proprietary implementations.
from jpmml-evaluator.
Thank you for your responses, Villu. I am satisfied with this explanation and resolution. I will modify my code to handle the required score attributes.
from jpmml-evaluator.
Actually, class NodeClassificationMap needs some modification, because it does not implement the method org.jpmml.evaluator.CategoricalResultFeature#getCategoryValues() correctly when there are no ScoreDistribution elements available.
The correct behaviour would be to return a singleton set that contains the score attribute value.
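The intended behaviour might look roughly like the sketch below. It is not the actual fix: CategoryValuesSketch is a hypothetical class, the ScoreDistribution elements are again modelled as a map from value to probability, and the score is assumed to be non-null whenever the map is empty.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the described behaviour; not the JPMML class.
final class CategoryValuesSketch {

    private CategoryValuesSketch() {
    }

    /**
     * Returns the known category values of a node.
     *
     * @param score value of the score attribute (assumed non-null here)
     * @param distribution ScoreDistribution value mapped to its probability
     */
    static Set<String> getCategoryValues(String score, Map<String, Double> distribution) {
        // Without ScoreDistribution elements, the score is the only known category.
        if (distribution.isEmpty()) {
            return Collections.singleton(score);
        }
        return distribution.keySet();
    }

    public static void main(String[] args) {
        Map<String, Double> empty = new LinkedHashMap<>();
        System.out.println(getCategoryValues("yes", empty)); // prints [yes]

        Map<String, Double> distribution = new LinkedHashMap<>();
        distribution.put("yes", 0.3);
        distribution.put("no", 0.7);
        System.out.println(getCategoryValues("yes", distribution)); // prints [yes, no]
    }
}
```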
This fix is in the works. I will push it to the repository later in the evening.
from jpmml-evaluator.
Commit d0b1dd8 makes sure that class NodeClassificationMap implements the interface HasProbability (and its superinterface CategoricalResultFeature) correctly in a situation where the Node element does not have any ScoreDistribution child elements.
from jpmml-evaluator.