Your Output element contains seven OutputField elements, which are expected to yield the distance of the data record to the seven clusters 1 to 7. Contrary to expectations, all OutputField elements yield the distance to one and the same cluster, which is the winning cluster 5.
This behaviour boils down to the OutputField@feature attribute. According to the PMML specification, the value of this attribute can be specified as clusterAffinity, entityAffinity or simply affinity. In JPMML-Evaluator's interpretation of the PMML specification, the former two return the distance to the winning entity, and only the latter returns the distance to the specified entity (please see the Java source code of class org.jpmml.evaluator.OutputUtil, lines 245 to 255).
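The interpretation described above can be sketched in Java as follows. This is a minimal illustration, not the actual OutputUtil code: the class StrictAffinity, the method resolveAffinity and the map of per-cluster distances are all hypothetical. It shows why every clusterAffinity field returns the winning cluster's distance regardless of its value attribute, while affinity honours it.

```java
import java.util.Map;

// Hypothetical sketch of the strict interpretation described above.
// Not JPMML-Evaluator's actual code; names and types are illustrative only.
public class StrictAffinity {

    static double resolveAffinity(String feature, String value, String winningId,
                                  Map<String, Double> distances) {
        String targetId;
        switch (feature) {
            case "clusterAffinity":
            case "entityAffinity":
                // Always refers to the winning entity; OutputField@value is ignored
                targetId = winningId;
                break;
            case "affinity":
                // Refers to the entity named by OutputField@value
                targetId = value;
                break;
            default:
                throw new IllegalArgumentException("Unsupported feature: " + feature);
        }
        if (targetId == null || !distances.containsKey(targetId)) {
            throw new IllegalArgumentException("Unknown entity: " + targetId);
        }
        return distances.get(targetId);
    }

    public static void main(String[] args) {
        // Made-up distances, except cluster 5, whose value is quoted in this thread
        Map<String, Double> distances = Map.of("1", 12.5, "5", 39.95722874558617);
        // Distance to the winning cluster 5, even though value="1"
        System.out.println(resolveAffinity("clusterAffinity", "1", "5", distances));
        // Distance to cluster 1
        System.out.println(resolveAffinity("affinity", "1", "5", distances));
    }
}
```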
So, I suspect your clustering model specifies the value of the OutputField@feature attribute as clusterAffinity, but it should be affinity instead. Was your clustering model perhaps generated by R's "pmml" package? Also, could you paste its Output element here so that we can analyze it together?
from jpmml-evaluator.
Thanks for the clarification.
You are correct on both counts: the Output element specifies clusterAffinity, and the model was generated by R's pmml package.
<Output>
<OutputField name="predictedValue" feature="predictedValue"/>
<OutputField name="clusterAffinity_1" feature="clusterAffinity" value="1"/>
<OutputField name="clusterAffinity_2" feature="clusterAffinity" value="2"/>
<OutputField name="clusterAffinity_3" feature="clusterAffinity" value="3"/>
<OutputField name="clusterAffinity_4" feature="clusterAffinity" value="4"/>
<OutputField name="clusterAffinity_5" feature="clusterAffinity" value="5"/>
<OutputField name="clusterAffinity_6" feature="clusterAffinity" value="6"/>
<OutputField name="clusterAffinity_7" feature="clusterAffinity" value="7"/>
</Output>
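For evaluator versions that interpret clusterAffinity strictly, one possible workaround is to rewrite such documents before loading them, changing feature="clusterAffinity" to feature="affinity" wherever a value attribute names a specific cluster. Below is a sketch using the standard JAXP DOM API; the class AffinityRewriter is hypothetical, and it assumes unprefixed element names (a non-namespace-aware parser matches on the literal tag name "OutputField").

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical pre-processing step; not part of JPMML-Evaluator or R's pmml package.
public class AffinityRewriter {

    // Rewrites feature="clusterAffinity" to feature="affinity" on every OutputField
    // that carries a non-empty value attribute; returns the number of fields changed.
    static int rewrite(Document doc) {
        NodeList fields = doc.getElementsByTagName("OutputField");
        int count = 0;
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            // getAttribute returns "" when the attribute is absent
            if ("clusterAffinity".equals(field.getAttribute("feature"))
                    && !field.getAttribute("value").isEmpty()) {
                field.setAttribute("feature", "affinity");
                count++;
            }
        }
        return count;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse XML", e);
        }
    }

    static String serialize(Document doc) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot serialize XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Output>"
                + "<OutputField name=\"predictedValue\" feature=\"predictedValue\"/>"
                + "<OutputField name=\"clusterAffinity_1\" feature=\"clusterAffinity\" value=\"1\"/>"
                + "</Output>";
        Document doc = parse(xml);
        System.out.println(rewrite(doc) + " field(s) rewritten");
        System.out.println(serialize(doc));
    }
}
```

Only the feature attribute changes; the field names (clusterAffinity_1 and so on) are left as they are, so downstream code that reads results by name keeps working.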
Thanks again,
Juraj
If you read the description of the clusterAffinity and affinity feature values in the PMML specification, do you agree with JPMML-Evaluator's interpretation or not? Maybe JPMML-Evaluator is simply too strict here.
Obviously, it would be straightforward to add a "helper" routine that detects the presence of a non-null OutputField@value attribute and then treats the clusterAffinity feature value as a synonym for the affinity feature value. This might be worth doing, given that there are probably plenty of clustering model PMML documents that are broken in this specific way, and that it's impossible to get anything fixed in R's "pmml" package.
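A minimal sketch of such a helper rule, assuming per-cluster distances are available in a map keyed by cluster id. The class LenientAffinity and its method are hypothetical and do not reflect how JPMML-Evaluator actually implements this; only clusterAffinity gets the override, as proposed above, while entityAffinity keeps referring to the winning entity.

```java
import java.util.Map;

// Hypothetical sketch of the proposed "helper" rule; names are illustrative only.
public class LenientAffinity {

    static double resolveAffinity(String feature, String value, String winningId,
                                  Map<String, Double> distances) {
        String targetId;
        switch (feature) {
            case "affinity":
                targetId = value;
                break;
            case "clusterAffinity":
                // A non-null OutputField@value overrides the winning cluster;
                // without it, fall back to the winning cluster
                targetId = (value != null) ? value : winningId;
                break;
            case "entityAffinity":
                targetId = winningId;
                break;
            default:
                throw new IllegalArgumentException("Unsupported feature: " + feature);
        }
        if (targetId == null || !distances.containsKey(targetId)) {
            throw new IllegalArgumentException("Unknown entity: " + targetId);
        }
        return distances.get(targetId);
    }

    public static void main(String[] args) {
        // Made-up distances, except cluster 5, whose value is quoted in this thread
        Map<String, Double> distances = Map.of("1", 12.5, "5", 39.95722874558617);
        System.out.println(resolveAffinity("clusterAffinity", "1", "5", distances));  // cluster 1
        System.out.println(resolveAffinity("clusterAffinity", null, "5", distances)); // winning cluster 5
    }
}
```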
For documentation purposes, JPMML-Evaluator evaluates the clusterAffinity feature according to the following excerpt of the PMML specification (schema version 4.2.1):
"clusterAffinity is the value of the distance or the similarity .. to the cluster center given in clusterId."
In this example, the value of clusterId is 5. Therefore, all seven OutputField elements yield the distance to the fifth cluster, which is 39.95722874558617.
"This specification supports only the distance to the cluster center given in clusterId, NOT the distance to the nearest center."
I do not think the definition means the clusterId that is calculated. I think it means the clusterId that is supposed to be in the value attribute. If it were the calculated clusterId, the value would be the same as the distance to the nearest center, and the specification clearly says that this is not the case.
I would suggest the following, which probably doesn't go against the specification: if the value is specified, then the output is the similarity/distance to that cluster; when there is no value, it is the similarity/distance to the nearest cluster.
Output that names a specific cluster id but carries a value belonging to a different cluster is, at the very least, very misleading:
clusterAffinity_1 = 39.95722874558617
clusterAffinity_2 = 39.95722874558617
clusterAffinity_3 = 39.95722874558617
...
Last week, when I saw output like this, I thought there was something wrong with my model, and I went through the whole research with a magnifying glass to find the error. It was a huge relief today when I found this.
I agree with your interpretation that it should be possible to override the default cluster with a user-specified cluster using the OutputField@value attribute. Otherwise it would be impossible to extract detailed affinity information from clustering models.
The fix is part of JPMML-Evaluator version 1.2.6, which should be available via the Maven Central repository starting tomorrow.
Awesome. Thanks for a great product.