
Comments (8)

vruusmann commented on August 18, 2024

Your Output element contains seven OutputField elements, which are expected to yield the distance of the data record to each of the seven clusters, 1 to 7. Contrary to expectations, all OutputField elements yield the distance to one and the same cluster, which is the winning cluster 5.

This behaviour boils down to the OutputField@feature attribute. According to the PMML specification, the value of this attribute can be specified as clusterAffinity, entityAffinity or simply affinity. In JPMML-Evaluator's interpretation of the PMML specification, the first two return the distance to the winning entity, and only the last one returns the distance to the specified entity (please see the Java source code of class org.jpmml.evaluator.OutputUtil, lines 245 to 255).
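The interpretation described above can be sketched roughly as follows. This is a hypothetical illustration only, not the actual code of org.jpmml.evaluator.OutputUtil; the class name AffinitySketch, the method resolve and its parameters are all assumptions made for the example.

```java
import java.util.Map;

// Hypothetical illustration; NOT the actual org.jpmml.evaluator.OutputUtil code.
class AffinitySketch {

    /**
     * Resolves the distance reported by a single OutputField under the strict reading.
     *
     * @param feature   value of the OutputField@feature attribute
     * @param value     value of the OutputField@value attribute (may be null)
     * @param winnerId  identifier of the winning cluster
     * @param distances distance of the data record to every cluster, keyed by cluster identifier
     */
    static double resolve(String feature, String value, String winnerId, Map<String, Double> distances) {
        switch (feature) {
            case "clusterAffinity":
            case "entityAffinity":
                // The value attribute is ignored; the winning cluster is always used.
                return distances.get(winnerId);
            case "affinity":
                // Only this feature honours the value attribute.
                return distances.get(value);
            default:
                throw new IllegalArgumentException("Unsupported feature: " + feature);
        }
    }
}
```

Under this reading, every OutputField that declares clusterAffinity resolves to the same map entry (the winner's), regardless of what its value attribute says.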

So, I suspect your clustering model specifies the value of OutputField@feature attribute as clusterAffinity, but it should be affinity instead. Is your clustering model generated by R's "pmml" package perhaps? Also, could you paste its Output element here so that we could analyze it together?

from jpmml-evaluator.

sfr commented on August 18, 2024

Thanks for the clarification.

You are correct on both counts. The Output element specifies clusterAffinity, and the model was generated by R's pmml package.

<Output>
    <OutputField name="predictedValue" feature="predictedValue"/>
    <OutputField name="clusterAffinity_1" feature="clusterAffinity" value="1"/>
    <OutputField name="clusterAffinity_2" feature="clusterAffinity" value="2"/>
    <OutputField name="clusterAffinity_3" feature="clusterAffinity" value="3"/>
    <OutputField name="clusterAffinity_4" feature="clusterAffinity" value="4"/>
    <OutputField name="clusterAffinity_5" feature="clusterAffinity" value="5"/>
    <OutputField name="clusterAffinity_6" feature="clusterAffinity" value="6"/>
    <OutputField name="clusterAffinity_7" feature="clusterAffinity" value="7"/>
</Output>

Thanks again,
Juraj


vruusmann commented on August 18, 2024

If you read the description of clusterAffinity and affinity feature values in the PMML specification, then do you agree with JPMML-Evaluator's interpretation or not? Maybe JPMML-Evaluator is simply too strict here.

Obviously, it would be straightforward to add a "helper" routine which detects the presence of a non-null OutputField@value attribute, and then treats the clusterAffinity feature value as a synonym of the affinity feature value. This might be worth doing, given that there are probably plenty of clustering model PMML documents that are broken in this specific way, and that it's impossible to get anything fixed in R's "pmml" package.
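A minimal sketch of such a helper routine might look like the following. The class name FeatureNormalizer and the method effectiveFeature are hypothetical and are not taken from the JPMML-Evaluator code base; the sketch only shows the detection step, not the evaluation itself.

```java
// Hypothetical helper; names are illustrative and not part of JPMML-Evaluator.
class FeatureNormalizer {

    /**
     * Returns the feature that should effectively be evaluated for an OutputField.
     * A clusterAffinity field that carries a non-null value attribute is treated as affinity;
     * every other combination is returned unchanged.
     */
    static String effectiveFeature(String feature, String value) {
        if ("clusterAffinity".equals(feature) && value != null) {
            return "affinity";
        }
        return feature;
    }
}
```

Keeping the normalization separate from the evaluation would leave the strict behaviour untouched for documents that do not set the value attribute.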


vruusmann commented on August 18, 2024

For documentation purposes, JPMML-Evaluator evaluates the clusterAffinity feature according to the following excerpt of the PMML specification (schema version 4.2.1):
clusterAffinity is the value of the distance or the similarity .. to the cluster center given in clusterId.

In this example, the value of the clusterId is 5. Therefore, all seven OutputField elements yield the distance to the fifth cluster, which is 39.95722874558617.


sfr commented on August 18, 2024

"This specification supports only the distance to the cluster center given in clusterId, NOT the distance to the nearest center."

I do not think that the definition refers to the calculated clusterId. I think it refers to the clusterId that is supposed to be given in the value attribute. If it were the calculated clusterId, the result would always be the same as the distance to the nearest center, and the specification clearly says that this is not the case.


sfr commented on August 18, 2024

I would suggest the following, which probably does not go against the specification: if the value attribute is specified, the result is the similarity/distance to that cluster; if there is no value attribute, the result is the similarity/distance to the nearest cluster.

An output field that names a specific cluster id but reports a value belonging to a different cluster is, at the very least, misleading.
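The suggested rule can be sketched as follows. This is only an illustration under stated assumptions: the class ProposedAffinityRule and its method are hypothetical, the per-cluster distances are assumed to be already computed, and a distance measure is assumed (smaller means nearer), so the nearest cluster is the one with the minimum value. For a similarity measure the fallback would need the maximum instead.

```java
import java.util.Map;

// Hypothetical sketch of the suggested rule; assumes a distance measure (smaller is nearer).
class ProposedAffinityRule {

    /**
     * @param value     value of the OutputField@value attribute, or null if absent
     * @param distances distance of the data record to every cluster, keyed by cluster identifier
     */
    static double affinity(String value, Map<String, Double> distances) {
        if (value != null) {
            // A value attribute is present: report the distance to that specific cluster.
            Double distance = distances.get(value);
            if (distance == null) {
                throw new IllegalArgumentException("Unknown cluster: " + value);
            }
            return distance;
        }
        // No value attribute: fall back to the nearest cluster.
        return distances.values().stream()
            .mapToDouble(Double::doubleValue)
            .min()
            .orElseThrow(() -> new IllegalArgumentException("No clusters"));
    }
}
```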

clusterAffinity_1 = 39.95722874558617
clusterAffinity_2 = 39.95722874558617
clusterAffinity_3 = 39.95722874558617
...

Last week, when I saw output like this, I thought that there was something wrong with my model, and I went through the whole research with a magnifying glass looking for an error. It was a huge relief today when I found this.


vruusmann commented on August 18, 2024

I agree with your interpretation that it should be possible to override the default cluster with the user-specified cluster using the OutputField@value attribute. Otherwise it would be impossible to extract detailed affinity information from clustering models.

The fix is part of the JPMML-Evaluator library version 1.2.6, which should be available via the Maven Central repository starting tomorrow.


sfr commented on August 18, 2024

Awesome. Thanks for a great product.

