Comments (8)

vruusmann commented on August 18, 2024

Just look at your own data!

SkLearn2PMML/JPMML-SkLearn produces a LightGBM model that has the following schema:

  1. Sole target field y
  2. Two probability-type output fields probability(0) and probability(1)

Does your misbehaving model.pmml look like the above?

from jpmml-evaluator.

vruusmann commented on August 18, 2024

Closing as invalid - the user is attempting to evaluate invalid PMML documents (generated by N****, not SkLearn2PMML).

mbicanic commented on August 18, 2024

@vruusmann I used to use Nyoka, but I had other issues with it. I guarantee that this particular model.pmml was generated with sklearn2pmml. I really don't understand why the hostility and the certainty I used nyoka? Here are the first few lines of the generated PMML, directly copy-pasted:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_4" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.4">
	<Header>
		<Application name="SkLearn2PMML package" version="0.91.0"/>
		<Timestamp>2023-03-31T09:57:19Z</Timestamp>
	</Header>
	<MiningBuildTask>
		<Extension name="repr">PMMLPipeline(steps=[('classifier', LGBMClassifier(class_weight={0: 0.05, 1: 0.95},
               learning_rate=0.07168998753077896, max_depth=18,
               min_data_in_leaf=418, n_estimators=364, num_leaves=58,
               objective='binary', reg_alpha=0.07558439164814572,
               reg_lambda=0.05483594313753313))])</Extension>
	</MiningBuildTask>

It explicitly says SkLearn2PMML package, so I am very confused about how you reached the conclusion that I used Nyoka. Please undo the change to the issue title, because it is dishonest. I wouldn't come here with this question if I generated the PMML with nyoka, as I am well aware there could be incompatibilities between them.

And once we agree that the PMML file has indeed been generated with sklearn2pmml, I would greatly appreciate an explanation or at least a helping hint regarding the duplication of OutputFields in the loaded model.

vruusmann commented on August 18, 2024

> I guarantee that this particular model.pmml was generated with sklearn2pmml.

Your DuplicateFieldValueException was raised when scoring a Nyoka-produced PMML document. The JPMML-Evaluator library does not rename existing OutputField elements, and does not invent new ones.

That's a hard fact. No point in arguing - open your model.pmml in a text editor and take a look at it.

> I really don't understand why the hostility and the certainty I used nyoka?

Because Nyoka is generating invalid/irreproducible PMML documents, and then it is me who has to prove over and over again that JPMML software is correct.

> I wouldn't come here with this question if I generated the PMML with nyoka

Please attach your model.pmml here (or send it to my e-mail), so that we can resolve this issue based on factual matters.

vruusmann commented on August 18, 2024

org.jpmml.evaluator.DuplicateFieldValueException: The value for field "probability_0" has already been defined

All JPMML conversion libraries name probability fields using a probability(<category>) pattern. Nyoka (and related stuff) uses a probability_<category> pattern.

Now, seeing that the duplicate output field is called probability_0, which statement is likely the correct one?

  1. The PMML document was generated by SkLearn2PMML (based on JPMML-SkLearn)
  2. The PMML document was generated by Nyoka.
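The naming heuristic above can be sketched as a small check. This is only an illustration: the two patterns are taken from the description in this comment, while the function name `probability_naming_style` and its return labels are hypothetical.

```python
import re

# Patterns as described above: probability(<category>) vs. probability_<category>
PARENTHESES_STYLE = re.compile(r"^probability\(.+\)$")
UNDERSCORE_STYLE = re.compile(r"^probability_.+$")


def probability_naming_style(field_name: str) -> str:
    """Classify an output field name by the probability naming pattern it follows."""
    if PARENTHESES_STYLE.match(field_name):
        return "parentheses"  # the pattern attributed to JPMML conversion libraries
    if UNDERSCORE_STYLE.match(field_name):
        return "underscore"   # the pattern attributed to Nyoka
    return "other"


print(probability_naming_style("probability(0)"))
print(probability_naming_style("probability_0"))
```

A field name such as probability_0 in an exception message therefore points at the second pattern, regardless of what the local file on disk looks like.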

mbicanic commented on August 18, 2024

@vruusmann
I apologize, it was indeed my mistake. As I said, I used nyoka before, and had to migrate to sklearn2pmml and jpmml-evaluator due to other issues.

The problem was that I am using MLflow to register models. I have a script that trains a model, saves it to PMML, and then registers the model together with the model.pmml artifact. Ever since I modified the script to use sklearn2pmml instead of nyoka, the connection to MLflow wasn't working properly, so even though the local PMML file created by the training script was indeed generated by sklearn2pmml, the "latest" MLflow model I was fetching in Java was still the one relying on a nyoka PMML.

Once again, I apologize for wasting your time and insisting I was correct; I was completely unaware of this problem. Nevertheless, I appreciate that in the end you explained why and how you knew the file was Nyoka-generated - it was very helpful. Thank you for your time and effort!
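A guard against this kind of stale-artifact mix-up could inspect the Header of whatever file was actually downloaded before handing it to the evaluator. Below is a minimal sketch using only the Python standard library; the helper names `pmml_application` and `assert_generated_by` are hypothetical, and the sample is trimmed down from the header quoted earlier in this thread.

```python
import xml.etree.ElementTree as ET


def pmml_application(pmml_text: str):
    """Return (name, version) of the first Application element, or None if absent."""
    root = ET.fromstring(pmml_text)
    for el in root.iter():
        # ElementTree prefixes tags with '{namespace}'; compare the local name only
        if el.tag.rsplit("}", 1)[-1] == "Application":
            return el.get("name"), el.get("version")
    return None


def assert_generated_by(pmml_text: str, expected_name: str) -> None:
    """Raise ValueError if the document was not produced by the expected application."""
    app = pmml_application(pmml_text)
    if app is None or app[0] != expected_name:
        raise ValueError(f"Unexpected PMML producer: {app!r}")


SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header>
    <Application name="SkLearn2PMML package" version="0.91.0"/>
  </Header>
</PMML>"""

assert_generated_by(SAMPLE, "SkLearn2PMML package")  # passes silently
print(pmml_application(SAMPLE))
```

Running the same check on the artifact fetched from the registry, rather than on the local file, is what would have exposed the mismatch here.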

vruusmann commented on August 18, 2024

Apology accepted!

> The problem was that I am using MLflow to register models.

Do you have this MLflow integration project available somewhere? Pure Java, or Java-wrapped-into-Python?

I've been meaning to provide such an integration myself, but haven't started yet.

mbicanic commented on August 18, 2024

Unfortunately, the project is not available publicly as it's a company project. However, it's not really an integration in the strict sense of the word, it's more of a bypass. I am normally registering the model as a Python sklearn model, and then additionally logging the PMML file as an artifact:

import mlflow
import numpy as np
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

def train_model(params: dict, X_train: pd.DataFrame, Y_train: np.ndarray) -> LGBMClassifier:
    pipe = PMMLPipeline([('classifier', LGBMClassifier(**params))])
    pipe.fit(X_train, Y_train)

    # Embed a small verification sample in the PMML document
    X_sample = X_train.sample(n=100, random_state=42)
    pipe.verify(X_sample)
    sklearn2pmml(pipe, "model.pmml", with_repr=True)

    return pipe['classifier']

def log_model(model: LGBMClassifier, X_data: pd.DataFrame):
    # Keyword arguments use '=', not ':'
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="",
        registered_model_name=MODEL_NAME,  # MODEL_NAME is defined elsewhere in the script
        signature=mlflow.models.signature.infer_signature(X_data)
    )
    mlflow.log_artifact("model.pmml")  # referring to the local file generated in train_model

X_train, Y_train = load_dataset(...)
# params: dict of LGBMClassifier hyperparameters, defined elsewhere in the script
model = train_model(params, X_train, Y_train)
log_model(model, X_train)

And then in Java, instead of loading the model, I just load the PMML artifact as a file and initialize the Evaluator class with it:

    private Evaluator loadModel(String modelName) throws Exception {
        try (MlflowClient client = new MlflowClient(MLFLOW_URI)) {
            ModelRegistry.ModelVersion version = client.getRegisteredModel(modelName).getLatestVersions(0);
            File artifactDir = client.downloadArtifacts(version.getRunId());
            File[] files = artifactDir.listFiles(f -> f.getName().equals("model.pmml"));
            Evaluator evaluator = new LoadingModelEvaluatorBuilder().load(files[0]).build();
            evaluator.verify();
            return evaluator;
        }
    }

It's a pretty simple process, all things considered, and surprisingly easy to use Python models in Java this way, while also leveraging MLflow.
