Comments (5)

vruusmann commented on July 18, 2024

If I have a matrix of features in which the feature names contain some particular characters (such as &), the package throws an exception connected to RExpParser.

Can you paste the full stack trace of this exception here?

Better yet, can you provide a reproducible example (a toy dataset and an R script) that I could play with?

from jpmml-r.

svazzole commented on July 18, 2024

Here is the output of the command.
I will provide a precise example as soon as possible.

D:\jpmml-r-master>java -Xms4G -Xmx16G -jar target/converter-executable-1.2-SNAPSHOT.jar --rds-input LibSVMAnomalyFormulaReq.rds --pmml-output model.pmml
set 19, 2017 4:59:39 PM org.jpmml.rexp.Main run
INFORMAZIONI: Parsing RDS..
Exception in thread "main" java.lang.StackOverflowError
        at java.io.DataInputStream.readInt(Unknown Source)
        at org.jpmml.rexp.XDRInput.readInt(XDRInput.java:62)
        at org.jpmml.rexp.RExpParser.readInt(RExpParser.java:481)
        at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:67)
        at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:155)
        at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
        at org.jpmml.rexp.RExpParser.readFunctionCall(RExpParser.java:218)
        at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:82)
        at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:155)
        at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
        at org.jpmml.rexp.RExpParser.readFunctionCall(RExpParser.java:218)
        at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:82)
        at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:155)
        at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
        at org.jpmml.rexp.RExpParser.readFunctionCall(RExpParser.java:218)
        at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:82)
        at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:155)
        at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
        at org.jpmml.rexp.RExpParser.readFunctionCall(RExpParser.java:218)
        at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:82)
        at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:155)
        at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
        at org.jpmml.rexp.RExpParser.readFunctionCall(RExpParser.java:218)
        at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:82)

vruusmann commented on July 18, 2024

Very interesting: the RDS parser component appears to go into an infinite recursion (hence the StackOverflowError).

A reproducible example would be much appreciated. Can you share your LibSVMAnomalyFormulaReq.rds RDS file, which is very nicely broken?

In your R script, can you temporarily work around this issue by escaping variable names? For example, try surrounding them with backticks as suggested here:
https://stackoverflow.com/questions/3574385/can-i-escape-characters-in-variable-names

svazzole commented on July 18, 2024

Ok, I will try to explain myself better.
Unfortunately I cannot send you the data (for privacy reasons).
I will try to build a toy model with the same errors.
What I can tell you is that the feature names contain 4-grams of Apache logs (so something like "GET ", "ET /", "T /g" and so on...).
I'm trying to do anomaly detection on the requests, so I'm building a One-Class SVM (both in R and Python).
When I use Python, there are no problems with the variable names, while in R I had to use the following trick: I renamed all the variables to "X1X", "X2X", "X3X" and so on. This fixed the problem, and the jpmml-r package correctly performed the RDS --> PMML conversion. Then I renamed the variables back in the PMML file, taking into account that "&" --> "&amp;". This produced the correct model, and the results agreed with the Python one.
I have another question: I'm trying to use the generated PMML files inside a Scala program. While the results from R and Python agree (as I said before), the results from the Scala One-Class SVM model are quite different. Do you have any idea why? Could this be an issue with Scala (I'm thinking about machine precision), or something with the One-Class SVM (and libsvm)?
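The renaming workaround can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not part of jpmml-r: the class `FeatureAliases` and its methods are hypothetical, the example feature names are made up, and it assumes the PMML file is edited as plain text, so special characters must be XML-escaped by hand ("&" becomes "&amp;").

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for the "X1X", "X2X", ... renaming trick.
public class FeatureAliases {

    // Assigns R-safe placeholder names in column order and remembers
    // the original name behind each placeholder.
    static Map<String, String> buildAliases(List<String> originalNames) {
        Map<String, String> aliasToOriginal = new LinkedHashMap<>();
        for (int i = 0; i < originalNames.size(); i++) {
            aliasToOriginal.put("X" + (i + 1) + "X", originalNames.get(i));
        }
        return aliasToOriginal;
    }

    // Escapes characters that may not appear verbatim inside an XML
    // attribute value. "&" is replaced first so later entities stay intact.
    static String escapeXmlAttribute(String value) {
        return value.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Replaces every quoted occurrence of a placeholder with the escaped
    // original name. Matching the quotes keeps "X1X" from touching "X11X".
    static String restoreNames(String pmml, Map<String, String> aliasToOriginal) {
        String result = pmml;
        for (Map.Entry<String, String> e : aliasToOriginal.entrySet()) {
            result = result.replace("\"" + e.getKey() + "\"",
                    "\"" + escapeXmlAttribute(e.getValue()) + "\"");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> aliases = buildAliases(List.of("GET ", "ET /", "T /g", "a&b="));
        System.out.println(aliases.keySet()); // prints [X1X, X2X, X3X, X4X]

        String pmml = "<DataField name=\"X4X\" optype=\"continuous\" dataType=\"double\"/>";
        // prints <DataField name="a&amp;b=" optype="continuous" dataType="double"/>
        System.out.println(restoreNames(pmml, aliases));
    }
}
```

Editing the PMML through an XML library instead of plain-text replacement would make the manual escaping step unnecessary.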
Thanks for your time.
Best,
Simon

vruusmann commented on July 18, 2024

The PMML standard (and the JPMML implementation of it) does not have a concept of reserved symbols/keywords. For example, the string & would be a perfectly acceptable field name. There is no need to escape it as \& or &amp; (honey badger don't care).
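To illustrate why no manual escaping is needed when a proper XML library writes the file, here is a small round trip using only the JDK's built-in `javax.xml` APIs (not the JPMML API). It builds a bare `DataField` element, not a complete PMML document, with the hypothetical field name "GET &": the serializer writes the ampersand as "&amp;", and the parser returns the original string unchanged.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class FieldNameRoundTrip {

    // Serializes a minimal DataField element whose name attribute holds the
    // raw field name; the serializer performs the XML escaping itself.
    static String writeDataField(String fieldName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element dataField = doc.createElement("DataField");
        dataField.setAttribute("name", fieldName);
        doc.appendChild(dataField);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Parses the XML text and returns the (unescaped) name attribute.
    static String readDataFieldName(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute("name");
    }

    public static void main(String[] args) throws Exception {
        String xml = writeDataField("GET &");
        System.out.println(xml);                    // the attribute text contains &amp;
        System.out.println(readDataFieldName(xml)); // the original field name again
    }
}
```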

The problem is specific to the R platform, because R has the concept of reserved symbols/keywords. The problem would probably be resolved by escaping variable names properly. Did you try using backticks as suggested above? It is no wonder that the RDS parser gets confused when the RDS model file contains incorrect RDS strings. Sure, it would be nice if the RDS parser were able to detect and recover from such a situation, but you, as an R end user, can prevent this situation from happening in the first place.
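As a rough idea of what "detect and recover" could look like, here is a sketch of a recursive binary reader with a nesting-depth guard, so that corrupt input produces a descriptive exception instead of a StackOverflowError. This is not JPMML's RExpParser: the class `GuardedReader`, the node format (an int child count followed by that many child nodes) and the `MAX_DEPTH` value are all hypothetical, and a single recursive method stands in for the mutual recursion between readRExp, readPairList and readFunctionCall seen in the stack trace.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class GuardedReader {

    // Hypothetical limit; a real parser would choose one suited to its format.
    static final int MAX_DEPTH = 1000;

    private final DataInputStream in;

    GuardedReader(byte[] data) {
        this.in = new DataInputStream(new ByteArrayInputStream(data));
    }

    // Reads one node (an int child count, then that many child nodes) and
    // returns the total number of nodes read, including this one.
    int readNode(int depth) throws IOException {
        if (depth > MAX_DEPTH) {
            throw new IOException("Nesting deeper than " + MAX_DEPTH + " levels; input is probably corrupt");
        }
        int childCount = in.readInt();
        if (childCount < 0) {
            throw new IOException("Negative child count: " + childCount);
        }
        int total = 1;
        for (int i = 0; i < childCount; i++) {
            total += readNode(depth + 1);
        }
        return total;
    }

    // Encodes a chain of `levels` nested nodes: each node has exactly one
    // child, except the innermost one, which has none.
    static byte[] chain(int levels) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (int i = 1; i < levels; i++) {
            out.writeInt(1);
        }
        out.writeInt(0);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(new GuardedReader(chain(10)).readNode(0)); // prints 10
        try {
            new GuardedReader(chain(100_000)).readNode(0);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // fails fast at depth 1001
        }
    }
}
```

An alternative design is to replace the recursion with an explicit stack, which removes the dependency on the JVM thread stack size altogether.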