
jpmml-cascading's Introduction

Java API for producing and scoring models in Predictive Model Markup Language (PMML).

IMPORTANT

This is a legacy codebase.

Starting from March 2014, this project has been superseded by [JPMML-Model](https://github.com/jpmml/jpmml-model) and [JPMML-Evaluator](https://github.com/jpmml/jpmml-evaluator) projects.

Features

Class model

  • Full support for PMML 3.0, 3.1, 3.2, 4.0 and 4.1 schemas:
    • Class hierarchy.
    • Schema version annotations.
  • Fluent API:
    • Value constructors.
  • SAX Locator information
  • [Visitor pattern](http://en.wikipedia.org/wiki/Visitor_pattern):
    • Validation agents.
    • Optimization and transformation agents.

Evaluation engine

Installation

JPMML library JAR files (together with accompanying Java source and Javadocs JAR files) are released via [Maven Central Repository](http://repo1.maven.org/maven2/org/jpmml/). Please join the [JPMML mailing list](https://groups.google.com/forum/#!forum/jpmml) for release announcements.

The current version is 1.0.22 (17 February, 2014).
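The dependency declarations below reference a ${jpmml.version} placeholder. As a minimal sketch, it could be defined as a Maven property that points at this release (the property name is simply the placeholder used in the snippets):

<properties>
	<!-- Version referenced as ${jpmml.version} in the dependency declarations below -->
	<jpmml.version>1.0.22</jpmml.version>
</properties>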

Class model

<!-- Class model classes -->
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-model</artifactId>
	<version>${jpmml.version}</version>
</dependency>
<!-- Class model annotations -->
<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-schema</artifactId>
	<version>${jpmml.version}</version>
</dependency>

Evaluation engine

<dependency>
	<groupId>org.jpmml</groupId>
	<artifactId>pmml-evaluator</artifactId>
	<version>${jpmml.version}</version>
</dependency>

Usage

Class model

The class model consists of two types of classes. There is a small number of manually crafted classes that are used for structuring the class hierarchy. They are permanently stored in the Java sources directory /pmml-model/src/main/java. Additionally, there is a much greater number of automatically generated classes that represent actual PMML elements. They can be found in the generated Java sources directory /pmml-model/target/generated-sources/xjc after a successful build operation.

All class model classes descend from class org.dmg.pmml.PMMLObject. Additional class hierarchy levels, if any, represent common behaviour and/or features. For example, all model classes descend from class org.dmg.pmml.Model.

There is not much documentation accompanying the class model classes. The application developer should consult the [PMML specification](http://www.dmg.org/v4-1/GeneralStructure.html) about individual PMML elements and attributes.
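As a minimal sketch of obtaining a live class model instance, a PMML document can be unmarshalled from an XML source. This follows the unmarshalling pattern that appears in the issue code further down this page; the file name is hypothetical, and the package locations of ImportFilter and JAXBUtil (assumed here to be org.jpmml.model) may differ in this legacy codebase:

import java.io.FileInputStream;
import java.io.InputStream;

import javax.xml.transform.Source;

import org.dmg.pmml.PMML;
import org.jpmml.model.ImportFilter;
import org.jpmml.model.JAXBUtil;
import org.xml.sax.InputSource;

public class ClassModelSketch {

	static
	public void main(String... args) throws Exception {
		InputStream is = new FileInputStream("model.pmml"); // hypothetical file name

		PMML pmml;

		try {
			// ImportFilter is assumed to translate older PMML schema versions
			// to the supported namespace while parsing
			Source source = ImportFilter.apply(new InputSource(is));

			pmml = JAXBUtil.unmarshalPMML(source);
		} finally {
			is.close();
		}

		// All class model classes descend from org.dmg.pmml.PMMLObject;
		// the document root is represented by the PMML class (getVersion() is assumed)
		System.out.println("PMML schema version: " + pmml.getVersion());
	}
}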

Example applications

Evaluation engine

A model evaluator class can be instantiated directly when the contents of the PMML document are known:

PMML pmml = ...;

ModelEvaluator<TreeModel> modelEvaluator = new TreeModelEvaluator(pmml);

Otherwise, a PMML manager class should be instantiated first, which will inspect the contents of the PMML document and instantiate the right model evaluator class later:

PMML pmml = ...;

PMMLManager pmmlManager = new PMMLManager(pmml);
 
ModelEvaluator<?> modelEvaluator = (ModelEvaluator<?>)pmmlManager.getModelManager(null, ModelEvaluatorFactory.getInstance());

Model evaluator classes follow functional programming principles. Model evaluator instances are cheap enough to be created and discarded as needed (i.e. not worth the pooling effort).

It is advisable for application code to work against the org.jpmml.evaluator.Evaluator interface:

Evaluator evaluator = (Evaluator)modelEvaluator;

An evaluator instance can be queried for the definition of active (i.e. independent), predicted (i.e. primary dependent) and output (i.e. secondary dependent) fields:

List<FieldName> activeFields = evaluator.getActiveFields();
List<FieldName> predictedFields = evaluator.getPredictedFields();
List<FieldName> outputFields = evaluator.getOutputFields();

The PMML scoring operation must be invoked with valid arguments. Otherwise, the behaviour of the model evaluator class is unspecified.

The preparation of field values:

Map<FieldName, FieldValue> arguments = new LinkedHashMap<FieldName, FieldValue>();

List<FieldName> activeFields = evaluator.getActiveFields();
for(FieldName activeField : activeFields){
	// The raw (i.e. user-supplied) value could be any Java primitive value
	Object rawValue = ...;

	// The raw value is passed through: 1) outlier treatment, 2) missing value treatment, 3) invalid value treatment and 4) type conversion
	FieldValue activeValue = evaluator.prepare(activeField, rawValue);

	arguments.put(activeField, activeValue);
}

The scoring:

Map<FieldName, ?> results = evaluator.evaluate(arguments);

Typically, a model has exactly one predicted field, which is called the target field:

FieldName targetName = evaluator.getTargetField();
Object targetValue = results.get(targetName);

The target value is either a Java primitive value (as a wrapper object) or an instance of org.jpmml.evaluator.Computable:

if(targetValue instanceof Computable){
	Computable computable = (Computable)targetValue;

	Object primitiveValue = computable.getResult();
}

The target value may implement interfaces that descend from interface org.jpmml.evaluator.ResultFeature:

// Test for "entityId" result feature
if(targetValue instanceof HasEntityId){
	HasEntityId hasEntityId = (HasEntityId)targetValue;
	HasEntityRegistry<?> hasEntityRegistry = (HasEntityRegistry<?>)evaluator;
	BiMap<String, ? extends Entity> entities = hasEntityRegistry.getEntityRegistry();
	Entity winner = entities.get(hasEntityId.getEntityId());

	// Test for "probability" result feature
	if(targetValue instanceof HasProbability){
		HasProbability hasProbability = (HasProbability)targetValue;
		Double winnerProbability = hasProbability.getProbability(winner.getId());
	}
}
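Putting the above steps together, here is a minimal end-to-end scoring sketch. The ScoringSketch class name and the userRecord argument are hypothetical; the import packages (org.dmg.pmml.FieldName and the org.jpmml.evaluator classes) are assumptions based on the snippets above:

import java.util.LinkedHashMap;
import java.util.Map;

import org.dmg.pmml.FieldName;
import org.jpmml.evaluator.Computable;
import org.jpmml.evaluator.Evaluator;
import org.jpmml.evaluator.FieldValue;

public class ScoringSketch {

	static
	public Object score(Evaluator evaluator, Map<FieldName, ?> userRecord){
		Map<FieldName, FieldValue> arguments = new LinkedHashMap<FieldName, FieldValue>();

		// 1) Prepare every active field value (outlier, missing value and
		// invalid value treatments, followed by type conversion)
		for(FieldName activeField : evaluator.getActiveFields()){
			Object rawValue = userRecord.get(activeField);

			arguments.put(activeField, evaluator.prepare(activeField, rawValue));
		}

		// 2) Score
		Map<FieldName, ?> results = evaluator.evaluate(arguments);

		// 3) Extract the target value and unwrap it, if needed
		Object targetValue = results.get(evaluator.getTargetField());

		if(targetValue instanceof Computable){
			Computable computable = (Computable)targetValue;

			return computable.getResult();
		}

		return targetValue;
	}
}

A caller would typically obtain the Evaluator instance as shown above and invoke this method once per input record.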
Example applications

Additional information

Please contact [[email protected]](mailto:[email protected])


jpmml-cascading's Issues

cascading.flow.planner.PlannerException: could not build flow from assembly: [unable to pack object: cascading.flow.hadoop.HadoopFlowStep]

I'm trying to run the sample data from https://github.com/Cascading/pattern/tree/wip-1.0/pattern-examples/data through this PMML evaluator library.

I'm able to get the predicted output for most of the files, except for the Random Forest model PMMLs (i.e. sample.rf.xml and iris.rf.xml).

I get the following error:

Exception in thread "main" cascading.flow.planner.PlannerException: could not build flow from assembly: [unable to pack object: cascading.flow.hadoop.HadoopFlowStep]
at cascading.flow.planner.FlowPlanner.handleExceptionDuringPlanning(FlowPlanner.java:578)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:286)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:80)
at cascading.flow.FlowConnector.connect(FlowConnector.java:459)
at org.jpmml.cascading.Main.main(Main.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: cascading.flow.FlowException: unable to pack object: cascading.flow.hadoop.HadoopFlowStep
at cascading.flow.hadoop.HadoopFlowStep.pack(HadoopFlowStep.java:200)
at cascading.flow.hadoop.HadoopFlowStep.getInitializedConfig(HadoopFlowStep.java:175)
at cascading.flow.hadoop.HadoopFlowStep.createFlowStepJob(HadoopFlowStep.java:208)
at cascading.flow.hadoop.HadoopFlowStep.createFlowStepJob(HadoopFlowStep.java:73)
at cascading.flow.planner.BaseFlowStep.getFlowStepJob(BaseFlowStep.java:768)
at cascading.flow.BaseFlow.initializeNewJobsMap(BaseFlow.java:1253)
at cascading.flow.BaseFlow.initialize(BaseFlow.java:214)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:280)
... 9 more
Caused by: java.io.NotSerializableException: org.jpmml.evaluator.ModelEvaluatorFactory
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at java.util.HashMap.writeObject(HashMap.java:1129)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at cascading.flow.hadoop.util.JavaObjectSerializer.serialize(JavaObjectSerializer.java:57)
at cascading.flow.hadoop.util.HadoopUtil.serializeBase64(HadoopUtil.java:282)
at cascading.flow.hadoop.HadoopFlowStep.pack(HadoopFlowStep.java:196)
... 16 more

I'm running this in local mode using the command below:

hadoop jar example-1.1-SNAPSHOT-job.jar /home/prince/Desktop/pattern-wip-1.0/pattern-examples/data/iris.rf.xml file:///home/prince/Desktop/pattern-wip-1.0/pattern-examples/data/iris.rf.tsv /user/hadoop/part/

Time Series not supported

Hi,

Cascading Pattern does not support time series PMML models. Please let me know when this will be supported in the future, or suggest a workaround for evaluating time series PMML models.

Output file contains only RawScore and PredProb

Hi,
I tried to score a PMML model, and in the output file I am getting only the RawScore and PredProb columns.
I am not able to identify which records have which RawScore and PredProb values.

In the output files, how can I get the other columns as well, i.e. the input columns (or at least some kind of key/unique id), to identify which record each RawScore/PredProb pair relates to?

Output of the job:

hadoop@h nitin]$ head -5 ./OutHadoopCmd/part-00000
RawScore,PredProb
0.30726660686508417,0.648974194605812
0.1761462727982202,0.5871734046360437
0.25373184558861994,0.6242117165224929
0.2877732151464677,0.6400419974818216

Can you please suggest how I can carry the unique keys from my input through to the output, so that I can identify the Raw Score and Predicted Probability for each record?

JPMML - Cascading Issue

Hi,

When we execute on an Apache Hadoop 1.2.0 environment, it works fine.

When we execute on an Apache Hadoop 2.x environment, the node manager is not able to execute the job.

How to set the Hadoop queue name in JPMML-Cascading?

I am trying to execute a PMML model using the Cascading framework, via the JPMML-Cascading library provided in this project: https://github.com/jpmml/jpmml-cascading

I have followed all the steps and was able to generate example-1.2-SNAPSHOT-job.jar using the mvn clean install command.

However, when I execute the jar using the command below:

hadoop jar example-1.2-SNAPSHOT-job.jar /tmp/cascading/model.pmml file:///tmp/cascading/input.csv file:///tmp/cascading/output

I get the exception below, because I do not have the rights to submit the job to the DEFAULT queue. The default queue in our Hadoop cluster is reserved for admin use only, so a normal user cannot run a Hadoop job without specifying a queue name.

Exception:
16/01/06 04:41:37 ERROR ipc.FailoverRPC: FailoverProxy: Failing this Call: submitJob for error(RemoteException): org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): User test cannot perform operation SUBMIT_JOB on queue default.
Please run "hadoop queue -showacls" command to find the queues you have access to .
at org.apache.hadoop.mapred.ACLsManager.checkAccess(ACLsManager.java:179)
at org.apache.hadoop.mapred.ACLsManager.checkAccess(ACLsManager.java:136)
at org.apache.hadoop.mapred.ACLsManager.checkAccess(ACLsManager.java:113)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:4524)
at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:481)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2000)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1996)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1994)
I cannot see where to specify the Hadoop job queue in this repository.

Here is the updated main class that I have been trying to run on my MapR cluster for the last 24 hours:

public class Main {

    static
    public void main(String... args) throws Exception {

        if(args.length != 3){
            System.err.println("Usage: hadoop jar job.jar <PMML file> <HFS source> <HFS sink>");

            System.exit(-1);
        }

        Properties properties = new Properties();

/* Set the job queue name property under all the known keys, to handle both Hadoop 1 and Hadoop 2 */
        properties.put("mapreduce.job.queuename", "test2");
        properties.put("mapred.job.queue.name", "test2");
        properties.setProperty("mapreduce.job.queuename", "test2");
        properties.setProperty("mapred.job.queue.name", "test2");

/*Now trying to set the jobconf in HadoopPlanner */
        JobConf conf = new JobConf(Main.class);

        conf.set("mapred.job.queue.name", "test2");

        HadoopPlanner.copyJobConf(properties, conf);


        AppProps.setApplicationJarClass(properties, Main.class);

        FlowConnector connector = new HadoopFlowConnector(properties);

        PMML pmml;

        InputStream is = new FileInputStream(args[0]);

        try {
            Source source = ImportFilter.apply(new InputSource(is));

            pmml = JAXBUtil.unmarshalPMML(source);
        } finally {
            is.close();
        }

        pmml.accept(new LocatorTransformer());

        PMMLManager pmmlManager = new PMMLManager(pmml);

        ModelEvaluator<?> modelEvaluator = (ModelEvaluator<?>)pmmlManager.getModelManager(ModelEvaluatorFactory.getInstance());
        modelEvaluator.verify();

        FlowDef flowDef = FlowDef.flowDef();

        Tap source = new Hfs(new TextDelimited(true, ","), args[1]);
        flowDef = flowDef.addSource("source", source);

        Tap sink = new Hfs(new TextDelimited(true, ","), args[2]);
        flowDef = flowDef.addSink("sink", sink);

        PMMLPlanner pmmlPlanner = new PMMLPlanner(modelEvaluator);
        pmmlPlanner.setRetainOnlyActiveFields();

        flowDef = flowDef.addAssemblyPlanner(pmmlPlanner);

//Trying to print out all the property keys and values before setting the job to run.
        Map<Object,Object> propLast = connector.getProperties();
        if(propLast instanceof Properties) {
            Properties i$ = (Properties)propLast;
            Set entry = i$.stringPropertyNames();
            Iterator i$1 = entry.iterator();

            while(i$1.hasNext()) {
                String key = (String)i$1.next();
                System.out.println("key: " +key + "<- value: " + i$.getProperty(key)+"<-");
            }
        } else {
            Iterator i$2 = propLast.entrySet().iterator();

            while(i$2.hasNext()) {
                Map.Entry entry1 = (Map.Entry)i$2.next();
                if(entry1.getValue() != null) {
                    System.out.println("key: " + entry1.getKey().toString() + "<- value: " + entry1.getValue().toString() + "<-");
                }
            }
        }
        Flow<?> flow = connector.connect(flowDef);

        flow.complete();
    }
}

Exception in thread "main" cascading.flow.FlowException: step failed: (1/1)

Exception in thread "main" cascading.flow.FlowException: step failed: (1/1) /test/vittabai, with job id: job_1453567420128_0001, please see cluster logs for failure messages
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:261)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:162)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:124)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:43)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Log ...........................................

2016-01-24 22:55:59,532 INFO [Thread-80] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: History url is http://quickstart.cloudera:19888/jobhistory/job/job_1453185973876_0002
2016-01-24 22:55:59,554 INFO [Thread-80] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Waiting for application to be successfully unregistered.
2016-01-24 22:56:00,556 INFO [Thread-80] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Final Stats: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:2 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:12 ContRel:4 HostLocal:2 RackLocal:0
2016-01-24 22:56:00,558 INFO [Thread-80] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://quickstart.cloudera:8020 /tmp/hadoop-yarn/staging/cloudera/.staging/job_1453185973876_0002
2016-01-24 22:56:00,563 INFO [Thread-80] org.apache.hadoop.ipc.Server: Stopping server on 48947
2016-01-24 22:56:00,567 INFO [IPC Server listener on 48947] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 48947
2016-01-24 22:56:00,569 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2016-01-24 22:56:00,570 INFO [TaskHeartbeatHandler PingChecker] org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler thread interrupted

Cascading Flow Exception

When I run this program for the first time with the PMML file 'iris.nn.xml' and the 'iris.nn.tsv' file, it executes successfully. But when I execute it a second time with the same PMML file and TSV file, it gives the exception "field name already exists: species".
Exception in thread "main" cascading.flow.planner.PlannerException: could not build flow from assembly: [field name already exists: species]
at cascading.flow.planner.FlowPlanner.handleExceptionDuringPlanning(FlowPlanner.java:578)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:286)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:80)
at cascading.flow.FlowConnector.connect(FlowConnector.java:459)
at org.jpmml.cascading.Main.main(Main.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: cascading.tuple.TupleException: field name already exists: species
at cascading.tuple.Fields.copyRetain(Fields.java:1397)
at cascading.tuple.Fields.appendInternal(Fields.java:1266)
at cascading.tuple.Fields.append(Fields.java:1215)
at org.jpmml.cascading.PMMLPlanner.resolveAssembly(PMMLPlanner.java:136)
at org.jpmml.cascading.PMMLPlanner.resolveTails(PMMLPlanner.java:96)
at cascading.flow.planner.FlowPlanner.resolveAssemblyPlanners(FlowPlanner.java:153)
at cascading.flow.planner.FlowPlanner.resolveTails(FlowPlanner.java:140)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:246)

Duplicate field name found

I also tried to execute jpmml-cascading and I am getting this exception:

Exception in thread "main" cascading.flow.planner.PlannerException: could not build flow from assembly: [duplicate field name found: C]
at cascading.flow.planner.FlowPlanner.handleExceptionDuringPlanning(FlowPlanner.java:590)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:293)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:80)
at cascading.flow.FlowConnector.connect(FlowConnector.java:459)
at org.jpmml.cascading.Main.main(Main.java:148)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.IllegalArgumentException: duplicate field name found: C
at cascading.tuple.Fields.validate(Fields.java:835)
at cascading.tuple.Fields.(Fields.java:632)
at cascading.scheme.util.DelimitedParser.parseFirstLine(DelimitedParser.java:312)
at cascading.scheme.hadoop.TextDelimited.retrieveSourceFields(TextDelimited.java:975)
at cascading.tap.Tap.retrieveSourceFields(Tap.java:366)
at cascading.flow.BaseFlow.retrieveSourceFields(BaseFlow.java:230)
at cascading.flow.BaseFlow.(BaseFlow.java:196)
at cascading.flow.hadoop.HadoopFlow.(HadoopFlow.java:98)
at cascading.flow.hadoop.planner.HadoopPlanner.createFlow(HadoopPlanner.java:238)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:251)
... 8 more

My input file has this header as its first record:

y,unique_id,wgt,r_busb,prob_b,prob_rnk1,prob_rnk2,prob_rnk3,new_comm_bur_cnt_4,comm_inq_sum,comm_hit,unique_domain,pct_company,n_open,r_open_mer_pg_vis,act_sup,C,K,C,N,N,N,N,N,C,C,C,C,C,L,N,C

followed by all the input data records for scoring. Can you please check why duplicate column names are not allowed?

Note: I tried to run the same data through the JPMML-Evaluator example code, and it went through; I was able to get the score.

Exception in thread "main" cascading.flow.FlowException: step failed: (1/1)

Hi,

We are facing an exception during the implementation of the JPMML-Cascading pattern on Hadoop:

Exception in thread "main" cascading.flow.FlowException: step failed: (1/1) /test/vittabai, with job id: job_1453567420128_0001, please see cluster logs for failure messages
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:261)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:162)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:124)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:43)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Regards,
Murali
