Comments (12)

jakubhava commented on May 16, 2024

@Jason0812 What is your MASTER variable set to? Could you please send me the H2O logs and, if you're running the application on YARN, the YARN logs as well?

@copoo Thanks for reporting this problem! On the current Sparkling Water you can set the Spark configuration property spark.ext.h2o.repl.enabled to false, either on the command line by adding the extra argument --conf spark.ext.h2o.repl.enabled=false or via a SparkConf instance in the application. It's just a temporary workaround which disables the REPL, a feature that lets you write Scala code in Flow UI (our GUI on top of Sparkling Water). If you aren't using this feature, then this workaround should work fine for now.
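For the SparkConf route, a minimal sketch might look like the following. The object name and application name are hypothetical, and the master is assumed to be supplied by spark-submit. Only the property name and value come from the comment above.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical application object; only the property key/value is taken from the thread.
object ReplDisabledApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("sparkling-water-without-repl") // hypothetical name
      .set("spark.ext.h2o.repl.enabled", "false") // disables the Sparkling Water REPL

    val sc = new SparkContext(conf)
    try {
      // Create the H2O context and run the job here, as in your existing application.
    } finally {
      sc.stop()
    }
  }
}
```

Setting the property in code has the same effect as passing it with --conf, but it is then fixed for every run of that application.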

We plan to create a proper fix and enable REPL support for CDH 5.7 as well, together with the upcoming release for Spark 2.0.

Kuba

from sparkling-water.

jakubhava commented on May 16, 2024

Hi Jason,
thanks for reporting this! I have just a few questions since this is the first time we've seen this error.

  • Are you sure you are using Spark for Scala 2.10? So far we do not support Scala 2.11.
  • Are you using the official Spark distribution or a modified one? Are you running only pure Spark with Sparkling Water?
  • Can you post the complete code you're trying to run here, and also the shell script which starts it?
  • Have you downloaded Sparkling Water for Spark 1.6 from our page, or have you built it on your own?

Just to explain what's happening there: in order to get our REPL (an interactive Scala interpreter) working, we have to alter the Spark environment a little using reflection, and that's why we need to get the field classServer from the class SparkIMain. In the official Spark distribution (Spark 1.6), the SparkIMain class contains the field classServer.
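To illustrate the kind of reflective lookup described above, here is a small sketch in plain Scala. It does not touch Spark. The two classes are hypothetical stand-ins for a build that declares a classServer field and one that does not, and FieldProbe is an illustrative helper, not Sparkling Water code.

```scala
import scala.util.Try

// Hypothetical stand-ins: one "interpreter" declares a classServer field, the other does not.
class InterpreterWithServer { private val classServer: String = "http-server" }
class InterpreterWithoutServer

object FieldProbe {
  /** Returns true if `cls` itself declares a field called `name`.
    * getDeclaredField throws NoSuchFieldException when the field is missing,
    * which Try turns into a failure instead of crashing the caller. */
  def declaresField(cls: Class[_], name: String): Boolean =
    Try(cls.getDeclaredField(name)).isSuccess
}
```

Calling FieldProbe.declaresField(classOf[InterpreterWithoutServer], "classServer") returns false rather than throwing, so code can probe for the field before relying on it.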

Thanks, Kuba

Jason0812 commented on May 16, 2024

Hi Kuba,
I use Spark 1.6 for scala2.0, and the Spark distribution is CDH5.7-spark1.6.
The script I run is Sparkling Water's ./bin/run-example.sh.

Thanks
Jason.

jakubhava commented on May 16, 2024

Thanks for the info.

So I found what the problem is. CDH5.7-spark1.6 actually contains some commits which were put on master for the official Spark 2.0 release.

It can be seen in the Cloudera change log.

commit e0d03eb30e03f589407c3cf37317a64f18db8257
Author: Marcelo Vanzin <[email protected]>
Date:   Thu Dec 10 13:26:30 2015 -0800

[SPARK-11563][CORE][REPL] Use RpcEnv to transfer REPL-generated classes.

This avoids bringing up yet another HTTP server on the driver, and
instead reuses the file server already managed by the driver's
RpcEnv. As a bonus, the repl now inherits the security features of
the network library.

There's also a small change to create the directory for storing classes
under the root temp dir for the application (instead of directly
under java.io.tmpdir).

Author: Marcelo Vanzin <[email protected]>

Closes #9923 from vanzin/SPARK-11563.

(cherry picked from commit 4a46b8859d3314b5b45a67cdc5c81fecb6e9e78c)

Since we don't support Spark 2.0 so far (and this change officially belongs to Spark 2.0), the corresponding changes are not implemented yet. I'll file a JIRA and let you know about the progress.

Jason0812 commented on May 16, 2024

Thanks.
I will move to the official Spark 1.6 to run Sparkling Water.

Jason0812 commented on May 16, 2024

I moved to the official Spark 1.6 version and got the following error.

Any suggestions for this error?

Sparkling Water Context:

  • H2O name: sparkling-water-root_1007693659

  • number of executors: 1

  • list of used executors:

    (executorId, host, port)

    (1,tracing030,54321)

    Open H2O Flow in browser: http://172.168.0.19:54321 (CMD + click in Mac OSX)

Exception in thread "main" DistributedException from tracing030/172.168.0.30:54321, caused by java.lang.NullPointerException
at water.MRTask.getResult(MRTask.java:472)
at water.MRTask.doAll(MRTask.java:401)
at water.parser.ParseSetup.guessSetup(ParseSetup.java:222)
at water.parser.ParseSetup.guessSetup(ParseSetup.java:205)
at water.parser.ParseDataset.parse(ParseDataset.java:36)
at water.parser.ParseDataset.parse(ParseDataset.java:30)
at water.util.FrameUtils.parseFrame(FrameUtils.java:58)
at water.util.FrameUtils.parseFrame(FrameUtils.java:47)
at water.fvec.H2OFrame.<init>(H2OFrame.scala:65)
at water.fvec.H2OFrame.<init>(H2OFrame.scala:75)
at org.apache.spark.examples.h2o.DeepLearningDemo$.main(DeepLearningDemo.scala:47)
at org.apache.spark.examples.h2o.DeepLearningDemo.main(DeepLearningDemo.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
at water.parser.ParseSetup$GuessSetupTsk.map(ParseSetup.java:284)
at water.MRTask.compute2(MRTask.java:587)
at water.H2O$H2OCountedCompleter.compute1(H2O.java:1184)
at water.parser.ParseSetup$GuessSetupTsk$Icer.compute1(ParseSetup$GuessSetupTsk$Icer.java)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1180)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

copoo commented on May 16, 2024

Hi Kuba,
We encountered the same issue (CDH 5.7.0 and Spark version 1.6.0). Is there any way to work around it, other than using the official Spark 1.6.1?

copoo commented on May 16, 2024

Hi Kuba,
Thanks for your reply.

After adding the conf option as you suggested, the exception java.lang.NoSuchFieldException: classServer is gone. However, another exception occurs; the relevant exception info is below:

16/06/06 14:38:59 INFO spark.SparkContext: Created broadcast 9 from textFile at AirlinesWithWeatherDemo2.scala:53
Exception in thread "main" DistributedException from nd10.localdomain/192.168.0.10:54321, caused by java.lang.NullPointerException
        at water.MRTask.getResult(MRTask.java:472)
        at water.MRTask.doAll(MRTask.java:401)
        at water.parser.ParseSetup.guessSetup(ParseSetup.java:222)
        at water.parser.ParseSetup.guessSetup(ParseSetup.java:205)
        at water.parser.ParseDataset.parse(ParseDataset.java:36)
        at water.parser.ParseDataset.parse(ParseDataset.java:30)
        at water.util.FrameUtils.parseFrame(FrameUtils.java:58)
        at water.util.FrameUtils.parseFrame(FrameUtils.java:47)
        at water.fvec.H2OFrame.<init>(H2OFrame.scala:65)
        at water.fvec.H2OFrame.<init>(H2OFrame.scala:75)
        at org.apache.spark.examples.h2o.AirlinesWithWeatherDemo2$.main(AirlinesWithWeatherDemo2.scala:59)
        at org.apache.spark.examples.h2o.AirlinesWithWeatherDemo2.main(AirlinesWithWeatherDemo2.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
        at water.parser.ParseSetup$GuessSetupTsk.map(ParseSetup.java:284)
        at water.MRTask.compute2(MRTask.java:587)
        at water.H2O$H2OCountedCompleter.compute1(H2O.java:1184)
        at water.parser.ParseSetup$GuessSetupTsk$Icer.compute1(ParseSetup$GuessSetupTsk$Icer.java)
        at water.H2O$H2OCountedCompleter.compute(H2O.java:1180)
        at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
        at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
        at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
        at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
        at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

The Spark cluster runs as Spark on YARN, and the job was submitted in yarn-client mode.

192.168.0.10 is one of the cluster's datanode/Spark gateway hosts.

jakubhava commented on May 16, 2024

Hi @Jason0812 @copoo,
sorry for the late response. I discussed this with my colleagues, and the problem in both of your cases is probably that you are trying to use Spark in a distributed environment such as YARN. In that case you need to ensure that the data from which you want to create H2OFrames is on a distributed filesystem such as HDFS, because from there all nodes can see the file.

AirlinesWithWeatherDemo2 is a demo which is supposed to be run in a local environment.

So before you start your Spark application, be sure that you put all the necessary files on HDFS. Then you can create an H2OFrame based on a URI pointing to the location of the file on HDFS, like

new H2OFrame(new java.net.URI("hdfs://path/to/the/file"))
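
As a guard against accidentally passing a node-local path in a distributed run, one could check the URI scheme before building the frame. This is only a sketch: InputLocation and isOnHdfs are hypothetical names, and the check looks at the scheme only, not at whether the file exists.

```scala
import java.net.URI

// Hypothetical helper, not part of Sparkling Water.
object InputLocation {
  /** True when the URI uses the "hdfs" scheme; false for file:// URIs
    * and for scheme-less local paths (where getScheme returns null). */
  def isOnHdfs(uri: URI): Boolean =
    Option(uri.getScheme).exists(_.equalsIgnoreCase("hdfs"))
}
```

For example, InputLocation.isOnHdfs(new URI("hdfs://path/to/the/file")) is true, while a plain path such as /tmp/data.csv is rejected.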

jakubhava commented on May 16, 2024

Hi @Jason0812 @copoo,
since I haven't heard from you for some time, I'm closing this issue. Feel free to reopen this issue or open a new one if you have other problems/questions.

Thanks, Kuba

Avkash commented on May 16, 2024

I had the same issue with Cloudera CDH 5.8, Spark 1.6.1 and Sparkling Water 1.6.8.
The solution was to add the following, as explained by Kuba:
--conf spark.ext.h2o.repl.enabled=false

DINESHKUMARMURUGAN commented on May 16, 2024

I'm having issues when H2O tries to convert the data back to a Spark RDD:

Error: assertion failed: Should never be here, type is 3
	at scala.Predef$.assert(Predef.scala:170)
	at org.apache.spark.h2o.backends.internal.InternalReadConverterCtx$$anonfun$4.apply(InternalReadConverterCtx.scala:74)
	at org.apache.spark.h2o.backends.internal.InternalReadConverterCtx$$anonfun$4.apply(InternalReadConverterCtx.scala:73)
	at scala.collection.Map$WithDefault.default(Map.scala:53)
	at scala.collection.MapLike$class.apply(MapLike.scala:141)
	at scala.collection.AbstractMap.apply(Map.scala:59)
	at org.apache.spark.h2o.backends.internal.InternalReadConverterCtx.string(InternalReadConverterCtx.scala:63)
	at org.apache.spark.h2o.backends.internal.InternalReadConverterCtx.string(InternalReadConverterCtx.scala:29)
	at org.apache.spark.h2o.converters.ReadConverterCtx$$anonfun$ExtractorsTable$8.apply(ReadConverterCtx.scala:111)
	at org.apache.spark.h2o.converters.ReadConverterCtx$$anonfun$ExtractorsTable$8.apply(ReadConverterCtx.scala:111)
	at org.apache.spark.h2o.backends.internal.InternalReadConverterCtx$$anonfun$returnOption$2.apply(InternalReadConverterCtx.scala:47)
	at org.apache.spark.h2o.backends.internal.InternalReadConverterCtx$$anonfun$returnOption$2.apply(InternalReadConverterCtx.scala:46)
	at scala.Option$WithFilter.flatMap(Option.scala:208)
	at org.apache.spark.h2o.backends.internal.InternalReadConverterCtx.returnOption(InternalReadConverterCtx.scala:46)
	at org.apache.spark.h2o.converters.ReadConverterCtx$$anonfun$org$apache$spark$h2o$converters$ReadConverterCtx$$OptionReadersMap$1$$anonfun$apply$1.apply(ReadConverterCtx.scala:119)
	at org.apache.spark.h2o.converters.ReadConverterCtx$$anonfun$org$apache$spark$h2o$converters$ReadConverterCtx$$OptionReadersMap$1$$anonfun$apply$1.apply(ReadConverterCtx.scala:119)
	at org.apache.spark.h2o.converters.H2ORDD$H2ORDDIterator$$anonfun$5$$anonfun$apply$1.apply(H2ORDD.scala:128)
	at org.apache.spark.h2o.converters.H2ORDD$H2ORDDIterator$$anonfun$6.apply(H2ORDD.scala:132)
	at org.apache.spark.h2o.converters.H2ORDD$H2ORDDIterator$$anonfun$6.apply(H2ORDD.scala:132)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	at org.apache.spark.h2o.converters.H2ORDD$H2ORDDIterator.extractRow(H2ORDD.scala:132)
	at org.apache.spark.h2o.converters.H2ORDD$H2ORDDIterator.readOneRow(H2ORDD.scala:188)
	at org.apache.spark.h2o.converters.H2ORDD$H2ORDDIterator.hasNext(H2ORDD.scala:156)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply$mcV$sp(PairRDDFunctions.scala:1210)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply(PairRDDFunctions.scala:1210)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply(PairRDDFunctions.scala:1210)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1218)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1197)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
