Comments (12)
@Jason0812 What is your MASTER variable set to? Could you please send me the H2O logs and, if you're running the application on YARN, the YARN logs as well?
@copoo Thanks for reporting this problem! On current Sparkling Water you can set the Spark configuration property spark.ext.h2o.repl.enabled to false, either on the command line by adding the extra argument --conf spark.ext.h2o.repl.enabled=false, or via the sparkConf instance in your application. This is only a temporary workaround: it disables the REPL, the feature that lets you write Scala code in the Flow UI (our GUI on top of Sparkling Water). If you aren't using that feature, this solution should work perfectly for now.
We plan to create a proper fix and enable REPL support for CDH 5.7 as well, together with the upcoming release for Spark 2.0.
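For reference, a minimal sketch of the in-application variant (assuming you construct the SparkConf yourself before creating the SparkContext; the application name here is a placeholder):

```scala
// Sketch: disable the Flow REPL from application code.
// Assumes a standard Spark 1.6 application; "my-sw-app" is a placeholder.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("my-sw-app")
  .set("spark.ext.h2o.repl.enabled", "false") // the temporary workaround
val sc = new SparkContext(conf)
```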
Kuba
from sparkling-water.
Hi Jason,
thanks for reporting this! I have just a few questions, since this is the first time we've seen this error.
- Are you sure you are using Spark for Scala 2.10? So far we do not support Scala 2.11.
- Are you using the official Spark distribution or a modified one? Are you running pure Spark with Sparkling Water only?
- Can you post here the complete code you're trying to run and also the shell script which starts it?
- Have you downloaded the Sparkling Water for Spark 1.6 from our page, or did you build it on your own?
Just to explain what's happening there: in order to get our REPL (the interactive Scala interpreter) working, we have to alter the Spark environment a little using reflection, which is why we need to read the field classServer from the class SparkIMain. In the official Spark 1.6 distribution, the SparkIMain class does contain the classServer field.
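To illustrate the mechanism, here is a minimal sketch of reflective private-field access. Everything in it (the SparkIMainLike class, the readPrivateField helper, and the field value) is a hypothetical stand-in, not Sparkling Water's actual code:

```scala
// Minimal sketch of reading a private field via reflection, similar in
// spirit to what Sparkling Water does with SparkIMain.classServer.
class SparkIMainLike {
  private val classServer: String = "http://driver:12345" // stand-in value
}

object ReflectionSketch {
  // Read a private field by name; throws NoSuchFieldException if the
  // field does not exist in this particular build of the class.
  def readPrivateField(target: AnyRef, name: String): AnyRef = {
    val field = target.getClass.getDeclaredField(name)
    field.setAccessible(true) // bypass `private`
    field.get(target)
  }

  def main(args: Array[String]): Unit =
    println(readPrivateField(new SparkIMainLike, "classServer"))
}
```

If a distribution removes the field, getDeclaredField would throw a NoSuchFieldException: classServer at that point.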
Thanks, Kuba
Hi Kuba,
I use Spark 1.6 for Scala 2.10, and the Spark distribution is CDH 5.7's Spark 1.6.
The script I run to start Sparkling Water is ./bin/run-example.sh.
Thanks
Jason.
Thanks for the info.
I found what the problem is. CDH 5.7's Spark 1.6 contains some commits which were merged to master for the official Spark 2.0 release.
This can be seen in the Cloudera change log:
commit e0d03eb30e03f589407c3cf37317a64f18db8257
Author: Marcelo Vanzin <[email protected]>
Date: Thu Dec 10 13:26:30 2015 -0800
[SPARK-11563][CORE][REPL] Use RpcEnv to transfer REPL-generated classes.
This avoids bringing up yet another HTTP server on the driver, and
instead reuses the file server already managed by the driver's
RpcEnv. As a bonus, the repl now inherits the security features of
the network library.
There's also a small change to create the directory for storing classes
under the root temp dir for the application (instead of directly
under java.io.tmpdir).
Author: Marcelo Vanzin <[email protected]>
Closes #9923 from vanzin/SPARK-11563.
(cherry picked from commit 4a46b8859d3314b5b45a67cdc5c81fecb6e9e78c)
Since we don't support Spark 2.0 yet (and this change officially belongs to Spark 2.0), the corresponding changes are not implemented on our side so far. I'll file a JIRA and let you know about the progress.
Thanks.
I will move to the official Spark 1.6 to run Sparkling Water.
I moved to the official Spark 1.6 version and got the following error. Any suggestions?
Sparkling Water Context:
- H2O name: sparkling-water-root_1007693659
- number of executors: 1
- list of used executors:
  (executorId, host, port)
  (1,tracing030,54321)
Open H2O Flow in browser: http://172.168.0.19:54321 (CMD + click in Mac OSX)
Exception in thread "main" DistributedException from tracing030/172.168.0.30:54321, caused by java.lang.NullPointerException
at water.MRTask.getResult(MRTask.java:472)
at water.MRTask.doAll(MRTask.java:401)
at water.parser.ParseSetup.guessSetup(ParseSetup.java:222)
at water.parser.ParseSetup.guessSetup(ParseSetup.java:205)
at water.parser.ParseDataset.parse(ParseDataset.java:36)
at water.parser.ParseDataset.parse(ParseDataset.java:30)
at water.util.FrameUtils.parseFrame(FrameUtils.java:58)
at water.util.FrameUtils.parseFrame(FrameUtils.java:47)
at water.fvec.H2OFrame.<init>(H2OFrame.scala:65)
at water.fvec.H2OFrame.<init>(H2OFrame.scala:75)
at org.apache.spark.examples.h2o.DeepLearningDemo$.main(DeepLearningDemo.scala:47)
at org.apache.spark.examples.h2o.DeepLearningDemo.main(DeepLearningDemo.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
at water.parser.ParseSetup$GuessSetupTsk.map(ParseSetup.java:284)
at water.MRTask.compute2(MRTask.java:587)
at water.H2O$H2OCountedCompleter.compute1(H2O.java:1184)
at water.parser.ParseSetup$GuessSetupTsk$Icer.compute1(ParseSetup$GuessSetupTsk$Icer.java)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1180)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Hi Kuba,
We encounter the same issue (CDH 5.7.0, Spark 1.6.0). Is there any way to work around it, other than using the official Spark 1.6.1?
Hi Kuba,
Thanks for your reply.
After adding the conf option as you suggested, the exception java.lang.NoSuchFieldException: classServer is gone. However, another exception occurs; the relevant exception info is below:
16/06/06 14:38:59 INFO spark.SparkContext: Created broadcast 9 from textFile at AirlinesWithWeatherDemo2.scala:53
Exception in thread "main" DistributedException from nd10.localdomain/192.168.0.10:54321, caused by java.lang.NullPointerException
at water.MRTask.getResult(MRTask.java:472)
at water.MRTask.doAll(MRTask.java:401)
at water.parser.ParseSetup.guessSetup(ParseSetup.java:222)
at water.parser.ParseSetup.guessSetup(ParseSetup.java:205)
at water.parser.ParseDataset.parse(ParseDataset.java:36)
at water.parser.ParseDataset.parse(ParseDataset.java:30)
at water.util.FrameUtils.parseFrame(FrameUtils.java:58)
at water.util.FrameUtils.parseFrame(FrameUtils.java:47)
at water.fvec.H2OFrame.<init>(H2OFrame.scala:65)
at water.fvec.H2OFrame.<init>(H2OFrame.scala:75)
at org.apache.spark.examples.h2o.AirlinesWithWeatherDemo2$.main(AirlinesWithWeatherDemo2.scala:59)
at org.apache.spark.examples.h2o.AirlinesWithWeatherDemo2.main(AirlinesWithWeatherDemo2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
at water.parser.ParseSetup$GuessSetupTsk.map(ParseSetup.java:284)
at water.MRTask.compute2(MRTask.java:587)
at water.H2O$H2OCountedCompleter.compute1(H2O.java:1184)
at water.parser.ParseSetup$GuessSetupTsk$Icer.compute1(ParseSetup$GuessSetupTsk$Icer.java)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1180)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
The Spark cluster runs as Spark on YARN, and the job is submitted in yarn-client mode. 192.168.0.10 is one of the cluster's datanodes/Spark gateways.
Hi @Jason0812 @copoo,
sorry for the late response. I discussed this with my colleagues, and the problem in both your cases is that you are probably running Spark in a distributed environment such as YARN. In that case you need to ensure that the data from which you want to create H2OFrames is on a distributed filesystem such as HDFS, so that all nodes can see the file. AirlinesWithWeatherDemo2 is a demo which is supposed to be run in a local environment.
So before you start your Spark application, be sure that you put all the necessary files on HDFS. Then you can create an H2OFrame from a URI pointing to the location of the file on HDFS, like
new H2OFrame(new java.net.URI("hdfs://path/to/the/file"))
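A minimal sketch of that flow (the HDFS path and namenode address are placeholders, and this assumes a running Sparkling Water/H2O cluster):

```scala
// Sketch: create an H2OFrame from a file that all cluster nodes can reach.
// First copy the file to HDFS, e.g.: hadoop fs -put data.csv /data/data.csv
import java.net.URI
import water.fvec.H2OFrame

// "hdfs://namenode:8020/data/data.csv" is a placeholder URI.
val frame = new H2OFrame(new URI("hdfs://namenode:8020/data/data.csv"))
```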
Hi @Jason0812 @copoo ,
since I haven't heard from you for some time, I'm closing this issue. Feel free to reopen it or open a new one if you have other problems or questions.
Thanks, Kuba
I had the same issue with Cloudera CDH 5.8, Spark 1.6.1, and Sparkling Water 1.6.8.
The solution was to add the following, as explained by Kuba:
--conf spark.ext.h2o.repl.enabled=false
I'm having issues while H2O is trying to convert the data back to a Spark RDD:
Error: assertion failed: Should never be here, type is 3
at scala.Predef$.assert(Predef.scala:170)
at org.apache.spark.h2o.backends.internal.InternalReadConverterCtx$$anonfun$4.apply(InternalReadConverterCtx.scala:74)
at org.apache.spark.h2o.backends.internal.InternalReadConverterCtx$$anonfun$4.apply(InternalReadConverterCtx.scala:73)
at scala.collection.Map$WithDefault.default(Map.scala:53)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:59)
at org.apache.spark.h2o.backends.internal.InternalReadConverterCtx.string(InternalReadConverterCtx.scala:63)
at org.apache.spark.h2o.backends.internal.InternalReadConverterCtx.string(InternalReadConverterCtx.scala:29)
at org.apache.spark.h2o.converters.ReadConverterCtx$$anonfun$ExtractorsTable$8.apply(ReadConverterCtx.scala:111)
at org.apache.spark.h2o.converters.ReadConverterCtx$$anonfun$ExtractorsTable$8.apply(ReadConverterCtx.scala:111)
at org.apache.spark.h2o.backends.internal.InternalReadConverterCtx$$anonfun$returnOption$2.apply(InternalReadConverterCtx.scala:47)
at org.apache.spark.h2o.backends.internal.InternalReadConverterCtx$$anonfun$returnOption$2.apply(InternalReadConverterCtx.scala:46)
at scala.Option$WithFilter.flatMap(Option.scala:208)
at org.apache.spark.h2o.backends.internal.InternalReadConverterCtx.returnOption(InternalReadConverterCtx.scala:46)
at org.apache.spark.h2o.converters.ReadConverterCtx$$anonfun$org$apache$spark$h2o$converters$ReadConverterCtx$$OptionReadersMap$1$$anonfun$apply$1.apply(ReadConverterCtx.scala:119)
at org.apache.spark.h2o.converters.ReadConverterCtx$$anonfun$org$apache$spark$h2o$converters$ReadConverterCtx$$OptionReadersMap$1$$anonfun$apply$1.apply(ReadConverterCtx.scala:119)
at org.apache.spark.h2o.converters.H2ORDD$H2ORDDIterator$$anonfun$5$$anonfun$apply$1.apply(H2ORDD.scala:128)
at org.apache.spark.h2o.converters.H2ORDD$H2ORDDIterator$$anonfun$6.apply(H2ORDD.scala:132)
at org.apache.spark.h2o.converters.H2ORDD$H2ORDDIterator$$anonfun$6.apply(H2ORDD.scala:132)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.h2o.converters.H2ORDD$H2ORDDIterator.extractRow(H2ORDD.scala:132)
at org.apache.spark.h2o.converters.H2ORDD$H2ORDDIterator.readOneRow(H2ORDD.scala:188)
at org.apache.spark.h2o.converters.H2ORDD$H2ORDDIterator.hasNext(H2ORDD.scala:156)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply$mcV$sp(PairRDDFunctions.scala:1210)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply(PairRDDFunctions.scala:1210)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply(PairRDDFunctions.scala:1210)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1218)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1197)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)