docker-spark-livy's Introduction

Hi there 👋

  • 🔭 I'm currently working on big data and data science use cases in the telco domain
  • 🌱 I'm currently learning cloud engineering on AWS, GCP, Azure and more
  • 👯 I'm looking to collaborate on data engineering and data science engineering projects
  • 🤔 I'm looking for help with data engineering projects
  • 💬 Ask me about big data architecture and implementing production-ready data science algorithms
  • 📫 How to reach me: renien.com
  • 😄 Pronouns: Renien
  • ⚡ Fun fact: Dad to a 👼, Traveler 🌎, Water Garden 🌻, Fishkeeping 🐡, teacher and much more

docker-spark-livy's People

Contributors

diggzhang, renien

docker-spark-livy's Issues

pyspark jobs not running on Livy

Name and Version

docker-spark-livy

What steps will reproduce the bug?

Create a PySpark session on Livy, either with the sparkmagic extension or via curl.
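
The reproduction step can be sketched in Python as follows. This is a minimal sketch, not the reporter's actual command: it assumes Livy's REST API is reachable at http://localhost:8998 (Livy's default port; the host mapping is an assumption), and the helper names build_session_request and create_session are hypothetical.

```python
import json
import urllib.request

# Assumption: Livy listens on its default port 8998 and is mapped to localhost.
LIVY_URL = "http://localhost:8998"


def build_session_request(livy_url, kind="pyspark"):
    """Build (but do not send) the POST /sessions request for a new Livy session."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session(livy_url, kind="pyspark", timeout=30):
    """Send the request and return Livy's JSON description of the new session."""
    request = build_session_request(livy_url, kind)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling create_session(LIVY_URL) against a running server returns a dictionary that includes the session "id" and "state"; polling GET /sessions/<id> afterwards shows whether the session reaches "idle" or ends in "error".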

What is the expected behavior?

A Spark session should be created.

What do you see instead?

We built your image. Submitting commands from a PySpark kernel crashes the Livy session, while the Spark (Scala) kernel runs fine.

The code failed because of a fatal error:
Session 0 unexpectedly reached final status 'error'. See logs:
stdout:

stderr:
2023-11-03 19:52:21,930 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2023-11-03 19:52:22,529 INFO driver.RSCDriver: Connecting to: 546f6dce20f7:10000
2023-11-03 19:52:22,530 INFO driver.RSCDriver: Starting RPC server...
2023-11-03 19:52:22,762 INFO rpc.RpcServer: Connected to the port 10001
2023-11-03 19:52:22,762 WARN rsc.RSCConf: Your hostname, 546f6dce20f7, resolves to a loopback address, but we couldn't find any external IP address!
2023-11-03 19:52:22,762 WARN rsc.RSCConf: Set livy.rsc.rpc.server.address if you need to bind to another address.
2023-11-03 19:52:23,416 INFO driver.RSCDriver: Received job request 75d2c76d-ed10-4978-8fd4-8f96a7aa6d86
2023-11-03 19:52:23,417 INFO driver.RSCDriver: SparkContext not yet up, queueing job request.
2023-11-03 19:52:26,826 INFO driver.SparkEntries: Starting Spark context...
2023-11-03 19:52:26,845 INFO spark.SparkContext: Running Spark version 2.4.7
2023-11-03 19:52:26,869 INFO spark.SparkContext: Submitted application: livy-session-0
2023-11-03 19:52:26,920 INFO spark.SecurityManager: Changing view acls to: root
2023-11-03 19:52:26,920 INFO spark.SecurityManager: Changing modify acls to: root
2023-11-03 19:52:26,920 INFO spark.SecurityManager: Changing view acls groups to:
2023-11-03 19:52:26,920 INFO spark.SecurityManager: Changing modify acls groups to:
2023-11-03 19:52:26,920 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2023-11-03 19:52:27,073 INFO util.Utils: Successfully started service 'sparkDriver' on port 34475.
2023-11-03 19:52:27,152 INFO spark.SparkEnv: Registering MapOutputTracker
2023-11-03 19:52:27,199 INFO spark.SparkEnv: Registering BlockManagerMaster
2023-11-03 19:52:27,202 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2023-11-03 19:52:27,204 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
2023-11-03 19:52:27,237 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-53840d8c-71af-4e27-a934-a94a92befa63
2023-11-03 19:52:27,272 INFO memory.MemoryStore: MemoryStore started with capacity 353.4 MB
2023-11-03 19:52:27,308 INFO spark.SparkEnv: Registering OutputCommitCoordinator
2023-11-03 19:52:27,410 INFO util.log: Logging initialized @6966ms
2023-11-03 19:52:27,479 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2023-11-03 19:52:27,499 INFO server.Server: Started @7056ms
2023-11-03 19:52:27,520 INFO server.AbstractConnector: Started ServerConnector@7156c9a1{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2023-11-03 19:52:27,520 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
2023-11-03 19:52:27,570 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2e34e0b1{/jobs,null,AVAILABLE,@spark}
2023-11-03 19:52:27,571 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1bfc4707{/jobs/json,null,AVAILABLE,@spark}
2023-11-03 19:52:27,580 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4e070ca1{/jobs/job,null,AVAILABLE,@spark}
2023-11-03 19:52:27,581 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@64ec5438{/jobs/job/json,null,AVAILABLE,@spark}
2023-11-03 19:52:27,581 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1d51f3dd{/stages,null,AVAILABLE,@spark}
2023-11-03 19:52:27,584 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6b4586d2{/stages/json,null,AVAILABLE,@spark}
2023-11-03 19:52:27,584 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@122f5e2b{/stages/stage,null,AVAILABLE,@spark}
2023-11-03 19:52:27,585 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@418d369b{/stages/stage/json,null,AVAILABLE,@spark}
2023-11-03 19:52:27,586 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@293be687{/stages/pool,null,AVAILABLE,@spark}
2023-11-03 19:52:27,586 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1941bcf7{/stages/pool/json,null,AVAILABLE,@spark}
2023-11-03 19:52:27,587 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@552809c4{/storage,null,AVAILABLE,@spark}
2023-11-03 19:52:27,589 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7d39c414{/storage/json,null,AVAILABLE,@spark}
2023-11-03 19:52:27,589 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@bc00aea{/storage/rdd,null,AVAILABLE,@spark}
2023-11-03 19:52:27,590 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1ff2a961{/storage/rdd/json,null,AVAILABLE,@spark}
2023-11-03 19:52:27,590 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@a01f572{/environment,null,AVAILABLE,@spark}
2023-11-03 19:52:27,591 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@392b8942{/environment/json,null,AVAILABLE,@spark}
2023-11-03 19:52:27,592 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@32b743d7{/executors,null,AVAILABLE,@spark}
2023-11-03 19:52:27,592 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@38045d41{/executors/json,null,AVAILABLE,@spark}
2023-11-03 19:52:27,593 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3dab9556{/executors/threadDump,null,AVAILABLE,@spark}
2023-11-03 19:52:27,593 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@58af6a38{/executors/threadDump/json,null,AVAILABLE,@spark}
2023-11-03 19:52:27,600 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@508e3301{/static,null,AVAILABLE,@spark}
2023-11-03 19:52:27,601 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@44174e4e{/,null,AVAILABLE,@spark}
2023-11-03 19:52:27,602 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@55395b0a{/api,null,AVAILABLE,@spark}
2023-11-03 19:52:27,602 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@46ef0919{/jobs/job/kill,null,AVAILABLE,@spark}
2023-11-03 19:52:27,603 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2f3a3496{/stages/stage/kill,null,AVAILABLE,@spark}
2023-11-03 19:52:27,605 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://546f6dce20f7:4040/
2023-11-03 19:52:27,618 INFO spark.SparkContext: Added JAR file:///usr/livy/rsc-jars/livy-api-0.7.0-incubating.jar at spark://546f6dce20f7:34475/jars/livy-api-0.7.0-incubating.jar with timestamp 1699041147618
2023-11-03 19:52:27,618 INFO spark.SparkContext: Added JAR file:///usr/livy/rsc-jars/livy-thriftserver-session-0.7.0-incubating.jar at spark://546f6dce20f7:34475/jars/livy-thriftserver-session-0.7.0-incubating.jar with timestamp 1699041147618
2023-11-03 19:52:27,618 INFO spark.SparkContext: Added JAR file:///usr/livy/rsc-jars/netty-all-4.0.37.Final.jar at spark://546f6dce20f7:34475/jars/netty-all-4.0.37.Final.jar with timestamp 1699041147618
2023-11-03 19:52:27,619 INFO spark.SparkContext: Added JAR file:///usr/livy/rsc-jars/livy-rsc-0.7.0-incubating.jar at spark://546f6dce20f7:34475/jars/livy-rsc-0.7.0-incubating.jar with timestamp 1699041147619
2023-11-03 19:52:27,619 INFO spark.SparkContext: Added JAR file:///usr/livy/repl_2.11-jars/livy-repl_2.11-0.7.0-incubating.jar at spark://546f6dce20f7:34475/jars/livy-repl_2.11-0.7.0-incubating.jar with timestamp 1699041147619
2023-11-03 19:52:27,619 INFO spark.SparkContext: Added JAR file:///usr/livy/repl_2.11-jars/livy-core_2.11-0.7.0-incubating.jar at spark://546f6dce20f7:34475/jars/livy-core_2.11-0.7.0-incubating.jar with timestamp 1699041147619
2023-11-03 19:52:27,619 INFO spark.SparkContext: Added JAR file:///usr/livy/repl_2.11-jars/commons-codec-1.9.jar at spark://546f6dce20f7:34475/jars/commons-codec-1.9.jar with timestamp 1699041147619
2023-11-03 19:52:27,748 INFO client.StandaloneAppClient$ClientEndpoint: Connecting to master spark://master:7077...
2023-11-03 19:52:27,808 INFO client.TransportClientFactory: Successfully created connection to master/172.22.0.2:7077 after 34 ms (0 ms spent in bootstraps)
2023-11-03 19:52:28,067 INFO cluster.StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20231103195228-0000
2023-11-03 19:52:28,099 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46123.
2023-11-03 19:52:28,100 INFO netty.NettyBlockTransferService: Server created on 546f6dce20f7:46123
2023-11-03 19:52:28,101 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2023-11-03 19:52:28,116 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20231103195228-0000/0 on worker-20231101181808-172.22.0.3-8881 (172.22.0.3:8881) with 2 core(s)
2023-11-03 19:52:28,117 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20231103195228-0000/0 on hostPort 172.22.0.3:8881 with 2 core(s), 1024.0 MB RAM
2023-11-03 19:52:28,139 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20231103195228-0000/1 on worker-20231101181808-172.22.0.3-8881 (172.22.0.3:8881) with 2 core(s)
2023-11-03 19:52:28,140 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20231103195228-0000/1 on hostPort 172.22.0.3:8881 with 2 core(s), 1024.0 MB RAM
2023-11-03 19:52:28,142 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20231103195228-0000/2 on worker-20231101181808-172.22.0.3-8881 (172.22.0.3:8881) with 2 core(s)
2023-11-03 19:52:28,145 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20231103195228-0000/2 on hostPort 172.22.0.3:8881 with 2 core(s), 1024.0 MB RAM
2023-11-03 19:52:28,160 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20231103195228-0000/3 on worker-20231101181808-172.22.0.3-8881 (172.22.0.3:8881) with 2 core(s)
2023-11-03 19:52:28,160 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20231103195228-0000/3 on hostPort 172.22.0.3:8881 with 2 core(s), 1024.0 MB RAM
2023-11-03 19:52:28,160 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20231103195228-0000/4 on worker-20231101181808-172.22.0.3-8881 (172.22.0.3:8881) with 2 core(s)
2023-11-03 19:52:28,161 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20231103195228-0000/4 on hostPort 172.22.0.3:8881 with 2 core(s), 1024.0 MB RAM
2023-11-03 19:52:28,222 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20231103195228-0000/0 is now RUNNING
2023-11-03 19:52:28,232 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 546f6dce20f7, 46123, None)
2023-11-03 19:52:28,232 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20231103195228-0000/1 is now RUNNING
2023-11-03 19:52:28,242 INFO storage.BlockManagerMasterEndpoint: Registering block manager 546f6dce20f7:46123 with 353.4 MB RAM, BlockManagerId(driver, 546f6dce20f7, 46123, None)
2023-11-03 19:52:28,246 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 546f6dce20f7, 46123, None)
2023-11-03 19:52:28,249 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 546f6dce20f7, 46123, None)
2023-11-03 19:52:28,256 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20231103195228-0000/2 is now RUNNING
2023-11-03 19:52:28,260 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20231103195228-0000/3 is now RUNNING
2023-11-03 19:52:28,287 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20231103195228-0000/4 is now RUNNING
2023-11-03 19:52:28,340 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@38de00ce{/metrics/json,null,AVAILABLE,@spark}
2023-11-03 19:52:28,398 INFO cluster.StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
2023-11-03 19:52:28,437 INFO driver.SparkEntries: Spark context finished initialization in 1610ms
2023-11-03 19:52:28,564 INFO driver.SparkEntries: Created Spark session.
Exception in thread "Thread-24" java.lang.NoClassDefFoundError: org/apache/spark/sql/hive/HiveContext
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
    at java.lang.Class.privateGetPublicMethods(Class.java:2902)
    at java.lang.Class.getMethods(Class.java:1615)
    at py4j.reflection.ReflectionEngine.getMethodsByNameAndLength(ReflectionEngine.java:345)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:305)
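
A NoClassDefFoundError for org/apache/spark/sql/hive/HiveContext means the JVM could not load that class at runtime. One plausible cause, which the report does not confirm, is that the Spark distribution in the image ships without the spark-hive module. The sketch below checks for that jar. The helper names are hypothetical, and the assumption that jars live under <SPARK_HOME>/jars follows the standard Spark layout rather than anything stated in the report.

```python
import glob
import os


def find_hive_jars(spark_home):
    """Return the spark-hive jars found under <spark_home>/jars, sorted by name."""
    # The underscore right after "spark-hive" excludes spark-hive-thriftserver jars.
    pattern = os.path.join(spark_home, "jars", "spark-hive_*.jar")
    return sorted(glob.glob(pattern))


def describe_hive_support(spark_home):
    """Summarize in one line whether a spark-hive jar is present."""
    jars = find_hive_jars(spark_home)
    if jars:
        names = ", ".join(os.path.basename(jar) for jar in jars)
        return "spark-hive found: " + names
    return "no spark-hive jar under " + os.path.join(spark_home, "jars")
```

Running describe_hive_support on the value of SPARK_HOME inside the Livy container would show whether the jar is there. If it is missing, rebuilding the image on a Spark distribution that includes Hive support is one thing to try.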

JDK Error with Dockerfile

I wanted to add more Python packages to the container, but building the Docker image fails with a JDK download error:

#7 0.820 Location: https://download.oracle.com/otn-pub/java/jdk/8u281-b09/89d678f2be164786b292527658ca1605/server-jre-8u281-linux-x64.tar.gz?AuthParam=1626148706_2c241669e917fcef97d88ab93c94a8f2 [following]
#7 0.820 --2021-07-13 03:56:26--  https://download.oracle.com/otn-pub/java/jdk/8u281-b09/89d678f2be164786b292527658ca1605/server-jre-8u281-linux-x64.tar.gz?AuthParam=1626148706_2c241669e917fcef97d88ab93c94a8f2
#7 0.820 Connecting to download.oracle.com (download.oracle.com)|23.205.72.81|:443... connected.
#7 0.860 HTTP request sent, awaiting response... 404 Not Found
#7 3.486 2021-07-13 03:56:29 ERROR 404: Not Found.
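
The log shows the Oracle download URL answering with 404, so the build cannot fetch the JDK archive. A quick pre-build check like the one below can confirm whether a download URL still resolves. It is a hypothetical helper and not part of the repository; the opener parameter exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def url_status(url, opener=urllib.request.urlopen, timeout=15):
    """Return the HTTP status code of a HEAD request to url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as HTTPError; report their code.
        return error.code
```

Any status other than 200 for the JDK URL in the Dockerfile suggests the link has moved or expired, in which case the Dockerfile needs a different JDK source.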
