itsjafer / jupyterlab-sparkmonitor

JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook

Home Page: https://krishnan-r.github.io/sparkmonitor/

License: Apache License 2.0

JavaScript 47.63% CSS 6.86% Python 29.10% Jupyter Notebook 2.08% Scala 13.59% Dockerfile 0.23% Makefile 0.51%
jupyter jupyterlab jupyterlab-extension spark jupyter-lab apache-spark pyspark

jupyterlab-sparkmonitor's Introduction

Spark Monitor - An extension for Jupyter Lab

This project was originally written by krishnan-r as a Google Summer of Code project for Jupyter Notebook. Check his website out here.

As a part of my internship as a Software Engineer at Yelp, I created this fork to update the extension to be compatible with JupyterLab - Yelp's choice for sharing and collaborating on notebooks.

About

SparkMonitor is an extension for Jupyter Lab that enables the live monitoring of Apache Spark Jobs spawned from a notebook. The extension provides several features to monitor and debug a Spark job from within the notebook interface itself.

Screenshot: live job display rendered below a notebook cell.

Requirements

  • At least JupyterLab 3
  • pyspark 3.X.X or newer (For compatibility with older pyspark versions, use jupyterlab-sparkmonitor 3.X)

Features

  • Automatically displays a live monitoring tool below cells that run Spark jobs in a Jupyter notebook
  • A table of jobs and stages with progress bars
  • A timeline that shows jobs, stages, and tasks
  • A graph showing the number of active tasks and executor cores over time
  • A notebook server extension that proxies the Spark UI and displays it in an iframe popup for more details
  • For a detailed list of features see the use case notebooks
  • Support for multiple SparkSessions (default port is 4040)
  • How it Works

Quick Start

To do a quick test of the extension:

This docker image has pyspark and several other related packages installed alongside the sparkmonitor extension.

docker run -it -p 8888:8888 itsjafer/sparkmonitor

Setting up the extension

pip install jupyterlab-sparkmonitor # install the extension

# set up ipython profile and add our kernel extension to it
ipython profile create --ipython-dir=.ipython
echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >>  .ipython/profile_default/ipython_config.py

# run jupyter lab
IPYTHONDIR=.ipython jupyter lab --watch

With the extension installed, a SparkConf object called conf will be usable from your notebooks. You can use it as follows:

from pyspark import SparkContext

# start the spark context using the SparkConf the extension inserted
sc = SparkContext.getOrCreate(conf=conf)

# Monitor should spawn under the cell with 4 jobs
sc.parallelize(range(0,100)).count()
sc.parallelize(range(0,100)).count()
sc.parallelize(range(0,100)).count()
sc.parallelize(range(0,100)).count()

If you already have your own Spark configuration, you will need to set spark.extraListeners to sparkmonitor.listener.JupyterSparkMonitorListener and spark.driver.extraClassPath to the listener.jar shipped inside the sparkmonitor Python package (path/to/package/sparkmonitor/listener.jar):

from pyspark.sql import SparkSession
spark = SparkSession.builder\
        .config('spark.extraListeners', 'sparkmonitor.listener.JupyterSparkMonitorListener')\
        .config('spark.driver.extraClassPath', 'venv/lib/python3.7/site-packages/sparkmonitor/listener.jar')\
        .getOrCreate()

# Should spawn a monitor with 4 jobs below the cell
spark.sparkContext.parallelize(range(0,100)).count()
spark.sparkContext.parallelize(range(0,100)).count()
spark.sparkContext.parallelize(range(0,100)).count()
spark.sparkContext.parallelize(range(0,100)).count()
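
If you'd rather not hard-code the site-packages path, the jar location can be derived from the installed package. This is a minimal sketch, assuming sparkmonitor is importable in the driver's Python environment and that listener.jar ships inside the installed package (as the path above suggests):

import os
import sparkmonitor
from pyspark.sql import SparkSession

# Locate listener.jar inside the installed sparkmonitor package instead of
# hard-coding a site-packages path
jar_path = os.path.join(os.path.dirname(sparkmonitor.__file__), 'listener.jar')

spark = SparkSession.builder\
        .config('spark.extraListeners', 'sparkmonitor.listener.JupyterSparkMonitorListener')\
        .config('spark.driver.extraClassPath', jar_path)\
        .getOrCreate()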

Changelog

  • 1.0 - Initial Release
  • 2.0 - Migration to JupyterLab 2, Multiple Spark Sessions, and displaying monitors beneath the correct cell more accurately
  • 3.0 - Migrate to JupyterLab 3 as prebuilt extension
  • 4.0 - pyspark 3.X compatibility; no longer compatible with PySpark 2.X or earlier

Development

If you'd like to develop the extension:

make all # Clean the directory, build the extension, and run it locally

jupyterlab-sparkmonitor's People

Contributors

abdealiloko, ben-epstein, dependabot[bot], dolfinus, dougtrajano, itsjafer, krishnan-r, lydian, prantaaho


jupyterlab-sparkmonitor's Issues

jupyterlab_sparkmonitor is not a valid npm package error

While trying to load the JupyterLab extension manager I'm getting the error below (*).
Do you have any clue?

(*)

SPARKMONITOR_SERVER: Loading Server Extension
[E 08:43:41.398 NotebookApp] Uncaught exception GET /lab/api/extensions?1589273014548 (127.0.0.1)
    HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/lab/api/extensions?1589273014548', version='HTTP/1.1', remote_ip='127.0.0.1')
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/site-packages/tornado/web.py", line 1703, in _execute
        result = await result
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
        yielded = self.gen.throw(*exc_info)  # type: ignore
      File "/opt/conda/lib/python3.7/site-packages/jupyterlab/handlers/extension_manager_handler.py", line 222, in get
        extensions = yield self.manager.list_extensions()
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
        value = future.result()
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
        yielded = self.gen.throw(*exc_info)  # type: ignore
      File "/opt/conda/lib/python3.7/site-packages/jupyterlab/handlers/extension_manager_handler.py", line 222, in get
        extensions = yield self.manager.list_extensions()
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
        value = future.result()
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
        yielded = self.gen.throw(*exc_info)  # type: ignore
      File "/opt/conda/lib/python3.7/site-packages/jupyterlab/handlers/extension_manager_handler.py", line 92, in list_extensions
        pkg_info = yield self._get_pkg_info(name, data)
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
        value = future.result()
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
        yielded = self.gen.throw(*exc_info)  # type: ignore
      File "/opt/conda/lib/python3.7/site-packages/jupyterlab/handlers/extension_manager_handler.py", line 92, in list_extensions
        pkg_info = yield self._get_pkg_info(name, data)
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
        value = future.result()
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
        yielded = self.gen.throw(*exc_info)  # type: ignore
      File "/opt/conda/lib/python3.7/site-packages/jupyterlab/handlers/extension_manager_handler.py", line 161, in _get_pkg_info
        outdated = yield self._get_outdated()
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
        value = future.result()
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
        yielded = self.gen.throw(*exc_info)  # type: ignore
      File "/opt/conda/lib/python3.7/site-packages/jupyterlab/handlers/extension_manager_handler.py", line 161, in _get_pkg_info
        outdated = yield self._get_outdated()
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
        value = future.result()
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
        yielded = self.gen.throw(*exc_info)  # type: ignore
      File "/opt/conda/lib/python3.7/site-packages/jupyterlab/handlers/extension_manager_handler.py", line 195, in _load_outdated
        app_options=self.app_options
      File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
        value = future.result()
      File "/opt/conda/lib/python3.7/concurrent/futures/_base.py", line 428, in result
        return self.__get_result()
      File "/opt/conda/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
        raise self._exception
      File "/opt/conda/lib/python3.7/concurrent/futures/thread.py", line 57, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/opt/conda/lib/python3.7/site-packages/jupyterlab/commands.py", line 550, in get_latest_compatible_package_versions
        return handler.latest_compatible_package_versions(names)
      File "/opt/conda/lib/python3.7/site-packages/jupyterlab/commands.py", line 1675, in latest_compatible_package_versions
        raise ValueError(msg % keys)
    ValueError: "['@jupyter-widgets/[email protected]', '[email protected]']" is not a valid npm package

Error creating PySpark Job

I'm trying to start a PySpark job on an AWS EMR instance with all other configuration parameters set to default. I've also installed the jupyterlab-sparkmonitor extension.
Note: The spark cluster setup works fine otherwise

from pyspark import SparkConf
from pyspark.sql import SparkSession

config = {'spark.extraListeners': 'sparkmonitor.listener.JupyterSparkMonitorListener',
          'spark.driver.extraClassPath': 'some/path/miniconda3/envs/py_env/lib/python3.7/site-packages/sparkmonitor/listener.jar'}
config = SparkConf().setAll([(param, value) for param, value in config.items()])

spark_session = SparkSession \
    .builder.master("yarn") \
    .config(conf=config) \
    .appName(appName) \
    .getOrCreate()

This is the error that results:

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Exception when registering SparkListener
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2398)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: sparkmonitor.listener.JupyterSparkMonitorListener
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
	at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2682)
	at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2680)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
	at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2680)
	at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2387)
	at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2386)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2386)

Should I be referencing the Listener class differently? I'm not entirely sure how to work towards a fix here. Could you please help me investigate this?
Thanks in advance!

Error when creating spark session

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/Users/skothari44/dev/jupyter-test/.venv/lib/python3.7/site-packages/sparkmonitor/kernelextension.py", line 125, in run
    self.onrecv(msg)
  File "/Users/skothari44/dev/jupyter-test/.venv/lib/python3.7/site-packages/sparkmonitor/kernelextension.py", line 144, in onrecv
    'msg': msg
  File "/Users/skothari44/dev/jupyter-test/.venv/lib/python3.7/site-packages/sparkmonitor/kernelextension.py", line 226, in sendToFrontEnd
    monitor.send(msg)
  File "/Users/skothari44/dev/jupyter-test/.venv/lib/python3.7/site-packages/sparkmonitor/kernelextension.py", line 56, in send
    self.comm.send(msg)
AttributeError: 'ScalaMonitor' object has no attribute 'comm'

pyspark 3.0.0 support?

I noticed this in setup.py:

  install_requires=[
          'bs4',
          'tornado',
          'pyspark<3.0.0',
          'jupyterlab>=2.0.0'
      ],

Is there any reason why you've pinned to < 3.0.0 for pyspark? I'm in a 3.0.0 env.

Great tool ... hope I get to use it! Thank you! :-)

'conf' is not defined for version 4.1.0

I am getting the following error for version 4.1.0 (jupyterlab 3.1.9, pyspark 3.1.2):

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_139706/319767949.py in <module>
      2 
      3 # start the spark context using the SparkConf the extension inserted
----> 4 sc=SparkContext.getOrCreate(conf=conf) #Start the spark context

NameError: name 'conf' is not defined

However, this works fine in the following scenarios:

  • in your docker container (jupyterlab 3.0.16, pyspark 2.4.5, jupyterlab-sparkmonitor 3.0.1)
  • in my own conda environment (jupyterlab 3.1.9, pyspark 2.4.8, jupyterlab-sparkmonitor 3.1.0)

I wonder if pyspark 3.1 is not supported yet?
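
Not an official fix, but a quick diagnostic sketch for this situation: conf is injected by the kernel extension, so the first thing to check is whether sparkmonitor.kernelextension actually loaded in the running kernel. The calls below are plain IPython APIs; whether a manual load restores conf depends on the extension's load hook.

from IPython import get_ipython

ip = get_ipython()

# The kernel extension is the piece that injects `conf` into the notebook
# namespace; check whether it is among the loaded IPython extensions
print('sparkmonitor.kernelextension' in ip.extension_manager.loaded)

# If it is missing, try loading it by hand and re-check; if `conf` is still
# undefined afterwards, the failure is inside the extension itself
ip.extension_manager.load_extension('sparkmonitor.kernelextension')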

Exception when registering SparkListener

Hi,

I'm giving this interesting extension a try, but I'm failing to initialize the SparkContext ( * ). I'm wondering if you have any idea how to debug this ( ** ). Do you have any idea why it still doesn't find the proper class? I have just followed the installation instructions here ( *** ).

Thanks in advance.

(*)
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .config('spark.extraListeners', 'sparkmonitor.listener.JupyterSparkMonitorListener') \
    .config('spark.driver.extraClassPath', '/usr/local/spark/jars/listener.jar') \
    .getOrCreate()

(**)
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Exception when registering SparkListener
at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2398)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: sparkmonitor.listener.JupyterSparkMonitorListener
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2682)
at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2680)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2680)
at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2387)
at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2386)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2386)
... 13 more

(***)
https://github.com/itsjafer/jupyterlab-sparkmonitor#setting-up-the-extension

JupyterHub on k8s Integration

Has anyone had success installing this extension on JupyterHub singleuser lab notebooks?

Using:
jupyterlab-sparkmonitor version 4.1.0
pyspark==3.1.1
jupyterlab==3.1.9

and doing a copy of the ipython directory in this repo to /home/.ipython and setting IPYTHONDIR=/home/.ipython

I'm attempting to install via pip and have managed to get the extension to show as installed. The 'conf' variable exists and is usable, but when I start a Spark context, no UI elements display. It does, however, write out to a sparkmonitor_kernelextension.log.

I'm wondering if maybe there's some necessary configuration I need to provide that I haven't.
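
Not a fix, but a small diagnostic sketch that may help narrow this down: confirm that the injected conf actually carries the monitor settings and that they survive into the running SparkContext. If either value prints None, the listener was never registered, so no events reach the frontend and no UI can appear. (sc and conf here are the names from the README quick start; the check itself is plain pyspark.)

from pyspark import SparkContext

sc = SparkContext.getOrCreate(conf=conf)  # conf injected by the extension
running = sc.getConf()

# Both settings should be present in the live context; None means the
# listener configuration was lost somewhere along the way
print(running.get('spark.extraListeners'))
print(running.get('spark.driver.extraClassPath'))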

Can this extension be integrated with user kernels in JupyterHub?

Hi,

The app works fine with the plain python3 kernel but doesn't work with user kernels (pyspark) spawned by JupyterHub. Is the app only for the plain python3 kernel, or can it be integrated with user kernels in JupyterHub?
Any help on this will be greatly appreciated.

Monitoring appears under wrong cell

When I use the shortcut Shift + Enter to run the current cell (C0) and select the one below (C1), the monitoring appears under the currently selected cell (C1), but I expect to see it under C0.

Fails when trying to install in jupyterlab 2.x

Are there any plans to update this extension to work with JupyterLab 2.x?

Currently the installation fails with a version conflict.

ValueError: The extension "jupyterlab_sparkmonitor" does not yet support the current version of JupyterLab.

Monitor does not work with Spark 3.2.0

Hi,

When using the monitor with Spark 3.2.0, it fails to load/start properly.

Here is the error message:
21/11/15 11:55:45 ERROR Utils: uncaught error in thread spark-listener-group-shared, stopping SparkContext
java.lang.NoSuchMethodError: 'org.json4s.JsonDSL$JsonAssoc org.json4s.JsonDSL$.pair2Assoc(scala.Tuple2, scala.Function1)'
at sparkmonitor.listener.JupyterSparkMonitorListener.onApplicationStart(CustomListener.scala:119)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:55)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1404)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
21/11/15 11:55:45 ERROR Utils: throw uncaught fatal error in thread spark-listener-group-shared
java.lang.NoSuchMethodError: 'org.json4s.JsonDSL$JsonAssoc org.json4s.JsonDSL$.pair2Assoc(scala.Tuple2, scala.Function1)'
at sparkmonitor.listener.JupyterSparkMonitorListener.onApplicationStart(CustomListener.scala:119)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:55)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1404)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
Exception in thread "spark-listener-group-shared" java.lang.NoSuchMethodError: 'org.json4s.JsonDSL$JsonAssoc org.json4s.JsonDSL$.pair2Assoc(scala.Tuple2, scala.Function1)'
at sparkmonitor.listener.JupyterSparkMonitorListener.onApplicationStart(CustomListener.scala:119)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:55)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1404)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
21/11/15 11:55:45 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend
at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:1142)
at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:231)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:646)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)

Exception sending socket message while running jobs

The plugin seems to behave as expected when getting the port from the environment ( * ). But when using the socket ( ** ), it gets a Broken pipe (Write failed) error, e.g. ( *** ).

Bear with me :) Could you help me investigate this?

Thanks in advance.

( * )

SPARKMONITOR_LISTENER: Started SparkListener for Jupyter Notebook
SPARKMONITOR_LISTENER: Port obtained from environment: 37836
SPARKMONITOR_LISTENER: Application Started: KubernetesSpark ...Start Time: 1589273651632

( ** )

case exception: Throwable => println("\nSPARKMONITOR_LISTENER: Exception sending socket message:" + exception + "\n")

( *** )

SPARKMONITOR_LISTENER: --------------Sending Message:------------------
{
  "msgtype" : "sparkJobStart",
  "jobGroup" : "null",
  "jobId" : 0,
  "status" : "RUNNING",
  "submissionTime" : 1589273663329,
  "stageIds" : [ 0 ],
  "stageInfos" : {
    "0" : {
      "attemptId" : 0,
      "name" : "count at <ipython-input-1-a5fef8d63630>:1",
      "numTasks" : 2,
      "completionTime" : -1,
      "submissionTime" : -1
    }
  },
  "numTasks" : 2,
  "totalCores" : 1,
  "appId" : "KubernetesSpark",
  "numExecutors" : 1,
  "name" : "count at <ipython-input-1-a5fef8d63630>:1"
}
SPARKMONITOR_LISTENER: -------------------------------------------------


SPARKMONITOR_LISTENER: Exception sending socket message:java.net.SocketException: Broken pipe (Write failed) 

Not working with Jupyter docker stacks

I'm trying to build a custom image using the jupyter docker stacks.

# Using jupyter/pyspark-notebook:spark-3.1.2
FROM jupyter/pyspark-notebook@sha256:22908e014eacdbb86d4cda87d4c215d0b2354d88f29a5fbc510d7c642da10851

USER root

RUN pip install jupyterlab-sparkmonitor

RUN mkdir /home/ipython \
    && ipython profile create --ipython-dir /home/ipython \
    && chown -R jovyan:users /home/ipython \
    && echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >> /home/ipython/profile_default/ipython_kernel_config.py

# RUN jupyter nbextension install sparkmonitor --py
RUN jupyter serverextension enable --py --system sparkmonitor 

ENV IPYTHONDIR /home/ipython/ipython

USER ${NB_UID}

When I run it, I get the following error:

WARN: Jupyter Notebook deprecation notice https://github.com/jupyter/docker-stacks#jupyter-notebook-deprecation-notice.
/usr/local/bin/start-notebook.sh: running hooks in /usr/local/bin/before-notebook.d
/usr/local/bin/start-notebook.sh: running /usr/local/bin/before-notebook.d/spark-config.sh
/usr/local/bin/start-notebook.sh: done running hooks in /usr/local/bin/before-notebook.d
Executing the command: jupyter notebook
[I 17:39:36.585 NotebookApp] Writing notebook server cookie secret to /home/jovyan/.local/share/jupyter/runtime/notebook_cookie_secret
[W 2021-11-19 17:39:37.661 LabApp] 'ip' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2021-11-19 17:39:37.661 LabApp] 'port' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2021-11-19 17:39:37.661 LabApp] 'port' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2021-11-19 17:39:37.661 LabApp] 'port' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[I 2021-11-19 17:39:37.675 LabApp] JupyterLab extension loaded from /opt/conda/lib/python3.9/site-packages/jupyterlab
[I 2021-11-19 17:39:37.675 LabApp] JupyterLab application directory is /opt/conda/share/jupyter/lab
[I 17:39:37.800 NotebookApp] Serving notebooks from local directory: /home/jovyan
[I 17:39:37.800 NotebookApp] Jupyter Notebook 6.4.0 is running at:
[I 17:39:37.800 NotebookApp] http://0a683f496ccb:8888/?token=2feb3628fe68b399bf2dc2e69b01ccba3007f2e6cb45168b
[I 17:39:37.800 NotebookApp]  or http://127.0.0.1:8888/?token=2feb3628fe68b399bf2dc2e69b01ccba3007f2e6cb45168b
[I 17:39:37.800 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 17:39:37.805 NotebookApp] 
    
    To access the notebook, open this file in a browser:
        file:///home/jovyan/.local/share/jupyter/runtime/nbserver-8-open.html
    Or copy and paste one of these URLs:
        http://0a683f496ccb:8888/?token=2feb3628fe68b399bf2dc2e69b01ccba3007f2e6cb45168b
     or http://127.0.0.1:8888/?token=2feb3628fe68b399bf2dc2e69b01ccba3007f2e6cb45168b
[I 17:39:40.570 NotebookApp] 302 GET /?token=2feb3628fe68b399bf2dc2e69b01ccba3007f2e6cb45168b (172.17.0.1) 1.340000ms
[I 2021-11-19 17:39:48.215 LabApp] Build is up to date
[I 17:39:51.086 NotebookApp] Creating new notebook in 
[I 17:39:51.123 NotebookApp] Writing notebook-signing key to /home/jovyan/.local/share/jupyter/notebook_secret
[I 17:39:51.473 NotebookApp] Kernel started: 074d5f0e-e798-4106-bc1a-5f9251674ac5, name: python3
[W 17:39:55.156 NotebookApp] Got events for closed stream None
[IPKernelApp] ERROR | No such comm target registered: SparkMonitor
[IPKernelApp] ERROR | No such comm target registered: SparkMonitor
[I 17:40:11.661 NotebookApp] Starting buffering for 074d5f0e-e798-4106-bc1a-5f9251674ac5:e9836e60-7c01-42dd-9c4d-c28798e5d9ba
[I 17:40:41.307 NotebookApp] Shutting down on /api/shutdown request.
[I 17:40:41.308 NotebookApp] Shutting down 1 kernel
[I 17:40:41.510 NotebookApp] Kernel shutdown: 074d5f0e-e798-4106-bc1a-5f9251674ac5
[I 17:40:41.512 NotebookApp] Shutting down 0 terminals
SPARKMONITOR_SERVER: Loading Server Extension
