Ben-Epstein/sparkmonitor

Spark Monitor Fork - A fork of SparkMonitor that works with multiple Spark Sessions

About

SparkMonitor is an extension for Jupyter Notebook that enables the live monitoring of Apache Spark Jobs spawned from a notebook. The extension provides several features to monitor and debug a Spark job from within the notebook interface itself.

Features

  • Automatically displays a live monitoring tool below cells that run Spark jobs in a Jupyter notebook
  • A table of jobs and stages with progress bars
  • A timeline that shows jobs, stages, and tasks
  • A graph showing the number of active tasks and executor cores over time
  • A notebook server extension that proxies the Spark UI and displays it in an iframe popup for more details
  • For a detailed list of features, see the use case notebooks
  • How it Works
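
To make the "active tasks over time" graph concrete, here is a minimal sketch of how such a series could be derived from task start and end times. It is not taken from the extension's code: the function name `active_task_counts` and the `(start, end)` tuple format are assumptions made for illustration only.

```python
from typing import List, Tuple


def active_task_counts(tasks: List[Tuple[float, float]]) -> List[Tuple[float, int]]:
    """Return (time, active_count) points for tasks given as (start, end) pairs.

    Hypothetical helper for illustration; each task is treated as active on
    the half-open interval [start, end).
    """
    events = []
    for start, end in tasks:
        events.append((start, 1))   # one more task becomes active
        events.append((end, -1))    # one task finishes
    # Tuples sort by time first, then by delta, so at equal timestamps
    # a finish (-1) is applied before a start (+1).
    events.sort()

    points: List[Tuple[float, int]] = []
    active = 0
    for time, delta in events:
        active += delta
        if points and points[-1][0] == time:
            # Collapse several events at the same timestamp into one point.
            points[-1] = (time, active)
        else:
            points.append((time, active))
    return points


if __name__ == "__main__":
    # Three overlapping tasks; prints one point per distinct timestamp.
    print(active_task_counts([(0, 4), (1, 3), (3, 5)]))
```

Plotting the returned points as a step chart gives the general shape of the graph described above; executor cores could be tracked the same way from executor add and remove events.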

Build from source

npm version: 5.6.0
yarn version: 1.22.4
sbt version: 1.3.2

cd sparkmonitor/extension
# Build the JavaScript bundle
yarn install # Only needed the first time
yarn run webpack
# Build the SparkListener Scala jar
cd scalalistener/
sbt package

Run Locally

docker build -t sparkmonitor .
docker run -it -p 8888:8888 sparkmonitor

Deploy New Version

cd sparkmonitor/extension
vi VERSION # bump version number
python setup.py sdist
twine upload --repository-url https://upload.pypi.org/legacy/ dist/*

If the twine upload step fails, run rm -rf dist/*, bump the VERSION number again, and rerun the steps above.
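
The manual "bump version number" step can be scripted. The sketch below assumes the VERSION file holds a single MAJOR.MINOR.PATCH string such as 1.0.3; that format is an assumption, not something this README states. The helper names `bump_patch` and `bump_version_file` are hypothetical.

```python
from pathlib import Path


def bump_patch(version: str) -> str:
    """Return the version with its patch component incremented.

    Assumes a plain MAJOR.MINOR.PATCH string (an assumption about VERSION).
    """
    parts = version.strip().split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(p) for p in parts)
    return f"{major}.{minor}.{patch + 1}"


def bump_version_file(path: Path) -> str:
    """Rewrite the version file in place and return the new version."""
    new_version = bump_patch(path.read_text())
    path.write_text(new_version + "\n")
    return new_version


if __name__ == "__main__":
    # Run from sparkmonitor/extension, where the VERSION file lives.
    print(bump_version_file(Path("VERSION")))
```

Running this in place of the vi step, followed by python setup.py sdist and the twine upload, keeps the rest of the release flow unchanged.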

Quick Installation

pip install sparkmonitor-s
jupyter nbextension install sparkmonitor --py --user --symlink
jupyter nbextension enable sparkmonitor --py --user
jupyter serverextension enable --py --user sparkmonitor
ipython profile create && echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >> $(ipython profile locate default)/ipython_kernel_config.py

For more detailed instructions click here
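
The last installation command appends a line to ipython_kernel_config.py every time it runs, so repeating the install leaves duplicate entries. As a hedged alternative, the sketch below appends the same line only when it is not already present. The function name `enable_kernel_extension` is hypothetical; the caller supplies the config path, which the shell command obtains from ipython profile locate default.

```python
from pathlib import Path

# The exact line the shell command above appends to the kernel config.
EXTENSION_LINE = "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')"


def enable_kernel_extension(config_path: Path) -> bool:
    """Append EXTENSION_LINE to config_path unless it is already there.

    Returns True if the file was modified, False if the line already existed.
    Hypothetical helper, not part of the sparkmonitor package.
    """
    existing = config_path.read_text() if config_path.exists() else ""
    # Compare whole lines so a commented-out copy does not count as enabled.
    if any(line.strip() == EXTENSION_LINE for line in existing.splitlines()):
        return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with config_path.open("a") as config_file:
        config_file.write(prefix + EXTENSION_LINE + "\n")
    return True
```

Calling it with the path to the default profile's ipython_kernel_config.py (after ipython profile create) has the same effect as the echo command on the first run and does nothing on later runs.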

Integration with ROOT and SWAN

At CERN, the SparkMonitor extension has two main use cases:

  • Distributed analysis with ROOT and Apache Spark using the DistROOT module. Here is an example demonstrating this use case.
  • Integration with SWAN, a service for web-based analysis, via a modified container image for SWAN user sessions.
