- Jupyter Notebook 3.2
- Conda Python 2.7.x
- pyspark, pandas, matplotlib, scipy, seaborn, scikit-learn pre-installed
- Spark 1.6.0 for use in local mode
- Unprivileged user jovyan (uid=1000, configurable, see options) in group users (gid=100) with ownership over /home/jovyan and /opt/conda
The following command starts a container with the Notebook server listening for HTTP connections on port 8888 without authentication configured.
docker run -d -p 8888:8888 -p 4040:4040 noleto/pyspark-jupyter
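Because the container starts detached, the Notebook server may need a few seconds before it accepts connections. The following is a minimal sketch for checking from the host that it is reachable. It assumes Python 3 on the host and that port 8888 is published on localhost as in the command above; the helper names `notebook_url` and `wait_for_notebook` are hypothetical and not part of the image.

```python
import time
import urllib.error
import urllib.request


def notebook_url(host="localhost", port=8888):
    """Build the base URL of the Notebook server (host and port are assumptions)."""
    return "http://{}:{}/".format(host, port)


def wait_for_notebook(url, timeout=30.0, interval=1.0):
    """Poll url until it answers over HTTP or until timeout seconds have elapsed."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            urllib.request.urlopen(url, timeout=interval).close()
            return True
        except urllib.error.HTTPError:
            # Any HTTP status code still means the server is listening.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused, reset, or timed out: wait and retry.
            time.sleep(interval)
    return False
```

Calling `wait_for_notebook(notebook_url())` returns `True` once the server responds and `False` if nothing answers within the timeout.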
This configuration works well for running Spark on small, local datasets.
- Run the container as shown above.
- Open a Python 2 notebook.
- SparkContext is already configured for local mode (available as sc).
For example, the first few cells in a Python 2 notebook might read:
# do something to prove it works
rdd = sc.parallelize(xrange(1000))
rdd.takeSample(False, 5)
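A slightly longer sketch in the same spirit chains a few core RDD operations (`map`, `filter`, `count`, `sum`). It assumes `sc` is the preconfigured SparkContext described above; the helper names `square`, `is_even`, and `even_square_stats` are illustrative only and do not ship with the image.

```python
def square(x):
    """Return x squared."""
    return x * x


def is_even(x):
    """Return True when x is divisible by 2."""
    return x % 2 == 0


def even_square_stats(sc, n=1000):
    """Return (count, total) of the even squares of 0..n-1, computed with Spark."""
    rdd = sc.parallelize(range(n))
    # map and filter are lazy; count() and sum() each trigger a local Spark job.
    evens = rdd.map(square).filter(is_even)
    return evens.count(), evens.sum()

# In a notebook cell, pass the preconfigured context:
# even_square_stats(sc)
```

Passing `sc` as an argument, rather than reading it as a global, keeps the function usable with any SparkContext, not only the one this image provides.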
See the base image page, Minimal Jupyter Notebook Stack.