Day 4 – Lab1:
Docker container for Kafka - Spark streaming - Cassandra
This Dockerfile sets up a complete streaming environment for experimenting with Kafka, Spark streaming (PySpark), and Cassandra. It installs
- Kafka 0.10.2.1
- Spark 2.1.1 for Scala 2.11
- Cassandra 3.7
It additionally installs
- Anaconda distribution 4.4.0 for Python 2.7.10
- Jupyter notebook for Python
Quick start-up guide
Run container using DockerHub image
docker run -p 4040:4040 -p 8888:8888 -p 23:22 -ti --privileged yannael/kafka-sparkstreaming-cassandra
Note that any changes you make in the notebook will be lost once you exit the
container. To keep your changes, put your notebooks in a folder on your
host that you share with the container, using for example
docker run -v `pwd`:/home/guest/host -p 4040:4040 -p 8888:8888 -p 23:22 -ti --privileged yannael/kafka-sparkstreaming-cassandra
Note:
- The "-v `pwd`:/home/guest/host" option shares the current local folder (i.e. the folder containing the
Dockerfile, ipynb files, etc.) on your computer (the 'host') with the container, where it appears as the '/home/guest/host' folder.
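As a minimal sketch of the folder sharing described above, the following script assembles the same docker run command for a chosen host folder and prints it for review instead of running it. The HOST_DIR, IMAGE, and CMD variable names are assumptions introduced for illustration; they are not part of the original instructions, which simply use `pwd`.

```shell
#!/bin/sh
# Hypothetical helper: build the docker run command that shares a host folder.
# HOST_DIR is an assumed variable; it defaults to the current directory,
# which matches the `pwd` used in the command above.
HOST_DIR="${HOST_DIR:-$(pwd)}"
IMAGE="yannael/kafka-sparkstreaming-cassandra"

# The host path is quoted inside the command so folders with spaces still work.
CMD="docker run -v \"${HOST_DIR}\":/home/guest/host -p 4040:4040 -p 8888:8888 -p 23:22 -ti --privileged ${IMAGE}"

# Print the command rather than executing it, so it can be checked first.
echo "$CMD"
```

Copy the printed line into your terminal to start the container. Notebooks saved under /home/guest/host inside the container are then written to the chosen host folder and survive after the container exits.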