This is a simple setup for test-driving PySpark scripts. It is fairly portable; all you need to use it are:
- docker, version 17.06.1-ce or greater
- docker-compose, version 1.14.0 or greater
- PyCharm Ultimate edition 2017.2 or greater (Ultimate is needed for the Docker-based interpreter setup)
Run the tests and build the test/dependency Docker image:

```
docker-compose build test
```
As you make changes, re-run the tests via that image (the tests run as part of the build):

```
docker-compose build test
```
Or, run the tests inside the PyCharm Ultimate IDE:

- Configure PyCharm to talk to Docker
- Configure a remote interpreter for this project, using `pyspark-fun-test` as the Docker image that provides the interpreter
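Tests of this sort are easiest when the transformation logic lives in plain Python functions that the PySpark job wraps, since those can be unit-tested without a SparkSession. A minimal sketch of such a test (the function `split_words` is hypothetical, not part of this repo; in a real project it would be imported from the script module rather than defined inline):

```python
# test_wordcount.py -- pytest-style unit tests for the pure-Python logic
# a PySpark script might pass to rdd.flatMap or rdd.map.

def split_words(line):
    """Split a line into lowercase words, dropping empty tokens."""
    return [w for w in line.lower().split() if w]

def test_split_words_lowercases_and_splits():
    assert split_words("Hello Spark World") == ["hello", "spark", "world"]

def test_split_words_ignores_extra_whitespace():
    assert split_words("  a   b ") == ["a", "b"]
```

Keeping the logic Spark-free like this means most tests stay fast; only a handful of integration tests need to spin up a local SparkSession inside the container.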
You can then run the Jupyter notebook with the Python files exposed:

```
docker-compose up -d run
```
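For reference, a `docker-compose.yml` supporting the commands above might look roughly like this. This is a sketch, not the actual file: the service names `test` and `run` are taken from the commands shown, but the image name, Jupyter command, port, and mounted path are all assumptions.

```yaml
# Hypothetical sketch of the compose file implied by the commands above.
version: "3"
services:
  test:
    build: .                      # tests run as part of the image build
    image: pyspark-fun-test       # image PyCharm uses as a remote interpreter
  run:
    image: pyspark-fun-test
    command: jupyter notebook --ip=0.0.0.0 --allow-root
    ports:
      - "8888:8888"               # Jupyter's default port (assumed)
    volumes:
      - .:/app                    # expose the python files to the notebook
```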