PySpark lecture

Interactive Spark lecture using RISE and PySpark.

Prerequisites

  • Anaconda
  • (Optional) Spark 2+, with its install location set in the SPARK_HOME environment variable, for Spark Streaming.

The following guide has Windows users in mind.

Install development environment

Python

We provide you with an environment.yml which Anaconda can use to create a Python environment named pyspark-interactive-lecture.

conda create -n pyspark-interactive-lecture python=3.7 pip-tools
conda activate pyspark-interactive-lecture
pip install -r requirements.txt

After installation, the invoke command should be available in a GNU terminal, and invoke -l lists all tasks defined for the project.

To see the documentation for a task, run invoke <task> -h.

Node

If you wish to use decktape to export your Reveal.js slide deck, you need to prepare your environment first. Node.js and npm are required. Then install the dependencies:

npm install

(Optional) Spark

This section is needed if you plan to use native Spark Streaming.

Download Spark with invoke downloadSpark. This downloads the archive into bin/ and then uncompresses it into bin/spark.

In conf/log4j.properties, set log4j.rootCategory=WARN, console.

In conf/spark-defaults.conf, set

spark.sql.shuffle.partitions   4
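
The two settings above can also be applied from a script rather than by hand. The following is a minimal sketch, not part of this project: set_property is a hypothetical helper that replaces an existing key or appends it, and it assumes simple one-entry-per-line files with # comments.

```python
import re


def set_property(text, key, value, sep="="):
    """Return `text` with `key` set to `value`.

    Hypothetical helper: replaces the first existing entry for `key`
    (dropping later duplicates) or appends a new entry at the end.
    """
    new_line = f"{key}{sep}{value}"
    out = []
    replaced = False
    for line in text.splitlines():
        stripped = line.strip()
        # Comments and blank lines are copied through untouched.
        if stripped and not stripped.startswith("#"):
            # The key is everything before the first '=' or whitespace.
            existing = re.split(r"[=\s]", stripped, maxsplit=1)[0]
            if existing == key:
                if not replaced:
                    out.append(new_line)
                    replaced = True
                continue
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


# log4j.properties uses key=value entries.
log4j_text = set_property(
    "# Logging\nlog4j.rootCategory=INFO, console\n",
    "log4j.rootCategory",
    "WARN, console",
)

# spark-defaults.conf separates key and value with whitespace.
defaults_text = set_property("", "spark.sql.shuffle.partitions", "4", sep="   ")
```

To use it on the real files, read conf/log4j.properties or conf/spark-defaults.conf, pass the content through set_property, and write the result back.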

Invoke task list

$ invoke -l
Available tasks:

  clean                  Clean irrelevant directories
  decktape               Specialized export of RISE notebook to a PDF file under the build/ directory
  downloadSpark          Download Spark archive in bin/
  launchSparkStreaming   Launch Spark Streaming structured_network_wordcount.py example
  nbconvert              Convert your lecture notebook to a HTML file, stored in the static/ directory.
  notebook               Launch jupyter notebook to edit notebook files. Ideal for modifying pyspark.ipynb

Run notebook for editing

Run a Jupyter Notebook session: invoke notebook.

If you need to pass a string of arguments: invoke notebook -a "--port=9000"

The invoke command automatically sets bin/spark as the SPARK_HOME environment variable, so you need to have downloaded Spark into bin/spark beforehand, as described in the previous section. If you wish to use another location, pass the --spark_home flag: invoke notebook -s path/to/spark.
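
As an illustration of what such a task does, here is a minimal sketch using only the standard library. It is not the project's actual task code: notebook_invocation is a hypothetical name, and the assumption is that the task simply launches jupyter notebook with SPARK_HOME added to the environment.

```python
import os
import shlex


def notebook_invocation(args="", spark_home="bin/spark", base_env=None):
    """Build the command and environment for a notebook session.

    Hypothetical helper: `args` mirrors the -a option and `spark_home`
    mirrors the -s option described above.
    """
    # Start from the current environment unless one is supplied.
    env = dict(os.environ if base_env is None else base_env)
    env["SPARK_HOME"] = spark_home
    # shlex.split uses POSIX-style quoting rules for the extra arguments.
    command = ["jupyter", "notebook", *shlex.split(args)]
    return command, env


command, env = notebook_invocation("--port=9000", base_env={})
```

The pair can then be handed to subprocess.run(command, env=env, check=True) to start the session.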

Export slidedeck

nbconvert

We use a personalized nbconvert template pyspark-interactive-lecture.tpl to generate a correct Reveal.js HTML file from the notebook file.

$ invoke nbconvert -h
Usage: inv[oke] [--core-opts] nbconvert [--options] [other tasks here ...]

Docstring:
  Convert your lecture notebook to a HTML file, stored in the src/ directory. With -s/--serve argument, the HTML file is served by a local server as a Reveal.js slideshow.

Options:
  -a STRING, --transition=STRING
  -f STRING, --font-awesome-url=STRING
  -r STRING, --reveal-url-prefix=STRING
  -s, --serve
  -t STRING, --theme=STRING

invoke nbconvert converts the notebook to an HTML file inside the static/ directory. You can then view it by double-clicking the file, or by serving the directory with python -m http.server.

Run invoke nbconvert --serve to serve the HTML file from a local server as a Reveal.js slideshow.
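
If you prefer to serve the exported file from Python code instead of the python -m http.server command line, the standard library offers the same server. This is a minimal sketch under assumptions: make_slide_server is a hypothetical name, and the static directory and port 8000 are defaults chosen here for illustration.

```python
import functools
import http.server


def make_slide_server(directory="static", port=8000):
    """Create a local HTTP server that serves files from `directory`.

    Hypothetical helper; the server listens on 127.0.0.1 only.
    """
    # Bind the handler to the directory instead of the working directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling make_slide_server().serve_forever() then serves the slides at http://127.0.0.1:8000/ until interrupted with Ctrl+C.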

decktape

To run decktape against a Jupyter notebook launched in the background, use: invoke decktape
