DataprocSpawner

DataprocSpawner enables JupyterHub to spawn single-user Jupyter notebooks that run on Dataproc clusters. This provides users with ephemeral clusters for data science without the pain of managing them.

  • Product Documentation
  • DISCLAIMER: DataprocSpawner only supports zonal DNS names. If your project uses global DNS names, follow the migration instructions before using the spawner.

Quick Start

In order to use this library, you first need to go through the following steps:

  1. Select or create a Cloud Platform project.
  2. Enable billing for your project.
  3. Enable the Google Cloud Dataproc API.
  4. Set up authentication.

Installation

Supported Python Versions

Python >= 3.6

Linux

git clone https://github.com/GoogleCloudPlatform/dataprocspawner
cd dataprocspawner && pip install .

Configuration

  1. Generate jupyterhub_config.py for JupyterHub

    jupyterhub --generate-config
  2. Within jupyterhub_config.py, set the spawner and GCP project. The project must be set.

    c.JupyterHub.spawner_class = 'dataprocspawner.DataprocSpawner'
    c.DataprocSpawner.project = '{GCP project ID}'
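
Because the project must be set, one option is to read it from the environment and fail fast when it is missing. The following is a minimal sketch, not part of DataprocSpawner: the helper name `require_project_id` and the environment variable `GCP_PROJECT_ID` are assumptions made for illustration.

```python
import os


def require_project_id(env=None, var="GCP_PROJECT_ID"):
    """Return the GCP project ID from the environment, or raise if it is unset.

    `GCP_PROJECT_ID` is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    project = env.get(var, "").strip()
    if not project:
        raise RuntimeError(f"{var} is not set; DataprocSpawner needs a GCP project ID")
    return project


# Inside jupyterhub_config.py, where JupyterHub provides the `c` config object:
# c.DataprocSpawner.project = require_project_id()
```

With this in place, a missing project ID stops JupyterHub when the configuration is loaded instead of surfacing later when a user tries to spawn a notebook.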

Start JupyterHub

To start JupyterHub, run the command:

jupyterhub

Visit http://localhost:8000 in your browser (use https only if you have configured SSL), and sign in with your Unix credentials.

  • Running JupyterHub locally is likely to run into issues because of firewall rules. Use at your own risk.

Using Google Compute Engine (GCE)

To lessen the headache of setup, use the spawner by running JupyterHub within a Docker container that lives in a GCE VM.

Consider storing images on GCR for ease of deployment through GCE.

Starting a GCE VM

The VM must allow full access to all Cloud APIs for the spawner to work.

Google Cloud SDK

  1. Install gcloud, part of the Google Cloud SDK
  2. Create a VM on GCE
  • If using a Docker image hosted on GCR:

    gcloud beta compute instances create-with-container {VM name} --container-image={image URL} --container-arg="--DataprocSpawner.project={GCP project ID}" --scopes=cloud-platform --zone us-central1-a
    • By default, the spawned notebook listens on port 12345 for a connection from the hub. To set a custom port, include this in the gcloud command:

      --container-arg="--Spawner.port={port number}"
  • If manually building the Docker image:

    gcloud beta compute instances create {VM name} --image-family=cos-stable --image-project=cos-cloud --scopes=cloud-platform --zone us-central1-a
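
When the VM is created from a script, the create-with-container command above can be assembled as an argument list rather than a hand-edited string. This is a rough sketch that only mirrors the flags shown above; the function name `create_with_container_cmd` and its parameters are hypothetical.

```python
import shlex


def create_with_container_cmd(vm_name, image_url, project_id,
                              port=None, zone="us-central1-a"):
    """Build the gcloud argument list for a VM that runs the JupyterHub image."""
    cmd = [
        "gcloud", "beta", "compute", "instances", "create-with-container",
        vm_name,
        f"--container-image={image_url}",
        f"--container-arg=--DataprocSpawner.project={project_id}",
        "--scopes=cloud-platform",
        "--zone", zone,
    ]
    if port is not None:
        # Optional: override the port the spawned notebook listens on.
        cmd.append(f"--container-arg=--Spawner.port={port}")
    return cmd


# Print a copy-pasteable command line; the values here are placeholders.
print(shlex.join(create_with_container_cmd(
    "hub-vm", "gcr.io/example-project/jupyterhub", "example-project", port=12345)))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute it without shell quoting issues, assuming gcloud is installed and authenticated.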

Google Cloud Platform Console

  1. Visit the Google Cloud Platform Console for GCE

  2. Create an instance

    images/create.png
  • Check the box under 'Container'

    images/checkbox.png
  • Provide the URL to the container image and set the project

    images/config.png
    • A custom port can also be set by adding another command argument
  • Set the access scopes

    images/scope.png
  • Hit create!

Configuration

  1. SSH into the VM

    gcloud compute ssh {VM name}

Existing Docker Image

JupyterHub will be running once the VM has been created, and no additional commands are necessary. The following steps are only needed to change JupyterHub's configuration.

  • Find the name of the running Docker container

    docker ps
  • Run bash in the running container

    docker exec -it {container name} bash
  • Make changes as desired to jupyterhub_config.py (vim, cat, etc.) and exit the container

    • To install vim while inside the container:

      apt-get update
      apt-get install vim
  • Restart the container for changes in jupyterhub_config.py to take effect

    docker restart {container name}
  • Check JupyterHub's logs to ensure changes took effect

    docker logs -f {container name}

Manual Docker Image

  • Clone the DataprocSpawner repo, which includes a Dockerfile and jupyterhub_config.py
    git clone https://github.com/GoogleCloudPlatform/dataprocspawner
  • Add any additional configuration to either file, but do not change the existing contents.

  • If using a GCE instance running a container-optimized OS, allow connections from JupyterHub's REST API (defaults to port 8081)
    sudo iptables -w -A INPUT -p tcp --dport 8081 -j ACCEPT
  • Build a Docker image from the Dockerfile
    docker build -t jupyterhub .
  • Run a Docker container using the image
    docker run -it --net=host jupyterhub
    • The project can be passed as a container argument to Docker instead of setting it within jupyterhub_config.py.
      docker run -it --net=host jupyterhub --DataprocSpawner.project={GCP project ID}
    • If the Docker image will be used repeatedly, consider pushing the image to GCR.
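
The build and run steps above can also be scripted. The sketch below only assembles the same two docker commands shown in this section; `docker_commands` is a hypothetical helper, and the image tag defaults to the `jupyterhub` tag used above.

```python
def docker_commands(project_id=None, image="jupyterhub", context="."):
    """Return the docker build and docker run argument lists, in order."""
    build = ["docker", "build", "-t", image, context]
    run = ["docker", "run", "-it", "--net=host", image]
    if project_id:
        # Pass the project as a container argument instead of editing
        # jupyterhub_config.py.
        run.append(f"--DataprocSpawner.project={project_id}")
    return [build, run]


# To execute (assumes Docker is installed and the Dockerfile is in `context`):
# import subprocess
# for cmd in docker_commands(project_id="example-project"):
#     subprocess.run(cmd, check=True)
```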

Cloning the repo, building a Docker image, and pushing it to GCR can be done on a local machine. Follow the instructions for an existing Docker image from above to then use the pushed image on GCE.

Notes

  • DataprocSpawner defaults to port 12345. The port can be changed within jupyterhub_config.py. More info is available in JupyterHub's documentation.

    c.Spawner.port = {port number}
  • The default region for Dataproc clusters is us-central1, and the default zone is us-central1-a. Using global is currently unsupported. To change the region, pick a region and a zone within it from the list of available regions and zones, and include the following lines in jupyterhub_config.py:

    c.DataprocSpawner.region = '{region}'
    c.DataprocSpawner.zone = '{zone that is within the chosen region}'
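
Since the zone must belong to the chosen region, a small check can catch a mismatch before JupyterHub starts. This is a sketch that relies only on the Compute Engine naming convention, where a zone name is its region name followed by a hyphen and a single letter; `zone_in_region` is a hypothetical helper, not part of DataprocSpawner.

```python
def zone_in_region(zone: str, region: str) -> bool:
    """Return True if `zone` looks like `<region>-<letter>`, e.g. us-central1-a."""
    prefix, _, suffix = zone.rpartition("-")
    return prefix == region and len(suffix) == 1 and suffix.isalpha()


region = "us-central1"
zone = "us-central1-a"
if not zone_in_region(zone, region):
    raise ValueError(f"Zone {zone!r} is not within region {region!r}")
```

The check is purely syntactic: it does not confirm that the zone exists or that Dataproc is available there.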

Disclaimer

This is not an official Google product.
