spark-standalone-cluster

About the repo

This repo contains only the skeleton for running a Spark standalone cluster, extracted from this repo.

Running the code (Spark standalone cluster)

You can start the Spark standalone cluster with:

make run

or start it with 3 workers using:

make run-scaled

You can submit Python jobs with the command:

make submit app=dir/relative/to/spark_apps/dir

For example, if ex6.py is in your spark_apps folder:

make submit app=ex6.py
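The submit target is defined in the Makefile, which is not reproduced here. The following is a minimal sketch of the kind of command such a target might run. It assumes a master container named spark-master, a master URL of spark://spark-master:7077 (7077 is Spark's default master port), and spark_apps mounted at /opt/spark/apps inside the container. All three values are hypothetical and should be checked against the actual Makefile and Compose file. The docker exec command and the spark-submit flags --master and --deploy-mode are standard.

```python
import shlex

# Hypothetical values: check them against the repo's Makefile and Compose file.
SPARK_MASTER_CONTAINER = "spark-master"
SPARK_MASTER_URL = "spark://spark-master:7077"  # 7077 is Spark's default master port
APPS_DIR_IN_CONTAINER = "/opt/spark/apps"  # assumed mount point of spark_apps


def build_submit_command(app: str) -> list[str]:
    """Build a `docker exec ... spark-submit` command for a path relative to spark_apps."""
    if app.startswith("/") or ".." in app.split("/"):
        raise ValueError("app must be a path relative to the spark_apps folder")
    return [
        "docker", "exec", SPARK_MASTER_CONTAINER,
        "spark-submit",
        "--master", SPARK_MASTER_URL,
        # Python applications on a standalone cluster run in client mode.
        "--deploy-mode", "client",
        f"{APPS_DIR_IN_CONTAINER}/{app}",
    ]


if __name__ == "__main__":
    # Print the command rather than running it; pass the list to subprocess.run to execute it.
    print(shlex.join(build_submit_command("ex6.py")))
```

The command is built as a list rather than a single string, so the app path is passed through unchanged if you later run it with subprocess.run.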

The Makefile contains several commands for building the standalone cluster; check it to see them all. The simplest one is:

make build

Web UIs

The master node's web UI is available at localhost:9090, and the Spark history server at localhost:18080.

Fixing the links on the Spark master UI

Because the Spark cluster runs on Docker, the worker-related links on the master UI do not work. To fix this, I created a generate-docker-compose script that generates a Docker Compose file (docker-compose.generated.yml) with the desired number of workers, where each worker is assigned its own exposed port number.

To bring up this cluster, you can just run:

make run-generated

By default, the command launches a Spark cluster with a master, a history server, and 3 worker nodes.
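The repo's generate-docker-compose script is not shown here. The following is a minimal sketch of the same idea: generate one worker service per requested worker, each with its own web UI port exposed on the host. The image name, service names, and the choice to number worker UI ports from 8081 (Spark's default worker UI port) are assumptions, not the script's actual values. SPARK_WORKER_WEBUI_PORT is a standard Spark environment variable. Mapping 9090 to the master's default UI port 8080 mirrors the localhost:9090 address mentioned above.

```python
import json

# Hypothetical values, not taken from the repo's generate-docker-compose script.
IMAGE = "spark-image"
MASTER_UI_PORT = 9090         # host port for the master UI (container port 8080)
HISTORY_UI_PORT = 18080       # host port for the history server UI
FIRST_WORKER_UI_PORT = 8081   # Spark's default worker web UI port


def generate_compose(num_workers: int = 3) -> dict:
    """Return a Compose definition in which every worker exposes a unique UI port."""
    if num_workers < 1:
        raise ValueError("need at least one worker")
    services = {
        "spark-master": {
            "image": IMAGE,
            "ports": [f"{MASTER_UI_PORT}:8080", "7077:7077"],
        },
        "spark-history-server": {
            "image": IMAGE,
            "ports": [f"{HISTORY_UI_PORT}:18080"],
            "depends_on": ["spark-master"],
        },
    }
    for i in range(1, num_workers + 1):
        port = FIRST_WORKER_UI_PORT + i - 1
        services[f"spark-worker-{i}"] = {
            "image": IMAGE,
            "depends_on": ["spark-master"],
            # Each worker serves its UI on a distinct port...
            "environment": {"SPARK_WORKER_WEBUI_PORT": str(port)},
            # ...published on the same port number on the host.
            "ports": [f"{port}:{port}"],
        }
    return {"services": services}


if __name__ == "__main__":
    # JSON is also valid YAML, so this output can be redirected into
    # docker-compose.generated.yml.
    print(json.dumps(generate_compose(3), indent=2))
```

Using the same port inside and outside the container means the port number shown in a worker link on the master UI also matches the published host port. Whether the host part of that link resolves depends on how the workers advertise themselves, which this sketch does not configure.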

Jupyter lab

Later, I added a JupyterLab service. For reference, see this GitHub repo.

JupyterLab runs on port 8888. A small example notebook shows how to get started.
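The example notebook itself is not reproduced here. The following is a minimal sketch of how a notebook might connect to the cluster. It assumes the JupyterLab container shares the Compose network with the master and can reach it under the hypothetical host name spark-master on Spark's default master port 7077. SparkSession.builder with master, appName, and getOrCreate is the standard PySpark API. pyspark is imported inside the function so the helper can be loaded without it.

```python
# Hypothetical host name; 7077 is Spark's default standalone master port.
DEFAULT_MASTER_HOST = "spark-master"
DEFAULT_MASTER_PORT = 7077


def master_url(host: str = DEFAULT_MASTER_HOST, port: int = DEFAULT_MASTER_PORT) -> str:
    """Return a standalone-cluster master URL of the form spark://host:port."""
    return f"spark://{host}:{port}"


def get_spark(app_name: str = "jupyter-example"):
    """Create (or reuse) a SparkSession connected to the standalone master."""
    # Imported lazily so this module loads even where pyspark is not installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(master_url())
        .appName(app_name)
        .getOrCreate()
    )
```

In a notebook cell, calling spark = get_spark() and then spark.range(10).count() should run a small job on the workers. That job then appears on the master UI at localhost:9090.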

Stories published on Medium

  1. Setting up a standalone Spark cluster can be found here.
  2. Setting up Hadoop Yarn to run Spark applications can be found here.
  3. Using hostnames to access Hadoop resources can be found here.

Contributors

mrn-aglic
