Vagrant-PySpark

Vagrant-PySpark is a Vagrant box that can be provisioned with any Spark version. Once provisioned, it is ready to run Spark jobs (including PySpark) and PySpark unit tests.

It is intended to be used only for development and testing with small data sets.

Set up

To start and provision the Vagrant box, you must create a file (ansible/variables.yml) that defines the required variables:

  • Scala version
  • Spark version
  • Hadoop version

Versions must match the ones provided here:

The variables file should contain the following variables:

scala:
  version: 2.11.8
spark:
  version: 2.1.0
hadoop:
  version: 2.7
  

You can find examples for Spark 1.6.3 and 2.1.0 in this repo:

You can create a symbolic link to use them:

ln -s vars/vars_spark_2.1.0.yml ansible/variables.yml

If you use other versions, PRs with your version setup are welcome.
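
If you switch versions often, the variables file can also be generated rather than written by hand. The following is a minimal sketch, not part of this repo: the names `render_variables` and `write_variables` are hypothetical helpers, and the template assumes only the three variables listed above.

```python
# Hypothetical helper (not part of this repo): renders a variables file
# with the same three keys shown in the example above.
VARIABLES_TEMPLATE = """\
scala:
  version: {scala}
spark:
  version: {spark}
hadoop:
  version: {hadoop}
"""


def render_variables(scala, spark, hadoop):
    """Return the text of a variables file for the given versions."""
    return VARIABLES_TEMPLATE.format(scala=scala, spark=spark, hadoop=hadoop)


def write_variables(path, scala, spark, hadoop):
    """Write the rendered variables file to `path`, overwriting it."""
    with open(path, "w") as handle:
        handle.write(render_variables(scala, spark, hadoop))
```

For example, `write_variables("ansible/variables.yml", "2.11.8", "2.1.0", "2.7")` would produce the file shown above. Passing the versions as strings keeps values such as 2.7 exactly as typed.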

Required

How to use

Clone your projects

Set up the Vagrant box and clone your projects inside it to run your jobs and tests.

Sync your projects folder

You can fork this repo and extend the Vagrantfile to sync your projects folder into the Vagrant box. This makes all your changes immediately available to run in the Vagrant box.

config.vm.synced_folder "/Project/path/in/host/machine", "/Destination/in/vagrant/box"

Copy this project

You can copy this project inside your Spark project and keep everything together.

PySpark Unit Testing

You can find a good explanation and examples here
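
As a rough illustration of what a PySpark unit test run inside the box might look like, here is a minimal sketch built on Python's `unittest` and a local `SparkContext`. The names `split_words`, `count_words` and `CountWordsTest`, the `local[2]` master and the word-count job itself are illustrative assumptions, not code from this repo. The test class is skipped when `pyspark` cannot be imported.

```python
import importlib.util
import unittest

# True only when the pyspark package can be imported in this environment.
HAS_PYSPARK = importlib.util.find_spec("pyspark") is not None


def split_words(line):
    """Lower-case a line of text and split it on whitespace."""
    return line.lower().split()


def count_words(rdd):
    """Return an RDD of (word, count) pairs from an RDD of text lines."""
    return (rdd.flatMap(split_words)
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))


@unittest.skipUnless(HAS_PYSPARK, "pyspark is not installed")
class CountWordsTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Imported here so the module still loads without pyspark.
        from pyspark import SparkConf, SparkContext
        conf = SparkConf().setMaster("local[2]").setAppName("unit-test")
        cls.sc = SparkContext(conf=conf)

    @classmethod
    def tearDownClass(cls):
        # Stop the context so later test classes can create their own.
        cls.sc.stop()

    def test_count_words(self):
        rdd = self.sc.parallelize(["Spark is fast", "spark is fun"])
        result = dict(count_words(rdd).collect())
        self.assertEqual(
            result, {"spark": 2, "is": 2, "fast": 1, "fun": 1})
```

Saved as a module, this can be run with `python -m unittest`, provided PySpark is importable by the Python interpreter in the box. Creating one `SparkContext` per test class rather than per test keeps start-up cost down, which matters on a small development VM.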
