Sparktan

Your First Project

You can create a new project using:

> sparktan quickstart my_project

Creating your my_project sparktan project...
create    my_project
create    my_project/.gitignore
create    my_project/requirements.txt
create    my_project/config.json
create    my_project/README.md
create    my_project/main.py

Done.

Try running your project with:

	cd my_project
	sparktan run

Project Structure

filename          description
config.json       Configuration of the EMR cluster
requirements.txt  Python packages to install in the virtual environment
main.py           Your awesome Spark script
wheels/           [optional] Your local wheel packages

Fetch info about an existing cluster

> sparktan list

2015-10-23 11:02:58,694 [INFO] botocore.credentials - Found credentials in environment variables.
2015-10-23 11:02:58,927 [INFO] botocore.vendored.requests.packages.urllib3.connectionpool - Starting new HTTPS connection (1): elasticmapreduce.us-east-1.amazonaws.com
{u'Id': u'j-27KUKY8IY12Y0',
 'MasterPrivateIpAddress': u'10.21.0.177',
 'MasterPublicDnsName': u'ec2-54-172-110-197.compute-1.amazonaws.com',
 u'Name': u'my_project',
 u'NormalizedInstanceHours': 19200,
 u'Status': {u'State': u'WAITING',
             u'StateChangeReason': {u'Message': u'Waiting for steps to run'},
             u'Timeline': {u'CreationDateTime': datetime.datetime(2015, 10, 19, 10, 50, 3, 501000, tzinfo=tzlocal()),
                           u'ReadyDateTime': datetime.datetime(2015, 10, 19, 11, 6, 13, 23000, tzinfo=tzlocal())}}}
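
If you want to use this information from your own scripts, the fields shown above can be read like any Python dictionary. The following is only a minimal sketch: it assumes the cluster description has exactly the shape printed above, and `summarize_cluster` is a hypothetical helper, not part of Sparktan. The timestamps are written as naive datetimes (no `tzinfo`) to keep the example self-contained.

```python
import datetime


def summarize_cluster(cluster):
    """Flatten a cluster description dict into a few handy fields.

    Hypothetical helper: assumes the dict layout printed by `sparktan list`.
    """
    status = cluster.get("Status", {})
    timeline = status.get("Timeline", {})
    created = timeline.get("CreationDateTime")
    ready = timeline.get("ReadyDateTime")
    # Startup time is only known once the cluster has become ready.
    startup = (ready - created).total_seconds() if created and ready else None
    return {
        "id": cluster.get("Id"),
        "name": cluster.get("Name"),
        "state": status.get("State"),
        "master": cluster.get("MasterPublicDnsName"),
        "startup_seconds": startup,
    }


# Values copied from the sample output above (tzinfo omitted for simplicity).
example = {
    "Id": "j-27KUKY8IY12Y0",
    "MasterPublicDnsName": "ec2-54-172-110-197.compute-1.amazonaws.com",
    "Name": "my_project",
    "Status": {
        "State": "WAITING",
        "Timeline": {
            "CreationDateTime": datetime.datetime(2015, 10, 19, 10, 50, 3, 501000),
            "ReadyDateTime": datetime.datetime(2015, 10, 19, 11, 6, 13, 23000),
        },
    },
}

summary = summarize_cluster(example)
print(summary["id"], summary["state"], summary["startup_seconds"])
```

The `id` field is the value you would pass to `sparktan terminate`.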

Terminate your cluster

> sparktan terminate j-27KUKY8IY12Y0

Dependencies

Public Python package dependencies should be placed into the requirements.txt file. Sparktan will create a dedicated virtualenv on each node with the proper packages installed.

Although it's possible to install Python packages from a private GitHub repo, I recommend pushing those packages to each node as wheel files.

You can build a wheel package from each of your private GitHub repos using:

> python setup.py sdist bdist_wheel

and place the resulting wheel files directly in the "wheels" folder of your Sparktan project.
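
If you build several private packages, copying the wheels by hand gets tedious. Here is a small sketch of how that step could be scripted. It assumes the build wrote its output to the package's `dist/` folder (the default for `bdist_wheel`); `collect_wheels` and the directory names are hypothetical and only used for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def collect_wheels(dist_dir, wheels_dir):
    """Copy every .whl file from dist_dir into wheels_dir.

    Returns the sorted list of copied file names. Source archives
    (.tar.gz) produced by `sdist` are left alone.
    """
    dist_dir, wheels_dir = Path(dist_dir), Path(wheels_dir)
    wheels_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for wheel in sorted(dist_dir.glob("*.whl")):
        shutil.copy2(wheel, wheels_dir / wheel.name)
        copied.append(wheel.name)
    return copied


# Demonstration in a throwaway directory, with empty stand-in files
# in place of real build artifacts.
with tempfile.TemporaryDirectory() as tmp:
    dist = Path(tmp) / "private_pkg" / "dist"
    dist.mkdir(parents=True)
    (dist / "private_pkg-0.1-py2.py3-none-any.whl").write_bytes(b"")
    (dist / "private_pkg-0.1.tar.gz").write_bytes(b"")

    target = Path(tmp) / "my_project" / "wheels"
    copied = collect_wheels(dist, target)
    copied_exists = (target / copied[0]).exists()
    print(copied)
```

In a real project you would call `collect_wheels("path/to/private_pkg/dist", "my_project/wheels")` after each build.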

Credentials

In order to create your cluster, you need the proper AWS credentials:

  • An AWS user with enough permissions. For example: AmazonElasticMapReduceFullAccess, AmazonElasticMapReduceRole, AmazonS3FullAccess, etc.
  • Your aws_access_key_id, aws_secret_access_key and region in ~/.aws/config (boto3)
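
Before launching a cluster, it can help to check that the three settings listed above are actually present. The sketch below parses the INI-style text of an AWS config file with the standard library; `missing_aws_settings` is a hypothetical helper, the `[default]` section name is an assumption about your setup, and the key value is a placeholder rather than a real credential.

```python
import configparser

REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key", "region")


def missing_aws_settings(config_text, section="default"):
    """Return the required keys that are absent or empty in the given section."""
    # Interpolation is disabled so a '%' in a secret key is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section(section):
        return list(REQUIRED_KEYS)
    return [
        key
        for key in REQUIRED_KEYS
        if not parser.get(section, key, fallback="").strip()
    ]


# Placeholder content; in practice you would read the text of ~/.aws/config.
sample = """
[default]
aws_access_key_id = EXAMPLE_KEY_ID
region = us-east-1
"""

missing = missing_aws_settings(sample)
print(missing)
```

An empty list means all three settings were found in the chosen section.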
