
emr-autoscaling

Description

Scale your AWS Elastic MapReduce (EMR) cluster by automatically adding or removing task instances. Every 5 minutes, an AWS CloudWatch rule triggers an AWS Lambda function, which checks AWS CloudWatch metrics to decide whether to scale up or down.

Scaling Rules

Scaling is only initiated when no scaling is currently in progress. In addition, downscaling is not performed during office hours. Apart from that, the following rules are used to decide whether to scale.

  • scaling up
    • at least 1 YARN container has been pending during the past 5 minutes
    • at least 1 task instance group is not running its maximum of configured instances
  • scaling down
    • average memory consumption by YARN is below a given threshold for the last hour
    • at least 1 task instance group is running above its minimum of configured instances
    • the current time is not in office hours on a weekday
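
The rules above can be sketched as a small pure function. This is only an illustration, not the project's actual code: the `ClusterState` fields, the function names, checking scale-up before scale-down, and treating the end of office hours as exclusive are all assumptions. The defaults 8, 18 and 0.6 are taken from the Parameters section.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ClusterState:
    # All field names are hypothetical; they mirror the rules listed above.
    scaling_in_progress: bool
    pending_containers_last_5_min: int
    avg_memory_allocation_last_hour: float  # fraction in [0.0, 1.0]
    group_below_max: bool  # at least one task group can still grow
    group_above_min: bool  # at least one task group can still shrink


def in_office_hours(now, start=8, end=18):
    """True on Monday to Friday when start <= hour < end (exclusive end is an assumption)."""
    return now.weekday() < 5 and start <= now.hour < end


def decide(state, now, memory_threshold=0.6):
    """Return "up", "down" or None for the given cluster state and time."""
    if state.scaling_in_progress:
        return None  # never start a second scaling activity
    if state.pending_containers_last_5_min >= 1 and state.group_below_max:
        return "up"
    if (state.avg_memory_allocation_last_hour < memory_threshold
            and state.group_above_min
            and not in_office_hours(now)):
        return "down"
    return None


# Example: an idle cluster on a Monday evening is scaled down.
idle = ClusterState(False, 0, 0.2, True, True)
print(decide(idle, datetime(2024, 1, 8, 22, 0)))
```

Keeping the decision free of AWS calls makes it easy to unit test; the metric values would be fetched separately and passed in.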

Instance Group Selection

Currently, only task instance groups with a spot bid price are eligible for scaling. If the cluster has more than one task instance group, the function sorts all groups by their bid price in descending order and then selects the first eligible group for scaling.
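
A minimal sketch of this selection, assuming each instance group is represented as a plain dictionary. The keys `type`, `bid_price` and `running` are hypothetical, and so is the reading of "eligible" as "still has room to scale in the requested direction"; the real function may differ on both.

```python
def select_group(groups, direction, min_instances=0, max_instances=20):
    """Return the first eligible task group, highest bid price first, or None."""
    # Only task groups that carry a spot bid price are considered at all.
    candidates = [
        g for g in groups
        if g["type"] == "TASK" and g.get("bid_price") is not None
    ]
    # Sort by bid price in descending order.
    candidates.sort(key=lambda g: float(g["bid_price"]), reverse=True)
    for group in candidates:
        if direction == "up" and group["running"] < max_instances:
            return group
        if direction == "down" and group["running"] > min_instances:
            return group
    return None


groups = [
    {"id": "core", "type": "CORE", "bid_price": None, "running": 2},
    {"id": "cheap", "type": "TASK", "bid_price": "0.10", "running": 5},
    {"id": "pricey", "type": "TASK", "bid_price": "0.30", "running": 20},
]
# "pricey" sorts first but is already at the maximum, so "cheap" is chosen.
print(select_group(groups, "up")["id"])
```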

Build

This project is built using PyBuilder. To set up your build environment, do the following:

virtualenv -p python2.7 venv
source venv/bin/activate
pip install pybuilder
pyb install_dependencies

To perform a build, i.e. execute unit tests and package the zip file for AWS Lambda:

pyb -X package_lambda_code

Deployment

Automatic way

Deploy changes to AWS

Committing changes triggers a TeamCity build.

Link to teamcity build

Update Lambda function code

After the TeamCity build has finished, run the following command with the AWS CLI:

aws lambda update-function-code --function-name insights-cluster-AutoscalingStack-ScalingFunction-<CURRENT_ID> --region eu-west-1 --s3-bucket is24-data-pro-artifacts --s3-key emr/lambda_autoscaling/latest/emr-autoscaling.zip

Semi-manual way

Upload Lambda Function to S3

To upload the Lambda function to S3, run the following command with your S3 bucket name:

pyb -X -P bucket_name=<S3-bucket-name> upload_zip_to_s3 lambda_release

The upload_zip_to_s3 task uploads the previously packaged zip file to the S3 bucket given by the bucket_name parameter, under the key emr/lambda_autoscaling/<project-version>/emr-autoscaling.zip. The lambda_release task then copies the uploaded file from that key to /emr/lambda_autoscaling/latest/emr-autoscaling.zip.

(Re-)Deploy Cloudformation Stack

The CloudFormation stacks are deployed using cfn-sphere. Since you cannot update Lambda functions (i.e. deploy new code) with CloudFormation, it is necessary to recreate the stack.

You can delete an already deployed stack with the following command:

cf delete -c src/main/resources/cfn/stacks.yaml

To deploy the stack, and thus make sure that it uses the latest version of the Lambda function, run the following (replace the placeholders with your own parameter values):

cf sync \
  -c \
  --parameter "emr-autoscaling.scalingFunctionCodeBucket=<S3-bucket-name>" \
  --parameter "emr-autoscaling.emrJobFlowId=<EMR-cluster-id>" \
  src/main/resources/cfn/stacks.yaml

The function offers a few parameters to customize its behaviour. These are described in the next section. You can override the defaults simply by adding --parameter "<parameter-name>=<parameter-value>" snippets to the above command.

Parameters

Mandatory Parameters

  • scalingFunctionCodeBucket
    • S3 Bucket into which the scaling function is uploaded
    • prefix is /emr/lambda_autoscaling/<project-version>/emr-autoscaling.zip
    • in addition the latest version is copied to /emr/lambda_autoscaling/latest/emr-autoscaling.zip
  • emrJobFlowId
    • ID of the EMR cluster which is to be scaled

Optional Parameters

  • emrDownScalingMemoryAllocationThreshold
    • when the average memory consumption by YARN drops below this value a downscaling is triggered
    • floating point in range [0.0, 1.0]
    • defaults to 0.6
  • emrScalingMinInstances
    • minimum number of instances that has to be kept for each task instance group
    • integer >= 0
    • defaults to 0
  • emrScalingMaxInstances
    • maximum number of instances that is allowed for each task instance group
    • integer >= 0
    • defaults to 20
  • officeHoursStart
    • start of the office-hours range during which no downscaling will be initiated
    • integer between 0 and 24
    • defaults to 8
  • officeHoursEnd
    • end of the office-hours range during which no downscaling will be initiated
    • integer between 0 and 24
    • defaults to 18
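
The defaults and value ranges of the optional parameters can be summarised in a short validation sketch. The helper `resolve_parameters` is hypothetical and not part of the project; it only encodes the ranges listed above and assumes that unknown parameter names should be rejected.

```python
DEFAULTS = {
    "emrDownScalingMemoryAllocationThreshold": 0.6,
    "emrScalingMinInstances": 0,
    "emrScalingMaxInstances": 20,
    "officeHoursStart": 8,
    "officeHoursEnd": 18,
}


def resolve_parameters(overrides=None):
    """Merge overrides into the defaults and check the documented ranges."""
    params = dict(DEFAULTS)
    for name, value in (overrides or {}).items():
        if name not in DEFAULTS:
            raise ValueError("unknown parameter: " + name)
        params[name] = value

    threshold = params["emrDownScalingMemoryAllocationThreshold"]
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0.0, 1.0]")

    for name in ("emrScalingMinInstances", "emrScalingMaxInstances"):
        if not isinstance(params[name], int) or params[name] < 0:
            raise ValueError(name + " must be an integer >= 0")

    for name in ("officeHoursStart", "officeHoursEnd"):
        if not isinstance(params[name], int) or not 0 <= params[name] <= 24:
            raise ValueError(name + " must be an integer between 0 and 24")

    return params


# Example: shorten office hours and keep every other default.
print(resolve_parameters({"officeHoursEnd": 17}))
```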
