Giter VIP home page Giter VIP logo

presto-on-aws's Introduction

Presto on AWS

This is a cloudformation template for deploying Presto on AWS. It deploys coordinators and workers in an autoscaling group.

Features

  • Graceful shutdown of workers using Autoscaling lifecycle management. Presto worker will not shutdown until all the queries finish on that worker.
  • Highly available coordinator nodes.
  • Autoscaling of presto workers based on presto's memory and CPU usage.
  • A cloudwatch and prometheus agent which runs inside presto coordinator to push presto's metrics such as input data, CPU usage, running/blocked/failed queries.
  • A query logger which pushes completed queries and its stats to ElasticSearch.
  • A presto AMI creation packer script to easily update presto version.
  • Logs of presto coordinator and workers available in Cloudwatch.
  • Health check in Presto workers to remove unhealthy workers

Architecture

Screen Shot 2020-06-04 at 8.52.37 PM

Pre-requisites

  • A VPC and subnet
  • A user with following permissions. // TODO: Add permissions

Modiying the presto configuration

  • New connectors: To add presto connectors (like hive connector, postgres connector etc) configuration to the deployment create a directory with following structure. Add properties file for each connector, zip the directory. Add the connector file copying command into the boostrap script in CFT.

      ├── catalog 
      │   ├── hive.properties
      │   ├── jmx.properties
      │   ├── tpcds.properties
      │   └── tpch.properties
      └── config.properties
    
  • To modify the core configuration such as enabling spill or reserved pool disabling/enabling modify the config.properties file mentioned above. Memory based configurations like JVM memory, max memory per node is automatically handled based on selected instances.

Add the URL of above directory as zip file in AdditionalConfigsUri parameter in CFT.

Creating Presto AMI using Packer

  • Go inside packer directory and change the parameters of .atlanrc file. The presto version is 330 by default. Source AMI is Amazon Linux 2 in the region you want to create the AMI in.
  • Run the following command
  make build_presto_image
  • Change the presto.json to modify the AMI further.
  • To use this AMI add the AMI ID in the mapping in presto.yaml with AMI's region.

Deployment

The CFT requires following parameters for deployment

  • VPC ID: VPC to deploy Presto cluster
  • Subnet ID: Subnet to deploy Presto cluster
  • Security groups ID: SGs to attach to presto coordinators and workers
  • Keyname: Private key to use to launch presto machines
  • Coordinator Instance type: EC2 Instance type for coordinator
  • Coordinator Instance Count: For HA Coordinator deployment set it to 2 else set it to 1.
  • Min workers count: Minimum numbers of EC2 machines in Presto workers ASG
  • Max workers count: Maximum numbers of EC2 machines in Presto workers ASG
  • Workers instance type: EC2 Instance type for workers
  • Presto Version: Presto version, required for compatibility before and after version 330
  • EC2 Root volume size: EBS Volume size (GB) for presto workers and coordinators. Increase the value to few hundred GBs if you have disk spill based workload.
  • Hive IP: Format thrift://<ip>:9083
  • Elasticsearch Host: Elasticsearch host for query logger to push SQL queries into.
  • Elasticsearch Port: Elasticsearch port for query logger to push SQL queries into
  • Environment: Identifier for Dev, Production presto clusters.

Create the AMI and provide the ID with region in CFT. Now deploy the CFT by following the guide from AWS.

Configuring autoscaling of workers

You can configure presto workers autoscaling based on metrics from presto like running queries, heap usage etc. These metrics gets pushed into Cloudwatch by presto coordinator. You can configure the Cloudwatch alarams and autoscaling based on these Cloudwatch Metrics.

Limitations/Future work

  • Add support for TLS in the deployment.
  • Graceful shutdown lambda only waits for 1 hour for queries to finish. Add feature to wait to terminate the worker until all the queries finish on that worker.
  • High availibility feature only switches between standby and live coordinator but doesn't restart the failed coordinator.
  • No retention policy configuration for presto logs in Cloudwatch

Contribute

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

presto-on-aws's People

Contributors

arpit1997 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.