Delta Optimize Lambda

This Lambda function can be used with a periodic trigger to optimize a configured Delta Lake table. Consult the deployment.tf file for an example of how to provision the function in AWS.

Building

Building and testing the Lambda can be done with cargo: cargo build and cargo test.

In order to deploy this in AWS Lambda, it must first be built with the cargo lambda command line tool, e.g.:

cargo lambda build --release --output-format zip

This will produce the file: target/lambda/lambda-delta-optimize/bootstrap.zip

Infrastructure

The deployment.tf file contains the necessary Terraform to provision the function, a DynamoDB table for locking, and IAM permissions. This Terraform does not provision an S3 bucket to optimize.

After configuring the necessary authentication for Terraform, the following steps can be used to provision:

cargo lambda build --release --output-format zip
terraform init
terraform plan
terraform apply
Note: Terraform configures the Lambda to run with the smallest amount of memory allowed. This may not be sufficient for larger tables.

Environment variables

The following environment variables must be set for the function to run properly:

| Name | Value | Notes |
|------|-------|-------|
| DATALAKE_LOCATION | s3://my-bucket-name/databases/bronze/http | The s3:// URL of the desired table to optimize. |
| AWS_S3_LOCKING_PROVIDER | dynamodb | Instructs the deltalake crate to use DynamoDB for locking, which provides consistent writes into S3. |
| OPTIMIZE_DS | yesterday | Only apply optimizations to the given ds partition (YYYY-mm-dd). The value yesterday selects the previous day in UTC. |
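To make the meaning of these variables concrete, here is a minimal sketch of how they could be read and validated using only the Rust standard library. It is not the function's actual implementation: the names Settings, load_settings, resolve_ds, and civil_from_days are hypothetical, and the real code may well rely on a date crate instead of the hand-rolled calendar arithmetic shown here. The sketch assumes all three variables are required and that any OPTIMIZE_DS value other than yesterday is passed through unchanged.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Settings read from the environment (hypothetical struct, for illustration only).
#[derive(Debug)]
struct Settings {
    location: String,
    locking_provider: String,
    optimize_ds: String,
}

/// Convert days since 1970-01-01 into a (year, month, day) date in UTC.
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day within the cycle
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of a March-based year
    let mp = (5 * doy + 2) / 153; // March-based month, 0..=11
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Resolve an OPTIMIZE_DS value: "yesterday" becomes the previous UTC day,
/// any other value is assumed to already be a YYYY-mm-dd partition value.
fn resolve_ds(value: &str, now_epoch_secs: i64) -> String {
    if value == "yesterday" {
        let today = now_epoch_secs.div_euclid(86_400);
        let (y, m, d) = civil_from_days(today - 1);
        format!("{y:04}-{m:02}-{d:02}")
    } else {
        value.to_string()
    }
}

/// Load settings through a lookup function so the logic can be exercised
/// without touching the real process environment.
fn load_settings(
    get: impl Fn(&str) -> Option<String>,
    now_epoch_secs: i64,
) -> Result<Settings, String> {
    let location = get("DATALAKE_LOCATION").ok_or("DATALAKE_LOCATION is not set")?;
    if !location.starts_with("s3://") {
        return Err(format!("DATALAKE_LOCATION must be an s3:// URL, got {location}"));
    }
    let locking_provider =
        get("AWS_S3_LOCKING_PROVIDER").ok_or("AWS_S3_LOCKING_PROVIDER is not set")?;
    let ds = get("OPTIMIZE_DS").ok_or("OPTIMIZE_DS is not set")?;
    Ok(Settings {
        location,
        locking_provider,
        optimize_ds: resolve_ds(&ds, now_epoch_secs),
    })
}

fn main() {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before 1970")
        .as_secs() as i64;
    match load_settings(|key| std::env::var(key).ok(), now) {
        Ok(s) => println!(
            "would optimize {} (ds={}) using {} locking",
            s.location, s.optimize_ds, s.locking_provider
        ),
        Err(e) => eprintln!("configuration error: {e}"),
    }
}
```

Passing the lookup as a closure keeps the validation separate from the process environment, so the yesterday handling can be checked against fixed timestamps.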

Licensing

This repository is intentionally licensed under the AGPL 3.0. If your organization is interested in re-licensing this function for re-use, contact me via email for commercial licensing terms: [email protected]

Contributors

rtyler
