AWS Glue Pipeline Demo using CDK

This project creates a simple data pipeline using AWS Glue and the AWS CDK. We upload sample data to a source bucket, read it through the pipeline, and write the results to another bucket.

Source Data

The source CSV data comes from Kaggle and has already been copied into the source-data folder.

Bootstrap CDK

This pipeline was built with the AWS CDK, so the first step is to bootstrap your AWS account and region.

I created a small helper script that lets you customize the names of the bootstrap resources by passing a qualifier. It also updates the qualifier inside cdk.json:

# ./bootstrap.sh [qualifier]
./bootstrap.sh teststack

Example output:

$ ./bootstrap.sh teststack
teststack
 ⏳  Bootstrapping environment aws://NNNNNNNN/us-west-1...
Trusted accounts for deployment: (none)
Trusted accounts for lookup: (none)
Using default execution policy of 'arn:aws:iam::aws:policy/AdministratorAccess'. Pass '--cloudformation-execution-policies' to customize.
cdk-teststack-toolkit: creating CloudFormation changeset...
cdk-teststack-toolkit |  0/12 | 20 h 32 min 51 s | REVIEW_IN_PROGRESS   | AWS::CloudFormation::Stack | cdk-teststack-toolkit User Initiated
...
cdk-teststack-toolkit |  0/12 | 20 h 33 min 02 s | CREATE_IN_PROGRESS   | AWS::IAM::Role          | ImagePublishingRole 
cdk-teststack-toolkit |  0/12 | 20 h 33 min 02 s | CREATE_IN_PROGRESS   | AWS::S3::Bucket         | StagingBucket 
cdk-teststack-toolkit |  0/12 | 20 h 33 min 02 s | CREATE_IN_PROGRESS   | AWS::IAM::Role          | DeploymentActionRole 
...
cdk-teststack-toolkit | 12/12 | 20 h 33 min 45 s | CREATE_COMPLETE      | AWS::CloudFormation::Stack | cdk-teststack-toolkit 
 ✅  Environment aws://NNNNNNNN/us-west-1 bootstrapped.
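The contents of bootstrap.sh aren't reproduced here. As a rough, hypothetical sketch of the two steps described above, the script could rewrite the qualifier in cdk.json and then call cdk bootstrap. This assumes the qualifier is stored under the @aws-cdk/core:bootstrapQualifier context key and that the toolkit stack is named cdk-<qualifier>-toolkit, as in the example output. The function names set_qualifier and bootstrap_command are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; the repository's bootstrap.sh may work differently.
set -euo pipefail

# Rewrite the bootstrap qualifier in a cdk.json file.
# Assumption: the file already contains an "@aws-cdk/core:bootstrapQualifier"
# context entry, and the qualifier is plain alphanumeric (no sed metacharacters).
set_qualifier() {
  local file="$1" qualifier="$2"
  sed -E "s|(\"@aws-cdk/core:bootstrapQualifier\"[[:space:]]*:[[:space:]]*\")[^\"]*\"|\1${qualifier}\"|" \
    "$file" > "$file.tmp"
  mv "$file.tmp" "$file"
}

# Print the cdk bootstrap command for a qualifier instead of running it,
# so it can be reviewed first. The toolkit stack name mirrors the
# "cdk-<qualifier>-toolkit" pattern seen in the example output (assumption).
bootstrap_command() {
  local qualifier="$1"
  echo "cdk bootstrap --qualifier ${qualifier} --toolkit-stack-name cdk-${qualifier}-toolkit"
}
```

For example, set_qualifier cdk.json teststack followed by running the command printed by bootstrap_command teststack would reproduce the two steps. Printing the command rather than executing it keeps the sketch safe to try without AWS credentials.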

Pipeline Architecture

The app is divided into 3 stacks:

  • Stage Stack: Creates the staging bucket, configured to send event notifications to an SQS queue.
  • Glue Demo Stack: Creates the AWS Glue infrastructure (database, crawler, job, triggers), an EventBridge rule that triggers the workflow, and the destination bucket where the output data is saved.
  • Import File Stack: Deploys a local CSV file into the staging bucket, which triggers the pipeline.

You can deploy the stacks one at a time (the stack names are prefixed with teststack, the qualifier used above):

cdk deploy teststack-StageStack --require-approval never
cdk deploy teststack-AwsGlueDemoStack --require-approval never
cdk deploy teststack-ImportFileStack --require-approval never
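If you bootstrapped with a different qualifier, a small loop can deploy the three stacks in the order listed above. This is a sketch that assumes every stack name follows the <qualifier>-<StackName> pattern seen in the commands above; stack_names and deploy_all are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: deploy the three stacks one at a time, in a fixed order.
# Assumption: stack names follow the "<qualifier>-<StackName>" pattern.
set -euo pipefail

# Print the stack names for a qualifier, one per line, in deployment order.
stack_names() {
  local qualifier="$1"
  local stack
  for stack in StageStack AwsGlueDemoStack ImportFileStack; do
    echo "${qualifier}-${stack}"
  done
}

# Deploy each stack in turn; with "set -e" the loop stops at the first failure.
deploy_all() {
  local qualifier="${1:?usage: deploy_all <qualifier>}"
  local name
  for name in $(stack_names "$qualifier"); do
    cdk deploy "$name" --require-approval never
  done
}
```

Calling deploy_all teststack runs the same three commands shown above. Passing all three names to a single cdk deploy call may also work if the stacks declare their dependencies on each other, but the explicit loop makes the order obvious and keeps the import step last, so the pipeline is only triggered once its infrastructure exists.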

Contributors

felipempda
