Architect Serverless Data Lake on AWS

This sample shows how to build a serverless data lake on AWS with minimal effort to build and connect the workflow. It covers the following topics:

  1. Data Ingestion with Kinesis Firehose
  2. AWS Glue for data catalog
  3. Amazon Athena to query and analyze data on S3

Data Ingestion with Kinesis Firehose

Kinesis Data Firehose delivery streams continuously collect, transform, and load streaming data into the destinations that you specify.

  1. Go to Kinesis -> Kinesis Firehose -> Create delivery stream.
  2. Enter the delivery stream name.
  3. Choose S3 as your destination, and provide your S3 bucket name and prefix.
  4. Confirm that your delivery stream was created successfully.
  5. Click the delivery stream to open its detail page.
  6. Choose to test it with demo data.
  7. Start sending demo data.
  8. Wait about 5 minutes, then go to the S3 bucket and check that the data has been generated and loaded into S3.
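
The console steps above can also be sketched in Python with boto3. This is a minimal sketch rather than the method used in this sample: the stream name, bucket ARN, role ARN, and the "raw/" prefix are hypothetical placeholders, the buffering values are assumed to match the 5-minute wait mentioned above, and the record fields only approximate the shape of the console's demo data.

```python
import json
import random

# Hypothetical names for illustration; replace them with your own resources.
STREAM_NAME = "my-delivery-stream"
BUCKET_ARN = "arn:aws:s3:::my-data-lake-bucket"
ROLE_ARN = "arn:aws:iam::123456789012:role/my-firehose-role"


def build_delivery_stream_request(stream_name, bucket_arn, role_arn, prefix="raw/"):
    """Build the keyword arguments for Firehose create_delivery_stream."""
    return {
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",  # producers write to the stream directly
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
            "Prefix": prefix,
            # Assumed values: flush after 5 MB or 300 seconds, whichever comes first.
            "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
        },
    }


def make_demo_record(rng):
    """Return one newline-terminated JSON record as bytes (assumed demo shape)."""
    payload = {
        "ticker_symbol": rng.choice(["AAA", "BBB", "CCC"]),
        "sector": rng.choice(["ENERGY", "RETAIL", "TECHNOLOGY"]),
        "change": round(rng.uniform(-5, 5), 2),
        "price": round(rng.uniform(1, 100), 2),
    }
    return (json.dumps(payload) + "\n").encode("utf-8")


def create_stream():
    """Create the delivery stream. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it

    firehose = boto3.client("firehose")
    firehose.create_delivery_stream(
        **build_delivery_stream_request(STREAM_NAME, BUCKET_ARN, ROLE_ARN)
    )


def send_demo_records(count=10, seed=None):
    """Send a few demo records once the stream has become ACTIVE."""
    import boto3

    firehose = boto3.client("firehose")
    rng = random.Random(seed)
    for _ in range(count):
        firehose.put_record(
            DeliveryStreamName=STREAM_NAME,
            Record={"Data": make_demo_record(rng)},
        )
```

Call create_stream() first and send_demo_records() only after the stream status is ACTIVE. Each record ends with a newline so that the objects delivered to S3 contain one JSON document per line.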

AWS Glue for data catalog

The AWS Glue Data Catalog is an index to the location, schema, and runtime metrics of your data. You can use a crawler to populate the AWS Glue Data Catalog with tables. This is the primary method used by most AWS Glue users.

  1. Go to Glue -> Crawler -> Add Crawler.
  2. Provide the crawler name.
  3. Include the S3 path you want to crawl.
  4. Choose an IAM role.
  5. Configure your output database.
  6. Check that the crawler is created, and run it.
  7. Wait for the crawler to complete and review the changes it made.
  8. Check that the table is created in the database.
  9. Check the detail information of the table.
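
The same crawler setup can be sketched with boto3. This is an illustrative sketch under assumptions, not the sample's own tooling: the crawler name, role name, and S3 path are hypothetical, and "sampledb" is taken from the query used in the Athena section.

```python
def build_crawler_request(name, role, database, s3_path):
    """Build the keyword arguments for Glue create_crawler."""
    return {
        "Name": name,
        "Role": role,  # IAM role name or ARN that Glue can assume
        "DatabaseName": database,  # catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_run_crawler(request):
    """Create the crawler and start it. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so build_crawler_request works without it

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


def list_table_names(database):
    """Return the table names the crawler registered in the database."""
    import boto3

    glue = boto3.client("glue")
    response = glue.get_tables(DatabaseName=database)
    return [table["Name"] for table in response["TableList"]]


# Hypothetical values for illustration only.
example_request = build_crawler_request(
    name="my-data-lake-crawler",
    role="my-glue-crawler-role",
    database="sampledb",
    s3_path="s3://my-data-lake-bucket/raw/",
)
```

Passing example_request to create_and_run_crawler starts the crawl. Once the crawler has finished, list_table_names("sampledb") should include the table it created.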

Amazon Athena to query and analyze data on S3

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.

  1. Go to Athena and choose the database in which your table was created.
  2. Run the SQL: SELECT * FROM "sampledb"."raw" limit 10;
  3. See the results.
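
The query above can also be run programmatically. The following is a minimal sketch using boto3, assuming a hypothetical S3 location for the query results; the database and table names come from the SQL shown above.

```python
import time


def build_query(database, table, limit=10):
    """Build the sample SELECT statement for the given database and table."""
    return f'SELECT * FROM "{database}"."{table}" limit {int(limit)}'


def run_query(sql, database, output_location, poll_seconds=1.0):
    """Run a query in Athena and return its rows as lists of strings.

    Requires boto3 and AWS credentials. output_location is an S3 URI
    where Athena writes the result files.
    """
    import boto3  # imported lazily so build_query works without it

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")

    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]
```

For example, run_query(build_query("sampledb", "raw"), "sampledb", "s3://my-athena-results-bucket/output/") returns the result rows, where the bucket name is a placeholder. For a SELECT statement, the first row holds the column names.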
