IDS721-Proj4

This project is my study of AWS DynamoDB, SQS, S3, Lambda, and Comprehend, combined to build a serverless data engineering pipeline.

Workflow Overview

[Figure: serverless workflow diagram]

Set Up

To build this serverless architecture on AWS from scratch, follow these steps:

Step 1: Create DynamoDB table, SQS queue, S3 bucket & working environment

  • Open the AWS DynamoDB console and create a table called "fang". For the primary (partition) key, use "name" with type "String".

  • Click on the newly created table and add items with names like "Apple", "Google", etc.

  • Open the AWS SQS console and create a standard queue called "readDB".

  • Open the AWS S3 console and create a bucket called "fangsentimentsforids".

  • Open the AWS Cloud9 console, open the IDE of your working environment, create a new directory for this project, and cd into it.
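The console steps above can also be scripted. The following is a minimal sketch, not part of this repository: the function name `provision` is hypothetical, the on-demand billing mode is an assumption, and the clients are passed in as arguments so the logic can be exercised without AWS access. It assumes standard boto3 clients for DynamoDB, SQS, and S3.

```python
def provision(dynamodb, sqs, s3, table="fang", queue="readDB",
              bucket="fangsentimentsforids", companies=("Apple", "Google")):
    """Create the Step 1 resources with injected boto3 clients; return the queue URL."""
    dynamodb.create_table(
        TableName=table,
        # "name" is the partition key, stored as a string ("S").
        KeySchema=[{"AttributeName": "name", "KeyType": "HASH"}],
        AttributeDefinitions=[{"AttributeName": "name", "AttributeType": "S"}],
        BillingMode="PAY_PER_REQUEST",  # assumption: on-demand capacity
    )
    # Table creation is asynchronous, so wait before writing items.
    dynamodb.get_waiter("table_exists").wait(TableName=table)
    for company in companies:
        dynamodb.put_item(TableName=table, Item={"name": {"S": company}})

    queue_url = sqs.create_queue(QueueName=queue)["QueueUrl"]
    # In regions other than us-east-1, create_bucket also needs a
    # CreateBucketConfiguration with a LocationConstraint.
    s3.create_bucket(Bucket=bucket)
    return queue_url

# Usage with real clients (requires boto3 and AWS credentials):
#   import boto3
#   provision(boto3.client("dynamodb"), boto3.client("sqs"), boto3.client("s3"))
```

S3 bucket names are globally unique, so the bucket name will likely need to be changed before running this.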

Step 2: Create producer-side Lambda function (triggered by a CloudWatch timer)

This function reads data from the DynamoDB table and puts messages into the SQS queue.
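To make the producer's job concrete, here is a minimal sketch of such a function. It is not the app.py shipped in this repository: the helper name `send_names_to_queue` and the choice to send only the item's "name" as the message body are assumptions made for illustration. The table and queue names come from Step 1.

```python
def send_names_to_queue(dynamodb, sqs, table_name, queue_url):
    """Scan the table and send one SQS message per item; return the number sent."""
    sent = 0
    scan_kwargs = {"TableName": table_name}
    while True:
        page = dynamodb.scan(**scan_kwargs)
        for item in page.get("Items", []):
            # The low-level client returns typed attributes: {"name": {"S": "Apple"}}.
            sqs.send_message(QueueUrl=queue_url, MessageBody=item["name"]["S"])
            sent += 1
        # Scan results are paginated; stop when no continuation key is returned.
        if "LastEvaluatedKey" not in page:
            return sent
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]


def lambda_handler(event, context):
    # boto3 is imported here so the helper above can be tested without AWS.
    import boto3

    sqs = boto3.client("sqs")
    queue_url = sqs.get_queue_url(QueueName="readDB")["QueueUrl"]
    count = send_names_to_queue(boto3.client("dynamodb"), sqs, "fang", queue_url)
    return {"sent": count}
```

Passing the clients in as arguments keeps the scan-and-send logic separate from the Lambda entry point, which makes it easier to try out locally.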

2.1. Create a SAM application

  • In the Cloud9 IDE, click on "AWS", then right-click on "Lambda", and choose "Create Lambda SAM Application".

  • For settings of the SAM app, select Python 3.7/3.8 as the runtime, select "AWS SAM Hello World" as the SAM application template, choose the directory to store the files, and name your app "readDBValue".

  • In the Cloud9 IDE, click on "Environment" and check the file hierarchy. Find your Lambda function's folder and open the sub-folder "hello_world".

  • Replace "app.py" with the file lambda-producer/hello_world/app.py from this repository.

  • In app.py, change the DynamoDB table name and the SQS queue name to match the ones you created.

  • Replace "requirements.txt" with the file lambda-serverless/hello_world/requirements.txt from this repository.

2.2. Build and deploy

Build and deploy the application using SAM:

sam build --use-container
sam deploy --guided

Step 3: Add Permissions

  • Open the AWS Lambda console and locate the new application "sam-app" (the default stack name suggested by sam deploy --guided) and its Lambda function named "sam-app-HelloWorldFunction-xxxx". Click on the function, then navigate to "Configuration" > "Permissions", and click on the existing role to open it in the IAM console.

  • In the new window, click "Attach Policies". On the subsequent page, find the "AdministratorAccess" policy and attach it.

  • Return to the previous Lambda function page in the AWS Lambda console and select "Triggers". Remove the "API Gateway" trigger.

  • Add a new EventBridge (CloudWatch Events) trigger. For its configuration, create a new rule and name it "OneMinuteTimer" (or any desired name). For the "schedule expression", input "rate(1 minute)".
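The timer trigger above can also be set up programmatically. This is a sketch under assumptions, not the method used in this project: the function name `schedule_lambda` and the statement ID are hypothetical, and the clients are expected to be boto3 "events" and "lambda" clients. It creates the scheduled rule, allows EventBridge to invoke the function, and registers the function as the rule's target.

```python
def schedule_lambda(events, lambda_client, function_name, function_arn,
                    rule_name="OneMinuteTimer", expression="rate(1 minute)"):
    """Create a scheduled EventBridge rule that invokes the given Lambda function."""
    # Note the singular unit for a value of 1; larger values use the plural,
    # for example "rate(5 minutes)".
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression=expression,
        State="ENABLED",
    )["RuleArn"]

    # Without this resource-based permission, the rule cannot invoke the function.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    events.put_targets(Rule=rule_name, Targets=[{"Id": "1", "Arn": function_arn}])
    return rule_arn

# Usage with real clients (requires boto3 and AWS credentials):
#   import boto3
#   schedule_lambda(boto3.client("events"), boto3.client("lambda"),
#                   "sam-app-HelloWorldFunction-xxxx", "<function ARN>")
```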

Step 4: Test lambda-producer

  • You can now enable the trigger and view the messages in the SQS queue.

  • In the AWS SQS console, click on the "readDB" queue, followed by "Send and receive messages", and then "Poll for messages".

  • In the AWS Lambda console, on the specific function page, click on "Monitor" to find more details about the activity. You can also select "View Logs in CloudWatch".

You have the option to disable the trigger and purge the SQS queue at any time.
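The same check can be done from code instead of the console. Below is a small sketch, assuming a boto3 SQS client is passed in; the helper names `peek_messages` and `purge` are hypothetical.

```python
def peek_messages(sqs, queue_url, max_messages=10, wait_seconds=5):
    """Long-poll the queue once and return the bodies of any messages received."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=max_messages,  # SQS allows at most 10 per call
        WaitTimeSeconds=wait_seconds,      # long polling, up to 20 seconds
    )
    # The "Messages" key is absent when the queue returned nothing.
    return [message["Body"] for message in response.get("Messages", [])]


def purge(sqs, queue_url):
    """Delete all messages currently in the queue."""
    sqs.purge_queue(QueueUrl=queue_url)

# Usage with a real client (requires boto3 and AWS credentials):
#   import boto3
#   sqs = boto3.client("sqs")
#   url = sqs.get_queue_url(QueueName="readDB")["QueueUrl"]
#   print(peek_messages(sqs, url))
```

Receiving a message does not delete it. It only becomes temporarily invisible to other consumers, so polling this way should not remove data from the pipeline.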

Step 5: Create a second SAM application named "lambda-consumer", following the same process as in Step 2

  • Replace "app.py" with the file lambda-consumer/hello_world/app.py from this repo.

  • Modify app.py to update the REGION and bucket name.
    ATTENTION: set REGION to us-east-1. AWS Comprehend is not available in every region (for example, it is not supported in us-west-1).

  • Replace "requirements.txt" with the file lambda-checkSQS/hello_world/requirements.txt from this repo.
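For orientation, here is a minimal sketch of what a consumer of this kind might look like. It is not the app.py from this repository. It assumes that the text sent to Comprehend is the SQS message body itself, that the output has two columns, and that each invocation writes one CSV object keyed by the request ID. All helper names are hypothetical.

```python
import csv
import io


def bodies_from_event(event):
    """Extract message bodies from an SQS-triggered Lambda event."""
    return [record["body"] for record in event.get("Records", [])]


def sentiment_rows(comprehend, texts):
    """Return (text, sentiment) pairs using Comprehend's detect_sentiment."""
    rows = []
    for text in texts:
        result = comprehend.detect_sentiment(Text=text, LanguageCode="en")
        rows.append((text, result["Sentiment"]))
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "sentiment"])  # assumption: two-column layout
    writer.writerows(rows)
    return buffer.getvalue()


def lambda_handler(event, context):
    # boto3 is imported here so the helpers above can be tested without AWS.
    import boto3

    region = "us-east-1"  # REGION from the step above
    comprehend = boto3.client("comprehend", region_name=region)
    rows = sentiment_rows(comprehend, bodies_from_event(event))
    boto3.client("s3", region_name=region).put_object(
        Bucket="fangsentimentsforids",
        Key=f"{context.aws_request_id}.csv",  # assumption: one file per invocation
        Body=rows_to_csv(rows).encode("utf-8"),
    )
    return {"rows": len(rows)}
```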

Step 6: Build and Deploy

  • Run sam build --use-container.
  • Run sam deploy --guided.

Step 7: Add Permissions & Trigger to the lambda-consumer function

  • Add permission and remove the "API Gateway" trigger as done previously.

  • Add a new SQS trigger with your queue selected and a batch size of 1.

  • Modify the Lambda function timeout. To write results to S3 successfully, the second Lambda function needs a longer timeout:

      • Open the second Lambda function's page, click on "Configuration", and then click on "General configuration".
      • Set the timeout to 1 minute.
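The trigger and timeout settings from this step can be sketched in code as well. This is an illustration only, assuming a boto3 Lambda client is passed in; the function name `attach_queue_and_set_timeout` is hypothetical, and the order of the two calls simply mirrors the order of the steps above.

```python
def attach_queue_and_set_timeout(lambda_client, function_name, queue_arn,
                                 timeout_seconds=60):
    """Add an SQS trigger with batch size 1, then raise the function timeout."""
    mapping = lambda_client.create_event_source_mapping(
        EventSourceArn=queue_arn,
        FunctionName=function_name,
        BatchSize=1,  # one message per invocation, as in the console step
    )
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=timeout_seconds,  # 1 minute
    )
    # The mapping UUID is needed later to disable or delete the trigger.
    return mapping["UUID"]

# Usage with a real client (requires boto3 and AWS credentials):
#   import boto3
#   attach_queue_and_set_timeout(boto3.client("lambda"),
#                                "<consumer function name>", "<readDB queue ARN>")
```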

Step 8: Test Overall Pipeline

You can now activate the triggers for both functions and view the output in the S3 bucket, where you will find a CSV file with the sentiment analysis output.

To clear the activity, disable the triggers and purge the SQS queue.

[Figure: project result]

Demo Video

Demo.mp4

References

Source code

Contributors

june-rains
