Address Enrichment and Caching Using AWS Step Functions by Leveraging Amazon Location Service

Traditional methods of performing address enrichment on geospatial datasets can be expensive and time consuming.

Using Amazon Location Service with AWS Step Functions for orchestration and with Amazon DynamoDB for caching in a serverless data processing pipeline, you may achieve significant performance improvements and cost savings on address enrichment jobs that use geospatial data.
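The caching idea can be illustrated with a minimal cache-aside sketch. This is not the sample's actual code: a plain dictionary stands in for the DynamoDB table, `geocode` stands in for a call to the Amazon Location Service Places API, and the names `normalize` and `enrich`, the example address, and the coordinates are all hypothetical placeholders.

```python
from typing import Callable, Dict, List, Tuple

Coordinates = Tuple[float, float]
# Assumption: a dict stands in for the DynamoDB cache table.
Cache = Dict[str, Coordinates]


def normalize(address: str) -> str:
    """Collapse case and whitespace so equivalent addresses share one cache key."""
    return " ".join(address.lower().split())


def enrich(address: str, cache: Cache,
           geocode: Callable[[str], Coordinates]) -> Coordinates:
    """Return coordinates for an address, calling the geocoder only on a cache miss."""
    key = normalize(address)
    if key in cache:
        return cache[key]          # cache hit: no paid API call
    coords = geocode(address)      # cache miss: call the (stubbed) geocoder
    cache[key] = coords            # remember the result for later lookups
    return coords


# Stub geocoder that records how often it is called; the coordinates are
# placeholder values, not real geocoding results.
calls: List[str] = []


def fake_geocode(address: str) -> Coordinates:
    calls.append(address)
    return (1.0, 2.0)


cache: Cache = {}
first = enrich("123 Example St, Springfield", cache, fake_geocode)
second = enrich("123  example st, springfield", cache, fake_geocode)
```

The second lookup differs only in case and spacing, so it is served from the cache and the geocoder is called once. Datasets with many repeated addresses benefit most from this pattern.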

This sample evolves an earlier sample that uses only Lambda functions (can be found here).

Some of the improvements in this project include:

The repository contains a SAM template for deploying a serverless address enrichment pipeline using:

It also uses sample data sourced from publicly available datasets that you can deploy and use to test the application.

This project addresses a common customer concern: how to improve application performance while also optimizing costs.

High-Level Architecture

(Architecture diagram)

  1. The Scatter Lambda function takes a dataset from the S3 bucket labeled input and breaks it into equal-sized shards.
  2. The Process Lambda function takes each shard from the pre-processed bucket and performs address enrichment in parallel by calling the Amazon Location Service Places API and storing
  3. The Gather Lambda function takes each shard from the post-processed bucket and appends them into a complete dataset with additional address information.
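The three steps above can be sketched in plain Python. This is a local illustration under stated assumptions, not the sample's Lambda code: in-memory lists replace the S3 buckets, a thread pool replaces the parallel fan-out that Step Functions orchestrates, and `enrich_row` is a hypothetical stand-in for the Places API call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, object]


def scatter(rows: List[Row], shard_size: int) -> List[List[Row]]:
    """Split the dataset into shards of shard_size rows (the last may be smaller)."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    return [rows[i:i + shard_size] for i in range(0, len(rows), shard_size)]


def process(shard: List[Row], enrich_row: Callable[[Row], Row]) -> List[Row]:
    """Enrich every row in one shard."""
    return [enrich_row(row) for row in shard]


def gather(shards: List[List[Row]]) -> List[Row]:
    """Append the processed shards back into one dataset, preserving shard order."""
    return [row for shard in shards for row in shard]


def run_pipeline(rows: List[Row], shard_size: int,
                 enrich_row: Callable[[Row], Row]) -> List[Row]:
    """Scatter, process shards in parallel, then gather."""
    shards = scatter(rows, shard_size)
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in input order, so the gathered rows
        # keep their original sequence.
        processed = list(pool.map(lambda s: process(s, enrich_row), shards))
    return gather(processed)


# Hypothetical input: seven rows, enriched by adding a flag.
sample_rows: List[Row] = [{"id": i} for i in range(7)]
result = run_pipeline(sample_rows, 3, lambda r: {**r, "enriched": True})
```

With seven rows and a shard size of three, `scatter` produces three shards, and the gathered result contains all seven rows in their original order.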

Deploying the Project

Prerequisites:

To use the SAM CLI, you need the following tools:

This Sample Includes:

  • template.yaml: Contains the AWS SAM template that defines your application's AWS resources, which include a Place Index for Amazon Location Service
  • statemachine/location_service_scatter_gather.asl.yaml: Contains the Step Functions ASL definition
  • functions/scatter/: Contains the Lambda handler logic behind the scatter function and its requirements
  • functions/process/: Contains the Lambda handler logic for the processor function which calls the Amazon Location Service Places API to perform address enrichment
  • functions/gather/: Contains the Lambda handler logic for the gather function, which appends all of the processed data into a complete dataset
  • tests/: TBD - Needs to contain test cases (Unit and Integration Tests)

Deploy the SAM Application:

  1. Use git clone https://github.com/aws-samples/address-enrichment-and-caching-using-stepfunctions to clone the repository to an environment where AWS SAM and Python are installed.
  2. Use cd address-enrichment-and-caching-using-stepfunctions to change into the project directory containing the template.yaml file SAM uses to build your application.
  3. If you have Docker installed, use sam build --use-container; otherwise, use sam build to build your application. You should see:
Build Succeeded

Built Artifacts  : .aws-sam/build
Built Template   : .aws-sam/build/template.yaml

Commands you can use next
=========================
[*] Invoke Function: sam local invoke
[*] Test Function in the Cloud: sam sync --stack-name {stack-name} --watch
[*] Deploy: sam deploy --guided
  4. Use sam deploy --guided to deploy the application to your AWS account. Enter responses based on your environment:
Configuring SAM deploy
======================

        Looking for config file [samconfig.toml] :  Not found

        Setting default arguments for 'sam deploy'
        =========================================
        Stack Name [sam-app]: address-enrichment
        AWS Region [us-west-2]: us-east-1
        #Shows you resources changes to be deployed and require a 'Y' to initiate deploy
        Confirm changes before deploy [y/N]: Y
        #SAM needs permission to be able to create roles to connect to the resources in your template
        Allow SAM CLI IAM role creation [Y/n]: Y
        #Preserves the state of previously provisioned resources when an operation fails
        Disable rollback [y/N]: N
        Save arguments to configuration file [Y/n]: Y
        SAM configuration file [samconfig.toml]: 
        SAM configuration environment [default]: 

Testing the Application

Download the samples below, unzip the files, and upload the CSV to your input S3 bucket to trigger the address enrichment pipeline.

Geocoding: City of Hartford, CT Business Listing Dataset

Reverse Geocoding: Miami Housing Dataset
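If you prefer to upload the CSV from a script rather than the console, a minimal sketch with boto3 could look like the following. The helper name `upload_sample` is hypothetical, and the file path and bucket name you pass in are placeholders for your own values; boto3 is imported only when no client is supplied, so the function can be tried with a stub.

```python
import os


def upload_sample(csv_path: str, bucket: str, s3_client=None) -> str:
    """Upload a local CSV to the input bucket and return the object key used."""
    if s3_client is None:
        # Imported lazily so the helper can be exercised without AWS access.
        import boto3
        s3_client = boto3.client("s3")
    # Assumption: the object key is simply the file name.
    key = os.path.basename(csv_path)
    s3_client.upload_file(csv_path, bucket, key)
    return key
```

For example, `upload_sample("sample.csv", "<your-input-bucket>")` uploads the file with your default AWS credentials. Replace both arguments with the unzipped CSV path and the input bucket created by your stack.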

Cleanup

To avoid incurring charges, clean up the AWS resources that were created while following this sample.

Prerequisite:

Make sure you empty the following S3 buckets before deleting the CloudFormation stack (the deletion will fail for non-empty buckets):

  • input-stack-name-aws-region-aws-accountnumber
  • raw-stack-name-aws-region-aws-accountnumber
  • processed-stack-name-aws-region-aws-accountnumber
  • destination-stack-name-aws-region-aws-accountnumber
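Emptying the four buckets can also be scripted. The sketch below assumes the bucket names follow the pattern listed above literally (prefix, stack name, region, account number joined by hyphens) and that the buckets are not versioned; the helper names are hypothetical, and the stack name, region, and account number in the usage note are placeholders.

```python
from typing import Iterable, List

# Prefixes taken from the bucket list above.
BUCKET_PREFIXES = ("input", "raw", "processed", "destination")


def stack_bucket_names(stack_name: str, region: str, account_id: str) -> List[str]:
    """Build the four bucket names, assuming the naming pattern listed above."""
    return [f"{prefix}-{stack_name}-{region}-{account_id}"
            for prefix in BUCKET_PREFIXES]


def empty_buckets(bucket_names: Iterable[str], s3_resource=None) -> None:
    """Delete every object in each bucket (assumes versioning is disabled)."""
    if s3_resource is None:
        # Imported lazily so the helper can be exercised without AWS access.
        import boto3
        s3_resource = boto3.resource("s3")
    for name in bucket_names:
        # Batch-deletes all current objects in the bucket.
        s3_resource.Bucket(name).objects.all().delete()
```

For example, `empty_buckets(stack_bucket_names("address-enrichment", "us-east-1", "123456789012"))` would empty the buckets for a stack named address-enrichment. Check the actual bucket names in your account before running it, since the deletion cannot be undone.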

Method 1:

To delete the resources you created as part of this sample, you can run sam delete:

sam delete
        Are you sure you want to delete the stack address-enrichment in the region us-east-1 ? [y/N]: y
        Are you sure you want to delete the folder address-enrichment in S3 which contains the artifacts? [y/N]: y
        - Deleting S3 object with key address-enrichment/c2710045fb8c4c4d77e47fba2f9754e4
        - Deleting S3 object with key address-enrichment/c5ca75d7c52419e4077a3c030d76d812
        - Deleting S3 object with key address-enrichment/04c2cdceeee06f8998eccf77fc6ffb9b
        - Deleting S3 object with key address-enrichment/f1e2091b2a434fd87f023b603e23fe10
        - Deleting S3 object with key address-enrichment/5a46e427cf72552a09e714f3a5c16461.template
        - Deleting Cloudformation stack address-enrichment

Deleted successfully

Method 2:

Alternatively, you can delete the AWS CloudFormation stack by logging in to the AWS Console and navigating to the AWS CloudFormation service. Select Stacks, select the stack you want to delete, and then click the Delete button at the top right.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

Contributors

amazon-auto, syedair
