

ETL-Operation_Glue_S3_Athena_AWS QuickSight

Overview:

Perform an ETL operation with AWS Glue on CSV data stored in an S3 bucket: run a crawler to create a table in the Glue Data Catalog, query that table with Athena, and finally build visualizations with QuickSight.

[Diagram: glue.drawio]

Task Details

Sign in to the AWS Management Console, then:

  1. Upload a CSV file to an S3 bucket

  2. Create a Glue crawler

  3. Create an IAM role to be used by the crawler

  4. Create a Glue job

  5. Run the crawler to create a table

  6. Use Athena to query the tables or to create views from them

  7. Use the Athena tables to create visualizations in QuickSight
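Task 3 (the crawler's IAM role) can also be scripted. The following is a minimal sketch that assumes boto3's IAM client. The role name and bucket name are hypothetical. The role trusts `glue.amazonaws.com` and has the AWS managed `AWSGlueServiceRole` policy attached. It also gets an inline read policy for the data bucket, because the managed policy alone only covers buckets whose names start with `aws-glue-`.

```python
import json

# Trust policy that lets the AWS Glue service assume the role.
GLUE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS managed policy with the baseline permissions Glue crawlers and jobs need.
GLUE_SERVICE_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"


def s3_read_policy(bucket: str) -> dict:
    """Inline policy granting read access to the (hypothetical) CSV bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


def create_crawler_role(iam, role_name: str, bucket: str) -> str:
    """Create the crawler role; `iam` is a boto3 IAM client. Returns the role ARN."""
    resp = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(GLUE_TRUST_POLICY),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=GLUE_SERVICE_POLICY_ARN)
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=f"{role_name}-s3-read",
        PolicyDocument=json.dumps(s3_read_policy(bucket)),
    )
    return resp["Role"]["Arn"]
```

Usage, with hypothetical names: `create_crawler_role(boto3.client("iam"), "glue-csv-crawler-role", "my-csv-bucket")`. The client is passed in rather than created inside the function, so the logic can be tested without AWS credentials.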

Glue:

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it simple and cost-effective to categorize data.

Glue is completely serverless: you don't have to manage any infrastructure yourself.

Glue has five components:

→ The first is a central metadata repository, known as the AWS Glue Data Catalog.

→ The second is Glue crawlers, which scan data and populate the Data Catalog.

→ The third is the ETL engine, which automatically generates Python or Scala code for your transformations and data enrichment.

→ The fourth is Glue triggers, which act as a scheduler for your jobs. A trigger can start a crawler or job when another job completes.

→ The last is Glue workflows, which let you orchestrate the different steps of your ETL process, including jobs and Glue crawlers.
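A conditional trigger that starts a crawler when a job completes can be expressed as the request body for Glue's CreateTrigger API. The following is a minimal sketch that assumes boto3. The trigger, job, and crawler names are hypothetical. The example starts a crawler after an ETL job succeeds, so the job's output gets cataloged.

```python
def conditional_trigger_request(trigger_name: str, upstream_job: str,
                                downstream_crawler: str) -> dict:
    """Keyword arguments for glue.create_trigger(): run a crawler once a job succeeds."""
    return {
        "Name": trigger_name,
        "Type": "CONDITIONAL",       # fires when the predicate below is satisfied
        "StartOnCreation": True,     # activate the trigger immediately
        "Predicate": {
            "Logical": "AND",
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": upstream_job, "State": "SUCCEEDED"}
            ],
        },
        "Actions": [{"CrawlerName": downstream_crawler}],
    }
```

Usage: `boto3.client("glue").create_trigger(**conditional_trigger_request("after-etl", "csv-etl-job", "csv-output-crawler"))`. Chaining several such triggers, or grouping them in a workflow, is how multi-step pipelines are typically wired together.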

A crawler is responsible for scanning data, which can reside in S3, DynamoDB, or a relational database. After scanning the data, it creates metadata tables in the central Glue Data Catalog.
The tables created in the Data Catalog can then be used by other AWS services such as Athena, Redshift Spectrum, and Glue ETL jobs.
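The crawler lifecycle described above (create, start, wait until idle) can be scripted. The following is a minimal sketch that assumes boto3's Glue client. The crawler name, database, and S3 path are hypothetical. `role_arn` is the crawler's IAM role. The target Data Catalog database is assumed to exist already; it can be created with `glue.create_database(DatabaseInput={"Name": ...})`. The `sleep` function is injectable, so the polling loop can be exercised without AWS.

```python
import time


def create_and_run_crawler(glue, name, role_arn, database, s3_path,
                           poll_seconds=15.0, timeout_seconds=900.0, sleep=time.sleep):
    """Create a crawler over an S3 prefix, start it, and wait until it is idle again.

    `glue` is a boto3 Glue client. Returns the status of the crawler's last run
    (e.g. "SUCCEEDED"), or "UNKNOWN" if no completed run is reported.
    """
    glue.create_crawler(
        Name=name,
        Role=role_arn,
        DatabaseName=database,  # Data Catalog database that receives the table(s)
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    glue.start_crawler(Name=name)

    waited = 0.0
    while waited < timeout_seconds:
        # Sleep first so the first check happens after the run has begun.
        sleep(poll_seconds)
        waited += poll_seconds
        crawler = glue.get_crawler(Name=name)["Crawler"]
        # State moves RUNNING -> STOPPING -> READY; READY means the run has finished.
        if crawler["State"] == "READY":
            return crawler.get("LastCrawl", {}).get("Status", "UNKNOWN")
    raise TimeoutError(f"crawler {name} did not finish within {timeout_seconds}s")
```

After the crawler reports `SUCCEEDED`, the new table appears under the chosen database in the Data Catalog and can be queried from Athena.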

Three major use cases of Athena

→ Analyze unstructured, semi-structured, and structured data stored in Amazon S3. Examples include CSV, JSON, or columnar data formats such as Apache Parquet and Apache ORC.

→ Athena integrates with Amazon QuickSight for easy data visualization. You can use Athena to generate reports or to explore data with business intelligence tools.

→ Athena integrates with the AWS Glue Data Catalog, which offers a persistent metadata store for your data in Amazon S3.
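Querying a Data Catalog table from code follows a start, poll, fetch pattern in Athena's API. The following is a minimal sketch that assumes boto3's Athena client. The database name, SQL, and S3 output location are hypothetical. Athena writes result files to the output location, which is why the project keeps a separate S3 folder for query results.

```python
import time


def run_athena_query(athena, sql, database, output_location,
                     poll_seconds=1.0, timeout_seconds=300.0, sleep=time.sleep):
    """Run `sql` against a Data Catalog database and return rows as lists of strings.

    `athena` is a boto3 Athena client; `output_location` is an S3 prefix such as a
    hypothetical "s3://my-bucket/athena-results/".
    """
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    waited = 0.0
    while True:
        status = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]
        state = status["State"]  # QUEUED | RUNNING | SUCCEEDED | FAILED | CANCELLED
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"query {qid} {state}: {status.get('StateChangeReason', '')}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"query {qid} still {state}")
        sleep(poll_seconds)
        waited += poll_seconds

    # First page of results only; for SELECT queries the first row holds the column names.
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    # NULL cells carry no VarCharValue key, so they come back as None.
    return [[col.get("VarCharValue") for col in row["Data"]] for row in rows]
```

The same helper can run DDL such as `CREATE OR REPLACE VIEW ... AS SELECT ...`, which is one way to create views out of crawled tables. Larger result sets would need the `NextToken` pagination of `get_query_results`.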

AWS QuickSight

Amazon QuickSight is a fully managed, fast, cloud-powered business intelligence service. It lets you easily create and publish interactive dashboards that include machine learning (ML) insights, and deliver them to everyone in your organization.
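Connecting QuickSight to Athena is usually done in the console. The API equivalent is a data source of type `ATHENA`. The following is a minimal sketch that assumes boto3's QuickSight client. The account ID, data source ID, display name, and the `primary` workgroup are placeholders. It builds only the minimal request; the API accepts further options, such as permissions. QuickSight itself must also be allowed to access Athena and the S3 buckets, which is configured in its account settings.

```python
def athena_data_source_request(account_id: str, data_source_id: str, name: str,
                               workgroup: str = "primary") -> dict:
    """Keyword arguments for quicksight.create_data_source() backed by Athena."""
    return {
        "AwsAccountId": account_id,
        "DataSourceId": data_source_id,   # your own identifier for the data source
        "Name": name,                     # display name shown in QuickSight
        "Type": "ATHENA",
        "DataSourceParameters": {"AthenaParameters": {"WorkGroup": workgroup}},
    }


def create_athena_data_source(quicksight, account_id, data_source_id, name,
                              workgroup="primary") -> str:
    """Create the data source; `quicksight` is a boto3 QuickSight client. Returns its ARN."""
    resp = quicksight.create_data_source(
        **athena_data_source_request(account_id, data_source_id, name, workgroup)
    )
    return resp["Arn"]
```

Datasets and visuals are then built on top of this data source, pointing at the tables and views that Athena exposes from the Glue Data Catalog.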

Steps to Follow

  1. Create an S3 bucket and upload the CSV data to it
  2. Create an IAM role to be used by the crawler
  3. Create a crawler in the AWS Glue service
  4. The crawler scans the data in the S3 bucket and creates a table in the Glue Data Catalog. Athena can access this table, so we can run queries on it from Athena
  5. Create another folder in S3 where Athena will save query results
  6. Run a few queries to test that Athena can successfully query the database
  7. Load data from Athena into QuickSight to create visualizations
  8. QuickSight lets you create different types of visualizations of your choice
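Steps 1 and 5 can also be scripted. The following is a minimal sketch that assumes boto3's S3 client. The bucket name, region, file path, and the `data/` and `athena-results/` prefixes are hypothetical. The CSV goes under its own prefix so the crawler can be pointed at that folder. Athena's results use a separate prefix, so result files never land in the folder the crawler scans.

```python
from pathlib import Path


def create_bucket_and_upload(s3, bucket: str, region: str, csv_path: str,
                             data_prefix: str = "data/") -> str:
    """Create a bucket, upload a CSV under `data_prefix`, and return the crawler's S3 path.

    `s3` is a boto3 S3 client.
    """
    if region == "us-east-1":
        # us-east-1 is the default location and must not be passed as a LocationConstraint.
        s3.create_bucket(Bucket=bucket)
    else:
        s3.create_bucket(Bucket=bucket,
                         CreateBucketConfiguration={"LocationConstraint": region})
    key = data_prefix + Path(csv_path).name
    s3.upload_file(csv_path, bucket, key)
    # Target the folder rather than the single file so later CSVs are crawled too.
    return f"s3://{bucket}/{data_prefix}"


def athena_results_location(bucket: str, prefix: str = "athena-results/") -> str:
    """Separate S3 folder where Athena saves query results (step 5)."""
    return f"s3://{bucket}/{prefix}"
```

The returned data path is what a crawler's S3 target would use. The results location is what Athena's query settings (or `ResultConfiguration.OutputLocation` in the API) would use.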
