Data-analysis-using-AWS-services-Athena-Glue-S3-IAM-Quicksight

This is a simple end-to-end data analytics solution built on AWS services, from uploading a CSV file to an S3 bucket to visualizing the results in QuickSight. The dataset used in this project is the Data Science Job Salaries dataset from Kaggle: kaggle.com/datasets/ruchi798/data-science-job-salaries.

Objective

The main objective of this project is to identify the top 5 data science salaries in the US by job title, experience level, and employment type, along with the remote work ratio by job title.

Dataset

The dataset contains the following variables: work_year, experience_level, employment_type, job_title, salary, salary_currency, salary_in_usd, employee_residence, remote_ratio, company_location, and company_size. The data analysis process is carried out as follows:

Step 1- First, we create an IAM user and grant it access permission to S3. In the console search bar we type IAM, then go to Users > Add users.


1a- Set the user details and access type, then continue to the permissions step.


1b- We proceed by choosing "Attach existing policies directly", since we already have a policy set up.


1c- Review all the details and create the user.

1d- The user is successfully created.

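The policy attached in step 1b is not shown in this walkthrough. As a rough illustration only, the sketch below builds one possible policy document that limits the user to the two buckets created in Step 2. The choice of actions (`s3:ListBucket`, `s3:GetBucketLocation`, `s3:GetObject`, `s3:PutObject`) is an assumption, not necessarily the policy used in this project.

```python
import json

# Bucket names are taken from Step 2 of this walkthrough.
RAW_BUCKET = "data-science-salaries-bucket"
RESULT_BUCKET = "data-science-salaries-bucket-result"


def build_s3_policy(buckets):
    """Return an IAM policy document (as a dict) scoped to the given buckets."""
    bucket_arns = [f"arn:aws:s3:::{name}" for name in buckets]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arns,
            },
            {
                # Object-level actions apply to the keys inside each bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": object_arns,
            },
        ],
    }


# Assumed action set; adjust to whatever the real policy needs.
policy = build_s3_policy([RAW_BUCKET, RESULT_BUCKET])
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The printed JSON could be pasted into the IAM policy editor. Scoping the resources to the two bucket ARNs keeps the user from reaching other buckets in the account.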

Step 2- Two S3 buckets are created. The data-science-salaries-bucket will hold the raw file, while the data-science-salaries-bucket-result will hold the query results from Athena.

2a- The CSV file is uploaded to the data-science-salaries-bucket.

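The upload in this walkthrough is done through the console. For anyone who prefers to script it, a minimal sketch is shown below. It assumes a boto3 S3 client is passed in, and the local file name `ds_salaries.csv` is an assumption because the walkthrough does not name the file.

```python
def upload_csv(s3_client, local_path, bucket, key):
    """Upload a local CSV file to S3 and return its s3:// URI.

    s3_client is expected to behave like boto3.client("s3"), whose
    upload_file method takes (Filename, Bucket, Key).
    """
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


# Example call (requires boto3 and configured AWS credentials):
#   import boto3
#   upload_csv(
#       boto3.client("s3"),
#       "ds_salaries.csv",               # assumed local file name
#       "data-science-salaries-bucket",  # raw bucket from Step 2
#       "ds_salaries.csv",               # assumed object key
#   )
```

Passing the client in as an argument keeps the function easy to exercise with a stand-in object before running it against a real account.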

Step 3- Moving on to Athena. Before we can create our table, we need to choose the bucket where the query output will be sent. In the Athena query editor we select Settings > Manage > Browse S3 to choose the appropriate bucket.

3a- In the Athena data catalog, we select Create table > AWS Glue crawler > Add crawler so that the schema is retrieved from the data automatically.

3b- The crawler is successfully created.

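The crawler above is created through the console wizard. As a sketch of the equivalent scripted setup, the code below assembles the arguments that the boto3 Glue client's `create_crawler` call accepts. The crawler name, database name, and role ARN are hypothetical placeholders, since the walkthrough does not state them.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Return keyword arguments for the Glue client's create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # The crawler scans this S3 path and infers the table schema.
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


crawler_request = build_crawler_request(
    name="data-science-salaries-crawler",  # hypothetical crawler name
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder ARN
    database="data_science_salaries_db",  # hypothetical database name
    s3_path="s3://data-science-salaries-bucket/",  # raw bucket from Step 2
)

# With boto3 installed and credentials configured, this could be used as:
#   import boto3
#   glue = boto3.client("glue")
#   glue.create_crawler(**crawler_request)
#   glue.start_crawler(Name=crawler_request["Name"])
```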

Step 4- The data is queried in Athena, and the results are written to the data-science-salaries-bucket-result bucket.
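The exact queries appear only in the screenshot. As an illustration of the kind of query the objective implies, the sketch below builds one Athena SQL statement (top 5 job titles by average `salary_in_usd`) together with the arguments for the boto3 Athena client's `start_query_execution` call. The table and database names are hypothetical, and filtering on `company_location = 'US'` rather than `employee_residence` is an assumption about what "in the US" means here.

```python
# Result bucket from Step 2; Athena writes query output here.
RESULT_LOCATION = "s3://data-science-salaries-bucket-result/"


def top_salaries_query(table, limit=5):
    """Build a query for the highest average USD salaries by job title."""
    return (
        "SELECT job_title, AVG(salary_in_usd) AS avg_salary_usd\n"
        f"FROM {table}\n"
        "WHERE company_location = 'US'\n"  # assumed meaning of "in the US"
        "GROUP BY job_title\n"
        "ORDER BY avg_salary_usd DESC\n"
        f"LIMIT {int(limit)}"
    )


def build_athena_request(database, table):
    """Return keyword arguments for the Athena client's start_query_execution."""
    return {
        "QueryString": top_salaries_query(table),
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": RESULT_LOCATION},
    }


athena_request = build_athena_request(
    database="data_science_salaries_db",  # hypothetical database name
    table="data_science_salaries",        # hypothetical table name
)
print(athena_request["QueryString"])

# With boto3 installed and credentials configured:
#   import boto3
#   boto3.client("athena").start_query_execution(**athena_request)
```

Swapping `job_title` for `experience_level` or `employment_type` in the SELECT and GROUP BY clauses would give the other breakdowns named in the objective.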

Step 5- Now QuickSight needs to access S3 to build the report. Before QuickSight can read the S3 bucket, we have to make sure it has permission to do so. We open the account menu at the top right, then go to Manage QuickSight > Security & permissions > Manage and select the S3 bucket.


5a- Next, we set up a new data source to access S3 from QuickSight: New analysis > New dataset > S3 > upload the JSON manifest file > import to SPICE.
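The manifest file itself is not included in this walkthrough. A minimal sketch of generating one in the QuickSight S3 manifest format is shown below. The object key `query-result.csv` is a hypothetical placeholder (Athena names its result files after the query execution ID), and the upload settings assume a comma-delimited CSV with a header row.

```python
import json


def build_manifest(uris):
    """Return a QuickSight S3 manifest (as a dict) for the given file URIs."""
    return {
        "fileLocations": [{"URIs": list(uris)}],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": ",",
            "textqualifier": "\"",
            "containsHeader": "true",
        },
    }


manifest = build_manifest(
    # Hypothetical key inside the result bucket from Step 2.
    ["s3://data-science-salaries-bucket-result/query-result.csv"]
)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Saving the printed JSON to a local file gives something that can be uploaded in the "upload the JSON manifest file" step. Listing explicit file URIs, rather than a whole bucket prefix, avoids pulling in the `.metadata` files that Athena writes next to each result.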

5b- After the data is imported to SPICE, we create a report in QuickSight. Our interest was to identify the top 5 data science salaries in the US by job title, experience level, and employment type, along with the remote work ratio by job title.

Conclusion

AWS provides a suite of powerful tools to analyze data effectively. By using Athena, Glue, IAM, and QuickSight, businesses can gain valuable insights into their data, make informed decisions, and optimize processes.
