more-ganesh07 / videos-analytics-with-etl-integration-on-aws

Designed and executed a secure data pipeline for managing and analyzing semi-structured data

Home Page: https://github.com/more-ganesh07/Videos-Analytics-with-ETL-Integration-on-AWS

Python 83.09% Shell 16.91%
athena aws aws-glue aws-iam aws-lambda etl-pipeline quicksight s3-bucket cloud-computing data-pipeline

Data Engineering YouTube Analysis with AWS

Overview:

This repository focuses on securely managing, streamlining, and analyzing structured and semi-structured YouTube video data, with a specific emphasis on video categories and trending metrics.

Project Goals:

  1. Data Ingestion: Develop a mechanism for ingesting data from diverse sources.
  2. ETL System: Transform raw data into the appropriate format.
  3. Data Lake: Establish a centralized repository for storing data from multiple sources.
  4. Scalability: Ensure the system scales seamlessly with increasing data size.
  5. Cloud Integration (AWS): Utilize AWS for processing large amounts of data.
  6. Reporting Dashboard: Build a dashboard to extract insights from the data.

Services Used:

  1. Amazon S3: Object storage service offering industry-leading scalability, data availability, security, and performance.
  2. AWS IAM: Identity and access management for secure access to AWS services and resources.
  3. Amazon QuickSight: Scalable, serverless, embeddable, machine learning-powered BI service for cloud-based business intelligence.
  4. AWS Glue: Serverless data integration service for discovering and preparing data for analytics and application development.
  5. AWS Lambda: Compute service that runs code without provisioning or managing servers.
  6. Amazon Athena: Interactive query service that runs SQL directly against data stored in S3, with no need to load it elsewhere first.
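To illustrate how Lambda and S3 can work together in a pipeline like this one, here is a minimal sketch of a handler that reads an S3 event notification and lists the uploaded objects. It is not the repository's actual function: the names `extract_s3_objects` and `lambda_handler` and the returned dictionary shape are assumptions chosen for illustration, and the real processing step (for example, converting the file and writing it back to S3) is left out.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications (spaces become '+').
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    """Hypothetical entry point: report which uploaded objects would be processed."""
    objects = extract_s3_objects(event)
    # A real handler would read, transform, and write each object here.
    return {"processed": [f"s3://{bucket}/{key}" for bucket, key in objects]}
```

Keeping the event parsing in its own function makes it possible to test the logic locally with a hand-built event dictionary before deploying anything to AWS.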

Dataset:

The repository includes a Kaggle dataset containing daily statistics (CSV files) of up to 200 trending YouTube videos per day, collected over several months and across multiple regions. Each region has its own file, with fields including video title, channel title, publication time, tags, views, likes, dislikes, description, and comment count. Each region also has an associated JSON file that describes its category_id values, which differ from region to region. Together, these files provide a comprehensive source for understanding the dynamics of trending YouTube videos.
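As a rough sketch of how the CSV and JSON files relate, the snippet below builds a category lookup from a category JSON document and attaches a readable category name to each CSV row. It assumes the JSON follows the YouTube API layout of an `items` list whose entries carry an `id` and a `snippet.title`, and that the CSV has a `category_id` column; the function names and the `category_name` column are hypothetical, and this is not the transformation the repository itself performs.

```python
import csv
import io
import json


def category_lookup(category_json_text):
    """Map category id (a string) to its title; assumes an 'items' list in the JSON."""
    payload = json.loads(category_json_text)
    return {item["id"]: item["snippet"]["title"] for item in payload.get("items", [])}


def attach_category_names(csv_text, lookup):
    """Yield each CSV row as a dict with an added, hypothetical 'category_name' column."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Ids missing from the region's JSON fall back to a placeholder label.
        row["category_name"] = lookup.get(row["category_id"], "Unknown")
        yield row
```

Because category ids vary by region, the lookup should be built from the JSON file that belongs to the same region as the CSV being processed.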

YT_ETL_Data_Pipeline_Project.mp4

For better visual quality and faster loading, watch the project video at the following link: https://drive.google.com/drive/folders/1q-1zNENstn8pjONrLIjXF7xAuotsgE8X?usp=drive_link
