data-engineering-portfolio's Introduction

🔥 Data Engineering Resources and Projects

Data Engineering Workflow

[Screenshot: Data Engineering Workflow, 2024-01-26]

Big Data Architecture

[Screenshot: Big Data Architecture, 2024-01-26]

📚 Books

🧰 Tools for Data Engineers

  • Basic Skills: Linux, Git & GitHub, Computer Networking, Cloud Computing, Network & Security, Agile Development

  • Advanced Skills (Good to Know): Data Lake & Data Warehouse Concepts, REST APIs, Databases (SQL & NoSQL)

  • Programming Languages: Python, SQL, Java, Scala

  • Databases: PostgreSQL, MongoDB, Neo4j, Redis, Cassandra, Apache HBase, Snowflake, InfluxDB

  • Data Ingestion: Apache Kafka, Flume, Logstash, Airbyte, Apache Spark, Talend, Informatica

  • Data Transformation: Python, Pandas, SQL, Apache Spark, Hive, dbt, Matillion, Pig

  • Data Preprocessing: Apache Spark, Apache Hadoop, Apache Flink

  • Data Orchestration: Apache Airflow, Luigi

  • Data Storage: Data Lake (AWS S3, Azure Blob Storage, Google Cloud Storage); Data Warehouse (Snowflake, Google BigQuery, Amazon Redshift, Apache Hive)

  • Data Visualization: Tableau, Power BI, Looker

  • DataOps: Docker, Kubernetes, Jenkins
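
The stages listed above (ingestion, transformation, orchestration, storage) can be sketched end to end with only the Python standard library. This is a minimal illustration, not a recommended stack: `graphlib.TopologicalSorter` stands in for an orchestrator such as Airflow, in-memory SQLite stands in for a warehouse, and the sample CSV, the `readings` table, and all function names are hypothetical.

```python
import csv
import io
import sqlite3
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical sample input standing in for an ingested source (made-up values).
RAW_CSV = "city,temp_c\nPune,31.5\nDelhi,\nMumbai,29.0\n"

state = {}  # shared hand-off between tasks, kept simple for the sketch


def extract():
    # Ingestion: parse the raw CSV text into a list of dict rows.
    state["rows"] = list(csv.DictReader(io.StringIO(RAW_CSV)))


def transform():
    # Transformation: drop rows with a missing reading and cast to float.
    state["clean"] = [
        (r["city"], float(r["temp_c"])) for r in state["rows"] if r["temp_c"]
    ]


def load():
    # Storage: write the cleaned rows to an in-memory SQLite table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (city TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", state["clean"])
    conn.commit()
    state["conn"] = conn


TASKS = {"extract": extract, "transform": transform, "load": load}
# Orchestration: each task maps to the set of tasks it depends on.
DEPENDENCIES = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}


def run_pipeline():
    # Run every task once, in an order that respects the dependencies.
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        TASKS[name]()
    return order


order = run_pipeline()
print(order)
print(state["conn"].execute("SELECT city, temp_c FROM readings ORDER BY city").fetchall())
```

Real orchestrators add scheduling, retries, and monitoring on top of this dependency ordering; the sketch only shows how the stages hand data to one another.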

Here's what's on the menu: 👇

  • Python
  • SQL
  • MySQL
  • MongoDB
  • PySpark
  • Bash
  • Airflow
  • Apache Kafka
  • Git
  • GitHub
  • CI/CD basics
  • Data Warehousing
  • dbt
  • Data Lakes
  • Databricks
  • Azure Databricks
  • Snowflake
  • Apache NiFi
  • Debezium
  1. Master Python: https://lnkd.in/d-pZPyf5

  2. Learn SQL: https://lnkd.in/dzAiRF-x

  3. Get hands-on with MySQL: https://lnkd.in/ddpSkUhc

  4. Dive into MongoDB: https://lnkd.in/dHQ4VC2E

  5. Master PySpark: https://lnkd.in/d7fgs7dE

  6. Discover Bash, Airflow & Kafka: https://lnkd.in/dDhuEqQE

  7. Master Git & GitHub: https://lnkd.in/dqJ7J3kN

  8. Understand CI/CD basics: https://lnkd.in/dcfKBmCa

  9. Decode Data Warehousing: https://lnkd.in/dPVRDJT5

  10. Learn dbt: https://lnkd.in/eG9eaEuE

  11. Understand Data Lakes: https://lnkd.in/dtZKJ4d6

  12. Explore Databricks: https://lnkd.in/dCBiQXPR

  13. Learn Azure Databricks: https://lnkd.in/dzmwBs4Y

  14. Master Snowflake: https://lnkd.in/dDBeddVy

  15. Explore Apache NiFi: https://lnkd.in/de7bvnSt

📙 Projects

Sr. No. | Project | Description | Tech Stack | Tags | Code Link
01 | Build ETL Pipeline Using AWS Cloud
02 | Covid Data Analysis Project
03 | Twitter Data Pipeline using Airflow and AWS
04 | YouTube Data Analysis (End-To-End Data Engineering Project)
05 | Olympic Data Analytics: End-To-End Azure Data Engineering Project
06 | Uber Data Analytics Project On GCP
07 | Data Ingestion and ETL Pipeline using Azure
08 | Indian Stock Market Real-Time Data Processing, Analysis & Visualization using Azure Stream Analytics
09 | Simple Stock Market ETL Process with SQL

🔶 Free Learning Resources

Tool | Link | Used for | Official Docs | YouTube
DBMS | MySQL, MongoDB
SQL | https://lnkd.in/dzAiRF-x
Python | https://lnkd.in/d-pZPyf5
Linux
Data Warehouse & Lake Concepts | Data Warehouse, Data Lakes
Data Pipelines
dbt | https://lnkd.in/eG9eaEuE
PySpark | https://lnkd.in/d7fgs7dE
Kafka
Apache NiFi | https://lnkd.in/de7bvnSt
Airflow
Databricks | https://lnkd.in/dCBiQXPR
Snowflake | https://lnkd.in/dDBeddVy
Cloud Computing Concepts
Distributed Systems Fundamentals
AWS
Azure
GCP
Git & GitHub | https://lnkd.in/dqJ7J3kN
CI/CD | https://lnkd.in/dcfKBmCa
Jenkins
GitHub Actions
Terraform
SonarQube
Docker
Kubernetes
Power BI
Tableau
Apache Superset
Prometheus
Grafana
Datadog

💼 Read Real-World Case Studies -> Tech Blogs

  1. Netflix - https://netflixtechblog.medium.com/
  2. AWS - https://aws.amazon.com/solutions/case-studies/
  3. GCP - https://cloud.google.com/customers
  4. Azure - https://azure.microsoft.com/en-us/resources/customer-stories/
  5. Spotify - https://engineering.atspotify.com/category/data/
  6. MongoDB - https://www.mongodb.com/blog/all
  7. Swiggy
     • https://bytes.swiggy.com/the-swiggy-delivery-challenge-part-one-6a2abb4f82f6
     • https://bytes.swiggy.com/swiggy-distance-service-9868dcf613f4
     • https://bytes.swiggy.com/the-tech-that-brings-you-your-food-1a7926229886
  8. Zomato - https://blog.zomato.com/

data-engineering-portfolio's People

Contributors

cybergeekgyan
