
rawg-etl

Data Engineering Project - Latest Games from Rawg.io API


Objectives

This data engineering project collects the latest six months of game data from Rawg.io through the API the site provides. The data is processed and stored in a MySQL database, which then serves as the source for analysis in Tableau.
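As a rough illustration of the "latest six months" window, the sketch below computes a start date six calendar months before a given day and builds a request URL from it. The endpoint, the `key`, `dates` and `page` query parameters, and the helper names `months_back` and `build_games_url` are assumptions made for this example; they are not taken from this repository, so check them against the Rawg.io API documentation.

```python
import calendar
from datetime import date
from urllib.parse import urlencode

# Assumed endpoint; verify against the Rawg.io API documentation.
RAWG_GAMES_URL = "https://api.rawg.io/api/games"


def months_back(day: date, months: int) -> date:
    """Return the date `months` calendar months before `day`.

    The day of month is clamped, so 31 August minus six months
    becomes the last day of February.
    """
    index = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def build_games_url(api_key: str, end: date, months: int = 6, page: int = 1) -> str:
    """Build a games request URL covering the `months` before `end` (hypothetical helper)."""
    start = months_back(end, months)
    params = {
        "key": api_key,
        "dates": f"{start.isoformat()},{end.isoformat()}",
        "page": page,
    }
    return f"{RAWG_GAMES_URL}?{urlencode(params)}"
```

Calling `build_games_url("YOUR_KEY", date.today())` would give the URL for the first page of the current window; fetching and paging through the results is left out of this sketch.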

Project architecture

[Figure: project architecture diagram]

  • Data Source: Rawg.io API
  • Data Lake: Local Storage, saved as JSON files
  • Data Transformation: Python with Pandas
  • Data Warehouse: MySQL
  • Data Visualization: Tableau
  • Workflow Management: Apache Airflow
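The step from the data lake to the transformation layer can be sketched roughly as follows: read every JSON file saved locally and flatten each game record into one flat row. This is a standard-library stand-in for illustration, not the project's actual Pandas code. The `results` key and the `id`, `name`, `released`, `rating` and `genres` fields are assumptions about the shape of the saved API responses, and `flatten_game` and `load_rows` are hypothetical names.

```python
import json
from pathlib import Path


def flatten_game(game: dict) -> dict:
    """Reduce one nested game record to a flat row (field names are assumed)."""
    return {
        "id": game.get("id"),
        "name": game.get("name"),
        "released": game.get("released"),
        "rating": game.get("rating"),
        # Nested genre objects are joined into a single comma-separated string.
        "genres": ", ".join(g["name"] for g in game.get("genres") or []),
    }


def load_rows(lake_dir: Path) -> list:
    """Read every *.json file in the data lake folder and return flat rows."""
    rows = []
    for path in sorted(lake_dir.glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            payload = json.load(fh)
        rows.extend(flatten_game(game) for game in payload.get("results", []))
    return rows
```

The resulting list of dictionaries can be passed to `pandas.DataFrame(rows)` for further cleaning before it is loaded into MySQL.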

Result - Data Visualization

TBA

Setup and Running

Setting up and Activating the Virtual Environment

Install the required Python version with pyenv, then create and activate a dedicated virtual environment for Airflow. The pyenv virtualenv command requires the pyenv-virtualenv plugin.

pyenv install 3.8.10
pyenv virtualenv 3.8.10 airflow
pyenv activate airflow

Installing Apache Airflow

Install Apache Airflow by following the instructions in the official documentation. I used Airflow version 2.6.3.

Installing requirements.txt

pip install -r requirements.txt

Configuring MySQL

Make sure MySQL is installed and running. Then update the database connection parameters in the load_data_to_db.py file to match the MySQL configuration on your computer.

# Define the MySQL database connection parameters
host = 'CHANGE'
user = 'CHANGE'
password = 'CHANGE'
database = 'rawg_db'
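One possible alternative to editing the file by hand is to read the connection parameters from environment variables, which keeps credentials out of version control. The sketch below is only a suggestion and not how load_data_to_db.py currently works; the variable names `RAWG_DB_HOST`, `RAWG_DB_USER`, `RAWG_DB_PASSWORD` and `RAWG_DB_NAME`, the `localhost` default, and the `load_db_config` helper are all hypothetical.

```python
import os
from typing import Mapping


def load_db_config(env: Mapping[str, str] = os.environ) -> dict:
    """Collect MySQL connection parameters from environment variables.

    User and password are required and raise KeyError when missing;
    host and database fall back to defaults.
    """
    return {
        "host": env.get("RAWG_DB_HOST", "localhost"),
        "user": env["RAWG_DB_USER"],
        "password": env["RAWG_DB_PASSWORD"],
        "database": env.get("RAWG_DB_NAME", "rawg_db"),
    }
```

The returned dictionary uses the same four names as the parameters above, so it could be unpacked into whichever MySQL client the load script uses.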

Run pipelines

  1. Make sure Airflow is up and running.
  2. Copy the contents of the dags folder in this repository into the dags folder of your Airflow installation.
cp -a dags/. "$AIRFLOW_HOME/dags/"
  3. Open the Airflow UI and activate the rawg_etl DAG.
  4. Wait until the DAG has finished running.

Future Work

  • Packaging all ETL processes using Docker
  • Tidying up data in the Data Warehouse using dbt
  • Building infrastructure using Terraform
