This is a data engineering project that analyzes the latest six months of game data from Rawg.io using its public API. The raw API responses are saved locally as JSON files, processed with Python, and loaded into a MySQL database, which then serves as the source for data analysis in Tableau.
- Data Source: API Rawg.io
- Data Lake: Local Storage, saved as JSON files
- Data Transformation: Python with Pandas
- Data Warehouse: MySQL
- Data Visualization: Tableau
- Workflow Management: Apache Airflow
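As a sketch of the extract step above (API → local JSON data lake), the snippet below requests a rolling six-month window from the RAWG games endpoint and writes each page of raw JSON to disk. The endpoint and query parameters (`key`, `dates`, `page`, `page_size`) come from the public RAWG API documentation; the helper names and file layout are illustrative and not taken from this repository.

```python
import json
import urllib.parse
import urllib.request
from datetime import date, timedelta
from pathlib import Path

API_URL = "https://api.rawg.io/api/games"  # public RAWG games endpoint


def six_month_window(today: date) -> str:
    """Build the RAWG `dates` filter covering roughly the last six months."""
    start = today - timedelta(days=182)  # ~6 months
    return f"{start.isoformat()},{today.isoformat()}"


def build_request_url(api_key: str, today: date,
                      page: int = 1, page_size: int = 40) -> str:
    """Assemble a games-list URL for one page of the six-month window."""
    params = {
        "key": api_key,
        "dates": six_month_window(today),
        "page": page,
        "page_size": page_size,
    }
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def fetch_page_to_lake(api_key: str, page: int, out_dir: Path) -> Path:
    """Fetch one page of results and save the raw JSON to the local data lake."""
    url = build_request_url(api_key, date.today(), page=page)
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)
    out_path = out_dir / f"games_page_{page:03d}.json"
    out_path.write_text(json.dumps(payload, indent=2))
    return out_path
```

Keeping the raw responses untouched in the data lake means the pandas transformation step can be re-run at any time without hitting the API again.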
TBA
Set up a virtual environment using pyenv: install the required Python version, then create and activate a new virtualenv.

```shell
pyenv install 3.8.10
pyenv virtualenv 3.8.10 airflow
pyenv activate airflow
```
Install Apache Airflow by following the instructions in the official documentation. This project uses Airflow version 2.6.3.
```shell
pip install -r requirements.txt
```
Make sure MySQL is installed. Then update the database connection parameters in the `load_data_to_db.py` file to match the MySQL configuration on your computer.
```python
# Define the MySQL database connection parameters
host = 'CHANGE'
user = 'CHANGE'
password = 'CHANGE'
database = 'rawg_db'
```
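As a sketch of how these parameters could be used in the load step, the snippet below builds a SQLAlchemy engine and writes a transformed DataFrame into the warehouse with pandas. The `mysql+pymysql` driver string, the `make_engine`/`load_games` helpers, and the `rawg_games` table name are assumptions for illustration, not code from `load_data_to_db.py`.

```python
import pandas as pd
from sqlalchemy import create_engine


def make_engine(host: str, user: str, password: str, database: str):
    # Assumes the PyMySQL driver is installed (`pip install pymysql`)
    url = f"mysql+pymysql://{user}:{password}@{host}/{database}"
    return create_engine(url)


def load_games(df: pd.DataFrame, engine, table: str = "rawg_games") -> int:
    """Replace the warehouse table with the transformed DataFrame; return row count."""
    df.to_sql(table, engine, if_exists="replace", index=False)
    return len(df)
```

Because `to_sql` works with any SQLAlchemy engine, the same loader can be pointed at an in-memory SQLite database for quick local testing before running against MySQL.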
- Make sure Airflow is running smoothly.
- Copy the contents of the `dags` folder in this repository into the `dags` folder of your Airflow installation:

```shell
cp -a dags/. $AIRFLOW_HOME/dags/
```
- Open the Airflow UI and activate the `rawg_etl` DAG.
- Wait until the DAG has finished running.
- Packaging all ETL processes using Docker
- Tidying up data in the Data Warehouse using dbt
- Building infrastructure using Terraform