This project demonstrates extracting, transforming, and loading (ETL) data for use in data analysis.
Languages and libraries used:
- Python
- Pandas
- SQLAlchemy
- Postgres
Clone the repository to your desktop and do the following:

- Navigate to the cloned folder and launch Git Bash (Windows) or Terminal (Mac).
- Type `source activate PythonData` and hit ENTER.
- Launch Jupyter Notebook and open the file `etl_project.ipynb`. This file contains both the Python code and the required Technical Report.
- Observe the Data Extraction section, which shows that the 2 CSV files were successfully imported.
- Observe the Data Cleanup section, which shows the cleanup steps: removing unwanted columns, renaming columns, and deleting unwanted rows. This results in 2 cleaned DataFrames.
- In the Jupyter Notebook, create the database connection by inserting your username and password into this code: `conn = "<insert user name>:<insert password>@localhost:5432/etl_5"`. Verify that the connection was made by confirming the two tables are shown.
- Launch pgAdmin 4 and open the file `queries.sql` to run the queries on the tables `world_happiness` and `annual_work_hours`.
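The extraction and cleanup steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names and sample values are hypothetical stand-ins, and an inline string stands in for the project's CSV files.

```python
import pandas as pd
from io import StringIO

# Inline stand-in for one of the project's CSV files (hypothetical columns).
csv_text = """Country,Happiness Score,Standard Error
Finland,7.8,0.03
Denmark,,0.02
"""

# Extraction: read the CSV into a DataFrame.
raw = pd.read_csv(StringIO(csv_text))

# Cleanup: drop an unwanted column, rename the rest, delete incomplete rows.
cleaned = (
    raw.drop(columns=["Standard Error"])
       .rename(columns={"Country": "country", "Happiness Score": "happiness_score"})
       .dropna()
)
```

Chaining `drop`, `rename`, and `dropna` keeps the cleanup readable as a single pipeline; the notebook may of course perform the same operations step by step.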
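The connection-and-load step might look like the sketch below. Note the assumptions: an in-memory SQLite database stands in for Postgres so the sketch runs anywhere (against the real database the URL would be `postgresql://<insert user name>:<insert password>@localhost:5432/etl_5`), and the two DataFrames are tiny hypothetical stand-ins for the notebook's cleaned results.

```python
import pandas as pd
from sqlalchemy import create_engine, inspect

# Stand-in for: create_engine("postgresql://" + conn) with your credentials.
engine = create_engine("sqlite://")

# Hypothetical cleaned DataFrames standing in for the notebook's results.
happiness = pd.DataFrame({"country": ["Finland"], "happiness_score": [7.8]})
hours = pd.DataFrame({"country": ["Finland"], "avg_annual_hours": [1531]})

# Load each DataFrame into its own table.
happiness.to_sql("world_happiness", engine, index=False)
hours.to_sql("annual_work_hours", engine, index=False)

# Verify the connection was made by listing the two tables.
tables = inspect(engine).get_table_names()
print(tables)
```

`inspect(engine).get_table_names()` works on both SQLAlchemy 1.4 and 2.0, which is the usual way to confirm the two tables exist after loading.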
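The queries in `queries.sql` run against the loaded tables; the sketch below shows the general shape such a query could take. The join condition, column names, and sample rows are assumptions for illustration (again using in-memory SQLite as a stand-in for the `etl_5` Postgres database), not the project's actual queries.

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory stand-in for the etl_5 database, seeded with hypothetical rows.
engine = create_engine("sqlite://")
pd.DataFrame({"country": ["Finland", "Denmark"],
              "happiness_score": [7.8, 7.6]}).to_sql(
    "world_happiness", engine, index=False)
pd.DataFrame({"country": ["Finland"],
              "avg_annual_hours": [1531]}).to_sql(
    "annual_work_hours", engine, index=False)

# A join across the two tables, of the kind queries.sql might contain.
query = """
SELECT w.country, w.happiness_score, a.avg_annual_hours
FROM world_happiness AS w
JOIN annual_work_hours AS a ON w.country = a.country
"""
result = pd.read_sql(query, engine)
print(result)
```

In pgAdmin 4 you would run the SQL text directly in the Query Tool; `pd.read_sql` is shown here only so the sketch is self-contained.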