
quantumudit / analyzing-yell-cafes


This project scrapes data on cafes and coffee shops in London, England, from the Yellow Pages (Yell.com) website, performs the necessary transformations on the scraped data, and then analyzes and visualizes it using Jupyter Notebook and Power BI.

License: Other

Python 100.00%
python data-analysis data-visualization jupyter-notebook power-bi webscraping data-transformation etl data-science




Scraping & Analyzing top café & coffee shops in London from Yell.com website with Python and Power BI



Overview

This project scrapes the top cafés and coffee shops in London, along with their associated metrics, from Yell.com, then performs exploratory data analysis to generate insights and visualizes them with Power BI.

The repository directory structure is as follows:

Analyzing-Yell-Cafes
├─ 01_WEBSCRAPING
├─ 02_ETL
├─ 03_DATA
├─ 04_ANALYSIS
├─ 05_DASHBOARD
└─ 06_RESOURCES

The type of content present in the directories is as follows:

01_WEBSCRAPING

This directory contains the Python script that scrapes data from the website, along with the flat file containing the scraped data.
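The project's script uses Requests-HTML to download pages. The sketch below leaves out the network step and illustrates only the parsing and flat-file stages, using Python's standard-library `html.parser` and `csv` modules instead. The markup, the class names (`listing`, `name`, `rating`) and the `ListingParser` / `to_flat_file` helpers are hypothetical; they do not reflect Yell.com's actual HTML or the author's code.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup: these class names are invented for illustration
# and are NOT taken from Yell.com or from the project's script.
SAMPLE_HTML = """
<div class="listing"><h2 class="name">Cafe One</h2><span class="rating">4.5</span></div>
<div class="listing"><h2 class="name">Bean There</h2><span class="rating">4.0</span></div>
"""


class ListingParser(HTMLParser):
    """Collects {"name", "rating"} records from elements with known class names."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # which field the current text node belongs to

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "listing":
            # Each listing card starts a new record.
            self.records.append({"name": "", "rating": ""})
        elif css_class in ("name", "rating") and self.records:
            self._field = css_class

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field:
            self.records[-1][self._field] += data.strip()


def to_flat_file(records):
    """Serialise the scraped records to CSV text (the 'flat file')."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "rating"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


parser = ListingParser()
parser.feed(SAMPLE_HTML)
print(to_flat_file(parser.records))
```

In a real scraper the HTML would come from the downloaded page, and the CSV text would be written to disk rather than printed.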

02_ETL

This directory contains the ETL script that takes the scraped dataset as input, transforms it and exports an analysis-ready dataset into the 03_DATA directory.
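A minimal sketch of the kind of cleaning an ETL step like this performs, assuming a raw CSV with hypothetical `name`, `rating` and `reviews` columns. The project itself uses Pandas; this version sticks to the standard library so it runs without extra dependencies. It trims whitespace, casts ratings to floats (leaving missing ones empty rather than zero), extracts review counts from text such as `(12 reviews)`, and drops exact duplicates. The column names, formats and helper functions are assumptions, not the author's implementation.

```python
import csv
import io

# Hypothetical raw extract: column names and value formats are assumptions.
RAW_CSV = """name,rating,reviews
  Cafe One ,4.5,(12 reviews)
Bean There,,(3 reviews)
  Cafe One ,4.5,(12 reviews)
"""


def parse_reviews(text):
    """Turn strings like '(12 reviews)' into 12; return 0 if there are no digits."""
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else 0


def transform(raw_text):
    """Clean raw rows: trim names, cast types, drop exact duplicates."""
    seen = set()
    clean = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        record = {
            "name": row["name"].strip(),
            # Missing ratings become None rather than a misleading 0.0.
            "rating": float(row["rating"]) if row["rating"].strip() else None,
            "reviews": parse_reviews(row["reviews"]),
        }
        key = tuple(record.values())
        if key not in seen:
            seen.add(key)
            clean.append(record)
    return clean


def export(records):
    """Write the analysis-ready records back out as CSV text (None -> empty cell)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "rating", "reviews"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


cleaned = transform(RAW_CSV)
print(export(cleaned))
```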

03_DATA

This directory contains the data that can be directly used for exploratory data analysis and data visualization purposes.

04_ANALYSIS

This directory contains the Python notebooks that analyze the clean dataset to generate insights.
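As an illustration of the sort of questions an exploratory analysis might answer, the sketch below computes the average rating per area and the top-rated cafés. The `area` field, the sample records and the `min_reviews` threshold are assumptions made for this example, not values from the project's dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical clean records; field names and values are illustrative only.
cafes = [
    {"name": "Cafe One", "area": "Soho", "rating": 4.5, "reviews": 12},
    {"name": "Bean There", "area": "Soho", "rating": 4.0, "reviews": 3},
    {"name": "Flat White Co", "area": "Shoreditch", "rating": 4.8, "reviews": 40},
    {"name": "Grind House", "area": "Shoreditch", "rating": 3.9, "reviews": 7},
]


def average_rating_by_area(records):
    """Group ratings by area and return {area: mean rating rounded to 2 dp}."""
    groups = defaultdict(list)
    for r in records:
        groups[r["area"]].append(r["rating"])
    return {area: round(mean(vals), 2) for area, vals in groups.items()}


def top_rated(records, n=2, min_reviews=5):
    """Top-n cafés by rating, ignoring ones with too few reviews to trust."""
    eligible = [r for r in records if r["reviews"] >= min_reviews]
    return sorted(eligible, key=lambda r: r["rating"], reverse=True)[:n]


print(average_rating_by_area(cafes))
print([r["name"] for r in top_rated(cafes)])
```

Filtering on a minimum review count keeps a café with a single perfect rating from outranking well-reviewed places; the threshold of 5 is arbitrary.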

05_DASHBOARD

This directory contains the Python notebook with an embedded Power BI report that visualizes the data. The Power BI dashboard includes slicers, cross-filtering and other advanced features that end users can use to explore a specific facet of the data or to get additional insights.

06_RESOURCES

This directory contains images, icons, layouts, etc. that are used in this project.

Prerequisites

The main skills required to fully understand this project are:

  • Basics of Python
  • Python libraries: Requests-HTML, Pandas, DateTime, concurrent.futures
  • Basics of Python Notebooks
  • Basics of Power BI

To complete the project, I used the following applications and libraries:

  • Python
  • Python libraries listed in the requirements.txt file
  • Jupyter Notebook
  • Visual Studio Code
  • Microsoft Power BI

The choice of applications & their installation might vary based on individual preferences & system settings.

Architecture

The project architecture is straightforward and can be explained with the image below:

Process Architecture

As the workflow above shows, we first scrape the data from the website with a Python script and collect it in a flat file, which is then processed and cleaned by a separate ETL Python script.

Finally, we use the clean, analysis-ready dataset for exploratory data analysis (EDA) in Jupyter Notebook and to build an insightful report in Power BI.

Demo

The graphic below shows the data being scraped from the website:

Scraping Graphic

The graphic shows a significant reduction in scraping time thanks to multi-threading.
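Scraping is I/O-bound: most of the time is spent waiting on the network, and Python threads can overlap those waits. The sketch below illustrates the effect with `concurrent.futures.ThreadPoolExecutor`, using a hypothetical `fetch_page` function that sleeps to simulate request latency instead of contacting Yell.com. The page count, worker count and latency are arbitrary assumptions, not the project's settings.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_page(page_number, latency=0.2):
    """Hypothetical stand-in for an HTTP request; sleep mimics network latency."""
    time.sleep(latency)
    return f"page-{page_number}"


def scrape_sequential(pages):
    """Fetch one page at a time: total time is roughly pages * latency."""
    return [fetch_page(p) for p in pages]


def scrape_threaded(pages, workers=8):
    """Fetch pages concurrently; executor.map preserves the input order."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(fetch_page, pages))


pages = range(1, 9)

start = time.perf_counter()
sequential = scrape_sequential(pages)
t_seq = time.perf_counter() - start

start = time.perf_counter()
threaded = scrape_threaded(pages)
t_thr = time.perf_counter() - start

print(f"sequential: {t_seq:.2f}s, threaded: {t_thr:.2f}s")
```

With 8 simulated pages at 0.2 s each, the sequential loop takes about 1.6 s, while the threaded version finishes in roughly the time of a single request. Against a real site, keep the worker count modest to avoid overloading the server.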

Support

If you have any questions or suggestions, please connect with me on any of the following platforms:

LinkedIn, Twitter

If you like my work, you can support me on Patreon.

License

Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA)

This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator. If you remix, adapt, or build upon the material, you must license the modified material under identical terms.
