
business_scraping's Introduction

Web Scraping Project - 'https://www.marocannuaire.org/'

Description

This project scrapes business data from the website 'https://www.marocannuaire.org/' and saves it in a JSON file.

Installation

To install the project, you need to clone the repository and install the requirements:

    git clone https://github.com/Soufiane-Majdar/Business_Scraping.git
    cd Business_Scraping
    pip install -r requirements.txt

Usage

To use the project, first run the get_business_urls.py file to collect the URLs of all the businesses on the website, then run the get_business_data.py file to fetch the data of each business and save it in a JSON file.

get_business_urls.py URL explanation

The get_business_urls.py file needs a URL to start scraping from. The URL must follow the format of this example: "https://www.marocannuaire.org/Annuaire/activite_ville.php?pageNum_re_aff_dernier_anscri_index=0&totalRows_re_aff_dernier_anscri_index=80&activite=Restaurants&ville=RABAT%20SALE"

In this URL, the page index parameter has to change to reach every page of results: index=0 -> index=1 -> index=2 -> ... -> index=number of pages

So the URL on line 23 of the script needs to look like this, with {i} standing for the page index: "https://www.marocannuaire.org/Annuaire/activite_ville.php?pageNum_re_aff_dernier_anscri_index={i}&totalRows_re_aff_dernier_anscri_index=80&activite=Restaurants&ville=RABAT%20SALE"

Then run the get_business_urls.py file:

    python get_business_urls.py
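
The page-index substitution described above can be sketched as follows. This is a minimal illustration, not the code from get_business_urls.py: the names URL_TEMPLATE, build_page_urls and last_index are hypothetical, and the template simply reuses the example URL with {i} in place of the page index.

```python
# Hypothetical sketch: expand the {i} placeholder into one URL per results page.
# URL_TEMPLATE reuses the example URL from this README; only {i} changes.
URL_TEMPLATE = (
    "https://www.marocannuaire.org/Annuaire/activite_ville.php"
    "?pageNum_re_aff_dernier_anscri_index={i}"
    "&totalRows_re_aff_dernier_anscri_index=80"
    "&activite=Restaurants&ville=RABAT%20SALE"
)


def build_page_urls(last_index):
    """Return the page URLs for index 0 through last_index, inclusive."""
    return [URL_TEMPLATE.format(i=i) for i in range(last_index + 1)]


if __name__ == "__main__":
    # last_index=2 is an arbitrary value chosen for the demonstration.
    for url in build_page_urls(2):
        print(url)
```

To scrape another activity or city, the activite and ville values in the template would change in the same way as in the example URL.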

Run the get_business_data.py file

You can run the get_business_data.py file without any changes:

    python get_business_data.py

Finally, run the clean_json.py file

This file cleans the JSON file by removing Unicode escape sequences such as \u00e9 and \u00e0 (the JSON-escaped forms of é and à):

    python clean_json.py
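
The implementation of clean_json.py is not shown here. As a related sketch, assuming the goal is a file without \uXXXX sequences, the standard json module can rewrite a file so that escaped characters appear as real text. Note that this decodes the escapes instead of deleting them, which may differ from what clean_json.py does. The function name and the file paths are hypothetical.

```python
import json


def rewrite_without_escapes(src_path, dst_path):
    """Re-save a JSON file so \\uXXXX escapes become literal characters."""
    # json.load decodes escapes such as \u00e9 into the character é.
    with open(src_path, encoding="utf-8") as src:
        data = json.load(src)
    # ensure_ascii=False writes non-ASCII characters as-is instead of escaping them.
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(data, dst, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Hypothetical file names; adjust to the file the scraper actually produces.
    rewrite_without_escapes("businesses.json", "businesses_clean.json")
```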
