Statewise stats of COVID-19 cases from the official website of MHRD India.
- BeautifulSoup: a library for pulling data out of HTML and XML files.
- Requests: a library for making HTTP requests in Python.
- GeoPandas: a library for working with geospatial data in Python.
- PrettyTable: a quick and easy way to represent tabular data in visually appealing ASCII tables.

The project also uses other common packages such as Pandas, Matplotlib, and Seaborn.
(Note that GeoPandas further depends on fiona for file access and on descartes and matplotlib for plotting.)
Run `pip install -r requirements.txt` to install the packages on your local machine.
Here is the basic idea behind web scraping, in case you want to adapt this project to a similar use case:
- Send an HTTP GET request to the URL of the webpage you want to scrape; the server responds with the HTML content. The Requests library for Python handles this.
- Analyze the HTML tags and their attributes, such as class and id, to identify the tags where your content lives.
- Fetch and parse the data using the BeautifulSoup library, and keep it in a data structure such as a dictionary or list.
- Output the data in a file format such as CSV, XLSX, or JSON, or use the tabulated data to make visualizations with Seaborn or Matplotlib.
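
The four steps above can be sketched roughly as follows. This is an illustrative outline, not this project's actual scraper: the function names (`fetch_html`, `parse_table`, `rows_to_csv`), the `stats` table class, and the sample markup are all assumptions made for the example, and a real page will need its own selectors.

```python
import csv
import io

import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Step 1: send an HTTP GET request and return the HTML as text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx responses
    return response.text


def parse_table(html, table_class):
    """Steps 2-3: find the table by its class and collect rows as dicts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=table_class)
    if table is None:
        return []
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Skip the header row and any row that does not match the header.
        if cells and len(cells) == len(headers):
            rows.append(dict(zip(headers, cells)))
    return rows


def rows_to_csv(rows):
    """Step 4: serialize the parsed rows to CSV text."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical markup standing in for a real page; values are placeholders.
SAMPLE_HTML = """
<table class="stats">
  <tr><th>State</th><th>Cases</th></tr>
  <tr><td>State A</td><td>10</td></tr>
  <tr><td>State B</td><td>20</td></tr>
</table>
"""

rows = parse_table(SAMPLE_HTML, "stats")
print(rows_to_csv(rows))
```

To run this against a live page, pass the result of `fetch_html(url)` to `parse_table` in place of `SAMPLE_HTML`. Keeping the fetch and parse steps in separate functions means the parsing logic can be tried on saved HTML without making network requests.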