- **Python**: any Python 3 distribution.
-
sqlite
No need of any explicit installation, comes preinstalled with python as a standard library
- **curses**: a Python standard-library module used to create interactive, rich-looking Terminal User Interfaces (TUIs). It also comes preinstalled with Python.
- **sqlitebrowser** (optional): an open-source tool to browse and manipulate SQLite databases. The whole project follows a TUI, but if you prefer a Graphical User Interface (GUI) you can install sqlitebrowser from their GitHub repo given below.
- **pandas**: we use pandas to organise and format the collected data into DataFrames, which makes it easy to convert the data into database tables. pandas can be installed with:

```
pip install pandas
```
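To illustrate the DataFrame-to-database step, here is a minimal sketch; the table name `docs` and its columns are made up for the example and are not the project's actual schema:

```python
import sqlite3

import pandas as pd

# Hypothetical scraped rows; the real notebook builds these from the Python docs.
df = pd.DataFrame({
    "topic": ["list.append", "dict.get"],
    "description": [
        "Add an item to the end of the list.",
        "Return the value for key if key is in the dictionary.",
    ],
})

# Write the DataFrame straight into a SQLite table.
# Use "python_doc.db" instead of ":memory:" to get a file on disk.
conn = sqlite3.connect(":memory:")
df.to_sql("docs", conn, if_exists="replace", index=False)

# Read it back to confirm the round trip.
rows = conn.execute("SELECT topic FROM docs ORDER BY topic").fetchall()
```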
- **Alacritty** (a cross-platform terminal emulator): this should not really be a dependency, and the app itself runs fine on any terminal emulator. However, the admin functions (`add_admin.py`, `remove_admin.py` and `verify_admin.py`) are launched through Alacritty and will not work if it is not installed. I was not able to find a generic way for those scripts to open whatever default terminal emulator is present on the machine instead of depending on Alacritty explicitly; as soon as I get this issue resolved, I will update the scripts accordingly.
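Launching a script in a new Alacritty window boils down to something like the sketch below (the helper names are mine, not the project's actual code; `alacritty -e CMD` runs CMD inside a fresh terminal window):

```python
import shutil
import subprocess


def admin_command(script):
    """Build the command that opens `script` in a new Alacritty window."""
    return ["alacritty", "-e", "python3", script]


def run_admin(script):
    # Fail early with a clear message instead of a raw FileNotFoundError
    # when Alacritty is missing (the limitation described above).
    if shutil.which("alacritty") is None:
        raise RuntimeError("Alacritty is not installed; the admin scripts depend on it.")
    subprocess.Popen(admin_command(script))
```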
- **BeautifulSoup**: the `BeautifulSoup` library is used to scrape data from the web and to obtain well-formatted HTML pages. It provides a rich arsenal of methods and functions with which the required data can be extracted from any web page very easily. Without `BeautifulSoup` we would have to fix broken HTML pages from the web by hand and filter each page for the required data ourselves. We could do that with regular expressions via the standard `re` module, but that can be a really tedious job, so we use `BeautifulSoup` instead. It can be installed with:

```
pip install beautifulsoup4
```
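A minimal example of the kind of extraction BeautifulSoup makes easy (the HTML here is a made-up stand-in for a documentation page, not a real one from the Python docs):

```python
from bs4 import BeautifulSoup

# A tiny stand-in for a scraped documentation page.
html = """
<html><body>
  <h2 id="list.append">list.append(x)</h2>
  <p>Add an item to the end of the list.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")  # or "lxml", if installed
heading = soup.find("h2").get_text()       # text of the first <h2> tag
description = soup.find("p").get_text()    # text of the first <p> tag
```

Doing the same with `re` would mean writing (and debugging) a pattern for every tag of interest; with BeautifulSoup it is one `find` call each.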
-
A html parser ( preferably
lxml
)Beautiful soup provides us a default html parser but that's not very efficient and can provide unwanted parsed data for some large and complexly written html pages
It can even fail to detect the data if the html page is broken, So I suggest to install another parser named
lxml
I have used
lxml
parser in thescraper.ipynb
file which basically scrapes the data and stores it into a sqlite data base, you can just remove thelxml
parameter given tobeautifulsoup
and leave the place as blank;beautifulsoup
will use the default parser i.e.,html
parser that comes preinstalled with it
```python
import urllib.request
from bs4 import BeautifulSoup as bs

for url in urls:  # urls is defined earlier in the notebook
    page = urllib.request.urlopen(url)
    soup = bs(page, "lxml")  # drop "lxml" to fall back to html.parser
```
`lxml` can be installed with:

```
pip install lxml
```
There are five Python scripts and one Python notebook in total:

- The Python notebook `scraper.ipynb` contains the code to scrape the official Python documentation. Once you run all of its cells, you will get a database named `python_doc.db` in your current working directory.
- `tui.py` is the frontend code for our desktop app. The Python `curses` module is used to design the terminal frontend.
- `traverse.py` contains the code that connects the backend with the frontend. It contains functions which return data corresponding to the parameters they receive.
- Three files provide the admin functionality (adding an admin, removing an admin, and verifying a user as an admin): `add_admin.py`, `remove_admin.py` and `verify_admin.py` respectively.
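The `traverse.py`-style lookup functions presumably take a parameter and query `python_doc.db` for the matching rows. A sketch of that pattern, using a hypothetical `docs(topic, description)` table (the real schema may differ):

```python
import sqlite3


def get_description(conn, topic):
    """Return the description stored for `topic`, or None if it is absent."""
    row = conn.execute(
        "SELECT description FROM docs WHERE topic = ?", (topic,)
    ).fetchone()
    return row[0] if row else None


# Demo on an in-memory database standing in for python_doc.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (topic TEXT, description TEXT)")
conn.execute(
    "INSERT INTO docs VALUES (?, ?)",
    ("list.append", "Add an item to the end of the list."),
)
```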
There is no need to install this app; simply clone the repository and you are good to go.
Clone the repository:

```
git clone https://github.com/Raj-Pansuriya/scrape2database.git
```

Change your working directory to the cloned repo:

```
cd scrape2database
```
To run the program, first run every cell of `scraper.ipynb` so that you have a `python_doc.db` ready to be traversed. Once you have done that, run `tui.py`.
If you want to use the admin-privilege functions, a few credentials are already present in the `admin.db` database; you can use user `user` and password `password` to get admin privileges. Once you have that power, you can add your own name and password and then log in with those credentials.
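The check performed by `verify_admin.py` presumably boils down to something like the sketch below. The `admins(user, password)` schema is my assumption, and a real implementation should store password hashes rather than plain text:

```python
import sqlite3


def is_admin(conn, user, password):
    """Return True if (user, password) matches a row in the admins table."""
    row = conn.execute(
        "SELECT 1 FROM admins WHERE user = ? AND password = ?",
        (user, password),
    ).fetchone()
    return row is not None


# Demo with the default credentials mentioned above, in an in-memory database
# standing in for admin.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admins (user TEXT, password TEXT)")
conn.execute("INSERT INTO admins VALUES ('user', 'password')")
```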
- `Enter` or `Return` to get into a menu
- `Backspace` to go back to the previous menu
The app has some Linux-specific and environment-specific dependencies, so some of the functions may not work as expected on a Windows machine (at least for now).