A web crawler that scrapes machine information from the Vultr and Hostgator websites.
This project is the result of a technical challenge described in this document.
Ensure you have Python installed on your machine. If the following command runs without errors, Python is already installed, and the command will print the version in use:
python3 --version
Python 3.10 was used for development; that version or newer is recommended.
It is also recommended to use a dedicated virtualenv for this project. Create one with the tool of your choice, or by running:
python3 -m venv .venv
and then activate it:
source .venv/bin/activate  # macOS or Linux
.venv\Scripts\activate     # Windows
Follow these steps to set up the project locally:
Clone the project
git clone https://github.com/luandadantas/jus-crawler-challenge.git
Go to the project directory
cd jus-crawler-challenge
Install the dependencies (make sure your virtualenv is activated before running this command):
pip install -r requirements.txt
Run the project
python main.py
To scrape and print the data for Vultr or Hostgator individually, run each script separately:
python scrape/hostgator.py
or
python scrape/vultr.py
To run the tests, use:
python -m pytest tests/ --disable-socket -vv
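The `--disable-socket` flag comes from the pytest-socket plugin and fails any test that tries to open a real network connection, so tests are expected to mock their HTTP calls. A minimal sketch of that pattern, using only the standard library (`fetch_plans` and its JSON shape are hypothetical, not the project's actual code):

```python
import json
import urllib.request
from unittest import mock

def fetch_plans(url, opener=None):
    """Hypothetical fetcher: download and decode JSON plan data from `url`."""
    opener = opener or urllib.request.urlopen
    with opener(url) as resp:
        return json.loads(resp.read())

def test_fetch_plans_is_offline():
    # Swap the network call for a canned response so no socket is opened.
    fake = mock.mock_open(read_data=b'[{"name": "vc2-1c-1gb", "price": 5.0}]')
    plans = fetch_plans("https://example.com/plans", opener=fake)
    assert plans[0]["price"] == 5.0

test_fetch_plans_is_offline()
```

Because the only network entry point is injectable, the test passes with sockets disabled.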
- Scrape and collect content from the Vultr website
- Print the collected Vultr data to the screen
- Previous step, plus save the Vultr content to a JSON file
- Previous step, plus save the Vultr content to a CSV file
- Repeat all previous steps for the Hostgator website
- Refactor the Vultr and Hostgator scrapers into classes
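The class-based refactor in the last item could look like the sketch below. All names here (`Plan`, `Scraper`, the field set) are hypothetical placeholders for whatever the project actually defines, and the HTML fetching/parsing is omitted; only the shared JSON/CSV saving is shown:

```python
import csv
import json
from dataclasses import dataclass, asdict, fields

@dataclass
class Plan:
    """One machine offering scraped from a provider page (hypothetical shape)."""
    name: str
    cpu: int
    ram_gb: float
    price_usd: float

class Scraper:
    """Shared base class: a subclass per provider implements scrape()."""

    def scrape(self):
        raise NotImplementedError  # e.g. VultrScraper parses vultr.com here

    def save_json(self, plans, path):
        # Serialize the dataclasses as a list of dicts.
        with open(path, "w") as f:
            json.dump([asdict(p) for p in plans], f, indent=2)

    def save_csv(self, plans, path):
        # One CSV row per plan, columns taken from the dataclass fields.
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(Plan)])
            writer.writeheader()
            writer.writerows(asdict(p) for p in plans)
```

With this layout, `main.py` would instantiate each provider's subclass, call `scrape()`, and reuse the same `save_json`/`save_csv` for both Vultr and Hostgator.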
If you have any feedback, please reach out at [email protected]