Web Scraper is a Django application that scrapes the information from a given web page and retrieves a list of all the links on that page.
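The core idea of collecting every link on a page can be illustrated with a short sketch. The project itself uses beautifulsoup4 for scraping; to keep this example dependency-free it swaps in Python's standard-library `html.parser` instead. The names `LinkCollector` and `extract_links` are hypothetical and do not come from the project's source.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url.

    Hypothetical helper for illustration; not part of the project.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/about" into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in html, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<p><a href="/about">About</a> <a href="https://example.org/x">X</a></p>'
    print(extract_links(sample, "https://example.com/"))
```

With beautifulsoup4 the equivalent lookup is roughly `[a["href"] for a in BeautifulSoup(html, "html.parser").find_all("a", href=True)]`, which returns the raw, unresolved `href` values.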
-
Clone the repository:
git clone https://github.com/atiliopereira/webscraper.git
-
Install pipenv system-wide or locally, but outside any virtualenv. For example:
pip install -U pip
pip install pipenv
-
Navigate to the project directory:
cd webscraper
-
Install the project dependencies using pipenv:
pipenv install
-
Activate the pipenv shell:
pipenv shell
-
Run the migrations:
python manage.py migrate
-
Start the server:
python manage.py runserver
The application will be available at http://127.0.0.1:8000/.
To access the user registration and login pages, navigate to http://127.0.0.1:8000/accounts/.
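The `/accounts/` prefix is the conventional mount point for django-allauth's views. A typical URLconf that wires it up looks like the fragment below; this is an assumption about how the project is configured, not taken from its source, and the project's actual `urls.py` may differ.

```python
# urls.py (hypothetical sketch; the project's actual URLconf may differ)
from django.contrib import admin
from django.urls import include, path

urlpatterns = [
    # Django admin, used by the superuser described below.
    path("admin/", admin.site.urls),
    # django-allauth's views (login, logout, signup, ...) under /accounts/.
    path("accounts/", include("allauth.urls")),
]
```

With this layout, allauth serves its login page at `/accounts/login/` and its registration page at `/accounts/signup/`.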
[Optional] You can create a superuser, which gives you additional capabilities such as:
- Creating, updating and deleting Pages, Links and Users.
- Viewing the Pages scraped by every user.
To create one, run:
python manage.py createsuperuser
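The visibility rule above (superusers see every scraped Page, regular users see only their own) can be sketched in plain Python. The `User` and `Page` classes and the `owner` field are hypothetical stand-ins for the project's Django models; the real application may enforce this rule differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # Hypothetical stand-in for Django's user model.
    username: str
    is_superuser: bool = False


@dataclass(frozen=True)
class Page:
    # Hypothetical stand-in for the project's Page model.
    url: str
    owner: str  # username of the user who scraped the page


def visible_pages(user, pages):
    """Superusers see every Page; other users see only their own."""
    if user.is_superuser:
        return list(pages)
    return [page for page in pages if page.owner == user.username]
```

In a Django view the same idea would typically be a queryset choice, e.g. `Page.objects.all()` for a superuser versus a filter on the owning user otherwise (the field name there is again an assumption).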
This project uses pytest for testing.
To run the tests, use the following command:
pytest
Main dependencies:
- pytest: testing
- beautifulsoup4: scraping
- django-allauth: additional authentication functionality (registration and login)
- pipenv: dependency management