This repository contains an executable script that creates a Docker container from the selenium/standalone-chrome
image, providing a Selenium server with the Chrome browser. It also includes the configuration for creating a virtual environment to perform web-scraping tasks and format the extracted data for convenient handling with pandas.
Follow these steps to set up and work with this repository:
Click on the "Fork" button at the top right corner of the repository to create a copy in your GitHub account.
Clone the forked repository to your local machine.
Using the terminal, move to the exec directory and run
./setup.sh
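The script assumes Docker and Python 3 are installed. A minimal prerequisite check you can run first (a hypothetical helper, not part of the repository):

```shell
#!/bin/sh
# Verify that the tools setup.sh relies on are available on PATH.
for cmd in docker python3; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: found"
  else
    echo "$cmd: missing" >&2
  fi
done
```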
Alternatively, manually create a Python environment:
cp requirements.txt ../jupyter-notebooks
cd ..
cd jupyter-notebooks
python3 -m venv .env-web-scrap
source .env-web-scrap/bin/activate
pip install -r requirements.txt
python -m ipykernel install --user --name=".env-web-scrap"
rm requirements.txt
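After activating the environment, you can confirm the interpreter is actually running inside it with a quick check (a minimal sketch using only the standard library):

```python
import sys

def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the system interpreter.
    return sys.prefix != sys.base_prefix

print("venv active:" if in_virtualenv() else "no venv active", sys.prefix)
```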
Manually start the docker container:
docker run -d --name "sel-docker" -p 4444:4444 --shm-size=2g \
-e SE_NODE_MAX_SESSIONS=6 \
-e SE_NODE_SESSION_TIMEOUT=1200 \
-e SE_VNC_NO_PASSWORD=1 \
selenium/standalone-chrome
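Once the container is up, the Selenium server exposes a status endpoint on port 4444; Selenium 4's Grid API reports readiness under a "value" key. A sketch that checks it with only the standard library:

```python
import json
import urllib.request

def grid_ready(status_json: str) -> bool:
    # The /status endpoint wraps its payload in "value";
    # "ready" flips to true once sessions can be created.
    payload = json.loads(status_json)
    return bool(payload.get("value", {}).get("ready", False))

def wait_for_grid(url: str = "http://localhost:4444/status") -> bool:
    # Requires the container started above to be running.
    with urllib.request.urlopen(url, timeout=5) as resp:
        return grid_ready(resp.read().decode())
```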
Hint: verify the container is running with docker ps
Work with the notebooks
Follow the instructions in the corresponding jupyter-notebooks
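The notebooks format the scraped data for pandas. As an illustration (the record fields below are invented sample data, not the repository's actual schema), a sketch of turning raw scraped records into a DataFrame:

```python
import pandas as pd

def records_to_frame(records):
    # Build a DataFrame from scraped records (a list of dicts) and
    # coerce price strings like "$3.50" into numeric values.
    df = pd.DataFrame(records)
    df["price"] = pd.to_numeric(df["price"].str.replace("$", "", regex=False))
    return df

# Hypothetical scraped rows:
sample = [
    {"title": "Widget A", "price": "$3.50"},
    {"title": "Widget B", "price": "$4.00"},
]
frame = records_to_frame(sample)
```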