

Dockerized Jupyter Lab with Selenium

Author: Douglas Strodtman (SaMo)

This directory contains everything you need to get Selenium up and running with Google Chrome. It lets you quickly create identical environments for automating tasks and scraping JavaScript-enabled pages, and makes it easy to run Selenium on AWS.

Files Contained

Other than this README, each of the files in this directory is necessary to successfully build out your environment. Hopefully the descriptions below will allow you to update these to your specific needs.

Dockerfile

Dockerfiles allow you to define custom Docker images. This file inherits the current jupyter/scipy-notebook image and then pip installs a number of additional packages through the requirements.txt file. Note that versions are currently not specified for any of these packages; if you run into version errors, you may want to pin specific versions.

requirements.txt

This file is essentially just a list of packages to be pip installed. In the current iteration, versions are not specified, and the following packages are added on top of those already included in the jupyter/scipy-notebook image:

  • selenium
  • scrapy
  • psycopg2
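Because versions are unpinned, it can help to record what a given build actually installed before you start pinning. The following is a minimal sketch and not part of the original repository: the helper names `installed_versions` and `as_requirements` are hypothetical, and it assumes Python 3.8+ so that `importlib.metadata` is available.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def as_requirements(versions):
    """Format the packages that are installed as pinned requirements.txt lines."""
    return [
        f"{name}=={version}"
        for name, version in versions.items()
        if version is not None
    ]
```

Running `as_requirements(installed_versions(["selenium", "scrapy", "psycopg2"]))` inside the Jupyter container should give lines you could paste into requirements.txt to freeze a known-good build.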

docker-compose.yml

This file does all the magic. Chrome, Selenium, and Jupyter each run in separate Docker containers that are networked to one another. The Jupyter image is custom-built from the included Dockerfile. Note that the versions of the Selenium Docker images are hard-coded (to ensure compatibility) and can be updated later. Selenium's official docker-compose instructions are a useful reference in case you break this.

Note that a number of custom options are being specified here:

  • The Jupyter container name is set as jupyter_selenium.
  • The base directory for the Jupyter container is set to the computer's home directory (~/).
  • Jupyter Lab is enabled (this will open Lab by default and prevent you from opening Jupyter Notebook; you can comment out this option to revert to Jupyter notebooks).
  • The Jupyter instance is published on port 8888 (if you're trying to run this alongside another Jupyter instance, you'll need to change this port to avoid conflicts).
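If you are unsure whether another Jupyter instance already occupies port 8888, you can check from the host before bringing the containers up. This is a small sketch using only the standard library; the function name `port_in_use` is hypothetical, and it assumes the other instance listens on 127.0.0.1.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0
```

If `port_in_use(8888)` returns True, change the host side of the port mapping in docker-compose.yml before starting the services.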

Instructions for Use

To set up all three of these containers, navigate to this directory and run

docker-compose up -d

This command also creates an internal network linking the containers, in which each service name resolves to the IP address of its container.

This matters because the Selenium hub container is reachable under the alias hub, which lets us access our remote Selenium instance at the address http://hub:4444/wd/hub.
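The hub can take a few seconds to start accepting sessions after the containers come up. The sketch below polls the hub's status endpoint from inside the Jupyter container before you create a driver. It is not part of the original repository: the function names are hypothetical, and it assumes the hub serves a JSON status document at `/wd/hub/status` containing a `value.ready` flag, which you should verify against the Selenium image version you are running.

```python
import json
import time
import urllib.error
import urllib.request

HUB_URL = "http://hub:4444/wd/hub"  # hub address used in this README


def hub_is_ready(payload):
    """Interpret a status payload; assumes the shape {"value": {"ready": bool}}."""
    value = payload.get("value")
    return isinstance(value, dict) and value.get("ready") is True


def wait_for_hub(base_url=HUB_URL, timeout=30.0, interval=1.0):
    """Poll <base_url>/status until the hub reports ready or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/status", timeout=interval) as resp:
                if hub_is_ready(json.load(resp)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # hub not reachable yet, or the response was not valid JSON
        time.sleep(interval)
    return False
```

Calling `wait_for_hub()` in the first notebook cell and proceeding only when it returns True avoids confusing connection errors right after startup.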

Remote Selenium

The following code demonstrates how easy it is to use a remote webdriver. Here we specify that we want to use a headless Chrome browser (a browser without a visual display) for our driver, which is hosted on port 4444 of our hub service:

from selenium import webdriver

# Configure Chrome to run headless (no visible browser window)
options = webdriver.ChromeOptions()
options.add_argument('--headless')

# Connect to the Selenium hub; passing options is what applies the headless flag
driver = webdriver.Remote(
    command_executor='http://hub:4444/wd/hub',
    options=options)

If this is your first time, I recommend doing a simple test. Here, we'll go to Google and just print the title. Don't forget to close your driver after completing your task.

driver.get('https://www.google.com')
print(driver.title)
driver.quit()
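If an exception interrupts your task before `driver.quit()` runs, the browser session stays open on the hub. One way to guard against that is a small context manager, sketched below. The names `managed_driver`, `make_remote_chrome`, and `print_google_title` are hypothetical and not part of the original repository; the example assumes a Selenium version whose `webdriver.Remote` accepts an `options` argument.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(make_driver):
    """Create a driver with make_driver() and always quit it on exit."""
    driver = make_driver()
    try:
        yield driver
    finally:
        driver.quit()  # runs even if the body of the with-block raises


def make_remote_chrome():
    """Build a headless remote Chrome driver pointed at the hub service."""
    # Imported lazily so managed_driver itself does not depend on selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Remote(
        command_executor="http://hub:4444/wd/hub",
        options=options,
    )


def print_google_title():
    """Same smoke test as above, with cleanup handled automatically."""
    with managed_driver(make_remote_chrome) as driver:
        driver.get("https://www.google.com")
        print(driver.title)
```

Taking a factory function rather than a driver instance keeps the creation step inside the managed scope, so a session is never created without a matching quit.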

Stopping

To stop and remove these instances (which will not delete your files or images, only the containers), simply run

docker-compose down
