autobrowser

Webrecorder's high-fidelity browser-based crawler

Configuration

Configuration of autobrowser is done primarily through the environment variables listed below.

Their values are read by the application in autobrowser/automation/details.py.

Note: boolean values (flags/switches) use the following format:

  • true: 1, true, yes, y, ok, on
  • false: 0, false, no, n, nok, off, env var does not exist
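
The boolean format above can be sketched as a small parser. This is a minimal illustration, not the code in details.py: the name parse_bool_env, the case-insensitive comparison, the treatment of unrecognized values as false, and the default parameter (for variables that document their own default) are all assumptions.

```python
import os

# The values listed above as meaning "true".
TRUE_VALUES = {"1", "true", "yes", "y", "ok", "on"}


def parse_bool_env(name, default=False, environ=None):
    """Interpret an environment variable as a boolean flag (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if value is None:
        # A missing variable is false unless the caller supplies another default.
        return default
    # Assumption: case and surrounding whitespace are ignored.
    return value.strip().lower() in TRUE_VALUES
```

For example, parse_bool_env("CRAWL_NO_NETCACHE", default=True) would honor that variable's documented default when it is unset.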

General

REDIS_URL

  • The URL to be used when connecting to redis (string)
  • Defaults to redis://localhost

CHROME_OPTS

  • A string of json used by LocalDriver to launch a browser (string)

CDP_PORT

  • The port to be used when communicating with a browser via the CDP (number)
  • Defaults to 9222

AUTO_ID Required when crawling

  • The id of the entire automation (string)

REQ_ID Required

  • The id of the request used to start this part of the entire automation (string)
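
As a rough sketch of how the general settings above might be read, the following applies the documented defaults and checks the required ids. The function name load_general_config, its crawling parameter, and the returned dictionary keys are hypothetical and are not taken from autobrowser itself.

```python
import os


def load_general_config(environ=None, crawling=False):
    """Read the general settings, applying the documented defaults (illustrative only)."""
    environ = os.environ if environ is None else environ

    req_id = environ.get("REQ_ID")
    if not req_id:
        raise ValueError("REQ_ID is required")

    auto_id = environ.get("AUTO_ID")
    if crawling and not auto_id:
        raise ValueError("AUTO_ID is required when crawling")

    return {
        "redis_url": environ.get("REDIS_URL", "redis://localhost"),
        "cdp_port": int(environ.get("CDP_PORT", "9222")),
        "auto_id": auto_id,
        "req_id": req_id,
    }
```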

Shepherd

SHEPARD_HOST

  • The URL that shepherd is listening on (string)
  • Defaults to http://shepherd:9020

BROWSER_ID

  • The id of the browser to be used when requesting one from shepherd (string)
  • Defaults to chrome:67

BROWSER_HOST

  • The host name of the browser running in a container (string)

REQ_BROWSER_PATH

  • The path to the shepherd endpoint for requesting browsers (string)
  • Defaults to /request_browser/

INIT_BROWSER_PATH

  • The path to the shepherd endpoint for initializing new browsers (string)
  • Defaults to /init_browser?reqid=

GET_BROWSER_INFO_PATH

  • The path to the shepherd endpoint for requesting information about a browser (string)
  • Defaults to /info/
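
To illustrate how the host and path settings above could fit together, here is a minimal sketch that builds full shepherd URLs from the documented defaults. How autobrowser actually combines these values is not stated here: appending the browser id to the request path and the request id to the init and info paths is an assumption, and shepherd_urls is a hypothetical name.

```python
import os


def shepherd_urls(req_id, environ=None):
    """Build shepherd endpoint URLs from the environment (illustrative only)."""
    environ = os.environ if environ is None else environ
    host = environ.get("SHEPARD_HOST", "http://shepherd:9020")
    browser_id = environ.get("BROWSER_ID", "chrome:67")
    req_path = environ.get("REQ_BROWSER_PATH", "/request_browser/")
    init_path = environ.get("INIT_BROWSER_PATH", "/init_browser?reqid=")
    info_path = environ.get("GET_BROWSER_INFO_PATH", "/info/")
    return {
        # Assumption: the browser id is appended to the request path.
        "request": f"{host}{req_path}{browser_id}",
        # Assumption: the request id is appended to the init and info paths.
        "init": f"{host}{init_path}{req_id}",
        "info": f"{host}{info_path}{req_id}",
    }
```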

Crawling

CRAWL_NO_NETCACHE

  • Should the browser's network cache be disabled (bool)
  • Defaults to true

NAV_TO

  • How long should the navigation timeout be (time value in seconds)
  • Defaults to 30

WAIT_FOR_Q

  • How long should the crawler tab wait for the frontier queue to become populated (time value in seconds)
  • Defaults to -1 (forever)

WAIT_FOR_Q_POLL_RATE

  • How long is the interval between checks of the frontier queue while waiting (time value in seconds)
  • Defaults to 5
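
The interaction between WAIT_FOR_Q and WAIT_FOR_Q_POLL_RATE can be pictured as a polling loop. The sketch below is synchronous for brevity and is only an illustration under stated assumptions: wait_for_queue and the is_populated callback are hypothetical names, and this document does not say whether autobrowser implements the wait this way.

```python
import time


def wait_for_queue(is_populated, wait_for_q=-1, poll_rate=5,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll is_populated() every poll_rate seconds.

    Returns True once the queue is populated, or False if wait_for_q
    seconds pass first. A negative wait_for_q means wait forever.
    """
    deadline = None if wait_for_q < 0 else clock() + wait_for_q
    while True:
        if is_populated():
            return True
        if deadline is not None and clock() >= deadline:
            return False
        sleep(poll_rate)
```

The sleep and clock parameters exist only so the loop can be exercised without real delays.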

BEHAVIOR_RUN_TIME

  • How long should the behaviors be allowed to run for (time value in seconds)
  • Defaults to 60

NUM_TABS

  • How many tabs should be created per connected browser (number)
  • Defaults to 1

TAB_TYPE

  • Which tab type should be used (BehaviorTab or CrawlerTab)
  • Defaults to BehaviorTab

Behaviors

BEHAVIOR_API_URL

  • The base URL to be used for interaction with the behaviors api (string)
  • Defaults to http://localhost:3030

FETCH_BEHAVIOR_ENDPOINT

  • The URL of the behaviors api endpoint for retrieving just the behaviors JavaScript (string)
  • Defaults to {BEHAVIOR_API_URL}/behavior?url=

FETCH_BEHAVIOR_INFO_ENDPOINT

  • The URL of the behaviors api endpoint for retrieving just the behaviors info (string)
  • Defaults to {BEHAVIOR_API_URL}/info?url=
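
The defaults above are derived from BEHAVIOR_API_URL, which the following sketch makes concrete. It is a hedged illustration: behavior_fetch_url is a hypothetical name, and percent-encoding the page URL before appending it is an assumption rather than documented behavior.

```python
import os
from urllib.parse import quote


def behavior_fetch_url(page_url, environ=None):
    """Return the URL for fetching a page's behavior JavaScript (illustrative only)."""
    environ = os.environ if environ is None else environ
    api_url = environ.get("BEHAVIOR_API_URL", "http://localhost:3030")
    # The endpoint default is built from the base URL unless overridden.
    endpoint = environ.get("FETCH_BEHAVIOR_ENDPOINT", f"{api_url}/behavior?url=")
    # Assumption: the page URL is percent-encoded before being appended.
    return endpoint + quote(page_url, safe="")
```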

SCREENSHOT_API_URL

  • The URL to which screenshots of the page are sent after a behavior has run (string)
  • Note: setting this variable also acts as a flag indicating that screenshots are to be taken

SCREENSHOT_TARGET_URI Required if SCREENSHOT_API_URL is provided

  • The url for the resource record for the screenshots (string)

SCREENSHOT_FORMAT

  • The type of screenshot to be taken, either png or jpg (string)
  • Defaults to png

SCREENSHOT_DIMENSIONS

  • The dimensions of the screenshot to be taken (number).
  • Format: width height, space or comma separated
  • Defaults to the natural width and height of the page's content
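
The "width height" format above can be parsed along these lines. This is a sketch under assumptions: parse_dimensions is a hypothetical name, returning None to signal "use the page's natural size" is an illustrative choice, and the values are read as floats because the document only says "number".

```python
import re


def parse_dimensions(value):
    """Parse 'width height' or 'width,height' into a (width, height) tuple."""
    if value is None or not value.strip():
        # Unset: the caller falls back to the natural size of the page content.
        return None
    # Split on commas and/or whitespace, so "800,600" and "800 600" both work.
    parts = [p for p in re.split(r"[,\s]+", value.strip()) if p]
    if len(parts) != 2:
        raise ValueError(f"expected two numbers, got {value!r}")
    width, height = (float(p) for p in parts)
    return width, height
```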

JavaScript Expressions

BEHAVIOR_ACTION_EXPRESSION

  • The expression used to initiate the next action of a behavior (string)
  • Defaults to: window.$WRIteratorHandler$()

BEHAVIOR_PAUSED_EXPRESSION

  • The expression used to determine if the running behavior is in the paused state (string)
  • Defaults to: window.$WBBehaviorPaused === true

PAUSE_BEHAVIOR_EXPRESSION

  • The expression used to pause a running behavior (string)
  • Defaults to: window.$WBBehaviorPaused = true

UNPAUSE_BEHAVIOR_EXPRESSION

  • The expression used to un-pause a running behavior (string)
  • Defaults to: window.$WBBehaviorPaused = false

PAGE_URL_EXPRESSION

  • The expression used to determine the URL of the page (string)
  • Defaults to: window.location.href

OUTLINKS_EXPRESSION

  • The expression used to retrieve the outlinks collected by the running behavior (string)
  • Defaults to: window.$wbOutlinks$

CLEAR_OUTLINKS_EXPRESSION

  • The expression used to clear the outlinks collected by the running behavior (string)
  • Defaults to: window.$wbOutlinkSet$.clear()

NO_OUT_LINKS_EXPRESS

  • The expression used to indicate to the behavior that it is not to collect outlinks (string)
  • Defaults to: window.$WBNOOUTLINKS = true
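
The expression settings above all follow the same pattern: an environment variable that, when unset, falls back to a default JavaScript snippet. A minimal sketch of that lookup follows; the names DEFAULT_EXPRESSIONS and resolve_expressions are hypothetical, and only the variable names and default values come from this document.

```python
import os

# Documented defaults, keyed by environment variable name.
DEFAULT_EXPRESSIONS = {
    "BEHAVIOR_ACTION_EXPRESSION": "window.$WRIteratorHandler$()",
    "BEHAVIOR_PAUSED_EXPRESSION": "window.$WBBehaviorPaused === true",
    "PAUSE_BEHAVIOR_EXPRESSION": "window.$WBBehaviorPaused = true",
    "UNPAUSE_BEHAVIOR_EXPRESSION": "window.$WBBehaviorPaused = false",
    "PAGE_URL_EXPRESSION": "window.location.href",
    "OUTLINKS_EXPRESSION": "window.$wbOutlinks$",
    "CLEAR_OUTLINKS_EXPRESSION": "window.$wbOutlinkSet$.clear()",
    "NO_OUT_LINKS_EXPRESS": "window.$WBNOOUTLINKS = true",
}


def resolve_expressions(environ=None):
    """Return each expression, preferring the environment over the default."""
    environ = os.environ if environ is None else environ
    return {
        name: environ.get(name, default)
        for name, default in DEFAULT_EXPRESSIONS.items()
    }
```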
