Puppeteer

Puppeteer is a Node.js library which provides a high-level API to control Chrome/Chromium over the DevTools Protocol. Puppeteer runs in headless mode by default, but can be configured to run in full (non-headless) Chrome/Chromium.

What can I do?

Most things that you can do manually in the browser can be done using Puppeteer! Here are a few examples to get you started:

  • Generate screenshots and PDFs of pages.
  • Crawl a SPA (Single-Page Application) and generate pre-rendered content (i.e. "SSR" (Server-Side Rendering)).
  • Automate form submission, UI testing, keyboard input, etc.
  • Create an automated testing environment using the latest JavaScript and browser features.
  • Capture a timeline trace of your site to help diagnose performance issues.
  • Test Chrome Extensions.
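
As a rough illustration of the first item, here is a minimal sketch that saves a screenshot and a PDF of a page. The `fileStem` and `capture` helpers are hypothetical names introduced for this example, and the `networkidle2` wait condition and A4 format are assumptions rather than recommendations from this document.

```javascript
// Hypothetical helper: derive a file-name stem from a URL,
// e.g. host plus path with separators collapsed to underscores.
function fileStem(url) {
  const {hostname, pathname} = new URL(url);
  return (hostname + pathname)
    .replace(/[^a-z0-9.]+/gi, '_') // collapse anything unusual into "_"
    .replace(/^_+|_+$/g, ''); // trim leading/trailing underscores
}

// Hypothetical helper: write <stem>.png and <stem>.pdf into outDir.
async function capture(url, outDir = '.') {
  // Loaded lazily so fileStem() stays usable without a browser installed.
  const {default: puppeteer} = await import('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, {waitUntil: 'networkidle2'});
    const stem = `${outDir}/${fileStem(url)}`;
    await page.screenshot({path: `${stem}.png`, fullPage: true});
    await page.pdf({path: `${stem}.pdf`, format: 'A4'});
    return stem;
  } finally {
    // Close the browser even if navigation or rendering fails.
    await browser.close();
  }
}
```

Calling `capture('https://example.com/docs/page')` would be expected to write `example.com_docs_page.png` and `example.com_docs_page.pdf` to the current directory.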

Getting Started

Installation

To use Puppeteer in your project, run:

npm i puppeteer
# or `yarn add puppeteer`
# or `pnpm i puppeteer`

When you install Puppeteer, it automatically downloads a recent version of Chromium (~170MB macOS, ~282MB Linux, ~280MB Windows) that is guaranteed to work with Puppeteer. For a version of Puppeteer that does not download a browser during installation, see puppeteer-core.

Environment Variables

Puppeteer looks for certain environment variables for customizing behavior. If Puppeteer doesn't find them in the environment during the installation step, a lowercased variant of these variables will be used from the npm config.

  • HTTP_PROXY, HTTPS_PROXY, NO_PROXY - define the HTTP proxy settings that are used to download and run the browser.
  • PUPPETEER_CACHE_DIR - defines the directory to be used by Puppeteer for caching. Defaults to os.homedir()/.cache/puppeteer.
  • PUPPETEER_SKIP_CHROMIUM_DOWNLOAD - do not download bundled Chromium during installation step.
  • PUPPETEER_TMP_DIR - defines the directory to be used by Puppeteer for creating temporary files. Defaults to os.tmpdir().
  • PUPPETEER_DOWNLOAD_HOST - specifies the URL prefix that is used to download Chromium. Note: this includes protocol and might even include path prefix. Defaults to https://storage.googleapis.com.
  • PUPPETEER_DOWNLOAD_PATH - specifies the path for the downloads folder. Defaults to <cache>/chromium, where <cache> is Puppeteer's cache directory.
  • PUPPETEER_BROWSER_REVISION - specifies a certain version of the browser you'd like Puppeteer to use. See puppeteer.launch on how executable path is inferred.
  • PUPPETEER_EXECUTABLE_PATH - specifies an executable path to be used in puppeteer.launch.
  • PUPPETEER_PRODUCT - specifies which browser you'd like Puppeteer to use. Must be either chrome or firefox. This can also be used during installation to fetch the recommended browser binary. Setting product programmatically in puppeteer.launch supersedes this environment variable.
  • PUPPETEER_EXPERIMENTAL_CHROMIUM_MAC_ARM - tells Puppeteer to download the Chromium build for Apple M1. On Apple M1 devices, Puppeteer downloads the Intel build by default, which runs via Rosetta. That works without any problems, but with this option you should get more efficient resource usage (CPU and RAM), which could lead to faster execution times.

Environment variables except for PUPPETEER_CACHE_DIR are not used for puppeteer-core since core does not automatically handle browser downloading.
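
The lookup order and defaults described above can be sketched as follows. This is an illustration of the documented behavior, not Puppeteer's actual implementation. The helper names are hypothetical, and the `npm_config_` prefix reflects the assumption that npm exposes its config keys to install scripts under that name.

```javascript
// Hypothetical helper: prefer the environment variable, then fall back to
// the lowercased npm config variant (assumed to appear as npm_config_<key>).
function readSetting(name, env = process.env) {
  return env[name] ?? env[`npm_config_${name.toLowerCase()}`];
}

// PUPPETEER_CACHE_DIR defaults to <home>/.cache/puppeteer.
function resolveCacheDir(env, homeDir) {
  return readSetting('PUPPETEER_CACHE_DIR', env) ?? `${homeDir}/.cache/puppeteer`;
}

// PUPPETEER_DOWNLOAD_PATH defaults to <cache>/chromium.
function resolveDownloadPath(env, homeDir) {
  return (
    readSetting('PUPPETEER_DOWNLOAD_PATH', env) ??
    `${resolveCacheDir(env, homeDir)}/chromium`
  );
}

// PUPPETEER_DOWNLOAD_HOST defaults to https://storage.googleapis.com.
function resolveDownloadHost(env) {
  return readSetting('PUPPETEER_DOWNLOAD_HOST', env) ?? 'https://storage.googleapis.com';
}
```

Passing `env` and `homeDir` explicitly keeps the helpers easy to check in isolation. In a real script they would typically be `process.env` and `os.homedir()`.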

puppeteer-core

With every release since v1.7.0, we publish two packages:

puppeteer is a product for browser automation. When installed, it downloads a version of Chromium, which it then drives using puppeteer-core. Being an end-user product, puppeteer supports a bunch of convenient PUPPETEER_* env variables to tweak its behavior.

puppeteer-core is a library to help drive anything that supports DevTools protocol. puppeteer-core doesn't download Chromium when installed. Being a library, puppeteer-core is fully driven through its programmatic interface.

You should only use puppeteer-core if you are connecting to a remote browser or managing browsers yourself. If you are managing browsers yourself, you will need to call puppeteer.launch with an explicit executablePath or channel.

When using puppeteer-core, remember to change the import:

import puppeteer from 'puppeteer-core';
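
A minimal sketch of the two puppeteer-core scenarios mentioned above, connecting to a remote browser or launching one you manage yourself. `BROWSER_WS_ENDPOINT` and `CHROME_PATH` are hypothetical variable names chosen for this example, and falling back to the `chrome` channel is an assumption about what is installed on the machine.

```javascript
// Hypothetical helper: decide how to obtain a browser with puppeteer-core.
// BROWSER_WS_ENDPOINT and CHROME_PATH are made-up names for this sketch;
// puppeteer-core itself does not read them.
function browserPlan(env = process.env) {
  if (env.BROWSER_WS_ENDPOINT) {
    // A remote browser is already running: connect to it.
    return {method: 'connect', options: {browserWSEndpoint: env.BROWSER_WS_ENDPOINT}};
  }
  if (env.CHROME_PATH) {
    // A locally managed binary: launch it by explicit path.
    return {method: 'launch', options: {executablePath: env.CHROME_PATH}};
  }
  // Otherwise assume a stable Chrome install can be found by channel.
  return {method: 'launch', options: {channel: 'chrome'}};
}

// Hypothetical helper: apply the plan using puppeteer-core.
async function openBrowser(env = process.env) {
  const {default: puppeteer} = await import('puppeteer-core');
  const {method, options} = browserPlan(env);
  return method === 'connect'
    ? puppeteer.connect(options)
    : puppeteer.launch(options);
}
```

Keeping the decision in a plain function separates it from the browser itself, so the choice between `connect` and `launch` can be checked without starting Chrome.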

Usage

Puppeteer follows the latest maintenance LTS version of Node.

Puppeteer will be familiar to people using other browser testing frameworks. You launch/connect a browser, create some pages, and then manipulate them with Puppeteer's API.

For more in-depth usage, check our guides and examples.

Example

The following example searches developers.google.com/web for articles tagged "Headless Chrome" and scrapes the results from the results page.

import puppeteer from 'puppeteer';

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto('https://developers.google.com/web/');

  // Type into search box.
  await page.type('.devsite-search-field', 'Headless Chrome');

  // Wait for suggest overlay to appear and click "show all results".
  const allResultsSelector = '.devsite-suggest-all-results';
  await page.waitForSelector(allResultsSelector);
  await page.click(allResultsSelector);

  // Wait for the results page to load and display the results.
  const resultsSelector = '.gsc-results .gs-title';
  await page.waitForSelector(resultsSelector);

  // Extract the results from the page.
  const links = await page.evaluate(resultsSelector => {
    return [...document.querySelectorAll(resultsSelector)].map(anchor => {
      const title = anchor.textContent.split('|')[0].trim();
      return `${title} - ${anchor.href}`;
    });
  }, resultsSelector);

  // Print all the links.
  console.log(links.join('\n'));

  await browser.close();
})();
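
The example above closes the browser only if every step succeeds. One possible way to make sure the browser is always closed is a small wrapper like the sketch below. `withBrowser` and `pageTitle` are hypothetical helpers and not part of Puppeteer's API.

```javascript
// Hypothetical helper: run a task against a browser and always close it,
// whether the task resolves or throws.
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    await browser.close();
  }
}

// Hypothetical usage: fetch the title of a page.
async function pageTitle(url) {
  const {default: puppeteer} = await import('puppeteer');
  return withBrowser(
    () => puppeteer.launch(),
    async browser => {
      const page = await browser.newPage();
      await page.goto(url);
      return page.title();
    }
  );
}
```

Because the launcher is passed in as a function, the same wrapper should also work with `puppeteer.connect` or with a stub in tests.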

Default runtime settings

1. Uses Headless mode

Puppeteer launches Chromium in headless mode. To launch a full version of Chromium, set the headless option when launching a browser:

const browser = await puppeteer.launch({headless: false}); // default is true

2. Runs a bundled version of Chromium

By default, Puppeteer downloads and uses a specific version of Chromium so its API is guaranteed to work out of the box. To use Puppeteer with a different version of Chrome or Chromium, pass in the executable's path when creating a Browser instance:

const browser = await puppeteer.launch({executablePath: '/path/to/Chrome'});

You can also use Puppeteer with Firefox Nightly (experimental support). See Puppeteer.launch for more information.

See this article for a description of the differences between Chromium and Chrome. This article describes some differences for Linux users.

3. Creates a fresh user profile

Puppeteer creates its own browser user profile which it cleans up on every run.
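
The three defaults above can each be overridden through launch options. The sketch below gathers them in one hypothetical helper, `launchOptions`. The use of `userDataDir` to keep a profile between runs is an addition not described in this section, so treat it as an assumption to verify against the `puppeteer.launch` documentation.

```javascript
// Hypothetical helper: build options for puppeteer.launch() that override
// the default runtime settings only where requested.
function launchOptions({headful = false, executablePath, userDataDir} = {}) {
  // 1. Headless mode is the default; headful turns it off.
  const options = {headless: !headful};
  // 2. Use a specific Chrome/Chromium binary instead of the bundled one.
  if (executablePath) options.executablePath = executablePath;
  // 3. Assumption: reuse a profile directory instead of a fresh profile.
  if (userDataDir) options.userDataDir = userDataDir;
  return options;
}

// Hypothetical usage: launch with the chosen overrides.
async function launchWith(settings) {
  const {default: puppeteer} = await import('puppeteer');
  return puppeteer.launch(launchOptions(settings));
}
```

With no arguments the helper returns only `{headless: true}`, which leaves the bundled browser and the fresh profile in place.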

Using Docker

See our guide on using Docker.

Using Chrome Extensions

See our guide on using Chrome extensions.

Resources

Contributing

Check out our contributing guide to get an overview of Puppeteer development.

FAQ

Our FAQ has migrated to our site.
