
puppeteer-extra's Introduction


This is the monorepo for puppeteer-extra, a modular plugin framework for puppeteer. :-)

🌟 For the main documentation, please head over to the puppeteer-extra package.

We've also recently introduced support for Playwright; if you're interested in that, head over to playwright-extra.

Monorepo

Contributing

PRs and new plugins are welcome! The plugin API for puppeteer-extra is clean and fun to use. Have a look at the PuppeteerExtraPlugin base class documentation to get going and check out the existing plugins (a minimal example is the anonymize-ua plugin) for reference.

We use a monorepo powered by Lerna (and yarn workspaces), ava for testing, the standard style for linting and JSDoc heavily to auto-generate markdown documentation based on code. :-)

Lerna

This monorepo is powered by Lerna and yarn workspaces.

Initial setup

# Install deps
yarn

# Bootstrap the packages in the current Lerna repo.
# Installs all of their dependencies and links any cross-dependencies.
yarn bootstrap

# Build all TypeScript sources
yarn build

Development flow

# Install debug in all packages
yarn lerna add debug

# Install fs-extra to puppeteer-extra-plugin-user-data-dir
yarn lerna add fs-extra --scope=puppeteer-extra-plugin-user-data-dir

# Remove dependency
# https://github.com/lerna/lerna/issues/833
yarn lerna exec --concurrency 1 'yarn remove fs-extra; echo 0'

# Run test in all packages
yarn test

# Update JSDoc based documentation in markdown files
yarn docs

# Upgrade project wide deps like puppeteer
# (We keep the devDependency version blurry)
rm -rf node_modules
rm -rf yarn.lock
yarn
yarn lerna bootstrap

# Update deps within packages (interactive)
yarn lernaupdate

# If in doubt :-(
yarn lerna exec "rm -f yarn.lock; rm -rf node_modules; echo 0"
rm -f yarn.lock &&  rm -rf node_modules && yarn cache clean

# Run tests of specific package
cd packages/puppeteer-extra-plugin-stealth
yarn test

# Run tests of specific stealth evasion
cd packages/puppeteer-extra-plugin-stealth
yarn ava -v ./evasions/user-agent-override/index.test.js

# Test a local monorepo package in an outside folder as it would've been installed from the registry
# Change PACKAGE_DIR to the path of this monorepo and PACKAGE to the package you wish to install
PACKAGE=puppeteer-extra PACKAGE_DIR=/Users/foo/puppeteer-extra/packages && yarn remove $(echo $PACKAGE); true && rm -f $(pwd)/$(echo $PACKAGE)-latest.tgz && yarn --cwd $(echo $PACKAGE_DIR)/$(echo $PACKAGE) pack --filename $(pwd)/$(echo $PACKAGE)-latest.tgz && YARN_CACHE_FOLDER=/tmp/yarn yarn add file:$(pwd)/$(echo $PACKAGE)-latest.tgz && rm -rf /tmp/yarn

Publishing

# make sure you're signed into npm before publishing
# yarn publishing is broken so lerna uses npm
npm whoami

# ensure everything is up2date and peachy
yarn
yarn bootstrap
yarn lerna link
yarn build
yarn test

# Phew, let's publish these packages!
# - Will publish all changed packages
# - Will ask for new pkg version per package
# - Will update inter-package dependency versions automatically
yarn lerna publish

# Fix new dependency version symlinks
yarn bootstrap && yarn lerna link

puppeteer-extra's People

Contributors

0x7357, abowcut, adamsultan7, aledbf, anarcal, atymic, benww, berstend, dependabot[bot], dev-hyperweb, ev-kenjiterai, fdezromero, josep11, jozsi, mnmkng, niek, peterblazejewicz, proydenko94, regseb, remusao, rolyatwilson, scottpierce, spyfly, starrify, thibaultmthh, transitive-bullshit, urlbox-io, verglor, victornpb, vikashloomba

puppeteer-extra's Issues

Add support for puppeteer-core

Can we add a constructor option for using puppeteer-core instead of puppeteer to avoid the chromium download when connecting to a remote chromium instance?

Add support for puppeteer.connect()

Currently we're only overloading the puppeteer launch() method to add plugin functionality and bind browser lifecycle events to plugins.

This is sufficient for local Chromium instances but it would make sense to add support for connect() as well, which is used to connect to an existing Chromium instance.

The _bindBrowserEvents() method in the PuppeteerExtraPlugin base class already has rudimentary support for a context, so it's mostly a matter of adding support to the main puppeteer-extra module.

Not all plugins will support an existing Chromium instance (e.g. the user-preferences plugin needs to modify the local Chromium Profile folder before launch), so introducing a new noConnect or localOnly requirement could make sense, to potentially show a warning to the user.

On the bright side:
As the PuppeteerExtraPlugin base class is handling event binding for onPageCreated() et al, plugins not relying on beforeLaunch() should be agnostic to whether launch or connect was used. 🎉
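
The requirement check proposed above could look roughly like the sketch below. It is not the puppeteer-extra implementation: `localOnly` is just one of the requirement names floated in this issue, and `findIncompatiblePlugins` and the plain-object plugin stand-ins are hypothetical names made up for illustration.

```javascript
// Hypothetical sketch: list plugins that cannot work when connect() is used.
// `localOnly` is one of the requirement names proposed above, not an existing flag.
function findIncompatiblePlugins(plugins, mode) {
  if (mode !== 'connect') return []
  return plugins
    .filter(plugin => plugin.requirements && plugin.requirements.has('localOnly'))
    .map(plugin => plugin.name)
}

// Stand-ins for real plugin instances (only `name` and `requirements` matter here).
const examplePlugins = [
  { name: 'user-preferences', requirements: new Set(['localOnly']) },
  { name: 'anonymize-ua', requirements: new Set() }
]

for (const name of findIncompatiblePlugins(examplePlugins, 'connect')) {
  console.warn(`Plugin '${name}' needs a locally launched browser and may not work with connect().`)
}
```

With `mode` set to `'launch'` the function returns an empty list, so no warning would be shown for locally launched browsers.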

Recaptcha stopped working

For some reason, the recaptcha stopped working on samsclub.com. It was working fine and then it just stopped working. Any thoughts?

puppeteer-extra-plugin-stealth www.hyatt.com search bot detection

If I browse to the following link manually, it doesn't throw any errors; however, when I use puppeteer, it throws an error.
https://www.hyatt.com/en-US/search/New%20York?checkinDate=2019-04-05&checkoutDate=2019-04-06&rooms=1&adults=1&kids=0&rate=Standard

(Two screenshots were attached: the first shows manual browsing, the second shows puppeteer with puppeteer-extra-plugin-stealth.)
The system is Windows 10
"puppeteer": "^1.12.2",
"puppeteer-extra": "^2.1.3",
"puppeteer-extra-plugin-stealth": "^2.2.2",

The code is as follows
const puppeteer = require("puppeteer-extra")

const pluginStealth = require("puppeteer-extra-plugin-stealth")

puppeteer.use(pluginStealth())
puppeteer.launch({ headless: false }).then(async browser => {
  const page = await browser.newPage();
  await page.goto('https://www.hyatt.com/en-US/search/New%20York?checkinDate=2019-04-05&checkoutDate=2019-04-06&rooms=1&adults=1&kids=0&rate=Standard');
  await page.waitFor(5000)
  await page.screenshot({ path: "testresult2.png", fullPage: true })
  // await browser.close()
});

Target creation events are sometimes triggered too late

Unfortunately it's possible that the targetcreated events are not triggered early enough for listeners (e.g. plugins using onPageCreated) to be able to modify the page instance (e.g. user-agent) before the browser request occurs.

This only affects the first request of a newly created page target.

As a workaround I've noticed that navigating to about:blank (again) right after a page has been created reliably fixes this issue and adds no noticeable delay or side effects.

The above workaround only fixes explicitly created pages, implicitly created ones (e.g. through window.open) are still subject to this issue. I didn't find a reliable mitigation for implicitly created pages yet.
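
A minimal sketch of the workaround for explicitly created pages, assuming a regular Puppeteer `browser` object. `newPageWithWorkaround` is a hypothetical helper name, not part of puppeteer-extra.

```javascript
// Sketch of the workaround: navigate a freshly created page to about:blank again,
// giving `targetcreated` listeners a chance to run before the first real request.
// `newPageWithWorkaround` is a hypothetical helper name.
async function newPageWithWorkaround(browser) {
  const page = await browser.newPage()
  await page.goto('about:blank')
  return page
}
```

Callers would use `const page = await newPageWithWorkaround(browser)` in place of `browser.newPage()`. As noted above, this does nothing for implicitly created pages such as those opened through window.open.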

This problem is not specific to puppeteer-extra but default Puppeteer behaviour:
puppeteer/puppeteer#2669

Rewrite in Typescript

Current progress (28 Nov - not pushed yet!) 📚

  • Rewrite puppeteer-extra in TypeScript 🎉
  • New addExtra export to support non-standard or multiple puppeteer instances
  • Find a way to stay compatible with the existing default export and behavior
  • Make all existing tests go green
  • Find a way to use documentation.js with TypeScript
  • Find a way to have plugins extend Puppeteer types (e.g. page.solveCaptchas())
  • Add TypeScript usage examples to readme
  • Add puppeteer-firefox usage examples to readme
  • Beautify the documentation.js output to the current look
  • Update all plugins to use the new core
  • Update all plugins to use the new documentation approach

(initial comment)

I started this repo before noticing how awesome TypeScript is. 😄

Whenever I find the time I shall refactor this project into TypeScript.

Potential issues:

Update

Parts have been rewritten in TypeScript 🎉

Before I can rewrite the rest I need to improve the generated typedoc output.

We're using typedoc-plugin-markdown (the current best thing) and the output looks like this, compared to the previous documentation.js based docs. 😢

I'm really close to starting to develop my own minimal typedoc-plugin-readme plugin. 😅

pluginPath?

Is the pluginPath required? Do I need to detect the OS and give it a .dll, .so, or .plugin file?

Thanks

problem with WebGL test

I have already checked, and in my case your solution for the WebGL test doesn't work (the antibot detects an invalid renderer: Google SwiftShader).

Which part of your code deals with HEADCHR_IFRAME?

Do you use some sort of IP rotating or proxy while scraping?

Using puppeteer-extra-plugin-flash to auto allow flash on certain/all sites

Hello, thank you all for the great initiative developing the framework :)

I've discovered puppeteer-extra-plugin-flash but I am unable to make it work in regards of automatically allowing flash to be run on certain/all websites, without user interactions (as it is meant to be used for automated testing)

I've created an SO question here (it has 50rep bounty on it if you collect reps).

The contents of the original question:

Disclaimer: I know that Flash will be abandoned by the end of 2020, but I simply cannot drop the case and need to have flash in Puppeteer, though I don't like it either.

I need to crawl certain flash sites and take a screenshot of them, for later programmatic comparison. I could provide a finite list of domains that I need to check against (though the list may change in time, so it'd be great to be able to somehow load them at runtime).

I've been searching the Internet for solutions for a while now; the closest I got in terms of an SO question is this: how to add urls to Flash white list in puppeteer

I managed to get Flash sites to be properly recognized after using puppeteer-extra-plugin-flash, providing path and version for PepperFlash and running the Chrome executable instead of Chromium, but I still need to click the greyed out puzzle to allow flash to be run on any website.

I just can't find a solution that will work in July 2019.

I've tried using various arguments:

   --ppapi-in-process || 
   --disable-extensions-except=${pluginPath}/.. || 
   --allow-outdated-plugins || 
   --no-user-gesture-required

And a bunch more, possibly unrelated. The approach that seems most successful for other people is using PluginsAllowedForUrls and providing a list of URLs with wildcards, then loading a predefined profile via --user-data-dir, but I had no luck with that either (I suppose I have issues with preparing a proper profile).

This tool that I am building will not be public and will be used only internally, by an educated team, so I don't have too many security constraints to care about. I simply need Flash in puppeteer. I also do not need to care about Dockerizing it.

My current setup, simplified:

    // within async function

    const browser = await puppeteer.launch({
        headless: false,
        executablePath: '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
        args: [
            '--window-size=800,600',
            '--enable-webgl',
            '--enable-accelerated-2d-canvas',
            `--user-data-dir=${path.join(process.cwd(), 'chrome-user-data')}`
            // '--always-authorize-plugins', -> does not seem to be doing anything in our case
            // '--enable-webgl-draft-extensions', -> does not seem to be doing anything in our case
            // '--enable-accelerated-vpx-decode', -> does not seem to be doing anything in our case
            // '--no-user-gesture-required',  -> does not seem to be doing anything in our case
            // '--ppapi-in-process', -> does not seem to be doing anything in our case
            // '--ppapi-startup-dialog', -> does not seem to be doing anything in our case
            // `--disable-extensions-except=${pluginPath}/..`, -> does not solve issue with blocked
            // '--allow-outdated-plugins', -> does not seem to be doing anything in our case
        ],
    });

    const context = await browser.defaultBrowserContext();
    const page = await context.newPage();

    const url = new URL('http://ultrasounds.com');
    const response = await fetch(url.href);

    await page.setViewport({ width: 800, height: 600});
    await page.goto(url.href, { waitUntil: 'networkidle2' });
    await page.waitFor(10000);

    const screenshot = await page.screenshot({
      encoding: 'binary',
    });

Chrome version: 75.0.3770.100
puppeteer-extra: 2.1.3
puppeteer-extra-plugin-flash: 2.13

Could you perhaps provide additional info on how to work with it, or what am I doing wrong? Thanks in advance!

Not detecting reCAPTCHAs for site "https://givingassistant.org"

Version:

"puppeteer": "^1.17.0",
"puppeteer-extra": "^2.1.3",
"puppeteer-extra-plugin-recaptcha": "^3.0.4"

After login submit is clicked, the reCAPTCHA popup appears. Then I call await page.solveRecaptchas() and nothing happens.

I debugged a little bit, and on the reCAPTCHA page I tried calling let { captchas, error } = await page.findRecaptchas(), which returns null.

So the plugin fails to detect the reCAPTCHA in the page.

here is what is done:

    // go to giving assistant page
    await page.goto('https://givingassistant.org');
    // click signin
    await page.waitForSelector('a.account-menu__button.user-signin');
    await page.evaluate(() => document.querySelector('a.account-menu__button.user-signin').click());
    //enter credentials
    await page.waitForSelector('#signin-email');
    await page.type('#signin-email', ga_email);
    await page.waitForSelector('#signin-password');
    await page.type('#signin-password', ga_password);
    //sign in button click
    await page.waitForSelector('#myModal_signin > div > form > div.modal-footer > button');
    await page.evaluate(() =>  document.querySelector("#myModal_signin > div > form > div.modal-footer > button").click())

    await page.waitFor(2000);

    // check whether signed in to GA
    // looking for the element is present in DOM
    let signedin = await page.$('li.anim_signedin > button > span > span > img')

    if (await signedin) {
      console.log("Login in to GivingAssistant successful")
      return true;
    } else {
      console.log("GivingAssistant - Redirects to reCAPTCHA page")
      // recaptcha solver
      let { captchas, error } = await page.findRecaptchas() // returns null
      // await page.solveRecaptchas() 
      ....
      ....
    }

Headless being detected by Amazon

const puppeteer = require("puppeteer-extra")
// Enable stealth plugin with all evasions
puppeteer.use(require("puppeteer-extra-plugin-stealth")())

;(async () => {
  // The try/catch sits inside the async function so that rejected awaits are actually caught
  try {
    const browser = await puppeteer.launch({
      headless: true,
      slowMo: 25,
      args: ['--no-sandbox', '--disable-setuid-sandbox']
    })
    const page = await browser.newPage()

    await page.setViewport({ width: 1280, height: 800 })

    // goto the provided list
    await page.goto('https://www.amazon.com/hz/wishlist/dl/invite/3gq5Qfx?ref_=wl_share')

    // login
    await page.type('#ap_email', "amazon email here")
    await page.type('#ap_password', "amazon password here")
    await page.click('#signInSubmit')

    await browser.close()
  } catch (err) {
    console.error(err)
    return Promise.reject(new Error("Mission failed, we'll get em next time"));
  }
})()

The interesting bit is that headless: false is not being detected. Is there a way to figure out how only the headless mode is being detected?

[puppeteer-extra-plugin-stealth] Chromium flag `--lang` doesn't work properly

Looks like there is a problem of puppeteer launch arg --lang in puppeteer-extra-plugin-stealth. No matter what value was set, navigator.languages is constant [ 'en-US', 'en' ].

Sample code:

async function printLangs() {
  const browser = await puppeteer.launch(
    { args: ['--lang=en-GB'] }
  );

  const page = await browser.newPage();

  const langs = await page.evaluate(() => ({
    language: navigator.language,
    languages: navigator.languages,
  }));

  console.log('language settings:', langs);

  await page.close();
  await browser.close();
}
  • Run with original puppeteer:
const puppeteer = require('puppeteer');
...
printLangs();  // language settings: { language: 'en-GB', languages: [ 'en-GB' ] }
  • Run with puppeteer-extra:
const puppeteer = require('puppeteer-extra');
...
printLangs();  // language settings: { language: 'en-GB', languages: [ 'en-GB' ] }
  • Run with puppeteer-extra-plugin-stealth:
const puppeteer = require('puppeteer-extra');
const pluginStealth = require("puppeteer-extra-plugin-stealth");
puppeteer.use(pluginStealth());
...
printLangs();  // language settings: { language: 'en-GB', languages: [ 'en-US', 'en' ] }

Flash on AWS Lambda

I'm using the puppeteer-extra flash plugin locally, but when I try to run it on AWS it seems that flash is not working. Any ideas how to get it working?

Use stealth mode but overriding the user-agent

Hi,

I want to use this super awesome library, but use my own user-agents (for testing various devices and browsers).
If I'm not overriding the user agent, cloudflare doesn't detect me, but if I do, it does :(

Any suggestions?
I'm overriding it with the args like this:

let args = [
    // When not commented out, I'm getting caught
    //`--user-agent="${visit_metadata.user_agent.userAgent}"`,
    `--window-size=${visit_metadata.user_agent.screenWidth},${visit_metadata.user_agent.screenHeight}`,
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-infobars',
    '--window-position=0,0',
    '--ignore-certificate-errors',
    '--ignore-certificate-errors-spki-list',
    '--disable-web-security',
    '--disable-gpu',
  ]

how to add puppeteer-extra flash plugin to site-scan to generate screenshots with flash enabled

Would anyone be able to point out to a total coding noob how to integrate the flash plugin provided by puppeteer-extra into the site-scan script (which generates website screenshots) at https://github.com/christopherwk210/site-scan/blob/master/lib/site-scan.js?
I have tried including these lines in the site-scan script but without much luck:

#! /usr/bin/env node

// Modules
const argv = require('minimist')(process.argv.slice(2));
const puppeteer = require('puppeteer-extra')
puppeteer.use(require('puppeteer-extra-plugin-flash')())
const chalk = require('chalk');

I need to create website screenshots in bulk but with site-scan (both headless or not) all the sites with flash are not generated properly and show a placeholder instead.
This site screenshot, for example, http://mpi.mk.ua/ shows a massive gray placeholder requiring to enable flash.
Thanks for any tip

Stealth plugins not found when making .exe with pkg

// puppeteer-extra is a drop-in replacement for puppeteer,
// it augments the installed puppeteer with plugin functionality
const puppeteer = require("puppeteer-extra")
// register plugins through `.use()`
puppeteer.use(
  require("puppeteer-extra-plugin-anonymize-ua")({ makeWindows: true })
)
puppeteer.use(require("puppeteer-extra-plugin-stealth")())

// usage as normal
puppeteer.launch().then(async browser => {
  const page = await browser.newPage()
  await page.goto("https://httpbin.org/headers", {
    waitUntil: "domcontentloaded"
  })
  const content = await page.content()
  console.log("content:", content) // => (..) User-Agent: (..) Windows NT 10.0
  await browser.close()
})

When I use that piece of code it doesn't work, giving me this error

C:\Users\Dan\Desktop\nodetests>node index

          A plugin listed 'puppeteer-extra-plugin-stealth/evasions/chrome.runtime' as dependency,
          which is currently missing. Please install it:

          yarn add puppeteer-extra-plugin-stealth

          Note: You don't need to require the plugin yourself,
          unless you want to modify it's default settings.

(node:38324) UnhandledPromiseRejectionWarning: TypeError: Class extends value #<Object> is not a constructor or null
    at Object.<anonymous> (C:\Users\Dan\node_modules\puppeteer-extra-plugin-stealth\evasions\chrome.runtime\index.js:10:22)
    at Module._compile (internal/modules/cjs/loader.js:689:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
    at Module.load (internal/modules/cjs/loader.js:599:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:538:12)
    at Function.Module._load (internal/modules/cjs/loader.js:530:3)
    at Module.require (internal/modules/cjs/loader.js:637:17)
    at require (internal/modules/cjs/helpers.js:22:18)
    at PuppeteerExtra.resolvePluginDependencies (C:\Users\Dan\node_modules\puppeteer-extra\index.js:264:15)
    at PuppeteerExtra.launch (C:\Users\Dan\node_modules\puppeteer-extra\index.js:96:10)
(node:38324) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 2)
(node:38324) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

When I remove puppeteer.use(require("puppeteer-extra-plugin-stealth")()) it works fine

How to spoof window.screen api?

Have you given some thought to the screen API? Seems like it could de-anonymize you. For example my mac gives me this:

availHeight: 972
availLeft: 0
availTop: 23
availWidth: 1680
colorDepth: 24
height: 1050
orientation: ScreenOrientation {angle: 0, type: "landscape-primary", onchange: null}
pixelDepth: 24
width: 1680

What if I wanted to emulate a Windows computer? How can I know what the correct "avail" values are?
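
Mechanically, such values could be overridden before any page script runs via Puppeteer's `page.evaluateOnNewDocument`. The sketch below only shows that mechanism: `spoofScreen` is a hypothetical helper, and which values are realistic for a given operating system remains the open question of this issue.

```javascript
// Hypothetical helper: override window.screen properties before page scripts run.
// The values passed in are placeholders chosen by the caller; this sketch does not
// know which combinations look realistic for a given OS.
async function spoofScreen(page, screenProps) {
  await page.evaluateOnNewDocument(props => {
    for (const [key, value] of Object.entries(props)) {
      Object.defineProperty(window.screen, key, { get: () => value })
    }
  }, screenProps)
}
```

It would be called once per page before navigating, for example `await spoofScreen(page, { width: 1920, height: 1080 })`, where the numbers are purely illustrative.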

Stealth only works in headless=false

I am trying to crawl an ads website (https://www.xe.gr/property/search?Transaction.type_channel=117518&page=1) and I managed to bypass the blocking of headless browsers with the stealth plugin, but I only achieved that with headless=false. With headless=true I get the blocking message saying that I am a bot.

I am using the same code structure as the quickstart example here: https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra

Is there anything else I can try so I can use headless=true? Deploying to a production environment with headless=false would be a pain in the ass, plus it takes a lot of time to render the page.

Thanks

Support for older node version (v8.10.x)

I have other packages which require use of node 8.10.x for the time being.

It looks like plugins such as puppeteer-extra-plugin-stealth are compatible with node 8, but that the base puppeteer-extra-plugin requires node >=9.11.2 (see packages/puppeteer-extra-plugin/package.json).

Is there any way to make puppeteer-extra-plugin compatible with lesser versions of node?

Captcha solves but iframe doesn't disappear

Intermittently, I get this behaviour where a captcha is solved (purple turns to green) and solved returns some result but the captcha element doesn't hide afterwards, causing my test to fail.

Has anyone had this before?

Works in !headless, fails in Headless..

Sign in on redacted works on !headless, but rejects login while headless...
I'm using the standard plugins + the language fix. Anybody know of a possible solution?

const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
puppeteer.use(StealthPlugin());

const UserAgentPlugin = require("puppeteer-extra-plugin-anonymize-ua");
puppeteer.use(UserAgentPlugin({ makeWindows: true }));

(async () => {
  // launch() takes a single options object; a second argument would be ignored
  const browser = await puppeteer.launch({
    headless: true,
    args: ["--lang=en-US,en;q=0.9"]
  });
  const page = await browser.newPage();
  await page.setViewport({ width: 800, height: 600 });

  await page.goto("redacted", { waitUntil: "networkidle2" });
})();

Add generic pageFilter() method to plugins

A lot of plugins just define an onPageCreated() class member for their functionality.

It could be a cool addition to add generic page filtering support for these types of plugins through the PuppeteerExtra base class, to conditionally toggle the plugin functionality per page.

Something along the lines of:

puppeteer.use(require('puppeteer-extra-plugin-anonymize-ua')({
  pageFilter: (page) => page.url().includes('example.com')
}))

// or

const anonymizeUA = require('puppeteer-extra-plugin-anonymize-ua')()
anonymizeUA.pageFilter = (page) => page.url().includes('example.com')
puppeteer.use(anonymizeUA)

Which would then pre-filter pages before the onPageCreated plugin method (and others?) are triggered.

A couple of things to look into:

  • Most stealth plugin evasions use page.evaluateOnNewDocument which is not re-triggered when the URL changes
  • Supporting a simplified pageFilterMatch method using *://*.example.com/* rules could be convenient for users
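
One way the simplified rule matching could work is sketched below. This is an assumption-laden illustration, not an existing puppeteer-extra feature: `matchPatternToRegExp` and `createPageFilter` are hypothetical names, and `*` is treated as a plain wildcard rather than following the full browser-extension match-pattern syntax.

```javascript
// Hypothetical sketch of a `pageFilterMatch`-style rule: `*` is a wildcard,
// everything else is matched literally (plain glob matching only).
function matchPatternToRegExp(pattern) {
  const escaped = pattern
    .split('*')
    .map(part => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*')
  return new RegExp(`^${escaped}$`)
}

// Build a pageFilter function (as proposed above) from one or more rules.
function createPageFilter(patterns) {
  const regexes = patterns.map(matchPatternToRegExp)
  return page => regexes.some(regex => regex.test(page.url()))
}

// Objects with a url() method stand in for real Puppeteer pages.
const exampleFilter = createPageFilter(['*://*.example.com/*'])
console.log(exampleFilter({ url: () => 'https://www.example.com/login' })) // true
console.log(exampleFilter({ url: () => 'https://example.org/' })) // false
```

Because this is plain glob matching, `*://*.example.com/*` does not match the bare domain `https://example.com/`; a real implementation would have to decide whether to follow the extension match-pattern rules there.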

disable infobars doesn't work

To .launch method I'm passing some args:

const browser = await puppeteer.launch({
    args: [
        `--window-size=${ width - 400 + 7 },${ height + 6 }`,
        '--window-position=400,0',
        '--no-sandbox',
        '--disable-infobars',
        '--ignore-ssl-errors=yes'
    ],
    headless: false,
    devtools: false
});

Although I pass '--disable-infobars', it still shows the infobar:

Chrome is being controlled by automated test software


I dug some more and found out that it's because of the Chromium version. In the past few days they removed --disable-infobars for some reason. Puppeteer v1.17.0 ships Chromium 76.0.3803.0 (r662092), in which I cannot disable the infobar. Puppeteer v1.13.0 ships Chromium 74.0.3723.0 (r637110), and there the '--disable-infobars' flag works just fine.

I'll close this issue as it's clearly not related to puppeteer-extra or puppeteer but to Chromium itself.

[TypeScript] Puppeteer-extra launching

import * as puppeteer from "puppeteer-extra";
// init method
puppeteer.launch({ headless: true }).then(async browser => {
      const page = await browser.newPage()
      await page.setViewport({ width: 800, height: 600 })
      await page.goto("https://bot.sannysoft.com")
      await page.waitFor(5000)
    });

returns
Uncaught (in promise) TypeError: merge is not a function
at PuppeteerExtra.launch (index.js:95)
at _callee2$ (Controller.js:42)
at tryCatch (runtime.js:45)
at Generator.invoke [as _invoke] (runtime.js:271)
at Generator.prototype. [as next] (runtime.js:97)
at asyncGeneratorStep (Controller.js:7)
at _next (Controller.js:9)
at eval (Controller.js:9)
at new Promise ()
at Controller.eval (Controller.js:9)

What could be the problem?

dep:
"@babel/polyfill": "^7.4.4",
"@types/jquery": "^3.3.29",
"@types/puppeteer": "^1.12.4",
"@types/sizzle": "^2.3.2",
"debug": "^4.1.1",
"deepmerge": "^3.3.0",
"kind-of": "^6.0.2",
"ms": "^2.1.2",
"puppeteer-cluster": "^0.16.0",
"puppeteer-extra-plugin": "^3.0.4",
"puppeteer-extra-plugin-anonymize-ua": "^2.1.4"

How to set download dir using puppeteer-extra-plugin-user-preferences

I tried the following:

    puppeteer.use(require('puppeteer-extra-plugin-user-data-dir')())
    puppeteer.use(require('puppeteer-extra-plugin-user-preferences')({
      prefs: {
        download: {
          prompt_for_download: false,
          default_directory: OUTDIR
        }
      }
    }))

This wasn't successful. Any idea how to set the download behaviour at browser level? Btw, great wrapper for puppeteer.

Feature request: add accept-language header in headless mode

Hi and thanks for the work

I've noticed that when puppeteer is run in headless mode, it doesn't send the accept-language header, which can be a give-away in bot detection. It would be nice to have it added to the stealth plugins.
The current workaround is adding this to the code:

await page.setExtraHTTPHeaders({
    'accept-language': 'fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7'
});

Best regards

puppeteer-extra-plugin-flash: "Click to enable Adobe Flash Player" prompt visible when using Puppeteer 1.10.0 with "external" Chromium 70.0.3538.102

Hello,

first of all, thanks for the plugin!

It worked perfectly earlier with both the "built-in" Chromium and stock Chrome, but when I changed the browser to Chromium 70.0.3538.102, the prompt requesting a click came back. Some kind of a change in the prefs or ...?

Package info:

ii  chromium                           70.0.3538.102-1~deb9u1         amd64        web browser
ii  flashplayer-chromium               31.0.0.148-dmo1                amd64        Flash Player for Chromium (Pepper)

Is there smaller version like puppeteer core only?

Currently I only use puppeteer-core, as puppeteer is too heavy and I already have Chrome installed, so I use chrome-paths and puppeteer-core, which works really well. Is there any way to switch from puppeteer to puppeteer-core?

[TypeScript] puppeteer.use(...) is not a function

I write in TS, and the code below causes an exception: puppeteer.use(...) is not a function. Any suggestions?

//index.ts
const puppeteer = require('puppeteer-extra')
// add recaptcha plugin and provide it your 2captcha token
// 2captcha is the builtin solution provider but others work as well.
const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha')
puppeteer.use(
    RecaptchaPlugin({
        provider: { id: '2captcha', token: 'xxxx' },
        visualFeedback: true // colorize reCAPTCHAs (violet = detected, green = solved)
    })
)

Contact detail

How can I contact you? I can't see any contact information (email, Twitter).

How to use with puppeteer-core?

Hi,

I don't want to install the full puppeteer package, as it also installs a browser with it. If I want to use puppeteer-core and connect to an installed browser using chrome-launcher, how would I use it with puppeteer-extra?

Captchas not solved when iframe not in a form

This line here assumes that recaptcha is in a form: https://github.com/berstend/puppeteer-extra/blob/master/packages/puppeteer-extra-plugin-recaptcha/src/content.ts#L142

But there are certain sites that do not wrap their captcha in a form.

Workaround is to move the captcha to a form on the page before calling the page.solveRecaptchas() function:

    await page.evaluate(() => {
      document
        .querySelector('form')
        .append(document.querySelector('#login-recaptcha'));
    });
    await page.solveRecaptchas();

Thank you very much for this library, extremely helpful, but my 17 years of experience are telling me the source code is a bit over-engineered. I recommend only using a function or two instead of a class when possible. Nonetheless, thank you so much, this library saved my tush :)

How to figure out new evasions

Hey guys, sorry if this is not in the scope of this library. I have an interesting question: how would one figure out other detections if this stealth library is enabled but you still get blocked? You must definitely have more experience in this area since you already have figured out a bunch of evasions. Perhaps we should create a list of articles on different evasions for headless detection that are not covered by this library?

Thanks a bunch if you find some time to contribute to this issue

Add 'user-agents' package

The user-agents package is updated daily to include popular user agents. For instance, a random ua sample:

{
  "appName": "Netscape",
  "connection": {
    "downlink": 10,
    "effectiveType": "4g",
    "rtt": 0
  },
  "platform": "Win32",
  "pluginsLength": 3,
  "vendor": "Google Inc.",
  "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
  "viewportHeight": 660,
  "viewportWidth": 1260,
  "deviceCategory": "desktop",
  "screenHeight": 800,
  "screenWidth": 1280
}

I would use a custom function as indicated in puppeteer-extra-plugin-anonymize-ua, but I would also like to implement the other ua data, such as viewport, platform, etc.

What would be the best way to go about adding the user-agents library to either puppeteer-extra-plugin-stealth or puppeteer-extra-plugin-anonymize-ua?
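
One way to apply the extra fields, sketched under assumptions: map a sample like the one above onto the arguments of puppeteer's page.setUserAgent and page.setViewport. The helper name `toPageSettings` is hypothetical, and the sample object is copied by hand from the output above rather than generated by the user-agents package.

```javascript
// Sample copied from the output above (trimmed to the fields used here).
const sample = {
  platform: 'Win32',
  userAgent:
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
  viewportHeight: 660,
  viewportWidth: 1260,
  deviceCategory: 'desktop'
}

// Hypothetical helper: turns a ua sample into values a caller could pass to
// page.setUserAgent(...) and page.setViewport(...).
function toPageSettings(ua) {
  return {
    userAgent: ua.userAgent,
    viewport: {
      width: ua.viewportWidth,
      height: ua.viewportHeight,
      isMobile: ua.deviceCategory === 'mobile'
    },
    // Carried along only; overriding navigator.platform is not shown here.
    platform: ua.platform
  }
}

const settings = toPageSettings(sample)
console.log(settings)

// With a real page object, this could become:
//   await page.setUserAgent(settings.userAgent)
//   await page.setViewport(settings.viewport)
```

Keeping the mapping in a plain function would let either plugin reuse it, whichever one ends up owning the feature.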

Methods aren't assigned

Using the recaptcha plugin with puppeteer-extra:
const page0 = (await browser.pages())[0];
const page1 = await browser.newPage();
console.log(page1.solveRecaptchas); // [AsyncFunction]
console.log(page0.solveRecaptchas); // undefined
Is this in my environment only?

thoughts on a more modular design

I'm definitely onboard with the core idea behind this module, and I have several additional use cases I'd like to add, but there are a few things currently stopping me from jumping in:

  1. Overriding puppeteer methods like newPage seems like it could lead to trouble; why not just listen to the built-in Browser event targetcreated which would do the same thing but in a less intrusive way?
  2. If we continue piling distinct functionality into this module, imho it'll become confusing and unwieldy pretty fast. I'm also hesitant to include code mods by default that I'm not going to make use of (even if they're not enabled).
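
The first point can be sketched without puppeteer installed. The example below uses Node's EventEmitter as a stand-in for the Browser object; `FakeBrowser` and `decoratePages` are hypothetical names, and only the 'targetcreated' event name and the target.type() / target.page() calls mirror puppeteer's API.

```javascript
const { EventEmitter } = require('events')

// Stand-in for a puppeteer Browser (hypothetical): it only emits
// 'targetcreated' with a target whose page() resolves to a plain object.
class FakeBrowser extends EventEmitter {
  newPage() {
    const page = {}
    // Announce the new target instead of patching the page here.
    this.emit('targetcreated', { type: () => 'page', page: async () => page })
    return page
  }
}

// Attach extra behaviour by listening, without overriding newPage.
function decoratePages(browser, decorate) {
  browser.on('targetcreated', async target => {
    if (target.type() !== 'page') return
    const page = await target.page()
    if (page) decorate(page)
  })
}

async function demo() {
  const browser = new FakeBrowser()
  decoratePages(browser, page => {
    page.extra = true
  })
  const page = browser.newPage()
  // The listener is async, so wait one turn of the event loop for it to finish.
  await new Promise(resolve => setImmediate(resolve))
  return page
}
```

In this sketch the decoration lands asynchronously, after newPage has already returned, which is one trade-off to weigh against overriding the method.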

This seems like a perfect use case for a plugin pattern, where the only addition to the public puppeteer API is a single use method that takes in a plugin and adds it to a private list of registered plugins. PuppeteerExtra.launch would then create a new browser and run it through each of the registered plugins to add whatever hooks / functionality that plugin wants to the browser instance.

The only special case here is plugins that want to affect the functionality / params of puppeteer.launch, so any registered plugin which defines a launch method would be allowed to edit the launch params before creating the browser instance.

const puppeteer = require('puppeteer-extra')
puppeteer.use(require('puppeteer-extra-plugin-stealth')())
puppeteer.use(require('puppeteer-extra-plugin-user-preferences')({ customPrefs: ... }))
puppeteer.use(require('puppeteer-extra-plugin-extension')('/path/to-extension/...'))

const browser = await puppeteer.launch({ ... })

This is a very similar pattern to how passport / express / babel work, and it scales really nicely once you get the base puppeteer-extra and puppeteer-extra-plugin interfaces defined.
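
A minimal sketch of that proposal, under stated assumptions: the class name `PuppeteerExtraSketch`, the `onBrowser` hook, the stand-in launcher and the `--example-flag` argument are all hypothetical, chosen only to show the shape of the idea.

```javascript
// Hypothetical sketch of the proposal; none of these names are an existing API.
class PuppeteerExtraSketch {
  constructor(launcher) {
    this.launcher = launcher // anything with an async launch(options)
    this.plugins = [] // private list of registered plugins
  }

  // The single public addition: register a plugin and allow chaining.
  use(plugin) {
    this.plugins.push(plugin)
    return this
  }

  async launch(options = {}) {
    // Plugins that define `launch` may edit the launch params first.
    for (const plugin of this.plugins) {
      if (typeof plugin.launch === 'function') {
        options = (await plugin.launch(options)) || options
      }
    }
    const browser = await this.launcher.launch(options)
    // Then every plugin gets a chance to hook into the browser instance.
    for (const plugin of this.plugins) {
      if (typeof plugin.onBrowser === 'function') {
        await plugin.onBrowser(browser)
      }
    }
    return browser
  }
}

// Stand-in launcher so the sketch runs without puppeteer installed.
const fakeLauncher = {
  async launch(options) {
    return { options, hooks: [] }
  }
}

const extra = new PuppeteerExtraSketch(fakeLauncher)
  .use({ launch: opts => ({ ...opts, args: [...(opts.args || []), '--example-flag'] }) })
  .use({ onBrowser: browser => { browser.hooks.push('stealth') } })

const browserPromise = extra.launch({ headless: true })
browserPromise.then(browser => console.log(browser))
```

Plugins run in registration order here, so a later plugin sees launch params already edited by an earlier one.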

Looking forward to hearing your thoughts :)
