puppeteer-renderer's Introduction

Puppeteer (Chrome headless node API) based web page renderer.

Useful for server-side rendering through a proxy. Outputs HTML, PDF, and PNG screenshots.

Requirements

You need to be able to run either Chromium or Docker.

Getting Started

Start the server using Docker (if you cannot run Chromium but have Docker installed)

docker run -d --name renderer -p 8080:3000 ghcr.io/zenato/puppeteer-renderer:latest

Local (git clone)

pnpm install

Start server (If you can run Chromium)

pnpm dev (service port: 3000)

Locally build the image

docker build . --file ./Dockerfile --tag local/puppeteer-renderer --build-arg SCOPE=puppeteer-renderer

docker run -d --name renderer -p 8080:3000 local/puppeteer-renderer


### Test on your browser
Open `http://localhost:{port}/{html|pdf|screenshot}?url=https://www.google.com` in your browser, where `{port}` is 8080 for the Docker commands above or 3000 for `pnpm dev`.

If you see HTML code, the server is working.

### Puppeteer customization

When starting with `pnpm {dev|start}` or a Docker container, you can customize Puppeteer using environment variables.

- `IGNORE_HTTPS_ERRORS=true` - Ignores HTTPS errors.
- `PUPPETEER_ARGS='--host-rules=MAP localhost yourproxy'` - Adds additional arguments that will be passed to Puppeteer. Supports multiple arguments.

## Integration with an existing service

If you already have a running service, configure the proxy with the middleware.
See [puppeteer-renderer-middleware](packages/middleware/README.md) for Express.

```ts
import express from 'express'
import renderer from 'puppeteer-renderer-middleware'

const app = express()

app.use('/render-proxy', renderer({
  url: 'http://installed-your-puppeteer-renderer-url',
  // userAgentPattern: /My-Custom-Agent/i,
  // excludeUrlPattern: /\.html$/i,
  // timeout: 30 * 1000,
}));

// your service logic...

app.listen(8080);
```

API

Endpoint: /{html|pdf|screenshot}

| Name | Required | Value | Description | Usage |
| --- | --- | --- | --- | --- |
| `url` | yes | | Target URL | `http://puppeteer-renderer/html?url=http://www.google.com` |
| `animationTimeout` | | Timeout in milliseconds | Waits for animations to finish before taking the screenshot. Only applicable to the `screenshot` type. | `http://puppeteer-renderer/screenshot?url=http://www.google.com&animationTimeout=3000` |
| (Extra options) | | | Extra options (see the Puppeteer API doc) | `http://puppeteer-renderer/pdf?url=http://www.google.com&scale=2` |
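
Request URLs like the ones in the Usage column can be assembled programmatically. The following is a minimal sketch, assuming a renderer reachable at a base URL of your choosing; `buildRenderUrl` and `RenderKind` are hypothetical names for illustration and are not part of puppeteer-renderer.

```ts
// Hypothetical helper, not part of puppeteer-renderer itself.
type RenderKind = 'html' | 'pdf' | 'screenshot'

function buildRenderUrl(
  baseUrl: string,
  kind: RenderKind,
  targetUrl: string,
  extraOptions: Record<string, string | number | boolean> = {},
): string {
  // The endpoint is /{html|pdf|screenshot} on the renderer host.
  const endpoint = new URL(`/${kind}`, baseUrl)
  // searchParams percent-encodes the target URL for us.
  endpoint.searchParams.set('url', targetUrl)
  Object.keys(extraOptions).forEach((name) => {
    endpoint.searchParams.set(name, String(extraOptions[name]))
  })
  return endpoint.toString()
}

console.log(
  buildRenderUrl('http://localhost:8080', 'screenshot', 'https://www.google.com', {
    animationTimeout: 3000,
  }),
)
```

Letting `URLSearchParams` do the encoding avoids problems when the target URL itself contains `&` or `?`.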

PDF File Name Convention

Generated PDFs are returned with a `Content-Disposition` header that asks the browser to download the file instead of displaying it. The file name is derived from the rendered URL:

| URL | Filename |
| --- | --- |
| `https://www.example.com/` | `www.example.com.pdf` |
| `https://www.example.com:80/` | `www.example.com.pdf` |
| `https://www.example.com/resource` | `resource.pdf` |
| `https://www.example.com/resource.extension` | `resource.pdf` |
| `https://www.example.com/path/` | `path.pdf` |
| `https://www.example.com/path/to/` | `pathto.pdf` |
| `https://www.example.com/path/to/resource` | `resource.pdf` |
| `https://www.example.com/path/to/resource.ext` | `resource.pdf` |
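
The mapping above can be expressed as a small function. This is only a sketch that reproduces the listed examples; the rules are inferred from those examples, and `pdfFileName` is a hypothetical name, not necessarily how the project implements it.

```ts
// Sketch only: rules inferred from the examples, not the project's actual code.
function pdfFileName(pageUrl: string): string {
  const { hostname, pathname } = new URL(pageUrl)
  const segments = pathname.split('/').filter((segment) => segment.length > 0)

  let base: string
  if (segments.length === 0) {
    // No path at all: fall back to the host name (the port is not included).
    base = hostname
  } else if (pathname.endsWith('/')) {
    // Directory-like path: concatenate every path segment.
    base = segments.join('')
  } else {
    // File-like path: take the last segment and drop its extension, if any.
    base = segments[segments.length - 1].replace(/\.[^.]*$/, '')
  }
  return `${base}.pdf`
}

console.log(pdfFileName('https://www.example.com/path/to/'))
```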

License

MIT

Copyright (c) 2017-present, Yeongjin Lee

puppeteer-renderer's People

Contributors

7a6163, carbogninalberto, chaelli, dependabot[bot], diskopete, ihipop, johnroyer, ksdme, pionl, scharfie, weph, zenato

puppeteer-renderer's Issues

Zombie chromium processes

Hi,

We are running the zenato/puppeteer-renderer docker image (version 2.1.5) on Ubuntu 18.04.3 and are experiencing zombie chromium processes that slowly eat up all the memory and CPU of the host machine. I suspect the issue is due to the Puppeteer Process problem, which should be fixed in the latest puppeteer release, v10.4.0.

Would it be possible to bump the puppeteer version in puppeteer-renderer, e.g. to v10.4.0, or alternatively try one of the workarounds suggested in the above problem thread, e.g. adding --no-zygote to the puppeteer launch arguments as suggested in this comment?

Defaults for screenshots are broken

Puppeteer now defaults screenshots to PNG, which means the defaults here are broken, since the PNG renderer can't have a quality set. Also, since puppeteer-renderer already uses the type parameter, you can't override the screenshot type through the extraOptions.

quality: Number(quality) || 100,
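
One possible shape for a fix is to set `quality` only for formats that accept it. The sketch below is an illustration under assumptions, not the project's code: `screenshotSettings` and its two parameters are hypothetical, and the image format is assumed to arrive under a separate query parameter so it does not collide with the existing `type` parameter.

```ts
// Hypothetical sketch, not the project's code.
type ImageType = 'png' | 'jpeg' | 'webp'

interface ScreenshotSettings {
  type: ImageType
  quality?: number
}

function screenshotSettings(
  requestedImageType?: string,
  requestedQuality?: string,
): ScreenshotSettings {
  // Anything other than a known lossy format falls back to PNG.
  const type: ImageType =
    requestedImageType === 'jpeg' || requestedImageType === 'webp'
      ? requestedImageType
      : 'png'

  // PNG screenshots do not take a quality, so the key is omitted entirely.
  if (type === 'png') {
    return { type }
  }

  // Missing, empty, or out-of-range values fall back to 100.
  const parsed = requestedQuality ? Number(requestedQuality) : NaN
  const inRange = Number.isFinite(parsed) && parsed >= 0 && parsed <= 100
  return { type, quality: inRange ? Math.round(parsed) : 100 }
}

console.log(screenshotSettings('jpeg', '80'))
console.log(screenshotSettings())
```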

Error 500

Some websites (mainly internal ones, so it may be a routing issue) give me error 500 when I try to take a screenshot. Is it possible to display what caused the 500 error, or to access any logs?

no old tags available on docker hub

problem

the newly released version of this image breaks my tests. admittedly i use it in a pretty hacky way! i would like to stay pinned to the last release, not the latest release

discussion

can we publish tagged images on dockerhub vs just latest?

thx!

Failed to start

Fresh installation failed

renderer    | 
renderer    | > [email protected] start /app
renderer    | > node src/index.js
renderer    | 
renderer    | Fail to initialze renderer. Error: Failed to launch chrome!
renderer    | /app/node_modules/puppeteer/.local-chromium/linux-515411/chrome-linux/chrome: error while loading shared libraries: libgconf-2.so.4: cannot open shared object file: No such file or directory
renderer    | 
renderer    | 
renderer    | TROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md
renderer    | 
renderer    |     at onClose (/app/node_modules/puppeteer/lib/Launcher.js:211:14)
renderer    |     at Interface.helper.addEventListener (/app/node_modules/puppeteer/lib/Launcher.js:200:50)
renderer    |     at emitNone (events.js:111:20)
renderer    |     at Interface.emit (events.js:208:7)
renderer    |     at Interface.close (readline.js:370:8)
renderer    |     at Socket.onend (readline.js:149:10)
renderer    |     at emitNone (events.js:111:20)
renderer    |     at Socket.emit (events.js:208:7)
renderer    |     at endReadableNT (_stream_readable.js:1064:12)
renderer    |     at _combinedTickCallback (internal/process/next_tick.js:138:11)

Inclusion of iframes in rendered output

Hi,

This could be considered more of an enhancement on your side and maybe more of a bug on the puppeteer side ??
I am using the 2.2.0 version and it seems that rendering of thumbnails from embedded youtube videos in the webpage that I am screenshot-ing is no longer possible while it was in version 2.1.5. I guess the bump in puppeteer version is to blame ??
I saw this article regarding iframes rendering with puppeteer and wonder if this is something that could be incorporated in your side of the code to facilitate the rendering of embedded iframes.

In my use case, I am using your docker version 2.2.0 and calling it without any middleware, straight through a URL like http://localhost:{port}/?url=https://www.google.com, to render a webpage screenshot.

Thank you in advance for your time and work.

Add health check endpoint

Adding a health check endpoint would allow this to be used as-is in cloud environments such as AWS fargate/ECS/EBS.

Puppeteer cache disable or clear

Is there a way to clear the puppeteer/docker cache? Preferably by parameter.

I render a preview, and then when I change an image on the server, the renderer does not see the change. It keeps the image from the HTML in its cache.

Is there a way to force the puppeteer docker API to disable caching or to refresh it?

API to generate jpeg file

Hi, I want to use the API to generate a JPEG file, but the screenshot option 'type' cannot be used, because the URL already has a 'type' parameter for choosing between pdf and screenshot. Please help, thanks.

Docker container fails to run

> [email protected] start /app
> node src/index.js

(node:18) ExperimentalWarning: The fs.promises API is experimental
Fail to initialze renderer. Error: Could not find browser revision 756035. Run "npm install" or "yarn install" to download a browser binary.
    at ChromeLauncher.launch (/app/node_modules/puppeteer/lib/Launcher.js:59:23)

After building a custom docker container from the source using docker build -t imagename . and then running docker run imagename, I'm getting the above error message. It looks like the dependencies aren't being installed properly.

Any ideas how to fix this?

Cannot find module 'fs/promises'

Error with latest update 2.4.2
Container fails to start
Reverting back to 2.4.0 solves the issue

Error: Cannot find module 'fs/promises'
Require stack:
- /app/node_modules/puppeteer-core/lib/cjs/puppeteer/node/BrowserFetcher.js
- /app/node_modules/puppeteer-core/lib/cjs/puppeteer/node/node.js
- /app/node_modules/puppeteer-core/lib/cjs/puppeteer/puppeteer-core.js
- /app/node_modules/puppeteer/lib/cjs/puppeteer/puppeteer.js
- /app/src/renderer.js
- /app/src/index.js
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:815:15)
    at Function.Module._load (internal/modules/cjs/loader.js:667:27)
    at Module.require (internal/modules/cjs/loader.js:887:19)
    at require (internal/modules/cjs/helpers.js:74:18)
    at Object.<anonymous> (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/node/BrowserFetcher.js:36:20)
    at Module._compile (internal/modules/cjs/loader.js:999:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1027:10)
    at Module.load (internal/modules/cjs/loader.js:863:32)
    at Function.Module._load (internal/modules/cjs/loader.js:708:14)
    at Module.require (internal/modules/cjs/loader.js:887:19) {
  code: 'MODULE_NOT_FOUND',
  requireStack: [
    '/app/node_modules/puppeteer-core/lib/cjs/puppeteer/node/BrowserFetcher.js',
    '/app/node_modules/puppeteer-core/lib/cjs/puppeteer/node/node.js',
    '/app/node_modules/puppeteer-core/lib/cjs/puppeteer/puppeteer-core.js',
    '/app/node_modules/puppeteer/lib/cjs/puppeteer/puppeteer.js',
    '/app/src/renderer.js',
    '/app/src/index.js'
  ]
}
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] start: `node src/index.js`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the [email protected] start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

Add more options

Hi, thank you for building this! I have been using it and I think it has great potential, especially if it becomes more configurable via the URL. Right now it only has url and type; I think it would be good to have these as well:

  • waitUntil
  • waitForSelector
  • width
  • height
  • fullPage

Custom DNS servers

Hello, would it be possible to add an option to provide custom DNS servers?
Currently docker is using 8.8.8.8/8.8.4.4.

I would like to provide other DNS servers at runtime.

I can edit those in /etc/resolv.conf, but it resets every time I restart the host PC.

Expected options.clip.x to be a number but found undefined

As in title

Error: Expected options.clip.x to be a number but found undefined
at Object.exports.assert (/app/node_modules/puppeteer/lib/cjs/puppeteer/common/assert.js:26:15)
at Page.screenshot (/app/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:1070:25)
at Renderer.screenshot (/app/src/renderer.js:120:33)
at runMicrotasks (<anonymous>)
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async /app/src/index.js:64:44

Maybe last fix for #36 broke something?

Oops, An expected error seems to have occurred.

Error: net::ERR_CERT_AUTHORITY_INVALID at https://localhost/c5934.html
    at navigate (/app/node_modules/puppeteer/lib/FrameManager.js:108:45)
    at process._tickCallback (internal/process/next_tick.js:68:7)
  -- ASYNC --
    at Frame.<anonymous> (/app/node_modules/puppeteer/lib/helper.js:105:23)
    at Page.goto (/app/node_modules/puppeteer/lib/Page.js:615:53)
    at Page.<anonymous> (/app/node_modules/puppeteer/lib/helper.js:106:31)
    at Renderer.createPage (/app/src/renderer.js:22:16)
    at process._tickCallback (internal/process/next_tick.js:68:7)

media for pdf

Hi, I am trying to understand whether I can export the PDF with emulateMedia set to screen, but it seems that this can't be altered because it is hardcoded to always use print. Or am I not using the URL variables properly?

I tried media, emulateMedia, mediaType & emulateMedia.mediaType none of which gave me any different results.

If I am reading this line

page = await this.createPage(url, { timeout, waitUntil, credentials, emulateMedia: 'print' })

correctly and it is hardcoded is there a chance you might leave it as an option in the URL and if it is not provided then by default use 'print'?

Thank you for taking the time to build and maintain this image!

type=pdf not actually returning a PDF and hanging

Hi, I like the approach you take a lot: using the official Chrome and puppeteer libraries to stay as near to "real chrome" as possible.

But I have a pretty fundamental issue: started master branch locally. Trying your example URL curl -v localhost:3000/?url=http://www.google.com&type=pdf (or trying the same in the browser) does not return a PDF but the HTML just as if "type=pdf" were not passed.

In addition, the server is not closing the connection and not sending the content-length header, so the download dialog in the browser hangs and can only say "X kb of unknown, duration unknown".

Any idea ?

How to pass additional options for clipping

I've looked at the extra options, but they do not seem to work. According to the docs, clip.y should work with screenshots, but I get a 500 error. How should I build the query to be able to clip (or scroll) the site?

<?php
$params['width'] = 1200;
$params['height'] = 600;
$params['margin.top'] = 100;
$params['options.clip.y'] = 120; // No effect
$params['clip.y'] = 120; // Error 500
$params['clip']['y'] = 120; // Error 500
$params['type'] = 'screenshot';
// header('Content-type: image/png');
echo file_get_contents(SCREENSHOT_API_URL . http_build_query($params));

animationTimeout shouldn't cause exceptions with fullPage (before the timeout passes, at least)

I've noticed occasional exceptions in
https://github.com/zenato/puppeteer-renderer/blob/master/src/wait-for-animations.js#L18

I think what's happening is JS is causing the page to change size, then the image returned by page.screenshot is a different size than previous. This results in an exception from pixelmatch:
https://github.com/mapbox/pixelmatch/blob/master/bin/pixelmatch#L26

Off the top of my head I'd prefer the loop just keep checking until the timeout expires, then return false if for some reason it still didn't match.
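
A rough sketch of that idea follows. It is not the code in `wait-for-animations.js`: the `Frame` shape, `framesMatch`, and `waitForStableFrame` are hypothetical names, and a plain byte-for-byte comparison stands in for pixelmatch so the example stays self-contained. The point is that a size mismatch counts as "not matching yet" instead of raising.

```ts
// Hypothetical sketch; a byte comparison is swapped in for pixelmatch.
interface Frame {
  width: number
  height: number
  data: Uint8Array
}

// Returns false instead of throwing when the two frames differ in size.
function framesMatch(a: Frame, b: Frame): boolean {
  if (a.width !== b.width || a.height !== b.height) return false
  if (a.data.length !== b.data.length) return false
  for (let i = 0; i < a.data.length; i++) {
    if (a.data[i] !== b.data[i]) return false
  }
  return true
}

// Keeps capturing until two consecutive frames match or the timeout expires.
async function waitForStableFrame(
  capture: () => Promise<Frame>,
  timeoutMs: number,
  intervalMs = 100,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs
  let previous = await capture()
  while (Date.now() < deadline) {
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
    const current = await capture()
    if (framesMatch(previous, current)) return true
    previous = current
  }
  // Timed out without seeing two identical frames.
  return false
}
```

Here `capture` would wrap whatever produces a decoded screenshot; returning `false` on timeout leaves the caller free to take the screenshot anyway.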

UnhandledPromiseRejectionWarning: Error: Page crashed!

I ran into this issue and was wondering if this problem is already known.

(node:25) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 178)
(node:25) UnhandledPromiseRejectionWarning: Error: Page crashed!
    at Page._onTargetCrashed (/app/node_modules/puppeteer/lib/Page.js:209:28)
    at CDPSession.Page.client.on (/app/node_modules/puppeteer/lib/Page.js:129:57)
    at CDPSession.emit (events.js:189:13)
    at CDPSession._onMessage (/app/node_modules/puppeteer/lib/Connection.js:166:18)
    at Connection._onMessage (/app/node_modules/puppeteer/lib/Connection.js:83:25)
    at WebSocketTransport._ws.addEventListener (/app/node_modules/puppeteer/lib/WebSocketTransport.js:25:32)
    at WebSocket.onMessage (/app/node_modules/ws/lib/event-target.js:125:16)
    at WebSocket.emit (events.js:189:13)
    at Receiver.receiverOnMessage (/app/node_modules/ws/lib/websocket.js:797:20)
    at Receiver.emit (events.js:189:13)

Once the error is thrown, it doesn't recover and keeps throwing the same error until I restart the docker container.

According to these issues, it's related to the shared memory space being too small:
https://github.com/puppeteer/puppeteer/issues/1321
https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md#tips

I haven't debugged the problem thoroughly, but it should be solved with this simple flag.

const browser = await puppeteer.launch({
  args: ['--disable-dev-shm-usage']
});

If you think it's a good idea, I am happy to prepare a pull request.

Thanks!

Render started to act as proxy

I just ran a fresh install on docker. sudo docker run -d --restart unless-stopped --name webscreenshot -p 8050:3000 zenato/puppeteer-renderer:latest

But instead of an image, I get the website's source HTML (which renders the website). It started to act as a proxy, although it was working fine before. No idea what happened. Most probably after a recent upgrade?

http://localhost:8050/?url=http://google.pl No idea what I do wrong. Worked fine just yesterday.

Authentication

Is there any way to add authentication to the docker service api? For example a secret Bearer.
Currently if I run the docker image, everybody can use the service - which is really an issue - even more as it can act as a proxy.
Many thanks for your great work!

How should it work?

Hi, @zenato

I'm just wondering about the idea (I'm surprised that I can't see this question in the issues).

We have the following in the middleware:

let isRender = false

module.exports = function(options) {
  if (!options || !options.url) {
    throw new Error('Must set url.')
  }

  let rendererUrl = options.url

  const userAgentPattern = options.userAgentPattern || new RegExp(botUserAgents.join('|'), 'i')
  const excludeUrlPattern =
    options.excludeUrlPattern || new RegExp(`\\.(${staticFileExtensions.join('|')})$`, 'i')
  const timeout = options.timeout || 10 * 1000

  return (req, res, next) => {
    if (isRender) return next()
    isRender = true

My concern is the isRender flag. On the first request it's false. That's fine: we can run our logic to detect a bot and so on. After the first call you set it to true, and that makes the middleware ignore all subsequent requests.

What's the idea?
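
For comparison, a decision that depends only on the current request could look like the sketch below. It assumes the intent is to render for matching user agents and skip excluded URLs; `shouldRender` and `RenderDecisionInput` are hypothetical names, and this is not the middleware's actual code.

```ts
// Hypothetical sketch: the decision uses per-request data, no module-level flag.
interface RenderDecisionInput {
  userAgent: string
  path: string
}

function shouldRender(
  input: RenderDecisionInput,
  userAgentPattern: RegExp,
  excludeUrlPattern: RegExp,
): boolean {
  // Static assets and other excluded URLs are never sent to the renderer.
  if (excludeUrlPattern.test(input.path)) return false
  // Otherwise render only for user agents that match (for example, bots).
  return userAgentPattern.test(input.userAgent)
}

const botPattern = /googlebot|bingbot/i
const staticFilePattern = /\.(js|css|png)$/i

console.log(shouldRender({ userAgent: 'Googlebot/2.1', path: '/products' }, botPattern, staticFilePattern))
console.log(shouldRender({ userAgent: 'Googlebot/2.1', path: '/app.js' }, botPattern, staticFilePattern))
```

The patterns should not carry the `g` flag, since `RegExp.prototype.test` keeps state between calls on global regexes.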
