
rod's Introduction

Overview


Rod is a high-level driver built directly on the DevTools Protocol. It's designed for web automation and scraping. Rod serves both high-level and low-level use: senior programmers can use the low-level packages and functions to customize or build their own version of Rod, and the high-level functions are just examples of how to build a default version of Rod.

Features

  • Chained context design, which makes it intuitive to time out or cancel a long-running task
  • Debugging friendly: auto input tracing, remote monitoring of headless browsers
  • Thread-safe for all operations
  • Automatically finds or downloads a browser
  • Lightweight, no third-party dependencies, CI tested on Linux, Mac, and Windows
  • High-level helpers like WaitStable, WaitRequestIdle, HijackRequests, WaitDownload, etc.
  • Two-step WaitEvent design, so you never miss an event (how it works)
  • Correctly handles nested iframes and shadow DOMs
  • No zombie browser process after a crash (how it works)

Examples

Please check the examples_test.go file first, then check the examples folder.

For more detailed examples, search the unit tests. For instance, to see how the HandleAuth method is used, search all the *_test.go files that contain HandleAuth or HandleAuthE, for example with GitHub's in-repository search. You can also search the GitHub issues; they contain many usage examples too.

Here is a comparison of the examples between rod and Chromedp.

If you have questions, please raise an issue or join the chat room.

How it works

Here's the common startup process of Rod:

  1. Try to connect to a DevTools endpoint (WebSocket). If none is found, try to launch a local browser; if there is still none, try to download one, then connect again. The lib that handles this is launcher.

  2. Use JSON-RPC to talk to the DevTools endpoint and control the browser. The lib that handles this is cdp.

  3. Use the type definitions of the JSON-RPC to perform high-level actions. The lib that handles this is proto.
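
Steps 2 and 3 can be pictured with a small standard-library sketch. It is not the code of the cdp or proto packages: the request and pageNavigate types and the encode helper are hypothetical names chosen for illustration. It only assumes the wire shape of a DevTools Protocol call (an id, a "Domain.method" name, and typed params), using Page.navigate as the example method.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// request mirrors the shape of a DevTools Protocol call on the wire.
// (Hypothetical type for illustration.)
type request struct {
	ID     int         `json:"id"`
	Method string      `json:"method"`
	Params interface{} `json:"params,omitempty"`
}

// pageNavigate is a hand-written stand-in for a typed definition of the
// Page.navigate parameters.
type pageNavigate struct {
	URL string `json:"url"`
}

// encode turns a typed call into the JSON text sent over the WebSocket.
func encode(id int, method string, params interface{}) (string, error) {
	b, err := json.Marshal(request{ID: id, Method: method, Params: params})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	msg, err := encode(1, "Page.navigate", pageNavigate{URL: "https://example.com"})
	if err != nil {
		panic(err)
	}
	fmt.Println(msg)
	// prints: {"id":1,"method":"Page.navigate","params":{"url":"https://example.com"}}
}
```

The typed struct is what lets the compiler catch a misspelled parameter before anything reaches the browser.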

Object model: [figure: object model diagram]

FAQ

Q: How to contribute or become a maintainer

Please check this doc.

Q: How to use Rod with docker so that I don't have to install a browser

Using Rod with Docker is straightforward:

  1. Run the rod image: docker run -p 9222:9222 rodorg/rod

  2. Open another terminal and run a Go program like this example

The rod image can dynamically launch a browser for each remote driver with customizable browser flags. It's tuned for screenshots and for the fonts of popular natural languages. You can easily load balance requests across a cluster of containers running this image; each container can create multiple browser instances at the same time.

Q: Does it support other browsers like Firefox or Edge

Rod should work with any browser that supports the DevTools Protocol.

  • Microsoft Edge can pass all the unit tests.
  • Firefox is supporting this protocol.
  • Safari doesn't have any plan to support it yet.
  • IE won't support it.

Q: Why is it called rod

Rod is the name of a control device for puppetry, such as the one in this image. The idea is that we are the puppeteer, the browser is the puppet, and we use the rod to control the puppet.

Q: How versioning is handled

Semver is used.

Before v1.0.0, whenever the second section changes, such as from v0.1.0 to v0.2.0, there must be some public API changes, such as changed function names or parameter types. If only the last section changes, no public API is changed.

You can use GitHub's release comparison to see the automated changelog, for example, compare v0.75.2 with v0.76.0.
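
The pre-1.0 rule above can be written down as a small check. This is only a sketch: parse and apiMayChange are hypothetical helper names, not part of Rod, and the code assumes versions always have the form vMAJOR.MINOR.PATCH.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parse splits a version such as "v0.75.2" into its three numeric sections.
func parse(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("want vMAJOR.MINOR.PATCH, got %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("bad section %q in %q", p, v)
		}
		out[i] = n
	}
	return out, nil
}

// apiMayChange reports whether moving between two versions can change the
// public API under the rule above: true when the first or second section
// differs, false when only the last section differs.
func apiMayChange(from, to string) (bool, error) {
	a, err := parse(from)
	if err != nil {
		return false, err
	}
	b, err := parse(to)
	if err != nil {
		return false, err
	}
	return a[0] != b[0] || a[1] != b[1], nil
}

func main() {
	minor, _ := apiMayChange("v0.75.2", "v0.76.0")
	patch, _ := apiMayChange("v0.76.0", "v0.76.1")
	fmt.Println(minor, patch)
	// prints: true false
}
```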

Q: Why another Puppeteer-like lib

There are a lot of great projects, but none is perfect; choosing the one that best fits your needs is important.

  • Chromedp

    Theoretically, Rod should perform faster and consume less memory than Chromedp.

    Chromedp uses a fixed-size buffer for events, which can cause deadlock under high concurrency. Because Chromedp uses a single event loop, slow event handlers block each other. Rod doesn't have these issues because it's based on goob.

    Chromedp JSON-decodes every message from the browser, while Rod decodes on demand, so Rod performs better, especially for heavy network events.

    Chromedp uses a third-party WebSocket lib that has 1MB of overhead for each cdp client, which can become a problem if you want to control thousands of remote browsers. Because of this limitation, if you evaluate a JS script larger than 1MB, Chromedp will crash. Here's an example of how easily you can crash Chromedp: gist.

    When a crash happens, Chromedp leaves a zombie browser process on Windows and Mac.

    Rod is more configurable; for example, you can even replace the WebSocket lib with one you prefer.

    For a direct code comparison, you can check here. If you compare the example called logic between rod and chromedp, you will see how much simpler rod is.

    With Chromedp, you have to use its verbose DSL-like tasks to handle the main logic. Chromedp uses several wrappers to handle execution with context and options, which makes its code very hard to understand when bugs happen. The heavily used interfaces make static types useless when tracking down issues. In contrast, Rod uses as few interfaces as possible.

    Rod has fewer dependencies, a simpler code structure, and better test automation, so you should find it easier to contribute code to Rod. Compared with Chromedp, Rod therefore has the potential to gain more useful functions from the community in the future.

    Another problem with Chromedp is that its architecture is based on DOM node id, whereas puppeteer and rod are based on remote object id. As a consequence, Chromedp's maintainers are prevented from adding high-level functions that are coupled with the runtime. For example, this ticket was open for 3 years. Even after it was closed, you still can't evaluate a JS expression on an element inside an iframe.

  • puppeteer

    Puppeteer JSON-decodes every message from the browser, while Rod decodes on demand, so theoretically Rod will perform better, especially for heavy network events.

    With puppeteer, you have to handle promise/async/await a lot, which makes an elegant fluent interface design very hard. End-to-end tests require a lot of sync operations to simulate human inputs. Because Puppeteer is based on Node.js, all IO operations are async calls, so people usually end up typing tons of async/await. If you forget to write an await, debugging the leaked Promise is usually painful. The overhead grows as your project grows.

    Rod is type-safe by default, and has better internal comments about how Rod itself works. It has type bindings for all endpoints in the DevTools Protocol.

    Rod disables domain events whenever possible, whereas puppeteer always enables all the domains, which consumes a lot of resources when driving a remote browser.

    Rod supports cancellation and timeout better, which can be critical if you want to handle thousands of pages. For example, to simulate a click we have to send several cdp requests. With a Promise you can't achieve something like "only send half of the cdp requests", but with a context we can.

  • playwright

    Rod and Playwright were first published almost at the same time. It's a great step forward for the Puppeteer team. Most comparisons between Rod and Puppeteer remain true for Playwright.

    One of Rod's architectural goals is to make it easier for everyone to contribute and to make it a pure community project; that's one big reason why I chose Golang and the MIT license. TypeScript is a nice choice, but if you check Playwright's design choices, any and union types are everywhere; if you try to jump to the source code of page.click, the d.ts files will show you the reality of TypeScript. Golang is definitely not good enough, but it usually introduces less tech debt than Node.js TypeScript. If you asked me to choose one for QA or Infra people who are not familiar with coding to automate end-to-end tests or site monitoring, I would pick Golang.

    Their effort on cross-browser support is fabulous. But nowadays HTML5 is well adopted by the main brands, so it's hard to say whether the benefits outweigh the complexity it brings. Will the cross-browser patches become a burden in the future? Security issues in patched browsers are another concern. It also makes it tricky to test old versions of Firefox or Safari. Hopefully it's not over-engineering.

  • selenium

    Selenium is based on the webdriver protocol, which has far fewer functions than the devtools protocol. For example, it can't handle closed shadow DOM, has no way to save pages as PDF, and has no support for tools like Profiler or Performance.

    It is harder to set up and maintain because of extra dependencies like a browser driver.

    Though selenium sells itself on better cross-browser support, it's usually very hard to make it work for all major browsers.

    There are plenty of articles about "selenium vs puppeteer"; you can treat rod as the Golang version of Puppeteer.

  • cypress

    Cypress is very limited; for closed shadow DOM or cross-domain iframes it's almost unusable. Read their limitation doc for more details.

    If you want to cooperate with us to create a testing-focused framework based on Rod to overcome the limitations of cypress, please contact us.
