mrwh1te / botmation

A simple TypeScript framework for declaratively composing bots with Puppeteer

Home Page: https://botmation.dev

License: MIT License

TypeScript 96.87% JavaScript 1.54% HTML 1.58%
puppeteer typescript nodejs npm-package declarative higher-order-functions functional web-crawler async-functionality bots composable-architecture curry

botmation's Introduction

Introduction

Botmation is a simple declarative framework for building bots in TypeScript using Puppeteer. It follows a simple, composable pattern focused on a single type of function called a BotAction.

BotActions do everything, from simple tasks in crawling and scraping the web, to logging in & automating social media. They are composable, so they are easily assembled.

The possibilities are endless!

“Everything should be made as simple as possible, but no simpler.” - Albert Einstein

Why choose Botmation?

It empowers Puppeteer code with a simple pattern to maximize code readability, reusability and testability.

Its compositional design comes pre-built with safe defaults for building bots with less code.

It encourages a learn-at-your-own-pace approach to exploring the possibilities of functional programming.

Its Core library has 100% test coverage.

Getting Started

Botmation is a Node.js library written in TypeScript. You'll need the Node.js LTS release installed and the TypeScript compiler (tsc) installed globally (or a transpiling step in your build).

Install

To get started, install Botmation's main package with npm:

npm install --save @botmation/core

If you're just getting started, install puppeteer:

npm install --save puppeteer 

You can install any other @botmation packages to extend upon the available functionality:

npm install --save @botmation/instagram

Documentation

Figure out the details with Botmation's Documentation for a deep reference into every package's functions with examples.

Core Library Reference

@botmation/core is the main package, consisting of all functions in the API section of the Botmation docs. It has the foundational functions for building bots and a little more. Other packages, like @botmation/instagram, have specific functions that work in conjunction with the core ones.

Import any core API function from:

import { chain, goTo, screenshot } from '@botmation/core'

@botmation/core v1 has 17 groups of BotActions to choose from:

  • abort
    • abort an assembly of BotActions
  • assembly-line
    • compose and run BotActions in lines
  • branching
    • functional branching, e.g. if statements
  • console
    • log messages to the Node.js console
  • cookies
    • read/write page cookies
  • errors
    • try/catch errors in assembly-lines
  • files
    • write files to local disk, e.g. screenshots, PDFs
  • indexed-db
    • read/write to a page's IndexedDB
  • inject
    • insert new injects into a line of BotActions
  • input
    • simulate user input, e.g. typing and clicking with a mouse
  • local-storage
    • read/write/delete from a page's Local Storage
  • loops
    • functional loops, e.g. for each
  • navigation
    • change the page's URL, wait for form submissions to change page URL, back, forward, refresh
  • pipe
    • functions specific to Piping
  • random
    • functions specific to randomness like rolling dice
  • scrapers
    • scrape HTML documents with an HTML parser and evaluate JavaScript inside a Page
  • time
    • time based operations, e.g. scheduling

Contributors

Code

Michael Lage - Blog

Art

Patrick Capeto - Email

botmation's Issues

elementExists

Is your feature request related to a problem? Please describe.
Want to run BotActions only IF an element exists on the page

Describe the solution you'd like
Want elementExists(selector: string): ConditionalBotAction to check the webpage for an element by a selector. Return TRUE if found, else FALSE.

Then pair it with givenThat()() to branch the logic
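
A minimal sketch of how such a conditional could look, assuming a ConditionalBotAction is simply a BotAction that resolves to a boolean. PageLike is a hypothetical stand-in for the one Puppeteer Page method used here (page.$ resolves to null when nothing matches), so the sketch runs without a browser:

```typescript
// Hypothetical structural stand-in for the part of Puppeteer's Page used below.
interface PageLike {
  $(selector: string): Promise<unknown | null>
}

// Assumption: a ConditionalBotAction resolves to a boolean.
type ConditionalBotAction = (page: PageLike, ...injects: unknown[]) => Promise<boolean>

// Resolves TRUE when the selector matches at least one element, else FALSE.
const elementExists = (selector: string): ConditionalBotAction =>
  async (page) => (await page.$(selector)) !== null
```

The returned function could then be handed to a branching assembler such as givenThat()() as its condition.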

textExists

Is your feature request related to a problem? Please describe.
Want to run BotActions only IF specific text exists in the page

Describe the solution you'd like
Want textExists(text: string): ConditionalBotAction to check the webpage for specific text like a Header to a form. Return TRUE if found, else FALSE.

Then pair it with givenThat()() to branch the logic

parallel()()

Basically Promise.all(actions), with one difference: if running in a Pipe, return the resolved value of Promise.all(), an array of Pipe values whose indexes correspond to the indexes of the actions in the rest-parameter array. Otherwise, if it's not running in a Pipe, simply await the Promise.all() without returning any values.

concept

const parallel = (...actions: BotAction[]): BotAction => async (page, ...injects) => {
   // call every BotAction up front to collect the Promises they return
   const actionsPromises = actions.map(action => action(page, ...injects))
   // run the actions in parallel
   const results = await Promise.all(actionsPromises)
   // return the resolved values only if piped, otherwise don't bother
   // (injectsHavePipe is assumed to be a helper that detects a Pipe object in the injects)
   if (injectsHavePipe(injects)) {
      return results
   }
}

htmlParser()(), $() & $$(): Scraping / Get HTML Element(s)

idea:

const $ = (htmlSelector: string): BotAction<Element> =>
   async(page) => await page.evaluate(functionToGrabElementBySelector, htmlSelector)

const $$ = (htmlSelector: string): BotAction<Element[]> =>
   async(page) => await page.evaluate(functionToGrabElementsBySelector, htmlSelector)

Upgrade pipeRunner typing

pipeObject = wrapValueInPipe(nextPipeValueOrUndefined as PipeValue|undefined)

Remove the as PipeValue|undefined cast. Perhaps through a generic, the function could carry a type state for the pipe object/value that is updated with each resolved BotAction.

This may end up being a major change where the Pipe object is put at the start of the injects to preserve type in array destructuring. That affects everything in Botmation.

Originally, the pipe object was put at the end of the injects array to make the expected injects order the same between chain and pipe, with no dev effort. So a BotAction that expects inject1, inject2, inject3 will get the same injects whether it's run inside a chain or a pipe (if given the same injects).

Putting the pipe object at the start of the injects array will shift every expected inject's index by 1. That creates a problem for devs making BotActions that use injects, in terms of chain and pipe compatibility.

To get around this, Botmation can have a new standard helper function to get the pipe object/value and injects in the same "chain"-like order. The downside is that it would put a new requirement on devs creating BotActions that use injects: always use this function.

Need time to think about this. Maybe this is a good time to explore "hooks" in some kind of way? It doesn't have to be programmed the same way as in React, but a familiar syntax may work here, e.g. getInjects(), getPipeValue()

The goal is that, once this simpler function is upgraded, all assemblers that carry a value, such as Pipe and SwitchPipe, will be upgraded with similar typing functionality

An in-range update of @types/node is breaking the build


The devDependency @types/node was updated from 14.0.4 to 14.0.5.

This version is covered by your current version range and after updating it in your project the build failed.

@types/node is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

separate master / dev branches

master will stay in sync with the public npm release, i.e. 2.0.1 right now

all the recent merges would be on dev, as it preps to merge the 2.1 update into master

NPM Readme, Simplify Main Readme & Point to Docs Site

The main readme can be copied into a specific npm readme. The webpack build for the dist needs to copy this new one instead. Modify it with the npmjs website's content template in mind.

Then the main readme can be significantly reduced, with links to botmation.dev

Supporting Piping a collection into forAll()()

Make the higher order param optional, with the fallback being the Piped in value is the collection to iterate on.

Therefore, you could grab data from local storage, then base your scraping, in a loop, from what's collected.
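
A rough sketch of that fallback, under several assumptions: the Pipe is a branded object sitting at the end of the injects, the callback always returns an array of BotActions, and only array collections are handled. The types below are local stand-ins, not the library's own:

```typescript
type Page = unknown
type BotAction = (page: Page, ...injects: unknown[]) => Promise<void>

// Assumed Pipe shape: a branded object carrying the piped value.
interface Pipe { brand: 'Pipe'; value?: unknown }

const isPipe = (value: unknown): value is Pipe =>
  typeof value === 'object' && value !== null &&
  (value as { brand?: unknown }).brand === 'Pipe'

const forAll = <T>(collection?: T[]) =>
  (callback: (item: T) => BotAction[]): BotAction =>
    async (page, ...injects) => {
      let items: T[] = collection ?? []
      if (collection === undefined) {
        // No higher-order collection given: fall back to the piped value.
        const last = injects[injects.length - 1]
        if (isPipe(last) && Array.isArray(last.value)) {
          items = last.value as T[]
        }
      }
      for (const item of items) {
        for (const action of callback(item)) {
          await action(page, ...injects)
        }
      }
    }
```

An explicitly passed collection still wins over the piped value, which keeps existing forAll(collection)(...) calls unchanged.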

schedule()()

The concept is inspired by cron jobs. The idea is that you could use a tool like pm2 to run this script with automatic restarts, then in the code itself, have a scheduler for scheduling BotActions

So in essence, two main use cases come to mind

  1. Main root BotAction / Bot assembler
    This runs a Bot's assembled BotActions on a schedule: probably run, then time out until the next scheduled run, in a loop.

It would be nice to write to a Redis DB each time the loop runs so that, if the script breaks and pm2 restarts it, the schedule can be restored from longer-term storage. It doesn't have to be Redis; any long-ish-term data storage works, but this kind of "application state" sounds like a reasonable thing to cache in Redis.

  2. Composing a BotAction running in a schedule
    This brings a requirement for some kind of exit condition for the schedule loop. So there needs to be an optional way to customize the scheduler with a way to exit scheduling, maybe a ConditionalBotAction, or a counter (exit after running X iterations), etc.

As for implementing the scheduling, maybe a cron expression as the 1st param and an optional 2nd param that is a ConditionalBotAction or a counter?

const schedule = (cronExpression: string, exitCondition?: ConditionalBotAction|number) =>
   (...actions: BotAction[]): BotAction => 
     (page, ...injects) => {
        // ...
     }

Need to check the type of exitCondition, if provided, then run assembled actions in a loop with logic for either exitCondition type
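
One way the exit-condition check might look. This sketch deliberately sidesteps cron parsing: instead of a cron expression, the hypothetical scheduleWith takes a waitForNextRun function that resolves when the next run is due. All names and types here are illustrative assumptions:

```typescript
type Page = unknown
type BotAction<R = void> = (page: Page, ...injects: unknown[]) => Promise<R>
type ConditionalBotAction = BotAction<boolean>

// Hypothetical stand-in for cron handling: resolves when the next run is due.
type WaitForNextRun = () => Promise<void>

const scheduleWith = (waitForNextRun: WaitForNextRun, exitCondition?: ConditionalBotAction | number) =>
  (...actions: BotAction[]): BotAction =>
    async (page, ...injects) => {
      let iterations = 0
      while (true) {
        // Counter form: exit after the given number of iterations.
        if (typeof exitCondition === 'number' && iterations >= exitCondition) return
        // Conditional form: exit as soon as the condition resolves true.
        if (typeof exitCondition === 'function' && await exitCondition(page, ...injects)) return

        await waitForNextRun()
        for (const action of actions) {
          await action(page, ...injects)
        }
        iterations++
      }
    }
```

With no exit condition the loop runs indefinitely, which matches the first use case of a root scheduler kept alive by a process manager.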

browser()

An assembler that injects the browser instance of the page like files()() except without a higher-order sync function to override defaults injected.

ie: SSO usually opens an additional page to complete sign-on, and to find and use that page you need to go through the browser

await chain(
   browser(
      singleSignOn('facebook', '[email protected]', 'password123')
   )
)(page)

// ...

const singleSignOn = (provider, ..., browser?) // therefore can pass browser in via higher order param or via inject but maybe throw error if browser not provided by any method: inject (higher-order or not), or via higher-order param

CI: tests + coverage

Describe the bug
Need to update CI for the upgrade #92

To Reproduce
Open a PR and look at TravisCI to see the wrong script running.

The new way to run the tests is:

nx test core

which requires the following to be running (set up either via jest or a concurrent nodejs command):

npm run test:site

so that the e2e tests can run locally

also, coverage needs to be adjusted

Expected behavior
Tests to run for the core library. Leave out others.

Coverage to be collected of the core library's tests and submitted to codecov.

singleSignOn()

Related to #45 browser()

For web apps with single sign-on via a separate pop-up window. Need the browser to grab pages, in order to find the other window for the bot to manipulate and complete SSO

Maybe a popup() BotAction for injecting the page of the 2nd page from the browser?

scrollTo()

Newer versions of Puppeteer have enhanced the page.click() method to scroll the element into view, before clicking it, if it isn't in view already.

Therefore, was hesitant to add this, but it can be advantageous for positioning the "camera" in taking screenshots.

"Navigation" BotAction

Docs Typo's

A few typos in the main README and quite a few in the Actions docs (looks like a find/replace gone wrong)

Update Botmation Interface

export class Botmation implements BotmationInterface {

It's missing the newly added methods. Make most of them optional

The interface unifies the most important methods devs may want to use interchangeably across multiple classes of Bots (for the imperative OOP approach, like InstagramBot implements BotmationInterface)

login() - linkedin

Similar to instagram

Will need a new webpack dist for new npm module botmation-linkedin if doesn't exist already

Future:

  • research methods for auth checking (local storage, idb, cookies, etc)
  • isGuest, isLoggedIn
  • likePostsFrom(...names)

Support sync BotAction's

Most of Puppeteer's Page methods are async, so there is a concern that some devs may try to await or resolve promises inside sync functions. But page.url() is sync, and other BotActions, such as map() in a Pipe, would run faster if run synchronously

Concept/Idea
Same method parameters (page: Page, ...injects: any[]), but instead of returning a Promise, it either doesn't return (void) or returns a PipeValue. Therefore, all assembly lines need to check the returned value for something scalar like undefined or a PipeValue, and test it for being a Promise. If a promise, then await it before proceeding further.

As per interfaces, idea is:

interface BotAction<R = void, I extends Array<any> = any[]> extends Function {
  (page: Page, ...injects: I) : Promise<R> | R
}
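
To illustrate the check an assembly line would need, here is a small sketch of a chain-like runner that accepts both sync and async BotActions and awaits only when a thenable comes back. The types are local simplifications of the interface above (Page is left as unknown so the sketch runs standalone):

```typescript
type Page = unknown
type BotAction<R = void> = (page: Page, ...injects: unknown[]) => Promise<R> | R

// True for anything thenable, which is what `await` cares about.
const isPromise = (value: unknown): value is Promise<unknown> =>
  typeof value === 'object' && value !== null &&
  typeof (value as { then?: unknown }).then === 'function'

const chain = (...actions: BotAction[]) =>
  async (page: Page, ...injects: unknown[]): Promise<void> => {
    for (const action of actions) {
      const returned = action(page, ...injects)
      // Sync BotActions have already finished; async ones must be awaited
      // before the next action starts.
      if (isPromise(returned)) {
        await returned
      }
    }
  }
```

The order of execution stays the same as today; the only difference is that sync actions no longer pay for an extra Promise.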

Build/Package size

The build can be reduced by using a single entry point, the main barrel.

  • update barrel to include anything missing
  • update webpacks dist & maybe build

An in-range update of puppeteer is breaking the build


The dependency puppeteer was updated from 3.0.4 to 3.1.0.

This version is covered by your current version range and after updating it in your project the build failed.

puppeteer is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Release Notes for v3.1.0

Raw notes

8ba3675 - chore: mark version v3.1.0 (#5883)
a17bd89 - feat: add securityDetails.subjectAlternativeNames() #5628 (#5881)
e823289 - feat(chromium): roll Chromium to r756035 (#5879)
ad3613d - docs(contributing): clarify list of Chromium versions (#5878)
dc26b8d - docs(examples): add cucumber-puppeteer-example for integration testing (#5875)
3e76554 - chore: fix async dialog specs when they fail (#5859)
b2552e4 - chore: restore page.setUserAgent test (#5868)
39f1b13 - chore: extract Request and Response into its own module (#5861)
b510c35 - chore: fetch Firefox from JSON source instead of RegExp (#5864)
69c38fc - chore: extract ConsoleMessage and FileChooser into its own module (#5856)
0aba6df - chore: force Mocha to exit on CI (#5862)
9368edb - chore: upgrade TypeScript to 3.9 (#5860)
5f42547 - chore: extract SecurityDetails into its own module (#5858)
f5d2597 - chore: add running TSC to test README (#5852)
c6d01c9 - chore: extract BrowserRunner into its own module (#5850)
b38bb43 - Warn when given unsupported product name. (#5845)
6099272 - chore: add @types/proxy-from-env (#5831)
5343c7a - chore: private-ise src/Accessibility.ts (#5832)
ce09742 - feat: add more options to check_availability script (#5827)
5103540 - chore: add command to run eslint with --fix flag (#5829)
49ce659 - chore: remove src/TaskQueue (#5826)
4fdb1e3 - chore: add Prettier (#5825)
ae576af - chore: mark v3.0.4-post (#5824)

Commits

The new version differs by 23 commits.


abort() - Abort Signal

This BotAction would return a Promise that resolves to a signal like:

{
  brand: 'abort-signal',
  value: 'actions runner'
}

When a BotAction returns an object like this, the actions runner will check the returned value for matching this signal and if it is, it doesn't process any more actions.

The action runners would then check every action's return value except the last action, since that can just return up the context to the higher runner to kill that instead.

This would effectively change chain()() and the default BotAction to return Promise<void> or Promise<KillSignal>, unless killing an actions runner is a pipe only feature....

It would be handy with the switchPipe() concept as a means to kill the actions runner, but in this case, if a case() returned a signal that it ran (therefore had at least one matching value, to cause it to run its actions) -- a break; equivalent.
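
A simplified sketch of the signal check in an actions runner. It assumes the signal shape shown above and, unlike the full proposal, does not pass the signal up to a higher runner: it only stops the current one. AbortSignalLike and isAbortSignal are illustrative names:

```typescript
type Page = unknown
interface AbortSignalLike { brand: 'abort-signal'; value: string }
type BotAction = (page: Page, ...injects: unknown[]) => Promise<void | AbortSignalLike>

const isAbortSignal = (value: unknown): value is AbortSignalLike =>
  typeof value === 'object' && value !== null &&
  (value as { brand?: unknown }).brand === 'abort-signal'

// BotAction that resolves to the abort signal.
const abort = (value = 'actions runner'): BotAction =>
  async () => ({ brand: 'abort-signal', value })

const chain = (...actions: BotAction[]): BotAction =>
  async (page, ...injects) => {
    for (const action of actions) {
      const returned = await action(page, ...injects)
      if (isAbortSignal(returned)) {
        return // stop this runner; the remaining actions never run
      }
    }
  }
```

For example, chain(stepA, abort(), stepB) would run stepA and skip stepB.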

Improve code by replacing silent errors

Multiple places in the code capture errors and provide safe fallbacks. This increases debugging work because the source of an error is obscured. This issue is in favor of changing the code to fail loudly, which will help devs find the source of a bug quicker

databaseName ? databaseName : injectDatabaseName ? injectDatabaseName : 'missing-db-name',

Determine a standard practice then document it:

  • Either use logError() or logWarning()
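
As an illustration of failing loudly, the fallback above could be replaced with a throw. resolveDatabaseName is a hypothetical helper name, and whether to throw or to call logError()/logWarning() is exactly the decision this issue leaves open:

```typescript
// Resolve the database name from the higher-order param first, then the inject.
// Instead of silently falling back to 'missing-db-name', throw so the caller
// sees where the value went missing.
const resolveDatabaseName = (databaseName?: string, injectDatabaseName?: string): string => {
  const name = databaseName || injectDatabaseName
  if (!name) {
    throw new Error('IndexedDB database name is missing: pass it as a param or inject it')
  }
  return name
}
```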

Separate NPM Packages for Sites

The bots directory is basically a directory of folders, 1 for each domain, each holding pre-made BotActions

As of now, there is only Instagram, which is okay. But with the addition of new ones, there is a need to separate this out.

Could either namespace it, or create separate modules that use Botmation as a peer dependency, one for each domain ie:

npm i botmation botmation-instagram botmation-linkedin

So botmation has the main, and each botmation-* builds on top with site specific BotAction's, Helper's, Selectors, etc.

injectMap()()

This sets the first inject with a hash-map of injectables for assembled BotAction's

This can be helpful if you have advanced injecting needs, as a way to distinguish injectables by key.

The goal is to resolve deep nesting of higher-order injectors. The functions assembled within them have an imposed injects order (inner array typing) that couples them to a particular injects structure, and it gets worse the deeper you nest. So if you need to make a major injector change (as in React, when adding a new Provider Context component), it can impose a large amount of dev work to update other functions for the new injects order. The alternative is to keep adding new injects to the end, over and over again, which then imposes its own structure on how injectors are assembled in the Bot.

By keying injects, slightly more control can be gained to de-couple some of the imposed order.

Purely concept, the goal is more important

It would be nice to be able to use this over and over again, in a nesting way, that continues to enrich that same 1st injected hash-map with new injectables. Or would it be nice to collect injectables in various hash-maps injected as new injects, as nesting occurs? Maybe on assembly customizable?

Key collisions could override?
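
A concept-level sketch of the "enrich the same first inject" variant, in which key collisions are overridden by the innermost injectMap()(). The branded InjectMap wrapper is purely an assumption made so the assembler can recognize a map it created earlier:

```typescript
type Page = unknown
type BotAction = (page: Page, ...injects: unknown[]) => Promise<void>

// Hypothetical branded wrapper, used to recognize a map injected by an outer injectMap()().
interface InjectMap { brand: 'InjectMap'; entries: Record<string, unknown> }

const isInjectMap = (value: unknown): value is InjectMap =>
  typeof value === 'object' && value !== null &&
  (value as { brand?: unknown }).brand === 'InjectMap'

const injectMap = (entries: Record<string, unknown>) =>
  (...actions: BotAction[]): BotAction =>
    async (page, ...injects) => {
      const [first, ...rest] = injects
      // Nested use enriches the existing map; inner keys override outer ones.
      const merged: InjectMap = isInjectMap(first)
        ? { brand: 'InjectMap', entries: { ...first.entries, ...entries } }
        : { brand: 'InjectMap', entries }
      const nextInjects = isInjectMap(first) ? [merged, ...rest] : [merged, ...injects]
      for (const action of actions) {
        await action(page, ...nextInjects)
      }
    }
```

Because each level builds a new object rather than mutating the outer one, sibling branches of the assembly do not see each other's injectables.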

state()()

Ability to pass in an object with key/value pairs representing the state of something(s) in the first call state() (with fallback for Pipe value as initial state) then assemble BotAction's to run with that state object injected via the second call state()().

The initial concept is to inject a special State-like instance which is basically a Map, or similar in functionality, so complex state objects can be maintained and shared through many BotAction's.

Piping is nice, and could be used to share a State-like object, so this would be another approach to sharing data across BotActions, one that frees up the Pipe for other purposes

Randomize

Is your feature request related to a problem? Please describe.
I want to run some BotAction's randomly, like on the flip of a coin, or roll of a die.

Describe the solution you'd like
BotAction that can take a number to represent the probability of the following BotAction(s) to run. ie

chain(
   // roll a 12-sided die (sides given by the param); the BotActions run only if 1 specific side lands
   runOnRoll(12)(
      // ... botaction's
   )
)

Maybe a helper to generate a random number?

Describe alternatives you've considered
Nothing

Additional context
Make bot's more dynamic, less static in their actions.
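
A small sketch of the idea, including the random-number helper. The signatures are assumptions for illustration; the optional random parameter exists only so the roll can be made deterministic in tests:

```typescript
type Page = unknown
type BotAction = (page: Page, ...injects: unknown[]) => Promise<void>

// Returns a float in [0, 1), like Math.random.
type NumberGenerator = () => number

// Roll a die with the given number of sides; resolves to 1..sides.
const rollDice = (sides: number, random: NumberGenerator = Math.random): number =>
  Math.floor(random() * sides) + 1

// Run the assembled actions only when the die lands on 1 (probability 1/sides).
const runOnRoll = (sides: number, random: NumberGenerator = Math.random) =>
  (...actions: BotAction[]): BotAction =>
    async (page, ...injects) => {
      if (rollDice(sides, random) !== 1) return
      for (const action of actions) {
        await action(page, ...injects)
      }
    }
```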

Scrapers error out on not finding element by selector

Describe the bug
Errors when attempting to scrape element that does not exist

To Reproduce
Use the $() or $$() BotActions with a selector for an element(s) that does not exist

Expected behavior
The selector BotActions to return undefined instead of throwing an error

CI: Mergify Config

Mergify is set up to merge, but settings on the repo require squash & rebase

Adjust the Mergify settings to work with the repo settings to resume automatic merging of passing PRs

Helper Function: Handle Cascading Implicit Dependency

handle implicit dependencies in 1 of 3 ways (in this cascading order):

  1. higher-order params
  2. pipe value
  3. injects

1 and 3 will probably be the most common, however this common functionality of cascading to an implicit dependency value, can be reduced into 1 reusable function, maybe 2?
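
The cascade could be reduced to something as small as "first defined value wins". firstDefined is a hypothetical name, and treating only undefined as "missing" is an assumption:

```typescript
// Returns the first candidate that is not undefined, in the order given.
const firstDefined = <T>(...candidates: Array<T | undefined>): T | undefined =>
  candidates.find((candidate): candidate is T => candidate !== undefined)

// Usage inside a BotAction, following the cascading order from the list above:
//   const fileName = firstDefined(higherOrderParam, pipeValue, injectedValue)
```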

goToPipeValue

Is your feature request related to a problem? Please describe.
Want to dynamically navigate to the next URL from returning it in a Pipe from a BotAction

Describe the solution you'd like
goToPipeValue: BotAction that looks at the injects pipe object's value, maybe checks to see if it's a url before attempting to navigate there
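
A possible shape for this, assuming the Pipe is a branded object at the end of the injects. PageLike stands in for Puppeteer's page.goto(), and the URL check uses the standard URL constructor, which throws on strings it cannot parse:

```typescript
// Hypothetical structural stand-in for the part of Puppeteer's Page used below.
interface PageLike { goto(url: string): Promise<unknown> }

// Assumed Pipe shape.
interface Pipe { brand: 'Pipe'; value?: unknown }

const isPipe = (value: unknown): value is Pipe =>
  typeof value === 'object' && value !== null &&
  (value as { brand?: unknown }).brand === 'Pipe'

const isUrl = (value: unknown): value is string => {
  if (typeof value !== 'string') return false
  try {
    new URL(value) // throws a TypeError for invalid URLs
    return true
  } catch {
    return false
  }
}

// Navigate only when the piped value is a parseable URL; otherwise do nothing.
const goToPipeValue = async (page: PageLike, ...injects: unknown[]): Promise<void> => {
  const last = injects[injects.length - 1]
  if (isPipe(last) && isUrl(last.value)) {
    await page.goto(last.value)
  }
}
```

Silently skipping an invalid value is a choice made for the sketch; throwing instead would be more in line with the fail-loudly issue.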

Proposal: simplify waitForNavigation()

export const waitForNavigation = (): BotAction => async(page: Page) => {

for simple bot actions that require no customization (no arguments), omit the higher-order function syntax by assigning the BotAction directly to an exported const

The above code, rewrite as:

export const waitForNavigation: BotAction = async(page: Page) => {
  await page.waitForNavigation()
}

Then when calling it in sequence, don't call it like a higher order function returning an async bot action, just give it as the async bot action that it is, ie:

bot.actions(
    // submitLoginForm sequence
   waitForNavigation, // <- no "()"
   // sequence to do after login
)

switchPipe()(), case()() & break()

Some kind of switch/case functional implementation. Either Pipe dependent, or inject, or maybe all in one higher order switch call...

see #35 (comment) for details

Edit: Use piping system, so switchPipe()() is a special kind of Pipe that passes in the same Pipe value from the original resolved BotAction in the first call switchPipe()

Therefore switchPipe()() is a new kind of Assembly Line

case()() checks the values provided in the first call case() for equality against the Pipe value and if one or more are equal, then the assembled actions are run.

case()() can be used inside and outside switchPipe()(). It's kind of like givenThat()() except for comparing the Pipe's value to the "cases" supplied (values)

break() is abort() from #36 to be reused but slightly different in switchPipe()()

Optimize screenshotAll composition

export const screenshotAll = (urls: string[], botFileOptions?: Partial<BotFileOptions>): BotFilesAction =>

Something like this:

const screenshotAll = (urls: string[], botFileOptions?: Partial<BotFileOptions>): BotFilesAction =>  
    forAll(urls)(
      url => ([
        goTo(url),
        files(enrichBotFileOptionsWithDefaults(botFileOptions))(
          screenshot(url.replace(/[^a-zA-Z]/g, '_')) // filenames are created from urls by replacing nonsafe characters with underscores
        )
      ])
    )

request() & pagelessPipe()()

Make an API request and return the response

Could potentially make a bot that never interacts with an actual chrome page instance... ie using https://www.reddit.com/dev/api to create a reddit bot

but then it's less about navigating pages, and more about navigating state affected by API calls, functions, etc. Grab data, parse it, make a decision, do something ie make a PUT/POST call to create data in a web app using a publicly shared API.

There is overlap between this and Piping.... not necessarily new stuff entirely. The downside is that you have to create an instance of page to inject it so maybe create a pageless pipe? ie:

await pagelessPipe(
   API('get', 'url'), // get forum threads
   map(/* transform the response into an iterable data collection */),
   forAll()(
      // for each forum thread, read and respond
      API('post', 'url', 'some newly generated comment')
   )
)(undefined, inject1, inject2, ...) 
// can submit real `page` or nothing, doesn't matter, just to follow `BotAction` interface

This function would be an Assembly-Line BotAction that ignores page provided and injects undefined for page param in assembled BotAction's.

Have to be careful with this and not use it with BotAction's that use page
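
A bare-bones sketch of such an assembler, with simplified local types. It ignores the page it is given, passes undefined to every assembled BotAction, and appends a Pipe object (assumed shape) carrying the previous action's resolved value:

```typescript
type BotAction<R = unknown> = (page: unknown, ...injects: unknown[]) => Promise<R>

// Assumed Pipe shape.
interface Pipe { brand: 'Pipe'; value?: unknown }
const wrapValueInPipe = (value?: unknown): Pipe => ({ brand: 'Pipe', value })

const pagelessPipe = (...actions: BotAction[]): BotAction =>
  async (_page, ...injects) => {
    let pipe = wrapValueInPipe()
    for (const action of actions) {
      // `undefined` replaces the page, so actions that touch `page` will break here.
      pipe = wrapValueInPipe(await action(undefined, ...injects, pipe))
    }
    return pipe.value
  }
```

Unlike a full Pipe assembler, this sketch does not handle the case where the incoming injects already end with a Pipe object.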

Move wait() into Navigation

In some ways, it's kind of like Navigation, ie reload() without reloading, and its language is similar to waitForNavigation. Moving it also lets the Utility BotActions focus on logical branching/looping

No fundamental changes to API, just change of BotAction category/type for wait()

can be included with a future release

  • update docs site
