actor-content-checker's Introduction

Content Checker

Features

This actor lets you monitor specific content on any web page and sends an email notification with before and after screenshots whenever that content changes. You can use this to create your own watchdog for prices, product updates, sales, competitors, or to track changes in any content that you want to keep an eye on.

Technically, it extracts the text matched by a CSS selector and compares it with the text from the previous run. If the text has changed, it saves the screenshots and runs another actor to send an email notification with them.
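The compare-with-previous-run step can be sketched as a pure function. This is a minimal illustration, not the actor's actual source: the name `detectChange` and the returned object's shape are hypothetical, and it assumes the first run only records a baseline without notifying. In the real actor the previous value would come from the named key-value store described under Output.

```javascript
// Hypothetical sketch of the change-detection step (not the actor's real code).
// previousText: text stored by the last run (null/undefined on the first run).
// currentText: text just extracted with the content selector.
function detectChange(previousText, currentText) {
  // Assumption: on the first run there is nothing to compare, so only store a baseline.
  if (previousText === null || previousText === undefined) {
    return { changed: false, firstRun: true };
  }
  // Plain string comparison: any difference at all counts as a change.
  return { changed: previousText !== currentText, firstRun: false };
}

// Example usage with made-up values:
console.log(detectChange(null, "Price: $10").changed);         // false (baseline only)
console.log(detectChange("Price: $10", "Price: $10").changed); // false
console.log(detectChange("Price: $10", "Price: $12").changed); // true, so notify
```

Because the comparison is exact, any incidental text inside the selected element (counters, relative dates) also counts as a change.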

Tutorial

Read this blog post (https://blog.apify.com/how-to-set-up-a-content-change-watchdog-for-any-website-in-5-minutes-460843b12271) for more ideas and a step-by-step tutorial on setting it up.

Input

The actor needs a URL, a content selector, and an email address. You can optionally define a screenshot selector; if you don't, the content selector is used for the screenshot.
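The fallback from the screenshot selector to the content selector can be sketched as a small input-normalization step. This is a hedged illustration: `url`, `contentSelector`, and `sendNotificationTo` match the example inputs quoted in the Issues section, but `screenshotSelector` and the function `normalizeInput` are assumed names, not confirmed parts of the actor's input schema.

```javascript
// Hypothetical input normalization (field name screenshotSelector is an assumption).
function normalizeInput(input) {
  // The three fields the actor needs.
  const required = ["url", "contentSelector", "sendNotificationTo"];
  for (const field of required) {
    if (!input[field]) {
      throw new Error(`Missing required input field: ${field}`);
    }
  }
  return {
    ...input,
    // Fall back to the content selector when no screenshot selector is given.
    screenshotSelector: input.screenshotSelector || input.contentSelector,
  };
}

const normalized = normalizeInput({
  url: "https://www.apify.com/change-log",
  contentSelector: "[class^=change-log__MonthBox-]",
  sendNotificationTo: "you@example.com",
});
console.log(normalized.screenshotSelector); // [class^=change-log__MonthBox-]
```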

For a detailed description of the input, see the Input page.

Output

Once the actor finishes, it updates the content and screenshot saved in a named key-value store associated with the actor or task.

If the content changed, another actor is called to send an email notification.

Here's an example of an email notification with previous data, changed data, and two screenshots:

Changelog

Keep up with recent fixes and new features by reading the Changelog.

actor-content-checker's People

Contributors

davidjohnbarton, drobnikj, glgoose, jakubbalada, metalwarrior665, olehveselov92, pocesar, strajk, zpelechova

actor-content-checker's Issues

CSS selectors temperamental?

It might just be me, but I'm having mixed results using CSS selectors to identify the area to capture.

For example, using div#PDP_productPrice on https://www.boots.com/no7-men-anti-ageing-bundle throws this error:

    2020-11-08T18:05:46.717Z ERROR The function passed to Apify.main() threw an exception:
    2020-11-08T18:05:46.719Z   Error: Cannot get screenshot (screenshot selector is probably wrong)
    2020-11-08T18:05:46.721Z       at /home/myuser/main.js:64:15
    2020-11-08T18:05:46.722Z       at processTicksAndRejections (internal/process/task_queues.js:97:5)

This doesn't seem like expected behaviour to me. Am I right?

Adding a list of URLs

The current version allows only one URL. Adding an option to check multiple URLs at the same time would be useful.
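One way the requested feature could work is sketched below. This is a hypothetical design, not an existing option of the actor: `checkAll` is an illustrative name, `extractText(url, selector)` stands in for the browser step so the logic runs without Puppeteer, and a `Map` keyed by URL stands in for the key-value store so each URL keeps its own previous value.

```javascript
// Hypothetical sketch of checking several URLs in one run.
// extractText(url, selector): placeholder for loading the page and reading the selector's text.
// store: a Map used in place of the key-value store, keyed by URL.
async function checkAll(targets, store, extractText) {
  const changedUrls = [];
  // Check targets one at a time, so only one page needs to be open at once.
  for (const { url, contentSelector } of targets) {
    const current = await extractText(url, contentSelector);
    const previous = store.get(url);
    // Count it as a change only if a previous value exists and differs.
    if (previous !== undefined && previous !== current) {
      changedUrls.push(url);
    }
    store.set(url, current); // remember the latest value for the next run
  }
  return changedUrls;
}

// Example usage with fake page contents instead of a real browser.
const fakePages = { "https://a.example": "v2", "https://b.example": "same" };
const store = new Map([
  ["https://a.example", "v1"],
  ["https://b.example", "same"],
]);
checkAll(
  [
    { url: "https://a.example", contentSelector: "#price" },
    { url: "https://b.example", contentSelector: "#price" },
    { url: "https://c.example", contentSelector: "#price" },
  ],
  store,
  async (url) => fakePages[url] ?? "first value",
).then((changed) => console.log(changed)); // logs [ 'https://a.example' ]
```

Keying stored state by URL keeps one site's baseline from overwriting another's; checking sequentially is the simplest choice, and a concurrent version would need a limit on open pages.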

Timeout in most runs

I am getting a timeout in most of my runs:

    2020-11-23T12:00:43.218Z ERROR The function passed to Apify.main() threw an exception:
    2020-11-23T12:00:43.219Z   TimeoutError: Navigation timeout of 30000 ms exceeded
    2020-11-23T12:00:43.220Z       at /home/myuser/node_modules/puppeteer/lib/cjs/puppeteer/common/LifecycleWatcher.js:106:111

The page is reachable and loads fast in a regular browser :S

"Error: Cannot get screenshot (screenshot selector is probably wrong)"

Hi there,

I've been able to use this promising actor on Apify, but checking another URL throws this error:

    2021-05-17T05:04:37.952Z INFO  Saving screenshot...
    2021-05-17T05:04:37.956Z ERROR The function passed to Apify.main() threw an exception:
    2021-05-17T05:04:37.957Z   Error: Cannot get screenshot (screenshot selector is probably wrong)
    2021-05-17T05:04:37.957Z       at /home/myuser/src/main.js:63:15
    2021-05-17T05:04:37.958Z       at processTicksAndRejections (internal/process/task_queues.js:97:5)

What works (taken from the documentation):

    {
      "url": "https://www.apify.com/change-log",
      "contentSelector": "[class^=change-log__MonthBox-]",
      "sendNotificationTo": "[email protected]",
      "proxy": {
        "useApifyProxy": false
      },
      "navigationTimeout": 30000
    }

What doesn't work:

    {
      "url": "https://www.leboncoin.fr/recherche?text=Neuf",
      "contentSelector": "[class^=styles_Listing_]",
      "sendNotificationTo": "[email protected]",
      "proxy": {
        "useApifyProxy": false
      },
      "navigationTimeout": 30000
    }

I've double-checked, and the selector exists on the page, at least when I visit the website in my own browser.
Leboncoin uses Datadome bot protection. Could Apify's scraper be getting blocked?

Note that I also tried with "useApifyProxy": true, and the same error occurs.

Any hint?

Provide a function for custom logic

Allow users to provide a function that takes the previous and latest state, performs any custom calculation on them, and returns a boolean deciding whether a change is reported. If no function is provided, the current behavior is used.
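A minimal sketch of how the proposed option could behave, assuming the function receives the previous and latest extracted text and returns a boolean. `decide`, `defaultShouldNotify`, and the example `priceDropOver10Percent` are hypothetical names for illustration; the actor does not currently expose such an option.

```javascript
// Default: the current behavior, where any difference in the text is a change.
function defaultShouldNotify(previous, latest) {
  return previous !== latest;
}

// Use the user's function if one was provided, otherwise fall back to the default.
function decide(previous, latest, customFn) {
  const fn = typeof customFn === "function" ? customFn : defaultShouldNotify;
  return Boolean(fn(previous, latest));
}

// Example custom logic: notify only when a price drops by more than 10%.
function priceDropOver10Percent(previous, latest) {
  // Keep only digits and dots, e.g. "$1,000" becomes "1000".
  const parse = (s) => Number(String(s).replace(/[^0-9.]/g, ""));
  return parse(latest) < parse(previous) * 0.9;
}

console.log(decide("$100", "$95"));                         // true (default: text differs)
console.log(decide("$100", "$95", priceDropOver10Percent)); // false (only a 5% drop)
console.log(decide("$100", "$80", priceDropOver10Percent)); // true
```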

Ignore Timestamp modifications

Hi there,

Content Checker works great, thanks!

However, I receive many false positives because an updated timestamp triggers a notification.
For example, I get emails because the post date changed from "Posted 1 hour ago" to "Posted 2 hours ago".

Is there a way to ignore such modifications?
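One possible approach is to normalize the text before comparing it, replacing relative timestamps with a fixed placeholder. This is a hypothetical sketch, not a feature of the actor: the regular expression is an assumption tuned to the "Posted N hours ago" example in this issue, and other sites would need their own pattern. Where the page structure allows it, pointing the content selector at an element that excludes the timestamp avoids the problem without any code.

```javascript
// Assumed pattern for relative timestamps like "Posted 1 hour ago" or "Posted an hour ago".
const RELATIVE_TIME =
  /\bPosted\s+(?:\d+|an?)\s+(?:second|minute|hour|day|week|month|year)s?\s+ago\b/gi;

// Replace every relative timestamp with a placeholder and collapse whitespace.
function normalize(text) {
  return text.replace(RELATIVE_TIME, "Posted <time>").replace(/\s+/g, " ").trim();
}

// Compare normalized text, so a change in the timestamp alone is ignored.
function contentChanged(previous, latest) {
  return normalize(previous) !== normalize(latest);
}

console.log(contentChanged("New offer. Posted 1 hour ago", "New offer. Posted 2 hours ago"));    // false
console.log(contentChanged("New offer. Posted 1 hour ago", "Better offer. Posted 2 hours ago")); // true
```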
