
klaxon's Introduction

Get emailed when a website changes

Klaxon is a free, quick to set up and easy to use service that checks websites for changes so you don't have to. Working with the fine folks at MuckRock and Document Cloud, we have launched an even easier version of Klaxon that requires no server for you to maintain and very little setup. We encourage all new individual users, or anyone who no longer wants to run their own pesky Heroku servers, to use the new Klaxon Cloud tool that is built into Document Cloud, the open document archive for journalists and researchers.

While the original version of Klaxon will remain free and open source on GitHub, we will no longer be supporting development for individual users. Feel free to fork the source code to run your own instance, and send back pull requests for features you think will benefit the long-standing, institutional users.

In either variety of Klaxon, you add websites you want monitored and the tool will visit them periodically. If they change, it'll email you what's different. It's perfect for monitoring website changes you might miss — like freedom of information disclosure logs, court records or anything related to Donald Trump. It can even send notifications to Slack channels with a little extra setup.

Built and refined in the newsroom of The Marshall Project, Klaxon has provided our journalists with many news tips, giving us early warnings and valuable time to pursue stories. Klaxon has been used and tested by journalists at The Marshall Project, The New York Times, The Texas Tribune, The Washington Post, Associated Press, and more.

The documentation here refers solely to the original version of Klaxon that requires you to manage your own server and email service through Heroku. Read more about it below or say hello to the humans behind the project at the Google Group email list.

Alerting journalists to changes on the web

The public release of this free and open-source software was supported by Knight-Mozilla OpenNews.

How Does Klaxon Work?

Klaxon enables users to "bookmark" portions of a webpage and be notified (via email, Slack, or Discord) of any changes that may occur to those sections. Learn more about bookmarklets on the help.md page.

Deploy

Setting up your Klaxon

Klaxon is open-source software built by the newsroom of The Marshall Project, a nonprofit investigative news organization covering the American criminal justice system. It was created by a team of three — Ivar Vong, Andy Rossback and Tom Meagher — and supported by contributions from users around the world. It is subject to the kind of shortcomings any side project might encounter. It may break unexpectedly. It may miss a change in a website or an email might not fire off correctly. Still we’ve found it immensely useful in our daily reporting. We want other journalists to benefit from Klaxon and to help us improve it, but keep these caveats in mind and use it at your own risk.

And if you’d like to help us make this better or add new functionality to it we’d love to have your help.

Getting started

One of our goals for Klaxon was to make it as easy as possible for reporters and editors without tech backgrounds to use and to set up. Getting your own Klaxon running in your newsroom will require you to run a handful of instructions one time with the help of the online services Heroku and GitHub. It should take maybe 10 minutes to set up your Klaxon, including the time to create accounts on Heroku and GitHub if you need to.

We use Heroku to deploy software at The Marshall Project. We think it makes some of the tedious work of running servers a lot easier to deal with so we designed Klaxon to be easily deployable on Heroku. (If you’d like to run this in your newsroom’s preferred server setup — say using Docker or a Linux machine — we encourage you to do so, but know you'll be on your own maintaining it!)

If you want to use our setup, you’ll need to create an account with Heroku if you don’t already have one.

How much will this cost?

Heroku unfortunately no longer offers a free tier, so you will need to be able to pay around $10 a month to get started. Out of the box with Heroku you’ll get:

  • 12,000 emails per month with SendGrid
  • 10,000 records of changes in your Postgres database
  • A web interface
  • A scan of each watched site every 10 minutes

If you find your newsroom hitting the limits of these tiers you can pay to expand them. To send up to 40,000 emails a month you may upgrade your Sendgrid add-on in Heroku for $9.95 a month. If you need to store more records in your database you can pay an additional $4 a month for up to 10 million rows. And if you need your web interface running around the clock, you can upgrade your Heroku dyno from the Eco to the Hobby level for $2 more a month. This likely won't be necessary — particularly in smaller newsrooms — but it’s good to know.

Let’s do this

If you have a Heroku account you’re ready to go — it’s time to click on this button:

Deploy

It will take you to a page to configure your new app in Heroku’s dashboard. First, give your app a name in the first box. While this is technically optional, the name will also double as the URL for your Klaxon instance, so think carefully about it for a moment. Try maybe an abbreviation for your newsroom with a hyphen and the word klaxon, like “wp-klaxon” or “sl-klaxon”. This will become a URL such as https://sl-klaxon.herokuapp.com/

Scroll down to the admin_emails field and add a comma-separated list of email addresses of your newsroom’s Klaxon administrators. These administrators will be able to create accounts for users in your organization as well as configure the Slack and Discord integrations.

Click the big purple “Deploy” button. If you haven’t given Heroku your credit card yet it will ask you for your information now. After that, give Heroku a few minutes for the app to build.

When you see this message:

...you’re almost done.

Click on the button that says “Manage App”. This takes you behind the scenes of the various components powering your Klaxon. On this resources screen, click on the link for “Heroku Scheduler,” which will take you to a new screen where you must add a very important piece. The scheduler is what runs every 10 minutes to actually check all the sites and pages you’re watching. Click the long, purple "Add new job" button. In the text box next to the dollar sign, type the words “rake check:all” with the colon and without the quotes. Under “Frequency” change it from “Daily” to “Every 10 minutes”. Click the purple “Save” button and your scheduler item should look like this:

Sendgrid now requires additional steps to confirm that you are not a spammer. Your new Sendgrid account is created in a "suspended" state. To get it unsuspended you have to contact Sendgrid support. You can do this by clicking the Sendgrid logo on the Resources tab.

If clicking on the logo takes you to an error page do not worry. This has been known to happen as Sendgrid's system has undergone redesigns. Instead go to Sendgrid's page to ask for support. Be sure to use the same email address associated with your Heroku account and provide the url of your Klaxon instance. When they ask for "Business impact," choose "P3 General - You have a question about Sendgrid or how to use its products".

This step is a nuisance, but important. You will not be able to get an email to log in to Klaxon until you are cleared by Sendgrid. This usually happens pretty quickly.

Unfortunately you are not yet done configuring Sendgrid. There are more steps to set up your account.

  1. From the Heroku application page, click on the "Resources" tab, and click the link to the Sendgrid plugin. This will take you to the Sendgrid website.
  2. In the left hand column choose Settings -> API Keys. Click the blue "Create an API Key" button in the top right corner.
  3. In the form that appears fill in the API key name (doesn't matter what you name it), make sure the "Full Access" option is selected, and click "Create & View".
  4. Click the API key to copy it to your clipboard. Then navigate back to Heroku.
  5. In the "Settings" tab of your Heroku app click the "Reveal Config Vars" button.
  6. Change the SENDGRID_PASSWORD variable to the API Key by clicking the pencil icon next to it, pasting it in, and saving it.
  7. Change the SENDGRID_USERNAME variable to the string "apikey" in the same manner.

Next you'll need to set up a "Verified Sender" account in Sendgrid using an email address that you have access to. In your Sendgrid dashboard, click "Sender Authentication" and choose "Verify Single Sender." (See #404 for some more context.)

When you've completed this process in Sendgrid, you'll need to set the MAILER_FROM_ADDRESS variable as you did above to your verified sender email address.
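As a rough illustration of how these three config vars fit together, here is a minimal Ruby sketch that builds SMTP settings from them. The host name and port are assumptions taken from SendGrid's public SMTP documentation, and the helper names `sendgrid_smtp_settings` and `mailer_from_address` are hypothetical; none of this is read from Klaxon's source.

```ruby
# Sketch only: builds an SMTP settings hash from the config vars described above.
# The address and port are assumptions (SendGrid's documented SMTP endpoint),
# and both helper names are hypothetical, not Klaxon's own code.
def sendgrid_smtp_settings(env)
  {
    address: "smtp.sendgrid.net",              # assumed SendGrid SMTP host
    port: 587,                                 # assumed STARTTLS port
    authentication: :plain,
    enable_starttls_auto: true,
    user_name: env.fetch("SENDGRID_USERNAME"), # should be the literal string "apikey"
    password: env.fetch("SENDGRID_PASSWORD"),  # the API key created in SendGrid
  }
end

# The "from" address must match the Verified Sender set up in SendGrid.
def mailer_from_address(env)
  env.fetch("MAILER_FROM_ADDRESS")
end
```

Both helpers accept anything that responds to `fetch`, so they can be given `ENV` on a server or a plain hash when experimenting. A missing variable raises `KeyError` instead of silently sending mail with blank credentials.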

Finally, now, you should be done setting up your Sendgrid account.

At the top of the scheduler page, click the link that is the name of your app (“sl-klaxon”). This will take you back to Klaxon's dashboard. Then click the button in the upper right that says "Open app," and this should take you to your Klaxon's login screen on the web.

Type the admin email address that you gave the Heroku dashboard earlier into the text box and hit the “Login” button. You’ll then be redirected to a page that says an email has been sent to you. Check your inbox. This may take a minute or two to arrive.

In the email, click the “Go to Dashboard” button. You’re now authenticated in the system and can access your Klaxon.

Configuring your Klaxon

Once you’re logged in, you should see the main page that will fill up in the coming days with the feed of all of your Klaxon updates. Now, you want to go add other users in your newsroom to the system. Click the “Settings” button in the upper right corner, and select “Users” from the menu.

On the right side of the page, click the “Create New User” button. Add the reporter’s first and last name and email address, and she will get an email allowing her into the Klaxon. Now, finally, you and your users can start adding web pages you want Klaxon to watch.

Limit new users to only those on specific email domain(s)

By default, people with any email address can be added as new users. If you'd like to allow only users with specific email domains, set the APPROVED_USER_DOMAINS environment variable (or "Config Vars" in Heroku's lingo). That variable should be a comma-separated list of domains, e.g., themarshallproject.org, nsa.gov.
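To make the rule concrete, here is a small Ruby sketch of how a comma-separated domain list like the one above could be parsed and checked. The helper names `approved_domains` and `email_allowed?` are hypothetical, and the matching rules (case-insensitive, exact domain match, no restriction when the list is empty) are assumptions, not Klaxon's documented behavior.

```ruby
# Sketch only: hypothetical helpers, not Klaxon's actual implementation.

# Parse a comma-separated list such as "themarshallproject.org, nsa.gov".
def approved_domains(raw)
  raw.to_s.split(",").map { |d| d.strip.downcase }.reject(&:empty?)
end

# Returns true when no restriction is configured, or when the
# email's domain exactly matches one of the listed domains.
def email_allowed?(email, raw_domains)
  domains = approved_domains(raw_domains)
  return true if domains.empty?

  domain = email.to_s.strip.downcase.split("@", 2)[1]
  !domain.nil? && domains.include?(domain)
end
```

For example, `email_allowed?("reporter@nsa.gov", ENV["APPROVED_USER_DOMAINS"])` would be true with the list shown above, while an address at any other domain would be rejected.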

Notify a Slack or Discord channel

You’re all set for email notifications. If you’d like to also receive alerts through Slack and/or Discord you can set that up now. Click on the “Settings” button in the upper right corner of the page and choose “Integrations” from the menu. On the Integrations page, click the “Create Slack Integration” button. You can add an integration for any number of channels in your newsroom’s Slack or Discord. For each channel, you just have to set up an Incoming Webhook.

Slack

In Slack, click on the dropdown arrow in the upper left corner and choose “Apps & Integrations” from the menu. This will open a new window in your browser for you to search the Slack app directory. In the search box, type “Incoming Webhooks” and choose that option when it pops up. If you already have webhooks, you’ll see a button next to your Slack organization’s name that says “Configure.” Otherwise, click the green button that says “Install”.

Now, choose the channel that you want the Klaxon alerts to go to from the dropdown menu. We’d recommend that you not send them to #General, but maybe create a new channel called #Klaxon. After you create or choose your channel, click the green button that says “Add Incoming Webhooks Integration”. Near the top of the next screen, you should see a red URL next to the label “Webhook URL”. Copy that URL and switch over to your browser window with Klaxon in it. Paste the URL into the box labeled “Webhook URL,” and type the name of the channel you want your Slack alerts to go to into the “Channel” box (this should be the same channel name you used in Slack when you created the integration). Now click the “Create Slack Integration” button. Now you should be all set. If you want to have the ability to send Klaxon alerts to other channels, for specific reporting teams or for certain projects, you can repeat this process.

Discord

In Discord, click on the dropdown arrow in the upper left (next to your server name) and choose “Server Settings” from the menu. Click “Integrations” in the left sidebar, then click the “Create Webhook” button.

Now, choose the name for your webhook (you can leave it the default random name if you'd like) and choose the channel that you want the Klaxon alerts to go to. We’d recommend that you not send them to #general, but maybe create a new channel called #klaxon (you'll need to do this in your normal server view first). After you choose your channel, click “Copy Webhook URL”. Switch over to your browser window with Klaxon in it. Paste the URL into the box labeled “Webhook URL,” and type the name of the channel you want your Discord alerts to go to into the “Channel” box (this should be the same channel name you used in Discord when you created the integration). Additionally, you must append “/slack” (without quotes) to the end of your webhook URL, as these alerts will be sent to Discord as a Slack-Compatible Webhook. Now click the “Create Slack Integration” button. Now you should be all set. If you want to have the ability to send Klaxon alerts to other channels, for specific reporting teams or for certain projects, you can repeat this process.
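The “/slack” suffix is easy to forget, so here is a short Ruby sketch of what the final URL and a Slack-style message body might look like. The helper names are hypothetical, the webhook ID and token in the usage note are placeholders, and the payload fields (`channel`, `text`) follow Slack's incoming-webhook format; this is not taken from Klaxon's source.

```ruby
require "json"

# Sketch only: hypothetical helpers, not Klaxon's actual implementation.

# Append "/slack" to a Discord webhook URL unless it is already there.
def slack_compatible_url(webhook_url)
  url = webhook_url.strip.chomp("/")
  url.end_with?("/slack") ? url : "#{url}/slack"
end

# Build a Slack-style JSON body with the target channel and message text.
def alert_payload(channel, text)
  JSON.generate({ channel: channel, text: text })
end
```

With a placeholder URL such as `https://discord.com/api/webhooks/123/abc`, `slack_compatible_url` returns `https://discord.com/api/webhooks/123/abc/slack`, which is the value to paste into the “Webhook URL” box.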

Applying upgrades as the project develops

When we release major changes to Klaxon, we’ll make an announcement to our Google Group email list. At that point, you’ll likely want to adopt those in your system as well. If you're comfortable using git on the command line, this would require just a few simple commands: pull the changes from the main branch of this repo, merge them into your forked repo and push it all to Heroku.

But if you're not a programmer, there is still a fairly painless way to upgrade by using GitHub and Heroku. First, you’ll need to fork our repo to your own GitHub account to receive the updates, and then you can use Heroku’s dashboard to push the changes to your application.

If you don’t already have an account at GitHub now is a good time to set one up (don’t worry, it’s free). This has the added benefit of giving you the ability to comment on issues. Once you’re logged into GitHub with your new account, go to the repo for the Klaxon project and click the “Fork” button. This copies our code into a separate version under your GitHub account that you can tie to your Klaxon instance running on Heroku’s servers.

Now, go to https://dashboard.heroku.com and choose your application (remember, the one you named when you first set up Klaxon, probably sl-klaxon or something similar if you followed our advice above). From the menu of options at the top of the page, click on the “Deploy” button. Look for the section called “Deployment method,” which should be the second from the top of the Deploy page.

You should see three buttons. Click the one in the middle that says “GitHub Connect With GitHub”. The options at the bottom of the page will change. Now, click the gray button that says “Connect To GitHub”. It will pop up a new window to log you into GitHub, if you aren’t already. In that window, click the “Authorize Application” button. The popup window should now close itself.

On the Heroku page, in the “Connect to GitHub” section at the bottom, type ‘klaxon’ into the search box next to your GitHub username. Click the “Search” button. Next, click the “Connect” button next to the name of your forked repo that pops up below. Finally, select the 'main' branch from the dropdown and click the “Enable Automatic Deploys” button in the “Automatic deploys” section. This ties your Heroku server to your GitHub account, so that every time you merge updates into your forked version of the Klaxon repository, they will automatically go live on your server. You'll only have to do all of this one time to set up the pipeline.

Note: if you are upgrading from version 0.2.0 or lower, please follow the additional instructions in migration_setup.md

Finally, each time an update is announced on the Google Group, you can go to your forked version of the repo on GitHub and click the green “New Pull Request” button to pull the changes from our main repo.

On the "base fork" dropdown on the left, click and select your repo. Then click the “compare across forks” link and change the “head fork” on the dropdown menu to “marshallproject/klaxon”. Make sure both the branches are set to "main" (they should already be). Below that, a green checkmark and the words “Able to merge” should appear. If they do, click the green “Create Pull Request” button. Give this pull request a title. You might want to say “Merging Klaxon release 0.9.1” or whatever the new version number is and click the “Create Pull Request” button again.

You should then get a response that looks like this:

If it does, and everything is green, you’re good to go. Just click the “Merge pull request” button then click the “Confirm merge” button and that’s that.

Acknowledgements

The core contributors to Klaxon have been Ivar Vong, Andy Rossback, Tom Meagher, Gabe Isman and Ryan Murphy.

We've been grateful for additional contributions to the project from:

  • captn3m0
  • Jackson Gothe-Snape, SBS News
  • Cameo Hill
  • Emily Hopkins
  • Matthew Verive
  • Jason Kulatunga
  • Yolanda Martinez
  • Jeremy Merrill
  • Justin Myers
  • Kevin Schaul
  • Ari Shapell
  • Jeremy Singer-Vine
  • Mike Stucka
  • k.wakitani
  • Bob Weston

We also owe thanks to Knight-Mozilla OpenNews, which supported the initial public release of this free and open source software.

klaxon's People

Contributors

ajshapell, alanhovorka, aljohri, analogj, arossback, bobweston, captn3m0, chriszs, dependabot-preview[bot], dependabot-support, dependabot[bot], deroudilhep, gabeisman, greysteil, immewnity, ivarvong, jacksongs, jsvine, kevinschaul, myersjustinc, p53ud0k0d3, rdmurphy, stucka, tommeagher, walinchus, wktk, yolimartinez

klaxon's Issues

An alert if a page *doesn't* update

Sometimes a page changes URL or design and then you never hear from it again.
It would be good to get a notice if something hasn't updated in a month, or maybe 45 days, to alert a user to check if the page they're watching has moved rather than failing silently.
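One way to picture this request is a simple staleness check. The Ruby sketch below is only an illustration of the idea: the helper name `stale?` and its parameters are hypothetical, and the 45-day default is taken from the suggestion above, not from any existing Klaxon setting.

```ruby
# Sketch only: a hypothetical staleness check, not an existing Klaxon feature.
SECONDS_PER_DAY = 24 * 60 * 60

# True when the page's last detected change is older than max_days.
def stale?(last_changed_at, now: Time.now, max_days: 45)
  (now - last_changed_at) > max_days * SECONDS_PER_DAY
end
```

A scheduled job could run this against the timestamp of each page's most recent snapshot and email the watcher when it returns true.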

No "delete" button to remove a watched page?

As best I can tell, there's no way to remove a page from the list of monitored sites within the GUI.

I've made do by clearing the URL, and I suppose I could delete the entry by jumping into the database, but am I missing something?

special-case the CSS selector generator for <tables>?

Hey y'all,

I've noticed wonky behavior when trying to set up an alert for a table. Often, the only visible space in the table is a bunch of tr elements. When I set up an alert using tr as the selector (possibly with additional context, like .whatever #whatever tr), the output looks pretty bad. If I fix it to be table, everything works perfectly. I wonder if it'd be worthwhile to special-case the thing that generates CSS selectors to automatically select the parent table if the user has clicked on a tr.

(I guess the problem here is that the user's click is ambiguous -- do they want to watch the whole table, that one row, or that one cell? But that's a theoretical problem.)

I should clarify that I haven't tested this and there may be problems I'm not foreseeing. Obviously I defer to y'all if you've thought about this case. And, at the risk of repeating myself, I'm a big fan of Klaxon and thank you all for your work! Klaxon works super well and we're using it a ton. :)

Cancel/close the bookmarklet

Once you've activated the Klaxon bookmarklet on a page you want to watch, you can't get rid of it and stay on the page without reloading the page. Should we add a cancel/close button?

An API for the Feed to custom scrapers

Klaxon is about making it dead simple for someone to watch a basic HTML page, the kind many government websites still use. The feed, the alerts and the snapshot comparisons are all very useful for this. But I could imagine some people might like to add more complex scrapers of their own devising but still make use of the alerts and the feeds.
What would this entail?

Audio/visual alerts

What would it take for Klaxon to integrate with an arduino that gave audio or strobe light alerts when an update is detected?

Setting two Klaxons on a single page

Per a report from @jeremybmerrill:

if you’re trying to watch a second thing on the same page, it appears that the Klaxon bookmarklet edits the first, rather than (as this user expected) creating a second one.

The current workaround is to manually add a second watcher, but we should think about how the bookmarklet should handle multiple alerts for a single page.

error handling for incorrect url scheme

if a url starts with a space, the watching page breaks because there is a parse error on the url. example:

2016-06-24T15:41:54.101794+00:00 app[web.1]:   Rendered pages/index.html.erb within layouts/application (38.7ms)
2016-06-24T15:41:54.102002+00:00 app[web.1]: Completed 500 Internal Server Error in 45ms (ActiveRecord: 16.7ms)
2016-06-24T15:41:54.103252+00:00 app[web.1]:
2016-06-24T15:41:54.103267+00:00 app[web.1]: ActionView::Template::Error (Invalid scheme format:  https):
2016-06-24T15:41:54.103269+00:00 app[web.1]:     40:         <tr>
2016-06-24T15:41:54.103269+00:00 app[web.1]:     41:           <td><%= page.created_at.strftime("%m/%d/%Y at %I:%M%p") %></td>
2016-06-24T15:41:54.103272+00:00 app[web.1]:     42:           <td><%= page.name %></td>
2016-06-24T15:41:54.103273+00:00 app[web.1]:     43:           <td><%= page.domain.truncate(40) %></td>
2016-06-24T15:41:54.103273+00:00 app[web.1]:     44:           <td><code><%= page.css_selector %></code></td>
2016-06-24T15:41:54.103274+00:00 app[web.1]:     45:           <td><%= page&.user&.display_name %></td>
2016-06-24T15:41:54.103275+00:00 app[web.1]:     46:           <td><% if page.page_snapshots.count > 1 %><%= link_to "Latest Change", latest_change_page_path(page), class: "btn btn-default btn-block" %><% end %></td>
2016-06-24T15:41:54.103276+00:00 app[web.1]:   app/models/page.rb:16:in `domain'
2016-06-24T15:41:54.103277+00:00 app[web.1]:   app/views/pages/index.html.erb:43:in `block in _app_views_pages_index_html_erb__2887604200305143239_70290127052820'
2016-06-24T15:41:54.103277+00:00 app[web.1]:   app/views/pages/index.html.erb:39:in `_app_views_pages_index_html_erb__2887604200305143239_70290127052820'
2016-06-24T15:41:54.103278+00:00 app[web.1]:
2016-06-24T15:41:54.103279+00:00 app[web.1]:

after removing the leading whitespace, everything worked again. can we trim urls and/or add some graceful fallback instead of assuming all URLs parse correctly?
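A minimal Ruby sketch of the two ideas in this report, trimming on input and falling back gracefully on display, might look like the following. It uses Ruby's standard `URI` library, which may not be the parser that raised the error in the log above, and the helper names `normalize_url` and `safe_domain` are hypothetical.

```ruby
require "uri"

# Sketch only: hypothetical helpers, not Klaxon's actual implementation.

# Strip surrounding whitespace and return the URL only if it parses
# as http or https with a host; otherwise return nil.
def normalize_url(raw)
  cleaned = raw.to_s.strip
  uri = URI.parse(cleaned)
  return nil unless uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?

  cleaned
rescue URI::InvalidURIError
  nil
end

# Return the host for display, or a placeholder instead of raising.
def safe_domain(raw)
  url = normalize_url(raw)
  url ? URI.parse(url).host : "(invalid URL)"
end
```

Calling `normalize_url` before saving would store the trimmed URL, and using `safe_domain` in the index view would keep one bad record from breaking the whole page.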

Can't use bookmarklet when not signed into Klaxon

If you're not signed into a Klaxon session, the bookmarklet will take over a page, but the right sidebar won't appear, so you can't save or close.
The preferred behavior would be for that sidebar to pop up with a signin box that would then reload the page (and bookmarklet) for you.

Better explain how to use the bookmarklet

We got feedback from our user testing that some people found the use of the bookmarklet confusing. Some people weren't familiar with CSS selectors or html and weren't quite sure what exactly they were selecting. Other users didn't really know what a bookmarklet is or how it works.

How can we improve the user documentation to make this powerful feature easier to grasp and use for new users?

Add a CONTRIBUTORS file

We should recognize everyone who has contributed to the project in code, documentation or other support.

Support Amazon SES instead of SendGrid?

I think this would be doable by adding additional ENV VARS here to set the SMTP hostname and SMTP port; this would, I think, address the question of the Sendgrid dependency out of #8.

Customize organization name

Using a config variable on the Heroku page, allow users to customize the name of their Klaxon instance so it's clear to users it is owned by their newsroom.

This could appear in the <title> tag or the header could say "Klaxon at the Marshall Project", for instance, instead of just "Klaxon".

If we do do that, we should allow users to customize this through the settings dropdown as well.

Limit the width of the diff on the change page

The diff is much easier to read on the email alert than in the change page in Klaxon itself. I wonder if we limit the width of the diff and force it to wrap if that wouldn't make it more accessible.

on the site

screen shot 2017-01-25 at 12 39 53 pm

on email

screen shot 2017-01-25 at 12 40 30 pm

Bookmarklet selector feedback is confusing

It's pretty confusing how the selector continues to update after you have clicked to select an element. There should be some visual indication of what the selected element is, or something similar.

Add link to user help from README

Insert this at end of 2nd graf:

[If you need help using Klaxon, [you can find it here.](https://github.com/themarshallproject/klaxon/blob/master/data/help.md)]

Is the number of changes being counted correctly?

The change page for a Klaxon alert states how many times a page has changed, like this:

screen shot 2017-01-25 at 12 43 48 pm

But when I look at the number of snapshots, I see this:
screen shot 2017-01-25 at 12 43 59 pm

That would say to me that it has changed one time since I started watching it, with the first snapshot being the baseline. Am I misunderstanding that, or is the first snapshot actually the first change and we're missing the baseline?

Should robots.txt default to disallow all?

Does Klaxon seem like a thing that should always turn away spiders? I know the login will stop them, but I could see the argument for never announcing this exists to search engines.

`rake check:all` not detecting changes reliably

I'm trying to evaluate Klaxon for possible use in my newsroom, and I've been seeing some weird behavior from the check:all Rake task: It sometimes detects too many changes (both the addition of content and its removal, at the same time) or ends up with a rollback in the middle of looking for changes. (Happy to split this up into two issues if you'd prefer.)

More details below. Would love to figure out what's going on here, because Klaxon does seem really useful!

Change reversal

I tried tracking new posts on the home page of Source. (I think the selector was body div#snap-content-wrapper.snap-content.overthrow div.content div.container article.homepage.)

When I did this, I sometimes would get two emails at roughly the same time (i.e., not from separate rake check:all runs): One email with a new top post (and a corresponding post getting bumped to the next page), and another email with roughly the inverse of that change (i.e., the new top post being removed and the old bumped post coming back), and not necessarily in that order.

Two examples:
klaxon-source-choochoobot
klaxon-source-sheep

These are weird, of course, because they aren't perfect inverses, but they seem to be happening at the same time regardless.

I don't have logs for these incidents, and since I've kept the scheduler task running, they're back beyond the 1,500-line maximum I can get from heroku logs.

Unexplainable rollbacks

I also tried watching some subreddits and saw some odd notes in my Heroku logs (again, don't have those), so I decided to try to spin up the app on my laptop instead so I could more easily observe/control what's going on.

It turns out my selectors didn't work, so no changes were ever going to be detected (more details below), but oddly some pages never get that far and log something about rolling back some sort of database transaction:

$ bundle exec rake check:all
  AppSetting Exists (0.4ms)  SELECT  1 AS one FROM "app_settings" WHERE "app_settings"."key" = $1 LIMIT 1  [["key", "default_host"]]
  AppSetting Load (0.3ms)  SELECT  "app_settings".* FROM "app_settings" WHERE "app_settings"."key" = $1 LIMIT 1  [["key", "default_host"]]
setting default_host to localhost:5000
  Page Load (0.4ms)  SELECT "pages".* FROM "pages"
downloading url=https://www.reddit.com/r/dataisbeautiful/new/ for page.id=1
  PageSnapshot Load (0.8ms)  SELECT  "page_snapshots".* FROM "page_snapshots" WHERE "page_snapshots"."sha2_hash" = $1 AND "page_snapshots"."page_id" = 1 LIMIT 1  [["sha2_hash", "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"]]
   (0.2ms)  BEGIN
  SQL (0.3ms)  UPDATE "page_snapshots" SET "updated_at" = '2016-05-03 14:50:16.574080' WHERE "page_snapshots"."id" = $1  [["id", 1]]
   (0.4ms)  COMMIT
page 1 has not changed
downloading url=https://www.reddit.com/r/aww/new/ for page.id=2
  PageSnapshot Load (0.4ms)  SELECT  "page_snapshots".* FROM "page_snapshots" WHERE "page_snapshots"."sha2_hash" = $1 AND "page_snapshots"."page_id" = 2 LIMIT 1  [["sha2_hash", "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"]]
   (0.1ms)  BEGIN
  PageSnapshot Exists (0.3ms)  SELECT  1 AS one FROM "page_snapshots" WHERE "page_snapshots"."sha2_hash" = 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855' LIMIT 1
   (0.5ms)  ROLLBACK
  Page Load (0.3ms)  SELECT "pages".* FROM "pages"
  PageSnapshot Load (0.3ms)  SELECT "page_snapshots".* FROM "page_snapshots" WHERE "page_snapshots"."page_id" = $1  [["page_id", 1]]
  PageSnapshot Load (0.3ms)  SELECT "page_snapshots".* FROM "page_snapshots" WHERE "page_snapshots"."page_id" = $1  [["page_id", 2]]

The page hashes shown in the logs here match the SHA-256 for an empty string, suggesting my selectors don't work for the pages as Klaxon downloaded them, which makes sense because my selectors (picked via the bookmarklet) suggest a logged-in user with Reddit Gold. (For /r/dataisbeautiful, I have body.subscriber.listing-page.loggedin.gold.hot-page div.content div.spacer, and for /r/aww, I have body.subscriber.listing-page.loggedin.new-page.gold div.content div.spacer div#siteTable.sitetable.linklisting.)

2.3.0 :001 > require "digest"
 => true
2.3.0 :002 > Digest::SHA256.hexdigest('')
 => "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
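Building on that observation, here is a short Ruby sketch of hash-based change detection with a guard for the empty-match case. The helper names are hypothetical and the logic is an assumption inferred from the `sha2_hash` column in the logs above, not a copy of Klaxon's code.

```ruby
require "digest"

# Sketch only: hypothetical helpers inferred from the logs, not Klaxon's code.

# SHA-256 of the empty string, i.e. what a selector that matched nothing hashes to.
EMPTY_SHA256 = Digest::SHA256.hexdigest("")

# Hash the text extracted by the CSS selector.
def snapshot_hash(selected_text)
  Digest::SHA256.hexdigest(selected_text.to_s)
end

# True when a stored hash means the selector matched no content at all.
def empty_match?(sha2_hash)
  sha2_hash == EMPTY_SHA256
end

# True when the newly extracted text differs from the previous snapshot.
def changed?(previous_hash, selected_text)
  snapshot_hash(selected_text) != previous_hash
end
```

A check like `empty_match?` could be used to warn a user that their selector is not matching anything, rather than quietly reporting that the page has not changed.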

Klaxon doesn't follow 301 redirects -- should it?

I had been watching an agency's site that had an HTTP URL. That site implemented HTTPS by default in late October. (Good on the government! :) But a distracting time for me.) The site would issue a 301 for HTTP requests to point to the same URL under HTTPS.

This confused Klaxon. After the redirect was set up, the response body was empty, so my selector didn't match, and the email I got from Klaxon showed only a deletion (no green text). After that I stopped getting notifications of changes: from Klaxon's point of view, the page it was watching really didn't change, since every response body was empty. It's an easy issue to fix for my particular case -- switch the URL to HTTPS -- but it may take a user a while to realize, especially if their site doesn't update approximately daily as mine did.

Now, you all have certainly thought more about the "theory" of Klaxon, so I don't know if this is expected behavior or not. I can totally see it both ways and want to defer to y'all's thinking.

But if y'all think this is a fixable problem, the RestClient gem will follow redirects automatically and I think RestClient.get would be a drop-in replacement for Net::HTTP.get here -- though behavior with respect to encoding might differ...
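
For comparison, redirects can also be followed with Net::HTTP alone. The following is a minimal sketch, not Klaxon's actual download code: `fetch_following_redirects`, its `limit` default, and the injectable `fetcher` parameter are all hypothetical names introduced here for illustration.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not Klaxon's code): fetch a URL with Net::HTTP,
# following up to `limit` redirects. `fetcher` is injectable so the
# logic can be exercised without network access.
def fetch_following_redirects(url, limit: 5, fetcher: ->(uri) { Net::HTTP.get_response(uri) })
  raise ArgumentError, "too many redirects" if limit < 0

  response = fetcher.call(URI.parse(url))
  return response unless response.is_a?(Net::HTTPRedirection)

  # Some 3xx responses (e.g. 304 Not Modified) carry no Location header.
  location = response["location"]
  return response if location.nil?

  # Location may be relative, so resolve it against the current URL.
  next_url = URI.join(url, location).to_s
  fetch_following_redirects(next_url, limit: limit - 1, fetcher: fetcher)
end
```

With something like this, an HTTP URL that answers with a 301 to its HTTPS equivalent would yield the final response rather than the empty redirect body. The redirect limit guards against redirect loops.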

Typo in text of login email

Message currently says: "You're recieving this email because you just tried to log into your Klaxon account." (emphasis mine; "recieving" should be "receiving")

Postgres error on deploying from master branch

Tried to test the Heroku deployment and got an error message during the build that seems to indicate a missing table in the Postgres setup.

Here's the log:

Configure environment
Build app
-----> Ruby app detected
-----> Compiling Ruby/Rails
-----> Using Ruby version: ruby-2.3.0
-----> Installing dependencies using bundler 1.11.2
       Running: bundle install --without development:test --path vendor/bundle --binstubs vendor/bundle/bin -j4 --deployment
       Fetching gem metadata from https://rubygems.org/............
       Fetching version metadata from https://rubygems.org/...
       Fetching dependency metadata from https://rubygems.org/..
       Using json 1.8.3
       Installing rake 11.1.2
       Installing i18n 0.7.0
       Installing minitest 5.8.4
       Installing builder 3.2.2
       Installing erubis 2.7.0
       Installing thread_safe 0.3.5
       Installing mini_portile2 2.0.0
       Installing mime-types-data 3.2016.0221
       Installing rack 1.6.4
       Installing arel 6.0.3
       Installing addressable 2.4.0
       Installing bcrypt 3.1.11 with native extensions
       Installing coffee-script-source 1.10.0
       Installing execjs 2.6.0
       Installing concurrent-ruby 1.0.1
       Installing thor 0.19.1
       Installing dotenv 2.1.1
       Installing diffy 3.1.0
       Installing htmlentities 4.3.4
       Installing multi_xml 0.5.5
       Installing jwt 1.5.4
       Installing kramdown 1.10.0
       Using bundler 1.11.2
       Installing pg 0.18.4 with native extensions
       Installing puma 3.2.0 with native extensions
       Installing rails_serve_static_assets 0.0.5
       Installing rails_stdout_logging 0.0.5
       Installing sass 3.4.21
       Installing tilt 2.0.2
       Installing rdoc 4.2.2
       Installing tzinfo 1.2.2
       Installing nokogiri 1.6.7.2 with native extensions
       Installing mime-types 3.0
       Installing rack-test 0.6.3
       Installing rack-cache 1.6.1
       Installing css_parser 1.3.7
       Installing coffee-script 2.4.1
       Installing uglifier 3.0.0
       Installing sprockets 3.5.2
       Installing httparty 0.13.7
       Installing rails_12factor 0.0.3
       Installing sdoc 0.4.1
       Installing activesupport 4.2.5.1
       Installing mail 2.6.4
       Installing rails-deprecated_sanitizer 1.0.3
       Installing premailer 1.8.6
       Installing globalid 0.3.6
       Installing activemodel 4.2.5.1
       Installing activejob 4.2.5.1
       Installing activerecord 4.2.5.1
       Installing rails-dom-testing 1.0.7
       Installing loofah 2.0.3
       Installing rails-html-sanitizer 1.0.3
       Installing actionview 4.2.5.1
       Installing actionpack 4.2.5.1
       Installing sprockets-rails 3.0.4
       Installing railties 4.2.5.1
       Installing actionmailer 4.2.5.1
       Installing premailer-rails 1.9.1
       Installing simple_form 3.2.1
       Installing coffee-rails 4.1.1
       Installing jquery-rails 4.1.1
       Installing sass-rails 5.0.4
       Installing rails 4.2.5.1
       Installing turbolinks 2.5.3
       Bundle complete! 27 Gemfile dependencies, 66 gems now installed.
       Gems in the groups development and test were not installed.
       Bundled gems are installed into ./vendor/bundle.
       Post-install message from rdoc:
       Depending on your version of ruby, you may need to install ruby rdoc/ri data:
       <= 1.8.6 : unsupported
       = 1.8.7 : gem install rdoc-data; rdoc-data --install
       = 1.9.1 : gem install rdoc-data; rdoc-data --install
       >= 1.9.2 : nothing to do! Yay!
       Post-install message from httparty:
       When you HTTParty, you must party hard!
       Bundle completed (30.67s)
       Cleaning up the bundler cache.
-----> Preparing app for Rails asset pipeline
       Running: rake assets:precompile
       failed to set default_host, reason: PG::UndefinedTable: ERROR:  relation "app_settings" does not exist
       LINE 5:                WHERE a.attrelid = '"app_settings"'::regclass
       ^
       :               SELECT a.attname, format_type(a.atttypid, a.atttypmod),
       pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod
       FROM pg_attribute a LEFT JOIN pg_attrdef d
       ON a.attrelid = d.adrelid AND a.attnum = d.adnum
       WHERE a.attrelid = '"app_settings"'::regclass
       AND a.attnum > 0 AND NOT a.attisdropped
       ORDER BY a.attnum
       I, [2016-05-04T15:36:03.132671 #1086]  INFO -- : Writing /tmp/build_4a7006eeb7404807023d62e92fb2b7a9/themarshallproject-klaxon-fe46e0c/public/assets/klaxon-logo-100px-c4c978b074949e6706cc0bc17d59e0260b7ee3c22c7249ae0cb743e7c2cd9501.png
       I, [2016-05-04T15:36:05.879619 #1086]  INFO -- : Writing /tmp/build_4a7006eeb7404807023d62e92fb2b7a9/themarshallproject-klaxon-fe46e0c/public/assets/application-ce4b02c94ce9d4bc18d4b7e67a7a9de3e1f5a04605e8ffd4d03320b35ced23a0.js
       I, [2016-05-04T15:36:05.879912 #1086]  INFO -- : Writing /tmp/build_4a7006eeb7404807023d62e92fb2b7a9/themarshallproject-klaxon-fe46e0c/public/assets/application-ce4b02c94ce9d4bc18d4b7e67a7a9de3e1f5a04605e8ffd4d03320b35ced23a0.js.gz
       I, [2016-05-04T15:36:10.176761 #1086]  INFO -- : Writing /tmp/build_4a7006eeb7404807023d62e92fb2b7a9/themarshallproject-klaxon-fe46e0c/public/assets/application-aedc3aa66c9871bda1986138e49412820f788b214b419e9cf495af594c3d6401.css
       I, [2016-05-04T15:36:10.176966 #1086]  INFO -- : Writing /tmp/build_4a7006eeb7404807023d62e92fb2b7a9/themarshallproject-klaxon-fe46e0c/public/assets/application-aedc3aa66c9871bda1986138e49412820f788b214b419e9cf495af594c3d6401.css.gz
       Asset precompilation completed (8.90s)
       Cleaning assets
       Running: rake assets:clean
       failed to set default_host, reason: PG::UndefinedTable: ERROR:  relation "app_settings" does not exist
       LINE 5:                WHERE a.attrelid = '"app_settings"'::regclass
       ^
       :               SELECT a.attname, format_type(a.atttypid, a.atttypmod),
       pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod
       FROM pg_attribute a LEFT JOIN pg_attrdef d
       ON a.attrelid = d.adrelid AND a.attnum = d.adnum
       WHERE a.attrelid = '"app_settings"'::regclass
       AND a.attnum > 0 AND NOT a.attisdropped
       ORDER BY a.attnum
-----> Discovering process types
       Procfile declares types     -> web
       Default types for buildpack -> console, rake, worker
-----> Compressing...
       Done: 33.9M
-----> Launching...
       Released v7
       https://klaxon-test.herokuapp.com/ deployed to Heroku

Klaxon sending repeated alerts for website that hasn't changed (gawker.com)

One of our reporters set up an alert for the main content section of gawker.com so they could get an alert if and when it was ever updated or altered. It's been two days since then and Klaxon has sent three alerts for changes, yet the substance of the alerts doesn't seem to indicate that anything has changed.

I attached screenshots of all the alerts in question (each one appears to be more or less the same), and of the alert itself in the Klaxon control panel.

[Attached screenshots: klaxon1, klaxon2, klaxon3, screen shot 2016-09-09 at 3 56 10 pm]

Configurable outbound email?

Not 100% sure if the bug is on my end or not — I was trying to set up Klaxon to use the Tribune's SES account, and it threw an unverified email address error.

[ActiveJob] [ActionMailer::DeliveryJob] [f9740ecf-5e14-477d-932d-418a9e4bfaa2] Performed ActionMailer::DeliveryJob from Inline(mailers) in 1621.73ms
Completed 500 Internal Server Error in 1694ms (ActiveRecord: 6.3ms)

Net::SMTPFatalError (554 Message rejected: Email address is not verified.
):
  app/controllers/sessions_controller.rb:11:in `create'

The error makes sense — our SES account hasn't verified [email protected], so it turns it away. Should it be possible to set the email address to use? Is that something else that can be passed into ActionMailer's config? I was able to get it to work by switching out that line with the email I originally tried.
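
One way the sender address could be made configurable is to read it from an environment variable and fall back to a default. This is only a sketch under assumptions: `MAILER_FROM_ADDRESS`, `DEFAULT_FROM_ADDRESS`, the placeholder fallback address, and `mailer_from_address` are hypothetical names, not settings that Klaxon is known to provide.

```ruby
# Hypothetical sketch (not Klaxon's code): choose the "from" address from
# an environment variable, falling back to a default when it is unset.
# MAILER_FROM_ADDRESS and the fallback address are assumed names.
DEFAULT_FROM_ADDRESS = "klaxon@example.com"

def mailer_from_address(env = ENV)
  # Treat a missing or blank value as "not configured".
  address = env["MAILER_FROM_ADDRESS"].to_s.strip
  address.empty? ? DEFAULT_FROM_ADDRESS : address
end
```

In a Rails app, a value like this could be passed to ActionMailer's `default from:` in the mailer class, so an SES account would only need to verify the address the deployer chooses.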
