Feeder

A set of scripts that parse websites for updates and display the results in a dashboard.

The feeds of the websites you find interesting are specified in sites.yaml. Each site is scraped for updates by the parser configured for it. After all sites are scraped, a JSON file is produced with the aggregated data. Opening feeder.html displays the results as an HTML dashboard.

Running

Simply update sites.yaml with the websites and data you find interesting. To scrape, run the scrape.sh script, and then open feeder.html in your browser.

To stop git from reporting sites.yaml as changed, you can run git update-index --assume-unchanged sites.yaml. To have git track the file again, run git update-index --no-assume-unchanged sites.yaml.

Dashboard usage

In the dashboard, entries can be marked as "seen". This makes it easy to keep track of which items have been read, so that any new items are clearly visible.

All feeds will be shown in the dashboard, divided into two sections: those with updates and those without (i.e. when all the feed items have been marked as "seen").

To mark an item as "seen", hover over its user picture with the mouse, which will make an eye icon appear. Click the icon to mark that entry as "seen". To mark all items in a feed as "seen", click on the icon for the whole feed.

Once items have been marked as "seen", a Save button will appear. Save the corresponding file as seenentries.json in the same directory as the feeder.html file.

Note that if a feed item previously marked as "seen" is later updated, it loses the marking.

Parsers

There are a number of parsers that are supported:

  • rss - Parser for the very common RSS feed format, used all over the Internet.
  • atom - Parser for the very common Atom feed format, used all over the Internet.
  • github - Parser for GitHub issues and PRs, new releases and new commits to a repository. The GitHub API has a rate limit of 60 requests per hour for unauthenticated requests. If the GITHUB_ACCESS_TOKEN environment variable is set, the parser will read it and use it in the requests. So if you need more than 60 requests per hour, create a GitHub access token and set it in the GITHUB_ACCESS_TOKEN environment variable, whereupon GitHub will increase the rate limit to 5000 requests per hour.
  • gitlab - Parser for GitLab issues.
  • youtrack - Parser for issues on YouTrack.
  • wardruna - Custom parser for the site wardruna.com.
  • shadowflare - Custom parser for the site sfsrealm.hopto.org.
  • stormcoast-fortress - Custom parser for the site stormcoast-fortress.net.

YAML configuration

When specifying a website feed in sites.yaml, the following attributes are recognized:

  • name: Required. Name of site to parse.
  • url: Required. URL to parse from.
  • parser: Required. Which parser to use.
  • icon: Optional. An icon to display next to the name of the site.
  • displayUrl: Optional. The name of the site will be clickable in the dashboard. If the url parameter is not suitable as a link (e.g. if it leads to an RSS feed file), then a URL to be used instead can be specified here.
  • description: Optional. A description of the site to be displayed in the dashboard.
  • insertValues: Optional. A list of hard-coded keys and values which will be inserted into each feed entry. Can be used e.g. if the author information or picture is missing.
  • filters: Optional. A list of search filters that a feed entry must match in order to be kept. The title and entry text are combined and searched. The filters can be specified as regular expressions.
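
An entry that uses only the three required attributes could look like the sketch below. The site name and URL are placeholders chosen for illustration, not a real feed, and the indentation follows the other entries in this document.

```yaml
  # Hypothetical minimal entry: only the required attributes are given.
  - name: Example Blog                    # placeholder name
    url: "https://example.com/feed.xml"   # placeholder URL, assumed to serve an RSS feed
    parser: rss
```

With no displayUrl given, the site name in the dashboard would presumably link to the url value itself.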

Below is an example configuration for the website Phoronix. Since the url attribute points to an RSS feed, the displayUrl attribute is given as well, so that the dashboard has a link to the main site. Since (almost) all articles are written by the same author and that information is not encoded in the RSS feed, the name and picture of the author are specified through insertValues. Finally, there is a filter which specifies that the only articles to keep are those where the text or title contains "wayland" or "arch".

  - name: Phoronix
    url: "https://www.phoronix.com/rss.php"
    displayUrl: "https://www.phoronix.com"
    icon: "https://www.phoronix.com/android-chrome-192x192.png"
    parser: rss
    insertValues:
      - user: Michael Larabel
      - userPicture: "https://www.phoronix.com/assets/categories/michaellarabel.webp"
    filters:
      - filter: (wayland|arch)

Filters

Filters can be used to remove articles from the list: only articles that match the filters are kept. Both the title and the text of each article are searched.

Simple case

The simplest case is to just give a keyword that must exist in the title or text of the article.

In the following example, only articles containing "firefox" will be kept:

filters:
  - filter: firefox

Regular expressions

Filters support regular expressions, which allows for powerful matching criteria.

In the following example, only articles containing "wayland" or "arch" will be kept:

filters:
  - filter: (wayland|arch)
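
A regular expression can also make a keyword stricter. The sketch below is meant to keep only articles where "arch" appears as a whole word, so that longer words that merely contain it do not match. It assumes that the regex engine used by the scripts supports the \b word-boundary anchor, which is not confirmed here. The single quotes ensure that YAML keeps the backslashes literal.

```yaml
filters:
  # Assumption: \b (word boundary) is supported by the regex engine in use.
  - filter: '\barch\b'
```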

Boolean logic

Filters also support boolean logic, i.e. "and", "or" and "not". A filter consists of an arbitrary number of sections, each one level more nested than the previous. Each section may contain at most one filter key, and the rest of the keys in that section should be "and" or "or".

In the following example, only articles containing "wayland", "firefox" or "arch" will be kept. However, an article that matches on "arch" must not contain "march", "architect" or "research".

filters:
  - filter: wayland
  - or:
    - filter: firefox
  - or:
    - filter: arch
    - and:
      - not:
        - filter: march
        - or:
          - filter: architect
        - or:
          - filter: research
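
By analogy with the example above, a conjunction might be written as in the following sketch. The structure is inferred from that example and has not been verified against the scripts, and the keywords are arbitrary. The intent is to keep only articles that contain both "linux" and "kernel" but not "windows".

```yaml
filters:
  - filter: linux
  - and:
    - filter: kernel
  - and:
    # Assumption: "not" may appear directly under "and", as in the example above.
    - not:
      - filter: windows
```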

Requirements

The scripts make use of the following programs:

Contributors

sjoblomj
