
feedpls's Introduction

feedpls's People

Contributors

elliotwutingfeng, skofli, witjem

feedpls's Issues

Add support for parsing data from Yoast

Context
Some websites use Yoast for SEO. On the HTML page, this plugin embeds all the information needed for a feed:

<script type="application/ld+json" class="yoast-schema-graph">yoast json</script>

Proposal
Perhaps the feed config could look like this:

- id: feed-id
  title: feed title
  description: feed desc
  url: http://example.com/news
  matchers:
    engine: xpath-yoast
    itemUrl:
      expr: //article//a/@href

Timezone for feeds

Context
Some websites show times in a specific timezone.
For example, https://www.kyivpost.com/post/11824 shows times in the Europe/Kyiv timezone, but the service parses them as UTC.

Proposal
Add a tz field to the published section of the feeds.yaml config.
Example:

published:
  expr: "//div[@class='post-info']/text()[2]"
  layout: "January 2, 2006, 3:04 pm"
  tz: "Europe/Kyiv"

Reminder
Docs and examples should be updated too.

Write tests for examples

Context

Examples in the project can be broken for several reasons:

  • Project codebase changed, but the examples have not been updated.
  • The websites used in the examples changed, so their feeds.yml is outdated.

Proposal

Add tests to the GitHub pipeline, triggered by codebase changes or by a daily cron job.
The tests should start the service from each example and call the example endpoints.

Support locale for date

Context
Some websites show dates in a locale-specific format, for example: Лютий 19, 2023, 12:19 (February 19, 2023, 12:19) in the uk_UA locale.

Proposal
Add a locale field to the published section of the feeds.yaml config.
Example:

published:
  expr: "//div[@class='post-info']/text()[2]"
  layout: "January 2, 2006, 3:04 pm"
  tz: "Europe/Kyiv"
  locale: "uk_UA"

For parsing, https://github.com/goodsign/monday could be used.

Reminder
Docs and examples should be updated too.

Support author for items

Context
RSS and Atom items have an author field. It would be good if the service supported it.

Proposal
Add an optional author field to the feeds.yml config:

matchers:
  engine: xpath
  author: # Optional props.
    name:
      expr: //meta[@name='author']/@content
    email:
      expr: //meta[@name='author']/@content

The author should be omitted from the RSS output if the author name is not found.

Reminder
Update docs and examples too.

Remove pkg/errors

Context
pkg/errors has been archived, so it is better to remove it from the project.

Proposal
Migrate from pkg/errors to the standard Go errors package.

Add `replace` post processor mechanism for fields

Context
Some news websites append a suffix such as " - DNN News" to titles, for example: Apple presented a new iPhone - DNN News

Proposal
Add a replace postprocessor that replaces content matching a regex.

- id: some-blog-open-source
  title: The Some Blog | Open Source
  description: Open Source Archives
  url: https://someblog.blog/category/open-source/
  matchers:
    itemUrl:
      selector: article a
      attr: href
    title:
      selector: title
    description:
      selector: meta[name='description']
      attr: content
    published:
      selector: meta[name='parsely-pub-date']
      attr: content
      layout: 2006-01-02T15:04:05Z07:00
 
  postprocessors: # new field in the YAML
    - replace:
        field: title # can be: title, description
        from: some_regex
        to: ""

Alternative proposals are welcome.

Add XPath engine for parsing

Context
The service uses goquery to select feed data from a web page. It would be good to have an alternative XPath-based parsing mechanism.

Proposal
Add XPath engine to select data for feeds.
Libs:

feeds.yml:

- id: github-blog-open-source
  title: The GitHub Blog | Open Source
  description: Open Source Archives
  url: https://github.blog/category/open-source/
  matchers:
   
    # new props
    engine: xpath # engine name

    itemUrl:
      selector: article a
      attr: href
    title:
      selector: title
    description:
      selector: meta[name='description']
      attr: content
    published:
      selector: meta[name='parsely-pub-date']
      attr: content
      layout: 2006-01-02T15:04:05Z07:00
