
rss's Introduction

RSS

Really Simple Syndication (RSS) is a family of formats that describe feeds: specially constructed XML documents that allow an interested person to subscribe to and receive updates from a particular web service. This portion of the standard library provides tooling to read and create these feeds.

The standard library supports RSS 0.91, 1.0, 2.0, and Atom, a related format. Here are some links to the standards documents for these formats:

Installation

Add this line to your application's Gemfile:

  gem 'rss'

And then execute:

  $ bundle

Or install it yourself as:

  $ gem install rss

Usage

Consuming RSS

If you'd like to read someone's RSS feed with your Ruby code, you've come to the right place. It's really easy to do this, but we'll need the help of open-uri:

  require 'rss'
  require 'open-uri'

  url = 'https://www.ruby-lang.org/en/feeds/news.rss'
  URI.open(url) do |rss|
    feed = RSS::Parser.parse(rss)
    puts "Title: #{feed.channel.title}"
    feed.items.each do |item|
      puts "Item: #{item.title}"
    end
  end

As you can see, the workhorse is RSS::Parser.parse, which takes the source of the feed and an optional second argument that controls whether the feed is validated (validation is on by default). We get back an object that has all of the data from our feed, accessible through methods. This example shows getting the title out of the channel element and looping through the list of items.
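A minimal sketch of the validation switch, parsing a small made-up feed held in a String so it runs without network access. The keyword form (`validate: false`) is the one quoted in the validation issue further down this page; older bundled versions of the library may only accept the positional boolean.

```ruby
require "rss"

# A tiny, made-up RSS 2.0 document held in a String.
xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Example Feed</title>
      <link>https://example.com/</link>
      <description>A tiny inline feed</description>
      <item>
        <title>First post</title>
      </item>
    </channel>
  </rss>
XML

# Validation is on by default; a feed that breaks the rules raises an error.
strict = RSS::Parser.parse(xml)

# Validation switched off: positional form and keyword form.
loose_positional = RSS::Parser.parse(xml, false)
loose_keyword = RSS::Parser.parse(xml, validate: false)

puts strict.channel.title
puts loose_keyword.items.first.title
```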

Producing RSS

Producing our own RSS feeds is easy as well. Let's make a very basic feed:

  require "rss"

  rss = RSS::Maker.make("atom") do |maker|
    maker.channel.author = "matz"
    maker.channel.updated = Time.now.to_s
    maker.channel.about = "https://www.ruby-lang.org/en/feeds/news.rss"
    maker.channel.title = "Example Feed"

    maker.items.new_item do |item|
      item.link = "https://www.ruby-lang.org/en/news/2010/12/25/ruby-1-9-2-p136-is-released/"
      item.title = "Ruby 1.9.2-p136 is released"
      item.updated = Time.now.to_s
    end
  end

  puts rss

As you can see, this is a DSL much like the one in the Builder gem. This code prints an Atom feed with one item. To add a second item, call maker.items.new_item again with another block.
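For more than one item, new_item can simply be called in a loop. A sketch under the same setup as the example above; the `posts` array and its URLs are hypothetical placeholders for whatever data the feed is built from.

```ruby
require "rss"

# Hypothetical source data; any list of title/link pairs would do.
posts = [
  { title: "First post",  link: "https://example.com/posts/1" },
  { title: "Second post", link: "https://example.com/posts/2" },
]

feed = RSS::Maker.make("atom") do |maker|
  maker.channel.author = "matz"
  maker.channel.updated = Time.now.to_s
  maker.channel.about = "https://example.com/feed.atom"
  maker.channel.title = "Example Feed"

  # Each new_item call adds one <entry> to the feed.
  posts.each do |post|
    maker.items.new_item do |item|
      item.link = post[:link]
      item.title = post[:title]
      item.updated = Time.now.to_s
    end
  end
end

puts feed
```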

Development

After checking out the repo, run rake test to run the tests.

To install this gem onto your local machine, run rake install. To release a new version, update the version number in lib/rss/version.rb, and then run rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/ruby/rss.

License

The gem is available as open source under the terms of the BSD-2-Clause license.

rss's People

Contributors

aitor, akr, babasbot, dependabot[bot], drbrain, gildesmarais, hsbt, ianrandmckenzie, jeremyevans, k-tsj, ko1, kou, m-nakamura145, mame, marcandre, mrkn, nagachika, nobu, nurse, ozydingo, petergoldstein, saerdnaer, shugo, shyouhei, stomar, timofey282228, ursm, znz


rss's Issues

Error parsing <itunes:duration> when value is an integer.

Apple's example podcast feed in the iTunes documentation uses an integer value for <itunes:duration>:
https://help.apple.com/itc/podcasts_connect/#/itcbaf351599

Parsing a feed in that format (anchor.fm uses integer values) raises an error.

I uploaded the sample file from Apple here:
https://gist.github.com/jmsalcido/30d76adba744e4e445331a6710136476

However, Apple's own sample file is invalid for other reasons, so it can't be used to reproduce the error.

Anyway, I have an RSS feed from anchor.fm, and I got this error while parsing it in my Rails app:

RSS::NotAvailableValueError: value <1947> of tag <duration> is not available.
from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rss/itunes.rb:345:in `rescue in content='
Caused by ArgumentError: must be one of HH:MM:SS, H:MM:SS, MM::SS, M:SS: "1947"
from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rss/itunes.rb:282:in `parse'

If you want to replicate it, you can use my RSS feed from anchor.fm:
https://anchor.fm/s/a6578b8/podcast/rss

irb(main):005:0> open(url) {|rss| RSS::Parser.parse(rss) }
Traceback (most recent call last):
       16: from (irb):5
       15: from (irb):5:in `rescue in irb_binding'
       14: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/open-uri.rb:35:in `open'
       13: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/open-uri.rb:736:in `open'
       12: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/open-uri.rb:169:in `open_uri'
       11: from (irb):5:in `block in irb_binding'
       10: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rss/parser.rb:88:in `parse'
        9: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/forwardable.rb:230:in `parse'
        8: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rss/parser.rb:183:in `parse'
        7: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rss/rexmlparser.rb:18:in `_parse'
        6: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/rexml-3.2.4/lib/rexml/document.rb:242:in `parse_stream'
        5: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/rexml-3.2.4/lib/rexml/parsers/streamparser.rb:36:in `parse'
        4: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rss/parser.rb:377:in `tag_end'
        3: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rss/parser.rb:550:in `block in setup_next_element'
        2: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rss/itunes.rb:342:in `content='
        1: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rss/itunes.rb:345:in `rescue in content='
RSS::NotAvailableValueError (value <1327> of tag <duration> is not available.)

Same error while using irb.
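Until integer durations are accepted, one possible workaround is to rewrite them before parsing. This sketch assumes a bare integer is a count of seconds and converts it to H:MM:SS, one of the formats listed in the error message. `seconds_to_duration` and `normalize_itunes_durations` are hypothetical helpers, not part of the rss gem, and a regex rewrite like this is only a stopgap for feeds you cannot fix at the source.

```ruby
# Hypothetical workaround, not part of the rss gem.
# Assumes an integer <itunes:duration> is a count of seconds.
def seconds_to_duration(total_seconds)
  hours, rest = total_seconds.divmod(3600)
  minutes, seconds = rest.divmod(60)
  format("%d:%02d:%02d", hours, minutes, seconds)
end

# Rewrites digits-only durations (e.g. "1947") to H:MM:SS;
# values that already contain colons, such as "22:38", are left alone.
def normalize_itunes_durations(xml)
  xml.gsub(%r{<itunes:duration>\s*(\d+)\s*</itunes:duration>}) do
    "<itunes:duration>#{seconds_to_duration(Regexp.last_match(1).to_i)}</itunes:duration>"
  end
end

puts normalize_itunes_durations("<itunes:duration>1947</itunes:duration>")
```

With open-uri loaded, something like `RSS::Parser.parse(normalize_itunes_durations(URI.open(url).read))` would then hand the rewritten document to the parser.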

Trouble setting categories for RSS 2.0 feed

Hi! I ran into an issue similar to #28 today while trying to set categories on an RSS 2.0 feed. Assigning with categories= didn't work, but I appeared to be able to do

  item.categories << 'Digital specialists'

... but when the feed was generated, I'd get

  NoMethodError:
       undefined method `to_feed' for "Digital specialists":String

Eventually, by some trial and error and a debugger, I found that what I seemingly need to do is

  category = RSS::Maker::RSS20::Channel::Categories::Category.new(maker)
  category.content = 'Digital specialists'
  item.categories << category

Is there a shorter way of doing this, and is there a way in which I might be able to contribute to the docs to cover both this sort of example and the one from #28?
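A possibly shorter route, assuming the `new_category` method that the maker's category collections provide: it builds the Category object, appends it to the collection, and yields it to the block. This is a sketch against the maker API as we understand it; the channel and item values are placeholders.

```ruby
require "rss"

rss = RSS::Maker.make("2.0") do |maker|
  maker.channel.title = "Example Jobs Feed"
  maker.channel.link = "https://example.com/jobs"
  maker.channel.description = "Placeholder feed for illustration"

  maker.items.new_item do |item|
    item.title = "Example job"
    item.link = "https://example.com/jobs/1"
    # new_category creates the Category maker object and appends it,
    # so there is no need to instantiate the class by hand.
    item.categories.new_category do |category|
      category.content = "Digital specialists"
    end
  end
end

puts rss
```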

Handling missing tags when parsing feeds

Running RSS::Parser.parse(open('foo.xml')) on a feed that lacks the elements marked NULL below gives:

rss-0.2.9/lib/rss/rss.rb:1180:in 'block in _validate': tag <title> is missing in tag <image> (RSS::MissingTagError)

Is there any way to recover from / ignore this? Or must one switch to XML parsing with Nokogiri?

<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
  <channel>
    <link>NULL</link>
    <title></title>
    <description></description>
    <itunes:owner>
      <itunes:name>NULL</itunes:name>
      <itunes:email></itunes:email>
    </itunes:owner>
    <image>
      <link>NULL</link>
      <title></title>
      <url></url>
    </image>
  </channel>
</rss>
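If the goal is to keep going when a feed is missing required elements, one option is to retry without validation. A minimal sketch: `lenient_parse` is a hypothetical helper, and it relies on the validation errors (`RSS::MissingTagError`, `RSS::NotAvailableValueError`, and similar) sharing the `RSS::InvalidRSSError` superclass. The trade-off is that every check is dropped for a feed that fails any one of them.

```ruby
require "rss"

# Hypothetical helper: validate when possible, otherwise fall back to a
# non-validating parse. `source` should be a String; an IO would already
# be consumed by the first attempt.
def lenient_parse(source)
  RSS::Parser.parse(source)
rescue RSS::InvalidRSSError => e
  warn "validation failed (#{e.class}: #{e.message}); retrying without validation"
  RSS::Parser.parse(source, validate: false)
end

# Trimmed version of the feed from this issue.
xml = <<~XML
  <?xml version="1.0" encoding="utf-8"?>
  <rss version="2.0">
    <channel>
      <link>NULL</link>
      <title></title>
      <description></description>
      <image>
        <link>NULL</link>
        <title></title>
        <url></url>
      </image>
    </channel>
  </rss>
XML

feed = lenient_parse(xml)
puts feed.channel.link
```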

Can't parse media content

I have been trying to parse media content and don't seem to be able to capture it. I can get the basics but trying to get something like the url of the image below throws an error.

<media:content url="https://static.independent.co.uk/s3fs-public/thumbnails/image/2020/03/10/14/payday-loans-5.jpg" type="image/jpeg" medium="image"/>

Here is an example of the code I'm trying:

require 'rss'

rss = RSS::Parser.parse('http://www.independent.co.uk/rss', false)

rss.items.each do |item|
  puts "#{item.title}"
  puts "#{item.media_content.url}"

end
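Because the rss gem exposes no accessor for Media RSS elements (there is no `media_content` method on items), a workaround is to read them straight from the XML with REXML, the XML library the parser itself uses. A sketch; `media_urls` is a hypothetical helper, and it also covers `media:thumbnail` from the issue further down. The sample feed reuses the image URL from this issue.

```ruby
require "rexml/document"

# Hypothetical helper: pull the url attribute of media:content
# (or media:thumbnail) from each <item>, alongside the item title.
def media_urls(xml)
  doc = REXML::Document.new(xml)
  doc.elements.to_a("//item").map do |item|
    media = item.elements["media:content"] || item.elements["media:thumbnail"]
    {
      title: item.elements["title"]&.text,
      media_url: media && media.attributes["url"]
    }
  end
end

sample = <<~XML
  <rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
      <title>Example</title>
      <item>
        <title>Payday loans</title>
        <media:content url="https://static.independent.co.uk/s3fs-public/thumbnails/image/2020/03/10/14/payday-loans-5.jpg" type="image/jpeg" medium="image"/>
      </item>
    </channel>
  </rss>
XML

items = media_urls(sample)
items.each { |i| puts "#{i[:title]}: #{i[:media_url]}" }
```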

How to set content?

I'm trying to set the content of an item:

maker.items.new_item do |item|
  item.content = '...'
end

But that doesn't work. I see there's a content_encoded= method, but that doesn't result in anything additional being rendered in the XML.

What I need is the ability to set the content as seen in this example GitHub rss feed under feed -> entry -> content.

Case sensitivity requires all-or-nothing validation

I am also running into the issue brought up in #48. Specifically, I am working with a partner whose RSS feed contains

    <item>
      [...]
      <itunes:episodeType>full</itunes:episodeType>
      [...]
    </item>

This causes parse validation to fail. @kou has advised to disable validation in this case; is this possible to do on just this attribute, or is validation all-or-nothing? Suggesting clients disable all validation because of a common casing issue present in RSS feeds of well-known publishers is a bit impractical and opens the door to many validation bugs.

My ask is that:

  • How to disable validation is documented (please forgive me if it is, but I was unable to find it and simply looked in the method signatures and have landed on RSS::Parser.parse(content, validate: false))
  • Discuss the approach disabling validation of specific attributes vs global validation
  • Reconsider if casing should be considered a validation error. As @liberlanco points out, existing validators do not seem to consider this a validation error, and it is a common occurrence in publishers' feeds. I am also unable to find a technical document stating that this should be a case sensitive value; I would happily take a link to a document that does as ammo to push back on the publisher.

Podcasts - Many <duration> values are not accepted

Running a script through various podcasts' feeds brings up the RSS::NotAvailableValueError for quite a few feeds, such as value <2238> of tag <duration> is not available due to validating the time format at https://github.com/ruby/rss/blob/master/lib/rss/itunes.rb#L283.

Podcasts with duration values like 22:38 are accepted, which makes sense.

But it seems lots of the feeds (for podcasts that are listed in iTunes) have values like 2238.

Ex: https://feeds.megaphone.fm/HSW2732644812

I wonder if a time of 22:38 should be assumed from something like above, as it seems a few RSS creators are outputting times this way? Thanks!

Is there any way to access unparsed tags?

I have the following code

MINDSCAPE_FEED='https://rss.art19.com/sean-carrolls-mindscape'
URI.open(MINDSCAPE_FEED) do |rss|

  feed = RSS::Parser.parse(rss, ignore_unknown_element=true)

  puts "Title: #{feed.channel.title}"
  feed.items.each do |item|
    puts item.itunes_episode_type
  end
end

Is there any way to access the itunes:episodeType element? More generally, is there a way to access tags that the parser doesn't know about?
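One generic approach, independent of what the parser supports, is a second pass over the raw XML with REXML that collects every namespaced child of each item. A sketch; `item_extensions` is a hypothetical helper. It would also surface elements such as `enc:enclosure` (with its `resource` attribute) from the next issue. Repeated elements with the same name overwrite each other in this simple version.

```ruby
require "rexml/document"

# Hypothetical helper: map each <item> to a Hash of its prefixed child
# elements, keyed by "prefix:name", holding the text and attributes.
def item_extensions(xml)
  doc = REXML::Document.new(xml)
  doc.elements.to_a("//item").map do |item|
    item.elements.each_with_object({}) do |element, found|
      next if element.prefix.to_s.empty? # skip plain RSS elements
      attributes = {}
      element.attributes.each { |name, value| attributes[name] = value }
      found[element.expanded_name] = { text: element.text, attributes: attributes }
    end
  end
end

sample = <<~XML
  <rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
    <channel>
      <title>Example podcast</title>
      <item>
        <title>Episode 1</title>
        <itunes:episodeType>full</itunes:episodeType>
      </item>
    </channel>
  </rss>
XML

extensions = item_extensions(sample)
puts extensions.first["itunes:episodeType"][:text]
```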

Parsing enc:enclosure

I was attempting to parse an RSS doc with

<enc:enclosure resource="http://image_url" type="image/jpeg"/>
<item rdf:about="https://dallas.craigslist.org/dal/cto/d/mckinney-2003-acura-cl-type/7109117586.html">
<title><![CDATA[2003 ACURA CL TYPE-S (MCKINNEY) &#x0024;850]]></title>
<link>https://dallas.craigslist.org/dal/cto/d/mckinney-2003-acura-cl-type/7109117586.html</link>
<description><![CDATA[SELLING MY BELOVED HONDA FOR PARTS OR PERSONAL PROJECT. ENGINE IN GREAT SHAPE HAS 165,000 ML MAINTATINED REALLY WELL, CLEAN TITLE. BRAND NEW FRONT SUSPENSION, GOOD BREMBOO BRAKES, TIRES IN GOOD SHAPE, EVERYTHING WORKS INSIDE CAR, FRONT SEAT BIT TORN. ...]]></description>
<dc:date>2020-04-16T10:51:41-05:00</dc:date>
<dc:language>en-us</dc:language>
<dc:rights>copyright 2020 craigslist</dc:rights>
<dc:source>https://dallas.craigslist.org/dal/cto/d/mckinney-2003-acura-cl-type/7109117586.html</dc:source>
<dc:title><![CDATA[2003 ACURA CL TYPE-S (MCKINNEY) &#x0024;850]]></dc:title>
<dc:type>text</dc:type>
<enc:enclosure resource="https://images.craigslist.org/00h0h_fyqO7icY0vm_300x300.jpg" type="image/jpeg"/>
<dcterms:issued>2020-04-16T10:51:41-05:00</dcterms:issued>
</item>

Even with passing ignore_unknown_element to the parse method, I could not find these in the parsed results. Does this library not support these enclosures, and if not is there a plan/willingness to have it added? Enclosures seem to be a standard RSS feature: https://en.wikipedia.org/wiki/RSS_enclosure

Paged feeds (RFC 5005)?

RFC 5005 describes paging for (large) Atom and RSS feeds:

A paged feed is a set of linked feed documents that together contain the entries[...]

Reading the current documentation of ruby/rss, it seems not supported currently. Are there plans to add it, or perhaps a willingness to consider a PR?
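Until the gem supports paging directly, RFC 5005 paging can be approximated on the client side by following `rel="next"` links. A rough sketch for Atom feeds only, assuming the parsed Atom feed exposes its `<link>` elements through `links` with `rel` and `href` readers. `each_feed_page` and the `fetch` callable are hypothetical, and the in-memory pages stand in for real HTTP requests (a real `fetch` could wrap `URI.open`).

```ruby
require "rss"
require "uri"

# Hypothetical helper: yield each page, then follow its rel="next" link.
# max_pages guards against link loops.
def each_feed_page(url, fetch:, max_pages: 20)
  max_pages.times do
    feed = RSS::Parser.parse(fetch.call(url), validate: false)
    yield feed
    # Only Atom documents expose <link> elements this way.
    break unless feed.respond_to?(:links)
    next_link = feed.links.find { |link| link.rel == "next" }
    break if next_link.nil?
    url = URI.join(url, next_link.href).to_s # resolve relative hrefs
  end
end

pages = {
  "https://example.com/feed?page=1" => <<~XML,
    <feed xmlns="http://www.w3.org/2005/Atom">
      <id>urn:example:page1</id>
      <title>Page 1</title>
      <updated>2020-01-01T00:00:00Z</updated>
      <link rel="next" href="https://example.com/feed?page=2"/>
    </feed>
  XML
  "https://example.com/feed?page=2" => <<~XML,
    <feed xmlns="http://www.w3.org/2005/Atom">
      <id>urn:example:page2</id>
      <title>Page 2</title>
      <updated>2020-01-01T00:00:00Z</updated>
    </feed>
  XML
}

titles = []
fetch = ->(u) { pages.fetch(u) }
each_feed_page("https://example.com/feed?page=1", fetch: fetch) do |feed|
  titles << feed.title.content
end
p titles
```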

Issues parsing RSS from academic publishers

Most academic publishers (e.g. SAGE, Academy of Management) seem to use the same standard software to generate RSS. When trying to parse these feeds I get the following error:

RSS::MissingAttributeError: attribute <rdf:about> is missing in tag <channel>
from /usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/rss/parser.rb:521:in `block in collect_attributes'

Disabling validation, though, works. I doubt that all these feeds are corrupt. Or is there a specific spec they follow that the current gem does not support?

Also, almost all academic publishers use PRISM as well as the default DublinCore to include metadata in RSS entries. Is it possible to implement that into the current gem? Or better than that, is there a way to allow the gem to read those nodes as generic nodes (that it won't try to validate), just like feedparser in Python?

It looks like the code for DublinCore model can be used as a template for that. But PRISM also includes Reference nodes on top of Text and Date. Is there another NS implementation I can use as template for those nodes?

Missing rel=self link?

When I validate an atom feed produced by this gem I get a warning:

Missing atom:link with rel="self"

https://validator.w3.org/feed/docs/warning/MissingSelf.html

I don't know whether this is something I can add using the gem's DSL or whether it would require a change in the source code.

Source:

    RSS::Maker.make('atom') do |maker|
      maker.channel.author = 'Swing Out London'
      maker.channel.updated = Audit.maximum(:created_at).iso8601
      maker.channel.id = 'https://www.swingoutlondon.com/'
      maker.channel.link = 'https://www.swingoutlondon.com/audit_log.atom'
      maker.channel.title = 'Swing Out London Audit Log'
      maker.channel.description = 'Audit Log for Swing Out London'

      audits.each do |audit|
        editor = Editor.build(audit)
        maker.items.new_item do |item|
          item.link = audit_show_link(audit.auditable_type, audit.auditable_id)
          item.title = audit_title(audit)
          item.updated = audit.created_at.iso8601
          item.author = editor.name
          item.description = JSON.pretty_generate(audit.as_json)
        end
      end
    end

Atom:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <author>
    <name>Swing Out London</name>
  </author>
  <id>https://www.swingoutlondon.com/</id>
  <link href="https://www.swingoutlondon.com/audit_log.atom"/>
  <subtitle>Audit Log for Swing Out London</subtitle>
  <title>Swing Out London Audit Log</title>
  <updated>2022-04-07T20:59:47Z</updated>
  <entry>
   <!-- ... -->
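It may be possible to add the link through the DSL without changing the gem. A sketch, assuming the Atom maker's `channel.links.new_link` method and the `href`/`rel` setters on the yielded link object; the timestamp and entry values are placeholders rather than the app's real data.

```ruby
require "rss"
require "time"

feed = RSS::Maker.make("atom") do |maker|
  maker.channel.author = "Swing Out London"
  maker.channel.updated = Time.utc(2022, 4, 7, 20, 59, 47).iso8601
  maker.channel.id = "https://www.swingoutlondon.com/"
  maker.channel.title = "Swing Out London Audit Log"
  maker.channel.description = "Audit Log for Swing Out London"

  # Explicit self link: the URL the feed itself is served from.
  maker.channel.links.new_link do |link|
    link.href = "https://www.swingoutlondon.com/audit_log.atom"
    link.rel = "self"
  end

  maker.items.new_item do |item|
    item.link = "https://www.swingoutlondon.com/" # placeholder entry
    item.title = "Example audit entry"
    item.updated = Time.utc(2022, 4, 7, 20, 59, 47).iso8601
  end
end

puts feed
```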

readme reading example

Ruby version: 3.0.0

In the README example on how to read an RSS feed, the open method is used, but it looks like it's trying to open a file:

Errno::ENOENT (No such file or directory @ rb_sysopen - <rss-feed>)

Using instead:

URI.open(<rss-feed>)

gives the desired result

itunes:image for Items::Item

I have been digging through the source code and can't seem to find any code that supports an itunes:image tag for the Items::Item class.

I don't often come across opportunities to contribute to source code that I use, but I'd be happy to make a pull request if it would be well received. That said, I don't write code nearly as well as you folks do, so there would be some hand-holding involved if I contributed.

itunes:image tags are now a recommended tag for items in Podcast feeds:
https://help.apple.com/itc/podcasts_connect/#/itcb54353390

Thanks for your time.

Edit: In case it is already in the code, I am getting an error attempting to use it:

undefined method `itunes_image' for #<RSS::Maker::RSS20::Items::Item:0x00007fefe954f138> Did you mean? itunes_email itunes_name

Here's my code:

item.itunes_image.href = feed_image # feed_image is defined elsewhere

I also tried:

item.itunes_image = feed_image

I tried using RSS versions 0.9 and 2.0

Final edit:
I managed to get itunes:image working for items on a fork of this repository, but I couldn't figure out how to make a pull request (maybe I have to be added as a contributor?). If you'd like to see what I did, check out https://github.com/ianrandmckenzie/rss. All I really did was copy the channel-level itunes:image code and paste it into the appropriate sections for items. If you'd like me to make a pull request, I'll add tests before I do (that's the only part I haven't done so far).

Parsing the <pubDate> tag fails if it is in ISO8601 format

If the parser hits a date in ISO8601 format instead of RFC2822, it throws an error.

I understand that the RSS 2.0 spec says all date-times should conform to RFC 2822, but it would be really nice if this gem could support ISO 8601 as well, because Apple Podcasts permits it. You can see this by parsing Apple's example of a well-formed podcast feed. I saved that example as example.xml to test, and this is the result:

gem 'rss', '0.2.8'
require 'rss'
ex = open('example.xml')
feed = RSS::Parser.parse(ex)

Result:

Traceback (most recent call last):
       11: from /home/jon/.asdf/installs/ruby/2.5.1/bin/irb:11:in `<main>'
       10: from (irb):5
        9: from /home/jon/.asdf/installs/ruby/2.5.1/lib/ruby/gems/2.5.0/gems/rss-0.2.8/lib/rss/parser.rb:88:in `parse'
        8: from /home/jon/.asdf/installs/ruby/2.5.1/lib/ruby/gems/2.5.0/gems/rss-0.2.8/lib/rss/parser.rb:183:in `parse'
        7: from /home/jon/.asdf/installs/ruby/2.5.1/lib/ruby/gems/2.5.0/gems/rss-0.2.8/lib/rss/rexmlparser.rb:18:in `_parse'
        6: from /home/jon/.asdf/installs/ruby/2.5.1/lib/ruby/2.5.0/rexml/document.rb:242:in `parse_stream'
        5: from /home/jon/.asdf/installs/ruby/2.5.1/lib/ruby/2.5.0/rexml/parsers/streamparser.rb:36:in `parse'
        4: from /home/jon/.asdf/installs/ruby/2.5.1/lib/ruby/gems/2.5.0/gems/rss-0.2.8/lib/rss/parser.rb:377:in `tag_end'
        3: from /home/jon/.asdf/installs/ruby/2.5.1/lib/ruby/gems/2.5.0/gems/rss-0.2.8/lib/rss/parser.rb:475:in `block in start_get_text_element'
        2: from /home/jon/.asdf/installs/ruby/2.5.1/lib/ruby/gems/2.5.0/gems/rss-0.2.8/lib/rss/2.0.rb:82:in `pubDate='
        1: from /home/jon/.asdf/installs/ruby/2.5.1/lib/ruby/gems/2.5.0/gems/rss-0.2.8/lib/rss/2.0.rb:85:in `rescue in pubDate='
RSS::NotAvailableValueError (value <2019-05-16T07:00:00.000Z> of tag <pubDate> is not available.)
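Until ISO 8601 dates are accepted, a stopgap is to convert them to RFC 2822 before parsing. A sketch using only Ruby's standard `time` library (`Time.iso8601` and `Time#rfc2822`); `normalize_pub_dates` is a hypothetical helper, and values that are not ISO 8601 are left untouched.

```ruby
require "time"

# Hypothetical helper: rewrite ISO 8601 <pubDate> values as RFC 2822.
def normalize_pub_dates(xml)
  xml.gsub(%r{<pubDate>\s*([^<]+?)\s*</pubDate>}) do
    value = Regexp.last_match(1)
    begin
      "<pubDate>#{Time.iso8601(value).utc.rfc2822}</pubDate>"
    rescue ArgumentError
      # Not ISO 8601 (e.g. already RFC 2822): keep the original value.
      "<pubDate>#{value}</pubDate>"
    end
  end
end

puts normalize_pub_dates("<pubDate>2019-05-16T07:00:00.000Z</pubDate>")
```

The rewritten string can then be passed to RSS::Parser.parse in place of the original document.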

can't parse media:thumbnail

I'm trying to parse an RSS file that has a media thumbnail, but it looks like we don't have support for that yet.

<media:thumbnail url="http://www.foo.com/keyframe.jpg" width="75" height="50" time="12:05:01.123" />

RSS::Parser.parse('https://justthenews.com/syndication/willcountygazette.com/rss.xml')
