ruby / rss
RSS reading and writing
License: BSD 2-Clause "Simplified" License
RSS::Parser.parse(open('foo.xml')) on a feed that lacks the elements marked NULL gives:
rss-0.2.9/lib/rss/rss.rb:1180:in 'block in _validate': tag <title> is missing in tag <image> (RSS::MissingTagError)
Is there any way to recover from / ignore this? Or must one switch to XML parsing with Nokogiri?
<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
  <channel>
    <link>NULL</link>
    <title></title>
    <description></description>
    <itunes:owner>
      <itunes:name>NULL</itunes:name>
      <itunes:email></itunes:email>
    </itunes:owner>
    <image>
      <link>NULL</link>
      <title></title>
      <url></url>
    </image>
  </channel>
</rss>
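One possible way to recover, sketched on the assumption that skipping validation is acceptable for such feeds: RSS::Parser.parse takes a second positional argument (do_validate) that can be set to false. The helper name parse_leniently and the sample feed below are made up for illustration; the sample omits <description> rather than reproducing the feed above.

```ruby
require 'rss'

# Hypothetical helper: try a strict parse first, then retry without
# validation when the feed violates the RSS schema.
def parse_leniently(xml)
  RSS::Parser.parse(xml)
rescue RSS::InvalidRSSError
  # The second positional argument is do_validate.
  RSS::Parser.parse(xml, false)
end

# Illustrative feed: <description> is missing from <channel>,
# which strict validation rejects.
INCOMPLETE_FEED = <<~XML
  <?xml version="1.0" encoding="utf-8"?>
  <rss version="2.0">
    <channel>
      <title>Example</title>
      <link>http://example.com/</link>
    </channel>
  </rss>
XML

feed = parse_leniently(INCOMPLETE_FEED)
puts feed.channel.title
```

Retrying only after a strict failure keeps validation in place for well-formed feeds; the trade-off is that a feed parsed without validation may return nil or empty values for the missing elements.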
I have been digging through the source code and can't seem to find any code that supports an itunes:image tag for the Items::Item class.
I never run into many opportunities for contributing to source code that I use, but I'd be happy to make a pull request if it would be well received. That being said, I don't write code nearly as well as you folks do, so there would be some hand-holding involved if I contributed.
itunes:image tags are now a recommended tag for items in Podcast feeds:
https://help.apple.com/itc/podcasts_connect/#/itcb54353390
Thanks for your time.
Edit: In case it is already in the code, I am getting an error attempting to use it:
undefined method `itunes_image' for #<RSS::Maker::RSS20::Items::Item:0x00007fefe954f138> Did you mean? itunes_email itunes_name
Here's my code:
item.itunes_image.href = feed_image # feed_image is defined elsewhere
I also tried:
item.itunes_image = feed_image
I tried using RSS versions 0.9 and 2.0
Final edit:
I managed to get itunes:image working for items on a fork of this repository, but I couldn't figure out how to make a pull request (maybe I have to be added as a contributor?). If you'd like to see what I did, check out https://github.com/ianrandmckenzie/rss. All I really did was copy the itunes:image channel code and paste it into the appropriate sections for item. If you'd like me to make a pull request, I'll add tests before I do (that's the only part I haven't done so far).
Apple's example podcast feed uses an integer value for the episode duration:
https://help.apple.com/itc/podcasts_connect/#/itcbaf351599
Parsing a file in that format (anchor.fm also uses the integer value) raises an error, shown further down.
I uploaded the sample file from Apple here:
https://gist.github.com/jmsalcido/30d76adba744e4e445331a6710136476
However, Apple's own sample file is invalid, so the error cannot be reproduced with it.
Anyway, I have an RSS feed from Anchor.fm and received this error while parsing it in my Rails app:
RSS::NotAvailableValueError: value <1947> of tag <duration> is not available.
from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rss/itunes.rb:345:in `rescue in content='
Caused by ArgumentError: must be one of HH:MM:SS, H:MM:SS, MM::SS, M:SS: "1947"
from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rss/itunes.rb:282:in `parse'
[7] pry(main)>
If you want to replicate it, you can use my RSS feed from anchor.fm:
https://anchor.fm/s/a6578b8/podcast/rss
irb(main):005:0> open(url) {|rss| RSS::Parser.parse(rss) }
Traceback (most recent call last):
16: from (irb):5
15: from (irb):5:in `rescue in irb_binding'
14: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/open-uri.rb:35:in `open'
13: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/open-uri.rb:736:in `open'
12: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/open-uri.rb:169:in `open_uri'
11: from (irb):5:in `block in irb_binding'
10: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rss/parser.rb:88:in `parse'
9: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/forwardable.rb:230:in `parse'
8: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rss/parser.rb:183:in `parse'
7: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rss/rexmlparser.rb:18:in `_parse'
6: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/rexml-3.2.4/lib/rexml/document.rb:242:in `parse_stream'
5: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/rexml-3.2.4/lib/rexml/parsers/streamparser.rb:36:in `parse'
4: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rss/parser.rb:377:in `tag_end'
3: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rss/parser.rb:550:in `block in setup_next_element'
2: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rss/itunes.rb:342:in `content='
1: from /home/jmsalcido/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rss/itunes.rb:345:in `rescue in content='
RSS::NotAvailableValueError (value <1327> of tag <duration> is not available.)
Same error while using irb.
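A possible workaround is to normalize the value before it reaches the parser. The sketch below assumes the bare integer is a count of seconds and rewrites it in the HH:MM:SS form that the error message lists as accepted. The helper name normalize_duration is hypothetical and not part of the rss gem.

```ruby
# Hypothetical pre-processing helper: convert a bare number of seconds
# into HH:MM:SS; any other value passes through unchanged.
def normalize_duration(value)
  text = value.to_s.strip
  return text unless text.match?(/\A\d+\z/)

  seconds = text.to_i
  format('%02d:%02d:%02d', seconds / 3600, (seconds % 3600) / 60, seconds % 60)
end

puts normalize_duration('1947')  # prints 00:32:27
puts normalize_duration('32:27') # prints 32:27
```

Applying this to the <itunes:duration> text (for example with a regular-expression substitution on the raw XML) would be a separate step that this sketch does not cover.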
I am also running into the issue brought up in #48. Specifically, I am working with a partner whose RSS feed contains
<item>
[...]
<itunes:episodeType>full</itunes:episodeType>
[...]
</item>
This causes parse validation to fail. @kou has advised to disable validation in this case; is this possible to do on just this attribute, or is validation all-or-nothing? Suggesting clients disable all validation because of a common casing issue present in RSS feeds of well-known publishers is a bit impractical and opens the door to many validation bugs.
My ask is that:
RSS::Parser.parse(content, validate: false)
Most academic publishers seem to use standard software to generate RSS (e.g. SAGE, Academy of Management). When trying to parse these feeds I get the following error:
RSS::MissingAttributeError: attribute <rdf:about> is missing in tag <channel>
from /usr/local/Cellar/ruby/2.6.5/lib/ruby/2.6.0/rss/parser.rb:521:in `block in collect_attributes'
Disabling validation, though, works. I doubt that all these feeds are corrupt. Or is there a specific spec they follow that the current gem does not support?
Also, almost all academic publishers use PRISM as well as the default DublinCore to include metadata in RSS entries. Is it possible to implement that into the current gem? Or better than that, is there a way to allow the gem to read those nodes as generic nodes (that it won't try to validate), just like feedparser in Python?
It looks like the code for DublinCore model can be used as a template for that. But PRISM also includes Reference nodes on top of Text and Date. Is there another NS implementation I can use as template for those nodes?
I'm trying to set the content of an item:
maker.items.new_item do |item|
  item.content = '...'
end
But that doesn't work. I see there's a content_encoded= method, but that doesn't result in anything additional being rendered in the XML.
What I need is the ability to set the content as seen in this example GitHub rss feed under feed -> entry -> content.
Example XML: https://alerts.weather.gov/cap/us.php?x=0
More about "cap" namespace: http://docs.oasis-open.org/emergency-adopt/cap-feeds/v1.0/cn02/cap-feeds-v1.0-cn02.html#_Toc382489980
Is there a way to get these elements with this gem?
I have been trying to parse media content and don't seem to be able to capture it. I can get the basics but trying to get something like the url of the image below throws an error.
<media:content url="https://static.independent.co.uk/s3fs-public/thumbnails/image/2020/03/10/14/payday-loans-5.jpg" type="image/jpeg" medium="image"/>
Example of code I'm trying.
require 'rss'
rss = RSS::Parser.parse('http://www.independent.co.uk/rss', false)
rss.items.each do |item|
  puts item.title
  puts item.media_content.url
end
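Until the parser exposes such elements, one fallback is to read them straight from the XML with REXML. This is a minimal sketch, not the rss gem's API: the helper name media_urls and the sample document are hypothetical, and the namespace URI is the one commonly used for Media RSS.

```ruby
require 'rexml/document'

# Namespace URI for Media RSS, bound to the "media" prefix for XPath.
MEDIA_NS = { 'media' => 'http://search.yahoo.com/mrss/' }.freeze

# Hypothetical helper: return [title, media URL] pairs, one per <item>.
def media_urls(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//item').map do |item|
    title = item.elements['title']&.text
    media = REXML::XPath.first(item, 'media:content', MEDIA_NS)
    [title, media && media.attributes['url']]
  end
end

# Illustrative document, not taken from a real feed.
SAMPLE_MEDIA_FEED = <<~XML
  <rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
      <title>Example</title>
      <item>
        <title>First story</title>
        <media:content url="https://example.com/a.jpg" type="image/jpeg" medium="image"/>
      </item>
    </channel>
  </rss>
XML

media_urls(SAMPLE_MEDIA_FEED).each do |title, url|
  puts "#{title}: #{url}"
end
```

Items without a media:content child yield nil for the URL rather than raising.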
Is it possible to add namespace content such as snf:logo?
RSS::Maker.make('2.0') do |maker|
  maker.channel.title = 'Hello World!'
  # <snf:logo>
  #   <url>http://times.smartnews.co.jp/snlogo.png</url>
  # </snf:logo>
end
SmartFormat specification (RSS 2.0 compliant)
https://publishers.smartnews.com/hc/ja/articles/360010977813-SmartFormat%E4%BB%95%E6%A7%98%E6%9B%B8-RSS2-0%E6%BA%96%E6%8B%A0-#channel
I appreciate any help you can provide.
I have the following code
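require 'open-uri'
require 'rss'

MINDSCAPE_FEED = 'https://rss.art19.com/sean-carrolls-mindscape'
URI.open(MINDSCAPE_FEED) do |rss|
  # Pass the option as a keyword; a bare `ignore_unknown_element=true`
  # is a positional argument and sets do_validate instead.
  feed = RSS::Parser.parse(rss, ignore_unknown_element: true)
  puts "Title: #{feed.channel.title}"
  feed.items.each do |item|
    puts item.itunes_episode_type
  end
end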
Is there any way to access the itunes:episodeType element? More generally, is there a way to access tags that the parser doesn't know about?
As defined here:
This is helpful when there are XML/HTML tags inside summary contents, but I did not find a way to achieve it. A method from StackOverflow no longer works.
Ruby version: ruby 2.7.0p0 (2019-12-25 revision ruby/ruby@647ee6f091)
Hello,
I'm trying to parse from a feed and this format isn't parsing correctly. Is there a way to get this to work?
<a10:author>
<a10:name>Some author</a10:name>
</a10:author>
I was attempting to parse an RSS doc with
<enc:enclosure resource="http://image_url" type="image/jpeg"/>
<item rdf:about="https://dallas.craigslist.org/dal/cto/d/mckinney-2003-acura-cl-type/7109117586.html">
<title><![CDATA[2003 ACURA CL TYPE-S (MCKINNEY) $850]]></title>
<link>https://dallas.craigslist.org/dal/cto/d/mckinney-2003-acura-cl-type/7109117586.html</link>
<description><![CDATA[SELLING MY BELOVED HONDA FOR PARTS OR PERSONAL PROJECT. ENGINE IN GREAT SHAPE HAS 165,000 ML MAINTATINED REALLY WELL, CLEAN TITLE. BRAND NEW FRONT SUSPENSION, GOOD BREMBOO BRAKES, TIRES IN GOOD SHAPE, EVERYTHING WORKS INSIDE CAR, FRONT SEAT BIT TORN. ...]]></description>
<dc:date>2020-04-16T10:51:41-05:00</dc:date>
<dc:language>en-us</dc:language>
<dc:rights>copyright 2020 craigslist</dc:rights>
<dc:source>https://dallas.craigslist.org/dal/cto/d/mckinney-2003-acura-cl-type/7109117586.html</dc:source>
<dc:title><![CDATA[2003 ACURA CL TYPE-S (MCKINNEY) $850]]></dc:title>
<dc:type>text</dc:type>
<enc:enclosure resource="https://images.craigslist.org/00h0h_fyqO7icY0vm_300x300.jpg" type="image/jpeg"/>
<dcterms:issued>2020-04-16T10:51:41-05:00</dcterms:issued>
</item>
Even when passing ignore_unknown_element to the parse method, I could not find these in the parsed results. Does this library not support these enclosures, and if not, is there a plan/willingness to have it added? Enclosures seem to be a standard RSS feature: https://en.wikipedia.org/wiki/RSS_enclosure
RFC 5005 describes paging for (large) Atom and RSS feeds:
A paged feed is a set of linked feed documents that together contain the entries[...]
Reading the current documentation of ruby/rss, paging does not seem to be supported. Are there plans to add it, or perhaps a willingness to consider a PR?
I'm trying to parse an RSS file that has a media thumbnail, but it looks like we don't have support for that yet.
<media:thumbnail url="http://www.foo.com/keyframe.jpg" width="75" height="50" time="12:05:01.123" />
RSS::Parser.parse('https://justthenews.com/syndication/willcountygazette.com/rss.xml')
Running a script through various podcasts' feeds brings up the RSS::NotAvailableValueError for quite a few feeds, such as value <2238> of tag <duration> is not available, due to validating the time format at https://github.com/ruby/rss/blob/master/lib/rss/itunes.rb#L283.
Podcasts with duration values like 22:38 are accepted, which makes sense.
But it seems lots of the feeds (for podcasts that are listed in iTunes) have values like 2238.
Ex: https://feeds.megaphone.fm/HSW2732644812
I wonder if a time of 22:38 should be assumed from something like above, as it seems a few RSS creators are outputting times this way? Thanks!
Ruby version: 3.0.0
In the README example on how to read an RSS feed, the open method is used, but it appears to try to open a local file:
Errno::ENOENT (No such file or directory @ rb_sysopen - <rss-feed>)
Using URI.open(<rss-feed>) instead gives the desired result.
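The working form can be sketched as follows. It relies on open-uri from the standard library; the helper name read_feed and the URL in the comment are placeholders rather than anything from the README.

```ruby
require 'open-uri'
require 'rss'

# Hypothetical helper: URI.open (from open-uri) fetches http(s) URLs,
# whereas a bare `open` on Ruby 3.0+ only looks for a local file.
def read_feed(location)
  URI.open(location) { |io| RSS::Parser.parse(io.read) }
end

# Usage (placeholder URL, not fetched here):
#   feed = read_feed('https://example.com/feed.xml')
#   puts feed.channel.title
```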
https://github.com/ruby/ruby/actions/runs/10211279098/job/28252311558#step:13:290
Error: test_parse(RSS::TestParser):
RSS::NotWellFormedError: This is not well formed XML
Malformed XML: Content at the start of the document (got '/tmp/rss10-20240802-16264-835wxo.rdf-garbage')
Line: 1
Position: 44
Last 80 unconsumed characters:
This test does not fail in my local environment.
When I validate an atom feed produced by this gem I get a warning:
Missing atom:link with rel="self"
https://validator.w3.org/feed/docs/warning/MissingSelf.html
I don't know whether this is something I can add using the gem's DSL or whether it would require a change in the source code.
Source:
RSS::Maker.make('atom') do |maker|
  maker.channel.author = 'Swing Out London'
  maker.channel.updated = Audit.maximum(:created_at).iso8601
  maker.channel.id = 'https://www.swingoutlondon.com/'
  maker.channel.link = 'https://www.swingoutlondon.com/audit_log.atom'
  maker.channel.title = 'Swing Out London Audit Log'
  maker.channel.description = 'Audit Log for Swing Out London'
  audits.each do |audit|
    editor = Editor.build(audit)
    maker.items.new_item do |item|
      item.link = audit_show_link(audit.auditable_type, audit.auditable_id)
      item.title = audit_title(audit)
      item.updated = audit.created_at.iso8601
      item.author = editor.name
      item.description = JSON.pretty_generate(audit.as_json)
    end
  end
end
Atom:
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<author>
<name>Swing Out London</name>
</author>
<id>https://www.swingoutlondon.com/</id>
<link href="https://www.swingoutlondon.com/audit_log.atom"/>
<subtitle>Audit Log for Swing Out London</subtitle>
<title>Swing Out London Audit Log</title>
<updated>2022-04-07T20:59:47Z</updated>
<entry>
<!-- ... -->
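One approach that may work through the DSL, sketched on the assumption that the Atom maker exposes channel.links.new_link with href and rel accessors (not confirmed anywhere in this thread). The feed values are placeholders modelled on the README's Atom example.

```ruby
require 'rss'

feed = RSS::Maker.make('atom') do |maker|
  maker.channel.author = 'Example author'
  maker.channel.updated = Time.now.to_s
  maker.channel.about = 'https://example.com/'
  maker.channel.title = 'Example feed'

  # Assumption: links.new_link adds an atom:link element; rel="self"
  # marks it as the feed's own URL.
  maker.channel.links.new_link do |link|
    link.href = 'https://example.com/feed.atom'
    link.rel = 'self'
  end

  maker.items.new_item do |item|
    item.link = 'https://example.com/entries/1'
    item.title = 'Example entry'
    item.updated = Time.now.to_s
  end
end

puts feed
```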
@hsbt I want to set a webhook for commit-email.info to this repository. Could you provide the "Admin" role for this repository to me? Or could you add the following webhook?
Hi! I've had a similar issue to #28 today when trying to set categories on an RSS 2.0 feed. I couldn't set categories=, but I appeared to be able to do
item.categories << 'Digital specialists'
... but when the feed was generated, I'd get
NoMethodError:
undefined method `to_feed' for "Digital specialists":String
Eventually, by some trial and error and a debugger, I found that what I seemingly need to do is
category = RSS::Maker::RSS20::Channel::Categories::Category.new(maker)
category.content = 'Digital specialists'
item.categories << category
Is there a shorter way of doing this, and is there a way in which I might be able to contribute to the docs to cover both this sort of example and the one from #28?
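A shorter form that may do the same thing, assuming the maker's array-style collections provide a new_category builder in the same way that items provides new_item. The channel and item values are placeholders.

```ruby
require 'rss'

feed = RSS::Maker.make('2.0') do |maker|
  maker.channel.title = 'Example jobs'
  maker.channel.link = 'http://example.com/'
  maker.channel.description = 'Illustrative feed'

  maker.items.new_item do |item|
    item.title = 'Example vacancy'
    item.link = 'http://example.com/jobs/1'
    # new_category builds the Category object and appends it in one step.
    item.categories.new_category do |category|
      category.content = 'Digital specialists'
    end
  end
end

puts feed
```

This avoids spelling out the RSS::Maker::RSS20::Channel::Categories::Category constant by hand.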
If the parser hits a date in ISO8601 format instead of RFC2822, it throws an error.
I understand that the RSS 2.0 spec says all date-times should conform to RFC2822, but it would be really nice if this gem could support ISO8601 as well, because Apple Podcasts permits it. You can see this by parsing Apple's example of a well-formed podcast feed. I saved that example as example.xml to test, and this is the result:
gem 'rss', '0.2.8'
require 'rss'
ex = open('example.xml')
feed = RSS::Parser.parse(ex)
Result:
Traceback (most recent call last):
11: from /home/jon/.asdf/installs/ruby/2.5.1/bin/irb:11:in `<main>'
10: from (irb):5
9: from /home/jon/.asdf/installs/ruby/2.5.1/lib/ruby/gems/2.5.0/gems/rss-0.2.8/lib/rss/parser.rb:88:in `parse'
8: from /home/jon/.asdf/installs/ruby/2.5.1/lib/ruby/gems/2.5.0/gems/rss-0.2.8/lib/rss/parser.rb:183:in `parse'
7: from /home/jon/.asdf/installs/ruby/2.5.1/lib/ruby/gems/2.5.0/gems/rss-0.2.8/lib/rss/rexmlparser.rb:18:in `_parse'
6: from /home/jon/.asdf/installs/ruby/2.5.1/lib/ruby/2.5.0/rexml/document.rb:242:in `parse_stream'
5: from /home/jon/.asdf/installs/ruby/2.5.1/lib/ruby/2.5.0/rexml/parsers/streamparser.rb:36:in `parse'
4: from /home/jon/.asdf/installs/ruby/2.5.1/lib/ruby/gems/2.5.0/gems/rss-0.2.8/lib/rss/parser.rb:377:in `tag_end'
3: from /home/jon/.asdf/installs/ruby/2.5.1/lib/ruby/gems/2.5.0/gems/rss-0.2.8/lib/rss/parser.rb:475:in `block in start_get_text_element'
2: from /home/jon/.asdf/installs/ruby/2.5.1/lib/ruby/gems/2.5.0/gems/rss-0.2.8/lib/rss/2.0.rb:82:in `pubDate='
1: from /home/jon/.asdf/installs/ruby/2.5.1/lib/ruby/gems/2.5.0/gems/rss-0.2.8/lib/rss/2.0.rb:85:in `rescue in pubDate='
RSS::NotAvailableValueError (value <2019-05-16T07:00:00.000Z> of tag <pubDate> is not available.)
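Until the parser accepts ISO8601, a caller could convert such dates before parsing. This is a minimal sketch using only Ruby's standard time library; the helper name to_rfc2822 is hypothetical and not part of the rss gem.

```ruby
require 'time'

# Hypothetical pre-processing helper: rewrite an ISO 8601 timestamp as an
# RFC 2822 date; values that are not ISO 8601 are returned unchanged.
def to_rfc2822(value)
  Time.iso8601(value).rfc2822
rescue ArgumentError
  value
end

puts to_rfc2822('2019-05-16T07:00:00.000Z')
```

Substituting the converted value into each <pubDate> element of the raw XML would be a separate step that this sketch leaves out.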