
rss's People

Contributors

alexanderthaller, elimisteve, hippasus, jonbuffington, kyleterry, linuxsuren, matt3o12, mhaligowski, powerivq, raphting, rbatukaev, runtakun, shebaw, shihanng, slomek, slymarbo, tsudoko, tvainika

rss's Issues

Item.Date issue with timezone (UTC vs PDT)

In an Item for the feed that I'm pulling from...

<pubDate>Tue, 19 Jul 2016 13:14:13 PDT</pubDate>

On my local laptop with local time set to PDT, I don't have a problem:
2016-07-19 13:14:13 -0700 PDT

On my server with local time set to UTC, I have this odd timezone problem:
2016-07-19 13:14:13 +0000 PDT
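
The `+0000 PDT` result matches the documented behavior of `time.Parse`: when a zone abbreviation is unknown in the current location, the time is recorded in a fabricated location with that name and a zero offset. One possible workaround is sketched below using only the standard library. The `zoneOffsets` table and the `parsePubDate` helper are hypothetical names for illustration and are not part of the rss package.

```go
package main

import (
	"fmt"
	"time"
)

// zoneOffsets is a hypothetical lookup table (seconds east of UTC) for a few
// US abbreviations. It is an assumption for this sketch, not rss package code.
var zoneOffsets = map[string]int{
	"PST": -8 * 3600,
	"PDT": -7 * 3600,
	"EST": -5 * 3600,
	"EDT": -4 * 3600,
}

// parsePubDate parses an RFC 1123 date. If time.Parse could not resolve the
// zone abbreviation (it then reports a zero offset), the wall-clock time is
// reinterpreted using the table above.
func parsePubDate(s string) (time.Time, error) {
	t, err := time.Parse(time.RFC1123, s)
	if err != nil {
		return time.Time{}, err
	}
	name, offset := t.Zone()
	want, known := zoneOffsets[name]
	if !known || offset != 0 {
		return t, nil // zone already resolved, or not in our table
	}
	loc := time.FixedZone(name, want)
	return time.Date(t.Year(), t.Month(), t.Day(),
		t.Hour(), t.Minute(), t.Second(), t.Nanosecond(), loc), nil
}

func main() {
	t, err := parsePubDate("Tue, 19 Jul 2016 13:14:13 PDT")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(t, "=", t.UTC())
}
```

A fixed table cannot disambiguate abbreviations that are shared between regions, so this is only reasonable for feeds whose origin is known.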

support more charset

currently, ISO88591 is supported, do you know how to extend CharsetReader to other charsets? For example 'GBK'.
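
As a rough illustration of how a charset hook can be extended, the sketch below sets the `CharsetReader` field of an `encoding/xml` decoder to a function that dispatches on the declared charset. Only ISO-8859-1 is implemented here, with the standard library alone. The function names are hypothetical, and how the rss package wires its own reader is not shown in this issue. A GBK branch would need an external decoder, for example `simplifiedchinese.GBK` from `golang.org/x/text`.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// latin1Reader converts ISO-8859-1 input to UTF-8. Every Latin-1 byte maps
// to the Unicode code point with the same value.
func latin1Reader(input io.Reader) (io.Reader, error) {
	raw, err := io.ReadAll(input)
	if err != nil {
		return nil, err
	}
	runes := make([]rune, len(raw))
	for i, b := range raw {
		runes[i] = rune(b)
	}
	return strings.NewReader(string(runes)), nil
}

// charsetReader has the signature expected by xml.Decoder.CharsetReader.
// Further charsets (such as GBK) would be added as extra cases that wrap
// input in a decoder from an external package.
func charsetReader(charset string, input io.Reader) (io.Reader, error) {
	switch strings.ToLower(charset) {
	case "iso-8859-1", "latin1":
		return latin1Reader(input)
	}
	return nil, fmt.Errorf("unsupported charset %q", charset)
}

// decodeTitle extracts the channel title from a feed in a non-UTF-8 charset.
func decodeTitle(data []byte) (string, error) {
	var doc struct {
		Title string `xml:"channel>title"`
	}
	d := xml.NewDecoder(bytes.NewReader(data))
	d.CharsetReader = charsetReader
	if err := d.Decode(&doc); err != nil {
		return "", err
	}
	return doc.Title, nil
}

func main() {
	data := []byte("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" +
		"<rss><channel><title>caf\xe9</title></channel></rss>")
	title, err := decodeTitle(data)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(title)
}
```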

CDATA tags inside content not parsed

package main

import (
	"github.com/SlyMarbo/rss"
)

func main() {
	feed, err := rss.Fetch("http://www.ruanyifeng.com/blog/atom.xml")
	if err != nil {
		// handle error.
	}

	// ... Some time later ...

	err = feed.Update()
	if err != nil {
		// handle error.
	}
}

Related issue on other library: mmcdole/gofeed#98

What happened to rss.CacheParsedItemIDs

My code no longer works. I assume this was changed recently. How do we reproduce this behaviour in the new versions?

See: https://gowalker.org/github.com/SlyMarbo/rss#CacheParsedItemIDs

I only remember I had to use this function to work around a bug I was encountering. I can't remember why exactly I was using it, it was a while ago now. It was likely to do with either running out of memory in a long running process, or to seeing updates to feeds coming through.

Is an RSS channel with no items faulty?

Consider an RSS feed with no items:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Willkommen auf unserem Blog</title>
    <link>https://www.bitkom.org//bitkom/org/Presse/Blog/index.jsp</link>
    <description>Der RSS Feed mit den aktuellsten Blogbeiträgen</description>
    <language>de</language>
  </channel>
</rss>

For such a feed parseRSS2 returns an error:

rss/rss_2.0.go

Line 65 in 6288663

return nil, fmt.Errorf("no feeds found in %q", string(data))

But such a feed is valid.

Wouldn't it be more consistent to return a Feed with no items instead of an error?
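
The suggested behavior can be sketched with `encoding/xml` directly. This is a minimal illustration with hypothetical type and function names, not the package's actual `parseRSS2`: a missing `<channel>` is treated as an error, while a channel without `<item>` elements yields an empty slice.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
)

// Hypothetical minimal types for this sketch.
type rssItem struct {
	Title string `xml:"title"`
}

type rssChannel struct {
	Title string    `xml:"title"`
	Items []rssItem `xml:"item"`
}

type rssDoc struct {
	Channel *rssChannel `xml:"channel"`
}

// parseChannel fails only when the document has no <channel> at all.
// A channel with zero items is returned as-is with an empty Items slice.
func parseChannel(data []byte) (*rssChannel, error) {
	var doc rssDoc
	if err := xml.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	if doc.Channel == nil {
		return nil, errors.New("no channel found")
	}
	return doc.Channel, nil
}

func main() {
	data := []byte(`<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Willkommen auf unserem Blog</title>
    <language>de</language>
  </channel>
</rss>`)
	ch, err := parseChannel(data)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%q has %d items\n", ch.Title, len(ch.Items))
}
```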

Support HTTP Basic Authentication

I suggest adding support for HTTP Basic Authentication. This would allow access to password-protected feeds. net/http.Request provides SetBasicAuth, which could be used.

Do you have plans and/or time to implement this? If not, I'd try to add this and then submit a PR.
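
For reference, a minimal sketch of what this could look like with `net/http` alone. The helper names are hypothetical. A closure around `fetchWithBasicAuth` could presumably be handed to the package's `FetchByFunc`, assuming it accepts a function of the shape `func(url string) (*http.Response, error)`.

```go
package main

import (
	"fmt"
	"net/http"
)

// newAuthRequest builds a GET request carrying HTTP Basic credentials.
func newAuthRequest(url, user, pass string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, pass)
	return req, nil
}

// fetchWithBasicAuth performs the request. The caller must close resp.Body.
func fetchWithBasicAuth(client *http.Client, url, user, pass string) (*http.Response, error) {
	req, err := newAuthRequest(url, user, pass)
	if err != nil {
		return nil, err
	}
	return client.Do(req)
}

func main() {
	// example.com and the credentials are placeholders.
	req, err := newAuthRequest("https://example.com/feed.xml", "alice", "secret")
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	user, _, ok := req.BasicAuth()
	fmt.Println(user, ok)
}
```

Basic credentials are only base64-encoded, so this should be limited to HTTPS feed URLs.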

Support media

Hi, I was wondering about supporting media content like --are there any plans to add such support?

Support HTTP Conditional GET

What do you think about supporting HTTP Conditional GET? As far as I understand the current code, the HTTP GET requests made don't use the If-None-Match or If-Modified-Since HTTP headers. This results in the full feed XML always being downloaded.

Since many servers support HTTP Conditional GET, it would be a way to reduce network load to not only pass the URL but also values for these headers. rss.Fetch would have to return the values of the ETag and Last-Modified headers, if present on the response. The code calling rss.Fetch could then remember the values of these headers and pass them in later requests.

What do you think?
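
The flow described above can be sketched as follows, using only the standard library. The `validators` type and both function names are hypothetical and do not exist in the rss package. The caller keeps the validators from one response and passes them to the next request.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// validators holds the values remembered from a previous response.
type validators struct {
	ETag         string
	LastModified string
}

// newConditionalRequest adds If-None-Match / If-Modified-Since when known.
func newConditionalRequest(url string, v validators) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	if v.ETag != "" {
		req.Header.Set("If-None-Match", v.ETag)
	}
	if v.LastModified != "" {
		req.Header.Set("If-Modified-Since", v.LastModified)
	}
	return req, nil
}

// fetchIfChanged returns a nil body when the server answers 304 Not Modified.
// Otherwise it returns the body and the validators to use next time.
func fetchIfChanged(client *http.Client, url string, v validators) ([]byte, validators, error) {
	req, err := newConditionalRequest(url, v)
	if err != nil {
		return nil, v, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, v, err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusNotModified:
		return nil, v, nil // unchanged: keep the old validators
	case http.StatusOK:
		body, err := io.ReadAll(resp.Body)
		if err != nil {
			return nil, v, err
		}
		next := validators{
			ETag:         resp.Header.Get("ETag"),
			LastModified: resp.Header.Get("Last-Modified"),
		}
		return body, next, nil
	default:
		return nil, v, fmt.Errorf("unexpected status %s", resp.Status)
	}
}

func main() {
	// The URL and ETag value are placeholders.
	req, err := newConditionalRequest("https://example.com/feed.xml",
		validators{ETag: `"abc123"`})
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	fmt.Println(req.Header.Get("If-None-Match"))
}
```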

10 minute refresh

The docs say that 10 minutes is the default refresh period, and for Atom feeds it looks like the only possible refresh period (because Atom doesn't have a ttl element).

This is very aggressive -- if your library is used in a popular application and it polls a popular feed (especially an Atom feed), it would create a lot of load and traffic.

Can I suggest that the default be changed to something like 12 or 24 hours? That fits the use case for most feeds much better.

Also, the feed response can have an explicit freshness lifetime -- if it's there, it'd be good to use it. E.g., if the response says Cache-Control: max-age=3600, there's no reason to poll the feed for the next hour (taking into account the Age header, in case it was cached upstream).
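
To make the arithmetic concrete, here is a small sketch that derives the remaining freshness lifetime from the `Cache-Control` and `Age` header values. The `freshness` function is a hypothetical helper, and it handles only `max-age`, not the full HTTP caching rules.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// freshness returns how long a response may still be considered fresh,
// computed as max-age minus Age. The boolean is false when the Cache-Control
// value carries no usable max-age directive.
func freshness(cacheControl, age string) (time.Duration, bool) {
	maxAge := -1
	for _, part := range strings.Split(cacheControl, ",") {
		kv := strings.SplitN(strings.TrimSpace(part), "=", 2)
		if len(kv) != 2 || !strings.EqualFold(kv[0], "max-age") {
			continue
		}
		n, err := strconv.Atoi(strings.Trim(kv[1], `"`))
		if err != nil || n < 0 {
			return 0, false
		}
		maxAge = n
	}
	if maxAge < 0 {
		return 0, false
	}

	// Age is the time the response already spent in upstream caches.
	elapsed := 0
	if n, err := strconv.Atoi(strings.TrimSpace(age)); err == nil && n > 0 {
		elapsed = n
	}
	remaining := maxAge - elapsed
	if remaining < 0 {
		remaining = 0
	}
	return time.Duration(remaining) * time.Second, true
}

func main() {
	if d, ok := freshness("public, max-age=3600", "600"); ok {
		fmt.Println("poll again in", d)
	}
}
```

A caller could use the larger of this value and its own minimum refresh interval when scheduling the next poll.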

Architecture help?

How are you building the RSS reader? For educational purposes, I want to build one myself, but I have no idea how to go about it. Are there any resources you could share to help?

Thanks a lot

  • Harit

Parse metadata element: author

https://indieweb.org/payment#Implementations

It would also be useful to parse atom:link elements within items:

  <author>
  	<name>Mark Pilgrim</name>
  	<email>[email protected]</email>
  	<uri>https://mysite.com</uri>
  	<atom:link rel="payment" type="application/bitcoin-paymentrequest" href="bitcoin:abc7askjdfg"/>
  </author>
<entry>
<id>abc</id>
<title>Iabc</title>
<link href="https://siasky.net/CADcPfMxnOgtgwllK9-kp12sIy9L8De7br9nvNFcslCKRg" rel="alternate"/>
<summary>
The story of abc
</summary>
<atom:link rel="payment" type="application/bitcoin-paymentrequest" href="bitcoin:abc7askjdfg"/>
</entry>

Throw an error when fetch wasn't called because of the Refresh time?

Hey there,

Is it a good idea (or is it even possible without breaking programs that check the error) to return an error when rss.Fetch() or feed.Update() did nothing because of the feed.Refresh time?

It's rather confusing to receive a completely empty list, without an error, after calling the function again.

Maximum number of posts in feed struct?

Right now I'm testing with an RSS feed which returns 100 posts. If the feed updates and the update() function is called then new posts are appended to the Feed struct resulting in a huge struct over time. How can I make sure it never contains more than 100 posts?
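
One way to bound the list is to trim it after each update. The sketch below uses a hypothetical `Item` type rather than the package's own, and assumes new items are appended at the end, as described above, so the oldest entries sit at the front.

```go
package main

import "fmt"

// Item is a stand-in for a feed item; only the ID matters for this sketch.
type Item struct {
	ID string
}

// capItems keeps at most max of the most recently appended items. The result
// is copied into a fresh slice so the dropped items can be garbage collected.
func capItems(items []*Item, max int) []*Item {
	if max < 0 {
		max = 0
	}
	if len(items) <= max {
		return items
	}
	trimmed := make([]*Item, max)
	copy(trimmed, items[len(items)-max:])
	return trimmed
}

func main() {
	items := []*Item{{ID: "a"}, {ID: "b"}, {ID: "c"}}
	items = capItems(items, 2)
	fmt.Println(len(items), items[0].ID, items[1].ID)
}
```

If the feed type also tracks known item IDs or an unread count, those would need to be adjusted alongside the slice. That bookkeeping is not covered here.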

How to mark item as read?

I'm parsing an RSS feed for URLs, then doing stuff with the URLs later on. Let's say the feed updates every 5 minutes and I'm parsing it every 30 minutes. How would I make sure it doesn't parse the previously read items again?

package main

import (
	"fmt"
	"github.com/SlyMarbo/rss"
)

func main() {
	url := "http://URL"
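	feed, err := rss.Fetch(url)
	if err != nil {
		// feed is nil on error, so stop before dereferencing it.
		fmt.Println("fetch failed:", err)
		return
	}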
	fmt.Printf("Sent fetch for %s\n", url)
	fmt.Printf("There are %d items in %s\n\n", len(feed.Items), url)
	for key, value := range feed.Items {
		fmt.Println(key, value.Link)
	}

}
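
A library-independent way to skip items that were already handled is to remember their IDs between runs. The sketch below uses a hypothetical `Item` type and `seenSet` helper. In a long-running process the set would live in memory, otherwise it would need to be persisted.

```go
package main

import "fmt"

// Item is a stand-in for a feed item with a stable identifier.
type Item struct {
	ID   string
	Link string
}

// seenSet remembers the IDs of items that were already processed.
type seenSet map[string]struct{}

// unseen returns only the items not encountered before and marks them as seen.
func (s seenSet) unseen(items []Item) []Item {
	var fresh []Item
	for _, it := range items {
		if _, ok := s[it.ID]; ok {
			continue
		}
		s[it.ID] = struct{}{}
		fresh = append(fresh, it)
	}
	return fresh
}

func main() {
	seen := seenSet{}
	first := seen.unseen([]Item{{ID: "1", Link: "http://a"}, {ID: "2", Link: "http://b"}})
	second := seen.unseen([]Item{{ID: "2", Link: "http://b"}, {ID: "3", Link: "http://c"}})
	fmt.Println(len(first), len(second))
}
```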

Not returning items for FeedBurner

Example
http://feeds.feedburner.com/ImgurGallery?format=xml

I also created a custom FeedBurner for this same feed and attempted to convert it into several different formats.

I am running inside Google App Engine and have substituted http.Client with urlfetch.Client in rss.go. There have been a handful of other feeds that I have had trouble with, but your library has worked great for 95% of everything I've hit so far. Great work!

CachedParsedItemIDs breaking change

My code failed when using your package because the cache function was removed. Can you specify what the default behavior is now? Does it cache items by default or not? Thanks.

escaped HTML within XML causes feed parse failure

hi! i wrote https://vore.website, which uses this library internally to fetch rss/atom feeds.

i ran into an issue recently where certain feeds containing escaped HTML cause the following failure: panic: XML syntax error on line 4: invalid character entity &ndash;

here's a minimal reproducible example:

package main

import (
	"github.com/SlyMarbo/rss"
)

func main() {
	_, err := rss.Fetch("https://trash.j3s.sh/bad-feed.xml")
	if err != nil {
		panic(err)
	}
}

note that this is triggered by the following XML:

    <title>&ndash; feed with html escaped stuff</title>

i'm wondering if it might make sense to unescape the HTML prior to processing to avoid this? unfortunately i don't think that i can do that kind of pre-processing using FetchByFunc, because i need to modify the returned Body.
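
For what it's worth, `encoding/xml` can resolve HTML entities itself when the decoder's `Entity` field is set to `xml.HTMLEntity`. The sketch below shows this on a standalone decoder with a hypothetical `parseTitle` helper. Whether the rss package exposes its decoder for this kind of configuration is not something this issue establishes.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// parseTitle decodes the channel title, accepting HTML named entities such
// as &ndash; that plain XML does not define.
func parseTitle(data []byte) (string, error) {
	var doc struct {
		Title string `xml:"channel>title"`
	}
	d := xml.NewDecoder(bytes.NewReader(data))
	d.Entity = xml.HTMLEntity // map of standard HTML entity names
	if err := d.Decode(&doc); err != nil {
		return "", err
	}
	return doc.Title, nil
}

func main() {
	data := []byte(`<rss><channel>
    <title>&ndash; feed with html escaped stuff</title>
</channel></rss>`)
	title, err := parseTitle(data)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(title)
}
```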

Edit UserAgent

As a feature request I'd like to see the ability to edit the user agent. A few sites block requests from the default golang user-agent.
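
Until such an option exists, one approach is a custom `http.RoundTripper` that sets the header on every request. This is a sketch with hypothetical names. It helps only if the package offers a way to supply a custom `*http.Client`, which is an assumption here. The `roundTripFunc` and `sentUserAgent` helpers exist purely to demonstrate the effect without network access.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// userAgentTransport sets a fixed User-Agent before delegating to next.
type userAgentTransport struct {
	userAgent string
	next      http.RoundTripper
}

func (t userAgentTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// RoundTrippers should not modify the caller's request, so clone it.
	clone := req.Clone(req.Context())
	clone.Header.Set("User-Agent", t.userAgent)
	return t.next.RoundTrip(clone)
}

// newClient returns a client that identifies itself with userAgent.
func newClient(userAgent string) *http.Client {
	return &http.Client{
		Transport: userAgentTransport{userAgent: userAgent, next: http.DefaultTransport},
	}
}

// roundTripFunc adapts a function to http.RoundTripper (demo helper).
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// sentUserAgent reports the header a request would carry, using a stub
// transport instead of the network.
func sentUserAgent(userAgent string) string {
	var seen string
	stub := roundTripFunc(func(r *http.Request) (*http.Response, error) {
		seen = r.Header.Get("User-Agent")
		return nil, errors.New("stub transport: no network")
	})
	client := &http.Client{Transport: userAgentTransport{userAgent: userAgent, next: stub}}
	_, _ = client.Get("http://example.invalid/feed.xml") // error is expected
	return seen
}

func main() {
	_ = newClient("my-reader/1.0") // a client like this would do real fetches
	fmt.Println(sentUserAgent("my-reader/1.0"))
}
```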

Support JSON Feed

JSON Feed is becoming a more popular format for feed publication. It has similarities to RSS/Atom (see https://jsonfeed.org/mappingrssandatom for more details) and it would be really useful if this library supported it too. The v1 spec is available at https://jsonfeed.org/version/1 and looks relatively straightforward with support for top-level metadata, items, and enclosures (called "attachments").

If you agree, I can try and put together a PR adding basic support for it!
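
As a starting point for discussion, JSON Feed v1 maps fairly directly onto Go structs with `encoding/json`. The sketch below covers only a subset of the fields named in the v1 spec. The type names are hypothetical, and dates are kept as strings rather than parsed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsonFeed covers a subset of the top-level JSON Feed v1 fields.
type jsonFeed struct {
	Version     string     `json:"version"`
	Title       string     `json:"title"`
	HomePageURL string     `json:"home_page_url"`
	FeedURL     string     `json:"feed_url"`
	Items       []jsonItem `json:"items"`
}

// jsonItem covers a subset of the per-item fields.
type jsonItem struct {
	ID            string           `json:"id"`
	URL           string           `json:"url"`
	Title         string           `json:"title"`
	ContentHTML   string           `json:"content_html"`
	ContentText   string           `json:"content_text"`
	DatePublished string           `json:"date_published"` // RFC 3339 text
	Attachments   []jsonAttachment `json:"attachments"`
}

// jsonAttachment plays the role of an RSS enclosure.
type jsonAttachment struct {
	URL      string `json:"url"`
	MIMEType string `json:"mime_type"`
}

// parseJSONFeed decodes a JSON Feed document.
func parseJSONFeed(data []byte) (*jsonFeed, error) {
	var f jsonFeed
	if err := json.Unmarshal(data, &f); err != nil {
		return nil, err
	}
	return &f, nil
}

func main() {
	data := []byte(`{
  "version": "https://jsonfeed.org/version/1",
  "title": "Example",
  "items": [
    {"id": "1", "url": "https://example.org/1", "content_text": "Hello",
     "attachments": [{"url": "https://example.org/1.mp3", "mime_type": "audio/mpeg"}]}
  ]
}`)
	feed, err := parseJSONFeed(data)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(feed.Title, len(feed.Items))
}
```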

Tests fail in 1.0.3

phase `build' succeeded after 1.0 seconds
starting phase `check'
--- FAIL: TestParseItemDateOK (0.00s)
    rss_2.0_test.go:84: testdata/rss_2.0: got "2009-09-06 16:45:00 +0000 UTC", want "2009-09-06 16:45:00 +0000 +0000"
    rss_2.0_test.go:84: testdata/rss_2.0_content_encoded: got "2009-09-06 16:45:00 +0000 UTC", want "2009-09-06 16:45:00 +0000 +0000"
    rss_2.0_test.go:84: testdata/rss_2.0_enclosure: got "2009-09-06 16:45:00 +0000 UTC", want "2009-09-06 16:45:00 +0000 +0000"
FAIL
FAIL	github.com/SlyMarbo/rss	0.044s
FAIL

Attempting to fetch certain feed blocks indefinitely

hi! i recently attempted to fetch a certain feed, and slymarbo/rss blocks indefinitely - here's a reproducible example:

package main

import (
	"github.com/SlyMarbo/rss"
	"log"
)

func main() {
	feed, err := rss.Fetch("https://www.idealista.pt/news/rss/v2/latest-news.xml")
	if err != nil {
		log.Fatal(err) // feed is nil on error, so stop before calling Update
	}

	err = feed.Update()
	if err != nil {
		log.Println(err)
	}
}

i've attached the XML of that feed as it exists today, in case the example i've provided stops reproducing:
bad-xml.txt

Support for non standard date format

While reading:

http://rss.wn.com/English/top-stories

The following error is returned:

parsing time "Mon, 29 Aug 2016 02:52 GMT" as
"2006-01-02T15:04:05.999999999Z07:00": cannot parse
"Mon, 29 Aug 2016 02:52 GMT" as "2006"

PS: I know providers should use the standard date format, but sometimes this is out of our control.
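
A common way to tolerate such feeds is to try several layouts in turn. This is a minimal sketch with a hypothetical `parseDate` helper and layout list, not the package's current parsing code. The last layout matches the seconds-less date quoted above.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// layouts lists the formats to try, most standard first.
var layouts = []string{
	time.RFC1123Z,
	time.RFC1123,
	time.RFC3339,
	"Mon, 02 Jan 2006 15:04 MST", // RFC 1123 without seconds
}

// parseDate returns the result of the first layout that matches.
func parseDate(s string) (time.Time, error) {
	s = strings.TrimSpace(s)
	for _, layout := range layouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t, nil
		}
	}
	return time.Time{}, fmt.Errorf("unrecognized date format: %q", s)
}

func main() {
	t, err := parseDate("Mon, 29 Aug 2016 02:52 GMT")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(t)
}
```

Exposing the layout list to callers would let applications register formats for the specific feeds they consume.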

Support feed generating

github.com/gorilla/feeds is no longer active, it'd be great if this library could support feed generating.

func (f *Feed) WriteAtom(w io.Writer) error

func (f *Feed) WriteRSS(w io.Writer) error

func (f *Feed) WriteJSON(w io.Writer) error

Some Atom feeds not populating Link correctly

Based on the spec, atom:link elements with a rel attribute of alternate or a missing rel attribute should be considered as links.

Currently, line 54 in atom.go correctly sets the link for the latter case, but not the former.

if link.Rel == "" {
  next.Link = link.Href
}

should probably be

if link.Rel == "alternate" || link.Rel == "" {
  next.Link = link.Href
}

This small change fixes a few feeds that were parsing incorrectly for me.

fails to populate items when fetching some feedburner feeds

Hi, I've successfully fetched other feeds, but these fox news feeds are not populating any of the items. any ideas?

http://feeds.foxnews.com/podcasts/TalkingPoints?format=xml
http://feeds.feedburner.com/foxnews/podcasts/FoxNewsSundayVideo?format=xml

This is all it populates in my *feed:

%+v

    "FOX News Sunday Video"
    "http://feeds.feedburner.com/foxnews/podcasts/FoxNewsSundayVideo?format=xml"
    Image ""
    Refresh at Wed 22 Oct 2014 04:41:00 UTC
    Unread: 0
    Items:

%#v

&rss.Feed{Nickname:"", Title:"FOX News Sunday", Description:"FOX News Sunday Video", Link:"http://feeds.feedburner.com/foxnews/podcasts/FoxNewsSundayVideo?format=xml", UpdateURL:"http://feeds.feedburner.com/foxnews/podcasts/FoxNewsSundayVideo?format=xml", Image:(*rss.Image)(0xc21012be40), Items:[]*rss.Item{}, ItemMap:map[string]struct {}{}, Refresh:time.Time{sec:63549549818, nsec:0x26182591, loc:(*time.Location)(0xa61fe0)}, Unread:0x0}

edit: it works fine with the CNN feedburner feed: http://rss.cnn.com/services/podcasting/ac360/rss.xml
(but none of the foxnews ones)

Atom struct tag wrong

in atom.go:85
Content string `xml:"summary"`
should be
Content string `xml:"content"`

Possible to parse other XML fields?

I would like to get the value of the field <newznab:attr name="group" value="alt.binaries.teevee"/> so ending up with the value alt.binaries.teevee.

How do I do so?

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:newznab="http://www.newznab.com/DTD/2010/feeds/attributes/" encoding="utf-8">
 <channel>
  <atom:link href="https://REMOVED.com/api" rel="self" type="application/rss+xml"/>
  <title>REMOVED</title>
  <description>API Details</description>
  <link>https://REMOVED.com/</link>
  <language>en-gb</language>
  <webMaster>[email protected]</webMaster>
  <category>Stuff</category>
  <generator>Me</generator>
  <ttl>10</ttl>
  <docs>https://removed.com/apihelp/</docs>
  <image url="https://removed.com/themes/shared/img/logo.png" title="REMOVED" link="https://removed.com/" description="Visit REMOVED"/>
  <newznab:response offset="0" total="125000"/>
  <item>
   <title>Fair.Go.2017.09.18.HDTV.x264-FiHTV </title>
   <guid isPermaLink="true">https://REMOVED.com/details/427d2b6c5fb3a0f73bd43be4bb8cff955700fd4d</guid>
   <link>https://REMOVED.com/getnzb/427d2b6c5fb3a0f73bd43be4bb8cff955700fd4d.nzb&amp;i=1&amp;r=3bc4e94ef14337e4e2b490a3897c48f6</link>
   <comments>https://REMOVED.com/details/427d2b6c5fb3a0f73bd43be4bb8cff955700fd4d#comments</comments>
   <pubDate>Tue, 19 Sep 2017 10:18:21 +0200</pubDate>
   <category>TV &gt; SD</category>
   <description>Fair.Go.2017.09.18.HDTV.x264-FiHTV </description>
   <enclosure url="https://REMOVED.com/getnzb/427d2b6c5fb3a0f73bd43be4bb8cff955700fd4d.nzb&amp;i=1&amp;r=3bc4e94ef14337e4e2b490a3897c48f6" length="168013625" type="application/x-nzb"/>
   <newznab:attr name="category" value="5030"/>
   <newznab:attr name="size" value="168013625"/>
   <newznab:attr name="files" value="17"/>
   <newznab:attr name="poster" value="[email protected] (yeahsure)"/>
   <newznab:attr name="prematch" value="1"/>
   <newznab:attr name="info" value="https://REMOVED.com/api?t=info&amp;id=427d2b6c5fb3a0f73bd43be4bb8cff955700fd4d&amp;r=3bc4e94ef14337e4e2b490a3897c48f6"/>
   <newznab:attr name="grabs" value="0"/>
   <newznab:attr name="comments" value="0"/>
   <newznab:attr name="password" value="0"/>
   <newznab:attr name="usenetdate" value="Tue, 19 Sep 2017 10:07:47 +0200"/>
   <newznab:attr name="group" value="alt.binaries.teevee"/>
  </item>
</channel>
</rss>
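
Independently of what the rss package exposes, namespaced elements like these can be read with `encoding/xml` by putting the namespace URL in the struct tag. The sketch below parses the raw feed bytes separately. The type names and the `attrValue` helper are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// newznabAttr matches <newznab:attr name="..." value="..."/>.
type newznabAttr struct {
	Name  string `xml:"name,attr"`
	Value string `xml:"value,attr"`
}

type nzbItem struct {
	Title string `xml:"title"`
	// The tag is "namespace-URL local-name", matching the xmlns:newznab URL.
	Attrs []newznabAttr `xml:"http://www.newznab.com/DTD/2010/feeds/attributes/ attr"`
}

type nzbFeed struct {
	Items []nzbItem `xml:"channel>item"`
}

// attrValue returns the value of the named newznab attribute of the first
// item that carries it.
func attrValue(data []byte, name string) (string, error) {
	var feed nzbFeed
	if err := xml.Unmarshal(data, &feed); err != nil {
		return "", err
	}
	for _, item := range feed.Items {
		for _, a := range item.Attrs {
			if a.Name == name {
				return a.Value, nil
			}
		}
	}
	return "", fmt.Errorf("attribute %q not found", name)
}

func main() {
	data := []byte(`<rss version="2.0" xmlns:newznab="http://www.newznab.com/DTD/2010/feeds/attributes/">
 <channel>
  <item>
   <title>Fair.Go.2017.09.18.HDTV.x264-FiHTV</title>
   <newznab:attr name="size" value="168013625"/>
   <newznab:attr name="group" value="alt.binaries.teevee"/>
  </item>
 </channel>
</rss>`)
	group, err := attrValue(data, "group")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(group)
}
```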

cannot parse "Sat, 30 Abr 2016 08:28:59 GMT" as "2006"

As far as I can see, the package tries to parse the time with the default layouts provided by the time package.

It would be helpful to have a way to set the layout used for parsing feeds that don't use an orthodox (from the time package's point of view) time format.
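
A custom layout alone would not cover this particular date, because Go's `time` package only recognizes English month names. One workaround is to translate the month abbreviation before parsing. The sketch below assumes the feed uses Spanish abbreviations (the issue does not say which locale). The replacement table is hypothetical and lists only the Spanish abbreviations that differ from English.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// spanishMonths rewrites Spanish month abbreviations that differ from the
// English ones. Spaces around each token avoid touching other words.
var spanishMonths = strings.NewReplacer(
	" Ene ", " Jan ",
	" Abr ", " Apr ",
	" Ago ", " Aug ",
	" Dic ", " Dec ",
)

// parseLocalizedRFC1123 parses an RFC 1123 date after translating the month.
func parseLocalizedRFC1123(s string) (time.Time, error) {
	return time.Parse(time.RFC1123, spanishMonths.Replace(s))
}

func main() {
	t, err := parseLocalizedRFC1123("Sat, 30 Abr 2016 08:28:59 GMT")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(t)
}
```

Localized weekday names would need the same treatment if a feed used them.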
