rssreader's People

Contributors

arlol, dependabot[bot], kdima001, markuspoerschke, mikusch, scorta, w3stling

rssreader's Issues

Support read from file

After successfully reading a feed from a remote source, I would like to store it locally (e.g., on the file system) so that the next read is faster.
There is no method to read from a String or bytes :-(
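
One way to approach the caching part is to keep the raw feed bytes on disk and reopen them as a stream later. The following is only a sketch: `FeedFileCache` and its methods are hypothetical names, not part of rssreader, and it assumes the reader can consume an `InputStream` (the feature request further down suggests it can).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of rssreader): keeps a copy of the raw feed on disk.
final class FeedFileCache {
    private final Path file;

    FeedFileCache(Path file) {
        this.file = file;
    }

    /** Saves the raw feed bytes, replacing any previous copy. */
    void store(byte[] feedBytes) throws IOException {
        Files.write(file, feedBytes);
    }

    /** Returns true if a cached copy exists. */
    boolean isCached() {
        return Files.exists(file);
    }

    /** Opens the cached feed; the caller is responsible for closing the stream. */
    InputStream open() throws IOException {
        return Files.newInputStream(file);
    }
}
```

The stream returned by `open()` could then be handed to whatever InputStream-based read method the library version in use provides.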

How to pass some object / parameter into custom RssReader / AbstractRssReader?

I want to run some logic in registerItemTags() based on several conditions.

For example, when creating the reader, I pass a parameter through a header:

List<Item> items = new RssReader().addHeader("Param","1").read("http://google.com").toList();

Then I want to read it back inside the custom reader:

public class MovieRssReader extends AbstractRssReader<Channel, Item> {

    @Override
    protected void registerItemTags() {
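        // Intent only: "parent" and getHeader() stand for whatever gives access to the header set above.
        // Strings are compared with equals(), not ==.
        if ("1".equals(parent.getHeader("Param"))) {
            // do this
        }
    }
}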

Any help will be appreciated, thanks

Response http status code: 401

Hi,

I am getting the exception below on this line: rssReader.read(URL)
java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Response http status code: 401

How can we pass the credentials?

Thanks!
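
If the server expects HTTP Basic authentication (an assumption; a 401 can also mean a token or another scheme is required), the credentials travel in an `Authorization` header. A minimal sketch of building that header value, with `BasicAuth` being a hypothetical helper name:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the value of an HTTP Basic "Authorization" header.
final class BasicAuth {
    private BasicAuth() {
    }

    static String headerValue(String user, String password) {
        // Basic auth is "user:password", Base64-encoded and prefixed with the scheme name.
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }
}
```

The value could then be passed with the addHeader(...) call shown in the earlier issue, for example addHeader("Authorization", BasicAuth.headerValue(user, password)).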

Too Many Open Files

I am getting an exception after using the RssReader on my server. It seems to happen only after a while. From a little research, it seems something is not being closed somewhere. I tried to copy the code and debug it to find out where, but I couldn't.

Sep 17 12:37:16 ip-172-31-46-83 web: java.io.IOException: java.util.concurrent.ExecutionException: java.lang.InternalError: java.net.SocketException: Too many open files
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.freedom.utils.RssReader.read(RssReader.kt:60)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.ApplicationViewModel.updateArticlesForCategory(ApplicationViewModel.kt:98)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.ApplicationViewModel.refreshArticles(ApplicationViewModel.kt:70)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.ApplicationKt$module$$inlined$scheduleAtFixedRate$1$lambda$1.invokeSuspend(Application.kt:38)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:738)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
Sep 17 12:37:16 ip-172-31-46-83 web: Caused by: java.util.concurrent.ExecutionException: java.lang.InternalError: java.net.SocketException: Too many open files
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2022)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.freedom.utils.RssReader.read(RssReader.kt:47)
Sep 17 12:37:16 ip-172-31-46-83 web: ... 9 common frames omitted
Sep 17 12:37:16 ip-172-31-46-83 web: Caused by: java.lang.InternalError: java.net.SocketException: Too many open files
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.PlainHttpConnection.<init>(PlainHttpConnection.java:224)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.AsyncSSLConnection.<init>(AsyncSSLConnection.java:49)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.HttpConnection.getSSLConnection(HttpConnection.java:239)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.HttpConnection.getConnection(HttpConnection.java:225)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.Http2Connection.createAsync(Http2Connection.java:360)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.Http2ClientImpl.getConnectionFor(Http2ClientImpl.java:127)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.ExchangeImpl.get(ExchangeImpl.java:89)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.Exchange.establishExchange(Exchange.java:299)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.Exchange.responseAsyncImpl0(Exchange.java:431)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.Exchange.responseAsyncImpl(Exchange.java:336)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.Exchange.responseAsync(Exchange.java:328)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.MultiExchange.responseAsyncImpl(MultiExchange.java:346)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.MultiExchange.lambda$responseAsync0$2(MultiExchange.java:292)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1705)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.lang.Thread.run(Thread.java:834)
Sep 17 12:37:16 ip-172-31-46-83 web: Caused by: java.net.SocketException: Too many open files
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/sun.nio.ch.Net.socket0(Native Method)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/sun.nio.ch.Net.socket(Net.java:433)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/sun.nio.ch.Net.socket(Net.java:426)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:121)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.nio.channels.SocketChannel.open(SocketChannel.java:150)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.PlainHttpConnection.<init>(PlainHttpConnection.java:213)
Sep 17 12:37:16 ip-172-31-46-83 web: ... 18 common frames omitted
Sep 17 12:37:16 ip-172-31-46-83 web: Exception in thread "DefaultDispatcher-worker-1" java.lang.InternalError: java.io.IOException: Too many open files
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.HttpClientImpl.<init>(HttpClientImpl.java:311)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.HttpClientImpl.create(HttpClientImpl.java:253)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.HttpClientBuilderImpl.build(HttpClientBuilderImpl.java:135)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.freedom.utils.RssReader.createHttpClient(RssReader.kt:391)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.freedom.utils.RssReader.sendAsyncRequest(RssReader.kt:84)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.freedom.utils.RssReader.readAsync(RssReader.kt:73)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.freedom.utils.RssReader.read(RssReader.kt:47)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.ApplicationViewModel.updateArticlesForCategory(ApplicationViewModel.kt:98)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.ApplicationViewModel.refreshArticles(ApplicationViewModel.kt:70)
Sep 17 12:37:16 ip-172-31-46-83 web: at tech.devezin.ApplicationKt$module$$inlined$scheduleAtFixedRate$1$lambda$1.invokeSuspend(Application.kt:38)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:738)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
Sep 17 12:37:16 ip-172-31-46-83 web: at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
Sep 17 12:37:16 ip-172-31-46-83 web: Caused by: java.io.IOException: Too many open files
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/sun.nio.ch.EPoll.create(Native Method)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:79)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.base/java.nio.channels.Selector.open(Selector.java:295)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.<init>(HttpClientImpl.java:699)
Sep 17 12:37:16 ip-172-31-46-83 web: at java.net.http/jdk.internal.net.http.HttpClientImpl.<init>(HttpClientImpl.java:308)
Sep 17 12:37:16 ip-172-31-46-83 web: ... 15 more
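
The second trace fails inside HttpClientImpl.<init>, reached through createHttpClient on the read path, which suggests that a new HttpClient is built for every request. Each client owns its own selector and connection resources, so creating many of them may exhaust file descriptors. A minimal sketch of the usual remedy, assuming the application can share one client; `SharedHttpClient` is a hypothetical name and the timeout is an arbitrary choice:

```java
import java.net.http.HttpClient;
import java.time.Duration;

// Hypothetical holder: one HttpClient for the whole application instead of one per request.
final class SharedHttpClient {
    private static final HttpClient INSTANCE = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(25)) // arbitrary value for illustration
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    private SharedHttpClient() {
    }

    /** Always returns the same client, so selectors and connections are reused. */
    static HttpClient get() {
        return INSTANCE;
    }
}
```

Whether the reader can be given an externally created client depends on the library version in use, so that part remains an assumption to verify against its constructors.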

getPubDateZonedDateTime() throws an Exception instead of returning an empty Optional

The method getPubDateZonedDateTime() returns an Optional<ZonedDateTime>. For some reason it cannot parse the date shown in the example below:

Item item = items.get(0);
item.setPubDate("Sat, 21 Jan 2023 11:12:30 GMT");

// Throws an Exception
Optional<ZonedDateTime> pubDate = item.getPubDateZonedDateTime();

The following exception is thrown:

Exception in thread "main" java.time.format.DateTimeParseException: Text 'Sat, 21 Jan 2023 11:12:30 GMT' could not be parsed at index 0
	at java.base/java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:2052)
	at java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1954)
	at java.base/java.time.ZonedDateTime.parse(ZonedDateTime.java:600)
	at com.apptasticsoftware.rssreader/com.apptasticsoftware.rssreader.DateTime.toZonedDateTime(DateTime.java:179)
	at java.base/java.util.Optional.map(Optional.java:260)
	at com.apptasticsoftware.rssreader/com.apptasticsoftware.rssreader.Item.getPubDateZonedDateTime(Item.java:235)

I am surprised that this pubDate can't be parsed.
Expected behavior: Returning an empty Optional if parsing fails.
Tested with version 3.4.1.
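
The expected behavior can be sketched outside the library as follows. This is not rssreader's implementation: `SafePubDate` is a hypothetical helper, and it only handles the RFC 1123 style shown in the example by using the JDK's built-in formatter.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper: parsing that never throws, returning an empty Optional on failure.
final class SafePubDate {
    private SafePubDate() {
    }

    static Optional<ZonedDateTime> parse(String pubDate) {
        if (pubDate == null || pubDate.isBlank()) {
            return Optional.empty();
        }
        try {
            // RFC_1123_DATE_TIME uses fixed English day and month names, independent of the default locale.
            return Optional.of(
                    ZonedDateTime.parse(pubDate.trim(), DateTimeFormatter.RFC_1123_DATE_TIME));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }
}
```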

"content" tag should be treated separately

At the moment the content tag is treated as the description, but it should be handled separately (by adding a content field along with setContent() and getContent() methods, etc.).

Many feeds put the entire article there (most WordPress sites, in fact).

Failed to read file

I am currently trying to read http://na.leagueoflegends.com/en/rss.html but get this error:

WARNING: Failed to parse XML. 
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Premature end of file.
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:652)
	at com.apptastic.rssreader.RssReader$RssItemIterator.next(RssReader.java:202)
	at com.apptastic.rssreader.RssReader$RssItemIterator.peekNext(RssReader.java:177)
	at com.apptastic.rssreader.RssReader$RssItemIterator.hasNext(RssReader.java:187)
	at java.base/java.util.Iterator.forEachRemaining(Iterator.java:132)
	at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
	at Commands.CustomCommands.Subscribers.RssLeagueThread.run(RssLeagueThread.java:43)
	at java.base/java.lang.Thread.run(Thread.java:834)

I suspect the website is blocking the connection, as the URL and the XML are perfectly valid.

Datetime format not recognized

When trying to sort Items I get an "Unknown date time format" exception thrown from https://github.com/w3stling/rssreader/blob/master/src/main/java/com/apptastic/rssreader/DateTime.java#L87 .

The date being parsed is 2021-11-17T13:21:21Z. It isn't recognized, although it conforms to DateTimeFormatter.ISO_OFFSET_DATE_TIME (Atom feed RFC).

An example of a feed like this can be the GitHub personal feed (found at the bottom of https://github.com when logged in).

Problems with some special date time format(s)

I am getting an IllegalArgumentException when a feed provides the item date in the special format shown in the trace below. I have not found a simple way to supply a specific date time format to handle it.

It would be nice to have some extension points in the API to handle additional formats.

Exception in thread "main" java.lang.IllegalArgumentException: Unknown date time format 2023-02-28T17:37:08.823050+00:00
at com.apptasticsoftware.rssreader.DateTime.toZonedDateTime(DateTime.java:174)
at com.apptasticsoftware.rssreader.DateTime.toEpochMilli(DateTime.java:353)
at java.base/java.util.Optional.map(Optional.java:260)
at com.apptasticsoftware.rssreader.util.ItemComparator.lambda$newestItemFirst$1(ItemComparator.java:32)
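
Until such an extension point exists, one possible workaround is to parse the raw date string outside the library with a list of formatters tried in order. The sketch below is an illustration only: `FlexibleDateParser` is a hypothetical class, and the formatter list is an assumption that would need extending for other feeds.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: tries several formatters in order and returns the first successful result.
final class FlexibleDateParser {
    private static final List<DateTimeFormatter> FORMATTERS = List.of(
            DateTimeFormatter.ISO_OFFSET_DATE_TIME, // e.g. 2023-02-28T17:37:08.823050+00:00
            DateTimeFormatter.RFC_1123_DATE_TIME);  // e.g. Sat, 21 Jan 2023 11:12:30 GMT

    private FlexibleDateParser() {
    }

    static Optional<ZonedDateTime> parse(String text) {
        for (DateTimeFormatter formatter : FORMATTERS) {
            try {
                return Optional.of(ZonedDateTime.parse(text, formatter));
            } catch (DateTimeParseException ignored) {
                // Fall through and try the next formatter.
            }
        }
        return Optional.empty();
    }
}
```

Adding another format then only means appending a formatter to the list.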

blank enclosure length leads to exception

First of all many thanks for this lib :) I started experimenting with it today and it's been great.

One thing I came across: one of the feeds I am trying to consume has the enclosure length set to blank (length=""). Sadly this breaks here:

enclosureAttributes.put("length", (i, v) -> i.getEnclosure().ifPresent(e -> e.setLength(Long.parseLong(v))) );

According to the spec the value must be set - so the feed is doing it wrong. Nonetheless I was wondering if there is a way for the code to do something reasonable and came up with this locally:

enclosureAttributes
		.put("length", (i, v) -> i.getEnclosure().ifPresent(e -> {
			if (!v.isBlank()) {
				e.setLength(Long.parseLong(v));
			}
		}));

In accordance with the Robustness principle it doesn't seem like the worst idea to me.

What do you think?

If you think it's reasonable, I could create a PR.

Support for multiple instances of Enclosure

I'm not sure whether this would be out of scope, but torznab feeds can contain multiple enclosures.
The spec for that is here.

I would greatly appreciate it if this could be added.

How to subscribe to RSS feed?

I'm not quite understanding how this repository does what it promises. You state the following in the description:

Subscribing to a website RSS removes the need for the user to manually check the website for new content.

This is closely followed by:

This Java RSS parser library makes it easier to automate data extraction from RSS or Atom feeds via Java stream API.

To me, and I would like to believe most people, this suggests that this repository has the capabilities of "subscribing" to an "RSS" feed (henceforth "automating data extraction"). But unless I'm being silly and just completely missing something, this is not possible with this library?

If this could be added, that would be great because otherwise I don't believe it fulfils the description of this repository. Don't get me wrong, this library is very handy nonetheless, but what I would consider the more important part of the library is being able to subscribe to a feed. Otherwise it verges on the edge of not being so useful.

I also cannot help but point out that using the Java Stream API is a weird selling point for this repository. Is that really such a big thing? It's no different from giving us a List and us calling List#stream. Generally it feels like a bad idea to pass streams around, since they obviously need to be closed.
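
For readers wondering what "subscribing" could look like on top of a parser, a common approach is to poll the feed periodically and report only items that have not been seen before. The sketch below is not part of rssreader: `FeedPoller` is a hypothetical class, and the supplier stands in for whatever code fetches the current item identifiers (GUIDs or links).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical poller: remembers seen item ids and reports only the new ones.
final class FeedPoller {
    private final Set<String> seenIds = new HashSet<>();
    private final Supplier<List<String>> fetchIds; // stand-in for reading the feed
    private final Consumer<String> onNewItem;      // callback invoked once per new id

    FeedPoller(Supplier<List<String>> fetchIds, Consumer<String> onNewItem) {
        this.fetchIds = fetchIds;
        this.onNewItem = onNewItem;
    }

    /** Runs one poll cycle and returns the ids that were not seen before. */
    synchronized List<String> pollOnce() {
        List<String> fresh = new ArrayList<>();
        for (String id : fetchIds.get()) {
            if (seenIds.add(id)) { // add() returns false for ids already seen
                fresh.add(id);
                onNewItem.accept(id);
            }
        }
        return fresh;
    }
}
```

A ScheduledExecutorService can drive it, for example with scheduleAtFixedRate(poller::pollOnce, 0, 15, TimeUnit.MINUTES); the interval is an arbitrary choice.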

How to get all images?

I have about 700 different RSS sources and lots of different image formats. How do I detect all images in these?

[FEATURE REQUEST] Parse String directly.

It would be nice to have a read() method capable of parsing a String containing a feed directly. At the moment you need to convert the String to byte[] and then to a ByteArrayInputStream.
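
The conversion described above can be wrapped in a small helper. This is a sketch: `FeedStrings` is a hypothetical name, and UTF-8 is an assumption that should match the encoding declared in the feed's XML prolog.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: wraps feed XML held in a String as an InputStream.
final class FeedStrings {
    private FeedStrings() {
    }

    static InputStream toInputStream(String feedXml) {
        // Assumes UTF-8; adjust if the XML declaration names a different encoding.
        return new ByteArrayInputStream(feedXml.getBytes(StandardCharsets.UTF_8));
    }
}
```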

Are the RSS items cached?

Hello,

I've just noticed that my RSS feed source has changed content but the data I get from RssReader is still the old data. Does it have a cache? If so, is that configurable? I'm on version 3.4.0.

Thanks a lot!
