damoeb / rss-proxy

RSS-proxy allows you to create an RSS or ATOM feed of almost any website, just by analyzing the static HTML structure.

Home Page: https://rssproxy.migor.org

JavaScript 1.82% TypeScript 64.93% HTML 27.91% Dockerfile 0.28% SCSS 3.09% Makefile 0.10% Kotlin 1.87%
atom-feed rss-generator rss-feed json-feed rss-proxy rss-feed-generator

rss-proxy's Introduction

RSS-proxy

Build Status

RSS-proxy allows you to create an ATOM or JSON feed of any static website or existing feed (web to feed), just by analyzing the HTML structure. Try the demo. It is an alternative UI to feedless with a reduced feature set. If you want advanced features like fulltext feeds, aggregation, persistence, authentication and others, check out feedless.

Playground

Features

  • Web to Feed
  • Feed to Feed: pipe existing native feeds through rss-proxy to filter them
  • Filters
  • Self Hosting

Advanced Features

If you need any of the features below, use feedless, the successor of rss-proxy.

  • Feed Aggregation
  • Authentication and multi-tenancy
  • JavaScript Support (prerendering)
  • Fulltext Feeds and other content enrichments
  • Persistence
  • CLI
  • GraphQL API
  • Plugins

Changelog

See here

Quickstart using docker

If you have Docker or Podman installed, run:

docker pull damoeb/rss-proxy:2.1
docker run -p 8080:8080 -e APP_API_GATEWAY_URL=https://foo.bar -it damoeb/rss-proxy:2.1

APP_API_GATEWAY_URL is your public-facing URL, which is used as the host for the feeds you create.

Then open localhost:8080 in the browser.
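To illustrate what "used as the host" means, here is a minimal TypeScript sketch. It is not taken from the rss-proxy source: the helper name buildFeedUrl is hypothetical, and the /api/tf path and the url, re and out parameters are copied from a feed URL quoted in a user report further down this page.

```typescript
// Hypothetical helper: shows how a public gateway URL could become the host
// part of a generated feed link. Not the actual rss-proxy implementation.
function buildFeedUrl(gatewayUrl: string, sourceFeedUrl: string): string {
  // Path and parameter names mirror a feed URL quoted in a user report.
  const feedUrl = new URL("/api/tf", gatewayUrl);
  feedUrl.searchParams.set("url", sourceFeedUrl);
  feedUrl.searchParams.set("re", "none");
  feedUrl.searchParams.set("out", "atom");
  return feedUrl.toString();
}

console.log(buildFeedUrl("https://foo.bar", "https://example.org/feed.xml"));
// prints https://foo.bar/api/tf?url=https%3A%2F%2Fexample.org%2Ffeed.xml&re=none&out=atom
```

Under this assumption, a feed reader must be able to reach the gateway host from wherever it runs, so a value that only resolves inside your own network would produce feed links that outside readers cannot fetch.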

Legacy Version 1

If you are interested in running the first prototype, this is how you do it.

docker pull damoeb/rss-proxy:1
docker run -p 3000:3000 -it damoeb/rss-proxy:1

Then open localhost:3000 in the browser.

License

This project uses the following license: GNU GPLv3.

rss-proxy's People

Contributors

damoeb, dependabot[bot], nv-bot, tmuguet

rss-proxy's Issues

Issues with v1

This is a collection of things that should still be addressed for v1.

Open

  • Undocumented JSON feed option
  • Hint to user if rules cannot be displayed
  • Raw HTML content should render absolute links
  • Test nitter.net

Done

  • spinner does not work
  • proxy endpoint is obsolete (performance)

full feed?

Great tool, thank you.
Would it be easy to make a full feed available when retrieving the RSS?

for example https://www.airforcetimes.com/news/

"npm run install" fails

Hi,

Starting from a fresh git checkout, the "Developing RSS-proxy" section mentions to run npm run install, but the command actually fails (tested on macOS with npm 6.13.4 and npm 7.7.6):

tmuguet-2:rss-proxy tmuguet$ npm run install
npm ERR! missing script: install
npm ERR! 
npm ERR! Did you mean this?
npm ERR!     preinstall

Running npm run preinstall seems to work. Is that the correct command?

Arm docker image

Hi,

rss-proxy is really promising !
Do you plan to make an Arm compatible docker image ?
If not, where do you think I should start to build such an image ? Build the container from code ?

Thanks

Announcing v.1

I am working on version 1 of rss-proxy, which will bring a better algorithm to detect feed patterns and a visualization of them in the website of interest. Also, the scoring of feed candidates has been reworked, and a simple filtering option has been added.

Getting Issue in trying to install Version 2

Whenever I run docker-compose up, I get this error

Starting rssprodxy_rich-puppeteer_1 ...
Starting rssprodxy_rich-puppeteer_1 ... error

ERROR: for rssprodxy_rich-puppeteer_1  Cannot start service rich-puppeteer: Decoding seccomp profile failed: invalid character 'c' looking for beginning of value

ERROR: for rich-puppeteer  Cannot start service rich-puppeteer: Decoding seccomp profile failed: invalid character 'c' looking for beginning of value

ERROR: Encountered errors while bringing up the project.

How do I fix this?

JSON2RSS

I knew you were solving my problem. Thank you.
I can't pay a lot of money, but I'm willing to pay.
It should be easy to use even for someone who is not a developer, with only a little knowledge. No-code!


Many Korean sites do not provide RSS feeds. I figured out how to find their JSON.
It would be very convenient if you could convert it to an RSS feed. If this is solved, it will be possible to create RSS from other services that do not provide RSS feeds.

Your service cannot designate a title or set a link directly. I want this kind of service:
https://github.com/RSS-Bridge/rss-bridge/blob/master/bridges/XPathBridge.php


V LIVE

Website URL

// SMTOWN
https://www.vlive.tv/channel/FD53B/board/3530
https://www.vlive.tv/channel/FD53B/board/69
// aespa
https://www.vlive.tv/channel/97CCED/board/7553
https://www.vlive.tv/channel/97CCED/board/7375
// V MUSICAL
https://www.vlive.tv/channel/EDE229/board/4029

Website description

V LIVE: LIVE STREAM, FAN COMMUNITY
SM Entertainment has closed the artist's official site. From now on, they will post announcements through V LIVE.

What content should be included?

New video, New Post

Additional description

I looked for something that might help.
https://github.com/gdegauquier/vlive-api
https://github.com/robindz/VSharp
https://github.com/Seklfreak/Robyul2/blob/master/modules/plugins/vlive.go


JSON2RSS

https://json2feed.ssig33.com/
https://github.com/ssig33/json2feed

https://crssnt.com/

e.g)
NAVER POST
https://post.naver.com/my.nhn?memberNo=41350603

JSON (async/ajax/api):
https://m.post.naver.com/async/my.nhn?memberNo=41350603&postListViewType=0&isExpertMy=true

help with feed?

Hello,

I am trying to get a feed for firmware updates on the ASUS site, but I can't get the feed to work properly. I need help.
https://www.asus.com/Networking-IoT-Servers/WiFi-Routers/ASUS-WiFi-Routers/RT-AC5300/HelpDesk_BIOS/

is it possible?

thanks

Prefer native feed if available

Add a query param to define if a native feed should be returned, if one is available. Not applicable if there are multiple feeds defined.
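One way to sketch this idea in TypeScript is shown below. It assumes native feeds are advertised through the common `<link rel="alternate">` autodiscovery convention and uses a simple regular expression instead of a real HTML parser. The function names and the preferNative flag are hypothetical stand-ins for the proposed query param.

```typescript
// Feed MIME types commonly used in <link rel="alternate"> autodiscovery tags.
const FEED_TYPES = ["application/rss+xml", "application/atom+xml", "application/feed+json"];

// Read one quoted attribute value from a single tag's source text.
function readAttribute(tag: string, attribute: string): string | undefined {
  const match = new RegExp(`\\s${attribute}\\s*=\\s*["']([^"']*)["']`, "i").exec(tag);
  return match?.[1];
}

// Collect absolute URLs of all native feeds advertised by the page.
function findNativeFeeds(html: string, pageUrl: string): string[] {
  const linkTags = html.match(/<link\b[^>]*>/gi) ?? [];
  const feeds: string[] = [];
  for (const tag of linkTags) {
    const rel = readAttribute(tag, "rel")?.toLowerCase();
    const type = readAttribute(tag, "type")?.toLowerCase();
    const href = readAttribute(tag, "href");
    if (rel === "alternate" && type !== undefined && FEED_TYPES.includes(type) && href) {
      feeds.push(new URL(href, pageUrl).toString());
    }
  }
  return feeds;
}

// `preferNative` stands in for the proposed query param (hypothetical name).
function pickNativeFeed(html: string, pageUrl: string, preferNative: boolean): string | undefined {
  if (!preferNative) return undefined;
  const feeds = findNativeFeeds(html, pageUrl);
  // With several native feeds the choice is ambiguous, so fall back to generation.
  return feeds.length === 1 ? feeds[0] : undefined;
}
```

Returning undefined signals that the generated feed should be used, which covers both the "no native feed" and the "multiple feeds defined" cases.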

Feature request: preview of feeds

Can you implement a preview of the generated feeds?
My use case:

  • feed is getting explored
  • choose your feed (depending on score)
  • while hovering over the chosen feed, show a preview of the generated feed (in a popup?) to see whether the correct queries have been selected and how it will or could be displayed in the RSS aggregator
    Would be awesome!

Make it easier to fix broken feed specs

If a DOM changes, this will most likely result in a broken or empty feed. In that case it would be nice to see a feed entry that describes the problem and sends you to the rss-proxy instance where you can patch it. As a consequence, it might be better to turn the current statelessness (all params in the URL) into a stateful server, because patching a broken link will not result in a new URL.
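A rough TypeScript sketch of such a problem entry follows. The FeedEntry shape, the function name and the "[ERROR]" title prefix are assumptions made for illustration; the reason and url parameters mirror an error link quoted in another report on this page.

```typescript
// Assumed minimal entry shape; a real feed entry carries more fields.
interface FeedEntry {
  title: string;
  link: string;
  published: string;
}

// Build a placeholder entry for a feed whose selectors no longer match.
// `instanceUrl` is the rss-proxy instance that generated the feed.
function buildRepairEntry(instanceUrl: string, siteUrl: string, reason: string, now: Date): FeedEntry {
  const repairLink = new URL("/", instanceUrl);
  // Parameter names mirror an error link quoted in a user report.
  repairLink.searchParams.set("reason", reason);
  repairLink.searchParams.set("url", siteUrl);
  return {
    title: `[ERROR] ${reason}`,
    link: repairLink.toString(),
    published: now.toISOString(),
  };
}

const repairEntry = buildRepairEntry("http://localhost:8080", "https://www.heise.de", "Feed is empty", new Date(0));
console.log(repairEntry.link);
// prints http://localhost:8080/?reason=Feed+is+empty&url=https%3A%2F%2Fwww.heise.de
```

Passing the instance URL explicitly keeps the repair link pointing at the instance that produced the feed rather than at a hard-coded host.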

After installing version 1, memory usage keeps increasing, especially with JavaScript support enabled. How can I release memory regularly?

root@debian:~# ps aux | sort -k4,4nr | head -n 10
www 5021 0.5 7.5 1876584 304204 ? Sl Jul08 67:48 /usr/lib/chromium-browser/chromium-browser --no-sandbox --disable-gpu --user-data-dir --window-size=1280,1024 --window-position=0,0 --enable-pinch
www 19776 0.0 2.4 598004 98552 ? Ssl Jul11 0:53 /opt/rss-proxy/node_modules/puppeteer/.local-chromium/linux-818858/chrome-linux/chrome --disable-background-networking --enable-features=NetworkService,NetworkServiceInProcess --disable-background-timer-throttling --disable-backgrounding-occluded-windows --disable-breakpad --disable-client-side-phishing-detection --disable-component-extensions-with-background-pages --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-features=TranslateUI --disable-hang-monitor --disable-ipc-flooding-protection --disable-popup-blocking --disable-prompt-on-repost --disable-renderer-backgrounding --disable-sync --force-color-profile=srgb --metrics-recording-only --no-first-run --enable-automation --password-store=basic --use-mock-keychain --enable-blink-features=IdleDetection --headless --hide-scrollbars --mute-audio about:blank --disable-dev-shm-usage --disable-default-apps --disable-extensions --disable-gpu --disable-sync --disable-translate --mute-audio --no-first-run --no-sandbox --safebrowsing-disable-auto-update --remote-debugging-port=0 --user-data-dir=/tmp/puppeteer_dev_chrome_profile-UiOeRn

error when running version 2

ERROR FeedEndpoint - [Sz2K] Unable to discover feeds: Unable to make field private final byte[] java.lang.String.value accessible: module java.base does not "opens java.lang" to unnamed module @52af6cff
image

Error message: "Maintenance required"

feed could not be parsed correctly

webpage: https://www.heise.de

config:
image

clicking this article:
image

error:
https://rssproxy.migor.org/?reason=Unable to make field private final byte[] java.lang.String.value accessible: module java.base does not "opens java.lang" to unnamed module @531be3c5&url=https%3A%2F%2Fwww.heise.de
2022-07-12T13:09:15-+0000

Unable to make field private final byte[] java.lang.String.value accessible: module java.base does not "opens java.lang" to unnamed module @531be3c5

reproducible
greets
m

The time in published is not the correct time

https://rssproxy.migor.org/api/w2f?v=0.1&url=https%3A%2F%2Fzksecurity.xyz%2Fblog%2F&link=.%2Fa%5B1%5D&context=%2F%2Fdiv%5B1%5D%2Fdiv%5B1%5D%2Fdiv%5B1%5D%2Fdiv&date=.%2Fdiv%5B1%5D%2Ftime%5B1%5D&re=none&out=atom&debug=true

In the above link, the time in published is not the original post publish time but the rss generation time.
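To make the report easier to read, the following TypeScript sketch decodes the query string of the link above. The helper name decodeFeedSpec is hypothetical; the URL is the one from this report. It only shows which selectors the feed was configured with, not how rss-proxy evaluates them.

```typescript
// Decode the query string of a web-to-feed URL into a plain object.
function decodeFeedSpec(feedUrl: string): Record<string, string> {
  const spec: Record<string, string> = {};
  new URL(feedUrl).searchParams.forEach((value, key) => {
    spec[key] = value;
  });
  return spec;
}

const feedSpec = decodeFeedSpec(
  "https://rssproxy.migor.org/api/w2f?v=0.1&url=https%3A%2F%2Fzksecurity.xyz%2Fblog%2F" +
    "&link=.%2Fa%5B1%5D&context=%2F%2Fdiv%5B1%5D%2Fdiv%5B1%5D%2Fdiv%5B1%5D%2Fdiv" +
    "&date=.%2Fdiv%5B1%5D%2Ftime%5B1%5D&re=none&out=atom&debug=true"
);

console.log(feedSpec["context"]); // prints //div[1]/div[1]/div[1]/div
console.log(feedSpec["link"]);    // prints ./a[1]
console.log(feedSpec["date"]);    // prints ./div[1]/time[1]
```

The decoded date parameter is a relative XPath ending in a time element, so the feed spec does name a date source; the report is that the published value nevertheless reflects the generation time.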

support for browserless?

Suggestion: There's a popular headless browser project, Browserless, that is very simple to use. Maintaining an independent Puppeteer Docker image may be painful at times, so Browserless is a good choice : )

Puppeteer allows you to specify a remote location for chrome via the browserWSEndpoint option. Setting this for browserless is a single line of code change.

Before

const browser = await puppeteer.launch();

After

const browser = await puppeteer.connect({ browserWSEndpoint: 'ws://localhost:3000' });
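If both modes should stay available, the choice could be driven by configuration. The TypeScript sketch below only decides which mode to use and does not import Puppeteer. The BROWSER_WS_ENDPOINT variable name and the resolveBrowserTarget helper are hypothetical, not existing rss-proxy settings.

```typescript
// Describes which of the two Puppeteer entry points should be used.
type BrowserTarget =
  | { mode: "connect"; browserWSEndpoint: string }
  | { mode: "launch" };

// BROWSER_WS_ENDPOINT is a hypothetical variable name chosen for this sketch.
function resolveBrowserTarget(env: Record<string, string | undefined>): BrowserTarget {
  const endpoint = env["BROWSER_WS_ENDPOINT"]?.trim();
  // Only accept WebSocket URLs; anything else falls back to a local launch.
  if (endpoint && /^wss?:\/\//.test(endpoint)) {
    return { mode: "connect", browserWSEndpoint: endpoint };
  }
  return { mode: "launch" };
}

console.log(resolveBrowserTarget({ BROWSER_WS_ENDPOINT: "ws://localhost:3000" }).mode); // prints connect
console.log(resolveBrowserTarget({}).mode); // prints launch
```

The caller would then use puppeteer.connect with the returned browserWSEndpoint in "connect" mode and puppeteer.launch otherwise, matching the Before and After snippets above.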

Docker Image home page

TSError: _ Unable to compile TypeScript:

  • Start server

cd packages/proxy && npm run start

Getting this

[nodemon] 1.19.4
[nodemon] to restart at any time, enter rs
[nodemon] watching dir(s): src/**/*
[nodemon] watching extensions: ts
[nodemon] starting ts-node ./src/app.ts

/root/rss-proxy/packages/proxy/node_modules/ts-node/src/index.ts:261
return new TSError(diagnosticText, diagnosticCodes)
^
TSError: _ Unable to compile TypeScript:
src/endpoints/feedEndpoint.ts(5,44): error TS2307: Cannot find module '@rss-proxy/core' or its corresponding type declarations.
src/endpoints/feedEndpoint.ts(15,31): error TS2345: Argument of type 'string | ParsedQs | string[] | ParsedQs[]' is not assignable to parameter of type 'string'.
Type 'ParsedQs' is not assignable to type 'string'.
src/endpoints/feedEndpoint.ts(35,31): error TS2345: Argument of type 'string | ParsedQs | string[] | ParsedQs[]' is not assignable to parameter of type 'string'.
Type 'ParsedQs' is not assignable to type 'string'.

at createTSError (/root/rss-proxy/packages/proxy/node_modules/ts-node/src/index.ts:261:12)
at getOutput (/root/rss-proxy/packages/proxy/node_modules/ts-node/src/index.ts:367:40)
at Object.compile (/root/rss-proxy/packages/proxy/node_modules/ts-node/src/index.ts:558:11)
at Module.m._compile (/root/rss-proxy/packages/proxy/node_modules/ts-node/src/index.ts:439:43)
at Module._extensions..js (internal/modules/cjs/loader.js:1157:10)
at Object.require.extensions.<computed> [as .ts] (/root/rss-proxy/packages/proxy/node_modules/ts-node/src/index.ts:442:12)
at Module.load (internal/modules/cjs/loader.js:985:32)
at Function.Module._load (internal/modules/cjs/loader.js:878:14)
at Module.require (internal/modules/cjs/loader.js:1025:19)
at require (internal/modules/cjs/helpers.js:72:18)

[nodemon] app crashed - waiting for file changes before starting...
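The TS2345 errors in this log are the kind that appear when a query value typed as 'string | ParsedQs | string[] | ParsedQs[]' is passed where a plain string is expected. A minimal TypeScript sketch of one way to narrow such a value is shown below. The ParsedQs interface is a local stand-in so the example compiles without Express, and the helper names are hypothetical. It does not address the separate TS2307 error about '@rss-proxy/core'.

```typescript
// Local stand-in for the ParsedQs type used by Express query typings.
interface ParsedQs {
  [key: string]: undefined | string | string[] | ParsedQs | ParsedQs[];
}
type QueryValue = undefined | string | string[] | ParsedQs | ParsedQs[];

// Return the value only if it is a single string.
function singleString(value: QueryValue): string | undefined {
  return typeof value === "string" ? value : undefined;
}

// Read a required query parameter, rejecting arrays and nested objects.
function requireString(query: ParsedQs, key: string): string {
  const value = singleString(query[key]);
  if (value === undefined) {
    throw new Error(`Query parameter "${key}" must be a single string`);
  }
  return value;
}

console.log(requireString({ url: "https://example.org" }, "url")); // prints https://example.org
```

After the typeof check the compiler treats the value as string, so it can be passed to a function that expects a string parameter.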

Blank screen on demo page

Hi! I'm building https://newssnips.fyi/about and wanted to use this service for websites that don't have an rss feed. Would you be interested in helping out on the project by restarting the demo site?

Or, to provide an updated copy of a working version?

Is it possible to preserve the time of <pubDate> of each item?

Thanks for this great tool. I am wondering whether the pubDate of each item in the generated RSS feed could stay fixed instead of being changed to the time when the feed is refreshed or generated. The pubDate of each item in the feed is essential for me, so it should stay unchanged. I know this is also a problem for other PAGE2RSS tools like Feed43.com. I hope there is a way to solve this issue.
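One possible approach, sketched in TypeScript under the assumption that the server may keep some state, is to remember when each item link was first seen and reuse that time on later refreshes. The class and method names are hypothetical, and the in-memory map would be lost on restart.

```typescript
// Remembers when each item link was first seen so its pubDate stays stable.
class FirstSeenStore {
  private readonly firstSeen = new Map<string, Date>();

  // Prefer a valid date found on the page; otherwise reuse the first-seen time.
  resolvePubDate(link: string, pageDate: Date | undefined, now: Date): Date {
    if (pageDate !== undefined && !Number.isNaN(pageDate.getTime())) {
      return pageDate;
    }
    const known = this.firstSeen.get(link);
    if (known !== undefined) return known;
    this.firstSeen.set(link, now);
    return now;
  }
}

const store = new FirstSeenStore();
const firstRun = store.resolvePubDate("https://example.org/a", undefined, new Date(1000));
const secondRun = store.resolvePubDate("https://example.org/a", undefined, new Date(2000));
console.log(firstRun.getTime() === secondRun.getTime()); // prints true
```

A stateless setup with all parameters in the URL cannot do this by itself, which is why the sketch is framed as an assumption rather than a description of rss-proxy.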

Version 2 Issues

Version 2 is available for beta testing under https://rssproxy.migor.org

Issues

  • page title and add og tags [ok]
  • menu may be larger than vh
  • maintenance messages should be reworded

Todos

  • grafana [ok]
  • service announcement
  • secure actuator endpoint
  • scale 2 for puppeteer
  • check throttling values
  • check v1 legacy support

How to run rss-proxy behind an nginx reverse proxy

Great tool.

Been struggling a bit to get rss-proxy to run behind an nginx reverse proxy (SWAG). The issue that arises is that rss-proxy appears to redirect to the root directory of the web server. Runs fine without the reverse proxy (http://ip_address:3000) but is exposed to the world.

My configuration:

rss-proxy running in a docker container
SWAG as the nginx reverse proxy also running in a docker container

I've used the reverse proxy to rewrite some portions of the html to route to the correct source (from / to /rss-proxy) and this gets the rendering of the page partially completed. This results in the below page source. The left panel comes up fine (Please enter the URL of the website you want an RSS Feed from) however, a call to core.js.pre-build-optimizer.js causes the right panel to fail

Http failure during parsing for https://redacted_domain_name/
Most likely the website is JavaScript generated, which is not supported by rss-proxy directly. Check the documentation for further help

Again it returns a reference back to the root directory of the web server and does not allow nginx 'sub_filter' command to make replacements.

Any help much appreciated.

================================================
nginx proxy-confs snippet:

sub_filter 'src="/'  'src="/rss-proxy/';
sub_filter_once off;

================================================
proxied page source:

<!doctype html>

<title>RSS-proxy playground</title> <script src="rss-proxy/runtime-es2015.cdfb0ddb511f65fdc0a0.js" type="module"></script><script src="rss-proxy/runtime-es5.cdfb0ddb511f65fdc0a0.js" nomodule defer></script><script src="rss-proxy/polyfills-es5.790a3785e4df737dcc1e.js" nomodule defer></script><script src="rss-proxy/polyfills-es2015.9f9a7e9d82395a8b4bf0.js" type="module"></script><script src="rss-proxy/main-es2015.73d3d495c2160e54d3e6.js" type="module"></script><script src="rss-proxy/main-es5.73d3d495c2160e54d3e6.js" nomodule defer></script>

================================================
core.js.pre-build-optimizer.js error - fetching content from web root...

ERROR Error: Uncaught (in promise): Up: {"headers":{"normalizedNames":{},"lazyUpdate":null},"status":200,"statusText":"OK","url":"https://redacted_domain_name/","ok":false,"name":"HttpErrorResponse","message":"Http failure during parsing for https://redacted_domain_name/","error":{"error":{},"text":"<html lang="en"><meta charset="utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta name="viewport" content="width=device-width,initial-scale=1,viewport-fit=cover"><meta ...

Thunderbird doesn't consider feeds generated by rss-proxy v2.1.0 as valid

https://support.mozilla.org/en-US/kb/how-subscribe-news-feeds-and-blogs

When adding an rss-proxy-generated feed, the messages 'the feed URL is not a valid feed' and ' check validation and retrieve a valid URL' are displayed in the Thunderbird 'feed subscriptions' window.

v1.0.3 (as I recall I've only used the build @0b5334a) was not affected by this issue, to my knowledge.

an example feed source: https://www.iana.org/domains/reserved (selected the feed 'in DOM with 11 articles [BEST]')


v2.1.0 @909fdd2 (Docker)
Thunderbird 102.10.0 (Linux)

What does 'undefined' mean, when fetching feeds?

I'm getting the error when attempting to 'show feeds' for some sources. I've confirmed that the webpages don't require Javascript. This doesn't affect all pages.

The error message appears on the right-hand panel.


rss-proxy v. 1.0.3 @0b5334a build on 16-0-2021

some affected websites or pages:

https://www.eurogamer.net/
https://yle.fi/uutiset/18-194469

Looks like this site does not contain any feed data.

Feed https://blog.path.net/

Looks like this site does not contain any feed data.
Most likely the website is JavaScript generated, which is not supported by rss-proxy directly.
curl https://blog.path.net/
<html>
<head><title>307 Temporary Redirect</title></head>
<body>
<center><h1>307 Temporary Redirect</h1></center>
<hr><center>openresty</center>
</body>
</html>

Looks like that site has anti-bot protection.

Error when generate RSS feeds - v2 beta

Hi,

When I try to generate feeds with the v2 beta, I get an error:
No feeds found. Unable to make field private final byte[] java.lang.String.value accessible: module java.base does not "opens java.lang" to unnamed module @3dd3bcd
image

Sometimes it works. I can check the website, but I can't generate feeds. I get another error:
java.lang.NullPointerException
image

System: debian11

Prioritize semantic HTML.

I was trying your site on Audacity's blog since it tragically lacks an RSS feed and noticed some issues with your algorithm for detecting articles, and I believe this is because you don't prioritize semantic HTML enough.

To understand what semantic HTML is, see MDN's HTML: A good basis for accessibility.

Audacity luckily does have their articles in <article> tags, which makes them extremely accessible for us, yet it's the fourth best option. (Also, the fourth option contains 10 articles, not 3.)

The actual best feed

The best one according to your algorithm selects some paragraphs in the middle of an article.

The best but actually really bad feed

You would also do best to rank feeds in <aside> and <footer> elements poorly or simply not show them to the user at all, as these clutter the bottom of the list.

Hope this is helpful!
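The suggestion above could be sketched in TypeScript as a score adjustment on top of the existing ranking. The FeedCandidate shape, the function names and the bonus and penalty values are assumptions made for illustration and do not reflect how rss-proxy actually scores candidates.

```typescript
// Assumed candidate shape for this sketch.
interface FeedCandidate {
  contextXPath: string; // XPath of the repeating article container
  score: number;        // score from the existing ranking, higher is better
}

// List the element names along an XPath, e.g. "//main/article[2]" gives ["main", "article"].
function elementNames(xpath: string): string[] {
  return xpath
    .split("/")
    .map((step) => step.replace(/\[.*$/, "").toLowerCase())
    .filter((step) => step.length > 0 && step !== ".");
}

// The bonus and penalty values are arbitrary placeholders, not tuned weights.
function semanticScore(candidate: FeedCandidate): number {
  const names = elementNames(candidate.contextXPath);
  let score = candidate.score;
  if (names.includes("article")) score += 10;
  if (names.some((n) => n === "aside" || n === "footer")) score -= 10;
  return score;
}

// Sort a copy of the candidates, best first.
function rankCandidates(candidates: FeedCandidate[]): FeedCandidate[] {
  return [...candidates].sort((a, b) => semanticScore(b) - semanticScore(a));
}
```

With these placeholder weights, a candidate inside an article element can overtake a higher-scored run of paragraphs, and candidates inside aside or footer sink toward the bottom of the list.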

APP_API_GATEWAY_URL doesn't seem to work

i'm using damoeb/rss-proxy:2.1

Either I don't understand APP_API_GATEWAY_URL or it doesn't work, but every feed URL produced by rss-proxy is invalid in my RSS reader (Miniflux). I also tested the provided URLs on several online RSS checkers, and each time I get the following error:

This website is too slow to answer : Get "http://{myVPSpublicIP:port}/api/tf?url=https%3A%2F%2Fwww.teotimepacreau.fr%2Ffeed%2Ffeed.xml&re=none&out=atom": dial tcp myVPSpublicIP:port: i/o timeout.

{myVPSpublicIP:port} stands for my real IP and port in the error message; I'm not exposing them here.

So far I've tried setting APP_API_GATEWAY_URL to the following values:

but none of them seems to work. Am I doing something wrong?

Error on Local Instance results in link pointing to migor?

My local instance of rss-proxy tried to pull in some articles on Seeking Alpha. The articles showed a title of "[ERROR" or something similar and, when clicked, pointed to "https://rssproxy.migor.org/?url=https%3A%2F%2Fseekingalpha.com%2Fauthor%2Fhigh-yield-investor%2Fpremium-articles%23". Shouldn't it still be pointing to my local instance? Going directly to migor showed it was giving a "gateway error", but the local instance was responding fine.
