damoeb / rss-proxy
RSS-proxy allows you to create an RSS or ATOM feed of almost any website, just by analyzing the static HTML structure.
Home Page: https://rssproxy.migor.org
Thanks for this great tool. I'm wondering whether the pubDate of each item in the generated RSS feed could stay fixed instead of being changed to the time at which the feed is refreshed / generated. The pubDate of each item is essential for me, so it should stay unchanged. I know this is also a problem for other PAGE2RSS tools like Feed43.com. I wish there were a way to solve this issue.
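One common way to keep pubDate stable is to remember when each item was first seen and reuse that timestamp on every later regeneration. A minimal sketch, assuming the item link is a stable identifier; FeedEntry and StablePubDates are hypothetical names, and the in-memory Map would have to be persisted by a real instance, since it is lost on restart.

```typescript
// Hypothetical item shape; a real feed item carries more fields.
interface FeedEntry {
  link: string;
  title: string;
  pubDate?: Date;
}

// Remembers the first time each link was seen (assumption: the link identifies the item).
class StablePubDates {
  private firstSeen = new Map<string, Date>();

  // Returns the items with pubDate fixed to the first-seen time,
  // unless the item already carries its own date from the source page.
  apply(items: FeedEntry[], now: Date = new Date()): FeedEntry[] {
    return items.map((item) => {
      if (item.pubDate) {
        return item;
      }
      let seen = this.firstSeen.get(item.link);
      if (!seen) {
        seen = now;
        this.firstSeen.set(item.link, seen);
      }
      return { ...item, pubDate: seen };
    });
  }
}
```

A date extracted from the page itself is always preferred; the first-seen time is only a fallback that no longer moves once recorded.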
I'm getting an error when attempting to 'show feeds' for some sources. I've confirmed that the webpages don't require JavaScript. This doesn't affect all pages.
The error message appears on the right-hand panel.
rss-proxy v. 1.0.3 @0b5334a build on 16-0-2021
some affected websites or pages:
https://www.eurogamer.net/
https://yle.fi/uutiset/18-194469
Add a query param to define if a native feed should be returned, if one is available. Not applicable if there are multiple feeds defined.
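The proposed rule can be sketched as a small decision function. The parameter name preferNative and the NativeFeed shape are hypothetical; the rule follows the request: return the native feed only when the flag is set and exactly one native feed is available, otherwise generate a feed.

```typescript
// Hypothetical description of a feed advertised by the page itself
// (e.g. via <link rel="alternate" type="application/rss+xml">).
interface NativeFeed {
  url: string;
  type: 'rss' | 'atom';
}

type FeedDecision = { kind: 'native'; url: string } | { kind: 'generate' };

// Decides whether to hand out the site's own feed or to generate one.
// `preferNative` is a hypothetical query parameter, e.g. ?preferNative=true
function decideFeed(query: Record<string, string | undefined>, nativeFeeds: NativeFeed[]): FeedDecision {
  const preferNative = query['preferNative'] === 'true';
  // Not applicable when the page defines several feeds: there is no safe choice.
  if (preferNative && nativeFeeds.length === 1) {
    return { kind: 'native', url: nativeFeeds[0].url };
  }
  return { kind: 'generate' };
}
```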
Hi,
Starting from a fresh git checkout, the "Developing RSS-proxy" section says to run npm run install, but the command actually fails (tested on macOS with npm 6.13.4 and npm 7.7.6):
tmuguet-2:rss-proxy tmuguet$ npm run install
npm ERR! missing script: install
npm ERR!
npm ERR! Did you mean this?
npm ERR! preinstall
Running npm run preinstall seems to work. Is that the correct command?
cd packages/proxy && npm run start
Getting this
[nodemon] 1.19.4
[nodemon] to restart at any time, enter rs
[nodemon] watching dir(s): src/**/*
[nodemon] watching extensions: ts
[nodemon] starting ts-node ./src/app.ts
/root/rss-proxy/packages/proxy/node_modules/ts-node/src/index.ts:261
return new TSError(diagnosticText, diagnosticCodes)
^
TSError: _ Unable to compile TypeScript:
src/endpoints/feedEndpoint.ts(5,44): error TS2307: Cannot find module '@rss-proxy/core' or its corresponding type declarations.
src/endpoints/feedEndpoint.ts(15,31): error TS2345: Argument of type 'string | ParsedQs | string[] | ParsedQs[]' is not assignable to parameter of type 'string'.
Type 'ParsedQs' is not assignable to type 'string'.
src/endpoints/feedEndpoint.ts(35,31): error TS2345: Argument of type 'string | ParsedQs | string[] | ParsedQs[]' is not assignable to parameter of type 'string'.
Type 'ParsedQs' is not assignable to type 'string'.
at createTSError (/root/rss-proxy/packages/proxy/node_modules/ts-node/src/index.ts:261:12)
at getOutput (/root/rss-proxy/packages/proxy/node_modules/ts-node/src/index.ts:367:40)
at Object.compile (/root/rss-proxy/packages/proxy/node_modules/ts-node/src/index.ts:558:11)
at Module.m._compile (/root/rss-proxy/packages/proxy/node_modules/ts-node/src/index.ts:439:43)
at Module._extensions..js (internal/modules/cjs/loader.js:1157:10)
at Object.require.extensions.<computed> [as .ts] (/root/rss-proxy/packages/proxy/node_modules/ts-node/src/index.ts:442:12)
at Module.load (internal/modules/cjs/loader.js:985:32)
at Function.Module._load (internal/modules/cjs/loader.js:878:14)
at Module.require (internal/modules/cjs/loader.js:1025:19)
at require (internal/modules/cjs/helpers.js:72:18)
[nodemon] app crashed - waiting for file changes before starting...
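The two TS2345 errors arise because an Express-style req.query value is typed as string | ParsedQs | string[] | ParsedQs[], while the called function expects a string. A minimal sketch of narrowing such a value first; the ParsedQs interface below is a simplified local stand-in for the type from the qs package, and queryParamAsString is a hypothetical helper. The TS2307 error (the module @rss-proxy/core cannot be found) is a separate problem and is not addressed here.

```typescript
// Simplified stand-in for `ParsedQs` from the `qs` package (assumption: reduced shape).
interface ParsedQs {
  [key: string]: undefined | string | string[] | ParsedQs | ParsedQs[];
}

type QueryValue = undefined | string | string[] | ParsedQs | ParsedQs[];

// Narrows a query value to a single string, or returns undefined if that is impossible.
// For repeated params (?url=a&url=b) the first string value wins.
function queryParamAsString(value: QueryValue): string | undefined {
  if (typeof value === 'string') {
    return value;
  }
  if (Array.isArray(value)) {
    const first = value[0];
    if (typeof first === 'string') {
      return first;
    }
  }
  return undefined;
}
```

In a handler this would be used as queryParamAsString(req.query.url), answering with an error status when the result is undefined.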
There is a collection of things that should still be addressed for v1:
Open
Done
I want to anonymize the app's requests. Is there any way to use an HTTP proxy?
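For plain http:// targets, sending a request through a forward HTTP proxy means connecting to the proxy and putting the absolute target URL in the request line. A minimal sketch that builds such request options; viaHttpProxy and the option shape (loosely mirroring the options of Node's http.request) are hypothetical, and https:// targets would additionally need a CONNECT tunnel, which this sketch does not handle.

```typescript
// Options shape loosely mirroring Node's http.request options (assumption: only these fields).
interface ProxiedRequestOptions {
  host: string;
  port: number;
  path: string;
  headers: Record<string, string>;
}

// Builds options for sending a plain-HTTP request through a forward proxy.
// The proxy receives the absolute target URL as the request path ("absolute-form").
function viaHttpProxy(targetUrl: string, proxyUrl: string): ProxiedRequestOptions {
  const target = new URL(targetUrl);
  const proxy = new URL(proxyUrl);
  if (target.protocol !== 'http:') {
    throw new Error('only http:// targets are supported; https:// needs a CONNECT tunnel');
  }
  return {
    host: proxy.hostname,
    port: proxy.port ? Number(proxy.port) : 80,
    path: target.href,
    headers: { Host: target.host },
  };
}
```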
Feed: https://chromium-disclosed-bugs.appspot.com/
Error:
Http failure response for https://rssproxy-v1.migor.org/api/feed/live?url=https%3A%2F%2Fchromium-disclosed-bugs.appspot.com%2F: 504 Gateway Time-out
Most likely the website is JavaScript generated, which is not supported by rss-proxy directly. Check the documentation for further help.
But there seems to be no JS involved (I can get the content via curl).
Hi,
rss-proxy is really promising!
Do you plan to make an ARM-compatible Docker image?
If not, where do you think I should start to build such an image? Should I build the container from the source code?
Thanks
I'm using damoeb/rss-proxy:2.1.
Either I don't understand APP_API_GATEWAY_URL or it doesn't work: every feed URL produced by rss-proxy is invalid in my RSS reader (Miniflux). I also tested the URLs on several online RSS checkers, and each time I get the following error:
This website is too slow to answer : Get "http://{myVPSpublicIP:port}/api/tf?url=https%3A%2F%2Fwww.teotimepacreau.fr%2Ffeed%2Ffeed.xml&re=none&out=atom": dial tcp myVPSpublicIP:port: i/o timeout.
{myVPSpublicIP:port} stands in for the real IP and port shown in the error message, which I'm not exposing here.
So far I've tried setting APP_API_GATEWAY_URL to the following values:
but none of them seem to work. Am I doing something wrong?
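Judging from the error above, the generated feed URLs appear to be prefixed with the configured gateway URL, so the feed reader requests {gateway}/api/tf?... itself; an i/o timeout then suggests the reader cannot reach that host and port. A minimal sketch of how such a URL is assembled, assuming APP_API_GATEWAY_URL must be an absolute, publicly reachable base URL. The parameter names url, re and out are copied from the error message; buildTransformFeedUrl is a hypothetical helper, not rss-proxy's code.

```typescript
// Hypothetical helper mirroring the URL shape seen in the error message:
//   {gateway}/api/tf?url=<encoded source>&re=none&out=atom
function buildTransformFeedUrl(gatewayUrl: string, sourceUrl: string, re = 'none', out = 'atom'): string {
  // The gateway must be an absolute URL that the feed reader itself can reach.
  if (!/^https?:\/\//.test(gatewayUrl)) {
    throw new Error(`gateway URL must start with http:// or https://, got "${gatewayUrl}"`);
  }
  const base = gatewayUrl.replace(/\/+$/, ''); // drop trailing slashes
  const query = [
    `url=${encodeURIComponent(sourceUrl)}`,
    `re=${encodeURIComponent(re)}`,
    `out=${encodeURIComponent(out)}`,
  ].join('&');
  return `${base}/api/tf?${query}`;
}
```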
Not sure how this is supposed to work. The "Your Atom Feed" button just points to the source URL, and switching to "RSS" does not change this. Where is the generated feed?
I was trying your site on Audacity's blog, since it tragically lacks an RSS feed, and noticed some issues with your algorithm for detecting articles. I believe this is because you don't prioritize semantic HTML enough.
To understand what semantic HTML is, see MDN's HTML: A good basis for accessibility.
Luckily, Audacity does have their articles in <article> tags, which makes them extremely accessible for us, yet that candidate is only the fourth-best option. (Also, the fourth option contains 10 articles, not 3.)
The best one according to your algorithm selects some paragraphs in the middle of an article.
You would also do well to rank feeds inside <aside> and <footer> elements low, or simply not show them to the user at all, as these clutter the bottom of the list.
Hope this is helpful!
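The suggestion can be sketched as a score adjustment applied after the existing ranking: boost candidates whose repeated element is an <article>, and demote (or drop) candidates inside <aside> or <footer>. A minimal sketch; the FeedCandidate shape and both weights are hypothetical, not rss-proxy's actual scoring.

```typescript
// Hypothetical candidate: tag of the repeated item element, tags of its ancestors
// (outermost first) and the score assigned by the existing heuristic.
interface FeedCandidate {
  id: string;
  itemTag: string;
  ancestorTags: string[];
  score: number;
}

const ARTICLE_BOOST = 2.0; // hypothetical weight
const SIDEBAR_PENALTY = 0.25; // hypothetical weight

// True if the candidate lives inside <aside> or <footer>.
function inSidebar(c: FeedCandidate): boolean {
  return c.ancestorTags.some((t) => ['aside', 'footer'].includes(t.toLowerCase()));
}

// Re-ranks candidates, best first; optionally hides sidebar/footer candidates entirely.
function adjustForSemantics(candidates: FeedCandidate[], dropSidebars = false): FeedCandidate[] {
  return candidates
    .filter((c) => !(dropSidebars && inSidebar(c)))
    .map((c) => {
      let score = c.score;
      if (c.itemTag.toLowerCase() === 'article') score *= ARTICLE_BOOST;
      if (inSidebar(c)) score *= SIDEBAR_PENALTY;
      return { ...c, score };
    })
    .sort((a, b) => b.score - a.score);
}
```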
If a website's DOM changes, this will most likely result in a broken or empty feed. In that case it would be nice to see a feed entry describing the problem and linking to the rss-proxy instance where you can patch it. As a consequence, it might be better to turn the current stateless design (all params in the URL) into a stateful server, because then patching a broken feed would not result in a new URL.
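Replacing an empty result with an explanatory entry can be sketched as follows. A minimal sketch, assuming a stateful setup in which each feed has an id and the instance knows its own public base URL; FeedItem, withBrokenFeedNotice and the /feeds/{id}/edit path are all hypothetical.

```typescript
interface FeedItem {
  title: string;
  link: string;
  description: string;
}

// If extraction produced no items (e.g. because the site's DOM changed), return a single
// entry that explains the problem and links back to the rss-proxy instance for patching.
// `instanceBaseUrl`, `feedId` and the edit path are hypothetical.
function withBrokenFeedNotice(items: FeedItem[], instanceBaseUrl: string, feedId: string): FeedItem[] {
  if (items.length > 0) {
    return items;
  }
  const editLink = `${instanceBaseUrl.replace(/\/+$/, '')}/feeds/${encodeURIComponent(feedId)}/edit`;
  return [
    {
      title: 'Feed is broken: no articles found',
      link: editLink,
      description: 'The page structure probably changed. Open the link to fix the feed definition.',
    },
  ];
}
```

Building the link from the instance's own base URL, rather than a fixed host, keeps such entries pointing at whichever instance served the feed.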
Is that image name right?
I found damoeb/rss-proxy:2.0.0-beta but no damoeb/rss-proxy:2
Hi,
When I try to generate feeds with the v2 beta, I get an error:
No feeds found. Unable to make field private final byte[] java.lang.String.value accessible: module java.base does not "opens java.lang" to unnamed module @3dd3bcd
Sometimes it works: I can check the website, but I can't generate feeds. Then I get another error:
java.lang.NullPointerException
System: debian11
https://rssproxy.migor.org/api/w2f?v=0.1&url=https%3A%2F%2Fzksecurity.xyz%2Fblog%2F&link=.%2Fa%5B1%5D&context=%2F%2Fdiv%5B1%5D%2Fdiv%5B1%5D%2Fdiv%5B1%5D%2Fdiv&date=.%2Fdiv%5B1%5D%2Ftime%5B1%5D&re=none&out=atom&debug=true
In the feed generated by the link above, the published time is not the original post's publish time but the RSS generation time.
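Decoding the query parameters of the link above shows which XPath expressions were used; the date parameter points at a <time> element. A minimal sketch of the intended behaviour, assuming the original publish time is available as that element's datetime attribute (an assumption; some pages only provide text). describeW2fLink and resolvePublished are hypothetical helpers; the latter falls back to the generation time only when nothing parseable was found.

```typescript
// Decodes the XPath-related parameters of a w2f link.
function describeW2fLink(link: string): { context: string | null; link: string | null; date: string | null } {
  const params = new URL(link).searchParams;
  return { context: params.get('context'), link: params.get('link'), date: params.get('date') };
}

// Prefers the page's own timestamp (e.g. <time datetime="2023-05-01">) over the generation time.
function resolvePublished(datetimeAttr: string | null, generatedAt: Date): Date {
  if (datetimeAttr) {
    const parsed = new Date(datetimeAttr.trim());
    if (!Number.isNaN(parsed.getTime())) {
      return parsed;
    }
  }
  return generatedAt;
}
```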
It seems like the linked demoserver is down: https://rssproxy-v1.migor.org/
Hi! I'm building https://newssnips.fyi/about and wanted to use this service for websites that don't have an rss feed. Would you be interested in helping out on the project by restarting the demo site?
Or would you be able to provide an updated copy of a working version?
Looks like this site does not contain any feed data.
Most likely the website is JavaScript generated, which is not supported by rss-proxy directly.
curl https://blog.path.net/
<html>
<head><title>307 Temporary Redirect</title></head>
<body>
<center><h1>307 Temporary Redirect</h1></center>
<hr><center>openresty</center>
</body>
</html>
Looks like that site has anti-bot protection.
Normally the context of an article is a single node. Some websites, like HN or arxiv.org, split this article unit into two separate nodes. This case can be expressed in XPath (https://stackoverflow.com/a/16584668/807017), so it might be worth giving it a try.
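Besides the XPath approach from the linked answer, the split case can be handled by grouping each "head" node with its following sibling before extracting fields. A minimal sketch on a simplified node list; SiblingNode, pairSplitArticles and the Hacker News-style row classes used in the example (athing for the title row, spacer between items) are assumptions for illustration.

```typescript
// Simplified stand-in for sibling DOM nodes (hypothetical shape).
interface SiblingNode {
  tag: string;
  className: string;
  text: string;
}

// Groups each head node with the sibling that immediately follows it, so that
// one article split across two nodes becomes one unit again.
function pairSplitArticles(siblings: SiblingNode[], isHead: (n: SiblingNode) => boolean): SiblingNode[][] {
  const articles: SiblingNode[][] = [];
  for (let i = 0; i < siblings.length; i++) {
    if (!isHead(siblings[i])) continue;
    const next = siblings[i + 1];
    // Only attach the follower if it is not itself the start of the next article.
    if (next !== undefined && !isHead(next)) {
      articles.push([siblings[i], next]);
      i++; // skip the follower we just consumed
    } else {
      articles.push([siblings[i]]);
    }
  }
  return articles;
}
```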
My local instance of rss-proxy tried to pull in some articles from Seeking Alpha. The articles showed a title of "[ERROR" or something similar and, when clicked, pointed to "https://rssproxy.migor.org/?url=https%3A%2F%2Fseekingalpha.com%2Fauthor%2Fhigh-yield-investor%2Fpremium-articles%23". Shouldn't it still point to my local instance? Going directly to migor showed a "gateway error", but the local instance was responding fine.
Can you implement a preview of the generated feeds?
My use case:
Great tool.
I've been struggling a bit to get rss-proxy to run behind an nginx reverse proxy (SWAG). The issue is that rss-proxy appears to redirect to the root directory of the web server. It runs fine without the reverse proxy (http://ip_address:3000), but then it is exposed to the world.
My configuration:
rss-proxy running in a docker container
SWAG as the nginx reverse proxy also running in a docker container
I've used the reverse proxy to rewrite some portions of the HTML so they route to the correct source (from / to /rss-proxy), which gets the page partially rendered and results in the page source below. The left panel comes up fine ("Please enter the URL of the website you want an RSS Feed from"); however, a call to core.js.pre-build-optimizer.js causes the right panel to fail with:
Http failure during parsing for https://redacted_domain_name/
Most likely the website is JavaScript generated, which is not supported by rss-proxy directly. Check the documentation for further help
Again, it refers back to the root directory of the web server, and nginx's 'sub_filter' directive does not manage to replace it.
Any help much appreciated.
================================================
nginx proxy-confs snippet:
sub_filter 'src="/' 'src="/rss-proxy/';
sub_filter_once off;
================================================
proxied page source:
<!doctype html>
<title>RSS-proxy playground</title>
<script src="rss-proxy/runtime-es2015.cdfb0ddb511f65fdc0a0.js" type="module"></script>
<script src="rss-proxy/runtime-es5.cdfb0ddb511f65fdc0a0.js" nomodule defer></script>
<script src="rss-proxy/polyfills-es5.790a3785e4df737dcc1e.js" nomodule defer></script>
<script src="rss-proxy/polyfills-es2015.9f9a7e9d82395a8b4bf0.js" type="module"></script>
<script src="rss-proxy/main-es2015.73d3d495c2160e54d3e6.js" type="module"></script>
<script src="rss-proxy/main-es5.73d3d495c2160e54d3e6.js" nomodule defer></script>
================================================
core.js.pre-build-optimizer.js error - fetching content from web root...
ERROR Error: Uncaught (in promise): Up: {"headers":{"normalizedNames":{},"lazyUpdate":null},"status":200,"statusText":"OK","url":"https://redacted_domain_name/","ok":false,"name":"HttpErrorResponse","message":"Http failure during parsing for https://redacted_domain_name/","error":{"error":{},"text":"<html lang="en"><meta charset="utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta name="viewport" content="width=device-width,initial-scale=1,viewport-fit=cover"><meta ...
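The behaviour described above is consistent with the app requesting a root-relative path: such a path always resolves against the domain root, whatever sub-path the app is served under. If that path is assembled inside the JavaScript bundle, the default sub_filter configuration (which only processes text/html responses) will not rewrite it. The difference can be shown with standard WHATWG URL resolution; the paths below are illustrative, not rss-proxy's actual endpoints.

```typescript
// Resolves a request path against the page's base URL the way browsers do.
function resolveAgainst(base: string, path: string): string {
  return new URL(path, base).href;
}

const servedUnder = 'https://example.org/rss-proxy/';

// A root-relative path ignores the /rss-proxy/ prefix ...
const rootRelative = resolveAgainst(servedUnder, '/api/feed');
// ... while a relative path stays under it.
const relative = resolveAgainst(servedUnder, 'api/feed');

console.log(rootRelative); // https://example.org/api/feed
console.log(relative); // https://example.org/rss-proxy/api/feed
```

The page source above looks like an Angular CLI build; serving such an app under a sub-path usually also requires building it with a matching --base-href.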
Feed could not be parsed correctly.
webpage: https://www.heise.de
error:
https://rssproxy.migor.org/?reason=Unable to make field private final byte[] java.lang.String.value accessible: module java.base does not "opens java.lang" to unnamed module @531be3c5&url=https%3A%2F%2Fwww.heise.de
2022-07-12T13:09:15-+0000
Unable to make field private final byte[] java.lang.String.value accessible: module java.base does not "opens java.lang" to unnamed module @531be3c5
Reproducible.
greets
m
Suggestion: Browserless is a popular headless browser project and very simple to use. Maintaining an independent puppeteer Docker image may be painful at times, so browserless is a good choice :)
Puppeteer allows you to specify a remote location for chrome via the browserWSEndpoint option. Setting this for browserless is a single line of code change.
Before
const browser = await puppeteer.launch();
After
const browser = await puppeteer.connect({ browserWSEndpoint: 'ws://localhost:3000' });
Feedparser should use semantic information like http://microformats.org/ or https://schema.org/.
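One source of such semantic information is schema.org markup embedded as JSON-LD (<script type="application/ld+json">). A minimal sketch that turns the contents of such script blocks into feed entries, assuming the page uses the Article, NewsArticle or BlogPosting types with the headline, url and datePublished properties; SemanticEntry and entriesFromJsonLd are hypothetical names, and microformats are not covered.

```typescript
interface SemanticEntry {
  title: string;
  url: string;
  published?: string;
}

const ARTICLE_TYPES = new Set(['Article', 'NewsArticle', 'BlogPosting']);

// Extracts article entries from the text of JSON-LD script blocks.
function entriesFromJsonLd(scriptContents: string[]): SemanticEntry[] {
  const entries: SemanticEntry[] = [];
  const visit = (node: unknown): void => {
    if (Array.isArray(node)) {
      node.forEach(visit);
      return;
    }
    if (node === null || typeof node !== 'object') return;
    const obj = node as Record<string, unknown>;
    if (Array.isArray(obj['@graph'])) visit(obj['@graph']); // @graph bundles several nodes
    // @type may be a single string or an array of strings.
    const types = ([] as unknown[]).concat(obj['@type']);
    const isArticle = types.some((t) => typeof t === 'string' && ARTICLE_TYPES.has(t));
    const headline = obj['headline'];
    const url = obj['url'];
    const published = obj['datePublished'];
    if (isArticle && typeof headline === 'string' && typeof url === 'string') {
      entries.push({ title: headline, url, published: typeof published === 'string' ? published : undefined });
    }
  };
  for (const text of scriptContents) {
    try {
      visit(JSON.parse(text));
    } catch {
      // Skip malformed JSON-LD blocks instead of failing the whole page.
    }
  }
  return entries;
}
```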
Hi, I've tested this on the mirror at https://rssproxy.blyat.org/
It seems that every feed I generate
Example:
The POC deployment is still available and in use. I will decommission it in about a month. Feeds served from this instance will then render an error and invite the user to set them up again in rss-proxy v1.
After that, I will reconfigure the routes so that the rssproxy subdomain points to v1.
Url of POC: https://rssproxy.migor.org/
Url of v1: https://rssproxy-v1.migor.org/
Version 2 is available for beta testing under https://rssproxy.migor.org
Issues
Todos
Great tool, thank you.
Would it be easy to make a full-text feed available when retrieving the RSS?
For example: https://www.airforcetimes.com/news/
I found out that you are solving my problem. Thank you.
I can't pay a lot of money, but I'm willing to pay.
It should be easy to use even for non-developers, with just a little knowledge. No-code!!!
Many Korean sites do not provide RSS feeds. I figured out how to find their JSON data.
It would be very convenient if you could convert it to an RSS feed. If this were solved, you would be able to create RSS feeds for services that do not provide them.
Your service cannot designate a title or set a link directly. I want this kind of service:
https://github.com/RSS-Bridge/rss-bridge/blob/master/bridges/XPathBridge.php
// SMTOWN
https://www.vlive.tv/channel/FD53B/board/3530
https://www.vlive.tv/channel/FD53B/board/69
// aespa
https://www.vlive.tv/channel/97CCED/board/7553
https://www.vlive.tv/channel/97CCED/board/7375
// V MUSICAL
https://www.vlive.tv/channel/EDE229/board/4029
V LIVE : LIVE STREAM, FAN COMMUNNITY
SM Entertainment has closed the artist's official site. From now on, they will post announcements through V LIVE.
New video, New Post
I looked for something that might help.
https://github.com/gdegauquier/vlive-api
https://github.com/robindz/VSharp
https://github.com/Seklfreak/Robyul2/blob/master/modules/plugins/vlive.go
https://json2feed.ssig33.com/
https://github.com/ssig33/json2feed
e.g.
NAVER POST
https://post.naver.com/my.nhn?memberNo=41350603
JSON (async/ajax/api):
https://m.post.naver.com/async/my.nhn?memberNo=41350603&postListViewType=0&isExpertMy=true
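The requested JSON-to-RSS conversion can be sketched as a mapping from JSON records to RSS 2.0 <item> elements, where the user names the fields holding the title, link and date (which also covers designating a title or setting a link). A minimal sketch; JsonFeedMapping, jsonToRss and the example field names are hypothetical, and the real NAVER or V LIVE responses may nest their lists differently.

```typescript
// Which JSON fields hold the item data (hypothetical; chosen by the user per site).
interface JsonFeedMapping {
  titleField: string;
  linkField: string;
  dateField?: string;
}

// Escapes the five XML special characters.
function xmlEscape(s: string): string {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

function jsonToRss(channelTitle: string, channelLink: string, records: Record<string, unknown>[], map: JsonFeedMapping): string {
  const items = records.map((r) => {
    const parts = [
      `<title>${xmlEscape(String(r[map.titleField] ?? ''))}</title>`,
      `<link>${xmlEscape(String(r[map.linkField] ?? ''))}</link>`,
    ];
    if (map.dateField && r[map.dateField] !== undefined) {
      const d = new Date(String(r[map.dateField]));
      // RSS 2.0 expects RFC 822 dates; toUTCString() produces that format.
      if (!Number.isNaN(d.getTime())) parts.push(`<pubDate>${d.toUTCString()}</pubDate>`);
    }
    return `<item>${parts.join('')}</item>`;
  });
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    `<rss version="2.0"><channel><title>${xmlEscape(channelTitle)}</title>` +
    `<link>${xmlEscape(channelLink)}</link><description>${xmlEscape(channelTitle)}</description>` +
    `${items.join('')}</channel></rss>`
  );
}
```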
None of the choices offered for https://www.fool.com/author/20415 appear to be the list of 21 articles on the page. Is there a tweak I can make to help it out?
Whenever I run docker-compose up, I get this error:
Starting rssprodxy_rich-puppeteer_1 ...
Starting rssprodxy_rich-puppeteer_1 ... error
ERROR: for rssprodxy_rich-puppeteer_1 Cannot start service rich-puppeteer: Decoding seccomp profile failed: invalid character 'c' looking for beginning of value
ERROR: for rich-puppeteer Cannot start service rich-puppeteer: Decoding seccomp profile failed: invalid character 'c' looking for beginning of value
ERROR: Encountered errors while bringing up the project.
How do I fix this?
https://support.mozilla.org/en-US/kb/how-subscribe-news-feeds-and-blogs
When adding an rss-proxy-generated feed, the messages 'the feed URL is not a valid feed' and ' check validation and retrieve a valid URL' are displayed in the Thunderbird 'feed subscriptions' window.
v1.0.3 (as I recall, I only used build @0b5334a) was not affected by this issue, to my knowledge.
an example feed source: https://www.iana.org/domains/reserved
(selected the feed 'in DOM with 11 articles [BEST]')
v2.1.0 @909fdd2 (Docker)
Thunderbird 102.10.0 (Linux)
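One quick way to narrow such a report down is to check what the generated feed URL actually returns, since Thunderbird expects an RSS or Atom document. A minimal heuristic sketch, not a validator: sniffFeedFormat (a hypothetical name) looks at the first element of the response body after any XML declaration, comments or doctype.

```typescript
type FeedFormat = 'rss' | 'atom' | 'rdf' | 'html' | 'unknown';

// Heuristically classifies a response body by its first element (not a full XML parse).
function sniffFeedFormat(body: string): FeedFormat {
  const withoutProlog = body
    .replace(/^\uFEFF/, '') // byte order mark
    .replace(/<\?[\s\S]*?\?>/g, '') // XML declaration / processing instructions
    .replace(/<!--[\s\S]*?-->/g, '') // comments
    .replace(/<!DOCTYPE[^>]*>/gi, '')
    .trim();
  const match = /^<([A-Za-z_][\w.:-]*)/.exec(withoutProlog);
  if (!match) return 'unknown';
  const name = match[1].toLowerCase();
  if (name === 'rss') return 'rss';
  if (name === 'feed') return 'atom';
  if (name === 'rdf:rdf') return 'rdf'; // RSS 1.0
  if (name === 'html') return 'html';
  return 'unknown';
}
```

If the body turns out to be HTML or JSON, the problem lies in what the URL returns rather than in the feed markup itself.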
root@debian:~# ps aux | sort -k4,4nr | head -n 10
www 5021 0.5 7.5 1876584 304204 ? Sl Jul08 67:48 /usr/lib/chromium-browser/chromium-browser --no-sandbox --disable-gpu --user-data-dir --window-size=1280,1024 --window-position=0,0 --enable-pinch
www 19776 0.0 2.4 598004 98552 ? Ssl Jul11 0:53 /opt/rss-proxy/node_modules/puppeteer/.local-chromium/linux-818858/chrome-linux/chrome --disable-background-networking --enable-features=NetworkService,NetworkServiceInProcess --disable-background-timer-throttling --disable-backgrounding-occluded-windows --disable-breakpad --disable-client-side-phishing-detection --disable-component-extensions-with-background-pages --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-features=TranslateUI --disable-hang-monitor --disable-ipc-flooding-protection --disable-popup-blocking --disable-prompt-on-repost --disable-renderer-backgrounding --disable-sync --force-color-profile=srgb --metrics-recording-only --no-first-run --enable-automation --password-store=basic --use-mock-keychain --enable-blink-features=IdleDetection --headless --hide-scrollbars --mute-audio about:blank --disable-dev-shm-usage --disable-default-apps --disable-extensions --disable-gpu --disable-sync --disable-translate --mute-audio --no-first-run --no-sandbox --safebrowsing-disable-auto-update --remote-debugging-port=0 --user-data-dir=/tmp/puppeteer_dev_chrome_profile-UiOeRn
There is not much information in the wiki: https://github.com/damoeb/rss-proxy/wiki/Filters
The following characters, in addition to space, don't work as separators:
,
;
|
Are we limited to a single filter for both include and exclude filter fields?
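Multiple terms per filter field could work along these lines: split the field on commas, semicolons or pipes, then keep an item if it matches any include term (or none are given) and no exclude term. A minimal sketch; the separator set, parseFilterTerms and passesFilters are hypothetical, since the wiki does not document how rss-proxy parses these fields.

```typescript
// Splits a filter field into trimmed, lower-cased terms (hypothetical separators: , ; |).
function parseFilterTerms(field: string): string[] {
  return field
    .split(/[,;|]/)
    .map((t) => t.trim().toLowerCase())
    .filter((t) => t.length > 0);
}

// Keeps an item when it matches at least one include term (if any are given)
// and none of the exclude terms. Matching is a case-insensitive substring test.
function passesFilters(text: string, includeField: string, excludeField: string): boolean {
  const haystack = text.toLowerCase();
  const include = parseFilterTerms(includeField);
  const exclude = parseFilterTerms(excludeField);
  const included = include.length === 0 || include.some((t) => haystack.includes(t));
  const excluded = exclude.some((t) => haystack.includes(t));
  return included && !excluded;
}
```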
As the title says: when multiple feeds are extracted and they are all valid, it would be better if we could select multiple feeds.
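Selecting several candidates could amount to merging their items. A minimal sketch, assuming items carry a link and an optional date; MergeableItem and mergeSelectedFeeds are hypothetical names. Duplicates are removed by link (first occurrence wins) and the result is sorted newest first, with undated items at the end.

```typescript
interface MergeableItem {
  link: string;
  title: string;
  published?: Date;
}

// Merges the items of several selected feed candidates into one list.
function mergeSelectedFeeds(feeds: MergeableItem[][]): MergeableItem[] {
  const byLink = new Map<string, MergeableItem>();
  for (const feed of feeds) {
    for (const item of feed) {
      if (!byLink.has(item.link)) byLink.set(item.link, item); // first occurrence wins
    }
  }
  return Array.from(byLink.values()).sort((a, b) => {
    const ta = a.published?.getTime();
    const tb = b.published?.getTime();
    if (ta === undefined && tb === undefined) return 0;
    if (ta === undefined) return 1; // undated items go last
    if (tb === undefined) return -1;
    return tb - ta; // newest first
  });
}
```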
Hello,
I am trying to get a feed for firmware updates on the ASUS site, but I can't get the feed to work properly and need help:
https://www.asus.com/Networking-IoT-Servers/WiFi-Routers/ASUS-WiFi-Routers/RT-AC5300/HelpDesk_BIOS/
Is it possible?
Thanks
I am working on version 1 of rss-proxy, which will bring a better algorithm for detecting feed patterns and a visualization of them in the website of interest. The scoring of feed candidates has also been reworked, and a simple filtering option has been added.
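A common starting point for detecting feed patterns in static HTML is to look for element paths that repeat many times, since a list of articles usually appears as many siblings sharing the same structure. A minimal sketch of that idea on precomputed tag paths; repeatedPathCandidates and its threshold are hypothetical, and this is a generic illustration rather than rss-proxy's actual algorithm or scoring.

```typescript
interface PathCandidate {
  path: string;
  count: number;
}

// Counts how often each element path (e.g. "html/body/main/ul/li") occurs and returns
// the paths that repeat at least `minRepeats` times, most frequent first.
function repeatedPathCandidates(paths: string[], minRepeats = 3): PathCandidate[] {
  const counts = new Map<string, number>();
  for (const p of paths) {
    counts.set(p, (counts.get(p) ?? 0) + 1);
  }
  return Array.from(counts, ([path, count]) => ({ path, count }))
    .filter((c) => c.count >= minRepeats)
    .sort((a, b) => b.count - a.count || a.path.localeCompare(b.path));
}
```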