Comments (8)
Currently there's no way to add headers to the HTTP request, but if you want to send a PR, I'd be happy to take a look. We set the user agent to 'rss-parser' right now.
Otherwise, I'd recommend doing the HTTP request yourself and just using parseString instead of parseURL.
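A minimal sketch of that suggestion, assuming Node 18+ (for the global fetch) and the promise-based parseString from rss-parser v3. The helper name fetchFeedXml and the header values are illustrative and not part of rss-parser.

```javascript
// Hypothetical helper: perform the HTTP request ourselves so we control the headers.
// `fetchImpl` is injectable so the function can be exercised without a network.
async function fetchFeedXml(url, headers = {}, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(url, { headers });
  if (!res.ok) {
    throw new Error('Request failed with status ' + res.status);
  }
  // Return the raw body; parsing is left to the caller.
  return res.text();
}

// Usage (assumes `npm install rss-parser` and its v3 promise API):
//   const Parser = require('rss-parser');
//   const xml = await fetchFeedXml(feedUrl, { 'User-Agent': 'my-app/1.0' });
//   const feed = await new Parser().parseString(xml);
```

Keeping the request separate from the parsing means any HTTP client can be swapped in without touching the parser.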
from rss-parser.
I would suggest setting the Accept header to tell it what content-type you want back, probably application/rss+xml or text/xml.
It probably makes sense for the parser to pass this header automatically as well.
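As a rough illustration of the Accept suggestion, the sketch below merges caller-supplied headers with a default Accept value built from the two media types named above. The function name withFeedAccept is hypothetical, and the merge is case-sensitive, so a caller passing a lowercase 'accept' key would end up with both keys.

```javascript
// Hypothetical helper: add a default Accept header advertising feed media types.
// Caller-supplied headers win over the default when the key matches exactly.
function withFeedAccept(headers = {}) {
  return Object.assign(
    { Accept: 'application/rss+xml, text/xml' },
    headers
  );
}
```

The result can be passed as the headers option of whichever HTTP client makes the request.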
Thanks a lot for your answer @bobby-brennan, and sorry for my late follow-up. This is what I use now to get the feed:
// Assumed dependencies (not shown in the original post): the 'request' and
// 'feedparser' npm packages; `router` below is assumed to be an Express router.
var request = require ('request');
var FeedParser = require ('feedparser');

function getFeed (urlfeed, callback) {
    var options = {
        url: urlfeed,
        headers: {
            'User-Agent': 'Mozilla/5.0',
            'Content-Type': 'application/x-www-form-urlencoded'
        }
    };
    var req = request (options); // the options object already carries the URL
    var feedparser = new FeedParser ();
    var feedItems = [];
    req.on ('response', function (res) {
        var stream = this;
        // Guard against a missing Content-Type header before calling includes()
        var contentType = res.headers['content-type'] || '';
        if (res.statusCode === 200 && contentType.includes('xml')) {
            stream.pipe (feedparser);
        } else {
            // Not a feed: report the content type we got instead
            callback (contentType);
            return;
        }
    });
    req.on ('error', function (err) {
        console.log ('getFeed: Error reading %s (%s) .', urlfeed, err);
    });
    feedparser.on ('readable', function () {
        try {
            var item = this.read ();
            if (item !== null) { //2/9/17 by DW
                feedItems.push (item);
            }
        }
        catch (err) {
            console.log ('getFeed: err.message == ' + err.message);
        }
    }).on ('end', function () {
        var meta = this.meta;
        callback ('Feed OK', feedItems, meta.title);
    }).on ('error', function (err) {
        // console.log ("getFeed: Error reading (%s) feed: %s.", urlfeed, err.message);
        callback ('Bad feed');
    });
}

router.get('/feed', function(req, res) {
    getFeed(req.query.feedurl, function (err, feedItems, feedTitle) {
        if (feedItems) {
            res.send({
                feedItems: feedItems,
                feedTitle: feedTitle
            });
        } else {
            res.send({error:err});
        }
    });
});
But http://www.blendernation.com/feed/ keeps sending back "text/html; charset=UTF-8" as res.headers['content-type']. Very puzzling (and rare: so far I have found only 2 or 3 servers that behave like that). When I wget or curl the URI, it retrieves a proper XML/Atom file. Any ideas as to which 'User-Agent' and/or 'Content-Type' values would permit retrieval of the feed?
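One possible workaround, not something proposed in this thread: if a server turns out to send XML while labeling it text/html (which is not established here), the check could fall back to inspecting the start of the body. The function name looksLikeFeed and the list of root elements are assumptions made for this sketch.

```javascript
// Hypothetical check: accept the response if the declared type mentions XML,
// or if the beginning of the body looks like an XML, RSS, Atom, or RDF document.
function looksLikeFeed(contentType, bodyStart) {
  const declaredXml = /xml/i.test(contentType || '');
  const sniffedXml = /^\s*<(\?xml|rss|feed|rdf:RDF)\b/i.test(bodyStart || '');
  return declaredXml || sniffedXml;
}
```

If the body really is an HTML page, this check still rejects it, so the request headers remain the thing to investigate.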
"It probably makes sense for the parser to pass this header automatically as well."
That would be great. If I can help in any way (I'm thinking of testing, up to writing the proper tests), just ask.
I'm going to do this in v3, as changing the headers sent to the server might cause a different response.
Can you try it with the v3 branch? npm install https://github.com/bobby-brennan/rss-parser#v3
(note: the v3 API changed a bit, so check the README)
This is merged in version 3.0.0
As pointed out in #111, adding ?format=xml may fix this issue for some feeds.
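A small sketch of that workaround, using the standard URL class so an existing query string is preserved. The function name withFormatXml is made up for illustration, and whether a given server honors the parameter is not guaranteed.

```javascript
// Hypothetical helper: append (or overwrite) the format=xml query parameter.
function withFormatXml(feedUrl) {
  const url = new URL(feedUrl);
  url.searchParams.set('format', 'xml');
  return url.toString();
}
```

The returned string can then be passed to parseURL or to a custom request in place of the original feed URL.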