Comments (8)

rbren commented on May 20, 2024

Currently there's no way to add headers to the HTTP request, but if you want to send a PR, I'd be happy to take a look. We set the user agent to 'rss-parser' right now.

Otherwise, I'd recommend doing the HTTP request yourself, and just using parseString instead of parseURL.
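
A minimal sketch of that do-it-yourself route, using only Node's built-in http and https modules. The helper names `buildFeedRequestOptions` and `fetchFeedXml` are hypothetical (they are not part of rss-parser), and the header values are only placeholders:

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: turn a feed URL plus custom headers into
// options accepted by http.get / https.get.
function buildFeedRequestOptions(feedUrl, extraHeaders = {}) {
  const u = new URL(feedUrl);
  return {
    protocol: u.protocol,
    hostname: u.hostname,
    port: u.port || undefined, // empty string means "default port"
    path: u.pathname + u.search,
    // Caller-supplied headers override the default user agent.
    headers: Object.assign({ 'User-Agent': 'rss-parser' }, extraHeaders),
  };
}

// Hypothetical helper: fetch the raw feed body as a string.
// Redirects are not followed here; any non-200 status is rejected.
function fetchFeedXml(feedUrl, extraHeaders) {
  return new Promise((resolve, reject) => {
    const options = buildFeedRequestOptions(feedUrl, extraHeaders);
    const client = options.protocol === 'https:' ? https : http;
    client.get(options, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        reject(new Error('Unexpected status ' + res.statusCode));
        return;
      }
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}
```

The resolved string can then be handed to parseString. Whether parseString takes a callback or returns a promise depends on the rss-parser version, so check the README for the one you have installed.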

from rss-parser.

rbren commented on May 20, 2024

I would suggest setting the Accept header to tell the server what content-type you want back, probably application/rss+xml or text/xml.

It probably makes sense for the parser to pass this header automatically as well.
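
As a rough illustration of the Accept suggestion, the sketch below builds a header value that lists the preferred feed types in order, with decreasing quality values. `buildAcceptHeader` is a hypothetical helper, not something rss-parser provides, and whether a given server honours the header at all is an assumption:

```javascript
// Hypothetical helper: join media types into an Accept header value.
// The first type gets the implicit q=1; later ones get 0.9, 0.8, ...
// (never lower than 0.1).
function buildAcceptHeader(preferredTypes) {
  return preferredTypes
    .map((type, i) => {
      if (i === 0) return type;
      const q = Math.max(0.1, 1 - i * 0.1);
      return type + ';q=' + q.toFixed(1);
    })
    .join(', ');
}

const headers = {
  'User-Agent': 'rss-parser',
  'Accept': buildAcceptHeader(['application/rss+xml', 'text/xml']),
};
// headers.Accept === 'application/rss+xml, text/xml;q=0.9'
```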

yPhil-gh commented on May 20, 2024

Thanks a lot for your answer @bobby-brennan, and sorry for my late follow-up. This is what I use now to get the feed:

var request = require ('request');
var FeedParser = require ('feedparser');
var router = require ('express').Router (); // assuming an Express router

function getFeed (urlfeed, callback) {

  var options = {
    url: urlfeed,
    headers: {
      'User-Agent': 'Mozilla/5.0',
      'Content-Type': 'application/x-www-form-urlencoded'
    }
  };

  // request takes (options[, callback]); the URL is already in options.url
  var req = request (options);
  var feedparser = new FeedParser ();
  var feedItems = [];
  req.on ('response', function (res) {
    var stream = this;
    // Guard against a missing Content-Type header before calling includes ()
    var contentType = res.headers['content-type'] || '';
    if (res.statusCode === 200 && contentType.includes ('xml')) {
      stream.pipe (feedparser);
    } else {
      // Not an XML response: report the content type the server sent back
      callback (contentType);
      return;
    }
  });
  req.on ('error', function (err) {
    console.log ('getFeed: Error reading %s (%s) .', urlfeed, err);
    callback ('Bad feed'); // without this the caller never gets an answer
  });
  feedparser.on ('readable', function () {
    try {
      var item;
      // Drain every buffered item; read () returns null once the buffer is empty
      while ((item = this.read ()) !== null) { //2/9/17 by DW
        feedItems.push (item);
      }
    }
    catch (err) {
      console.log ('getFeed: err.message == ' + err.message);
    }
  }).on ('end', function () {
    var meta = this.meta;
    callback ('Feed OK', feedItems, meta.title);
  }).on ('error', function (err) {
    // console.log ("getFeed: Error reading (%s) feed: %s.", urlfeed, err.message);
    callback ('Bad feed');
  });
}

router.get ('/feed', function (req, res) {

  // The first callback argument is a status message, not an Error object
  getFeed (req.query.feedurl, function (err, feedItems, feedTitle) {
    if (feedItems) {
      res.send ({
        feedItems: feedItems,
        feedTitle: feedTitle
      });
    } else {
      res.send ({error: err});
    }
  });

});

But http://www.blendernation.com/feed/ keeps sending back "text/html; charset=UTF-8" as res.headers['content-type']. Very puzzling (and rare: so far I have only found 2 or 3 servers that behave like that). When I wget or curl the URI, it retrieves a proper XML/Atom file. Any ideas as to exactly which 'User-Agent' and/or 'Content-Type' values would allow the feed to be retrieved?
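
One possible way to cope with servers that label a feed as text/html is to fall back on a peek at the start of the body instead of trusting the Content-Type header alone. The sketch below is only a heuristic under that assumption; `looksLikeFeed` is a hypothetical helper, and it does not explain why a particular server answers with text/html in the first place:

```javascript
// Hypothetical helper: decide whether a response is probably a feed.
// contentType is the raw Content-Type header (may be undefined);
// bodyStart is the first chunk of the response body as a string.
function looksLikeFeed(contentType, bodyStart) {
  // Strip parameters such as "; charset=UTF-8" and normalise case.
  const type = (contentType || '').split(';')[0].trim().toLowerCase();
  // Covers text/xml, application/xml, application/rss+xml, application/atom+xml.
  if (type.endsWith('xml')) return true;

  // Header says something else: sniff the first 512 characters of the body.
  const head = (bodyStart || '')
    .replace(/^\uFEFF/, '') // drop a byte order mark if present
    .trimStart()
    .slice(0, 512)
    .toLowerCase();
  return head.startsWith('<?xml') || /<(rss|feed|rdf:rdf)[\s>]/.test(head);
}
```

Note that an XHTML page can also begin with an XML declaration, so a parser error is still possible after this check passes.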

yPhil-gh commented on May 20, 2024

> It probably makes sense for the parser to pass this header automatically as well.

That would be great. If I can help in any way (I'm thinking of testing, up to writing the proper tests), just ask.

rbren commented on May 20, 2024

I'm going to do this in v3, as changing the headers sent to the server might cause a different response.

Can you try it with the v3 branch? npm install https://github.com/bobby-brennan/rss-parser#v3

rbren commented on May 20, 2024

(note: the v3 API changed a bit, so check the README)

rbren commented on May 20, 2024

This is merged in version 3.0.0

rbren commented on May 20, 2024

As pointed out in #111, adding ?format=xml may fix this issue for some feeds.
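
A small sketch of that workaround, using Node's built-in URL class. `withFormatXml` is a hypothetical helper name, and whether a given server actually honours the format parameter is an assumption that has to be checked per feed:

```javascript
// Hypothetical helper: append format=xml to a feed URL unless the
// URL already carries a format parameter.
function withFormatXml(feedUrl) {
  const u = new URL(feedUrl);
  if (!u.searchParams.has('format')) {
    u.searchParams.set('format', 'xml');
  }
  return u.toString();
}

const patched = withFormatXml('http://www.blendernation.com/feed/');
// patched === 'http://www.blendernation.com/feed/?format=xml'
```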
