
Comments (8)

vanniktech commented on May 30, 2024

I also maintain an RSS reader, and you wouldn't believe how many things you have to take into consideration. I saw that you have started handling some of them, such as supporting multiple feed standards and multiple date formats.

I strip the BOM before parsing the feed. If I remember correctly, I actually strip everything before the first <, because some feeds like to place text before the actual XML tags.
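The "cut everything before the first <" approach described above could look roughly like this in Kotlin. This is a minimal sketch, assuming the feed has already been decoded to a String; `stripLeadingJunk` is a hypothetical name, not taken from either project.

```kotlin
// Hypothetical helper: drop a leading BOM and any stray text before the first '<'.
fun stripLeadingJunk(raw: String): String {
    // A UTF-8 BOM decodes to U+FEFF, which always precedes the first '<',
    // so cutting at the first '<' removes the BOM as well.
    val firstTag = raw.indexOf('<')
    // firstTag == 0: nothing to strip. firstTag == -1: no markup at all,
    // so return the input unchanged and let the parser report a real error.
    return if (firstTag > 0) raw.substring(firstTag) else raw
}
```

If the junk itself contains a `<`, this cut stops too early; matching specific start tags such as `<?xml` or `<rss` instead avoids that.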

Another thing I noticed when you replaced your parsing with a multiplatform library is that you don't parse HTML. However, HTML websites can reference their feeds (<link rel="alternate" type="application/rss+xml" title="RSS" href="/rss.xml" />), so it's nice when, for instance, the user tries to add a website: automatically parse the HTML, collect all of its feeds (there can be more than one), and add them all in one go.
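The HTML auto-discovery idea from this comment can be sketched with a regex scan over `<link>` tags. This is not necessarily how either app does it: `discoverFeeds`, `DiscoveredFeed` and the list of feed MIME types are illustrative assumptions, and a proper HTML parser would be more robust against comments, scripts and malformed markup.

```kotlin
import java.net.URI

// Matches whole <link ...> tags, case-insensitively.
private val LINK_TAG = Regex("<link\\b[^>]*>", RegexOption.IGNORE_CASE)

// name="value" or name='value'; unquoted attribute values are not handled.
private val ATTRIBUTE = Regex("""([a-zA-Z-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")

// Assumed set of MIME types that mark a <link rel="alternate"> as a feed.
private val FEED_TYPES = setOf("application/rss+xml", "application/atom+xml", "application/feed+json")

data class DiscoveredFeed(val url: String, val title: String?, val type: String)

// Hypothetical helper: return every feed advertised by an HTML page.
fun discoverFeeds(html: String, pageUrl: String): List<DiscoveredFeed> {
    val base = URI(pageUrl)
    return LINK_TAG.findAll(html).mapNotNull { tag ->
        // Attribute names are lowercased; the value comes from whichever quote style matched.
        val attrs = ATTRIBUTE.findAll(tag.value).associate { m ->
            m.groupValues[1].lowercase() to (m.groups[2] ?: m.groups[3])?.value.orEmpty()
        }
        val rels = attrs["rel"]?.lowercase()?.split(Regex("\\s+")) ?: return@mapNotNull null
        val type = attrs["type"]?.trim()?.lowercase() ?: return@mapNotNull null
        val href = attrs["href"] ?: return@mapNotNull null
        if ("alternate" !in rels || type !in FEED_TYPES) return@mapNotNull null
        // Relative hrefs such as "/rss.xml" are resolved against the page URL;
        // hrefs that are not valid URIs are skipped.
        val url = runCatching { base.resolve(href.trim()).toString() }.getOrNull()
            ?: return@mapNotNull null
        DiscoveredFeed(url, attrs["title"], type)
    }.distinctBy { it.url }.toList()
}
```

Returning a list rather than a single URL reflects the point above that one site can advertise several feeds.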

from twine.

owocado commented on May 30, 2024

> Hi, thanks for reporting the issue. There might be some incorrect tag present in the RSS feed. Is the issue reproducible when using the Atom feed as well?
>
> I will take a look at why the RSS feed parsing is failing.

Hello, yes, I have tried both the Atom and RSS feed links, and the same issue happens.

vanniktech commented on May 30, 2024

I'm operating on a ByteArray, which I then convert to a String using the desired charset (this is important; many Russian feeds break otherwise), and then call .removePrefix("\uFEFF") to drop the BOM. Next, I use byteArrayString.indexOfAny to find any of the expected starting tags, such as <?xml, <html, or the root tags from all the specifications I support, and drop everything before it. So far this seems to handle all of the cases. As for the Russian feed, I think it was https://www.opennet.ru/opennews/opennews_review.rss, which you can use to test that decoding works correctly and the Cyrillic alphabet is displayed.
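The decode-then-trim pipeline described in this comment might look roughly like this on Kotlin/JVM. `decodeFeed` and `DOCUMENT_START_MARKERS` are hypothetical names; the comment only names <?xml and <html explicitly, so `<rss`, `<feed`, `<rdf:RDF` and `<!DOCTYPE` are assumptions added to cover RSS 2.0, Atom, RSS 1.0 and HTML pages. How the charset is chosen (HTTP `Content-Type` header, XML declaration) is left to the caller.

```kotlin
import java.nio.charset.Charset

// Assumed list of markers that mark the start of the real document.
private val DOCUMENT_START_MARKERS =
    listOf("<?xml", "<rss", "<feed", "<rdf:RDF", "<!DOCTYPE", "<html")

// Hypothetical helper: decode raw bytes with the caller-supplied charset,
// drop a BOM, then drop anything before the earliest known start marker.
fun decodeFeed(bytes: ByteArray, charset: Charset): String {
    // Decoding with the wrong charset (e.g. UTF-8 for a KOI8-R feed) garbles Cyrillic text.
    val text = String(bytes, charset).removePrefix("\uFEFF")
    // indexOfAny returns the smallest index at which any marker occurs, or -1.
    val start = text.indexOfAny(DOCUMENT_START_MARKERS, ignoreCase = true)
    return if (start >= 0) text.substring(start) else text
}
```

Unlike cutting at the first `<`, this tolerates stray markup such as `<br>` in the junk prefix. `java.nio.charset.Charset` is JVM-only, so a multiplatform codebase would need its own charset abstraction.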

owocado commented on May 30, 2024

Hmm, I searched the existing issues and saw that you mentioned the W3C feed validator, so I tried it, and it looks like the Microsoft RSS feeds in question do not validate. Could that be the reason? 👀

msasikanth commented on May 30, 2024

Hi, thanks for reporting the issue. There might be some incorrect tag present in the RSS feed. Is the issue reproducible when using the Atom feed as well?

I will take a look at why the RSS feed parsing is failing.

msasikanth commented on May 30, 2024

It looks like the following feeds include a BOM (byte order mark) at the start of the document, which prevents the parser from moving ahead with parsing.
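Because the BOM is the very first thing the parser sees, one option is to drop it at the byte level before decoding. A minimal sketch, assuming the feed is UTF-8; the extension functions are hypothetical names, and only the UTF-8 BOM (bytes EF BB BF) is handled, not UTF-16 or UTF-32 BOMs.

```kotlin
// UTF-8 encodes U+FEFF (the BOM) as the three bytes EF BB BF.
private val UTF8_BOM = byteArrayOf(0xEF.toByte(), 0xBB.toByte(), 0xBF.toByte())

// True if the array starts with the UTF-8 BOM.
fun ByteArray.hasUtf8Bom(): Boolean =
    size >= UTF8_BOM.size && UTF8_BOM.indices.all { this[it] == UTF8_BOM[it] }

// Returns a copy without the leading BOM, or the same array if there is none.
fun ByteArray.withoutUtf8Bom(): ByteArray =
    if (hasUtf8Bom()) copyOfRange(UTF8_BOM.size, size) else this
```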

msasikanth commented on May 30, 2024

> Another thing I noticed when you replaced your parsing with a multiplatform library is that you don't parse HTML. However, HTML websites can reference their feeds (<link rel="alternate" type="application/rss+xml" title="RSS" href="/rss.xml" />), so it's nice when, for instance, the user tries to add a website: automatically parse the HTML, collect all of its feeds (there can be more than one), and add them all in one go.

Hi, I do parse HTML when trying to fetch a feed, in order to find the link to the feed.

msasikanth commented on May 30, 2024

> I strip the BOM before parsing the feed. If I remember correctly, I actually strip everything before the first <, because some feeds like to place text before the actual XML tags.

@vanniktech, how are you doing this? Are you using a regex to strip the BOM, or a different method? At the moment I am using a regex to strip it, and it works. I just want to check whether you have tried an alternative.
