
nreadability's People

Contributors: marek-stoj

Watchers: James Cloos

nreadability's Issues

Opening paragraphs of some nytimes articles dropped

For example, http://www.nytimes.com/2010/11/09/books/09book.html and 
http://www.nytimes.com/2010/11/14/world/asia/14myanmar.html?hp

I've attached a patch that fixes the problem, along with a new unit test.  (I 
generated the patch with Cygwin svn, so let me know if the line endings are 
messed up or anything.)

The fix itself was simple: when readability.js determines which siblings of the 
top candidate to append to the main content body, it gives a score bonus to a 
sibling that has the same class name as the top candidate.  I added that logic 
to CreateArticleContentElement().
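For reference, the sibling check described above can be sketched as follows. This is a minimal sketch, assuming the thresholds used by readability.js: a sibling is kept if its score plus any bonus reaches the larger of 10 and 20% of the top candidate's score, and a matching non-empty class name adds a bonus of 20% of the top candidate's score. `SiblingScoring` and `ShouldAppendSibling` are hypothetical names, not part of NReadability, and the extra rules readability.js applies to `<p>` siblings (link density, text length) are omitted.

```csharp
using System;

// Hypothetical helper illustrating the readability.js sibling rule;
// this is not NReadability's actual API.
public static class SiblingScoring
{
    // Decides whether a sibling of the top candidate should be appended
    // to the article body. siblingScore is null when the sibling was never scored.
    public static bool ShouldAppendSibling(
        double topCandidateScore, string topCandidateClass,
        double? siblingScore, string siblingClass)
    {
        // A sibling must reach at least 10 points or 20% of the top candidate's score.
        double threshold = Math.Max(10.0, topCandidateScore * 0.2);

        // Bonus: a sibling sharing the top candidate's (non-empty) class name
        // earns an extra 20% of the top candidate's score.
        double contentBonus = 0.0;
        if (!string.IsNullOrEmpty(topCandidateClass) &&
            string.Equals(siblingClass, topCandidateClass, StringComparison.Ordinal))
        {
            contentBonus += topCandidateScore * 0.2;
        }

        // Unscored siblings are never appended by this rule.
        return siblingScore.HasValue && siblingScore.Value + contentBonus >= threshold;
    }
}
```

For a top candidate scoring 100, the threshold is 20, so a sibling scoring only 5 is kept when it shares the candidate's class name (5 + 20 = 25) and dropped otherwise.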

And thanks for porting Readability to C#!  It's really been a great help on a 
few projects.


Original issue reported on code.google.com by [email protected] on 13 Nov 2010 at 9:44


Automatically fetching subsequent pages of a multi-page article

One thing that's great about readability.js is that it can often fetch 
subsequent pages from the first page of a multi-page article.  It would be 
great if nreadability could do that as well.

The meat of this is findNextPageLink(), which I've already implemented as a 
method in NReadabilityTranscoder for my own use.  It seems to work well, but 
before I go any further I'd like to know (1) if this is a feature you'd like to 
include in nreadability, and (2) if so, what you want its interface to look 
like.

There are a couple of options that I can see.  First, a new public method:

  TranscodeFromWeb(string url, out bool mainContentExtracted) { }

potentially with the overload 

  TranscodeFromWeb(string url, out bool mainContentExtracted, IPageFetcher fetcher) { }

so that the HTTP requests can be mocked for testing (alternatively, the fetcher 
could be set through a public property).
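A minimal sketch of the IPageFetcher abstraction from the overload above might look like this. The interface shape (a single `Fetch(string url)` method returning HTML) and both implementations are assumptions for illustration; none of them exist in NReadability.

```csharp
using System.Collections.Generic;
using System.Net.Http;

// Hypothetical abstraction proposed in the issue; not part of NReadability.
public interface IPageFetcher
{
    // Returns the raw HTML found at the given URL.
    string Fetch(string url);
}

// Production implementation backed by a single shared HttpClient.
public sealed class HttpPageFetcher : IPageFetcher
{
    private static readonly HttpClient Client = new HttpClient();

    // Blocks on the async call for simplicity, matching the synchronous API.
    public string Fetch(string url) => Client.GetStringAsync(url).GetAwaiter().GetResult();
}

// Test double: serves canned HTML from a dictionary, so no network access is needed.
public sealed class FakePageFetcher : IPageFetcher
{
    private readonly IDictionary<string, string> _pages;

    // Records every URL requested, in order, so tests can check which pages were followed.
    public List<string> RequestedUrls { get; } = new List<string>();

    public FakePageFetcher(IDictionary<string, string> pages)
    {
        _pages = pages;
    }

    public string Fetch(string url)
    {
        RequestedUrls.Add(url);
        if (!_pages.TryGetValue(url, out var html))
        {
            throw new KeyNotFoundException("No canned page for " + url);
        }
        return html;
    }
}
```

A unit test can pass a FakePageFetcher with canned HTML for each page of a multi-page article and then check RequestedUrls to confirm which pages were followed.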

Another option is to create a new class, NReadabilityWebTranscoder, that calls 
NReadabilityTranscoder as many times as necessary.  NReadabilityTranscoder 
would be updated to expose the next page's URL (the nextPage value) as a public 
property or output parameter, when the page has one.  
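The NReadabilityWebTranscoder option could be sketched as a loop that fetches a page, transcodes it, and follows the reported next-page URL. This sketch assumes a hypothetical per-page result type (`PageResult`) carrying the extracted content and the next page's URL; fetching and single-page transcoding are passed in as delegates so they can be replaced in tests. The constructor, `Transcode` method, and page limit are illustrative, not an existing API.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical result of transcoding one page: the extracted content plus the
// URL of the next page (null when no next-page link was found).
public sealed class PageResult
{
    public string Content { get; }
    public string NextPageUrl { get; }

    public PageResult(string content, string nextPageUrl)
    {
        Content = content;
        NextPageUrl = nextPageUrl;
    }
}

// Hypothetical wrapper: fetches a page, transcodes it, and follows next-page links.
public sealed class NReadabilityWebTranscoder
{
    private readonly Func<string, string> _fetchHtml;                 // url -> raw HTML
    private readonly Func<string, string, PageResult> _transcodePage; // (html, url) -> result
    private readonly int _maxPages;

    public NReadabilityWebTranscoder(
        Func<string, string> fetchHtml,
        Func<string, string, PageResult> transcodePage,
        int maxPages = 10)
    {
        _fetchHtml = fetchHtml;
        _transcodePage = transcodePage;
        _maxPages = maxPages;
    }

    // Returns the concatenated content of the first page and every page it links to.
    public string Transcode(string url)
    {
        var visited = new HashSet<string>(StringComparer.Ordinal);
        var combined = new StringBuilder();
        string current = url;

        // Stop when there is no next page, when the page limit is reached,
        // or when a URL repeats (a next-page link pointing back to an earlier page).
        while (current != null && visited.Count < _maxPages && visited.Add(current))
        {
            PageResult page = _transcodePage(_fetchHtml(current), current);
            combined.Append(page.Content);
            current = page.NextPageUrl;
        }

        return combined.ToString();
    }
}
```

Tracking visited URLs and capping the page count guards against next-page links that point back to an earlier page, which would otherwise loop forever.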



Original issue reported on code.google.com by [email protected] on 16 Nov 2010 at 7:09
