Node-read

Get readable content from any page. Based on Arc90's Readability project.

Features

  1. Blazingly fast. This project is based on the Cheerio engine, which is about 8x faster than JSDOM.

Why not Node-readability

Before starting this project I used node-readability, but its dependencies and the slowness of JSDOM made it frustrating to work with. Compiling the contextify module (a JSDOM dependency) failed nine times out of ten. To use node-readability with node-webkit, you also had to rebuild contextify manually with nw-gyp, which is far from ideal.

So I decided to write my own version of Arc90's Readability using the fast Cheerio engine and as few dependencies as possible.

The usage of this module is similar to that of node-readability, so it is easy to switch.

Install

npm install node-read

Usage

read(html [, options], callback)

Where

  • html is a URL or a string of HTML.
  • options is an optional options object.
  • callback is the function to run when extraction finishes: callback(error, article, meta).

Example

var read = require('node-read');

read('http://howtonode.org/really-simple-file-uploads', function(err, article, meta) {

  // Main Article.
  console.log(article.content);
  
  // Title
  console.log(article.title);

  // HTML 
  console.log(article.html);
  
  // DOM
  console.log(article.dom);
  
});
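
For promise-based code, the callback signature above can be wrapped by hand. The following is a minimal sketch, not part of node-read: `readAsync` is a hypothetical helper name, and it assumes only the `read(html [, options], callback)` shape and the `callback(error, article, meta)` arguments documented above. It takes the `read` function as a parameter so it can be tried with any function of that shape.

```javascript
// Hypothetical helper (not part of node-read): wraps any function with the
// read(html [, options], callback) shape in a Promise.
function readAsync(read, html, options) {
  return new Promise(function (resolve, reject) {
    function done(err, article, meta) {
      if (err) {
        reject(err);
        return;
      }
      // Keep both results; util.promisify would only pass along the first.
      resolve({ article: article, meta: meta });
    }

    // Only forward options when the caller supplied them.
    if (options === undefined) {
      read(html, done);
    } else {
      read(html, options, done);
    }
  });
}

// Usage, assuming node-read is installed:
// var read = require('node-read');
// readAsync(read, 'http://howtonode.org/really-simple-file-uploads')
//   .then(function (result) { console.log(result.article.title); })
//   .catch(function (err) { console.error(err); });
```

A hand-written wrapper is used here instead of Node's `util.promisify` because the callback delivers two values (`article` and `meta`), and a plain promisified function resolves with only the first.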

TODO

  • Examples, Docs
  • Get Comments with articles
  • Get the Author of the article
  • Better removal of unnecessary nodes
  • Better scoring of content:
    • Based on siblings
    • Based on content length, common words
    • Link density, Image density, other common elements density
