html2markdown's Introduction

HTML2Markdown

A JavaScript implementation for converting HTML to Markdown text. Works in the browser and in Node.js.

Installation

npm install html2markdown

Usage in Node.js

var html2markdown = require('html2markdown');

console.log(html2markdown('<h1>Hello markdown!</h1>'));

Usage in browser

<script type="text/javascript" src="markdown_dom_parser.js"></script>
<script type="text/javascript" src="html2markdown.js"></script>

console.log(html2markdown("<h1>Hello markdown!</h1>"))

This call converts the HTML and returns the Markdown string, of the form "# Hello markdown!\n\n".

Changes in this implementation

  • Added a new htmldomparser: a simple HTML parser implementation that assumes parsing is done in the browser. It should be compatible with John Resig's parser.
  • The parser implementation provides support for ignoring tags that you do not want to convert.
  • The parser also has an option to ignore DOM elements with hidden styles.
  • Added rules for parsing PRE, CODE, SPAN, DIV, TD, DL, DT.
  • Added support for ignoring tags that you do not want to convert.
  • Improved the "startBlock" method and renamed it to "block".
  • Added support for nested lists.
  • Fixed some showdown rendering issues when a link has a nested image.
  • Some readability changes: collapse whitespace, treat images as block elements, and do not output text if elements are empty.
  • Added support for converting relative URLs to absolute URLs.
  • Dropped the wordwrap function, as introducing new lines in the converter does not seem like a good idea, and wordwrap behavior was not consistent because elements can be nested.
  • Added support for reference-style images and links (an option chooses between inline Markdown formatting and reference-style formatting).
  • Added tons of unit tests.
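
Two of the items above, relative-to-absolute URL conversion and the choice between inline and reference-style links, can be illustrated with a small sketch. This is not html2markdown's own code or API: the helper names `toAbsolute`, `formatLink`, and `formatRefs` and the `example.com` base URL are assumptions made for illustration, and the URL resolution relies on the standard `URL` class available in browsers and Node.js.

```javascript
// Hypothetical helpers for illustration only; not html2markdown's API.

// Resolve a possibly relative URL against the page's base URL.
function toAbsolute(href, baseUrl) {
  return new URL(href, baseUrl).href;
}

// Format a link inline, or as a numbered reference when `refs` is supplied.
function formatLink(text, href, refs) {
  if (!refs) {
    return '[' + text + '](' + href + ')';
  }
  refs.push(href);
  return '[' + text + '][' + refs.length + ']';
}

// Render the collected references as Markdown link definitions.
function formatRefs(refs) {
  return refs.map(function (href, i) {
    return '[' + (i + 1) + ']: ' + href;
  }).join('\n');
}

var base = 'http://example.com/docs/page.html'; // assumed base URL
var url = toAbsolute('/some_link', base);
var refs = [];

console.log(formatLink('a link', url));       // inline style
console.log(formatLink('a link', url, refs)); // reference style
console.log(formatRefs(refs));                // definitions emitted at the end
```

Inline formatting keeps each URL next to its link text, while reference style moves the URLs into a list of definitions, which tends to keep long paragraphs easier to read.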

Known conversion issues

If the HTML has the following form, showdown currently fails to render the result:

    <a href="/some_link">
        <h1>
                <img src="/some_image_lin"/>
        </h1>
    </a>

Testing

In Node.js

npm install
npm test

In Browser

Just open SpecRunner.html in your browser.

html2markdown's People

Contributors

alexgorbatchev, hgilani, kates, sudodoki

html2markdown's Issues

I want to read an HTML file and convert it to Markdown, but it fails

var html2markdown = require('html2markdown');
var fs = require('fs');
var http = require('http');
http.createServer(function(req, res){
	fs.readFile('index.html',function (err, data){
		if (err) throw new Error(err);

		res.writeHead(200, {'Content-Type': 'text/html','Content-Length':data.length});
		res.write(html2markdown(data));
		res.end();
	});
}).listen(8000);

ERROR:

L:\github\Java\springmvc-cookbook\cloudstreetmarket-parent\cloudstreetmarket-webapp\src\main\webapp\static\node_modules\html2markdown\markdown_html_parser.js:105
match = html.match( startTag );
^

TypeError: html.match is not a function
    at HTMLParser (L:\github\Java\springmvc-cookbook\cloudstreetmarket-parent\cloudstreetmarket-webapp\src\main\webapp\static\node_modules\html2markdown\markdown_html_parser.js:105:19)
    at html2markdown (L:\github\Java\springmvc-cookbook\cloudstreetmarket-parent\cloudstreetmarket-webapp\src\main\webapp\static\node_modules\html2markdown\html2markdown.js:197:2)
    at module.exports (L:\github\Java\springmvc-cookbook\cloudstreetmarket-parent\cloudstreetmarket-webapp\src\main\webapp\static\node_modules\html2markdown\index.js:7:10)
    at L:\github\Java\springmvc-cookbook\cloudstreetmarket-parent\cloudstreetmarket-webapp\src\main\webapp\static\js\html2markdown.js:16:13
    at FSReqWrap.readFileAfterClose [as oncomplete] (fs.js:445:3)
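
The `TypeError: html.match is not a function` in the trace is consistent with `fs.readFile` being called without an encoding: in that case Node.js passes a `Buffer` to the callback rather than a string, and a `Buffer` has no `match` method. A minimal sketch of one likely fix follows, assuming html2markdown expects a string as input. The helper name `toHtmlString` is hypothetical and not part of the library.

```javascript
// Hypothetical helper: normalize file contents to a string before conversion.
function toHtmlString(data) {
  return Buffer.isBuffer(data) ? data.toString('utf8') : String(data);
}

// What fs.readFile yields when no encoding is given.
var raw = Buffer.from('<h1>Hello markdown!</h1>');

console.log(typeof raw.match);               // "undefined": Buffers have no match()
console.log(typeof toHtmlString(raw).match); // "function": strings do

// In the server from the report, either pass an encoding to readFile:
//   fs.readFile('index.html', 'utf8', function (err, data) { ... });
// or convert before calling the library:
//   res.write(html2markdown(toHtmlString(data)));
```

The `Content-Length` header in the reported snippet is computed from the HTML input, so it would also need to be recomputed from the converted output (for example with `Buffer.byteLength`) or omitted.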

Copy arbitrary HTML

I understand the reason why, by default, html2markdown should ignore certain unsafe HTML. However, in my case, I would like html2markdown to copy HTML it doesn't understand to the markdown.

This is required when our (trusted) users wish to embed a YouTube video or a Twitter embed.

Can't parse the HTML page

If I pass in HTML fetched from a URL, I get an error:

Parse Error: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" lang="en">

<head>
  <link href='http://fonts.googleapis.com/css?family=Inconsolata' rel='stylesheet' type='text/css'><title>Aggregation Pipeline &mdash; MongoDB Manual 2.4.8</title><link rel="shortcut icon" href="http://media.mongodb.org/favicon.ico" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
