hoedown's Introduction

Hoedown

Hoedown is a revived fork of Sundown, the Markdown parser based on the original code of the Upskirt library by Natacha Porté.

Features

  • Fully standards compliant

    Hoedown passes the official Markdown v1.0.0 and v1.0.3 test suites out of the box, and has been extensively tested with additional corner cases to make sure its output is as sane as possible at all times.

  • Massive extension support

    Hoedown has optional support for several (unofficial) Markdown extensions, such as non-strict emphasis, fenced code blocks, tables, autolinks, strikethrough and more.

  • UTF-8 aware

    Hoedown is fully UTF-8 aware, both when parsing the source document and when generating the resulting (X)HTML code.

  • Tested & ready to be used in production

    Hoedown has been extensively security audited, and includes protection against all possible DoS attacks (stack overflows, out of memory situations, malformed Markdown syntax...).

    We've worked very hard to make Hoedown never leak or crash under any input.

    Warning: Hoedown doesn't validate or post-process the HTML in Markdown documents. Unless you use HTML_ESCAPE or HTML_SKIP, you should strongly consider using a good post-processor in conjunction with Hoedown to prevent client-side attacks.

  • Customizable renderers

    Hoedown is not stuck with XHTML output: the Markdown parser of the library is decoupled from the renderer, so it's trivial to extend the library with custom renderers. A fully functional (X)HTML renderer is included.

  • Optimized for speed

    Hoedown is written in C, with a special emphasis on performance. When wrapped in a dynamic language such as Python or Ruby, it has been shown to be up to 40 times faster than other native alternatives.

  • Zero-dependency

    Hoedown is a zero-dependency library composed of some .c files and their headers. No dependencies, no bullshit. Only standard C99 that builds everywhere.

  • Additional features

    Hoedown comes with a fully functional implementation of SmartyPants, a separate autolinker, escaping utilities, buffers and stacks.

Bindings

You can see a community-maintained list of Hoedown bindings at the wiki. There is also a migration guide available for authors of Sundown bindings.

Help us

Hoedown is all about security. If you find a (potential) security vulnerability in the library, or a way to make it crash through malicious input, please report it to us by emailing the private Hoedown Security mailing list. The Hoedown security team will review the vulnerability and work with you to reproduce and resolve it.

Unicode character handling

Given that the Markdown spec makes no provision for Unicode character handling, Hoedown takes a conservative approach towards deciding which extended characters trigger Markdown features:

  • Punctuation characters outside of the U+007F codepoint are not handled as punctuation. They are considered as normal, in-word characters for word-boundary checks.

  • Whitespace characters outside of the U+007F codepoint are not considered as whitespace. They are considered as normal, in-word characters for word-boundary checks.
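The two rules above can be illustrated with a minimal sketch. The helper names below are hypothetical and are not Hoedown's own functions; the sketch only shows how an ASCII-only classification leaves every byte of a multi-byte UTF-8 sequence (0x80 and above) on the "in-word" side of a word-boundary check.

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration; not part of Hoedown's API.
 * Each one classifies a single byte of UTF-8 input using ASCII-only rules. */
static bool ascii_is_space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\v' || c == '\f' || c == '\r';
}

static bool ascii_is_punct(unsigned char c)
{
    return (c >= 0x21 && c <= 0x2F) || (c >= 0x3A && c <= 0x40) ||
           (c >= 0x5B && c <= 0x60) || (c >= 0x7B && c <= 0x7E);
}

/* A byte counts as "in-word" when it is neither ASCII whitespace nor ASCII
 * punctuation. Bytes >= 0x80 always land here, so a no-break space
 * (U+00A0, encoded as 0xC2 0xA0) is treated like a letter. */
static bool is_word_byte(unsigned char c)
{
    return !ascii_is_space(c) && !ascii_is_punct(c);
}
```

Under this sketch, a `*` next to a no-break space is still considered to be inside a word, which matches the conservative behaviour described above.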

Install

Just typing make will build Hoedown into a dynamic library and create the hoedown and smartypants executables, which are command-line tools to render Markdown to HTML and perform SmartyPants, respectively.

If you are using CocoaPods, just add the line pod 'hoedown' to your Podfile and call pod install.

Or, if you prefer, you can just drop the files in src into your project.

hoedown's People

Contributors

andre-d, bdolman, blaenk, bnoordhuis, brandonc, brief, clemensg, craigbarnes, cuviper, cweider, davidszotten, fhahn, fsx, gregleaver, jbergstroem, jjallaire, jnovinger, js, julian7, kjk, marklodato, mattr-, mildsunrise, robin850, samb, soffes, spladug, stevewolter, uranusjr, vmg

hoedown's Issues

Strange behaviour in executables

Try to do ./hoedown in your shell,
then write something.

You'll notice you have to send an
EOF (Ctrl-D) twice for the program
to output the HTML.

Add support for yaml frontmatter

Add support for the yaml frontmatter as used in the static website generator Jekyll (see http://jekyllrb.com/docs/frontmatter/).

Example:

---
layout: post
title: Blogging Like a Hacker

---

GitHub renders the frontmatter as a table at the top of the document.

Another option would be to just ignore it altogether as it's not content of the document but rather metadata.

This was originally requested to the markdown editor MacDown here: MacDownApp/macdown#14

Support for GitHub flavored Markdown

This might be a tough one, but since GitHub left Sundown in the dust and Redcarpet is apparently now their Markdown parser of choice, my question is: what about possible future extensions to GFM?

Do you plan to reimplement such changes in Hoedown? Or should Hoedown support traditional Markdown only?

Again thanks for your time and effort!

Entities are not escaped

When enabling HOEDOWN_HTML_ESCAPE, people usually expect all the special HTML symbols to be treated as literal text. So this:

<pack> moved <pack>

Renders as:

&lt;pack&gt; moved &lt;pack&gt;

However, that's not the case with entities. This:

pack &hellip; pack

Renders as:

pack … pack

Is that correct behaviour?
Should we treat entities just like we treat regular text and HTML, when ESCAPE is enabled?
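To make the question concrete, here is a minimal sketch of an escaper that treats & like any other special character. The function name and signature are assumptions made for illustration and are not Hoedown's escaping API. With this behaviour, &hellip; in the input would reach the browser as literal text instead of an ellipsis.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Hoedown's API: escape src into dst, which has
 * room for cap bytes including the terminating NUL. Returns the escaped
 * length, or (size_t)-1 if dst is too small. */
static size_t escape_html(const char *src, char *dst, size_t cap)
{
    size_t out = 0;

    if (cap == 0)
        return (size_t)-1;

    for (; *src; src++) {
        char one[2] = { *src, '\0' };  /* default: copy the byte unchanged */
        const char *rep = one;

        switch (*src) {
        case '&': rep = "&amp;";  break;  /* this also neutralises entities */
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '"': rep = "&quot;"; break;
        }

        size_t n = strlen(rep);
        if (out + n + 1 > cap)
            return (size_t)-1;
        memcpy(dst + out, rep, n);
        out += n;
    }

    dst[out] = '\0';
    return out;
}
```

Escaping & unconditionally is the simplest rule to reason about; the trade-off is that authors can no longer write entities on purpose while escaping is enabled.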

Support for a special HTML render

In Sundown you are able to extend the standard HTML render with a customized version to handle code blocks. I wrote this class to integrate Pygments to highlight the code:

//! @brief This handler overrides the blockCode() method of the Sundown HTML renderer to highlight source code using the Pygmentize class.
class SyntaxHighlighter extends HTML {

  public function blockCode($code, $language) {
    $highlightedSource = Pygmentize::highlight($code, $language);

    return $highlightedSource;
  }

} 

To use it, you have to create an instance of the SyntaxHighlighter and pass it to the Sundown constructor like this:

$render = new SyntaxHighlighter(
    [
      'filter_html' => TRUE,
      'hard_wrap' => TRUE
    ]
);

$markdown = new Sundown\Markdown($render,
    [
      'no_intra_emphasis' => TRUE,
      'tables' => TRUE,
      'fenced_code_blocks' => TRUE,
      'autolink' => TRUE,
      'strikethrough' => TRUE,
      'lax_html_blocks' => TRUE,
      'space_after_headers' => TRUE,
      'superscript' => TRUE
    ]
);

Can I do that in Hoedown as well, or is this feature not supported yet?

I'm using php-ext-hoedown.

BUFPUTSL

The BUFPUTSL macro is in buffer.h, so it'll get defined
whenever the user #includes a header of Hoedown.

It should either get namespaced, or, to avoid clobbering all
the code, defined locally at the beginning of each source file.
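As a rough illustration of the namespacing option, the sketch below uses a toy buffer and a prefixed macro. Every name here (demo_buffer, demo_buffer_put, DEMO_BUFPUTSL) is hypothetical; the real macro would wrap the library's own buffer function and carry the library's prefix.

```c
#include <stddef.h>
#include <string.h>

/* Toy fixed-size buffer standing in for the library's buffer type. */
struct demo_buffer {
    char data[64];
    size_t size;
};

/* Append len bytes, truncating instead of overflowing the toy buffer. */
static void demo_buffer_put(struct demo_buffer *ob, const char *s, size_t len)
{
    size_t room = sizeof(ob->data) - ob->size;
    if (len > room)
        len = room;
    memcpy(ob->data + ob->size, s, len);
    ob->size += len;
}

/* Prefixed macro: sizeof on a string literal counts the trailing NUL,
 * hence the "- 1". The prefix keeps it from clashing with user macros. */
#define DEMO_BUFPUTSL(ob, literal) \
    demo_buffer_put((ob), (literal), sizeof(literal) - 1)
```

The alternative mentioned above, defining the macro at the top of each source file, avoids exporting it at all, at the cost of repeating the definition.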

support kramdown tables

The kramdown table syntax is a slight variation on the PHP Markdown Extra tables already supported.

Rename hoedown_markdown_* to hoedown_document_*

I've decided we need to rename hoedown_markdown_* to hoedown_document_* as it's way clearer what the intention is through the API.

Compare:

  ob = hoedown_buffer_new(OUTPUT_UNIT);
  renderer = hoedown_html_renderer_new(0, 0);
  markdown = hoedown_markdown_new(extensions, 16, renderer);
  hoedown_markdown_render(ob, (uint8_t*) input.data, input.size, markdown);

...to...

  ob = hoedown_buffer_new(OUTPUT_UNIT);
  renderer = hoedown_html_renderer_new(0, 0);
  document = hoedown_document_new(extensions, 16, renderer);
  hoedown_document_render(ob, (uint8_t*) input.data, input.size, document);

/cc @jmendeth

Documentation

We should document the API.
What system should we use?

wishlist: nonstandard document link labels

I'm having problems with the form of "link labels" that can come in from user-generated content.

A classic example would be as follows (with periods used to denote indenting):

For example [Click Here][1]

..[1]: http://example.com

The problem is in the last line, where the label is.

If there are 4 or more spaces, the line is interpreted as preformatted text. If there are 0 to 3 spaces, the document renders with the line interpreted as a label.

I wanted to wishlist an option flag to allow for non-standard spacing at the end of the document to handle these.

Technically, the markdown is being parsed correctly -- but with an increasing amount of user-generated markdown coming from javascript editors, I've been seeing stuff like this slip through.

Intra emphasis doesn't work at all

I have many articles using something like this:

please use *call_user_func('nome_funzione')* and you'll be happy

I'm expecting (like GitHub does)

please use call_user_func('nome_funzione') and you'll be happy

instead Hoedown generates:

please use calluserfunc('nome_funzione') and you'll be happy

where user is underlined. (The underline style is not supported by GitHub so I can't render it).

When using NO_INTRA_EMPHASIS = TRUE, Hoedown works as expected, but all the other intra-word emphasis stops displaying, for example when italic is combined with bold.

The fact is that when NO_INTRA_EMPHASIS = TRUE a single word is not considered a single word anymore. IMHO this makes no sense at all. A style MUST be applied to a single word, and call_user_func is a single word, not a phrase composed of three different words. Even calluserfunc is rendered properly on GitHub, but not by Hoedown.

A style symbol must be adjacent to the word (or phrase) it wraps, and it should be able to contain any other style symbol. For example you should be able to write:

**today is a ~~good~~ day *Filippo*, please use call_user_func() or don't use ~~call_user_func()~~**

today is a good day Filippo, please use call_user_func() or don't use call_user_func()

New Python binding.

The current Python binding segfaults on OS X and hasn't seen any activity in several months. Due to this, I decided to give it a go and wrote a new binding. I'm opening this issue as I'm not sure how you'd like to handle the bindings wiki page.

Markdown AST inquiry

Gentlemen,
first, allow me to thank you for your effort with Hoedown. What you have done so far is truly remarkable!

I am running my own fork of Sundown that extends it in two major ways:

  1. Adds three additional "rendering" hooks that allow me to build a parsed Markdown abstract syntax tree (AST) instead of rendering
  2. Adds source maps so you can tell where in the Markdown source document the block being rendered comes from

These two "extensions" allow me to build the Markdown AST in memory so it can be processed later by some other tools (e.g. the API Blueprint Parser in my case).

My question is: Would you care about such a contribution to the Hoedown project so it can be used to build Markdown ASTs?

If so, should Hoedown just support building Markdown ASTs thanks to sufficient hooks and source maps or should it also offer a full AST on its own?

Thank you for consideration.

Use of C99 features

From the README.md:

[...] standard C99 that builds everywhere.

If Hoedown is C99, then I suppose there should be no problem with using the bool type with #include <stdbool.h>. Would increase code readability and, you know, a bool takes less memory than an int.

According to this answer:

[...] will work only if you use C99 and it's the "standard way" to do it. Choose this if possible.

Should we transfer all boolean uses in Hoedown to bool?
Would there be any compatibility problems?
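One behavioural difference worth keeping in mind during such a migration: C99 converts any nonzero scalar to exactly 1 when it is stored in a bool, whereas a narrower integer type truncates. The sketch below shows this with a hypothetical flag bit (DEMO_FLAG_HIGH is not a Hoedown constant) and assumes 8-bit chars.

```c
#include <stdbool.h>

/* Hypothetical flag bit that sits above the low byte. */
#define DEMO_FLAG_HIGH 0x100u

/* Conversion to bool yields exactly 0 or 1, so the high bit survives. */
static bool flag_as_bool(unsigned int flags)
{
    return flags & DEMO_FLAG_HIGH;
}

/* The same test through unsigned char: the value is reduced modulo 256
 * (with 8-bit chars), so 0x100 silently becomes 0. */
static unsigned char flag_as_uchar(unsigned int flags)
{
    return (unsigned char)(flags & DEMO_FLAG_HIGH);
}
```

Code that currently stores flag tests in int is unaffected either way; the difference only shows up where a smaller integer type is used as a boolean.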

Support manual IDs

As a feature, we could support PHP-extra style IDs for headers, tables, etc:

## How to use    {#usage}
<h2 id="usage">How to use</h2>

It was somehow proposed in #63, we should look into it.
That would solve the problem with autogenerated header IDs we discussed a while ago.
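A minimal sketch of how a trailing {#id} marker could be split off a header line follows. The function is hypothetical, not existing Hoedown code, and it does not validate which characters are allowed inside the ID.

```c
#include <stddef.h>

/* Hypothetical helper: if line (len bytes, without the leading '#' marks
 * and newline) ends with "{#id}", return a pointer to the id, store its
 * length in *id_len and the length of the remaining header text in
 * *text_len. Returns NULL when there is no such marker. */
static const char *header_id(const char *line, size_t len,
                             size_t *id_len, size_t *text_len)
{
    /* Ignore trailing blanks. */
    while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
        len--;
    if (len < 4 || line[len - 1] != '}')   /* shortest marker is "{#x}" */
        return NULL;

    size_t end = len - 1;                  /* index of '}' */
    size_t i = end;
    while (i > 0 && line[i - 1] != '{')    /* scan back to just after '{' */
        i--;
    if (i == 0 || line[i] != '#' || i + 1 >= end)
        return NULL;

    size_t t = i - 1;                      /* index of '{' */
    while (t > 0 && (line[t - 1] == ' ' || line[t - 1] == '\t'))
        t--;                               /* trim blanks before the marker */

    *id_len = end - (i + 1);
    *text_len = t;
    return line + i + 1;
}
```

For the example above, the text part would be "How to use" and the ID "usage", which a renderer could then emit as the id attribute.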

Footnotes won't render if there's no link callback

As per this optimization, footnotes won't be parsed unless there's a link callback defined.

Moreover, the method should be refactored, since char_link can parse links, images and footnotes. It'd be better named char_bracket for example.

When will you release new version?

I want to modify my product to use the new API.
When will you tag the new version?

(I know this is not an issue, but I couldn't find any other channel to ask the question.)

Cannot report security vulnerabilities

The README file suggests filing a GitHub issue for a security vulnerability. This is the wrong way to handle security vulnerabilities -- they should be reported discreetly, directly to the developer(s) of the project. If a vulnerability is reported in a public forum like an issue, the vulnerability immediately becomes a 0-day, and attackers have all the information they need to exploit the vulnerability in the wild. See http://alexgaynor.net/2013/oct/19/security-process-open-source-projects/ for more information.

Pipe character can not be used within tables

I am writing this assuming that Hoedown is what powers GitHub's Markdown rendering, since I discovered this issue affects it. The only way around this issue, as you can see, is to inline HTML, which is not great.

Source:

Foo | Bar | State
------ | ------ | -----
`Code | Pipe` | Broken | Happy
`Escaped Code \| Pipe` | Broken | Happy
Escaped \| Pipe | Broken | Happy

Output:

Foo Bar State
`Code Pipe` Broken
Escaped Code | Pipe Broken Happy
Escaped Pipe Broken

Expected:

Foo Bar State
Code | Pipe Broken Happy
Escaped Code &#124; Pipe Broken Happy
Escaped | Pipe Broken Happy

Problem when a link ends with )

The following markdown text is not properly converted:

The history of [Phil Fish](https://en.wikipedia.org/wiki/Phil_Fish_(video_game_developer)).

This is the output. As you can see there is one more parenthesis:

The history of Phil Fish)

Those links are common in Wikipedia.

A URL must be followed by a line break, a space, another URL, or the end of the document. So, to solve this issue, the last round bracket before that point should be treated as the parenthesis that closes the link.

The following markdown link [foo](http://example.com/foo_(bar), should be translated to http://example.com/foo_(bar. Using this approach every wikipedia link just works as is.

Both Marked and the dead Sundown have this bug, but Showdown doesn't.
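For comparison, here is a sketch of a different technique from the last-bracket rule proposed above: counting balanced parentheses. The helper is hypothetical and not taken from Hoedown. It handles the Wikipedia example, but unlike the proposal it finds no closing parenthesis for the unbalanced http://example.com/foo_(bar case.

```c
#include <stddef.h>

/* Hypothetical helper: s points just after the "(" that opens an inline
 * link destination. Returns the index of the ")" that closes it, or
 * (size_t)-1 if none is found. Nested parentheses are tracked by depth. */
static size_t link_dest_end(const char *s, size_t len)
{
    size_t depth = 0;

    for (size_t i = 0; i < len; i++) {
        if (s[i] == '\\' && i + 1 < len) {
            i++;                    /* skip a backslash-escaped character */
        } else if (s[i] == '(') {
            depth++;
        } else if (s[i] == ')') {
            if (depth == 0)
                return i;           /* this one closes the link itself */
            depth--;
        }
    }
    return (size_t)-1;
}
```

Which rule is preferable depends on whether unbalanced URLs are expected to be common in user input.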

Variable indent on bullets can cause nested UL element

Here's a sample bit of markdown:

List items can wrap across multiple lines:

  * This is the first bullet,
    which is longer than one line.
  * This is the second bullet,
which is also on multiple lines.
* You don't have to indent the bullet
But there must be a space following the bullet marker.

This should result in a single UL element with three LI elements inside, but the third bullet is wrapped in another UL as well.

Unused parameters

If you remove the -Wno-unused-parameter switch, you get hundreds of warnings about unused arguments. We should review and maybe fix some of them.

Segfault on large input

I tried running ./hoedown 100MB-file.md just out of curiosity and got a segfault. gdb reports that it happens in a call to memcpy every time I run it, which suggests an out-of-bounds write to a buffer.

I haven't had a chance to debug it much, but grepping around in the code turns up a few unchecked errors, e.g. line 2745 of src/markdown.c calls hoedown_buffer_grow (which calls realloc) and proceeds without checking the return value.

What's the policy on fixing this kind of thing? Using assert/abort/xmalloc in the library or returning an error code to the caller?
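As one data point for that discussion, this is roughly what the error-code option looks like. The buffer type and function below are a toy written for illustration, not Hoedown's hoedown_buffer_grow; the point is only that the caller gets a value it can check before writing.

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy growable buffer; all names are hypothetical. */
struct demo_buf {
    uint8_t *data;
    size_t size;    /* bytes in use */
    size_t asize;   /* bytes allocated */
};

/* Grow to at least need bytes. Returns 0 on success, -1 on failure.
 * On failure the buffer is left untouched and still valid. */
static int demo_buf_grow(struct demo_buf *b, size_t need)
{
    if (need <= b->asize)
        return 0;

    size_t asize = b->asize ? b->asize : 64;
    while (asize < need) {
        if (asize > SIZE_MAX / 2)
            return -1;              /* doubling would overflow size_t */
        asize *= 2;
    }

    uint8_t *p = realloc(b->data, asize);
    if (!p)
        return -1;                  /* old block is still owned by b */

    b->data = p;
    b->asize = asize;
    return 0;
}
```

A caller would then write `if (demo_buf_grow(&b, b.size + len) < 0) return -1;` before any memcpy into the buffer, and propagate the error upwards.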

toc_data.nesting_level in HTML renderer

What is the best way to set options->toc_data.nesting_level in hoedown_html_renderer?
options->toc_data.nesting_level is needed for generating <a id="..."> tag in HTML generation.

My suggestion is to change hoedown_html_renderer's signature to the following:

void hoedown_html_renderer(struct hoedown_callbacks *callbacks, struct hoedown_html_renderopt *options, unsigned int render_flags, int nesting_level)

Source maps in Hoedown

Derived from #22.

The renderers should be given the position of the block they're rendering.
An easy and low-level way to do that would be to pass a size_t pos as
last argument to the callbacks where possible, indicating the position of
the block in the input buffer.

That would however make callbacks longer and most of the time this
feature isn't gonna be used.

vsnprintf may return a different value (-1) under Windows

Because vsnprintf under Windows returns -1 to indicate that the output has been truncated (see http://msdn.microsoft.com/en-us/library/1kt27hek(v=vs.110).aspx for details), hoedown_buffer_printf just returns at https://github.com/hoedown/hoedown/blob/master/src/buffer.c#L175 when we need to grow the buffer. The following markdown text is from a test for a Perl binding ( https://github.com/tokuhirom/Text-Markdown-Hoedown/blob/master/t/02_toc.t#L7 ), which is failing to render properly under Windows now.

#1
## 1.1
### 1.1.1
## 1.2
### 1.2.1
#2
## 2.2
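A portable way to cope with both return conventions is to treat a negative result as "buffer too small" and retry with a larger buffer. The sketch below is a standalone illustration with a hypothetical function name, not a patch to hoedown_buffer_printf, and the 1 MiB ceiling is an arbitrary assumption to stop the retry loop when the negative value signals a real error.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format into a freshly allocated string, retrying
 * with a larger buffer until the output fits. The caller frees the
 * result. Returns NULL on allocation failure or if the limit is hit. */
static char *format_alloc(const char *fmt, ...)
{
    size_t cap = 16;
    char *buf = NULL;

    for (;;) {
        char *p = realloc(buf, cap);
        if (!p) {
            free(buf);
            return NULL;
        }
        buf = p;

        va_list ap;
        va_start(ap, fmt);          /* restart the list for each attempt */
        int n = vsnprintf(buf, cap, fmt, ap);
        va_end(ap);

        if (n >= 0 && (size_t)n < cap)
            return buf;             /* everything fit, NUL included */

        /* C99: n is the length needed. Older MSVC runtimes return -1 on
         * truncation, so fall back to doubling. */
        cap = (n >= 0) ? (size_t)n + 1 : cap * 2;
        if (cap > ((size_t)1 << 20)) {  /* arbitrary 1 MiB ceiling */
            free(buf);
            return NULL;
        }
    }
}
```

Restarting the va_list on every attempt matters: reusing one after vsnprintf has consumed it is undefined behaviour.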

Unexpected result mixing bold and italic

Using **bold and *italic*** won't work as expected. In fact, you obtain *bold and italic instead of bold and italic.

There are probably many other cases where this won't work. As you can see, GitHub works fine.

Add GitHub convenience defines

Most people exposed to markdown will know the flavour from Stack Overflow or GitHub. It would be nice if the library included predefined HOEDOWN_EXT_GITHUB and HOEDOWN_HTML_GITHUB defines that already provide the flags needed to mimic the style allowed on GitHub.

Rename API functions

Rename all instances of the sd_* functions to either hd_* or hoedown_* (this will break API compatibility, which is a good thing).

Executable should parse options

Derived from #19. It would be great if the executable parsed
options as extensions, rendering flags and renderers.

Example: hoedown --fenced-code-blocks --tables my.markdown

While that would increase the complexity of the code
as an example, it'd show how to pass options to the
parser / renderer.

Also, --version and --smartypants.
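A table-driven sketch of such option parsing is shown below. The enum values and option names are placeholders chosen for illustration; they are not Hoedown's real extension constants, and a real tool would map each option to the library's own flags.

```c
#include <string.h>

/* Hypothetical extension bits; not Hoedown's real values. */
enum demo_ext {
    DEMO_EXT_TABLES      = 1 << 0,
    DEMO_EXT_FENCED_CODE = 1 << 1,
    DEMO_EXT_AUTOLINK    = 1 << 2
};

static const struct { const char *name; unsigned int flag; } demo_options[] = {
    { "--tables",             DEMO_EXT_TABLES },
    { "--fenced-code-blocks", DEMO_EXT_FENCED_CODE },
    { "--autolink",           DEMO_EXT_AUTOLINK }
};

/* OR together the flags of every recognised option in argv[1..argc-1].
 * The first argument that does not start with "--" is stored in *file.
 * Returns 0 on success, -1 on an unknown option. */
static int parse_options(int argc, char **argv,
                         unsigned int *ext, const char **file)
{
    const size_t n = sizeof demo_options / sizeof demo_options[0];

    *ext = 0;
    *file = NULL;

    for (int i = 1; i < argc; i++) {
        size_t k;

        if (strncmp(argv[i], "--", 2) != 0) {
            if (!*file)
                *file = argv[i];
            continue;
        }
        for (k = 0; k < n; k++) {
            if (strcmp(argv[i], demo_options[k].name) == 0) {
                *ext |= demo_options[k].flag;
                break;
            }
        }
        if (k == n)
            return -1;              /* option not in the table */
    }
    return 0;
}
```

Keeping the options in a table means adding a flag is a one-line change, and the same table could drive a --help listing.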

The problem with HTML

Not renderer independent

Hoedown describes itself as renderer agnostic. However, this is not true: we have two callbacks exclusively dedicated to HTML (blockhtml and raw_html_tag). So the parser actually parses HTML.

This can be problematic when writing renderers for output other than HTML. Imagine I'm writing a LaTeX renderer. < and > are not special symbols for LaTeX and it's easy to find them in LaTeX sources (for example, in math comparisons).

In that case, I'd expect normal_text to parse any text found in my Markdown, no matter if it contains HTML-sensitive characters.

Stateless parsing

But even then, Hoedown doesn't parse HTML as it deserves. If you're using Hoedown in production, inputs like this:

Some regular text. </p></div> <div class="comment"><div class="user">admin</div> I'm the admin and you should shut up.</div>

will break out of its containers and manipulate the page in unexpected ways (they could make other comments appear as if the administrator had entered them).

Or this:

Meaning of: <img src="nothing" onerror="alert('LOOK!')">

Sure, Hoedown can have SKIP_STYLES and all that, but you'll still need a postprocessor on top of it, if you're using it in production.

Conclusion

IMO it's better to remove all these HTML-specific callbacks and SKIP_XXX flags.
They are misleading and will fool users into thinking they are secure, when they are not. It's better to not provide any "security" features and recommend that people use a good postprocessor.

Support for tables

Hello,

Here is my code:

sub markdown {
    my $markdown_text = shift;
    load_class 'Text::Markdown::Hoedown';
    return Text::Markdown::Hoedown::markdown(
        $markdown_text,
        extensions => int( 0
            | Text::Markdown::Hoedown::HOEDOWN_EXT_TABLES
            | Text::Markdown::Hoedown::HOEDOWN_EXT_AUTOLINK
            | Text::Markdown::Hoedown::HOEDOWN_EXT_FENCED_CODE
            | Text::Markdown::Hoedown::HOEDOWN_EXT_NO_INTRA_EMPHASIS
        ),
        html_options => int( 0
            | Text::Markdown::Hoedown::HOEDOWN_HTML_SKIP_STYLE
        ),
    );
}

Here is the failing spec for the tables extension:

#!/usr/bin/env perl

use Test::Spec;

use lib::abs qw( ../../../../lib );

describe "SRX::Util::Markdown" => sub {
    before all  => sub {
        use_ok "SRX::Util::Markdown", "markdown";
    },
    it "should parse Markdown" => sub {
        like(
            markdown( '_yo_ __yo__' ),
            qr{<em>yo</em> <strong>yo</strong>},
        );
    },
    it "should parse Markdown Tables" => sub {
        like(
            markdown( "\n| a | b |\n| - | - |\n| c | d |\n\n" ),
            qr{<table>},
        );
    },
    it "should parse Markdown Code" => sub {
        like(
            markdown( "```package Def;\nsub {}\n```" ),
            qr{<code>},
        );
    },
};

runtests unless caller;

Fail:

not ok 3 - SRX::Util::Markdown should parse Markdown Tables
#   Failed test 'SRX::Util::Markdown should parse Markdown Tables'
#   at t/lib/SRX/Util/Markdown.t line 18.
#                   '<p>| a | b |
# | - | - |
# | c | d |</p>
# '
#     doesn't match '(?^:<table>)'

Use tests from mdtest

The mdtest suite seems like it could be useful. It seems to have a bunch of extra tests for plain Markdown, plus some more for extensions that originated in Markdown Extra, e.g. tables. Any interest in incorporating it here somehow?
