markdown's People

Contributors

bennn, greghendershott, leifandersen, lexi-lambda, samth, stchang, tfeb, wilbowma


markdown's Issues

Parse error from header separated by many blank lines

[Reported by @stchang. Moving to its own issue.]

I get a parse error when two headers are separated by three or more blank lines:

> (parse-markdown "# abc\n\n\n\n## def")
parsack.rkt:345:0: parse ERROR: at <string>:5:1:10
unexpected: "non-empty input"
  expected: "end-of-file"
> (parse-markdown "# \n\n\n\n## ")
parsack.rkt:345:0: parse ERROR: at <string>:5:1:7
unexpected: "non-empty input"
  expected: "end-of-file"
> (parse-markdown "# \n\n\n\n\n## ")
parsack.rkt:345:0: parse ERROR: at <string>:6:1:8
unexpected: "non-empty input"
  expected: "end-of-file"
> (parse-markdown "# \n\n\n## ")
'((h1 ((id ""))) (p () " ##"))
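Until the parser handles this, one possible caller-side workaround is to collapse runs of blank lines before calling parse-markdown. This is a minimal sketch, assuming that squeezing two or more consecutive blank lines down to one is harmless for your input; `collapse-blank-lines` is a hypothetical helper, not part of the markdown package.

```racket
#lang racket/base
;; Hypothetical pre-processing helper (not part of the markdown package).
;; Replaces any run of two or more blank lines (lines that are empty or
;; contain only spaces/tabs) with exactly one blank line.
(define (collapse-blank-lines s)
  (regexp-replace* #px"\n(?:[ \t]*\n){2,}" s "\n\n"))

;; The first failing input above becomes "# abc\n\n## def".
(collapse-blank-lines "# abc\n\n\n\n## def")
```

Caveat: this also rewrites blank lines inside fenced or indented code blocks, so it is only a stopgap.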

Images not parsed correctly if the path contains space

Image links are not parsed correctly if the path contains a space:

![hello](hel lo.jpg)

will not produce an image sexpr.

The CommonMark spec says that the image path should follow the same rules as a regular link destination. If you want a space in a regular link, you have to enclose the destination in <>, which would make the above example

![hello](<hel lo.jpg>)

which looks stupid, and also does not work with racket markdown.

Conforming to a spec is of course nice, but wouldn't doing the expected thing (rendering the image anyway) also be nice? GitHub Markdown does what I would expect (both examples work), and so does pandoc.

Best regards
Linus

performance experiments

I've been trying to improve the performance of the markdown parser through parsack improvements but haven't had much luck so far. I think the bracketed style of the grammar just leads to too much backtracking, which my parser doesn't handle well.

That did give me an idea: profile some examples (i.e., your tests) to see whether I could improve the ordering of any of the <or> choices and cut down on the backtracking.

For example, the following changes produce a decent speedup in perf-test.rkt:

  1. Flip the choices in $normal-char.
  2. Remove the singleton <or> in source-url.
  3. In $inline, drop $smart-punctuation down to above $code.
  4. In $inline, drop the 4+ ... down to above $footnote-ref.

Obvious caveats:

  1. I don't know how representative perf-tests.rkt is.
  2. I know sometimes the order of the <or> choices matters, when the choices are not disjoint, so I don't know if I've actually messed something up. But all the tests do pass.
  3. I do occasionally get a perf warning from random-test.rkt. I guess this is unsurprising since random-test emphasizes special chars?

Some numbers:

Before: non-strict: 3200, strict: 3340
After: non-strict: 2598, strict: 2547

Have you played around with the <or> ordering any?

Parsing HTML can omit spaces

Parse:

<p>X <a>Y</a> Z</p>

Expected:

(p () "X " (a () "Y") " Z")

Actual:

Note the lack of spaces:

(p () "X" (a () "Y") "Z")

which as HTML is:

<p>X<a>Y</a>Z</p>

Newlines not preserved in <pre> blocks

<pre>...</pre> blocks need to be detected early on -- as blocks, and before any line-break consolidation that is acceptable or desirable for intra-block elements. (In other words, the fix is not to detect all HTML elements earlier on.)

Parsing after footnote

[^def]: Definition of footnote def.


aaaaa

This fails to parse because $footnote-def doesn't consume both #\newline characters (I don't know whether that is its responsibility).

diff --git a/markdown/parse.rkt b/markdown/parse.rkt
index 30f2b85..60d29ee 100644
--- a/markdown/parse.rkt
+++ b/markdown/parse.rkt
@@ -805,7 +805,7 @@
             (optional $indent)
             (xs <- (sepBy $raw-lines
                           (try (>> $blank-line $indent))))
-            (optional $blank-line)
+            (many $blank-line)
             (return
              (begin
                (on-footnote-def! label (string-join xs "\n"))

This change solves the problem.

can't nest styles

I can't nest code in bold, italic in bold, bold in italic, etc. It causes the outer style to fail to be interpreted.

Generate scribble?

It would be truly awesome if this could generate scribble data types, and maybe it wouldn't be so hard, given the xexpr you're already generating.

Reference links with non-string labels not matched

A link with an empty URI like this:

[label][]

is a reference link. It must be defined elsewhere in the Markdown:

[label]: www.example.com/

Now that we allow arbitrary markdown in link labels (see #5), if a reference link label is something that's not a simple string, for example the following as a result of adding smart quotes (see #13):

[some 'quotes' here][]
...
[some 'quotes' here]: www.example.com/

Then the reference is unresolved, because the reference link definition is still trying to match on the original string "some 'quotes' here", as opposed to what it is after running it through intra-block, namely '("some " lsquo "quotes" rsquo " here").
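One possible direction, sketched under the assumption that the definition side keeps its raw string label: reduce the processed reference label back to plain text before looking it up. `label->key` is a hypothetical name, and mapping `lsquo`/`rsquo`/`ldquo`/`rdquo` back to straight quotes is an assumption about how smart punctuation should round-trip.

```racket
#lang racket/base
;; Hypothetical helper (not the package's code): flatten the xexpr
;; content produced for a link label back into plain text, so that a
;; reference label such as '("some " lsquo "quotes" rsquo " here") can
;; be compared with the raw definition label "some 'quotes' here".
(define (label->key xs)
  (apply string-append
         (for/list ([x (in-list xs)])
           (cond
             [(string? x) x]
             [(memq x '(lsquo rsquo)) "'"]      ; smart single quotes
             [(memq x '(ldquo rdquo)) "\""]     ; smart double quotes
             [(and (pair? x) (symbol? (car x))) ; element such as (em () "x"):
              (label->key (cddr x))]            ;   assumes an attribute list
             [else ""]))))                      ; other symbols are dropped here
```

The alternative would be to run the definition label through the same intra-block pass and compare the resulting xexprs instead of strings.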

Input for which parser doesn't terminate

Discovered via randomized testing:

#lang at-exp racket

(require markdown)

(define input @~a{___*  '   "[<br />

<br />        ipsum'<div>"(<    ")_<]<div>[**<br /> **lorem*>***__```]```***"ipsum
<br />'*)***ipsum___    <br />***    *lorem___*ipsum**<br /></div><div> _</div>    <ipsumipsum &__
ipsum_[   <div>__<div>&ipsum(


(***<div>lorem`]  ___ </div>&lorem("**[ipsum__&   *

loremlorem*<br />lorem]``')___lorem&___''   ``">



```lorem<div>**    </div>[ lorem`]ipsum``<br />**&


__lorem   <br />lorem`*lorem  </div>lorem_ 'lorem```
})

(parse-markdown input)

It doesn't terminate (I waited at least 30 seconds).

can't nest literal HTML entities in link

I was trying to do:

[\[\[Stuff\]\]](stuff.html)

but since (as Issue #8 says) I can't escape brackets, I tried using the literal HTML entities &#91; and &#93; instead, but this caused the link to fail to be correctly interpreted. I worked around it with literal HTML, as always.

Questions about some corner cases

I'm trying to set up random testing of markdown inputs as well. I still have some kinks to work out but it found one case that I thought I would bring up.

A trailing space at the end of the input gets parsed as br but as " " when it's not at the end. Is this expected?

> (parse-markdown "![UoFEK](pFivY) ")
'((img ((src "pFivY") (alt "UoFEK"))) (br ()))
> (parse-markdown "![UoFEK](pFivY) [ToFEwaE](VYiYppJd)")
'((img ((src "pFivY") (alt "UoFEK"))) " " (a ((href "VYiYppJd")) "ToFEwaE"))

fenced code block close-marker is sensitive to white space

The second example has an extra space at the end and therefore doesn't parse properly. Is this the intended behavior?

$ racket
Welcome to Racket v6.0.0.3.
-> (require markdown)
-> (parse-markdown "```racket\n(define x 10)\n```")
'((pre ((class "brush: racket")) (code () "(define x 10)")))
-> (parse-markdown "```racket\n(define x 10)\n``` ")
'((p () (code () "racket\n(define x 10)")))
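For comparison, CommonMark allows a closing code fence to be indented by up to three spaces and followed by trailing whitespace. A minimal sketch of that rule as a line-level check follows; `closing-fence?` is a hypothetical helper, not how the package's $verbatim/fenced parser is written, and it only handles backtick fences.

```racket
#lang racket/base
;; Hypothetical line-level check (not the package's parser): is `line`
;; a closing backtick fence? Allows up to three leading spaces, at least
;; three backticks, and trailing spaces or tabs.
;; (CommonMark also requires the closing fence to be at least as long as
;; the opening one; that check is omitted here.)
(define (closing-fence? line)
  (regexp-match? #px"^ {0,3}```+[ \t]*$" line))
```

With this rule, the "``` " line in the second example would still close the block.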

Open HTML tag at start of a long file causes parser to take forever

I wrote a long blog post using Frog and left an unclosed HTML tag towards the front of the file (<TODO>). When I tried to convert the file to HTML, the parser hung. It pegged my CPU at 100% and hadn't finished after 5 minutes. Memory usage oscillated around 200MB, so it appears to be doing a lot of backtracking.

This sounds related to issue #43, but I wanted to provide a test case I came across.

Self-closing HTML tags aren't being recognized as HTML

This:

<img src="img/yunocoros.jpg"/> The topic of coroutines (or
fibers, or continuations) for JavaScript comes up from time to time

does not preserve the <img src="/img/yunocoros.jpg"/> HTML.

The problem seems to be with "self-closing" (if that's the correct terminology) tags.

<img src="foo" />

does not work, whereas

<img src="foo"></img>

does work.

cf greghendershott/frog#20
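For reference, the self-closing form ends the tag with `/>` (often preceded by a space) and has no separate closing tag. A rough recognizer for a single such tag is sketched below; `self-closing-tag?` is a hypothetical helper that ignores edge cases such as `<` or `>` inside quoted attribute values, not the package's HTML parser.

```racket
#lang racket/base
;; Hypothetical recognizer: does `s` consist of exactly one self-closing
;; HTML tag such as <br/>, <br /> or <img src="foo" />?
;; Rough sketch: attribute values containing < or > are not handled.
(define (self-closing-tag? s)
  (regexp-match? #px"^<[A-Za-z][A-Za-z0-9]*(?:\\s[^<>]*?)?\\s*/>$" s))
```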

Double-hyphen in URL

Example:

Jay McCarthy posted about a macro to do a [`case` with `break` in Racket](http://jeapostrophe.github.io/2013-06-24-cas-cad--post.html).

Creates an invalid xexpr:

'(p () "Jay McCarthy posted about a macro to do a " (a () ((href "http://jeapostrophe.github.io/2013-06-24-cas-cad" (span mdash) "post.html")) (code () "case") " with " (code () "break") " in Racket") ".")

Note the (span mdash) in the href attribute, which comes from the -- in the URI.
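The broken result can be caught mechanically: the `xexpr?` predicate from Racket's `xml` library rejects it, because the href attribute no longer holds a single string. The sketch below uses a simplified version of the link element from the output above; it illustrates the symptom, not a fix.

```racket
#lang racket/base
(require xml)

;; Simplified from the output above: the -- in the URL has been turned
;; into (span mdash), splitting the href value into three parts.
(define bad-link
  '(a ((href "http://jeapostrophe.github.io/2013-06-24-cas-cad"
             (span mdash)
             "post.html"))
      "case"))

;; The same link with the URL left untouched.
(define good-link
  '(a ((href "http://jeapostrophe.github.io/2013-06-24-cas-cad--post.html"))
      "case"))

(xexpr? bad-link)  ; #f: an attribute must be (name "string-value")
(xexpr? good-link) ; #t
```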

parser cannot handle Windows-style line breaks

I believe the parser does not handle Windows-style line breaks, i.e. "\r\n", properly. For example, all the tests that rely on test.md fail on Windows.

Here is another example:

> (parse-markdown "[test][1]\n\n[1]:http://test.com \"test-title\"")
'((p () (a ((href "http://test.com") (title "test-title")) "test")))
> (parse-markdown "[test][1]\r\n\r\n[1]:http://test.com \"test-title\"")
Reference link not defined: (linkref "1")
'((a ((href "")) "test") "\r \r [1]:http://test.com " ldquo "test-title" rdquo)
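Until the parser accepts "\r\n" itself, a caller-side workaround is to normalize line endings before parsing. This is a minimal sketch; `normalize-newlines` is a hypothetical helper, and treating a lone "\r" (old Mac OS style) as a line break is an assumption you may not want.

```racket
#lang racket/base
;; Hypothetical pre-processing helper: convert "\r\n" (Windows) and a
;; lone "\r" to "\n" so the parser only ever sees Unix line endings.
(define (normalize-newlines s)
  (regexp-replace* #rx"\r\n?" s "\n"))
```

Applied to the second example's input, this yields exactly the input of the first example.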

Undeclared dependency

raco setup: --- summary of missing dependencies ---
raco setup: undeclared dependency detected
raco setup:   for package: "markdown"
raco setup:   on packages:
raco setup:    "base"
raco setup:    "sandbox-lib"
raco setup:    "scribble-lib"
raco setup:    "srfi-lite-lib"
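The usual fix for this kind of raco setup report is to list the missing packages in the package's info.rkt. A sketch, assuming the package declares its dependencies in a `deps` definition; the entries the file already has are not shown here and must be kept. Some of these, such as scribble-lib if it is only needed for documentation, may belong in `build-deps` instead.

```racket
#lang info
;; Sketch only: add the packages named in the raco setup report to the
;; existing deps list (keep every entry that is already there).
(define deps '("base"
               ;; ...existing entries...
               "sandbox-lib"
               "scribble-lib"
               "srfi-lite-lib"))
```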

Parsing of not-quite-malformed Markdown takes a long time

When the markdown parser parses the text below, it takes a long time to do it:

% time .../m2h try-md.md 
Doing try-md.md
'#(#<void>)
.../m2h try-md.md  6.68s user 0.17s system 100% cpu 6.836 total
%

(the m2h script does little more than parse the markdown and then dump it to HTML).

Now, the text here is obviously not very good markdown – it's just a list of filenames – but it is just text, and I would have expected the parser to broadly cope with it, even if the result isn't pretty. The case this is a reduction of consisted of about double this number of lines, and I thought the parser had crashed. Is there perhaps something O(n^2) happening here?

Now, one response to ‘the parser is slow when I give it rubbish markdown’ is ‘well, don't do that, then’. That is perfectly fair. But if there's some way of getting the parser to give up gracefully, that would be nice.


font-util-1.3.0_1   Create an index of X font files in a directory
fontconfig-2.11.0_1,1 XML-based font configuration API for X Windows
fontsproto-2.1.2    Fonts extension headers
freetype2-2.5.2     Free and portable TrueType font rendering engine
gdk-pixbuf2-2.28.2  Graphic library for GTK+
gettext-0.18.3.1    GNU gettext package
git-1.8.5.2         Distributed source code management tool
glib-2.36.3_1       Some useful routines of C programming (current stable versi
gmake-3.82_1        GNU version of 'make' utility
gnome_subr-1.0      Common startup and shutdown subroutines used by GNOME scrip
gnomehier-3.0       A utility port that creates the GNOME directory tree
gobject-introspection-1.36.0_2 Generate interface introspection data for GObject libraries
graphite2-1.2.4     Rendering capabilities for complex non-Roman writing system
gtk-update-icon-cache-2.24.22 Gtk-update-icon-cache utility from the Gtk+ toolkit
gtk3-3.8.8          Gimp Toolkit for X11 GUI (current stable version)
harfbuzz-0.9.25     OpenType text shaping engine
hicolor-icon-theme-0.12 A high-color icon theme shell from the FreeDesktop project
icu-50.1.2          International Components for Unicode (from IBM)
inputproto-2.3      Input extension headers
intltool-0.50.2     Tools to internationalize various kinds of data files
jasper-1.900.1_12   An implementation of the codec specified in the JPEG-2000 s
jbigkit-1.6         Lossless compression for bi-level images such as scanned pa
jpeg-8_4            IJG's jpeg compression utilities
kbproto-1.0.6       KB extension headers
lcms2-2.5           Accurate, fast, and small-footprint color management engine
libICE-1.0.8,1      Inter Client Exchange library for X11
libSM-1.2.2,1       Session Management library for X11
libX11-1.6.2,1      X11 library
libXau-1.0.8        Authentication Protocol library for X11
libXcomposite-0.4.4,1 X Composite extension library
libXcursor-1.1.14   X client-side cursor loading library
libXdamage-1.1.4    X Damage extension library
libXdmcp-1.1.1      X Display Manager Control Protocol library
libXext-1.3.2,1     X11 Extension library
libXfixes-5.0.1     X Fixes extension library
libXfont-1.4.7,1    X font library
libXft-2.3.1        Client-sided font API for X applications
libXi-1.7.2,1       X Input extension library
libXinerama-1.1.3,1 X11 Xinerama library
libXrandr-1.4.2     X Resize and Rotate extension library
libXrender-0.9.8    X Render extension library
libXt-1.1.4,1       X Toolkit library
libXtst-1.2.2       X Test extension
libcheck-0.9.11     Unit test framework for C
libgcrypt-1.5.3     General purpose crypto library based on code used in GnuPG
libgpg-error-1.12   Common error values for all GnuPG components
libiconv-1.14_1     A character set conversion library
libpthread-stubs-0.3_4 This library provides weak aliases for pthread functions
libtool-2.4.2_2     Generic shared library support script
libxcb-1.9.3        The X protocol C-language Binding (XCB) library
libxml2-2.8.0_3     XML parser library for GNOME
libxslt-1.1.28_1    The XSLT C library for GNOME
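One way for a caller to make the parser "give up gracefully" today is to impose a time limit from outside. The sketch below is hypothetical (`call-with-time-limit` is not part of the markdown package); `racket/sandbox` also offers `call-with-limits`, which raises an exception when a limit is exceeded instead of returning a default.

```racket
#lang racket/base
;; Hypothetical wrapper (not part of the markdown package): run `thunk`
;; in a separate thread and stop waiting after `secs` seconds.
;; Returns the thunk's result, or `on-timeout` if the limit is reached.
;; An exn:fail raised by the thunk is re-raised in the caller's thread.
(define (call-with-time-limit secs thunk [on-timeout #f])
  (define ch (make-channel))
  (define worker
    (thread
     (lambda ()
       ;; Send back a procedure that either returns the value or
       ;; re-raises the exception, so a #f result is not ambiguous.
       (channel-put ch
                    (with-handlers ([exn:fail? (lambda (e) (lambda () (raise e)))])
                      (let ([v (thunk)]) (lambda () v)))))))
  (define outcome (sync/timeout secs ch))
  (cond
    [outcome (outcome)]
    [else (kill-thread worker) on-timeout]))
```

For example, `(call-with-time-limit 10 (lambda () (parse-markdown text)) #f)` would return #f after ten seconds and kill the worker thread, so an abandoned parse stops consuming CPU.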

`(\_ -> ...)` does not generate <code> correctly

A Haskell lambda does not generate <code> correctly; it works in the Mou Markdown editor and on GitHub.

This is Haskell lambda `(\_ -> ...)` code.

Result:
This is Haskell lambda (_ -> ...) code.

Github result:
This is Haskell lambda (\_ -> ...) code.

Reading the code, I found that when I put code before escape in the intra-block function, it worked fine :).

Possibly incorrect html emitted with script tag

See what happens to the async attribute.

➜  markdown git:(master) racket markdown/main.rkt
<script async class="speakerdeck-embed" data-id="f0b571b0759a0131f0bd026a5a2b7ed1" data-ratio="1.33333333333333" src="//speakerdeck.com/assets/embed.js"></script>
<!DOCTYPE html>^D
<html>
 <head>
  <meta charset="utf-8" /></head>
 <body>
  <script async="async" class="speakerdeck-embed" data-id="f0b571b0759a0131f0bd026a5a2b7ed1" data-ratio="1.33333333333333" src="//speakerdeck.com/assets/embed.js"></script></body></html>%

Block image should parse links in label

(parse-markdown "![[A _label_](/url/)](/png/)")
; Actual =>
'((div ((class "figure"))
       (img ((src "/png/")
             (alt "[A _label_](/url/)")))
       (p ((class "caption"))
          "[A " (em () "label") "](/url/)")))
; Expected => 
'((div ((class "figure"))
       (img ((src "/png/")
             (alt "[A _label_](/url/)")))
       (p ((class "caption"))
          (a ((href "/url/")) "A " (em () "label")))))

Bold/italic in the middle of words does not work: should it?

If you type something like **fr**ozen bl**og** then I think you should get frozen blog: you currently get something with embedded asterisks instead (this is using markdown in frog of course). I don't know what the markdown standard, as far as there is one, says about this.

I think this used to work (at least, when I wrote that markup originally I assume I checked the output!).

This is a non-major issue.

No license file

Unlicensed software is nonfree software... could we get a COPYING? :)

Support MathJax delimiters

In the sense that text between \[ and \] or between \( and \) is used literally, not parsed as markdown.

See greghendershott/frog#129 (comment)

Because \ already has a meaning in markdown -- for example \[ means a literal [, not e.g. part of markdown syntax for a link -- this may need to be \\[ and so on.

can't escape brackets

I believe markdown allows escaping square brackets with a backslash but this doesn't work.

Footnote definitions not appearing in numerical order

Footnote definitions should appear in numerical order, but currently they are appearing in the order they are defined in the Markdown text.

Another way to put it: footnote definitions should appear in the order of their usage, not of their definition.

Example:

A usage[^foo] and another[^bar].

[^bar]: Bar note.

[^foo]: Foo note.

The foo note will be number 1 and bar will be number 2. The footnotes should appear in that order.

Obviously someone can work around this by reordering the footnote definitions, but it should work correctly especially when using the label variant.
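Put another way, the desired order is the order of first reference in the text. A minimal sketch of that ordering, assuming the references have been collected as a list of labels in document order and the definitions as a hash from label to text (both hypothetical representations, not the package's internal ones):

```racket
#lang racket/base
(require racket/list)

;; Hypothetical helper: return (label . definition) pairs ordered by each
;; label's first use in the text, skipping labels that have no definition.
(define (footnotes-in-usage-order uses defs)
  (for/list ([label (in-list (remove-duplicates uses))]
             #:when (hash-has-key? defs label))
    (cons label (hash-ref defs label))))

;; The example above: foo is used first, so it comes first even though
;; bar is defined first.
(footnotes-in-usage-order '("foo" "bar")
                          (hash "bar" "Bar note." "foo" "Foo note."))
```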

can't nest images in links

I've tried nesting images in links either with:

[![foo](foo.jpg)](foo.html)

or

[<img src="foo.jpg"/>](foo.html)

and both cause the outer brackets to fail to be treated as a link. I can work around with literal HTML.

Backtick (a.k.a. quasiquote) in fenced code block

(parse-markdown "```racket\n'(foo)\n```")
;; => '((pre ((class "brush: racket")) (code () "'(foo)"))) 
;; good

(parse-markdown "```racket\n`(foo)\n```")
;; => '((p () (code () "racket\n`(foo)")))
;; bad

First glance, I don't understand why. $verbatim/fenced is using $any-line which should not be affected by the ```. Somehow it is getting parsed as an inline $code, instead.

Allow configurability of the xexprs generated by the Markdown parse

Specific case first:

The Markdown library currently generates (h1 () ...) for Markdown ==== headers, and (h2 () ...) for ---- headers. For my application I'd prefer these to be (h2 () ...) and (h3 () ...) respectively. That's easy to fix, since I can just walk the tree adjusting them post-parse. However it would be neater if I could ask the parser to do that for me.

I see that this overlaps with the discussion of tree-walking, and its costs, in pull request #48, so might relate to that.

Looking at $setext-heading/para/plain in parse.rkt, I see that this h1/h2 interpretation is implemented by a (match ...) expression, so that looks like a natural location for a parameter, but you'll have a better idea than me of how expensive that would be.

There's a potential more general point, in that there are other places where it looks natural for a user to impose a different interpretation on the parse – functions $_emph and $_strong – but those cases are more marginal, and it might be that the sort of configurability I'm suggesting above is not worth generalising.
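The post-parse workaround mentioned above might look like this minimal sketch. `demote-headings` is a hypothetical helper operating on the list of xexprs that parse-markdown returns; it is not part of the package.

```racket
#lang racket/base
;; Hypothetical post-parse pass: rename h1 -> h2 and h2 -> h3 in a list
;; of xexprs, recursing into child elements.
(define (demote-headings xexprs)
  (define (walk x)
    (cond
      [(and (pair? x) (symbol? (car x)))  ; an element: (tag attrs child ...)
       (define tag
         (case (car x)
           [(h1) 'h2]
           [(h2) 'h3]
           [else (car x)]))
       (cons tag (map walk (cdr x)))]     ; attribute lists pass through unchanged
      [else x]))                          ; strings, entities, etc.
  (map walk xexprs))
```

Because each node is renamed once, an h1 becomes h2 and never h3. A parameter consulted inside the parser's match, as suggested above, would avoid this extra traversal.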

observe `xexpr-drop-empty-attributes`?

The xexpr-drop-empty-attributes parameter in xml/xexpr controls whether empty attribute lists are omitted from X-expressions.

parse-markdown makes X-expressions, but does not observe this parameter. Can it? Should it?

(parameterize ([xexpr-drop-empty-attributes #t])
  (parse-markdown "I am _emph_ and I am **strong**."))

> '((p () "I am " (em () "emph") " and I am " (strong () "strong") "."))

;; vs. '((p "I am " (em "emph") " and I am " (strong "strong") "."))
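If the answer is yes, one low-cost option is a pass over the result that runs only when the parameter is set. A sketch, where `drop-empty-attributes` and `observe-drop-empty-attributes` are hypothetical names; only `xexpr-drop-empty-attributes` comes from Racket's `xml` library.

```racket
#lang racket/base
(require xml)

;; Hypothetical post-processing pass: drop empty attribute lists from an
;; xexpr, turning (p () "x") into (p "x"). Non-empty lists are kept.
(define (drop-empty-attributes x)
  (cond
    [(and (pair? x) (symbol? (car x)))
     (define after-tag (cdr x))
     (define body
       (if (and (pair? after-tag) (null? (car after-tag)))
           (cdr after-tag)   ; skip the empty attribute list
           after-tag))
     (cons (car x) (map drop-empty-attributes body))]
    [else x]))

;; Apply the pass only when the xml library's parameter asks for it.
(define (observe-drop-empty-attributes xexprs)
  (if (xexpr-drop-empty-attributes)
      (map drop-empty-attributes xexprs)
      xexprs))
```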
