mistletoe's Introduction

mistletoe

mistletoe is a Markdown parser in pure Python, designed to be fast, spec-compliant and fully customizable.

Apart from being the fastest CommonMark-compliant Markdown parser implementation in pure Python, mistletoe also supports easy definitions of custom tokens. Parsing Markdown into an abstract syntax tree also allows us to swap out renderers for different output formats, without touching any of the core components.

Remember to spell mistletoe in lowercase!

Features

  • Fast: mistletoe is the fastest implementation of CommonMark in Python. See the performance section for details.

  • Spec-compliant: CommonMark is a useful, high-quality project. mistletoe follows the CommonMark specification to resolve ambiguities during parsing. Outputs are predictable and well-defined.

  • Extensible: Strikethrough and tables are supported natively, and custom block-level and span-level tokens can easily be added. Writing a new renderer for mistletoe is a relatively trivial task.

    You can even write a Lisp in it.

Output formats

Renderers for the following "core" output formats exist within the mistletoe main package:

  • HTML
  • LaTeX
  • AST (Abstract Syntax Tree; handy for debugging the parsing process)
  • Markdown (Can be used to reflow the text, or make other types of automated changes to Markdown documents)

Renderers for the following output formats can be found in the contrib package:

  • HTML with MathJax (mathjax.py)
  • HTML with code highlighting (using Pygments) (pygments_renderer.py)
  • HTML with TOC (for programmatic use) (toc_renderer.py)
  • HTML with support for GitHub wiki links (github_wiki.py)
  • Jira Markdown (jira_renderer.py)
  • XWiki Syntax (xwiki20_renderer.py)
  • Scheme (scheme.py)

Installation

mistletoe is tested for Python 3.5 and above. Install mistletoe with pip:

pip3 install mistletoe

Alternatively, clone the repo:

git clone https://github.com/miyuchina/mistletoe.git
cd mistletoe
pip3 install -e .

This installs mistletoe in "editable" mode (because of the -e option). Any changes made to the source code become visible immediately, because Python only links to the specified directory (.) instead of copying the files to the standard packages folder.

See the contributing doc for how to contribute to mistletoe.

Usage

Usage from Python

Here's how you can use mistletoe in a Python script:

import mistletoe

with open('foo.md', 'r') as fin:
    rendered = mistletoe.markdown(fin)

mistletoe.markdown() uses mistletoe's default settings: allowing HTML mixins and rendering to HTML. The function also accepts an additional argument renderer. To produce LaTeX output:

import mistletoe
from mistletoe.latex_renderer import LaTeXRenderer

with open('foo.md', 'r') as fin:
    rendered = mistletoe.markdown(fin, LaTeXRenderer)

To reflow the text in a Markdown document with a max line length of 20 characters:

import mistletoe
from mistletoe.markdown_renderer import MarkdownRenderer

with open('dev-guide.md', 'r') as fin:
    with MarkdownRenderer(max_line_length=20) as renderer:
        print(renderer.render(mistletoe.Document(fin)))

Finally, here's how you would manually specify extra tokens via a renderer. In the following example, we use HtmlRenderer to render the AST. The renderer itself adds HtmlBlock and HtmlSpan tokens to the parsing process. The result should be equal to the output obtained from the first example above.

from mistletoe import Document, HtmlRenderer

with open('foo.md', 'r') as fin:
    with HtmlRenderer() as renderer:     # or: `with HtmlRenderer(AnotherToken1, AnotherToken2) as renderer:`
        doc = Document(fin)              # parse the lines into AST
        rendered = renderer.render(doc)  # render the AST
        # internal lists of tokens to be parsed are automatically reset when exiting this `with` block

Important: As the example above shows, the parsing phase is currently tightly coupled to the initialization and closing of a renderer. Therefore, you should never call Document(...) outside of a with ... as renderer block, unless you know what you are doing.

Usage from command-line

pip installation enables mistletoe's command-line utility. Type the following directly into your shell:

mistletoe foo.md

This will transpile foo.md into HTML, and dump the output to stdout. To save the HTML, direct the output into a file:

mistletoe foo.md > out.html

You can use a different renderer by including the full path to the renderer class after a -r or --renderer flag. For example, to transpile into LaTeX:

mistletoe foo.md --renderer mistletoe.latex_renderer.LaTeXRenderer

and similarly for a renderer in the contrib package:

mistletoe foo.md --renderer mistletoe.contrib.jira_renderer.JiraRenderer

mistletoe interactive mode

Running mistletoe without specifying a file will land you in interactive mode. Like Python's REPL, interactive mode allows you to test how your Markdown will be interpreted by mistletoe:

mistletoe [version 0.7.2] (interactive)
Type Ctrl-D to complete input, or Ctrl-C to exit.
>>> some **bold** text
... and some *italics*
...
<p>some <strong>bold</strong> text
and some <em>italics</em></p>
>>>

The interactive mode also accepts the --renderer flag:

mistletoe [version 0.7.2] (interactive)
Type Ctrl-D to complete input, or Ctrl-C to exit.
Using renderer: LaTeXRenderer
>>> some **bold** text
... and some *italics*
...
\documentclass{article}
\begin{document}

some \textbf{bold} text
and some \textit{italics}
\end{document}
>>>

Who uses mistletoe?

mistletoe is used by projects with a variety of target audiences. You can find some concrete projects in the "Used by" section on Libraries.io, but this is definitely not a complete list. GitHub also tracks a list of Dependents directly.

Run mistletoe from CopyQ

One notable example is running mistletoe as a Markdown converter from the advanced clipboard manager called CopyQ. One just needs to install the Convert Markdown to ... custom script command and then run this command on any selected Markdown text.

Why mistletoe?

"For fun," says David Beazley.

Further reading

Copyright & License

mistletoe's People

Contributors

alexkolson, allets, anderskaplan, andy0130tw, asb, averms, cctile, choeppler, chrisbresten, chrisjsewell, derekn, doerwalter, fanduzi, fqxp, franferrax, freakwill, grollicus, huettenhain, joel-coffman, liuq, lordfirespeed, miyuchina, nijel, nikolas, not-my-profile, pbodnar, rogdham, sebqq, sglyon, vallentin

mistletoe's Issues

Normalising Line-Endings

When \r\n is used for line endings, the behaviour can be a bit surprising:

>>> from mistletoe import Document, HTMLRenderer

>>> HTMLRenderer().render(Document("foo  \r\nbar"))
'<p>foo  \r\nbar</p>\n'

>>> HTMLRenderer().render(Document("foo  \nbar"))
'<p>foo<br />\nbar</p>\n'

I stumbled on this today and I thought I'd report it. Not sure what the correct behaviour should be though. I only found it mentioned here: https://talk.commonmark.org/t/carriage-returns-and-code-blocks/2519
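Until the correct behaviour is settled, one workaround is to normalise line endings before parsing. This is a minimal sketch, assuming the input is already a string. `normalize_newlines` is a hypothetical helper, not part of mistletoe:

```python
def normalize_newlines(text):
    """Convert Windows (\\r\\n) and old Mac (\\r) line endings to \\n."""
    # Replace \r\n first so the lone-\r pass does not double the newline.
    return text.replace("\r\n", "\n").replace("\r", "\n")


source = "foo  \r\nbar"
cleaned = normalize_newlines(source)
# `cleaned` can now be passed to Document(...) in place of `source`.
```

With the `\r` gone, the two trailing spaces sit directly before the `\n` again, which is the form that produced a hard line break in the second example above.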

URLs seem to prevent code blocks from closing properly

Hey everyone! I have recently converted a markdown document to Jira format with mistletoe and ran into the following bug. Compiling this document:

# URL Problem
```
http://google.com/
``` 
this is not part of the code block.

yields this Jira code:

h1. URL Problem
{code:}
http://google.com/
``` 
this is not part of the code block.
{code}

I am not sure why that is.

Multi-level Lists

Multi-level lists don't parse properly. I only ever get two levels deep.

import mistletoe

mdtext = """
- a
    - b
        - c
            - d
        - e
    - f
"""

htmltext = mistletoe.markdown(mdtext)
print(htmltext)

yields:

<ul>
<li>a
<ul>
<li>b
</li>
<li>c
</li>
<li>d
</li>
<li>e
</li>
<li>f
</li>
</ul>
</li>
</ul>

There should be a <ul> prior to 'c' and 'd'. Tested with the latest dev branch (0.6.2).

Thanks for the hard work.

Extracting content by intercepting render_raw_text

Thanks for this nice project.

I may be a noob at this, but I was able to parse a README and get the content back wrapped in its HTML elements. However, the returned content is one giant string. So my question is: is there a built-in function to extract only the text content, not the code? That is, only the data inside <p></p> or <h5></h5> ...
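One way to get at the text without touching the parser is to post-process the rendered HTML with Python's standard `html.parser` module. The sketch below is not a mistletoe feature, and the `TextExtractor` class and `extract_text` function are hypothetical names. It keeps the text of paragraphs and headings and skips anything inside <code> or <pre>:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text found inside the wanted tags, skipping <code>/<pre>."""

    def __init__(self, wanted=("p", "h1", "h2", "h3", "h4", "h5", "h6")):
        super().__init__()
        self.wanted = set(wanted)
        self.skipped = {"code", "pre"}
        self.depth_wanted = 0
        self.depth_skipped = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.depth_wanted += 1
            self.chunks.append("")      # start a new text chunk
        elif tag in self.skipped:
            self.depth_skipped += 1

    def handle_endtag(self, tag):
        if tag in self.wanted and self.depth_wanted:
            self.depth_wanted -= 1
        elif tag in self.skipped and self.depth_skipped:
            self.depth_skipped -= 1

    def handle_data(self, data):
        # Keep text only while inside a wanted tag and outside code.
        if self.depth_wanted and not self.depth_skipped:
            self.chunks[-1] += data


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace and drop empty chunks.
    return [" ".join(chunk.split()) for chunk in parser.chunks if chunk.strip()]


html = "<h5>Title</h5>\n<p>Some <code>x=1</code> text</p>\n<pre><code>skip me</code></pre>\n"
texts = extract_text(html)
```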

Inline elements don't parse across newlines

When parsing Markdown, mistletoe doesn't parse inlines across line boundaries. Example source:

here is a file with *emph* and also
it will have *emph across
lines.*  uh-oh. Let's try **strong** and
also, of course, **strong across
lines.** uh-oh-uh-oh.

And example output:

<p>here is a file with <em>emph</em> and also
it will have *emph across
lines.*  uh-oh. Let&#x27;s try <strong>strong</strong> and
also, of course, **strong across
lines.** uh-oh-uh-oh.
</p>

Most parsers recognize inlines over newlines. It'd be nice if mistletoe did, too. Would it be as simple as setting a re flag?
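To illustrate what such a flag changes: by default `.` does not match a newline, and `re.DOTALL` lifts that restriction. The pattern below is a hypothetical, simplified emphasis pattern, not the one mistletoe uses:

```python
import re

text = "it will have *emph across\nlines.* uh-oh."

# Hypothetical emphasis pattern; mistletoe's own pattern may differ.
single_line = re.compile(r"\*(.+?)\*")
multi_line = re.compile(r"\*(.+?)\*", re.DOTALL)

no_match = single_line.search(text)   # "." stops at the newline
match = multi_line.search(text)       # "." may now cross the line break
inner = match.group(1)
```

Whether a flag alone is enough depends on how the tokenizer splits its input. If the text is handed to the pattern one line at a time, the flag has nothing to span.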

Mistletoe plant and logo

Hi,

first of all, thanks for the tool: it's a piece of cake for markdown parsing (and also for custom rendering)!

Just for the sake of precision, I'd like to point out a common misconception about mistletoe, which is not the plant depicted in the logo. The plant of the logo is either the Ruscus aculeatus, also called butcher's-broom or christmas berry or the Ilex aquifolium, called christmas holly. Instead, the mistletoe is the Viscum album, which has white berries and plain thick leaves (not spiny as the other two plants).

I'd also like to make clear that getting you to change the logo is not my purpose; I only want to make you aware, in a friendly way, of which is the “right” mistletoe. :-)

All the best,

Luca

Fenced Code Blocks / CommonMark

Given the following markdown document:

This
```
code
block
```
does not convert properly.

Then the output of md2jira (for example) will be the following:

This
{{`}}
code
block
{{`}}
does not convert properly.

I am well aware that triple-backtick code blocks are not part of the original Markdown standard, but they have made it into CommonMark and I consider them very useful.

For me personally, this might be the only feature from CommonMark that I am missing, but would you be generally interested in implementing all of CommonMark, or accept pull requests to that end?

Specify initial header level on HTML output

Hi,

Sorry if this is a silly question, but I'm trying to parse some markdown and embed it as html inside of a page on my web application. This page already has a couple levels of Header tags, so I'd like for the headings inside the markdown to start at <h3> rather than <h1>. Is it possible to pass in an initial header level to the HTMLRenderer?

Thanks!
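Independently of what the renderer offers, the rendered HTML can be post-processed. The following is a minimal sketch with a hypothetical helper, `shift_headings`, that moves every heading tag down by a fixed offset and clamps the result to the valid h1..h6 range:

```python
import re


def shift_headings(html, offset):
    """Shift every <hN> / </hN> tag by `offset`, clamping to h1..h6."""

    def repl(match):
        level = min(6, max(1, int(match.group(2)) + offset))
        return "{}h{}".format(match.group(1), level)

    # Group 1 is "<" or "</", group 2 is the heading level digit.
    return re.sub(r"(</?)h([1-6])\b", repl, html)


rendered = '<h1>Title</h1>\n<h2 id="x">Sub</h2>\n'
embedded = shift_headings(rendered, 2)
```

A regular expression is enough here because only tag names are rewritten. Heading-like text inside code blocks is escaped by the renderer and therefore never starts with a literal `<h`.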

Tiny followup on table shorthands

I think a bug might have been introduced in 566b234: Consider the following table:

### Test

 One | Two
 --- | ---
 A   | BC
 D   | ED

When I translate it to Jira, I get the following:

h3. Test
||A||B||
|A|B|
|D|E|

It seems to have removed the last character in every row because I left out the closing bar.
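The symptom looks like a row splitter that always drops the first and last character, whether or not they are bars. I have not confirmed that this is what the commit does. A sketch of a splitter that treats the outer bars as optional, with `split_row` as a hypothetical name:

```python
def split_row(line):
    """Split a table row into cells; leading/trailing bars are optional."""
    line = line.strip()
    # Only drop a bar if it is actually there.
    if line.startswith("|"):
        line = line[1:]
    if line.endswith("|"):
        line = line[:-1]
    return [cell.strip() for cell in line.split("|")]


open_row = split_row(" A   | BC")
closed_row = split_row("| D   | ED |")
```

This sketch ignores escaped bars (`\|`) inside cells.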

Calling children on RawText crashes

  File "/usr/local/lib/python3.5/dist-packages/mistletoe/span_token.py", line 96, in children
    if isinstance(self._children, GeneratorType):
AttributeError: 'RawText' object has no attribute '_children'

ASTRenderer crashes on Tables with headers

I may have found a bug in ASTRenderer in generating tables, but perhaps I'm not doing something right. I'd deeply appreciate your help.

I tried the following script:

#!/usr/bin/env python3

import mistletoe, sys
from mistletoe.ast_renderer import ASTRenderer

filename = sys.argv[1]

with open(filename, 'r') as fin:
   rendered = mistletoe.markdown(fin,ASTRenderer)

With this markdown code:

|       | col 1    |
|-------|----------|
| row 1 | cell 1,1 |

And I get this error:

  File "test_mistletoe.py", line 9, in <module>
    rendered = mistletoe.markdown(fin, ASTRenderer)
  File "//anaconda/lib/python3.5/site-packages/mistletoe/__init__.py", line 19, in markdown
    return renderer.render(Document(iterable))
  File "//anaconda/lib/python3.5/site-packages/mistletoe/ast_renderer.py", line 15, in render
    return json.dumps(get_ast(token), indent=2) + '\n'
  File "//anaconda/lib/python3.5/json/__init__.py", line 237, in dumps
    **kw).encode(obj)
  File "//anaconda/lib/python3.5/json/encoder.py", line 200, in encode
    chunks = list(chunks)
  File "//anaconda/lib/python3.5/json/encoder.py", line 429, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "//anaconda/lib/python3.5/json/encoder.py", line 403, in _iterencode_dict
    yield from chunks
  File "//anaconda/lib/python3.5/json/encoder.py", line 324, in _iterencode_list
    yield from chunks
  File "//anaconda/lib/python3.5/json/encoder.py", line 403, in _iterencode_dict
    yield from chunks
  File "//anaconda/lib/python3.5/json/encoder.py", line 436, in _iterencode
    o = _default(o)
  File "//anaconda/lib/python3.5/json/encoder.py", line 179, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <mistletoe.block_token.TableRow object at 0x1093806d8> is not JSON serializable
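The traceback shows a raw TableRow object reaching `json.dumps`, which only knows how to encode built-in types. The sketch below reproduces that failure mode with a stand-in class and shows how the `default=` hook of `json.dumps` can describe such an object. The `TableRow` class and `to_jsonable` function here are illustrative and not mistletoe's code:

```python
import json


class TableRow:
    """Stand-in for a token object; not mistletoe's real class."""

    def __init__(self, cells):
        self.cells = cells


def to_jsonable(obj):
    """Fallback for json.dumps: describe unknown objects as plain dicts."""
    if isinstance(obj, TableRow):
        return {"type": "TableRow", "children": obj.cells}
    raise TypeError("{!r} is not JSON serializable".format(obj))


tree = {"type": "Table", "header": TableRow(["", "col 1"]), "children": []}

try:
    json.dumps(tree)                      # fails like the traceback above
    failed = False
except TypeError:
    failed = True

dumped = json.dumps(tree, default=to_jsonable, indent=2)
```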

Newlines are inserted into code tags

Parsing
`x=1`
yields

<code>
x=1
</code>

but the newline between the x=1 and the </code> means that when the HTML is rendered, there is a space (i.e. it looks like x=1[SPACE]).

HTMLRenderer parsing AutoLink as HTMLSpan

Broken in mistletoe 0.5.

The cause is that HTMLSpan is added at the first position of the token matching precedence. For AutoLink to work correctly, HTMLSpan's precedence must be lower than AutoLink's.

The difficulty in fixing this is that even though span_token.add_token takes an optional position argument, it is called in BaseRenderer.__init__() and there's not a defined way for renderer subclasses to reach that. Plus it means increasing the complexity of HTMLRenderer, which, up to this point, has been serving as the example for custom renderer classes.

I'll have to think about it for a bit. Today is a great day for bugs it seems.
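A toy model of the precedence problem, with simplified patterns and hypothetical names (`HTML_SPAN`, `AUTO_LINK`, `first_match`) rather than the real token classes: when tokens are tried in order and the first match wins, whichever pattern comes first claims an autolink.

```python
import re

# Hypothetical stand-ins for span tokens; the patterns are simplified.
HTML_SPAN = ("HTMLSpan", re.compile(r"<[^>]+>"))
AUTO_LINK = ("AutoLink", re.compile(r"<(https?://[^>\s]+)>"))


def first_match(tokens, text):
    """Return the name of the first token whose pattern matches `text`."""
    for name, pattern in tokens:
        if pattern.match(text):
            return name
    return None


text = "<https://example.com>"
html_first = first_match([HTML_SPAN, AUTO_LINK], text)   # HTMLSpan claims it
link_first = first_match([AUTO_LINK, HTML_SPAN], text)   # AutoLink claims it
```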

Crashes when rendering tables with headings

Broken in mistletoe 0.4.1. Spotted by @cctile in #10. Derp.

The cause is that since 0.4.1 token.children returns a list (or tuple), and HTMLRenderer and LaTeXRenderer try to call next on the list (or tuple).

I could patch it the easy way, but the neater solution will change the Table class slightly, and make future render_table functions less gross.
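The failure itself is plain Python behaviour: a list is iterable but is not an iterator, so `next()` rejects it. A short illustration with made-up data; wrapping the list in `iter()` is one possible quick patch, assumed here rather than taken from the actual fix:

```python
children = ["header row", "body row"]   # a list, not a generator

try:
    next(children)                      # calling next() directly on a list
    error = None
except TypeError as exc:
    error = str(exc)                    # a list is iterable but not an iterator

header = next(iter(children))           # wrapping in iter() works
```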

Issue with read method in Paragraph class

Line 208 in block_token.py is line_buffer.append(next(line)). The variable "line" does not exist, so the call crashes on that line. Changing line to lines fixes the problem.

lists should not contain <p>

Why do you use <p></p> in lists? This is not the way CommonMark does it.

Another strange behaviour: in the same document, some lists get the paragraphs and others don't. Could you explain how it is supposed to work?

Encoding problem ?

Thanks for sharing this project, which looks promising.

I've encountered this simple problem and I suspect it's an encoding issue. Do you have any idea? As you can see, headings are not parsed properly (the # seems to be escaped somehow).

Thank you for your help.

/Users/lionel/Public/Github/ThesisWebSite/mkdocs-material/docs/thesis/mistune/test.md
<p># Header level 1</p>
<p>sdlkjsdlkjsd</p>
<p>## Header level 2</p>
<p>Ceci est uyn texte en français avec des accenbtes éèàüö %% !?</p>
<p>### Header level 3</p>
<p>Text block with <strong>emphasis</strong> and a <a href="http://www.url.fr">url</a></p>
<p>* item 1</p>
<p>* item 2</p>
<p>* item 3</p>
---------------
<h1>Header level 1</h1>
<p>sdlkjsdlkjsd</p>
<h2>Header level 2</h2>
<p>Ceci est uyn texte en français avec des accenbtes éèàüö %% !?</p>
<h3>Header level 3</h3>
<p>Text block with <strong>emphasis</strong> and a <a href="http://www.url.fr">url</a></p>
<ul>
<li>item 1</li>
<li>item 2</li>
<li>item 3</li>
</ul>

My python code:

# coding: utf-8
import sys
import os
import re
import subprocess
import mistletoe
import mistune


fname = 'test.md'
currentdir = os.path.dirname(os.path.abspath(__file__))
fullpath = os.path.join(currentdir, fname)

print(fullpath)

# Read the whole file at once; the context manager closes it afterwards.
with open(fullpath, 'r', encoding='utf-8') as f:
    filedata = f.read()


mistletoerender = mistletoe.markdown(filedata)
print(mistletoerender)

print("\n---------------\n")

mistunerender = mistune.markdown(filedata)
print(mistunerender)

My test.md file:

# Header level 1

sdlkjsdlkjsd

## Header level 2

Ceci est uyn texte en français avec des accenbtes éèàüö %% !?

### Header level 3

Text block with **emphasis** and a [url](http://www.url.fr)

* item 1
* item 2
* item 3

Feature request: Less strict table parsing

Hey there!
To get the following table:

Column1 Column2
Data Data
Data Data

the mistletoe parser would require the following code:

| Column1 | Column2 |
| ------: | :------ |
|    Data | Data    |
|    Data | Data    |

Many Markdown implementations I know allow the following:

| Column1 | Column2
| ------: | :------
|    Data | Data
|    Data | Data

and in fact, while typing this, I figured out that GitHub will even parse the following:

Column1 | Column2
------: | :------
   Data | Data
   Data | Data

I don't see a particular use for the latter, but the first one has often made it much easier for me to maintain visually appealing, ASCII-formatted tables where the last column contains content of very different lengths.

Would it be easy for mistletoe to support this as well? @miyuchina If this is not out of scope but you are busy, I can also try implementing this myself.
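As a starting point for such an implementation, the delimiter row can be read the same way in all three forms shown above. This is a sketch with a hypothetical helper, `parse_alignments`, not code from mistletoe:

```python
def parse_alignments(delimiter_row):
    """Map each delimiter cell to 'left', 'right', 'center' or None."""
    # Outer bars are optional, so strip them if present.
    cells = delimiter_row.strip().strip("|").split("|")
    alignments = []
    for cell in cells:
        cell = cell.strip()
        starts, ends = cell.startswith(":"), cell.endswith(":")
        if starts and ends:
            alignments.append("center")
        elif ends:
            alignments.append("right")
        elif starts:
            alignments.append("left")
        else:
            alignments.append(None)
    return alignments


strict = parse_alignments("| ------: | :------ |")
no_closing_bar = parse_alignments("| ------: | :------")
no_outer_bars = parse_alignments("------: | :------")
```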

Only the first inline code element is rendered in version 0.7

I am encountering an odd quirk in mistletoe 0.7:

C:\> mistletoe
[warning] readline library not available.
mistletoe [version 0.7] (interactive)
Type Ctrl-D to complete input, or Ctrl-C to exit.
>>> This `code` and this `code` should all be `code`
... ^Z

<p>This <code>code</code> and this `code` should all be `code`</p>
>>>

As always, thanks a lot for the great library.

CommonMark compliance

Rationale

Starting from version 0.5, mistletoe will be striving towards spec compliance with CommonMark. If you have issues / pull requests regarding CommonMark compliance, I'd be happy to review them in a new thread.

I'm hoping spec compliance would give mistletoe more predictable behavior, and help stabilize the interface.

This issue will be used as a place for progress reports and meta-discussions. Once mistletoe starts passing more test cases, I'll add further documentation to record which tests are failing or ignored. All relevant links should be expected to appear in this issue; if I'm missing one, please comment below!

Drawbacks

John Gruber, the original author of Markdown, had a few qualms with CommonMark. Discussions can be found on the CommonMark forum here. Here are my personal concerns about pushing for spec compliance:

  1. CommonMark is strongly defined, whereas mistletoe wants to be flexible in its implementation, where users can influence the parsing and rendering process as much as they want.

  2. CommonMark has a limited functionality set, whereas mistletoe follows the rule of "sane defaults," and generally wants to support GitHub Flavored Markdown as well. Tables in pure Markdown, for example, are not supported in CommonMark, but are supported in mistletoe.

  3. To comply with the spec, CommonMark might impose implementation restrictions. I've had an unfounded suspicion that to pass all the test cases, one would have to more or less stick to the reference implementation, or the parsing strategy outlined in the appendix. This would, again, mean less flexibility for the users.

For the reasons above, mistletoe might not achieve 100% compliance with CommonMark, though it is still possible. In future versions I will still value flexibility over compliance. Divergence from the spec will be documented.

Test suite

A list of CommonMark specs, including the latest release, can be found here. I will be pulling a test suite from CommonMark's GitHub repo. Assuming you are in test/commonmark directory:

  • commonmark.json is the current test suite mistletoe is testing itself against, in JSON format;
  • running python3 commonmark.py tests mistletoe against commonmark.json, printing out failed test cases and a total count;
  • running ./spec.sh pulls down the latest test suite from CommonMark's repository.

The test suite might not be up-to-date. Let me know if so!
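For readers who want to see what such a run amounts to, here is a minimal sketch of a spec runner. It is not the actual commonmark.py script. It assumes each JSON entry carries `markdown`, `html` and `example` fields, and the two entries and the `toy_render` function are made up for illustration:

```python
import json


def run_spec(cases, render):
    """Return the example numbers whose rendered output differs from the spec."""
    failed = []
    for case in cases:
        if render(case["markdown"]) != case["html"]:
            failed.append(case["example"])
    return failed


# Two made-up entries in the assumed shape of the spec's JSON dump.
cases = json.loads("""[
  {"example": 1, "section": "Demo", "markdown": "hi\\n", "html": "<p>hi</p>\\n"},
  {"example": 2, "section": "Demo", "markdown": "# hi\\n", "html": "<h1>hi</h1>\\n"}
]""")


def toy_render(markdown):
    # Placeholder renderer that only knows paragraphs.
    return "<p>{}</p>\n".format(markdown.strip())


failed = run_spec(cases, toy_render)
print("failed: {} of {}".format(len(failed), len(cases)))
```

Passing `mistletoe.markdown` as `render` and the contents of commonmark.json as `cases` would give the real count, under the same assumption about the field names.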

Miscellaneous

Relevant xkcd.

FootnoteLink removing trailing spaces

First of all, this is a great project and incidentally it offers the only way to have a decent Markdown to Jira converter. I encountered one tiny bug in the converter:

Assume the following markdown document:

Test [link] will remove space.

[link]: http://www.nullteilerfrei.de/

Then the output of md2jira will be the following:

Test [link|http://www.nullteilerfrei.de/]will remove space.

As you can see, there should be a space right after the link.

Code blocks inside of lists do not preserve whitespace if indented by 4 spaces

(All code snippets run inside of the mistletoe command)

If I have a list with a code block inside of it and the code block is indented exactly 2 spaces inside of the list item, everything works as expected:

>>> 1. item
... 
...   ```
...   code
...       indented
...   ```
... 
<ol>
<li>item
</li>
</ol>
<pre><code>code
    indented
</code></pre>

Note the spaces before "indented" in the code tag.

However, if the code block is indented 4 spaces inside of a list, the indentation is not preserved:

>>> 1. item
... 
...     ```
...     code
...         indented
...     ```
... 
<ol>
<li><p>item</p>
<pre><code>code
indented
</code></pre>
</li>
</ol>
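The expected behaviour can be sketched as removing only the list item's own indentation from each line of the code block, rather than all leading whitespace. `strip_indent` is a hypothetical helper, and the comparison with `lstrip()` only mirrors the output above; it is not a claim about how mistletoe is implemented:

```python
def strip_indent(line, width):
    """Remove at most `width` leading spaces, keeping any extra indentation."""
    removed = 0
    while removed < width and removed < len(line) and line[removed] == " ":
        removed += 1
    return line[removed:]


block = ["    code", "        indented"]          # fenced lines, 4-space list indent
kept = [strip_indent(line, 4) for line in block]  # relative indentation survives
lost = [line.lstrip() for line in block]          # resembles the output above
```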

Token precedence

I'm having difficulty with token precedence. In particular, I'm writing a Renderer that extends HTMLRenderer, but I'd like to parse Markdown inside of <aside> tags (i.e. <aside>*foo*</aside> would become <aside><em>foo</em></aside>). So I was hoping to write a custom token that would detect asides and call render_inner on it. My custom token doesn't trigger though, because the <aside> *foo* </aside> is matched by the existing mistletoe HTML tokens first. I even tried overriding _tokens_from_module so that I could change the order of the tokens, but that didn't seem to help.

Any ideas on how I could accomplish this?

Thanks!

compatibility with Python 2 💡

Hi, and thanks for this module.

I'm developing an API documentation tool for Python, because I find Sphinx over-engineered, reST less readable than Markdown, and using inspect instead of ast limiting.
Anyhow, for this project I was using Python-Markdown, but I'm bothered by the criticism of Markdown's lack of a precise spec (some call it not semantic). Therefore I've been looking for a Markdown Python module that implements CommonMark. Unfortunately Python-Markdown's creator is quite against it... a matter of point of view, I guess.
I also need extensions to allow people to create, for example, admonitions like in Sphinx and reST, and this made Commonmark-py not a suitable candidate for my tool.

Now mistletoe seems to have all I need with one exception: it's only for Python 3.
If I really want to make a documentation tool, I can't just ignore that a lot of people (me included) are still using Python 2.

So my question is very simple: Are you willing to make mistletoe compatible with both python2 and python3? Would you accept pushes to your repository to make this happen? Or are you completely against it?

I have already started this conversion. Most of the changes are really minimal.

Cheers,
Dan

Invalid code blocks rendering with comments

First of all, thanks for your library :-)

I found a bug regarding code blocks rendering.

If you have comments (starting with #) in a code block, it will not be rendered properly.

# Title

```python
# Some comment
foo = 'bar'
```

will be rendered as

<h1>Title</h1>
<p><code>`</code>python # Some comment</p>
<p>foo = &#x27;bar&#x27; <code>`</code></p>

However without the comment, it's rendered properly

<h1>Title</h1>
<pre>
<code class="lang-python">
foo = &#x27;bar&#x27;
</code>
</pre>

Markdown-to-Markdown renderer

Hi, great project! I selected it over the alternatives because I want to render the Markdown back into Markdown. Is there a simple pass-through renderer that will render it back to its original input form? (My larger use case is that I want to edit nodes in the AST to make some programmatic improvements to user-entered Markdown.) Cheers!

crash when rendering text between underscores

Steps to reproduce:

$ mistletoe
>>> a _b_
<CTRL-D>

Backtrace:

  File "/home/dmerej/.venvs/mistletoe/bin/mistletoe", line 11, in <module>
    load_entry_point('mistletoe', 'console_scripts', 'mistletoe')()
  File "/home/dmerej/src/dmerej/mistletoe/mistletoe/__main__.py", line 46, in main
    interactive()
  File "/home/dmerej/src/dmerej/mistletoe/mistletoe/__main__.py", line 34, in interactive
    print('\n' + mistletoe.markdown(contents), end='')  # dump output
  File "/home/dmerej/src/dmerej/mistletoe/mistletoe/__init__.py", line 18, in markdown
    return renderer.render(Document(iterable))
  File "/home/dmerej/src/dmerej/mistletoe/mistletoe/base_renderer.py", line 73, in render
    return self.render_map[token.__class__.__name__](token, footnotes)
  File "/home/dmerej/src/dmerej/mistletoe/mistletoe/html_renderer.py", line 184, in render_document
    return self.render_inner(token, token.footnotes)
  File "/home/dmerej/src/dmerej/mistletoe/mistletoe/base_renderer.py", line 89, in render_inner
    rendered = [self.render(child, footnotes) for child in token.children]
  File "/home/dmerej/src/dmerej/mistletoe/mistletoe/base_renderer.py", line 89, in <listcomp>
    rendered = [self.render(child, footnotes) for child in token.children]
  File "/home/dmerej/src/dmerej/mistletoe/mistletoe/base_renderer.py", line 73, in render
    return self.render_map[token.__class__.__name__](token, footnotes)
  File "/home/dmerej/src/dmerej/mistletoe/mistletoe/html_renderer.py", line 117, in render_paragraph
    return '<p>{}</p>\n'.format(self.render_inner(token, footnotes))
  File "/home/dmerej/src/dmerej/mistletoe/mistletoe/base_renderer.py", line 89, in render_inner
    rendered = [self.render(child, footnotes) for child in token.children]
  File "/home/dmerej/src/dmerej/mistletoe/mistletoe/base_renderer.py", line 89, in <listcomp>
    rendered = [self.render(child, footnotes) for child in token.children]
  File "/home/dmerej/src/dmerej/mistletoe/mistletoe/base_renderer.py", line 73, in render
    return self.render_map[token.__class__.__name__](token, footnotes)
  File "/home/dmerej/src/dmerej/mistletoe/mistletoe/html_renderer.py", line 52, in render_emphasis
    return template.format(self.render_inner(token, footnotes))
  File "/home/dmerej/src/dmerej/mistletoe/mistletoe/base_renderer.py", line 89, in render_inner
    rendered = [self.render(child, footnotes) for child in token.children]
  File "/home/dmerej/src/dmerej/mistletoe/mistletoe/base_renderer.py", line 89, in <listcomp>
    rendered = [self.render(child, footnotes) for child in token.children]
  File "/home/dmerej/src/dmerej/mistletoe/mistletoe/span_tokenizer.py", line 21, in tokenize
    while index < len(content):
TypeError: object of type 'NoneType' has no len()

List parsing requests

Hello again, and thanks one more time for this great tool. I am having some minor troubles with bullet point lists; consider the following Markdown:

First List
- this is
- the first list
  - and this is
  - its sublist
- moving on.
  
Second List

- this is
- the second list
  - and this is
  - its sublist
- moving on.

Third List

- this is
- the third list
    - and this is
    - its sublist
- moving on.

You will notice that only the third list is parsed as intended by mistletoe:

  1. The first one is not parsed at all because (I think) it is not preceded by a paragraph
  2. The sublist of the second list is not recognized because it looks like 4 spaces of indentation are required.

I would prefer if any new level of indentation would be recognized as starting a sublist, and it would be great if a new paragraph was not required before a bullet point list.

Various bugs in HTML tokens

I'm seeing the following behavior:

< hello >
# test

gets parsed to

&lt; hello &gt;
<h1>test</h1>

This is close to correct, but the first line should also be in <p> tags.

More worryingly:

<!-- hello -->

# test

gets parsed to

<!-- hello -->
# test

I'm not sure what's going on there - it may be related to #36, but I'm not sure. I would have expected

<!-- hello -->
<h1>test</h1>

Plans for advanced LaTeX support?

Hi there!

I am also looking for a good solution to write Markdown + LaTeX (preferably with custom macros and environments) on a static blog generator. For now, I have been using a mixture of pandoc and MathJax.

But I believe your idea could lead to something a lot cleaner. Do you think it'd be feasible [1] to add support for:

  • \def, \newcommand, and so on.
  • environments
  • local packages (amsthm, mathtools, etc…)

Thank you for your time and your project, that's awesome!

[1]: Meaning that anyone could contribute a PR to do this.

FileWrapper backstep after StopIteration

When you reach the end of a FileWrapper, continuing to call __next__ still increases _index. As a result, you would need to call backstep that many times.

Example code:

from mistletoe.block_tokenizer import FileWrapper

lines = FileWrapper(['a', 'b'])

for line in lines:
    print(line)  # prints 'a', then 'b'

lines.backstep()

# at this point next(lines) still raises StopIteration

# so you need an extra call to backstep if you want to get the last line
lines.backstep()  #  <--- this line is what the issue is about

next(lines)  # gives 'b'

For me that's one call too many to backstep.

Moreover, if the StopIteration is raised more than once, you need to call backstep that many times (which means that you need to know how many times the exception was raised).

Is the current behaviour the intended one, or is it open to be changed?


Below is a suggested patch to try the suggested behaviour:

--- a/mistletoe/block_tokenizer.py
+++ b/mistletoe/block_tokenizer.py
@@ -10,8 +10,8 @@ class FileWrapper:
         self._anchor = 0
 
     def __next__(self):
-        self._index += 1
-        if self._index < len(self.lines):
+        if self._index + 1 < len(self.lines):
+            self._index += 1
             return self.lines[self._index]
         raise StopIteration

Note: all tests are still passing with the patch applied.
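To try the patched behaviour in isolation, here is a simplified stand-in for the class. Only `__next__` is taken from the patch above; the constructor, `__iter__` and `backstep` are reduced to the minimum needed for the demonstration and may differ from the real class:

```python
class FileWrapper:
    """Simplified stand-in for the class in mistletoe/block_tokenizer.py."""

    def __init__(self, lines):
        self.lines = list(lines)
        self._index = -1

    def __iter__(self):
        return self

    def __next__(self):
        # Patched behaviour: only advance when there is a line to return.
        if self._index + 1 < len(self.lines):
            self._index += 1
            return self.lines[self._index]
        raise StopIteration

    def backstep(self):
        if self._index != -1:
            self._index -= 1


lines = FileWrapper(['a', 'b'])
seen = list(lines)        # consumes 'a' and 'b', then hits StopIteration
lines.backstep()          # a single backstep is now enough
last = next(lines)        # gives 'b' again
```

Because the index no longer moves once the end is reached, it does not matter how many times StopIteration was raised before the call to backstep.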

Citation-style footnote support

Hi,

If I'm not mistaken, footnotes are not supported by mistletoe.

I saw something about footlinks, but nothing about footnotes.

Example:

This is a footnote[^1]

[^1]: Footnote definition here
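
For illustration, the two pieces of this syntax (an inline reference and a definition line) could be recognized roughly as follows. This is a minimal sketch and not part of mistletoe: the patterns and the `collect_footnotes` helper are hypothetical, and the colon after the label is treated as optional so that both spellings of a definition are accepted.

```python
import re

# Hypothetical patterns, not part of mistletoe.
FOOTNOTE_DEF = re.compile(r"^\[\^([^\]\s]+)\]:?[ \t]+(.+)$")  # e.g. "[^1]: text"
FOOTNOTE_REF = re.compile(r"\[\^([^\]\s]+)\]")                # e.g. "word[^1]"


def collect_footnotes(text):
    """Return (references, definitions) found in text, line by line."""
    references, definitions = [], {}
    for line in text.splitlines():
        match = FOOTNOTE_DEF.match(line)
        if match:
            # A line that starts with a label is taken as a definition.
            definitions[match.group(1)] = match.group(2)
        else:
            references.extend(FOOTNOTE_REF.findall(line))
    return references, definitions


sample = "This is a footnote[^1]\n\n[^1]: Footnote definition here"
references, definitions = collect_footnotes(sample)
```

A real implementation would also have to handle multi-line definitions and decide how the renderer numbers and links the notes.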

Links to URLs ending in brackets are truncated

URLs like https://en.wikipedia.org/wiki/Set_(mathematics) which end with a bracket seem to get truncated by mistletoe. It seems to end the URL at the first ) character, so that bracket and anything after it are left out of the link target. For instance:

$ mistletoe
mistletoe [version 0.5.4] (interactive)
Type Ctrl-D to complete input, or Ctrl-C to exit.
>>> [link](https://en.wikipedia.org/wiki/Set_(mathematics))
... 
<p><a href="https://en.wikipedia.org/wiki/Set_%28mathematics">link</a>)
</p>

I'm not sure if this is a serious issue; after all, we can encode the bracket in the URL as %29. For what it's worth, GitHub's flavour of Markdown seems to be happy with URLs ending in (at least one) bracket. The string [link](https://en.wikipedia.org/wiki/Set_(mathematics)) renders as a working link, and all is well.
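
CommonMark allows unescaped parentheses in a link destination as long as they are balanced, so one way to avoid the truncation is to count nesting depth instead of stopping at the first ). The following is a minimal sketch of that idea, not mistletoe's implementation: `read_link_destination` is a hypothetical helper, and link titles and <angle-bracket> destinations are deliberately left out.

```python
def read_link_destination(text, start):
    """Scan a link destination beginning at text[start], just after '('.

    Returns (destination, index of the closing parenthesis), or None if the
    destination is not terminated by a closing parenthesis.
    Hypothetical helper; titles and <...> destinations are not handled.
    """
    depth = 0
    i = start
    while i < len(text):
        char = text[i]
        if char == "\\" and i + 1 < len(text):
            i += 2  # skip a backslash-escaped character
            continue
        if char.isspace():
            return None  # simplification: whitespace ends the scan (no title support)
        if char == "(":
            depth += 1
        elif char == ")":
            if depth == 0:
                return text[start:i], i  # unmatched ')' closes the link
            depth -= 1
        i += 1
    return None


source = "[link](https://en.wikipedia.org/wiki/Set_(mathematics))"
start = source.index("](") + 2
destination, end = read_link_destination(source, start)
```

Here `destination` keeps the inner `(mathematics)` pair, and `end` points at the final parenthesis that closes the link.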

Active custom tokens before __enter__()

Sorry, I couldn't find a better place to leave this, but wanted to share my experience using the library nonetheless.

I used mistletoe to write a spell checker script for jupyter notebooks in about an hour! My script basically reads in the notebook, tokenizes all markdown cells with mistletoe, and then I have a custom renderer that only prints out a cell number followed by the misspelled words in that cell.

It was all very easy and runs efficiently, so nice work on a great library.

The one piece of constructive feedback I can offer is that I only got my custom tokens to be active during parsing if I forced MyRenderer.__enter__() to be called (i.e. by using the renderer as a context manager). I don't remember seeing that pointed out in the README, and was only able to figure it out by reading the source for BaseRenderer.
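
The behaviour described here can be pictured with a toy model. The sketch below does not use mistletoe at all, and every name in it is invented; it only assumes, as the report suggests, that a renderer adds its custom tokens to a shared token list when the context is entered and removes them again on exit.

```python
# Toy model only: none of these names are mistletoe's.
ACTIVE_TOKENS = ["Paragraph"]  # stand-in for the parser's shared token list


class ToyRenderer:
    def __init__(self, *extra_tokens):
        self.extra_tokens = list(extra_tokens)

    def __enter__(self):
        # Custom tokens become visible to the parser only from here on.
        ACTIVE_TOKENS[:0] = self.extra_tokens
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore the default token list on the way out.
        for token in self.extra_tokens:
            ACTIVE_TOKENS.remove(token)
        return False


def toy_parse(text):
    """Pretend to parse: report which tokens were active at parse time."""
    return list(ACTIVE_TOKENS)


renderer = ToyRenderer("MisspelledWord")
parsed_outside = toy_parse("some text")     # custom token not active yet
with renderer:
    parsed_inside = toy_parse("some text")  # custom token active
parsed_after = toy_parse("some text")       # back to the defaults
```

In this model, whether a document sees the custom token depends entirely on where the parse call sits relative to the `with` block, which matches the experience reported above.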

Document side-effects of renderers' initialisation

Hello, this is possibly an issue concerning the doc and not the code.

  • Parsing outside of the renderer's context manager:

from mistletoe import Document, HTMLRenderer

d = Document('a <b> c')
with HTMLRenderer() as r:
    print(r.render(d))  # <p>a &lt;b&gt; c</p>

  • Parsing inside of the renderer's context manager:

with HTMLRenderer() as r:
    d = Document('a <b> c')
    print(r.render(d))  # <p>a <b> c</p>

Not sure where the difference in output comes from. CommonMark asks for the second output though, which seems to be what is performed in mistletoe.markdown and by the mistletoe command line.


$ python -V
Python 3.7.0
$ pip freeze
mistletoe==0.7.1

Wrapping an <aside> in backticks does strange things

I'm seeing some weird behavior when wrapping an <aside> in backticks:

Wrapped in an `<aside>`

Becomes

<p>
Wrapped in an
<code>
<aside>
</aside>
</code>
.
</p>

Rather than the expected

<p>
Wrapped in an
<code><aside></code>
.
</p>
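
One possible explanation, not confirmed by this report, is that the content of the code span reaches the browser unescaped, so the browser treats <aside> as a real element and closes it itself. CommonMark output for a code span escapes its content. A minimal sketch of that step, using a hypothetical `render_code_span` helper rather than mistletoe's renderer:

```python
from html import escape


def render_code_span(content):
    """Hypothetical helper: wrap HTML-escaped code span content in <code> tags."""
    return "<code>{}</code>".format(escape(content, quote=False))


rendered = render_code_span("<aside>")
# rendered is "<code>&lt;aside&gt;</code>", which a browser displays as <aside>
```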

Headings within paragraphs are not recognized

It appears that many of the tests in the CommonMark suite are failing. The one I'm particularly interested in right now is "example 48":

Foo bar
# baz
Bar foo

should get translated to

<p>Foo bar</p>
<h1>baz</h1>
<p>Bar foo</p>

but instead becomes

<p>Foo bar
# baz
Bar foo</p>

Are there any plans to fix this and/or the other failing commonmark tests?

Thanks!
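
The expected output follows from the CommonMark rule that an ATX heading can interrupt a paragraph. A minimal sketch of a block splitter that honours this rule is shown below. It is not mistletoe's tokenizer: `split_blocks` and the `ATX_HEADING` pattern are hypothetical and simplified (closing # sequences and all other block types are ignored).

```python
import re

# Simplified ATX heading pattern: up to 3 spaces, 1-6 '#', then a space or end of line.
ATX_HEADING = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]+|$)(.*)$")


def split_blocks(text):
    """Hypothetical helper: split text into paragraph and heading blocks."""
    blocks, paragraph = [], []

    def flush():
        # Emit the paragraph collected so far, if any.
        if paragraph:
            blocks.append(("paragraph", "\n".join(paragraph)))
            paragraph.clear()

    for line in text.splitlines():
        match = ATX_HEADING.match(line)
        if match:
            flush()  # a heading closes the paragraph in progress
            blocks.append(("heading", len(match.group(1)), match.group(2).strip()))
        elif not line.strip():
            flush()  # a blank line also ends a paragraph
        else:
            paragraph.append(line.strip())
    flush()
    return blocks


blocks = split_blocks("Foo bar\n# baz\nBar foo")
```

For the input from example 48 this yields a paragraph, a level-1 heading, and a second paragraph, in that order.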

List parsing fails if sub list changes leader

Hi,

There appears to be a bug parsing lists where sublists change the leader character.

For instance, when trying to parse Markdown that looks like this:

+ test
+ test
    - test
+ test

or

+ test
+ test
    1. test
+ test

I get the following error:

\lib\site-packages\mistletoe\__init__.py:19: in markdown
    return renderer.render(Document(iterable))
\lib\site-packages\mistletoe\block_token.py:117: in __init__
    self._children = tuple(tokenize(lines, root=self))
/lib\site-packages\mistletoe\block_tokenizer.py:61: in tokenize
    token = token_type(token_type.read(lines))


self = <mistletoe.block_token.List object at 0x0422D5D0>, items = []

    def __init__(self, items):
        self._children = items
        self.loose = self.__class__.loose
        leader = self.children[0].leader

E   IndexError: list index out of range

I believe this may have been introduced in the recent changes to the list parsing functionality as I'm sure this worked previously.

Thanks,
John
