anderskaplan commented on May 27, 2024

@huguesdevimeux hi, I've just created a draft PR for this. Please check it out and let me know how it works for you!

from mistletoe.

chrisjsewell commented on May 27, 2024

Heya, just to note https://github.com/executablebooks/markdown-it-py provides a markdown -> markdown render via https://github.com/executablebooks/mdformat

pbodnar commented on May 27, 2024

@ALL, the PR has been merged into the master branch and it will be available in the coming release. 🎉 Testing and feedback are welcome. :)

nickovs commented on May 27, 2024

I've been taking a look at this just now since I have active need for it at work. The use case that I have is that we manage a bunch of processes internally using Markdown wiki pages; some of these pages are generated by humans and some by machine. I need to be able to have code that can add, modify and/or delete content in the sections in the middle of the pages and ideally I'd like to be able to do this in a structured way. I can extract the content but at the moment I can't regenerate the content after editing it.

As for thoughts about how to do this, I think that the key piece that is missing is for the renderer for a given token to be able to look back up the stack at the tokens above. This would be fairly easy to do just by having BaseRenderer.render() push the token being rendered onto a stack before it makes the call through the render_map and pop it back off afterwards. Doing this would be useful to improve the rendering of nested strong and emphasis and also might make some cases like tables a little easier to keep looking nice.
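A minimal sketch of that idea, using a toy renderer rather than mistletoe's actual `BaseRenderer`: `StackRenderer`, the token classes and `parent()` are hypothetical stand-ins. `render()` pushes the token onto a stack before dispatching through a `render_map` keyed by class name and pops it afterwards, so the emphasis renderer can check whether it sits directly inside a strong token and switch delimiters.

```python
class Token:
    """Toy token: just a list of children (hypothetical, not mistletoe's class)."""
    def __init__(self, *children):
        self.children = list(children)


class RawText(Token):
    def __init__(self, content):
        super().__init__()
        self.content = content


class Strong(Token):
    pass


class Emphasis(Token):
    pass


class StackRenderer:
    """Toy Markdown renderer that tracks the tokens currently being rendered."""

    def __init__(self):
        self.stack = []
        self.render_map = {
            "RawText": self.render_raw_text,
            "Strong": self.render_strong,
            "Emphasis": self.render_emphasis,
        }

    def render(self, token):
        # Push before dispatching and pop afterwards, so nested render_*
        # calls can look back up at their ancestors.
        self.stack.append(token)
        try:
            return self.render_map[type(token).__name__](token)
        finally:
            self.stack.pop()

    def parent(self):
        # stack[-1] is the token being rendered; stack[-2] is its parent.
        return self.stack[-2] if len(self.stack) > 1 else None

    def render_inner(self, token):
        return "".join(self.render(child) for child in token.children)

    def render_raw_text(self, token):
        return token.content

    def render_strong(self, token):
        return "**" + self.render_inner(token) + "**"

    def render_emphasis(self, token):
        # Directly inside Strong, '*' would merge with '**' into '***',
        # which reparses with the nesting inverted, so use '_' instead.
        delim = "_" if isinstance(self.parent(), Strong) else "*"
        return delim + self.render_inner(token) + delim
```

Pushing inside `render()` itself means no individual `render_*` method has to maintain the stack, and the `try`/`finally` keeps it balanced if a render method raises. This only handles the `**_foo_**` shape; since `_` cannot open emphasis in the middle of a word, intraword cases would need a different rule.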

chrisjsewell commented on May 27, 2024

Yeh no worries it's on the todo list 😅 executablebooks/markdown-it-py#10 (comment)

lhayhurst commented on May 27, 2024

(OP here). Amazing! Incredible fortitude seeing this 6.5 year old ticket through to completion. 🥳

nickovs commented on May 27, 2024

OK. I have a naive version working for the documents that I care about. I will get it to a state where parsing the samples in the tests and parsing my rendered versions of that first pass give the same result, and then I'll send it to you.
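The check described here (parse, render, parse again, compare) can be written generically. A sketch, assuming the parser returns a tree that supports `==`; real token objects may not define equality, in which case a serialized form of each tree would be compared instead. `parse_paragraphs` and `render_paragraphs` are toy stand-ins, not mistletoe functions.

```python
from typing import Any, Callable, List


def roundtrip_stable(source: str,
                     parse: Callable[[str], Any],
                     render: Callable[[Any], str]) -> bool:
    """True if parse(render(parse(source))) equals parse(source)."""
    tree = parse(source)
    return parse(render(tree)) == tree


# Toy parser: a document is a list of paragraphs with whitespace collapsed.
def parse_paragraphs(text: str) -> List[str]:
    return [" ".join(block.split()) for block in text.split("\n\n") if block.strip()]


# Toy renderer: separate paragraphs with a blank line, as Markdown expects.
def render_paragraphs(paragraphs: List[str]) -> str:
    return "\n\n".join(paragraphs)
```

Comparing parsed trees rather than raw text tolerates whitespace that the renderer normalizes, which matches the goal of the two parses "looking the same".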

nickovs commented on May 27, 2024

It looks like I missed the 0.7.1 release window! What I have is somewhat untested but works for my purposes. I’ll send you a PR of what I’ve got when I get back to my computer and you can give me your comments.

miyuchina commented on May 27, 2024

Sorry for the late reply, I've been busy with other commitments for the past half month. Hopefully in the next week or so I can squeeze in some time to work on this feature.

I already have two commits on a local branch implementing location information. There are tricky cases, and I still need to think about how they fit together in the Markdown renderer. This is just to say that I'm working on it, and will keep posting updates to this thread.

matthubb commented on May 27, 2024

2 years later bump?

This is the most promising thread I could find for a Markdown -> AST -> Markdown solution, but nothing published so far?

anderskaplan commented on May 27, 2024

I'd like to see this too! In particular, to get as close as possible to a bit-perfect roundtrip. The use case would be to use it for translation.

I'd be happy to contribute this. Can't make any promises as to when it will be finished, but I've done some research and I think it should be possible.

The approach would be to add the necessary information (e.g., if '_' or '*' was used for emphasis) to the tokens, and then create a new renderer class.
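A sketch of that approach with toy classes: the `delimiter` attribute and `parse_strong` are hypothetical illustrations, not mistletoe's actual token API. The parser records which character it saw, and the renderer simply echoes it back, doubled for strong.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass
class RawText:
    content: str


@dataclass
class Emphasis:
    children: List["Inline"]
    delimiter: str = "*"  # hypothetical: '*' or '_' as written in the source


@dataclass
class Strong:
    children: List["Inline"]
    delimiter: str = "*"  # hypothetical: '*' for **...**, '_' for __...__


Inline = Union[RawText, Emphasis, Strong]

_STRONG = re.compile(r"(\*\*|__)(.+?)\1")


def parse_strong(text: str) -> Inline:
    """Toy parser for a single strong span; remembers which delimiter was used."""
    m = _STRONG.fullmatch(text)
    if m is None:
        return RawText(text)
    return Strong([RawText(m.group(2))], delimiter=m.group(1)[0])


def render(token: Inline) -> str:
    """Render back to Markdown using the recorded delimiter."""
    if isinstance(token, RawText):
        return token.content
    inner = "".join(render(child) for child in token.children)
    marker = token.delimiter * (2 if isinstance(token, Strong) else 1)
    return marker + inner + marker
```

With the delimiter stored on the token, `**Strong**` and `__Strong__` each come back exactly as written, and nested cases like `**_foo_**` keep their original characters.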

anderskaplan commented on May 27, 2024

@huguesdevimeux just so you know, I will soon put up a PR for this. I've got it working for everything except tables. As I wrote above, I'm aiming for a near-perfect roundtrip. Some whitespace will be lost, that's inevitable, but apart from that the rendered document should look just like the input. As it happens, this approach solves the problem that @miyuchina mentioned above!

But the PR builds on top of some other PRs, so those will have to go in first.

I can publish a draft PR if you'd like to see it, and maybe try it out. Probably sometime later this week.

pbodnar commented on May 27, 2024

@mikez, thanks for your feedback. :)

Regarding your remark, maybe you could file an issue describing the problem in more detail? Note that mistletoe still doesn't support "classical footnotes" as given in your example - see #47.

mikez commented on May 27, 2024

@pbodnar Thank you for the clarification. Markdown Extra and MultiMarkdown have footnotes, but CommonMark and GitHub Flavored Markdown (GFM) do not at this time. You follow CommonMark, so now I understand why my example can have unpredictable behavior.

pbodnar commented on May 27, 2024

Regarding competing, ready-made markdown renderers like this MarkdownRenderer, I've just found out that the markdown-it-py project actually also has one: they have it in a separate Python package called mdformat which can be used on its own, or together with the MarkdownIt API as described here. It would be interesting to compare the 2 renderers...

UPDATE: I've just realized the existence of mdformat was already mentioned above. :)

miyuchina commented on May 27, 2024

Thanks for the interest! Unfortunately rendering back to Markdown does require implementing a complete renderer, as the original syntax information is lost in the parsed AST.

Such a renderer is certainly planned for mistletoe, though it does require a bit of work. If you're interested at all in implementing this feature yourself, feel free to open a pull request and we'll see how it goes. Otherwise, it would be a planned feature for the next release.

lhayhurst commented on May 27, 2024

Thanks for the reply! Cool, that is what I thought. My friend (@dgroo) and I are going to take a shot at writing the Markdown renderer (starting from the HTML one), but we're both a little busy right now, so if this is something you are hoping to get done quickly, please let me know :-)

miyuchina commented on May 27, 2024

I'm going to add a "help-wanted" tag to this issue, since I don't think I'll be getting around to this anytime soon. If you're interested in this feature, add your thumbs-up to @lhayhurst's topmost comment. Comment below if you're in a pinch!

For potential contributors, take a look at the mistletoe.html_renderer module. It serves as a good example for writing your own renderer classes, and you will find most token attributes there.

Also a reminder to branch off your changes from the dev branch, not the master branch!

nickovs commented on May 27, 2024

Has any progress been made on this? I too need a Markdown renderer for mistletoe. If there's work in progress, I would be happy to use it as a starting point and see if I can build something.

miyuchina commented on May 27, 2024

Thank you @nickovs for taking this task on yourself! I think the main difficulty is working through all the edge cases that a Markdown document can contain, and this is partly why I've been putting this issue off. For example:

**_foo_**

... should be parsed as:

<strong><em>foo</em></strong>

But using a naive implementation, e.g.,

def render_strong(self, token):
    return '**{}**'.format(self.render_inner(token))

def render_emphasis(self, token):
    return '*{}*'.format(self.render_inner(token))

... we would have the output:

***foo***

... which gets parsed as:

<em><strong>foo</strong></em>

And things get trickier when we have escape characters, which influence the parsing process, but in some cases are not reflected in the abstract syntax tree.

I have some thoughts on how to get around this, but it would require some additional work apart from implementing a renderer. What are your thoughts, and what do you think would be your use case for such a renderer?

Edit and thank you @huettenhain!
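On the escape-character problem: a renderer cannot tell from a raw-text token whether a literal `*` was originally typed as `\*`, so one conservative option is to escape on the way out. A sketch, assuming the input is plain inline text (not a code span or code block, where backslashes are literal); `escape_raw_text` is a hypothetical helper. CommonMark allows any ASCII punctuation character to be backslash-escaped, so escaping all of it should never change the meaning, only make the output noisier than the input.

```python
import string

# Every ASCII punctuation character may be backslash-escaped in CommonMark.
_ESCAPABLE = frozenset(string.punctuation)


def escape_raw_text(text: str) -> str:
    """Backslash-escape every ASCII punctuation character in a plain-text run."""
    return "".join("\\" + ch if ch in _ESCAPABLE else ch for ch in text)
```

A smarter version would escape only characters that could actually open markup in their position, which is exactly the context-dependent work described in the comment.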

lhayhurst commented on May 27, 2024

Hi, thank you for picking this up! I've been knee-deep in job-work recently and unable to complete the task :-(

nickovs commented on May 27, 2024

@miyuchina Since you mentioned that this was already planned as a feature for mistletoe, when I send you a pull request would you like me to put this into the mistletoe directory or the contrib directory? It seems to me that it should be core functionality for the library, which would suggest the former.

miyuchina commented on May 27, 2024

@nickovs Yes, go ahead and put it in the mistletoe directory! I like the idea, but for now, if you do end up implementing this, is it okay if you only override the render function in your new renderer? Don't worry too much about writing tests, they can come later.

I'm thinking about adding location information to each token, e.g., a Paragraph knows it has lines 3-6 of the original document, and an Emphasis knows it's characters 12-20. This would potentially help with features like incremental compilation. For implementing MarkdownRenderer, there's a simpler (and faster?) way that allows us to avoid handling edge cases one by one:

  • if we see an unmodified token, copy the relevant text region from the original document;
  • if we see a modified token, render according to the new render method.

But adding location information to tokens needs quite a bit of work, so if you want to go through with your method, feel free!
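A sketch of the two-rule approach, assuming each token records character offsets into the original text. `Block`, its `start`/`end`/`modified` fields and `render_document` are hypothetical; mistletoe tokens are not assumed to carry these attributes, and `render_block` stands in for calling the new Markdown renderer on a changed token.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Block:
    start: int              # hypothetical: offset of the block's first character
    end: int                # hypothetical: offset one past its last character
    text: str               # parsed content (toy stand-in for a token's children)
    modified: bool = False  # set by whoever edits the tree


def render_document(source: str,
                    blocks: List[Block],
                    render_block: Callable[[Block], str]) -> str:
    """Copy unmodified blocks verbatim from `source`; re-render modified ones.

    Assumes `blocks` are sorted by `start` and do not overlap.
    """
    out = []
    pos = 0
    for block in blocks:
        out.append(source[pos:block.start])  # text between blocks, e.g. blank lines
        if block.modified:
            out.append(render_block(block))
        else:
            out.append(source[block.start:block.end])
        pos = block.end
    out.append(source[pos:])  # trailing text after the last block
    return "".join(out)
```

Unmodified blocks keep their original spacing and delimiter choices for free; only edited blocks go through the renderer, so the edge cases only matter where something actually changed.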

miyuchina commented on May 27, 2024

@nickovs no rush of course, but I'd love to include your Markdown renderer in version 0.7.1, which I plan to release this coming weekend. Do you think it can be finished before then, or do you think we should give it more time?

gruns commented on May 27, 2024

> I'm thinking about adding location information to each token, e.g.,
> a Paragraph knows it has lines 3-6 of the original document, and an
> Emphasis knows it's characters 12-20.

This information is required, in some capacity, to preserve tokens
with ambiguous Markdown representations, like headers, emphasis, list
item prefixes, etc. Without it, there's no way to preserve the
input's character choice. E.g. mistletoe can't know whether to render
the input **Strong** as **Strong** (correct) or __Strong__
(incorrect).

@nickovs Any progress on your PR? And how does your implementation
handle the above situation?

Jyhess commented on May 27, 2024

Hi, any news on this feature?
Like @nickovs, we are documenting our project with Markdown, and we need a parser to extract or add some information. Mistletoe is great for parsing, with an easily manipulated data tree (thanks for this work). We just need a way to write the modified structure back out.
I don't have time to write it myself yet, but I can test it and provide feedback.

pbodnar commented on May 27, 2024

@chrisjsewell, that looks promising, thanks for the tip. 👍 I think it would help you if you mentioned this, or how to use different renderers (which ones?) generally, somewhere at the top of your docs for markdown-it-py. I've searched through them quickly and I couldn't find much info on that topic.

pbodnar commented on May 27, 2024

A brief summary and feedback after some time:

> I'm thinking about adding location information to each token, e.g.,
> a Paragraph knows it has lines 3-6 of the original document, and an
> Emphasis knows it's characters 12-20.

> This information is required, in some capacity, to preserve tokens with ambiguous Markdown representations, like headers, emphasis, list item prefixes, etc. Without it, there's no way to preserve the input's character choice. E.g. mistletoe can't know whether to render the input **Strong** as **Strong** (correct) or __Strong__ (incorrect).

So far the use cases presented here, like this one, seem NOT to need any location information? Instead, it should be sufficient (or even required) to know what enclosing characters were used in the input for a given token (which should be relatively easy to do). OTOH location information (BTW a feature freshly requested in #144) would be useful if we wanted to keep the original text 100% untouched (which might be quite a challenge)? Please let me know if I have overlooked anything here.

> @nickovs Any progress on your PR? And how does your implementation handle the above situation?

Unfortunately, it looks like there are no branches or PRs available yet. So we would either have to start from scratch, or take inspiration from other projects. ;)

huguesdevimeux commented on May 27, 2024

Hello,

Sorry, I'm late to the party. I'm working on this feature (no promise at all) for a personal project, and this thread is the closest one I could find on AST → MD, in python.

For reference, such a renderer has already been coded in JS here by @DamonOehlman. Most of the logic can be found here.
That being said, the issue @miyuchina mentioned is seemingly not fixed by this renderer.

I will give implementing this a try.

pbodnar commented on May 27, 2024

@huguesdevimeux, thanks for your contribution to this topic.

Just be aware that @anderskaplan is currently probably working on this as well, while also greatly helping us fix many other things "on the way", so I'm not sure how far he actually got with this one (no published branch for this yet?)

> For reference, such a renderer has already been coded in JS here by @DamonOehlman. Most of the logic can be found here.
> That being said, the issue @miyuchina mentioned is seemingly not fixed by this renderer.

Just checked, I can confirm the linked JS renderer does seem like the basic "naive" implementation, i.e. not considering types of headings or strong texts from the original markdown text. As suggested by me and confirmed by @anderskaplan just above, these cases shouldn't be that difficult to cover by extending the AST, not sure about the rest - but I still think we don't need to keep all the original formatting...
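Headings are one of the cases where extending the AST should be enough: the ATX form (`## Notes`) and the setext form (a line underlined with `=` or `-`) parse to the same kind of heading, so the parser would need to record which one it saw. A sketch with a hypothetical `style` attribute on a toy `Heading` class; setext syntax only exists for levels 1 and 2, so other levels fall back to ATX.

```python
from dataclasses import dataclass


@dataclass
class Heading:
    level: int          # 1-6
    text: str
    style: str = "atx"  # hypothetical attribute recorded by the parser: "atx" or "setext"


def render_heading(h: Heading) -> str:
    if h.style == "setext" and h.level in (1, 2):
        underline = "=" if h.level == 1 else "-"
        return f"{h.text}\n{underline * max(len(h.text), 3)}"
    # ATX form; also the fallback when setext is requested for levels 3-6,
    # which setext cannot express.
    return f"{'#' * h.level} {h.text}"
```

Recording the original underline length as well would get one step closer to a bit-perfect roundtrip.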

huguesdevimeux commented on May 27, 2024

> @huguesdevimeux just so you know, I will soon put up a PR for this. I've got it working for everything except tables. As I wrote above, I'm aiming for a near-perfect roundtrip. Some whitespace will be lost, that's inevitable, but apart from that the rendered document should look just like the input. As it happens, this approach solves the problem that @miyuchina mentioned above!
>
> But the PR builds on top of some other PRs, so those will have to go in first.
>
> I can publish a draft PR if you'd like to see it, and maybe try it out. Probably sometime later this week.

Ok, then, perfect. I'm curious to see what you did, though :).

mikez commented on May 27, 2024

+1 on rendering back to Markdown. :)

For my use case, it would be useful if the location of references and footnotes were preserved in the ast.
Why: Sometimes, there may be two different lists of footnotes: a notes section [^a], [^b], [^c], ... and a references section [^1], [^2], [^3], akin to how Wikipedia has it.

anderskaplan commented on May 27, 2024

Removed the draft status on the PR now.

mikez commented on May 27, 2024

@anderskaplan @pbodnar
🎉 Tested and works as expected. :)

Minor remark

Consider this markdown text:

lorem[^a] ipsum[^b].

## Notes

[^a]: dolor
[^b]: sit amet

When trying to traverse the ast, I was confused why [^a] turns into a LinkReferenceDefinition, but [^b] is turned into a RawText and merged with "ipsum" to ipsum[^b].
