dishmint / markdown2wl

Lex and Parse Markdown in Wolfram Language

License: MIT License

Mathematica 100.00%

import-export markdown mathematica wolfram-language lexer parser

markdown2wl's People

Forkers

rohitn ewandawson

markdown2wl's Issues

Table and Codeblock parsing

This is generally working outside of MarkdownParser.wl. Needs to be integrated into the package.

Implement Import and Export Converters

Implement Import and Export converters for markdown files

Write test.md from scratch

Forgo the 'exhibit of markdown' example page and write a test file from scratch. Succinct examples will make it much easier to assess the effectiveness of the parse.

Add MarkdownLexer tests

With MarkdownLexer implemented, I should make sure a comprehensive set of tests is added.

Update Documentation for Public symbols

Make sure all exposed symbols are reflected in the documentation.

CopyAsMarkdown

Add a CopyAsMarkdown menu item so that a cell can be copied as Markdown-formatted text.

Recognize indentation level of list items

Add parser support for SubItems etc.

[MarkdownRender] Nested MarkdownElements not rendering correctly

Some nested MarkdownElements are not rendering correctly

branch here: https://github.com/dishmint/Markdown2WL/tree/feature/renderer/dishmint/MarkdownRender_Init

The parser tests need to be verified. I constructed them by running ImportMarkdown on the test input from the markdown field in the CommonMark v0.30 JSON tests, so there's no guarantee they are correct right now. A brief look at some of the initial tests suggests they are probably not all correct (or at least not as expected).

Support HTML

Recognize HTML and parse as HTMLElements.

It may be as simple as detecting the html snippets and importing them as symbolic HTML.
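A minimal sketch of that idea, assuming the HTML snippet has already been isolated as a string. `ImportString` with the `"XMLObject"` element of the `"HTML"` format is built in; the `htmlSnippet` variable and the `"HTML"` tag on `MarkdownElement` are hypothetical and not part of the package.

```wl
(* Hypothetical input: an HTML snippet already detected by the lexer *)
htmlSnippet = "<p>Some <em>inline</em> HTML</p>";

(* Import the snippet as symbolic XML (XMLObject / XMLElement expressions) *)
symbolicHTML = ImportString[htmlSnippet, {"HTML", "XMLObject"}];

(* Hypothetical wrapper: the "HTML" element name is an assumption *)
MarkdownElement["HTML", symbolicHTML]
```

The importer treats the string as a whole document, so a fragment may come back wrapped in document-level elements that would need to be unwrapped before use.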

Implement MarkdownParser

The parser refines tokens into their final symbolic form

Parser
- Emphasis (Italics, Bold, InlineCode, LaTeX)
- Blocks (CodeBlock, Tables, Lists)
- Links (Hyperlinks, Footnotes, Images)

Custom Markdown Stylesheets

In the default stylesheet item indentation is limited, so I think this warrants a custom stylesheet.

Stylesheets to reference:

Outline

Don't hardcode indentation-length for space characters in regex rules

The \\s{2} could be a problem if people use different indentation-lengths.

(* OrderedListItems *)
    RegularExpression[ "^((\\s{2}|\\t)*)((\\d\\.)+\\d?)\\s(.*)$" ] :> $TokenLevelData[ <| "Token" -> "OrderedListItem", "Level" -> GetIndentationLevel["$1"], "Data" -> "$5" |> ]

There are three possibilities: (1) set the indentation-length with an option, (2) set the indentation-length per flavor, or (3) normalize all leading indentation to either spaces or tabs.

Markdown2WL/FaizonZaman/WLMarkdown/Kernel/TokenRules.wl

Line 22 in 7c60e2b
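The first possibility could be sketched as follows, building the regular expression from an indentation width instead of hardcoding `\\s{2}`. The names `orderedItemRule` and `indentWidth` are hypothetical, and the right-hand side returns only the captured indentation and item text; the real rule would build the token shown above.

```wl
(* Sketch only: orderedItemRule and its indentWidth argument are hypothetical *)
orderedItemRule[indentWidth_Integer?Positive] :=
  RegularExpression[
    "^((\\s{" <> ToString[indentWidth] <> "}|\\t)*)((\\d\\.)+\\d?)\\s(.*)$"
  ] :> {"$1", "$5"}

(* Try the rule against a line indented with four spaces *)
StringCases["    1.1 an ordered sub item", orderedItemRule[4]]
```

The same width could then be passed to GetIndentationLevel so that the level is computed from the indentation actually in use.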

ExtractFootnoteURL mysteriously Privated

For some reason ExtractMarkdownFootnoteURL enters the Private context when extracting the 3rd footnote from test.md

First[Private`ExtractMarkdownFootnoteURL[3, Private`footFile]]

MarkdownElement["Item",
    <|
        "Type" -> "Unordered",
        "IndentationLevel" -> 0,
        "IndentationType" -> None, 
        "Content" -> {
                "A named link to",
                MarkdownElement[Hyperlink, "MarkItDown", First[Private`ExtractMarkdownFootnoteURL[3, Private`footFile]]], 
                ". The easiest way to do these is to select what you want to make a link and hit ",
                MarkdownElement["InlineCode", "Ctrl+L"], "."
                }
    |>
]

Lex links after blocks?

Ordered sub items with d.d form not parsing as items in VerificationTest

Parser test failure

TestID:

OrderedItemTest2

Input:

"  1.1 here is an ordered sub item"

Expected:

MarkdownElement["Item",
    Association[
        "Type" -> "Ordered", "IndentationLevel" -> 1,
        "IndentationType" -> "Whitespace",
        "Content" -> "here is an ordered sub item"
    ]
]

Output:

"  1.1 here is an ordered sub item"

Convert to Paclet

Convert current implementation to a paclet

Ensure Rules Conform to CommonMark

Here's the link to the CommonMark 0.30 tests

An example test:

VerificationTest[
    ImportMarkdown["< test string >"],
    ...
]

I should also note the version being tested. I could do that in the TestID, or keep separate test files for different CommonMark versions.

Add parser tests

Use the markdown fields from these tests as input, and by default set the expected output in the VerificationTest to be whatever the parser returns.

Then go through and verify they all are parsing correctly.

https://spec.commonmark.org/0.30/spec.json
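A rough sketch of how these snapshot tests could be generated, assuming the package is loaded so that ImportMarkdown is defined. `makeSnapshotTest` and the TestID naming scheme are hypothetical. The tests are kept inside `Hold` so they can be reviewed or written to a file before they are run.

```wl
(* Each entry of spec.json is an object with "markdown", "html", "example", and "section" fields *)
spec = Import["https://spec.commonmark.org/0.30/spec.json", "RawJSON"];

(* Hypothetical helper: snapshot the parser's current output as the expected value *)
makeSnapshotTest[case_Association] :=
  With[
    {
      input = case["markdown"],
      expected = ImportMarkdown[case["markdown"]],
      id = "CommonMark-0.30-Example-" <> ToString[case["example"]]
    },
    Hold[VerificationTest[ImportMarkdown[input], expected, TestID -> id]]
  ]

(* Held tests, ready to be reviewed one by one or written to a test file *)
snapshotTests = makeSnapshotTest /@ spec;
```

Putting the CommonMark version in the TestID is one way to record which version of the spec each test came from. A snapshot only records current behavior, so each expected value still needs to be checked by hand.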

Handle rendering by stylesheet instead of MarkdownExpression-StyledExpression conversion

When exporting Markdown, I wonder whether it makes sense to keep the Markdown in symbolic form. Converting the Markdown expressions to styled expressions obscures them, or at least adds fluff. I'm considering including a stylesheet so that the export mechanism only needs to interface with the symbolic Markdown rather than the 'rendered' expressions.