dishmint / markdown2wl
Lex and Parse Markdown in Wolfram Language
License: MIT License
This is generally working outside of MarkdownParser.wl. Needs to be integrated into the package.
Forgo the 'exhibit of markdown' example page and write a test page from scratch. It will be much easier to assess the effectiveness of the parse with succinct examples.
With MarkdownLexer implemented, I should make sure a comprehensive set of tests is added.
Make sure all exposed symbols are reflected in the documentation.
Add a CopyAsMarkdown menu item so one can copy a cell as Markdown-formatted text.
Add parser support for SubItems etc.
Some nested MarkdownElements are not rendering correctly
branch here: https://github.com/dishmint/Markdown2WL/tree/feature/renderer/dishmint/MarkdownRender_Init
The parser tests need to be verified. I constructed them by running ImportMarkdown on the test input from the markdown field in the CommonMark v0.30 JSON tests, so there's no guarantee they are correct right now. A brief look at some of the initial tests suggests they are probably all incorrect (or at least not as expected).
Recognize HTML and parse as HTMLElements.
It may be as simple as detecting the html snippets and importing them as symbolic HTML.
The parser refines tokens into their final symbolic form.
In the default stylesheet, item indentation is limited, so I think this warrants a custom stylesheet.
Stylesheets to reference:
The \\s{2} in the rule below could be a problem if people use different indentation lengths.
(* OrderedListItems *)
RegularExpression[ "^((\\s{2}|\\t)*)((\\d\\.)+\\d?)\\s(.*)$" ] :> $TokenLevelData[ <| "Token" -> "OrderedListItem", "Level" -> GetIndentationLevel["$1"], "Data" -> "$5" |> ]
There are three possibilities: (1) set the indentation length with an option, (2) set the indentation length per flavor, or (3) normalize all leading indentation to spaces or tabs.
For some reason ExtractMarkdownFootnoteURL stays unevaluated in the Private context when extracting the 3rd footnote from test.md:
First[Private`ExtractMarkdownFootnoteURL[3, Private`footFile]]
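Options (1) and (3) could be combined roughly as follows. This is only a sketch: indentationLevel and the "IndentationLength" option are hypothetical names, not part of the package, and it assumes a tab should count as exactly one indentation unit.

```wolfram
(* Hypothetical helper; "IndentationLength" is an assumed option name. *)
Options[indentationLevel] = {"IndentationLength" -> 2};

indentationLevel[line_String, OptionsPattern[]] :=
  Module[{width = OptionValue["IndentationLength"], lead},
    (* Collect the leading run of spaces and tabs *)
    lead = TakeWhile[Characters[line], MatchQ[" " | "\t"]];
    (* Normalize: a space is one column, a tab is one full indentation unit *)
    Quotient[Total[Replace[lead, {"\t" -> width, " " -> 1}, {1}]], width]
  ]

indentationLevel["\t\titem"]                             (* two tabs *)
indentationLevel["    1.1 item", "IndentationLength" -> 4] (* four spaces, width 4 *)
```

With a helper like this, the lexer rule could capture the whole leading whitespace run instead of hard-coding \\s{2}, and leave the width decision to the option (or to a per-flavor default).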
MarkdownElement["Item",
<|
"Type" -> "Unordered",
"IndentationLevel" -> 0,
"IndentationType" -> None,
"Content" -> {
"A named link to",
MarkdownElement[Hyperlink, "MarkItDown", First[Private`ExtractMarkdownFootnoteURL[3, Private`footFile]]],
". The easiest way to do these is to select what you want to make a link and hit ",
MarkdownElement["InlineCode", "Ctrl+L"], "."
}
|>
]
Parser test failure
TestID:
OrderedItemTest2
Input:
" 1.1 here is an ordered sub item"
Expected:
MarkdownElement["Item",
Association[
"Type" \[Rule] "Ordered", "IndentationLevel" \[Rule] 1,
"IndentationType" \[Rule] "Whitespace",
"Content" \[Rule] "here is an ordered sub item"
]
]
Output:
" 1.1 here is an ordered sub item"
Convert the current implementation to a paclet.
Here's the link to the CommonMark 0.30 tests
An example test:
VerificationTest[
ImportMarkdown["< test string >"],
...
]
I should also note the version being tested. I could do that in the TestID, or have different test files for different CommonMark versions.
Use the markdown fields from these tests as input, and by default set the expected output in the VerificationTest to be whatever the parser returns.
Then go through and verify they all are parsing correctly.
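The workflow above could be sketched like this. makeTest and the TestID format are hypothetical, the sample record is a stand-in that only mirrors the "markdown" and "example" fields of the CommonMark JSON, and ImportMarkdown is assumed to be loaded from the package. The test is returned held so it can be written to a test file and reviewed by hand before it is trusted.

```wolfram
(* Stand-in record; in practice something like Import["spec.json", "RawJSON"]
   (hypothetical file name) would supply a list of such associations. *)
examples = {<|"markdown" -> "# Heading\n", "example" -> 1|>};

(* Hypothetical helper: snapshot the parser's current output as the expected value *)
makeTest[ex_Association, version_String] :=
  With[{
      input = ex["markdown"],
      expected = ImportMarkdown[ex["markdown"]],
      id = "CommonMark-" <> version <> "-Example-" <> ToString[ex["example"]]
    },
    (* Hold keeps the test unevaluated so it can be exported and inspected *)
    Hold[VerificationTest[ImportMarkdown[input], expected, TestID -> id]]
  ]

tests = makeTest[#, "0.30"] & /@ examples;
```

Putting the CommonMark version into the TestID keeps a single test file usable across spec versions; separate files per version would work just as well.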
When exporting Markdown, I wonder if it makes sense to keep the Markdown in symbolic form. Converting the Markdown expressions to styled expressions obscures them, or at least adds fluff. I'm considering including a stylesheet, so that the export mechanism only needs to interface with the symbolic Markdown rather than the 'rendered' expressions.
For the sake of lean files and organization, I think I should move the lexer code to a separate file.
The three-backtick sequence indicating the start and end of a code block is not parsing correctly. Each pair of backticks is parsed as an InlineCode MarkdownElement.
Support YAML frontmatter. I'm not sure in what form though: as objects, or just as styled text?
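One possible fix is to try the fence pattern before the inline-code pattern, since string rules are tried in order at each position. The sketch below is not the package's lexer: lexCode and the token names are hypothetical, and it only handles a fence at the start of a match with an optional language tag.

```wolfram
(* Hypothetical sketch: fenced blocks are matched before inline code *)
lexCode[text_String] :=
  With[{fence = StringRepeat["`", 3]},
    StringCases[text, {
      (* Rule 1: fence, optional language tag, body, closing fence *)
      Shortest[fence ~~ lang : (WordCharacter ...) ~~ "\n" ~~ body___ ~~ "\n" ~~ fence] :>
        <|"Token" -> "CodeBlock", "Language" -> lang, "Data" -> body|>,
      (* Rule 2: inline code needs at least one non-backtick character *)
      "`" ~~ code : (Except["`"] ..) ~~ "`" :>
        <|"Token" -> "InlineCode", "Data" -> code|>
    }]
  ]

lexCode["use `Ctrl+L` here"]
```

Requiring at least one non-backtick character in the inline rule also stops an empty pair of backticks from being read as InlineCode.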
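Whichever form is chosen, a first step could be to split the frontmatter from the body before lexing. A minimal sketch, assuming the frontmatter is delimited by --- lines at the very start of the file; splitFrontmatter is a hypothetical name and the YAML is left as a raw string rather than parsed.

```wolfram
(* Hypothetical helper: separate leading YAML frontmatter from the Markdown body *)
splitFrontmatter[text_String] :=
  Replace[
    StringCases[text,
      StartOfString ~~ "---\n" ~~ Shortest[yaml___] ~~ "\n---\n" ~~ rest___ ~~ EndOfString :>
        <|"Frontmatter" -> yaml, "Body" -> rest|>,
      1],
    {
      {res_} :> res,
      (* No frontmatter found: pass the text through unchanged *)
      _ :> <|"Frontmatter" -> None, "Body" -> text|>
    }
  ]

splitFrontmatter["---\ntitle: Test\n---\n# Heading"]
```

The "Frontmatter" string could later be turned into objects or styled text without changing how the body is lexed.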