juliadocs / markdownast.jl
Abstract syntax tree representation of Markdown documents in Julia
Home Page: https://juliadocs.github.io/MarkdownAST.jl
License: Other
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers. Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix on this issue. I'll open a PR within a few hours, please be patient!
I.e. what CommonMark does and can largely be lifted from there. However, we should probably drop the dependency on Crayons for it (x-ref: MichaelHatherly/CommonMark.jl#41 (comment)).
The precise implementation of the Table element in CommonMark is currently unclear to me (not sure how the different sub-elements should be organized).
Drop the TableBody and TableHeader elements altogether, and just interpret the first row as the header.
Make TableCell a singleton, removing its fields. They could also be turned into some sort of dynamic properties that are determined by traversing the tree.
Have TableComponent subtype AbstractElement directly and don't use it for Table (to ensure that the internal nodes of a table are not allowed to exist in other contexts).
Add functions such as rows, nrows, ncols etc. to dynamically determine the number of columns and rows of a table.
CommonMark has implemented dedicated inlines for these characters. But do we actually need them here?
MarkdownAST.jl/src/markdown.jl, lines 467 to 469 in 72c5f6c
The Document element is not really a block, since it should not be contained in other elements (e.g. an Admonition probably should not contain a Document node as a child). Instead, it could subtype AbstractElement directly.
This is currently not implemented here, but likely something we need to support converting from standard library Markdown objects containing interpolations.
MarkdownAST.jl/src/markdown.jl, line 466 in 72c5f6c
Lines 119 to 127 in 72c5f6c
CommonMark iterates over the whole tree if you do for node in tree, which is currently not implemented here (we only have children).
I would argue, however, that it is not intuitively clear which type of iteration (over direct children? over whole tree? over direct and indirect children, but not parents?) should be the default. Hence I would advocate that for each iterator there should be a function (like children) that returns the iterator. For the whole-tree iteration it could be called tree(node).
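As a rough illustration of the tree(node) idea, here is a minimal sketch. The name `tree` is hypothetical and not part of MarkdownAST, the sketch assumes the `node.children` property of released MarkdownAST versions, and it collects the nodes eagerly into a vector rather than returning a lazy iterator.

```julia
using MarkdownAST: MarkdownAST, @ast, Document, Heading, Paragraph

# Hypothetical helper (not part of MarkdownAST): returns `node` followed by all
# of its descendants, in depth-first pre-order.
function tree(node::MarkdownAST.Node)
    nodes = [node]
    for child in node.children
        append!(nodes, tree(child))
    end
    return nodes
end

doc = @ast Document() do
    Heading(1) do
        "Top-level heading"
    end
    Paragraph() do
        "Some paragraph text"
    end
end

# Visit the root and every direct and indirect child.
for node in tree(doc)
    println(typeof(node.element))
end
```

Because the iteration style is named explicitly at the call site, a reader does not have to guess whether a loop covers only direct children or the whole tree.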
Is there an easy way to convert a MarkdownAST tree back to the markdown string it represents? I managed to do it via conversion back to the stdlib-Markdown:
julia> using MarkdownAST: @ast, Document, Heading, Paragraph
julia> using Markdown: MD
julia> md = @ast Document() do
           Heading(1) do
               "Top-level heading"
           end
           Paragraph() do
               "Some paragraph text"
           end
       end
julia> print(string(convert(MD, md)))
# Top-level heading
Some paragraph text
It would be nice if string(md) or something like that could work directly.
I forgot some closing backticks in a jldoctest block (fixed here LilithHafner/AliasTables.jl#29) which should, in order of preference
Right now we're at a 3 or 4 here: https://aliastables.lilithhafner.com/v1.0.0/#AliasTables.probabilities
Moved from LuxDL/DocumenterVitepress.jl#116
.. and define const Node = GenericNode{Nothing}. This way MarkdownAST.Node would always refer to a concrete type, and would also make it more clear in other packages when they define their own Node with their own T.
This is inspired by how IOBuffer is really an instance of GenericIOBuffer{T}.
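The alias pattern could look roughly like the sketch below. All types here are toy stand-ins defined for illustration, not the actual MarkdownAST definitions, and `MyNode` is a hypothetical downstream alias.

```julia
# Toy stand-ins for illustration only; not the actual MarkdownAST definitions.
abstract type AbstractElement end
struct Paragraph <: AbstractElement end

mutable struct GenericNode{M}
    element::AbstractElement
    meta::M
end

# The default alias always names a concrete type.
const Node = GenericNode{Nothing}
Node(element::AbstractElement) = Node(element, nothing)

# A downstream package can pick its own metadata type under its own name.
const MyNode = GenericNode{Dict{Symbol,Any}}

n = Node(Paragraph())
m = MyNode(Paragraph(), Dict{Symbol,Any}(:source => "example.md"))
```

With this layout, a method signature such as `f(::Node)` dispatches only on the metadata-free tree, while `f(::GenericNode)` accepts any metadata type.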
For consistency with other elements (AbstractBlock/AbstractInline, HTMLBlock/HTMLInline, DisplayMath/InlineMath).
The various tree mutation methods (push!, insert_after! etc.) do not enforce the requirements on elements that are described by the iscontainer and can_contain methods.
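One way such a check could be layered on top of the existing methods is sketched below. `checked_push!` is a hypothetical wrapper, not part of MarkdownAST, and the sketch assumes that `can_contain(parent_element, child_element)` returns a Bool and that children are added with `push!(node.children, child)` as in released versions.

```julia
using MarkdownAST: MarkdownAST, Node, Document, Paragraph

# Hypothetical wrapper (not part of MarkdownAST): refuses to add `child` unless
# `can_contain` allows its element under the element of `parent`.
function checked_push!(parent::Node, child::Node)
    if !MarkdownAST.can_contain(parent.element, child.element)
        throw(ArgumentError(
            "$(typeof(parent.element)) cannot contain $(typeof(child.element))"
        ))
    end
    push!(parent.children, child)
    return parent
end

doc = Node(Document())
checked_push!(doc, Node(Paragraph()))  # a block element inside a document
```

The same guard could be applied inside insert_after! and the other mutation methods, so that invalid parent-child combinations are rejected at the point where they are created.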
We have these internal fields like .nxt and .first_child, and we do some unnecessary get/setproperty stuff:
Lines 103 to 137 in 99e0f82
We should clean that up and make sure that you can only access documented fields. Internally, we can use getfield and setfield! where necessary. But let's do that in 0.2.0.
X-ref: #19.
Continuing from the discussion on Discourse and in the context of implementing JuliaDocs/DocumenterCitations.jl#6, it would be extremely useful to implement the Base functions replace and replace! on AST trees.
I would propose the following implementation:
using Pkg
Pkg.activate(temp=true)
Pkg.add("MarkdownAST")
import MarkdownAST
"""
    replace(f::Function, root::Node)

Creates a copy of the tree where all child nodes of `root` are recursively
replaced by the result of `f(child)`.

The function `f(child::Node)` must return either a new `Node` to replace
`child` or a Vector of nodes that will be inserted as siblings, replacing
`child`.

Note that `replace` does not allow the construction of invalid trees, and
element replacements that require invalid parent-child relationships (e.g., a
block element as a child to an element expecting inlines) will throw an error.

# Example

The following snippet removes links from the given AST. That is, it replaces
`Link` nodes with their link text (which may contain nested inline markdown
elements):

```julia
new_mdast = replace(mdast) do node
    if node.element isa MarkdownAST.Link
        return [MarkdownAST.copy_tree(child) for child in node.children]
    else
        return node
    end
end
```
"""
function Base.replace(f::Function, root::MarkdownAST.Node{M}) where M
    new_root = MarkdownAST.Node{M}(root.element, deepcopy(root.meta))
    for child in root.children
        replaced_child = replace(f, child)
        transformed = f(replaced_child)
        if transformed isa MarkdownAST.Node
            push!(new_root.children, transformed)
        elseif transformed isa Vector
            append!(new_root.children, transformed)
        else
            error("Function `f` in `replace(f, root::MarkdownAST.Node)` must return either a Node or a Vector of nodes, not $(repr(typeof(transformed)))")
        end
    end
    return new_root
end

"""
    replace!(f::Function, root::Node)

Acts like `replace(f, root)`, but modifies `root` in-place.
"""
function Base.replace!(f::Function, root::MarkdownAST.Node{M}) where M
    new_root = replace(f, root)
    while !isempty(root.children)
        # `Base.empty!(root.children)` would be nice!
        MarkdownAST.unlink!(first(root.children))
    end
    append!(root.children, new_root.children)
    return root
end
It might be nice to also implement Base.empty(::MarkdownAST.NodeChildren) (see comment): is there a better way to do that than the loop that I implemented?
To test the behavior in the context of my original intent with DocumenterCitations:
## TEST ######################################################################
#
# As a test, we're resolving simple citation links in a format similar to
# https://juliadocs.org/DocumenterCitations.jl/stable/gallery/#Custom-style:-Citation-key-labels
#
# That test replaces a single Link node with a list of new inline nodes that
# mix text and links to a `references.md` page.
#
# Also, to test the simpler transformation of a node with a single new node, we
# replace Strong (bold) nodes with Emph (italic) nodes. This could also be
# done with MarkdownAST.copy_tree directly, but it's just a test.
Pkg.add(url="https://github.com/JuliaDocs/Documenter.jl", rev="master")
import Markdown
import Documenter
MD = raw"""
# Quantum Control
**[Quantum optimal control](https://qutip.org/docs/latest/guide/guide-control.html)**
[BrumerShapiro2003;BrifNJP2010;KochJPCM2016;SolaAAMOP2018;MorzhinRMS2019;
Wilhelm2003.10132;KochEPJQT2022](@cite) attempts to steer a quantum system in
some desired way.
## Methods used
We use the following methods:
* *[Krotov's method](https://github.com/JuliaQuantumControl/Krotov.jl)*
[Krotov1996](@cite), and
* [**GRAPE** (*Gradient Ascent Pulse Engineering*)](https://github.com/JuliaQuantumControl/GRAPE.jl)
[KhanejaJMR2005;FouquieresJMR2011](@cite).
This concludes the document.
"""
function parse_md_string(mdsrc)
    mdpage = Markdown.parse(mdsrc)
    return convert(MarkdownAST.Node, mdpage)
end
mdast = parse_md_string(MD)
println("====== IN =======")
println("AS AST:")
@show mdast
println("AS TEXT:")
print(string(convert(Markdown.MD, mdast)))
println("=== TRANSFORM ===")
replace!(mdast) do node
    if node.element == MarkdownAST.Link("@cite", "")
        text = first(node.children).element.text  # assume no nested markdown
        keys = [strip(key) for key in split(text, ";")]
        n = length(keys)
        if n == 1
            k = keys[1]
            new_md = "[[$k]](references.md#$k)"
        else
            k1 = keys[1]
            k2 = keys[end]
            if n > 2
                new_md = "[[$k1](references.md#$k1)-[$k2](references.md#$k2)]"
            else
                new_md = "[[$k1](references.md#$k1), [$k2](references.md#$k2)]"
            end
        end
        return Documenter.mdparse(new_md; mode=:span)
        # We probably wouldn't want to use `Documenter`, but it shouldn't be
        # hard to copy in a stripped-down version of `mdparse` here.
    elseif node.element == MarkdownAST.Strong()
        # Not sure if `copy_tree(f, node)` is really the most elegant way to do
        # this, but I wanted to try out how `copy_tree` can modify a node's
        # `element`.
        return MarkdownAST.copy_tree(node) do node, element
            element == MarkdownAST.Strong() ? MarkdownAST.Emph() : element
        end
    else
        return node
    end
end
println("====== OUT =======")
println("AS AST:")
@show mdast
println("AS TEXT:")
print(string(convert(Markdown.MD, mdast)))
println("====== END =======")
Second, to test the simple example from the docstring:
# TEST 2: delete links (example from the docstring) ##########################
println("\n\n=====================================")
println("TEST2: ORIGINAL MD WITH LINKS REMOVED")
mdast = parse_md_string(MD)
replace!(mdast) do node
    if node.element isa MarkdownAST.Link
        return [MarkdownAST.copy_tree(child) for child in node.children]
    else
        return node
    end
end
print(string(convert(Markdown.MD, mdast)))
println("====== END =======")
@mortenpi Would you like me to start working a PR for this with proper testing and documentation?
Any comments on the prototype?
Currently, the package does not export anything, so everything has to be explicitly imported. We probably want to export some (or all) of the following things:
The @ast macro.
The Node type.
Functions (next, insert_after! etc).
Abstract types (AbstractElement, AbstractInline etc).
Elements (Text is ambiguous with a Base export; but the @ast macro does not actually need the Text() method, so it could remain unexported).
There are different representations of a Node object that are useful in different cases:
1. Something short that just says that this object is a Node with some element (current behavior).
2. Full AST printout (the current showast function, replicating the input of the @ast macro). This is useful when working with the tree manually, but is technical and can get pretty long. This can maybe be combined with (1), in that for large printouts we just put an ellipsis like we do for large arrays.
3. Pretty-printed document (the behaviour of CommonMark). This is useful for users who do not want to be concerned with the technical details of the representation, and also relevant for e.g. rendering docstrings.
We need to decide which one should be the default output if a Node is returned in the REPL, and how to access the other options.
From MichaelHatherly/CommonMark.jl#41 (comment):
node[] for the AbstractContainer instance
Perhaps container() rather than overloading getindex, which adds inconsistencies in how you access particular parts of the nodes.
e.g. next()
Probably too generic, unless we're expecting to not export?
element though (~AbstractElement). next and previous are indeed quite generic.
Maybe, to avoid the whole issue of exporting generic functions (e.g. parent, children, container/element are also kind of generic), we stick to having them be clearly documented fields/properties, e.g. .element, .next, .previous, .parent.
I would argue that setproperty! for many of them should be disallowed, so that it wouldn't be possible to construct nonsensical trees. You can always still call setfield! if you really need low-level access to the underlying fields (e.g. in basic functions such as insert_after!).
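To make the idea concrete, here is a minimal sketch of restricting setproperty! while keeping setfield! available internally. `ToyNode` and `set_parent!` are made up for illustration and are not the MarkdownAST implementation.

```julia
# Toy node type for illustration only; not the MarkdownAST implementation.
mutable struct ToyNode
    element::Any
    parent::Union{ToyNode,Nothing}
end

# Only `.element` may be assigned through the public property interface.
function Base.setproperty!(node::ToyNode, name::Symbol, value)
    name === :element ||
        error("cannot set `.$name` directly; use a tree mutation function instead")
    return setfield!(node, name, value)
end

# Internal helpers bypass the check by calling `setfield!` themselves.
function set_parent!(child::ToyNode, parent::ToyNode)
    setfield!(child, :parent, parent)
    return child
end

root = ToyNode("root", nothing)
leaf = ToyNode("leaf", nothing)
set_parent!(leaf, root)
leaf.element = "renamed leaf"  # allowed
# leaf.parent = nothing        # would throw an ErrorException
```

Reading `leaf.parent` still works as usual, since only assignment is intercepted.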
Another bikeshedding question is whether to have them be called nxt/prv or next/prev or next/previous. While slightly more verbose, I would advocate for the latter option for clarity.
Furthermore, we could also overload the iterator over children, such that you could add children with push!(node.children, child) and pushfirst!(node.children, child). Semantically, node.children feels array/list-like, and so overloading push!/pushfirst! seems appropriate.
Currently, when e.g. iterating over children(node), you can mutate the tree while the iteration is happening. This will likely lead to unexpected behavior (note: changing or updating the AbstractElement is fine).
We should minimally document that you should not do that. However, I wonder if there is something else we could do to make sure that you don't get bad behavior. A few options:
Collect Nodes into an array when the iterator is constructed and then naively iterate over that array instead. If some of them get unlinked etc., then that won't affect the iteration per se. However, this will mean allocating a potentially big array (especially in the whole-tree case). We could have a keyword argument for iterator functions to allow for unsafe, but efficient iteration (e.g. children(node, unsafe=true)).
Make the tree immutable during iteration. This would mean attaching some global metadata to each node (e.g. something as simple as a Ref{Bool}).
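The first option can already be approximated on the caller's side. The sketch below assumes the `node.children` property and `unlink!` function of released MarkdownAST versions, and snapshots the children with `collect` before mutating the tree.

```julia
using MarkdownAST: MarkdownAST, @ast, Document, Paragraph

doc = @ast Document() do
    Paragraph() do
        "first"
    end
    Paragraph() do
        "second"
    end
end

# Snapshot the children into an array first, so that unlinking nodes inside
# the loop cannot disturb the iteration itself.
snapshot = collect(doc.children)
for child in snapshot
    MarkdownAST.unlink!(child)
end
```

The cost is the allocation of the intermediate array, which is the trade-off described above.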
From MichaelHatherly/CommonMark.jl#41 (comment):
Instead of append_child and prepend_child, I overload push! and pushfirst! for this. I felt that "append"/"prepend" could be confusing, as in the standard library append! and prepend! concatenate two collections, rather than adding an element. However, at the same time, I am not really sure it makes sense to think of a node as a "collection of its children", which this choice implies.
Those were intentionally not added to the push! and pushfirst! methods since I didn't feel they could really reasonably be classed as "array-like" enough for it not to be punning.
We should change away from push!(::Node, ...) and pushfirst!(::Node, ...) for adding children, as it's not really semantically appropriate. But I don't really have a good idea for an alternative name, and still not a huge fan of "append" and "prepend".
A different option, together with #10, would be push!(node.children, child) and pushfirst!(node.children, child).
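For reference, a short sketch of what the node.children variant looks like in use, assuming the `push!`/`pushfirst!` methods on `node.children` that released MarkdownAST versions provide:

```julia
using MarkdownAST: MarkdownAST, Node, Document, Heading, Paragraph

doc = Node(Document())
push!(doc.children, Node(Paragraph()))      # added as the last child
pushfirst!(doc.children, Node(Heading(1)))  # added as the first child

# The heading now comes before the paragraph.
for child in doc.children
    println(typeof(child.element))
end
```

Here the collection being mutated is the list of children rather than the node itself, which sidesteps the concern about treating a node as a "collection of its children".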
Hi, in trying to upgrade to Documenter 1.0 I've hit this issue (with DataToolkitBase).
ERROR: LoadError: MethodError: no method matching iterate(::Markdown.Paragraph)
Closest candidates are:
iterate(::RegexMatch, Any...)
@ Base regex.jl:284
iterate(::ExponentialBackOff)
@ Base error.jl:260
iterate(::ExponentialBackOff, ::Any)
@ Base error.jl:260
...
Stacktrace:
[1] _convert(nodefn::MarkdownAST.NodeFn{Nothing}, c::MarkdownAST.Item, child_convert_fn::typeof(MarkdownAST._convert_block), md_children::Markdown.Paragraph)
@ MarkdownAST ~/.julia/packages/MarkdownAST/CZtZT/src/stdlib/fromstdlib.jl:33
[2] _convert_block(nodefn::MarkdownAST.NodeFn{Nothing}, b::Markdown.List)
@ MarkdownAST ~/.julia/packages/MarkdownAST/CZtZT/src/stdlib/fromstdlib.jl:65
[3] _convert(nodefn::MarkdownAST.NodeFn{Nothing}, c::MarkdownAST.Item, child_convert_fn::typeof(MarkdownAST._convert_block), md_children::Vector{Any})
@ MarkdownAST ~/.julia/packages/MarkdownAST/CZtZT/src/stdlib/fromstdlib.jl:34
[4] _convert_block(nodefn::MarkdownAST.NodeFn{Nothing}, b::Markdown.List)
@ MarkdownAST ~/.julia/packages/MarkdownAST/CZtZT/src/stdlib/fromstdlib.jl:65
[5] _convert(nodefn::MarkdownAST.NodeFn{Nothing}, c::MarkdownAST.Document, child_convert_fn::typeof(MarkdownAST._convert_block), md_children::Vector{Any})
@ MarkdownAST ~/.julia/packages/MarkdownAST/CZtZT/src/stdlib/fromstdlib.jl:34
[6] convert (repeats 2 times)
@ Documenter ~/.julia/packages/MarkdownAST/CZtZT/src/stdlib/fromstdlib.jl:23 [inlined]
[7] convert
@ Documenter ~/.julia/packages/MarkdownAST/CZtZT/src/stdlib/fromstdlib.jl:21 [inlined]
[8] (::Documenter.var"#49#50"{MarkdownAST.Node{Nothing}, Documenter.Page, Documenter.Document, LineNumberNode, Module, MarkdownAST.CodeBlock})()
@ Documenter ~/.julia/packages/Documenter/Meee1/src/expander_pipeline.jl:630
[9] cd(f::Documenter.var"#49#50"{MarkdownAST.Node{Nothing}, Documenter.Page, Documenter.Document, LineNumberNode, Module, MarkdownAST.CodeBlock}, dir::String)
@ Base.Filesystem ./file.jl:112
[10] runner(::Type{Documenter.Expanders.EvalBlocks}, node::MarkdownAST.Node{Nothing}, page::Documenter.Page, doc::Documenter.Document)
@ Documenter ~/.julia/packages/Documenter/Meee1/src/expander_pipeline.jl:610
[...]
[20] top-level scope
@ ~/.julia/dev/DataToolkitBase/docs/make.jl:19
in expression starting at /home/tec/.julia/dev/DataToolkitBase/docs/make.jl:19