Hypertext Abstract Syntax Tree format

hast is a specification for representing HTML (and embedded SVG or MathML) as an abstract syntax tree. It implements the unist spec.

This document may not be released. See releases for released documents. The latest released version is 2.4.0.

Introduction
- Where this specification fits
- Virtual DOM
Types
Nodes (abstract)
- Literal
- Parent
Nodes
- Comment
- Doctype
- Element
- Root
- Text
Other types
Glossary
List of utilities
Related HTML utilities
References
Security
Related
Contribute
Acknowledgments
License

Introduction

This document defines a format for representing hypertext as an abstract syntax tree. Development of hast started in April 2016 for rehype. This specification is written in a Web IDL-like grammar.

Where this specification fits

hast extends unist, a format for syntax trees, to benefit from its ecosystem of utilities.

hast relates to JavaScript in that it has an ecosystem of utilities for working with compliant syntax trees in JavaScript. However, hast is not limited to JavaScript and can be used in other programming languages.

hast relates to the unified and rehype projects in that hast syntax trees are used throughout their ecosystems.

Virtual DOM

The reasons for introducing a new “virtual” DOM are primarily:

The DOM is very heavy to implement outside of the browser; a lean, stripped-down virtual DOM can be used everywhere
Most virtual DOMs do not focus on ease of use in transformations
Other virtual DOMs cannot represent the syntax of HTML in its entirety (think comments and document types)
Neither the DOM nor virtual DOMs focus on positional information

Types

If you are using TypeScript, you can use the hast types by installing them with npm:

npm install @types/hast

Nodes (abstract)

`Literal`

interface Literal <: UnistLiteral {
  value: string
}

Literal (UnistLiteral) represents a node in hast containing a value.

`Parent`

interface Parent <: UnistParent {
  children: [Comment | Doctype | Element | Text]
}

Parent (UnistParent) represents a node in hast containing other nodes (said to be children).

Its content is limited to only other hast content.

Nodes

`Comment`

interface Comment <: Literal {
  type: 'comment'
}

Comment (Literal) represents a Comment ([DOM]).

For example, the following HTML:

<!--Charlie-->

Yields:

{type: 'comment', value: 'Charlie'}

`Doctype`

interface Doctype <: Node {
  type: 'doctype'
}

Doctype (Node) represents a DocumentType ([DOM]).

For example, the following HTML:

<!doctype html>

Yields:

{type: 'doctype'}

`Element`

interface Element <: Parent {
  type: 'element'
  tagName: string
  properties: Properties?
  content: Root?
  children: [Comment | Element | Text]
}

Element (Parent) represents an Element ([DOM]).

A tagName field must be present. It represents the element’s local name ([DOM]).

The properties field represents information associated with the element. The value of the properties field implements the Properties interface.

If the tagName field is 'template', a content field can be present. The value of the content field implements the Root interface.

If the tagName field is 'template', the element must be a leaf.

If the tagName field is 'noscript', its children should be represented as if scripting is disabled ([HTML]).

For example, the following HTML:

<a href="https://alpha.com" class="bravo" download></a>

Yields:

{
  type: 'element',
  tagName: 'a',
  properties: {
    href: 'https://alpha.com',
    className: ['bravo'],
    download: true
  },
  children: []
}

`Root`

interface Root <: Parent {
  type: 'root'
}

Root (Parent) represents a document.

Root can be used as the root of a tree, or as a value of the content field on a 'template' Element, never as a child.

`Text`

interface Text <: Literal {
  type: 'text'
}

Text (Literal) represents a Text ([DOM]).

For example, the following HTML:

<span>Foxtrot</span>

Yields:

{
  type: 'element',
  tagName: 'span',
  properties: {},
  children: [{type: 'text', value: 'Foxtrot'}]
}
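
As a rough illustration of how the node interfaces above fit together, here is a minimal TypeScript sketch. The `Hast`-prefixed types are local approximations written for this example (they are not the `@types/hast` definitions), and `collectText` is a hypothetical helper, not a hast utility.

```typescript
// Local approximations of the interfaces described above.
// The `Hast` prefix only avoids clashes with DOM globals such as `Element`.
interface HastText { type: 'text'; value: string }
interface HastComment { type: 'comment'; value: string }
interface HastDoctype { type: 'doctype' }
interface HastElement {
  type: 'element'
  tagName: string
  properties?: Record<string, unknown>
  children: Array<HastComment | HastElement | HastText>
}
interface HastRoot {
  type: 'root'
  children: Array<HastComment | HastDoctype | HastElement | HastText>
}
type HastNode = HastRoot | HastElement | HastText | HastComment | HastDoctype

// Concatenate the values of all text nodes in document order.
function collectText(node: HastNode): string {
  switch (node.type) {
    case 'text':
      return node.value
    case 'root':
    case 'element': {
      const children: HastNode[] = node.children
      return children.map((child) => collectText(child)).join('')
    }
    default:
      // Comments and doctypes contribute no text.
      return ''
  }
}

const exampleTree: HastRoot = {
  type: 'root',
  children: [
    {type: 'doctype'},
    {type: 'comment', value: 'Charlie'},
    {
      type: 'element',
      tagName: 'span',
      properties: {},
      children: [{type: 'text', value: 'Foxtrot'}]
    }
  ]
}

console.log(collectText(exampleTree)) // prints "Foxtrot"
```

Because every node carries a literal `type` field, a `switch` on that field is enough for TypeScript to narrow the union.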

Other types

`Properties`

interface Properties {}

Properties represents information associated with an element.

Every field must be a PropertyName and every value a PropertyValue.

`PropertyName`

typedef string PropertyName

Property names are keys on Properties objects and reflect HTML, SVG, ARIA, XML, XMLNS, or XLink attribute names. Often, they have the same value as the corresponding attribute (for example, id is a property name reflecting the id attribute name), but there are some notable differences.

These rules aren’t simple. Use hastscript (or property-information directly) to help.

The following rules are used to transform HTML attribute names to property names. These rules are based on how ARIA is reflected in the DOM ([ARIA]), and differ from how some (older) HTML attributes are reflected in the DOM.

Any name referencing a combination of multiple words (such as “stroke miter limit”) becomes a camelcased property name capitalizing each word boundary. This includes combinations that are sometimes written as several words. For example, stroke-miterlimit becomes strokeMiterLimit, autocorrect becomes autoCorrect, and allowfullscreen becomes allowFullScreen.
Any name that can be hyphenated becomes a camelcased property name capitalizing each boundary. For example, “read-only” becomes readOnly.
Compound words that are not used with spaces or hyphens are treated as a normal word and the previous rules apply. For example, “placeholder”, “strikethrough”, and “playback” stay the same.
Acronyms in names are treated as a normal word and the previous rules apply. For example, itemid becomes itemId and bgcolor becomes bgColor.

Exceptions

Some jargon is seen as one word even though it may not be seen as such by dictionaries. For example, nohref becomes noHref, playsinline becomes playsInline, and accept-charset becomes acceptCharset.

The HTML attributes class and for respectively become className and htmlFor in alignment with the DOM. No other attributes gain different names as properties, other than a change in casing.
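
Since the rules depend on knowing where word boundaries fall, they are hard to compute from the attribute name alone. One way to picture the mapping is as a lookup table. The sketch below is only an illustration: `knownPropertyNames` and `toPropertyName` are hypothetical names, the table holds just the examples mentioned in this section, and anything not listed is returned unchanged (which is wrong for many real attributes).

```typescript
// Hypothetical lookup table with only the attribute names used as examples
// in this section; a complete mapping is much larger.
const knownPropertyNames = new Map<string, string>([
  ['class', 'className'],
  ['for', 'htmlFor'],
  ['accept-charset', 'acceptCharset'],
  ['stroke-miterlimit', 'strokeMiterLimit'],
  ['allowfullscreen', 'allowFullScreen'],
  ['autocorrect', 'autoCorrect'],
  ['bgcolor', 'bgColor'],
  ['itemid', 'itemId'],
  ['nohref', 'noHref'],
  ['playsinline', 'playsInline']
])

// Illustrative helper (not a hast API): look the attribute up, and fall back
// to the lowercased attribute name when it is not in the table.
function toPropertyName(attribute: string): string {
  const key = attribute.toLowerCase()
  return knownPropertyNames.get(key) ?? key
}

console.log(toPropertyName('class')) // prints "className"
console.log(toPropertyName('placeholder')) // prints "placeholder"
```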

Notes

property-information lists all property names.

The property name rules differ from how HTML is reflected in the DOM for the following attributes:

charoff becomes charOff (not chOff)
char stays char (does not become ch)
rel stays rel (does not become relList)
checked stays checked (does not become defaultChecked)
muted stays muted (does not become defaultMuted)
value stays value (does not become defaultValue)
selected stays selected (does not become defaultSelected)
allowfullscreen becomes allowFullScreen (not allowFullscreen)
hreflang becomes hrefLang (not hreflang)
autoplay becomes autoPlay (not autoplay)
autocomplete becomes autoComplete (not autocomplete)
autofocus becomes autoFocus (not autofocus)
enctype becomes encType (not enctype)
formenctype becomes formEncType (not formEnctype)
vspace becomes vSpace (not vspace)
hspace becomes hSpace (not hspace)
lowsrc becomes lowSrc (not lowsrc)

`PropertyValue`

typedef any PropertyValue

Property values should reflect the data type determined by their property name. For example, the HTML <div hidden></div> has a hidden attribute, which is reflected as a hidden property name set to the property value true, and <input minlength="5">, which has a minlength attribute, is reflected as a minLength property name set to the property value 5.

In JSON, the value null must be treated as if the property was not included. In JavaScript, both null and undefined must be similarly ignored.
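
A consumer can apply this rule by dropping such fields before working with the properties. The following is a minimal sketch; `definedProperties` is a hypothetical helper name, not part of hast.

```typescript
type LooseProperties = Record<string, unknown>

// Illustrative helper (not a hast API): return a copy without the fields
// whose value is null or undefined, since those count as "not included".
function definedProperties(properties: LooseProperties): LooseProperties {
  const result: LooseProperties = {}
  for (const key of Object.keys(properties)) {
    const value = properties[key]
    if (value !== null && value !== undefined) {
      result[key] = value
    }
  }
  return result
}

const cleanedProperties = definedProperties({
  id: 'alpha',
  title: null,
  hidden: undefined,
  download: true
})

console.log(Object.keys(cleanedProperties)) // prints [ 'id', 'download' ]
```

Note that only `null` and `undefined` are skipped; other falsy values such as `false`, `0`, or `''` are kept in this sketch.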

The DOM has strict rules on how it coerces HTML to expected values, whereas hast is more lenient in how it reflects the source. Where the DOM treats <div hidden="no"></div> as having a value of true and <img width="yes"> as having a value of 0, these should be reflected as 'no' and 'yes', respectively, in hast.

The reason for this is to allow plugins and utilities to inspect these non-standard values.
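
The difference can be sketched as follows. The function below is an assumption made for illustration only (it is not how any particular hast utility is implemented): it takes the expected kind of a property and the raw attribute value, converts the value when it fits, and otherwise keeps the source string.

```typescript
type SketchKind = 'boolean' | 'number'

// Illustrative only: reflect a raw attribute value leniently.
function reflectAttribute(
  kind: SketchKind,
  raw: string
): boolean | number | string {
  if (kind === 'boolean') {
    // A bare attribute such as `hidden` is assumed to arrive as ''.
    return raw === '' ? true : raw
  }
  // Only plain decimal numbers are converted; anything else is kept as is.
  return /^-?\d+(\.\d+)?$/.test(raw) ? Number(raw) : raw
}

console.log(reflectAttribute('boolean', '')) // prints true
console.log(reflectAttribute('boolean', 'no')) // prints "no"
console.log(reflectAttribute('number', '5')) // prints 5
console.log(reflectAttribute('number', 'yes')) // prints "yes"
```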

The DOM also specifies comma-separated and space-separated lists as attribute values. In hast, these should be treated as ordered lists. For example, <div class="alpha bravo"></div> is represented as ['alpha', 'bravo'].
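
A minimal sketch of turning such attribute values into ordered lists is shown below. The helper names are hypothetical and the handling of edge cases (such as empty tokens) is an assumption of this sketch; the space-separated-tokens and comma-separated-tokens packages listed further down exist for this purpose.

```typescript
// Split on runs of ASCII whitespace and drop empty tokens.
function parseSpaceSeparated(value: string): string[] {
  return value.split(/[ \t\n\f\r]+/).filter((token) => token !== '')
}

// Split on commas, trim each token, and drop empty tokens.
function parseCommaSeparated(value: string): string[] {
  return value
    .split(',')
    .map((token) => token.trim())
    .filter((token) => token !== '')
}

console.log(parseSpaceSeparated('alpha bravo')) // prints [ 'alpha', 'bravo' ]
console.log(parseCommaSeparated('image/png, image/jpeg')) // prints [ 'image/png', 'image/jpeg' ]
```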

There’s no special format for the property value of the style property name.

Glossary

See the unist glossary.

List of utilities

See the unist list of utilities for more utilities.

hastscript — create trees
hast-util-assert — assert nodes
hast-util-class-list — simulate the browser’s classList API for hast nodes
hast-util-classnames — merge class names together
hast-util-embedded — check if a node is an embedded element
hast-util-excerpt — truncate the tree to a comment
hast-util-find-and-replace — find and replace text in a tree
hast-util-from-dom — transform from DOM tree
hast-util-from-html — parse from HTML
hast-util-from-parse5 — transform from Parse5’s AST
hast-util-from-selector — parse CSS selectors to nodes
hast-util-from-string — set the plain-text value of a node (textContent)
hast-util-from-text — set the plain-text value of a node (innerText)
hast-util-from-webparser — transform Webparser’s AST to hast
hast-util-has-property — check if an element has a certain property
hast-util-heading — check if a node is heading content
hast-util-heading-rank — get the rank (also known as depth or level) of headings
hast-util-interactive — check if a node is interactive
hast-util-is-body-ok-link — check if a link element is “Body OK”
hast-util-is-conditional-comment — check if node is a conditional comment
hast-util-is-css-link — check if node is a CSS link
hast-util-is-css-style — check if node is a CSS style
hast-util-is-element — check if node is a (certain) element
hast-util-is-event-handler — check if property is an event handler
hast-util-is-javascript — check if node is a JavaScript script
hast-util-labelable — check if node is labelable
hast-util-parse-selector — create an element from a simple CSS selector
hast-util-phrasing — check if a node is phrasing content
hast-util-raw — parse a tree again
hast-util-reading-time — estimate the reading time
hast-util-sanitize — sanitize nodes
hast-util-script-supporting — check if node is script-supporting content
hast-util-select — querySelector, querySelectorAll, and matches
hast-util-sectioning — check if node is sectioning content
hast-util-shift-heading — change heading rank (depth, level)
hast-util-table-cell-style — transform deprecated styling attributes on table cells to inline styles
hast-util-to-dom — transform to a DOM tree
hast-util-to-estree — transform to estree (JavaScript AST) JSX
hast-util-to-html — serialize as HTML
hast-util-to-jsx — transform hast to JSX
hast-util-to-jsx-runtime — transform to preact, react, solid, svelte, vue, etc
hast-util-to-mdast — transform to mdast (markdown)
hast-util-to-nlcst — transform to nlcst (natural language)
hast-util-to-parse5 — transform to Parse5’s AST
hast-util-to-portable-text — transform to portable text
hast-util-to-string — get the plain-text value of a node (textContent)
hast-util-to-text — get the plain-text value of a node (innerText)
hast-util-to-xast — transform to xast (xml)
hast-util-transparent — check if node is transparent content
hast-util-truncate — truncate the tree to a certain number of characters
hast-util-whitespace — check if node is inter-element whitespace

Related HTML utilities

a-rel — List of link types for rel on a / area
aria-attributes — List of ARIA attributes
collapse-white-space — Replace multiple white-space characters with a single space
comma-separated-tokens — Parse/stringify comma separated tokens
html-tag-names — List of HTML tag names
html-dangerous-encodings — List of dangerous HTML character encoding labels
html-encodings — List of HTML character encoding labels
html-element-attributes — Map of HTML attributes
html-event-attributes — List of HTML event handler content attributes
html-void-elements — List of void HTML tag names
link-rel — List of link types for rel on link
mathml-tag-names — List of MathML tag names
meta-name — List of values for name on meta
property-information — Information on HTML properties
space-separated-tokens — Parse/stringify space separated tokens
svg-tag-names — List of SVG tag names
svg-element-attributes — Map of SVG attributes
svg-event-attributes — List of SVG event handler content attributes
web-namespaces — Map of web namespaces

References

unist: Universal Syntax Tree. T. Wormer; et al.
JavaScript: ECMAScript Language Specification. Ecma International.
HTML: HTML Standard, A. van Kesteren; et al. WHATWG.
DOM: DOM Standard, A. van Kesteren, A. Gregor, Ms2ger. WHATWG.
SVG: Scalable Vector Graphics (SVG), N. Andronikos, R. Atanassov, T. Bah, B. Birtles, B. Brinza, C. Concolato, E. Dahlström, C. Lilley, C. McCormack, D. Schepers, R. Schwerdtfeger, D. Storey, S. Takagi, J. Watt. W3C.
MathML: Mathematical Markup Language Standard, D. Carlisle, P. Ion, R. Miner. W3C.
ARIA: Accessible Rich Internet Applications (WAI-ARIA), J. Diggs, J. Craig, S. McCarron, M. Cooper. W3C.
JSON: The JavaScript Object Notation (JSON) Data Interchange Format, T. Bray. IETF.
Web IDL: Web IDL, C. McCormack. W3C.

Security

As hast represents HTML, and improper use of HTML can open you up to a cross-site scripting (XSS) attack, improper use of hast is also unsafe. Always be careful with user input and use hast-util-sanitize to make the hast tree safe.

Related

mdast — Markdown Abstract Syntax Tree format
nlcst — Natural Language Concrete Syntax Tree format
xast — Extensible Abstract Syntax Tree

Contribute

See contributing.md in syntax-tree/.github for ways to get started. See support.md for ways to get help. Ideas for new utilities and tools can be posted in syntax-tree/ideas.

A curated list of awesome syntax-tree, unist, mdast, hast, xast, and nlcst resources can be found in awesome syntax-tree.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

Acknowledgments

The initial release of this project was authored by @wooorm.

Special thanks to @eush77 for their work, ideas, and incredibly valuable feedback!

Thanks to @andrewburgess, @arobase-che, @arystan-sw, @BarryThePenguin, @brechtcs, @ChristianMurphy, @ChristopherBiscardi, @craftzdog, @cupojoe, @davidtheclark, @derhuerst, @detj, @DxCx, @erquhart, @flurmbo, @Hamms, @Hypercubed, @inklesspen, @jeffal, @jlevy, @Justineo, @lfittl, @kgryte, @kmck, @kthjm, @KyleAMathews, @macklinu, @medfreeman, @Murderlon, @nevik, @nokome, @phiresky, @revolunet, @rhysd, @Rokt33r, @rubys, @s1n, @Sarah-Seo, @sethvincent, @simov, @StarpTech, @stefanprobst, @stuff, @subhero24, @tripodsan, @tunnckoCore, @vhf, @voischev, and @zjaml, for contributing to hast and related projects!

License

CC-BY-4.0 © Titus Wormer

hast's Issues

Interested in htmlparser2 AST to HAST?

I need htmlparser2 because it's much less strict than parse5. It would allow us to parse any HTML content without any preprocessing based on the browser specification. This is useful, for example, for tools like formatters.

Simplify doctypes

Subject of the issue

The doctype node is currently more complex than what HTML supports.

Now that we have xast, there’s definitely no reason to be so complex.

Actual behaviour

There are name, public, and system fields which could have any string value, adding unnecessary complexity.

Expected behaviour

HTML allows the value of name to be case-insensitive html, no public, and an optional system of about:legacy-compat.

I propose removing name, public, and system, and thus making a doctype look like this:

{"type": "doctype"}

Other tools (specifically, hast-util-to-html) could have an option to add the doctype legacy string (SYSTEM "about:legacy-compat"), and maybe even have an option to use HTML instead of html.

What should the format of properties be?

The current hast uses VProperties (from virtual-dom) for attributes on elements.

As previously outlined in the readme:

VTree’s property format is built for speed, not for ease of use. Developers
need intimate knowledge of where things should go. Should className go on
node.properties or node.properties.attributes? (Answer: “To set
className on SVG elements, it's necessary to use [attributes]”).
That, and the hassle of using
a = node.properties.attributes; a.className = (a.className ? a.className + ' ' : '') + 'foo';
to add a class, simply sucks.

With hast, I’d like a single object (properties) instead of multiple (properties, properties.*, properties.attributes).

In addition, I’d like to support arrays as values (in addition to string, number, or boolean) to reflect “space separated tokens” and “comma separated tokens”.

space separated values: className, rel, httpEquiv, ping, sizes, sandbox, headers, sorted, accept-charset, for, autocomplete, autofill, itemtype, itemref, itemprop, accesskey, dropzone;
comma separated values: srcset, accept, coords, cols, rows.

Most properties are space-separated, so that should be the default when compiling an Array property value, with a white-list for comma separated values.

Also, the style property should be an object mapping camelCased CSS properties, e.g., borderBottomWidth, to strings.

Need `raw` node type

Type for Raw HTML

There's no type for raw node.

Problem

In TypeScript, when you try to push raw nodes (e.g. u('raw', 'some raw html')) onto a hast tree, you get a compiler error, because raw is not included in the acceptable children types.

Expected behaviour

Should there be a Raw type declared publicly?

How should unknown elements be handled?

Unknown elements, such as SVG and MathML, but also ng-button will probably just be recognised as elements.

I’m not yet sure how self-closing elements are handled (e.g., in SVG)...

What’s the difference between hast and PostHTML

PostHTML looks very similar to hast and its plugins are to some extent reminiscent of mdast's attacher—transformer pattern.

The main difference seems to be that hast is going to use the same tree format as mdast and retext and will benefit from unist utils, but maybe there are some other notable differences.

Have you seen that project before and how can you compare it to hast?

Incorrect parsing of the `sizes` attribute

The sizes attribute is parsed differently depending on whether it is part of a link element or an img element. The pull request referenced above would change sizes to be commaOrSpaceSeparated which would be parsed here as splitting on commas first, and then on spaces after rejoining the array into a space separated string.

For the element:

<img
    sizes="(max-width: 600px) 100vw, 800px"
    src="example.png"
    srcset="example.png 400px, example.png 800px"
/>

the sizes attribute would be parsed and turned into the array

["(max-width:", "600px)", "100vw", "800px"]

instead of the expected

["(max-width: 600px) 100vw", "800px"]

I was thinking of changing the above referenced line to be:

result = result.indexOf(',') >= 0 ? commas(result) : spaces(result)

but that breaks existing tests since they seem to be handling the case of "comma AND space separated".

I'm also not sure if that change would cover all cases since it seems that the sizes grammar would allow for a solo <media-condition> <source-size-value> which shouldn't be split up
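
The difference between the two splitting strategies can be reproduced with a few lines of TypeScript. This is only a sketch of the behaviour described above, using plain string methods rather than the actual parsing code; the variable names are made up for the example.

```typescript
const sizesAttribute = '(max-width: 600px) 100vw, 800px'

// Split on commas, rejoin with spaces, then split on whitespace:
// the media condition is broken apart.
const commaThenSpace = sizesAttribute
  .split(',')
  .map((part) => part.trim())
  .join(' ')
  .split(/[ \t\n\f\r]+/)

// Split on commas only: each source size stays intact.
const commaOnly = sizesAttribute.split(',').map((part) => part.trim())

console.log(commaThenSpace) // prints [ '(max-width:', '600px)', '100vw', '800px' ]
console.log(commaOnly) // prints [ '(max-width: 600px) 100vw', '800px' ]
```

Splitting on commas only is still naive: it does not account for commas nested inside parentheses, so it should be read as a demonstration of the problem rather than a fix.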

`fragment` node type

Initial checklist

I read the support docs
I read the contributing guide
I agree to follow the code of conduct
I searched issues and couldn’t find anything (or linked relevant results below)

Problem

It could be useful to have a special type for fragments.

A fragment node type could help some plugins which need to preserve some isolated data for itself or replace a node with an array of nodes. Imagine a component for this case, that results in multiple nodes and has no single 'root' element. The idea is similar to JSX's fragment.

The plugins I bear in mind:

include — another file is imported but it has multiple nodes. It can be done with a splice() like below:
```
parent.children.splice(index, 1, ...loadedRoot.children)
```
but having a fragment node enables storing additional information (original properties, actual file location etc) in the node for possible future use.
components

There is no limit on the number of children of the <template> element, i.e. a web component could yield multiple children instead of itself. If a component gets replaced (like in the rehype-components plugin), it might be useful to preserve the old data for other plugins that work with data from that plugin.

Solution

Having the following interface could be useful:

interface Fragment <: Parent {
  type: "fragment"
}
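
To make the idea concrete, here is a sketch of how a later step could unwrap such nodes before serializing. Everything in it is hypothetical: the `fragment` type is only the proposal above, and `SketchNode` and `unwrapFragments` are names invented for this example.

```typescript
// Loose node shape used only for this sketch.
interface SketchNode {
  type: string
  value?: string
  tagName?: string
  children?: SketchNode[]
}

// Replace every `fragment` child by its own children, recursively,
// mutating the tree in place.
function unwrapFragments(parentNode: SketchNode): void {
  if (!parentNode.children) return
  const flattened: SketchNode[] = []
  for (const child of parentNode.children) {
    // Flatten nested fragments first, so their children are final.
    unwrapFragments(child)
    if (child.type === 'fragment') {
      flattened.push(...(child.children ?? []))
    } else {
      flattened.push(child)
    }
  }
  parentNode.children = flattened
}

const fragmentTree: SketchNode = {
  type: 'element',
  tagName: 'div',
  children: [
    {type: 'text', value: 'a'},
    {
      type: 'fragment',
      children: [
        {type: 'element', tagName: 'span', children: []},
        {type: 'fragment', children: [{type: 'text', value: 'b'}]}
      ]
    },
    {type: 'text', value: 'c'}
  ]
}

unwrapFragments(fragmentTree)
console.log((fragmentTree.children ?? []).map((child) => child.type))
// prints [ 'text', 'element', 'text', 'text' ]
```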

Alternatives

With rehype, nested root nodes effectively work as fragments. This was tested running the following example:

import {rehype} from 'rehype'

const tree = {
  type: 'root',
  children: [
    { type: 'text', value: 'outside nested root\n' },
    {
      type: 'element',
      tagName: 'div',
      children: [
        {
          type: 'root',
          children: [
            {
              type: 'element',
              tagName: 'span',
              children: [{ type: 'text', value: 'inside nested root' }],
            }
          ]
        }
      ]
    }
  ],
}

const file = await rehype()
  .data('settings', { fragment: true })
  .stringify(tree)

console.log(String(file))

This code does not throw and yields the following HTML:

outside nested root
<div><span>inside nested root</span></div>

The proposal is actually made for sanity in specific cases. I understand that this is not part of HTML and is rather a common practice in frameworks like React, Vue, etc., and that it could be rejected just like #20. However, as it could be useful for some plugins and does not require much implementation effort, I am bringing it to your consideration.

Possible typo in code example and a question about difference between hast and DOM property values

I'm new to hast so I might be misunderstanding this, but I think the "no" in <div hidden=no></div> in the following excerpt of readme.md should be in quotation marks:

The DOM is strict in reflecting HTML and hast is not.
Where the DOM treats <div hidden=no></div> as having a value of true and
<img width="yes"> as having a value of 0, these should be reflected as
'no' and 'yes', respectively, in hast.

Also, I'm confused about the meaning of the first sentence in this excerpt. If hast always reflects the written value of html and the DOM is flexible about it, shouldn't this sentence read "hast is strict in reflecting HTML and the DOM is not"? Thanks for any clarification

Namespaces

TL;DR

I’m thinking out loud. We need namespace information. I can think of three solutions. Not sure which is best.

Introduction

HTML has the concept of elements: things like <strong></strong> are normal elements. There’s a subcategory of “foreign elements”: those from MathML (mi) or from SVG (rect).

A practical example of why this information is needed is because of tag-name normalisation: in HTML, tag-names are case-insensitive. In SVG or MathML, they are not. And, unfortunately tag-names themselves cannot be used to detect whether an element is foreign or not, because there are elements which exist in multiple spaces. For example: var in HTML and MathML, and a in HTML and SVG.

Take the following code:

<!doctype html>
<title>Foreign elements in HTML</title>
<h1>HTML</h1>
<a href="#">HTML link</a>
<var>htmlVar</var>
<svg>
  <a href="#">SVG link</a>
  <span>SVG</span>
  <a href="#">SVG link</a>
</svg>
<math>
  <mi>mathMLVar</mi>
  <span>MathML</span>
  <mi>mathMLVar</mi>
</math>

When running the following script:

var length = document.all.length;
var index = -1;
var node;
while (++index < length) node = document.all[index], console.log([node.tagName, node.namespaceURI, node.textContent]);

Yields:

[Log] ["HTML", "http://www.w3.org/1999/xhtml", "Foreign elements in HTML↵HTML↵HTML link↵htmlVar↵↵ …G↵  SVG link↵↵↵  mathMLVar↵  MathML↵  mathMLVar↵↵"] (3)
[Log] ["HEAD", "http://www.w3.org/1999/xhtml", "Foreign elements in HTML↵"] (3)
[Log] ["TITLE", "http://www.w3.org/1999/xhtml", "Foreign elements in HTML"] (3)
[Log] ["BODY", "http://www.w3.org/1999/xhtml", "HTML↵HTML link↵htmlVar↵↵  SVG link↵  SVG↵  SVG link↵↵↵  mathMLVar↵  MathML↵  mathMLVar↵↵"] (3)
[Log] ["H1", "http://www.w3.org/1999/xhtml", "HTML"] (3)
[Log] ["A", "http://www.w3.org/1999/xhtml", "HTML link"] (3)
[Log] ["VAR", "http://www.w3.org/1999/xhtml", "htmlVar"] (3)
[Log] ["svg", "http://www.w3.org/2000/svg", "↵  SVG link↵  "] (3)
[Log] ["a", "http://www.w3.org/2000/svg", "SVG link"] (3)
[Log] ["SPAN", "http://www.w3.org/1999/xhtml", "SVG"] (3)
[Log] ["A", "http://www.w3.org/1999/xhtml", "SVG link"] (3)
[Log] ["math", "http://www.w3.org/1998/Math/MathML", "↵  mathMLVar↵  "] (3)
[Log] ["mi", "http://www.w3.org/1998/Math/MathML", "mathMLVar"] (3)
[Log] ["SPAN", "http://www.w3.org/1999/xhtml", "MathML"] (3)
[Log] ["MI", "http://www.w3.org/1999/xhtml", "mathMLVar"] (3)

Note 1: Non-foreign elements break out of their foreign context.
Note 2: HTML is case-insensitive (normalised to upper-case), foreign elements are case-sensitive.

Proposal

I propose either of the following:

Add namespace on some nodes (notably, root, <math>, <svg>). To determine the namespace of a node, check its closest ancestor with a namespace.
Add namespace on root nodes (and wrap <svg> and <math> in roots). To determine the namespace of a node, check its closest root for a namespace. This changes the semantics of roots somewhat.
Add namespace on any element.

The downside of the first two is that it’s hard to determine the namespace from an element in a syntax tree without ancestral getters. However, both make moving nodes around quite easy.
The latter is verbose, but does allow for easy access. However, it makes it easy for things to go wrong when shuffling nodes around.

Note: detecting namespaces upon creation (in rehype-parse), is very do-able. I’d like to make the usage of hastscript and transformers very easy too, though!
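
As a sketch of the first option (namespace on some nodes, resolved through the closest ancestor), a traversal could pass the current namespace down while walking. The `namespace` field, the `SpacedNode` type, and `listNamespaces` are all hypothetical and only illustrate the lookup; the sketch also ignores the parser behaviour from Note 1.

```typescript
interface SpacedNode {
  type: string
  tagName?: string
  namespace?: string // hypothetical field from the first option above
  children?: SpacedNode[]
}

const htmlNamespace = 'http://www.w3.org/1999/xhtml'
const svgNamespace = 'http://www.w3.org/2000/svg'

// Walk the tree, passing down the namespace of the closest ancestor that
// declares one, and collect [tagName, namespace] pairs for all elements.
function listNamespaces(
  node: SpacedNode,
  inherited: string = htmlNamespace,
  found: Array<[string, string]> = []
): Array<[string, string]> {
  const current = node.namespace ?? inherited
  if (node.type === 'element' && node.tagName !== undefined) {
    found.push([node.tagName, current])
  }
  for (const child of node.children ?? []) {
    listNamespaces(child, current, found)
  }
  return found
}

const spacedTree: SpacedNode = {
  type: 'root',
  namespace: htmlNamespace,
  children: [
    {type: 'element', tagName: 'a', children: []},
    {
      type: 'element',
      tagName: 'svg',
      namespace: svgNamespace,
      children: [{type: 'element', tagName: 'a', children: []}]
    }
  ]
}

console.log(listNamespaces(spacedTree))
```

The two `a` elements end up with different namespaces even though neither stores one itself, which is the trade-off described above: nodes are easy to move around, but a lookup needs access to ancestors.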

Do let me know about your thoughts on this!

Simplify directive nodes

Currently, there’s duplicate data in directives (processing instructions and declarations):

{
  "type": "directive",
  "name": "!doctype",
  "value": "!doctype html"
}

I’d like to propose the following instead:

{
  "type": "directive",
  "name": "!doctype",
  "value": "html"
}

But it’s breaking so it will have to wait a while.

Template tag behaviour

Hi, why is <template> not handled like every other element? In the current state, we always have to handle it explicitly, which results in code like:

minify(tree)
visitor(ast, (node) => {
	if (node.tagName === 'template') {
		minify(node.content)
		...
	}
})

Proposal

Handle it like any other element.
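
Until then, the special case can be hidden inside a single traversal helper. The sketch below is hypothetical (`WalkNode` and `visitAll` are invented for this example and are not the API of any existing utility): it visits `children` as usual and additionally descends into `content` on template elements.

```typescript
interface WalkNode {
  type: string
  tagName?: string
  value?: string
  children?: WalkNode[]
  content?: WalkNode // only expected on 'template' elements
}

// Visit a node, its children, and (for templates) its content.
function visitAll(node: WalkNode, visitor: (node: WalkNode) => void): void {
  visitor(node)
  for (const child of node.children ?? []) {
    visitAll(child, visitor)
  }
  if (node.type === 'element' && node.tagName === 'template' && node.content) {
    visitAll(node.content, visitor)
  }
}

const templateTree: WalkNode = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'template',
      children: [],
      content: {
        type: 'root',
        children: [{type: 'text', value: 'inside template'}]
      }
    }
  ]
}

const visitedTypes: string[] = []
visitAll(templateTree, (node) => {
  visitedTypes.push(node.type)
})
console.log(visitedTypes) // prints [ 'root', 'element', 'root', 'text' ]
```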

How to configure void and open-close node configuration

I was just working on hast-include (similar to posthtml-include), to show the difference between PostHTML and hast, and ran into the problem of configuring void nodes (used to control whether or not an ending tag is expected) and open-close nodes (used to determine which opening tags implicitly imply the closing of other elements).

Now, plug-ins might want to configure these mappings, e.g., the include plug-in defines a custom include element which is a void element. It should be parsed and compiled (and maybe treated by other plug-ins?) as void.

To ensure this, I propose exposing the void and open-close dictionaries on either file or processor.
E.g., file.namespace('hast').void or processor.void.

The disadvantage of this is that, when patching file, users have to keep track of that (e.g., when running parse, run, and compile with different parsers). The disadvantage of patching processor is that currently, Parser and Compiler have no access to it (but that’s fixable of course)!

Remove ancient nodes

htmlparser2 supports a weird mix of XML and HTML, such as processing instructions and directives. HTML, the standard, does not. For example, processingInstructions, directives (other than doctypes), and cdata are not supported in HTML.

There’s a new branch up for rehype which uses parse5, a standard compliant HTML parser. It doesn’t support processing instructions or cdata. I’m going to remove support for those. Plus, I’m replacing the directive with the one allowed directive: a doctype node.

As doctypes have a name, public identifier, and system identifier, maybe those should be supported on the interface?

interface Doctype <: Node {
  type: "doctype";
  name: string?;
  public: string?;
  system: string?;
}

Where:

<!DOCTYPE html>

Yields:

{
  "type": "doctype",
  "name": "html",
  "public": null,
  "system": null
}
