
rehype's Introduction

rehype


rehype is a tool that transforms HTML with plugins. These plugins can inspect and change the HTML. You can use rehype on the server, the client, CLIs, Deno, etc.

Intro

rehype is an ecosystem of plugins that work with HTML as structured data, specifically ASTs (abstract syntax trees). ASTs make it easy for programs to deal with HTML. We call those programs plugins. Plugins inspect and change trees. You can use the many existing plugins or you can make your own.


What is this?

With this project and a plugin, you can turn this HTML:

<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Saturn</title>
  </head>
  <body>
    <h1>Saturn</h1>
    <p>Saturn is a gas giant composed predominantly of hydrogen and helium.</p>
  </body>
</html>

…into the following HTML:

<!doctypehtml><html lang=en><meta charset=utf8><title>Saturn</title><h1>Saturn</h1><p>Saturn is a gas giant composed predominantly of hydrogen and helium.
Example code:
import rehypeParse from 'rehype-parse'
import rehypePresetMinify from 'rehype-preset-minify'
import rehypeStringify from 'rehype-stringify'
import {unified} from 'unified'

const file = await unified()
  .use(rehypeParse)
  .use(rehypePresetMinify)
  .use(rehypeStringify)
  .process(`<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Saturn</title>
  </head>
  <body>
    <h1>Saturn</h1>
    <p>Saturn is a gas giant composed predominantly of hydrogen and helium.</p>
  </body>
</html>`)

console.log(String(file))

With another plugin, you can turn this HTML:

<h1>Hi, Saturn!</h1>

…into the following HTML:

<h2>Hi, Saturn!</h2>
Example code:
import rehypeParse from 'rehype-parse'
import rehypeStringify from 'rehype-stringify'
import {unified} from 'unified'
import {visit} from 'unist-util-visit'

const file = await unified()
  .use(rehypeParse, {fragment: true})
  .use(myRehypePluginToIncreaseHeadings)
  .use(rehypeStringify)
  .process('<h1>Hi, Saturn!</h1>')

console.log(String(file))

function myRehypePluginToIncreaseHeadings() {
  /**
   * @param {import('hast').Root} tree
   */
  return function (tree) {
    visit(tree, 'element', function (node) {
      if (['h1', 'h2', 'h3', 'h4', 'h5'].includes(node.tagName)) {
        node.tagName = 'h' + (Number(node.tagName.charAt(1)) + 1)
      }
    })
  }
}

You can use rehype for many different things. unified is the core project that transforms content with ASTs. rehype adds support for HTML to unified. hast is the HTML AST that rehype uses.
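To make "hast" concrete, here is a hand-written sketch of roughly the tree that rehype-parse (with `fragment: true`) produces for `<h1>Hi, Saturn!</h1>`, with position data left out. It then runs the same heading logic as `myRehypePluginToIncreaseHeadings` through a small dependency-free walker. `visitElements` is a hypothetical stand-in for `visit` from unist-util-visit, used only so the sketch runs on its own; real plugins should keep using unist-util-visit.

```javascript
// Hand-written hast tree for the fragment `<h1>Hi, Saturn!</h1>` (positions omitted).
const tree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'h1',
      properties: {},
      children: [{type: 'text', value: 'Hi, Saturn!'}]
    }
  ]
}

// Hypothetical stand-in for `visit(tree, 'element', visitor)`:
// calls `visitor` on every element node, depth first.
function visitElements(node, visitor) {
  if (node.type === 'element') visitor(node)
  for (const child of node.children || []) visitElements(child, visitor)
}

// Same logic as the heading plugin: h1-h5 become one level deeper.
visitElements(tree, function (node) {
  if (['h1', 'h2', 'h3', 'h4', 'h5'].includes(node.tagName)) {
    node.tagName = 'h' + (Number(node.tagName.charAt(1)) + 1)
  }
})

console.log(tree.children[0].tagName) // h2
```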

This GitHub repository is a monorepo that contains the following packages:

  • rehype-parse: plugin to take HTML as input and turn it into a syntax tree (hast)
  • rehype-stringify: plugin to take a syntax tree (hast) and turn it into HTML as output
  • rehype: unified, rehype-parse, and rehype-stringify, useful when input and output are HTML
  • rehype-cli: CLI around rehype to inspect and format HTML in scripts

When should I use this?

Depending on the input you have and output you want, you can use different parts of rehype. If the input is HTML, you can use rehype-parse with unified. If the output is HTML, you can use rehype-stringify with unified. If both the input and output are HTML, you can use rehype on its own. When you want to inspect and format HTML files in a project, you can use rehype-cli.

Plugins

rehype plugins deal with HTML. You can choose from the many plugins that already exist. Here are three good ways to find plugins:

Some plugins are maintained by us here in the @rehypejs organization while others are maintained by folks elsewhere. Anyone can make rehype plugins, so as always when choosing whether to include dependencies in your project, make sure to carefully assess the quality of rehype plugins too.

Types

The rehype organization and the unified collective as a whole are fully typed with TypeScript. Types for hast are available in @types/hast.

Compatibility

Projects maintained by the unified collective are compatible with maintained versions of Node.js.

When we cut a new major release, we drop support for unmaintained versions of Node.js. This means we try to keep the current release line compatible with Node.js 16.

Security

As improper use of HTML can open you up to cross-site scripting (XSS) attacks, use of rehype can also be unsafe. Use rehype-sanitize to make the tree safe.
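As an illustration of the allowlist idea behind sanitizing a tree, here is a toy sketch that works on a plain hast-like object. It is not rehype-sanitize and is not safe for production: the allowed tag and property names are arbitrary assumptions, disallowed elements are dropped together with their children, and only absolute `http:`/`https:` links are kept. Use rehype-sanitize for real input.

```javascript
// Toy allowlist sanitizer for hast-like trees. Illustration only.
// The allowed tag and property names are arbitrary assumptions.
const allowedTags = new Set(['p', 'a', 'strong', 'em'])
const allowedProperties = new Set(['href', 'title'])

function sanitize(node) {
  if (node.type === 'text') return {type: 'text', value: node.value}
  if (node.type === 'root') {
    return {type: 'root', children: sanitizeChildren(node.children)}
  }
  // Drop comments, doctypes, and any element not on the allowlist (with its children).
  if (node.type !== 'element' || !allowedTags.has(node.tagName)) return null

  const properties = {}
  for (const [key, value] of Object.entries(node.properties || {})) {
    // Unknown properties (such as `onClick` event handlers) are dropped.
    if (!allowedProperties.has(key)) continue
    // Only keep absolute http(s) links, which rules out `javascript:` URLs.
    if (key === 'href' && !/^https?:\/\//i.test(String(value))) continue
    properties[key] = value
  }

  return {
    type: 'element',
    tagName: node.tagName,
    properties,
    children: sanitizeChildren(node.children)
  }
}

function sanitizeChildren(children) {
  return (children || []).map(sanitize).filter((child) => child !== null)
}

const dirty = {
  type: 'root',
  children: [
    {type: 'element', tagName: 'p', properties: {onClick: 'alert(1)'}, children: [{type: 'text', value: 'Hi'}]},
    {type: 'element', tagName: 'script', properties: {}, children: [{type: 'text', value: 'alert(1)'}]},
    {type: 'element', tagName: 'a', properties: {href: 'javascript:alert(1)', title: 'x'}, children: []}
  ]
}

console.log(JSON.stringify(sanitize(dirty)))
```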

Use of rehype plugins could also open you up to other attacks. Carefully assess each plugin and the risks involved in using them.

For info on how to submit a report, see our security policy.

Contribute

See contributing.md in rehypejs/.github for ways to get started. See support.md for ways to get help. Join us in Discussions to chat with the community and contributors.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

Sponsor

Support this effort and give back by sponsoring on OpenCollective!

Vercel

Motif

HashiCorp

GitBook

Gatsby

Netlify

Coinbase

ThemeIsle

Expo

Boost Note

Markdown Space

Holloway


You?

License

MIT © Titus Wormer

rehype's People

Contributors

christianmurphy, davidtheclark, etaoins, florentb, gorango, greenkeeperio-bot, hbsnow, iloveitaly, jamesmessinger, jaywcjlove, kmck, lpsinger, luk707, lunaticmuch, makenowjust, marekweb, marko-knoebl, michaelnisi, mrzmmr, nbnotabene, remcohaszing, robot-inventor, rokt33r, rsclarke, shreshthmohan, starptech, tani, tomeraberbach, viktor-yakubiv, wooorm

rehype's Issues

Republish v9 series to include type updates

Initial checklist

Affected packages and versions

9.0.3

Link to runnable example

No response

Steps to reproduce

Build with TypeScript 5 using Node16 module resolution, alongside hast-util-to-html version 8.0.4.

The published typings point to internals of hast-util-to-html which are no longer accessible.
https://www.npmjs.com/package/rehype-stringify/v/9.0.3?activeTab=code

/** @type {import('unified').Plugin<[Options?]|Array<void>, Node, string>} */
export default function rehypeStringify(
  config: void | import('hast-util-to-html/lib/types').Options | undefined
): void
export type Root = import('hast').Root
export type Node = Root | Root['children'][number]
export type Options = import('hast-util-to-html').Options

Expected behavior

No error; the published typings should point to the exported Options type.

Actual behavior

node_modules/rehype-stringify/lib/index.d.ts:3:25 - error TS2307: Cannot find module 'hast-util-to-html/lib/types' or its corresponding type declarations.

3   config: void | import('hast-util-to-html/lib/types').Options | undefined

Runtime

Node v16

Package manager

npm 8

OS

Linux

Build and bundle tools

Vite

Unexpected html codes when parsing attributes with quotes

Subject of the issue

The character references emitted for quote characters in data attributes look like the JavaScript escapes rather than the HTML ones:

  • ' -> &#x27; instead of &#x39;, maybe because ' is \u0027 in JS.
  • " -> &#x22; instead of &#x34;, maybe because " is \u0022 in JS.

I created a failing test for this bug on a fresh fork of the repo. https://github.com/benabel/rehype/commit/09f20182aec9c22bb482d5d1112d1b4f728467d6

Your environment

  • OS: linux
  • Packages: rehype-stringify
  • Env: yarn

Steps to reproduce

Stringify this HTML, or run the test API on the fork: https://github.com/benabel/rehype/commit/09f20182aec9c22bb482d5d1112d1b4f728467d6

<p data-content="This the new example with a 'quotation' mark"></p>
<p data-content='This the new example with a "quotation" mark'></p>

Expected behavior

<p data-content="This the new example with a &#x39;quotation&#x39; mark"></p>
<p data-content="This the new example with a &#x34;quotation&#x34; mark"></p>

Actual behavior

<p data-content="This the new example with a &#x27;quotation&#x27; mark"></p>
<p data-content="This the new example with a &#x22;quotation&#x22; mark"></p>
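A note that may help when reading this report: `&#x27;` is a hexadecimal character reference (0x27 is 39 in decimal), so it is the same apostrophe as the decimal form `&#39;`, while `&#x39;` decodes to the digit `9`. Likewise `&#x22;` and `&#34;` are both `"`, and `&#x34;` is `4`. The small decoder below is a hypothetical helper that handles numeric references only, written to show this.

```javascript
// Decode one numeric character reference such as `&#x27;` (hex) or `&#39;` (decimal).
// Hypothetical helper for illustration; named references like `&quot;` are not handled.
function decodeNumericReference(reference) {
  const match = /^&#(x[0-9a-f]+|[0-9]+);$/i.exec(reference)
  if (!match) throw new Error('Not a numeric character reference: ' + reference)
  const body = match[1]
  const codePoint =
    body[0] === 'x' || body[0] === 'X'
      ? Number.parseInt(body.slice(1), 16)
      : Number.parseInt(body, 10)
  return String.fromCodePoint(codePoint)
}

console.log(decodeNumericReference('&#x27;')) // '
console.log(decodeNumericReference('&#39;')) // '
console.log(decodeNumericReference('&#x39;')) // 9
console.log(decodeNumericReference('&#x22;')) // "
console.log(decodeNumericReference('&#x34;')) // 4
```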

Incorrectly parsed dash-cased svg properties as camelCase

Initial checklist

Affected packages and versions

rehype-parse

Link to runnable example

https://codesandbox.io/s/rehype-debug-forked-llth4

Steps to reproduce

Parse svg with clip-rule via rehype-parse

<svg class='sc-gGLxEB set-color' fill='#8A8F98' height='16' stroke='none' viewBox='0 0 24 24' width='16'>
  <path clip-rule='evenodd'
        d='M6 0C2.68629 0 0 2.68629 0 6V18C0 21.3137 2.68629 24 6 24H18C21.3137 24 24 21.3137 24 18V6C24 2.68629 21.3137 0 18 0H6ZM7.54545 7H10.4545C10.7558 7 11 7.24421 11 7.54545V10.4545C11 10.7558 10.7558 11 10.4545 11H7.54545C7.24421 11 7 10.7558 7 10.4545V7.54545C7 7.24421 7.24421 7 7.54545 7ZM13.5455 7H16.4545C16.7558 7 17 7.24421 17 7.54545V10.4545C17 10.7558 16.7558 11 16.4545 11H13.5455C13.2442 11 13 10.7558 13 10.4545V7.54545C13 7.24421 13.2442 7 13.5455 7ZM10.4545 13H7.54545C7.24421 13 7 13.2442 7 13.5455V16.4545C7 16.7558 7.24421 17 7.54545 17H10.4545C10.7558 17 11 16.7558 11 16.4545V13.5455C11 13.2442 10.7558 13 10.4545 13ZM13.5455 13H16.4545C16.7558 13 17 13.2442 17 13.5455V16.4545C17 16.7558 16.7558 17 16.4545 17H13.5455C13.2442 17 13 16.7558 13 16.4545V13.5455C13 13.2442 13.2442 13 13.5455 13Z'
        fill-rule='evenodd'
        stroke='none'></path>
</svg>

Expected behavior

clip-rule should stay in properties as is

Actual behavior

Currently, the clip-rule attribute is stored in the AST as clipRule.

Runtime

Node v16

Package manager

yarn v1

OS

macOS

Build and bundle tools

No response

Trailing whitespace in element is lost

Subject of the issue

Trailing whitespace in element is lost if followed by text node.

a <strong>b </strong>c

Your environment

  • OS: macOS
  • Packages: rehype
  • Env: Node 12.16.3

Steps to reproduce

I tried to create a test, but I am not confident that it is 100% correct; please double-check: #36

Expected behavior

The white space should be inside the strong element

Actual behavior

There is no white space

Parsing as document or fragment

Currently, there’s no way to add elements for optional opening tags. This should of course be available.
I’d say the default should be document mode.

SVG attributes getting transformed to invalid camel case

I'm trying to use rehype to process HTML containing inline SVGs, and the SVGs are coming out broken. Attributes like stroke-linecap="round" in the input are coming out invalid as strokeLineCap="round" in the output.

Steps to reproduce

I've set up a test repo demonstrating the issue. Here's the code I'm running there. I hooked up the code to run in a GitHub Actions workflow too so you can see it happen for yourself there together with all the gory details about the environment. It seems to be environment-independent anyway as I first encountered this on my MacBook.

const rehype = require("rehype")
const processor = rehype()

const html = `
<!doctype html>
<html lang="en" dir="ltr">
  <head>
    <meta charset="utf-8" />
  </head>
  <body>
    <svg xmlns="http://www.w3.org/2000/svg" stroke-linecap="round" stroke-linejoin="round" viewBox="0 0 8 8">
      <path stroke="#fff1e8" d="M0 6V3h1l1 1v2"/>
    </svg>
  </body>
</html>`

console.log(processor.processSync(html).toString())

Expected behavior

The SVG should come out of processor.processSync(html).toString() unchanged and still be valid.

Actual behavior

<!doctype html><html lang="en" dir="ltr"><head>
    <meta charset="utf-8">
  </head>
  <body>
    <svg xmlns="http://www.w3.org/2000/svg" strokeLineCap="round" strokeLineJoin="round" viewBox="0 0 8 8">
      <path stroke="#fff1e8" d="M0 6V3h1l1 1v2"></path>
    </svg>


</body></html>

I'm not 100% convinced this is a bug yet, so I'm half-anticipating hearing that I've misunderstood something here. Still thought it was worth reporting though just in case!
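Some background for the two SVG reports: hast stores attributes as properties with camelCase names, which is why clip-rule shows up as clipRule and stroke-linecap as strokeLineCap. A serializer then has to map each property back to its attribute name. Several SVG names are not regular dash-case (stroke-linecap, viewBox), so a naive camelCase-to-dash conversion cannot recover them and a lookup table is needed. In the rehype ecosystem that table comes from the property-information package; the sketch below uses a small hand-written subset and hypothetical helper names purely to show the round trip.

```javascript
// Hand-written, hypothetical subset of an SVG property-to-attribute table.
// The real table lives in the property-information package.
const svgAttributeNames = new Map([
  ['clipRule', 'clip-rule'],
  ['fillRule', 'fill-rule'],
  ['strokeLineCap', 'stroke-linecap'],
  ['strokeLineJoin', 'stroke-linejoin'],
  ['viewBox', 'viewBox']
])

// Naive conversion: put a dash before every capital letter and lowercase it.
function naiveAttributeName(property) {
  return property.replace(/[A-Z]/g, (letter) => '-' + letter.toLowerCase())
}

// Table-driven conversion: look the property up first, fall back to the naive form.
function attributeName(property) {
  return svgAttributeNames.get(property) ?? naiveAttributeName(property)
}

console.log(naiveAttributeName('strokeLineCap')) // stroke-line-cap
console.log(attributeName('strokeLineCap')) // stroke-linecap
console.log(naiveAttributeName('viewBox')) // view-box
console.log(attributeName('viewBox')) // viewBox
```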

Parser incorrectly reads image srcset when containing commas in image URL

Initial checklist

Affected packages and versions

8.0.3

Link to runnable example

No response

Steps to reproduce

The issue occurs when an image URL in the srcset attribute contains a comma.

My current flow of data:

Example raw HTML:

<img loading=\"lazy\" width=\"2560\" height=\"1504\" src=\"https://res.cloudinary.com/colbycloud/images/w_2560,h_1504/f_auto,q_auto/v1636561367/nextjs-app-stranger-things-wiki/nextjs-app-stranger-things-wiki.jpg?_i=AA\" alt=\"Website with grid of characters from Stranger Things\" class=\"wp-image-847\" srcset=\"https://res.cloudinary.com/colbycloud/images/w_2560,h_1504/f_auto,q_auto/v1636561367/nextjs-app-stranger-things-wiki/nextjs-app-stranger-things-wiki.jpg?_i=AA 2560w, https://res.cloudinary.com/colbycloud/images/w_300,h_176,c_scale/f_auto,q_auto/v1636561367/nextjs-app-stranger-things-wiki/nextjs-app-stranger-things-wiki-300x176.jpg?_i=AA 300w, https://res.cloudinary.com/colbycloud/images/w_1024,h_601,c_scale/f_auto,q_auto/v1636561367/nextjs-app-stranger-things-wiki/nextjs-app-stranger-things-wiki-1024x601.jpg?_i=AA 1024w, https://res.cloudinary.com/colbycloud/images/w_768,h_451,c_scale/f_auto,q_auto/v1636561367/nextjs-app-stranger-things-wiki/nextjs-app-stranger-things-wiki-768x451.jpg?_i=AA 768w, https://res.cloudinary.com/colbycloud/images/w_1536,h_902,c_scale/f_auto,q_auto/v1636561367/nextjs-app-stranger-things-wiki/nextjs-app-stranger-things-wiki-1536x902.jpg?_i=AA 1536w, https://res.cloudinary.com/colbycloud/images/w_2048,h_1203,c_scale/f_auto,q_auto/v1636561367/nextjs-app-stranger-things-wiki/nextjs-app-stranger-things-wiki-2048x1203.jpg?_i=AA 2048w\" sizes=\"(max-width: 2560px) 100vw, 2560px\" />

Expected behavior

The parsed srcset property should keep each image candidate (a URL together with its width descriptor) as one value, even when the URL contains commas.

What the parsed values might look like if they still followed a similar comma-delimited pattern:

[
  'https://res.cloudinary.com/colbycloud/images/w_2560,h_829/f_auto,q_auto/v1636561853/stranger-things-characters-images/stranger-things-characters-images.jpg?_i=AA 2560w',
  'https://res.cloudinary.com/colbycloud/images/w_300,h_97,c_scale/f_auto,q_auto/v1636561853/stranger-things-characters-images/stranger-things-characters-images-300x97.jpg?_i=AA 300w',
  'https://res.cloudinary.com/colbycloud/images/w_1024,h_332,c_scale/f_auto,q_auto/v1636561853/stranger-things-characters-images/stranger-things-characters-images-1024x332.jpg?_i=AA 1024w',
  'https://res.cloudinary.com/colbycloud/images/w_768,h_249,c_scale/f_auto,q_auto/v1636561853/stranger-things-characters-images/stranger-things-characters-images-768x249.jpg?_i=AA 768w',
  'https://res.cloudinary.com/colbycloud/images/w_1536,h_498,c_scale/f_auto,q_auto/v1636561853/stranger-things-characters-images/stranger-things-characters-images-1536x498.jpg?_i=AA 1536w',
  'https://res.cloudinary.com/colbycloud/images/w_2048,h_664,c_scale/f_auto,q_auto/v1636561853/stranger-things-characters-images/stranger-things-characters-images-2048x664.jpg?_i=AA 2048w'
]

Actual behavior

When an image URL contains a comma, the parser treats it as a delimiter and splits the srcset values incorrectly.

Example when parsed:

[
  'https://res.cloudinary.com/colbycloud/images/w_2560',
  'h_829/f_auto',
  'q_auto/v1636561853/stranger-things-characters-images/stranger-things-characters-images.jpg?_i=AA 2560w',
  'https://res.cloudinary.com/colbycloud/images/w_300',
  'h_97',
  'c_scale/f_auto',
  'q_auto/v1636561853/stranger-things-characters-images/stranger-things-characters-images-300x97.jpg?_i=AA 300w',
  'https://res.cloudinary.com/colbycloud/images/w_1024',
  'h_332',
  'c_scale/f_auto',
  'q_auto/v1636561853/stranger-things-characters-images/stranger-things-characters-images-1024x332.jpg?_i=AA 1024w',
  'https://res.cloudinary.com/colbycloud/images/w_768',
  'h_249',
  'c_scale/f_auto',
  'q_auto/v1636561853/stranger-things-characters-images/stranger-things-characters-images-768x249.jpg?_i=AA 768w',
  'https://res.cloudinary.com/colbycloud/images/w_1536',
  'h_498',
  'c_scale/f_auto',
  'q_auto/v1636561853/stranger-things-characters-images/stranger-things-characters-images-1536x498.jpg?_i=AA 1536w',
  'https://res.cloudinary.com/colbycloud/images/w_2048',
  'h_664',
  'c_scale/f_auto',
  'q_auto/v1636561853/stranger-things-characters-images/stranger-things-characters-images-2048x664.jpg?_i=AA 2048w'
]
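For comparison, the HTML Standard's algorithm for parsing a srcset attribute does not split on every comma: it reads each URL as a run of non-whitespace characters, so commas inside a URL stay part of it, and only then reads descriptors (such as `300w`) up to the next comma. The function below is a simplified sketch of that idea, not the full spec algorithm (it ignores parenthesized descriptors and does no descriptor validation); `splitSrcset` is a hypothetical name and the URLs are shortened versions modeled on the report.

```javascript
// Simplified sketch of srcset splitting (hypothetical helper).
function splitSrcset(value) {
  const candidates = []
  let position = 0

  while (position < value.length) {
    // Skip whitespace and commas between candidates.
    while (position < value.length && /[\s,]/.test(value[position])) position++
    if (position >= value.length) break

    // The URL is a run of non-whitespace characters, so commas inside it are kept.
    const urlStart = position
    while (position < value.length && !/\s/.test(value[position])) position++
    let url = value.slice(urlStart, position)
    let descriptors = ''

    if (url.endsWith(',')) {
      // A URL that ends in commas has no descriptors; drop the trailing commas.
      url = url.replace(/,+$/, '')
    } else {
      // Otherwise descriptors (such as `300w` or `2x`) run until the next comma.
      const descriptorStart = position
      while (position < value.length && value[position] !== ',') position++
      descriptors = value.slice(descriptorStart, position).trim()
    }

    candidates.push(descriptors ? url + ' ' + descriptors : url)
  }

  return candidates
}

const srcset =
  'https://res.cloudinary.com/colbycloud/images/w_300,h_97,c_scale/f_auto,q_auto/a-300x97.jpg 300w, ' +
  'https://res.cloudinary.com/colbycloud/images/w_768,h_249,c_scale/f_auto,q_auto/a-768x249.jpg 768w'

console.log(splitSrcset(srcset))
```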

Runtime

Node v14

Package manager

yarn v1

OS

macOS

Build and bundle tools

Next.js

rehype-parse throws the error "Cannot read properties of undefined (reading 'spaceSeparated')" when HTML contains unencoded markup characters in <pre><code>.

Initial checklist

Affected packages and versions

[email protected]

Link to runnable example

https://code.juejin.cn/pen/7200643610050560037

Steps to reproduce

Please use this project to reproduce: https://code.juejin.cn/pen/7200643610050560037

Expected behavior

rehype should successfully parse the HTML content, including some incorrect closing tags, just like most browsers do.

Actual behavior

It throws an error Cannot read properties of undefined (reading 'spaceSeparated')

Runtime

Node v14

Package manager

npm 6

OS

macOS

Build and bundle tools

Other (please specify in steps to reproduce)

rehype-parse: parse error in xml cdata with raw closing angle bracket

Initial checklist

Affected packages and versions

[email protected]

Link to runnable example

No response

Steps to reproduce

git clone https://github.com/milahu/docbook2md
cd docbook2md
git checkout  077fbf159f16e8781336a955ef0269ac9499c39e
./run.sh

main script: src/docbook2md.ts

input file: examples/attrsets.xml

relevant section:

<programlisting><![CDATA[
let set = { a = { b = 3; }; };
in lib.attrsets.attrByPath [ "a" "b" ] 0 set
=> 3
]]></programlisting>

Expected behavior

The parser should correctly parse the XML CDATA section:

{
  programlisting: {
    type: "element",
    tagName: "programlisting",
    children: [
      { type: "text", value: '\nlet set = { a = { b = 3; }; };\nin lib.attrsets.attrByPath [ "a" "b" ] 0 set\n=> 3' }
    ]
  }
}

Actual behavior

The parser confuses the > inside the CDATA section with the end of the CDATA section:

{
  programlisting: {
    type: "element",
    tagName: "programlisting",
    children: [
      {
        type: "comment",
        value: '[CDATA[\nlet set = { a = { b = 3; }; };\nin lib.attrsets.attrByPath [ "a" "b" ] 0 set\n=',
        position: [Object]
      },
      { type: "text", value: "3 ]]>" }
    ]
  }
}
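The reported tree is consistent with how the HTML Standard tokenizes `<![CDATA[` in HTML content (rehype-parse parses its input as HTML, not XML): outside SVG and MathML it is a parse error, and the rest is treated as a bogus comment that ends at the first `>`. The sketch below reproduces only that one rule on a shortened input; `tokenizeBogusComment` is a hypothetical helper, not part of rehype.

```javascript
// Sketch of one HTML tokenizer rule: in HTML content, everything after `<!`
// up to the first `>` becomes a (bogus) comment, including a `[CDATA[` prefix.
// Hypothetical helper; it only handles input that starts with `<!`.
function tokenizeBogusComment(input) {
  if (!input.startsWith('<!')) throw new Error('Expected input to start with `<!`')
  const end = input.indexOf('>', 2)
  // Without a `>`, the comment runs to the end of the input.
  if (end === -1) return {comment: input.slice(2), rest: ''}
  return {comment: input.slice(2, end), rest: input.slice(end + 1)}
}

const {comment, rest} = tokenizeBogusComment('<![CDATA[\nx => 3\n]]>')
console.log(JSON.stringify(comment)) // "[CDATA[\nx ="
console.log(JSON.stringify(rest)) // " 3\n]]>"
```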

Runtime

Deno

Package manager

No response

OS

No response

Build and bundle tools

No response

rehype-stringify 10.0.0 does not work according to documentation

Initial checklist

Affected packages and versions

rehype-stringify

Link to runnable example

No response

Steps to reproduce

Follow the example from the docs, making sure you've installed [email protected].

Note that you get TypeScript failures.

Try to run the code. You'll get an error like:

TypeError: Cannot `process` without `Compiler`
    at assertCompiler (file:///Users/ian/projects/com.ianwremmel/node_modules/unified/lib/index.js:520:11)
    at Function.process (file:///Users/ian/projects/com.ianwremmel/node_modules/unified/lib/index.js:377:5)
    at render (file:///Users/ian/projects/com.ianwremmel/build/index.js?t=1698196639195.3516:474:105)
    at file:///Users/ian/projects/com.ianwremmel/build/index.js?t=1698196639195.3516:484:52
    at Array.map (<anonymous>)
    at loader2 (file:///Users/ian/projects/com.ianwremmel/build/index.js?t=1698196639195.3516:483:11)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at Object.callRouteLoaderRR (/Users/ian/projects/com.ianwremmel/node_modules/@remix-run/server-runtime/dist/data.js:52:16)
    at callLoaderOrAction (/Users/ian/projects/com.ianwremmel/node_modules/@remix-run/router/router.ts:3778:16)
    at async Promise.all (index 0)

This issue was initially reported in #149. It's not clear to me how updating to the latest version fixed it for them, since the initial bug report was already about the latest version.

Expected behavior

Markdown should compile

Actual behavior

TypeError: Cannot `process` without `Compiler` (same stack trace as above)

Runtime

Other (please specify in steps to reproduce)

Package manager

npm 8

OS

macOS

Build and bundle tools

Remix

rehype-parse generates an additional <p></p> for HTML content

Initial checklist

Affected packages and versions

[email protected]

Link to runnable example

https://stackblitz.com/edit/github-frqzag?file=src%2Findex.ts

Steps to reproduce

Paste the following HTML into rehype-parse:

<div class="page-body">
  <p id="13d0fd81-b7f7-47ca-ae21-5b96a36c5f23" class="">aaa
  <div class="indented">
    <p id="22a66b31-b267-4924-b0c6-cf08772184b6" class="">bbb
    <div class="indented">
      <p id="d07a767a-8b5b-4d07-bb20-7657e23bf3a0" class="">ccc</p>
    </div>
    </p>
    <p id="97cd6489-e270-41bb-833c-6b55cb4b2bf4" class="">ddd</p>
  </div>
  </p>
  <p id="ee10b0a1-ee8b-4f8a-9e1d-2b1c5f09006f" class="">eee</p>
</div>

Expected behavior

<div class="page-body">
  <p id="13d0fd81-b7f7-47ca-ae21-5b96a36c5f23" class="">aaa
  <div class="indented">
    <p id="22a66b31-b267-4924-b0c6-cf08772184b6" class="">bbb
    <div class="indented">
      <p id="d07a767a-8b5b-4d07-bb20-7657e23bf3a0" class="">ccc</p>
    </div>
    </p>
    <p id="97cd6489-e270-41bb-833c-6b55cb4b2bf4" class="">ddd</p>
  </div>
  </p>
  <p id="ee10b0a1-ee8b-4f8a-9e1d-2b1c5f09006f" class="">eee</p>
</div>

Actual behavior

<p id="13d0fd81-b7f7-47ca-ae21-5b96a36c5f23" class="">aaa
  </p><div class="indented">
    <p id="22a66b31-b267-4924-b0c6-cf08772184b6" class="">bbb
    </p><div class="indented">
      <p id="d07a767a-8b5b-4d07-bb20-7657e23bf3a0" class="">ccc</p>
    </div>
+   <p></p>
    <p id="97cd6489-e270-41bb-833c-6b55cb4b2bf4" class="">ddd</p>
  </div>
+  <p></p>
  <p id="ee10b0a1-ee8b-4f8a-9e1d-2b1c5f09006f" class="">eee</p>
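The actual output matches two rules of the tree-construction algorithm in the HTML Standard, which browsers also follow: a `div` (or `p`) start tag closes an open `p` element, and a `</p>` end tag with no open `p` element inserts an empty `p` element. The toy builder below implements only those two rules, on pre-tokenized input and without the spec's scope handling; `build` and `serialize` are hypothetical helpers, not rehype APIs. Running it on a simplified version of the reported input reproduces the stray `<p></p>`.

```javascript
// Toy tree builder implementing only two HTML tree-construction rules:
// 1. A `<p>` or `<div>` start tag first closes an open `p` element.
// 2. A `</p>` end tag with no open `p` element inserts an empty `p` element.
function build(tokens) {
  const root = {tag: '#root', children: []}
  const stack = [root]
  const hasOpenParagraph = () => stack.some((node) => node.tag === 'p')
  const append = (node) => stack[stack.length - 1].children.push(node)

  for (const token of tokens) {
    if (token.type === 'text') {
      append({tag: '#text', value: token.value})
    } else if (token.type === 'start') {
      if ((token.tag === 'p' || token.tag === 'div') && hasOpenParagraph()) {
        // Rule 1: pop elements until the open `p` has been popped.
        let popped
        do {
          popped = stack.pop()
        } while (popped.tag !== 'p')
      }
      const element = {tag: token.tag, children: []}
      append(element)
      stack.push(element)
    } else if (token.tag === 'p' && !hasOpenParagraph()) {
      // Rule 2: a stray `</p>` becomes an empty paragraph.
      append({tag: 'p', children: []})
    } else {
      // Other end tags: pop until the matching element is popped (never the root).
      while (stack.length > 1) {
        if (stack.pop().tag === token.tag) break
      }
    }
  }

  return root
}

function serialize(node) {
  if (node.tag === '#text') return node.value
  const inner = node.children.map(serialize).join('')
  return node.tag === '#root' ? inner : '<' + node.tag + '>' + inner + '</' + node.tag + '>'
}

// Simplified input: <div><p>aaa<div><p>ccc</p></div></p></div>
const tokens = [
  {type: 'start', tag: 'div'},
  {type: 'start', tag: 'p'},
  {type: 'text', value: 'aaa'},
  {type: 'start', tag: 'div'},
  {type: 'start', tag: 'p'},
  {type: 'text', value: 'ccc'},
  {type: 'end', tag: 'p'},
  {type: 'end', tag: 'div'},
  {type: 'end', tag: 'p'},
  {type: 'end', tag: 'div'}
]

console.log(serialize(build(tokens)))
// <div><p>aaa</p><div><p>ccc</p></div><p></p></div>
```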

Runtime

Node v16

Package manager

pnpm

OS

Linux

Build and bundle tools

Vite

[docs]: Performance

Subject of the feature

It would be great to have a performance comparison with other HTML parsers, like the one at https://github.com/fb55/htmlparser2#performance

Problem

Expected behavior

There are no problems. I think a table like this would help increase popularity and give developers some metrics.

Alternatives

I think we can use https://github.com/AndreasMadsen/htmlparser-benchmark to get results.

Why?

We want to integrate rehype into webpack and the webpack ecosystem to handle HTML files and HTML entry points. We are evaluating existing solutions and their convenience. The project looks very good and has everything we need (API).

[BUG] unexpected parsing behaviour for the same html tag

Subject of the issue

When parsing HTML into a syntax tree, the same HTML tag with different properties produces conflicting syntax nodes.

Your environment

  • OS: win10 20h2
  • Packages: rehype-parse
  • Env: node 13, npm 6.12.0

Steps to reproduce

<card type="block" name="hr"></card>
<card type="block" name="localdoc"
    value="data:%7B%22status%22%3A%22done%22%2C%22source%22%3A%22transfer%22%2C%22src%22%3A%22https%3A%2F%2Fwww.yuque.com%2Fattachments%2Fyuque%2F0%2F2021%2Fpdf%2F2596791%2F1615361339259-26318a71-30c9-4f4f-ad67-d384b0b5c8af.pdf%22%2C%22name%22%3A%22Vue.js%E5%89%8D%E7%AB%AF%E5%BC%80%E5%8F%91%E5%9F%BA%E7%A1%80%E4%B8%8E%E9%A1%B9%E7%9B%AE%E5%AE%9E%E6%88%98%20-%20%E9%83%91%E9%9F%A9%E4%BA%AC(2020).pdf%22%2C%22ext%22%3A%22pdf%22%2C%22size%22%3A8758881%2C%22collapsed%22%3Atrue%2C%22margin%22%3Atrue%2C%22id%22%3A%22Uuryf%22%7D">
</card>

Parse the HTML above, inspect its syntax tree, and compare the two nodes.

Expected behavior

The two nodes should produce the same kind of syntax tree, since they differ only in their properties.

But they produce different syntax trees.

Actual behavior

See the screenshots for the difference.

Parsing fails with noscript tag in head

Failure to parse noscript correctly in head tag

rehype-parse fails to parse correctly when there is a noscript tag in the head. The text child is placed into the body tag, leaving an empty noscript tag in the head, and all the remaining tags are also placed into the body tag instead of the head.

My environment

  • OS: Ubuntu 19.10
  • Packages: rehype: ^10.0.0, rehype-parse: ^6.0.2
  • Env: node v12.4.0, npm 6.14.4

Steps to reproduce

See this repo for a demo.

Expected behavior

Given the following input:

<html>
  <head>
    <noscript>&lt;h1&gt;Hello, world&lt;/h1&gt;</noscript>
    <style>
      body { background-color: #ccc; }
    </style>
  </head>
  <body>
    <h1>Goodbye, Earthlings!</h1>
  </body>
</html>

... and given the following code:

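const rehype = require('rehype');
const parse = require('rehype-parse');
const report = require('vfile-reporter');

// `source` holds the HTML input shown above.
rehype().use(parse).process(source, (err, file) => {
  if (err) {
    console.log('error', err.message);
  } else {
    console.log(report(file));
    console.log(String(file));
  }
})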

... I would have expected the last statement to produce (ignoring indentation and formatting):

<html><head>
    <noscript>&#x3C;h1>Hello, world&#x3C;/h1></noscript>
    <style>
      body { background-color: #ccc; }
    </style>
  </head>
  <body>
    <h1>Goodbye, Earthlings!</h1>
  </body>
</html>

Actual behavior

The following output is produced:

{
  "type": "root",
  "children": [
    {
      "type": "element",
      "tagName": "html",
      "properties": {},
      "children": [
        {
          "type": "element",
          "tagName": "head",
          "properties": {},
          "children": [
            {
              "type": "text",
              "value": "\n    ",
              "position": {
                "start": {
                  "line": 3,
                  "column": 9,
                  "offset": 16
                },
                "end": {
                  "line": 4,
                  "column": 5,
                  "offset": 21
                }
              }
            },
            {
              "type": "element",
              "tagName": "noscript",
              "properties": {},
              "children": [],
              "position": {
                "start": {
                  "line": 4,
                  "column": 5,
                  "offset": 21
                },
                "end": {
                  "line": 4,
                  "column": 15,
                  "offset": 31
                }
              }
            }
          ],
          "position": {
            "start": {
              "line": 3,
              "column": 3,
              "offset": 10
            },
            "end": {
              "line": 4,
              "column": 15,
              "offset": 31
            }
          }
        },
        {
          "type": "element",
          "tagName": "body",
          "properties": {},
          "children": [
            {
              "type": "text",
              "value": "<h1>Hello, world</h1>\n    ",
              "position": {
                "start": {
                  "line": 4,
                  "column": 15,
                  "offset": 31
                },
                "end": {
                  "line": 5,
                  "column": 5,
                  "offset": 80
                }
              }
            },
            {
              "type": "element",
              "tagName": "style",
              "properties": {},
              "children": [
                {
                  "type": "text",
                  "value": "\n      body { background-color: #ccc; }\n    ",
                  "position": {
                    "start": {
                      "line": 5,
                      "column": 12,
                      "offset": 87
                    },
                    "end": {
                      "line": 7,
                      "column": 5,
                      "offset": 131
                    }
                  }
                }
              ],
              "position": {
                "start": {
                  "line": 5,
                  "column": 5,
                  "offset": 80
                },
                "end": {
                  "line": 7,
                  "column": 13,
                  "offset": 139
                }
              }
            },
            {
              "type": "text",
              "value": "\n  \n  \n    ",
              "position": {
                "start": {
                  "line": 7,
                  "column": 13,
                  "offset": 139
                },
                "end": {
                  "line": 10,
                  "column": 5,
                  "offset": 163
                }
              }
            },
            {
              "type": "element",
              "tagName": "h1",
              "properties": {},
              "children": [
                {
                  "type": "text",
                  "value": "Goodbye, Earthlings!",
                  "position": {
                    "start": {
                      "line": 10,
                      "column": 9,
                      "offset": 167
                    },
                    "end": {
                      "line": 10,
                      "column": 29,
                      "offset": 187
                    }
                  }
                }
              ],
              "position": {
                "start": {
                  "line": 10,
                  "column": 5,
                  "offset": 163
                },
                "end": {
                  "line": 10,
                  "column": 34,
                  "offset": 192
                }
              }
            },
            {
              "type": "text",
              "value": "\n  \n\n",
              "position": {
                "start": {
                  "line": 10,
                  "column": 34,
                  "offset": 192
                },
                "end": {
                  "line": 13,
                  "column": 1,
                  "offset": 211
                }
              }
            }
          ]
        }
      ],
      "position": {
        "start": {
          "line": 2,
          "column": 1,
          "offset": 1
        },
        "end": {
          "line": 13,
          "column": 1,
          "offset": 211
        }
      }
    }
  ],
  "data": {
    "quirksMode": true
  },
  "position": {
    "start": {
      "line": 1,
      "column": 1,
      "offset": 0
    },
    "end": {
      "line": 13,
      "column": 1,
      "offset": 211
    }
  }
}
<html><head>
    <noscript></noscript></head><body>&#x3C;h1>Hello, world&#x3C;/h1>
    <style>
      body { background-color: #ccc; }
    </style>
  
  
    <h1>Goodbye, Earthlings!</h1>
  

</body></html>

Prefer explicit options over implicit settings

Initial checklist

Affected packages and versions

latest

Link to runnable example

No response

Steps to reproduce

const settings = Object.assign({}, options, processorSettings)

Expected behavior

Object.assign({}, processorSettings, options)

Actual behavior

Object.assign({}, options, processorSettings)
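Object.assign copies properties from its sources left to right, so keys in later arguments overwrite keys in earlier ones. The argument order therefore decides whether processor settings or explicit options win. Here is a minimal sketch with hypothetical values, where fragment is used only as an example key:

```javascript
// Hypothetical values: settings stored on the processor, and options
// passed explicitly to the plugin.
const processorSettings = {fragment: false}
const options = {fragment: true}

// Current order: processor settings are copied last, so they win.
const current = Object.assign({}, options, processorSettings)

// Proposed order: explicit options are copied last, so they win.
const proposed = Object.assign({}, processorSettings, options)

console.log(current.fragment) // → false
console.log(proposed.fragment) // → true
```

Object spread behaves the same way: {...processorSettings, ...options} also lets options win.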

Runtime

Node v16

Package manager

No response

OS

No response

Build and bundle tools

No response

Add Support for Tag Swapping on Rehype-Stringify

Initial checklist

Problem

rehype-stringify should have a components option just like rehype-react.

I want to convert my Markdown into the limited set of HTML that Mastodon allows in posts. I plan to implement custom classes for formatting HTML tags in a way that is usable in a Mastodon post.

Solution

Add support for changing HTML tag names in rehype-stringify. Using the same format as rehype-react would be fine.

Alternatives

Convert to React first, then render the result.
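Until such an option exists, tag names could also be swapped before serialization with a small custom plugin that rewrites tagName on hast element nodes. This is a minimal sketch: rehypeRenameTags and its tag-map option are hypothetical names, and the traversal is hand-written to keep the example dependency-free (unist-util-visit would also work).

```javascript
// Hypothetical plugin: rename hast element tag names before rehype-stringify
// runs. tagMap maps original tag names to replacement tag names.
function rehypeRenameTags(tagMap = {}) {
  // A unified plugin returns a transformer that receives the hast tree.
  return function transformer(tree) {
    rename(tree)
  }

  // Walk the tree depth-first, renaming matching elements in place.
  function rename(node) {
    if (
      node.type === 'element' &&
      Object.prototype.hasOwnProperty.call(tagMap, node.tagName)
    ) {
      node.tagName = tagMap[node.tagName]
    }
    if (Array.isArray(node.children)) {
      for (const child of node.children) rename(child)
    }
  }
}

// Hand-built hast tree standing in for parser output.
const tree = {
  type: 'root',
  children: [
    {type: 'element', tagName: 'h1', properties: {}, children: [{type: 'text', value: 'Hi'}]},
    {type: 'element', tagName: 'strong', properties: {}, children: [{type: 'text', value: 'bold'}]}
  ]
}

rehypeRenameTags({h1: 'p', strong: 'b'})(tree)
console.log(tree.children.map((node) => node.tagName)) // → [ 'p', 'b' ]
```

In a pipeline it would go between the parser and rehype-stringify, for example .use(rehypeRenameTags, {h1: 'p'}); unified passes the second argument of .use() to the plugin as its options.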

Single quotes in style attributes are turned into HTML entities, causing a CSS lint error

Subject of the issue

rehype-stringify turns my single quotes into &#x27;, which makes my CSS linter complain.

Your environment

  • OS: Ubuntu 20.04
  • Packages:

❯ yarn list --pattern "unified|rehype-parse|to-vfile|rehype-stringify|fs-extra"
yarn list v1.22.5
├─ @types/[email protected]
├─ [email protected]
├─ [email protected]
│ └─ [email protected]
├─ [email protected]
│ └─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
│ └─ [email protected]
├─ [email protected]
│ └─ [email protected]
├─ [email protected]
└─ [email protected]
Done in 0.49s.

  • Env: node 14.5.0, yarn 1.22.5

Steps to reproduce

Full minimal reproduction:

index.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<body>
    <h1 style="font-family: 'Some-font', sans-serif;">
</body>
</html>

index.js

const unified = require("unified");
const parser = require("rehype-parse");
const toVfile = require("to-vfile");
const stringify = require("rehype-stringify");
const fs = require("fs-extra");

const fileIn = "./index.html";
const fileOut = "./index-parsed.html";

unified()
  .use(parser)
  .use(stringify)
  .process(toVfile.readSync(fileIn), (err, data) => {
    if (err) {
      throw new Error(err);
    }
    fs.writeFileSync(fileOut, String(data));
  });

Run with node index.js.

Result

index-parsed.html

<!doctype html><html lang="en"><head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<body>
    <h1 style="font-family: &#x27;Some-font&#x27;, sans-serif;">

</h1></body></html>

CSS lint error: "property value expected at [7, 29]"

Expected behavior

The quotes should remain as they are in the original file.

Actual behavior

A css lint error happens.
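In a double-quoted attribute value, HTML syntax only requires encoding the double quote (and any ampersand that could be read as a character reference). A single quote cannot end such a value, so leaving it unencoded is valid HTML. The sketch below shows an encoder that follows that rule. serializeAttribute is a hypothetical helper and is not rehype-stringify's implementation.

```javascript
// Hypothetical helper (not rehype-stringify's implementation): serialize an
// attribute as name="value", encoding only & and ". A single quote cannot end
// a double-quoted value, so it is left as is.
function serializeAttribute(name, value) {
  const encoded = value.replace(/&/g, '&amp;').replace(/"/g, '&quot;')
  return `${name}="${encoded}"`
}

console.log(serializeAttribute('style', "font-family: 'Some-font', sans-serif;"))
// → style="font-family: 'Some-font', sans-serif;"
```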

rehype-stringify 10.0.0 does not compile

Initial checklist

Affected packages and versions

rehype-stringify v10.0.0

Link to runnable example

No response (FYI stackblitz would not run your MWE template for me)

Steps to reproduce

Using the following snippet

import {unified} from 'unified'
import rehypeParse from 'rehype-parse'
import rehypeStringify from 'rehype-stringify'

const content = await unified()
    .use(rehypeParse)
    .use(rehypeStringify)
    .process('<h1>Hello World</h1>');

Expected behavior

The code should run and compile the HTML to a string.

Actual behavior

The code has the following runtime error:
Error [TypeError]: Cannot 'process' without 'Compiler'
and does not compile anything.

Downgrading the package to 9.0.4 fixes the issue.

Runtime

Node.js v20.5.1

Package manager

NPM 9.8.1

OS

Linux

Build and bundle tools

Next.js

"rehype is not an XML parser"

Initial checklist

Problem

The readme says "rehype is not an XML parser" but does not help me find an XML parser for unified.

> 👉 **Note**: rehype is not an XML parser.
> It supports SVG as embedded in HTML.
> It does not support the features available in XML.
> Passing SVG files might break but fragments of modern SVG should be fine.

Solution

Suggest an XML parser for unified, for example something based on xast-util-from-xml.

Alternatives

rehype-parse works for simple XML files, but it fails to parse <![CDATA[ ... ]]> sections.

Example: nixpkgs/doc/functions/library/attrsets.xml (DocBook XML format, see NixOS/nixpkgs#105243):

  <example xml:id="function-library-lib.attrset.attrByPath-example-value-exists">
   <title>Extracting a value from a nested attribute set</title>
<programlisting><![CDATA[
let set = { a = { b = 3; }; };
in lib.attrsets.attrByPath [ "a" "b" ] 0 set
=> 3
]]></programlisting>
  </example>

Workaround

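import {readFileSync} from 'node:fs'

// Hypothetical path to the DocBook file being processed
const inputPath = 'attrsets.xml'

const inputText = readFileSync(inputPath, 'utf8')
  // Workaround for parsing XML with rehype-parse
  // https://github.com/rehypejs/rehype/issues/109
  // Alternative: drop the CDATA markers entirely with
  // .replace(/<!\[CDATA\[(.*?)\]\]>/gs, '$1')
  .replace(/<!\[CDATA\[(.*?)\]\]>/gs, '<cdata>$1</cdata>')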
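Wrapping CDATA contents in a <cdata> element still hands them to the HTML parser, so any < or & inside would be read as markup or character references. As an alternative, each section can be unwrapped and its contents escaped so the parser keeps them as text. This sketch assumes the CDATA sections sit in ordinary elements (not script, style, or similar raw-text elements), and unwrapCdata is a hypothetical helper.

```javascript
// Hypothetical helper: replace each CDATA section with its contents,
// escaping characters an HTML parser would otherwise treat as markup.
function unwrapCdata(xml) {
  return xml.replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, (_, content) =>
    content
      .replace(/&/g, '&amp;') // escape & first so later entities stay intact
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
  )
}

const input = '<programlisting><![CDATA[if (a < b && c) {}]]></programlisting>'
console.log(unwrapCdata(input))
// → <programlisting>if (a &lt; b &amp;&amp; c) {}</programlisting>
```

The escaped text can then go to rehype-parse, which decodes the references back into plain text nodes.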

Bad logo rendering on GitHub dark theme

Subject of the issue

The rehype logo is not displayed well in the repository readme when I use GitHub's dark theme.

Your environment

Firefox, GitHub dark theme

Steps to reproduce

Enable the GitHub dark theme and open the repository page.

Expected behavior

The logo should be fully readable.

Actual behavior


Unexpected list element hoisting

Initial checklist

Affected packages and versions

[email protected]

Link to runnable example

https://codesandbox.io/p/devbox/loving-borg-9p5c5q?file=%2Fsrc%2Findex.ts%3A11%2C7

Steps to reproduce

When <li> elements are used inside tags other than <ul>, their nested children get hoisted out. In the sandbox I used the custom tag <custom_list>, but the same behavior occurs with standard <div> tags.

Input:

<custom_list>
  <li>
    <p>Text</p>
  </li>
  <li>
    <custom_list>
      <li><p>Nested Text</p></li>
    </custom_list>
  </li>
</custom_list>

Output:

<custom_list>
  <li>
    <p>Text</p>
  </li>
  <li>
    <custom_list>
    </custom_list>
  </li>
  <li><p>Nested Text</p></li>
</custom_list>

Expected behavior

The nested list item should remain a child of the element inside the <li> tag.

Actual behavior

The child of the element inside the <li> tag is hoisted up 1 level.
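The hoisting appears to follow from the HTML parsing algorithm itself. rehype-parse builds on parse5, which implements the WHATWG parsing rules. In the "in body" insertion mode, an <li> start tag walks up the stack of open elements and closes the nearest open li. It stops early only at a "special" element other than address, div, or p. A custom element such as custom_list is not special and div is explicitly skipped, so the walk reaches the outer li and closes it. ul and ol are special and stop the walk. The sketch below covers only that step (it ignores implied end tags and the p-closing step) and uses a hypothetical liClosedByNewLi helper.

```javascript
// HTML-namespace elements in the spec's "special" category, based on the
// WHATWG list (the spec also lists a few MathML and SVG elements, omitted).
const SPECIAL = new Set([
  'address', 'applet', 'area', 'article', 'aside', 'base', 'basefont',
  'bgsound', 'blockquote', 'body', 'br', 'button', 'caption', 'center', 'col',
  'colgroup', 'dd', 'details', 'dir', 'div', 'dl', 'dt', 'embed', 'fieldset',
  'figcaption', 'figure', 'footer', 'form', 'frame', 'frameset', 'h1', 'h2',
  'h3', 'h4', 'h5', 'h6', 'head', 'header', 'hgroup', 'hr', 'html', 'iframe',
  'img', 'input', 'keygen', 'li', 'link', 'listing', 'main', 'marquee', 'menu',
  'meta', 'nav', 'noembed', 'noframes', 'noscript', 'object', 'ol', 'p',
  'param', 'plaintext', 'pre', 'script', 'search', 'section', 'select',
  'source', 'style', 'summary', 'table', 'tbody', 'td', 'template',
  'textarea', 'tfoot', 'th', 'thead', 'title', 'tr', 'track', 'ul', 'wbr', 'xmp'
])

// Given the stack of open element names (outermost first), return the index
// of the <li> that a new <li> start tag would implicitly close, or -1.
function liClosedByNewLi(stack) {
  for (let i = stack.length - 1; i >= 0; i--) {
    const name = stack[i]
    if (name === 'li') return i
    // Special elements stop the walk, except address, div, and p.
    if (SPECIAL.has(name) && !['address', 'div', 'p'].includes(name)) return -1
  }
  return -1
}

// Stack when the nested <li> from the report is reached: the outer li closes.
console.log(liClosedByNewLi(['html', 'body', 'custom_list', 'li', 'custom_list'])) // → 3
// div does not stop the walk either.
console.log(liClosedByNewLi(['html', 'body', 'div', 'li', 'div'])) // → 3
// ul stops the walk, so nothing is hoisted.
console.log(liClosedByNewLi(['html', 'body', 'ul', 'li', 'ul'])) // → -1
```

When the outer li is closed, every element above it on the stack is popped too, so the new li lands in the outer custom_list. That matches the output above.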

Runtime

Other (please specify in steps to reproduce)

Package manager

pnpm

OS

Linux

Build and bundle tools

Vite

[docs]: XHTML compatibility

Subject of the feature

Problem

The docs have no information about XHTML compatibility, only about XML.

Expected behavior

More information about XHTML compatibility

Alternatives

Unfortunately, we do not have a large set of tests, so we cannot check this ourselves. If XHTML is officially supported, it would be great to add a couple of words about it.

Sorry for the multiple issues. We are evaluating rehype, and this may be useful for other developers as well.
