syntax-tree / hast-util-sanitize

utility to sanitize hast nodes

Home Page: https://unifiedjs.com

License: MIT License

JavaScript 11.22% HTML 88.78%

hast html util sanitize clean unist syntax-tree hast-util xss security

hast-util-sanitize's Issues

Support for allowing raw nodes to remain?

I'd like the safety of sanitization using this plugin, but I need to alter what node types are ignored. This could be a part of schema extension, perhaps, where we can pass an array of node types to ignore, or an array of ignored types to allow.

Thoughts?

Look at the sanitize api proposal

Initial checklist

I read the support docs
I read the contributing guide
I agree to follow the code of conduct
I searched issues and couldn’t find anything (or linked relevant results below)

Subject

Look at built in sanitize api & implementation: https://wicg.github.io/sanitizer-api/#configuration-object

Problem

Custom api

Solution

Standard api

Alternatives

Keep doing what we're doing.

Don't apply the clobber prefix when it's already present

Initial checklist

I read the support docs
I read the contributing guide
I agree to follow the code of conduct
I searched issues and couldn’t find anything (or linked relevant results below)

Problem

I would like to use the clobber prefix in both footnotes and sanitization. When the clobber prefix is set in remark-rehype, footnote IDs and the links to them are correctly rendered with the clobber prefix included. Then, when the sanitize plugin is applied (to prefix IDs in the text outside footnotes), the clobber prefix is added once again, resulting in IDs like user-content-user-content-foo. When the clobber prefix is NOT set in remark-rehype and the sanitize plugin is applied, footnotes are broken, because IDs are correctly set to user-content-foo but links are unprefixed, e.g. foo.

Solution

IMHO, since the sanitize plugin doesn't know anything about the footnote structure, the remark-rehype plugin should be the one to apply the clobber prefix where it's needed. The sanitize plugin should detect that the clobber prefix is already applied and, in that case, do nothing for that element.

The implementation should be quite easy. Just add && !value.startsWith(state.schema.clobberPrefix) to the end of this line.

Alternatives

For now, I'm hotfixing this by adding another plugin between remark-rehype and rehype-sanitize that removes the clobber prefix from all elements, so rehype-sanitize can add it again without the duplication.
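
The "skip when already prefixed" check proposed in the Solution can be sketched on a plain hast-like tree. This is a minimal illustration, not the library's implementation: applyClobberPrefix and the sample tree are hypothetical, and the sketch assumes element nodes store id as a string under properties.

```javascript
// Hypothetical helper (not part of hast-util-sanitize):
// prefix every `id` unless the prefix is already present.
function applyClobberPrefix(node, prefix) {
  const properties = node.properties
  if (node.type === 'element' && properties && typeof properties.id === 'string') {
    if (!properties.id.startsWith(prefix)) {
      properties.id = prefix + properties.id
    }
  }
  // Recurse into children, if any.
  for (const child of node.children || []) {
    applyClobberPrefix(child, prefix)
  }
  return node
}

// Sample tree: one plain id and one id that remark-rehype already prefixed.
const clobberTree = {
  type: 'root',
  children: [
    {type: 'element', tagName: 'h2', properties: {id: 'foo'}, children: []},
    {type: 'element', tagName: 'li', properties: {id: 'user-content-fn-1'}, children: []}
  ]
}

applyClobberPrefix(clobberTree, 'user-content-')
```

One trade-off of this check: an author-supplied id that happens to start with the prefix passes through without being prefixed again, so the prefix no longer strictly separates user ids from ids written by the page itself.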

Invalid extensions cause an exception

The link to the line of code that causes error: ba16c15#r32393248

Support for multiple allowed attributes

Hi! 👋

This is both a question, in case this is possible and I missed it, and a feature request, because from my scouring and attempts it doesn't seem to be possible.

Subject of the feature

Ability to declare multiple allowed attributes

Problem

Currently, it seems impossible to define a schema that allows several specific values for one attribute.

My specific use case would be for something like rehype-prism generated content, where I'd like to allow a span to have a set of known classNames, (i.e. token and operator), while still stripping any other ones.

Take, for example, <span class="token operator">.

So far, I've blindly attempted:

Adding the same attribute multiple times:
```
span: [
  ["className", "token"],
  ["className", "operator"],
],
```
Current result: <span class="operator">

Adding an array of attributes:

```
span: [
  ["className", ["token", "operator"]],
],
```

Current result: <span>

Pattern-matchy string:

```
span: [
  ["className", "token|operator"],
],
```

Current result: <span>

Space separated string, or a combo of how they appear:
```
span: [
  ["className", "token operator"],
],
```
Current result: <span>

Expected behaviour

It seems like it would be beneficial to have a way to provide a list of valid attribute values for an attribute in the schema.

Currently the choice is between a blanket allowance of an attribute and a single allowed value. In cases where the allowed values are known in advance and could be listed out, this nudges the user towards the blanket allowance.

While I've tried the last approach, it's probably not something that would be useful.
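
To make the requested behaviour concrete, here is a minimal sketch of filtering a className list against a set of allowed values. filterClassNames is a hypothetical helper written for illustration, not the library's API; it relies only on the fact that hast stores className as an array of strings.

```javascript
// Hypothetical helper (not part of hast-util-sanitize):
// keep only the class names that appear in the allowed list.
function filterClassNames(classNames, allowed) {
  const allowedSet = new Set(allowed)
  return classNames.filter((name) => allowedSet.has(name))
}

// <span class="token operator evil"> with `token` and `operator` allowed.
const keptClassNames = filterClassNames(
  ['token', 'operator', 'evil'],
  ['token', 'operator']
)
```

With this behaviour, <span class="token operator"> would keep both classes, while any class outside the list would be stripped.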

Apologies if this was considered before and dismissed for a reason.

Alternatives

N/A

Thank you for your time!

Provide way of disallowing certain values

Initial checklist

I read the support docs
I read the contributing guide
I agree to follow the code of conduct
I searched issues and couldn’t find anything (or linked relevant results below)

Subject

The schema argument is like an allowlist. I'd like a way of disallowing certain things.

Problem

My problem is that I'd like to support custom protocols in <a> elements in the rendered HTML. I'm using this plugin through remark-html, together with the default github.json schema along with some minor customisations.

In a nutshell:

Input

This is a [custom protocol](acme://deep/link)

Expected

<p>This is a <a href="acme://deep/link">custom protocol</a></p>

Actual

<p>This is a <a>custom protocol</a></p>

Solution

I'd like to keep the existing behaviour of disallowing malicious javascript: prefixes in the href attribute, while at the same time allowing all other protocols.

Essentially:

```
{
   "protocols": {
      "href": {
        "allow": "**",
        "disallow": "javascript:"
      }
   }
}
```

Currently, it's not possible to specify allows and disallows at the same time in this plugin.
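
The "allow everything except a denylist" check requested here could look roughly like the sketch below. isHrefAllowed and its disallowed parameter are hypothetical names, not an existing option of this plugin. The sketch assumes that removing control characters and whitespace before comparing is enough to catch obfuscated schemes such as "java<TAB>script:"; a production check would need a more careful review.

```javascript
// Hypothetical helper (not part of hast-util-sanitize):
// allow any protocol except those on a denylist.
function isHrefAllowed(value, disallowed = ['javascript']) {
  // Drop whitespace and control characters, then normalise case,
  // so that ` JaVa\tScript:` is treated like `javascript:`.
  const cleaned = String(value).replace(/[\u0000-\u0020]/g, '').toLowerCase()
  const match = /^([a-z][a-z0-9+.-]*):/.exec(cleaned)
  // No scheme: a relative URL, fragment, or query, which is allowed here.
  if (!match) return true
  return !disallowed.includes(match[1])
}

const customProtocolAllowed = isHrefAllowed('acme://deep/link')
const scriptAllowed = isHrefAllowed('javascript:alert(1)')
```

The design risk of a denylist is that any dangerous scheme missing from the list is allowed by default, whereas an allowlist fails closed.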

Alternatives

Allow all attributes of the specific tag

Subject of the feature

Sometimes we may need to allow all attributes of the specific tag. Could we add a rule like this?

```
const schema = {
  tagNames: ['svg'],
  attributes: [
    { svg: '*' } // Allow all attributes of svg tag
  ]
}
```

Alternatives

The alternative is to list every possible attribute, which is cumbersome and would increase the bundle size in the browser.
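
A lookup that honours such a wildcard might be sketched as follows. isAttributeAllowed is hypothetical, and the sketch assumes attributes is an object keyed by tag name (rather than the array shown in the proposal) whose value is either '*' or a list of attribute names.

```javascript
// Hypothetical lookup (not part of hast-util-sanitize):
// a tag whose rule is '*' accepts every attribute.
function isAttributeAllowed(attributes, tagName, name) {
  if (!Object.prototype.hasOwnProperty.call(attributes, tagName)) return false
  const rule = attributes[tagName]
  if (rule === '*') return true
  return Array.isArray(rule) && rule.includes(name)
}

const wildcardAttributes = {svg: '*', a: ['href']}

const viewBoxAllowed = isAttributeAllowed(wildcardAttributes, 'svg', 'viewBox')
const onloadAllowed = isAttributeAllowed(wildcardAttributes, 'svg', 'onload')
```

Note that onloadAllowed is also true here: a blanket wildcard admits event-handler attributes, so a real implementation would likely need to pair it with some form of denylist.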

id property is always removed when using <h2> tag

Initial checklist

I read the support docs
I read the contributing guide
I agree to follow the code of conduct
I searched issues and couldn’t find anything (or linked relevant results below)

Affected packages and versions

5.0.0

Link to runnable example

No response

Steps to reproduce

This is a weird one as <h1>, <h3>, <h4>, ... tags all work. It's just <h2> that has a problem.

In the below code, I'm attempting to sanitize the html string <h2 id="foo">Hello, world!</h2>. I want the id to be retained (or at least sanitized into user-content-foo). However, it is removed entirely.

```
import deepmerge from "deepmerge"
import { defaultSchema } from "hast-util-sanitize"
import rehypeParse from "rehype-parse"
import rehypeSanitize from "rehype-sanitize"
import rehypeStringify from "rehype-stringify"
import { unified } from "unified"

const schema = deepmerge(defaultSchema, { attributes: { "*": ["id"] } })

const file = await unified()
  .use(rehypeParse)
  .use(rehypeSanitize, schema)
  .use(rehypeStringify)
  .process('<h2 id="foo">Hello, world!</h2>')

console.log(String(file))  // <h2>Hello, world!</h2>
```

Expected behavior

<h2 id="user-content-foo">Hello, world!</h2>
(The id property should not be removed.)

Actual behavior

<h2>Hello, world!</h2>
(The id property is removed.)
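
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the default schema carries a tag-specific id rule for h2 that lists only certain values, and tag-specific rules take precedence over the * wildcard, then id="foo" would be dropped on h2 while surviving on the other headings. The sketch below models that lookup order. findRule, isValueAllowed, and the h2 rule are hypothetical; this is not the library's code.

```javascript
// Simplified, hypothetical model of rule lookup (not the library's code).
// A rule is either a bare property name (any value allowed) or
// an array of [name, ...allowedValues].
function findRule(attributes, tagName, property) {
  const own = (key) => Object.prototype.hasOwnProperty.call(attributes, key)
  const nameOf = (rule) => (Array.isArray(rule) ? rule[0] : rule)
  const match = (rules) => rules.find((rule) => nameOf(rule) === property)
  const specific = own(tagName) ? match(attributes[tagName]) : undefined
  if (specific !== undefined) return specific // tag-specific rule wins
  return own('*') ? match(attributes['*']) : undefined
}

function isValueAllowed(attributes, tagName, property, value) {
  const rule = findRule(attributes, tagName, property)
  if (rule === undefined) return false
  if (!Array.isArray(rule)) return true // bare name: any value
  return rule.slice(1).includes(value)
}

const modelAttributes = {
  '*': ['id'],
  h2: [['id', 'footnote-label']] // ASSUMED tag-specific rule for h2
}

const h2IdKept = isValueAllowed(modelAttributes, 'h2', 'id', 'foo')
const h3IdKept = isValueAllowed(modelAttributes, 'h3', 'id', 'foo')
```

In this model h2IdKept is false and h3IdKept is true, which matches the reported symptom. If the assumption holds, overriding the h2 entry itself, rather than only extending *, would be the place to look.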

Affected runtime and version

node v20.5.1

Affected package manager and version

npm 10.0.0

Affected OS and version

mac os 13.5.2 (22G91)

Build and bundle tools

No response

Add support for allowing comments

I see in the tests that it removes comments, but I'd actually like to keep them. Is there an option for that, or how would I be able to do it?

I added

```
{
  comment: {value: handleValue}
}
```

in the node schema and that seems to do what I want. Not sure if that is ideal or if there is a better place for it.

GFM footnotes

Initial checklist

I read the support docs
I read the contributing guide
I agree to follow the code of conduct
I searched issues and couldn’t find anything (or linked relevant results below)

Affected packages and versions

"rehype-sanitize": "^6.0.0"

Link to runnable example

react-markdown

Steps to reproduce

rehype-sanitize has changed the ids of the elements below:

```
<a href="#user-content-fn-1" id="user-content-user-content-fnref-1" data-footnote-ref="true" aria-describedby="user-content-footnote-label">1</a>

...

<li id="user-content-user-content-fn-1">
<p>something...  <a href="#user-content-fnref-1" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
```

Expected behavior

It should not be changed.

Actual behavior

idk

Affected runtime and version

[email protected]

Affected package manager and version

No response

Affected OS and version

No response

Build and bundle tools

No response

Github allows <li> without ancestors.

Hello,
we have found an issue in the GitHub schema.

hast-util-sanitize/lib/github.json

Lines 11 to 14 in 71283f0

```
"li": [
  "ol",
  "ul"
],
```

It strips li tags that have no ul or ol ancestor.
GitHub allows them. Should I make a PR to fix that, or is there a rationale for keeping it this way?
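
The effect of the quoted rule can be sketched as an ancestor check. hasRequiredAncestor and the stack argument (the tag names of the enclosing elements) are hypothetical; only the li entry comes from the schema lines quoted above.

```javascript
// Hypothetical check modelled on the quoted `li` entry;
// not the library's implementation.
function hasRequiredAncestor(ancestors, tagName, stack) {
  // Tags without an entry have no ancestor requirement.
  if (!Object.prototype.hasOwnProperty.call(ancestors, tagName)) return true
  const required = ancestors[tagName]
  return stack.some((name) => required.includes(name))
}

const requiredAncestors = {li: ['ol', 'ul']}

const bareLiKept = hasRequiredAncestor(requiredAncestors, 'li', ['div'])
const listLiKept = hasRequiredAncestor(requiredAncestors, 'li', ['div', 'ul'])
```

Under this rule an li directly inside a div is rejected, while one nested in a ul is kept. Removing the li entry from the map would match the GitHub behaviour described in this issue.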

Update GitHub schema

Subject of the feature

The GH schema was authored in June 2016. Since then, some changes have landed in how GH handles HTML.

Problem

Some new protocols, attributes, and elements are now supported: https://github.com/jch/html-pipeline/commits/master/lib/html/pipeline/sanitization_filter.rb

Add type definitions

Subject of the feature

Add TypeScript type definitions

Problem

It is currently difficult to use this package in a TypeScript project due to a lack of type definitions. This in turn also makes it difficult to add type definitions to a package that depends on this package.

Expected behaviour

When importing this module into a TypeScript project, the type should automatically be declared.

Alternatives

An alternative to simply adding a .d.ts file would be to actually convert the whole package to TypeScript, however, it is much easier to simply add a .d.ts file.
