syntax-tree / hast-util-sanitize
utility to sanitize hast nodes
Home Page: https://unifiedjs.com
License: MIT License
I'd like the safety of sanitization using this plugin, but I need to alter what node types are ignored. This could be a part of schema extension, perhaps, where we can pass an array of node types to ignore, or an array of ignored types to allow.
Thoughts?
Look at built in sanitize api & implementation: https://wicg.github.io/sanitizer-api/#configuration-object
Custom api
Standard api
Keep doing what we're doing.
I would like to use the clobber prefix in both footnotes and sanitization. When the clobber prefix is set in remark-rehype, footnote IDs and links to them are correctly rendered with the clobber prefix included. Then, when the sanitize plugin is applied (to prefix IDs in the text outside footnotes), the clobber prefix is added once again, resulting in IDs like user-content-user-content-foo. When the clobber prefix is NOT set in remark-rehype and the sanitize plugin is applied, footnotes are broken, because IDs are correctly set to user-content-foo, but links are unprefixed, e.g. foo.
IMHO, since the sanitize plugin doesn't know anything about footnote structure, it should be the remark-rehype plugin that applies the clobber prefix where it's needed. The sanitize plugin should detect that the clobber prefix is already applied, and in that case it shouldn't do anything for that element.
The implementation should be quite easy. Just add && !value.startsWith(state.schema.clobberPrefix) to the end of this line.
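A minimal sketch of the guard proposed above, written as a standalone function so it can be read without the library source. The name applyClobberPrefix is hypothetical and is not part of hast-util-sanitize; it only illustrates "prefix unless the prefix is already there".

```javascript
// Hypothetical helper (not the library's code): add the clobber prefix
// only when the value does not already start with it.
function applyClobberPrefix(value, clobberPrefix) {
  if (typeof value !== 'string' || !clobberPrefix) return value
  return value.startsWith(clobberPrefix) ? value : clobberPrefix + value
}

console.log(applyClobberPrefix('foo', 'user-content-')) // user-content-foo
console.log(applyClobberPrefix('user-content-foo', 'user-content-')) // user-content-foo
```

With a guard like this, running the prefixing step twice gives the same result as running it once.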
For now, I'm hotfixing this by adding another plugin between remark-rehype and rehype-sanitize that removes the clobber prefix from all elements, so rehype-sanitize can add it again without the duplication.
Link to the line of code that causes the error: ba16c15#r32393248
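The workaround described above could look roughly like the following. This is a sketch under assumptions: the function names are hypothetical, the default prefix is taken from the IDs quoted in the report, and only the id property is stripped (pass more property names if other clobbered properties are affected in your setup). It walks a hast tree by hand so the example has no dependencies.

```javascript
// Hypothetical helper (not part of remark-rehype or rehype-sanitize):
// remove an already-applied clobber prefix from the given properties.
function stripClobberPrefix(node, prefix = 'user-content-', names = ['id']) {
  if (node.type === 'element' && node.properties) {
    for (const name of names) {
      const value = node.properties[name]
      if (typeof value === 'string' && value.startsWith(prefix)) {
        node.properties[name] = value.slice(prefix.length)
      }
    }
  }
  // Recurse into children; text and comment nodes have none.
  for (const child of node.children || []) {
    stripClobberPrefix(child, prefix, names)
  }
  return node
}

// Wrapped in the usual unified plugin shape: a function returning a transformer.
function rehypeStripClobberPrefix() {
  return (tree) => {
    stripClobberPrefix(tree)
  }
}

const tree = {
  type: 'root',
  children: [
    {type: 'element', tagName: 'li', properties: {id: 'user-content-fn-1'}, children: []}
  ]
}
rehypeStripClobberPrefix()(tree)
console.log(tree.children[0].properties.id) // fn-1
```

In a pipeline, such a plugin would sit between remark-rehype and rehype-sanitize, so the sanitizer adds the prefix exactly once.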
Hi! 👋
This is a question in case this is possible and I missed it, and a feature request in case it is not: from my searching and attempts, it doesn't seem to be possible.
Ability to declare multiple allowed attributes
Currently, it seems impossible to define the schema so that it allows multiple specific values for one attribute.
My specific use case would be for something like rehype-prism generated content, where I'd like to allow a span to have a set of known classNames (i.e. token and operator), while still stripping any other ones.
If we take an example of a <span class="token operator">
So far, I've blindly attempted:
Adding the same attribute multiple times:
span: [
["className", "token"],
["className", "operator"],
],
Current result: <span class="operator">
Adding an array of attributes:
span: [
["className", ["token", "operator"]],
],
Current result: <span>
Pattern-matchy string:
span: [
["className", "token|operator"],
],
Current result: <span>
Space separated string, or a combo of how they appear:
span: [
["className", "token operator"],
],
Current result: <span>
It seems like it would be beneficial to have a way to provide a list of valid attribute values for an attribute in the schema.
Currently, the choice is between a blanket allowance of an attribute and allowing only a single value. In cases where the allowed values are known in advance and could be listed out, this nudges the user towards the blanket allowance.
While I've tried the last approach, it's probably not something that would be useful.
Apologies if this was considered before and dismissed for a reason.
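For reference, the hast-util-sanitize readme describes property definitions as tuples in which the first entry is the property name and every further entry is an allowed value. Assuming that applies to the version in use (not confirmed in this report), the schema fragment would be a single tuple listing both class names. The filterValues function below is a hypothetical stand-in that only illustrates the intended effect; it is not the library's implementation.

```javascript
// Schema fragment: one tuple per property, name first, allowed values after.
const schema = {
  attributes: {
    span: [['className', 'token', 'operator']]
  }
}

// Hypothetical stand-in for the sanitizer's value filtering (illustration only).
// A definition with no listed values allows every value.
function filterValues(definition, values) {
  const [, ...allowed] = definition
  return allowed.length === 0
    ? values
    : values.filter((value) => allowed.includes(value))
}

console.log(filterValues(schema.attributes.span[0], ['token', 'operator', 'evil']))
// [ 'token', 'operator' ]
```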
N/A
Thank you for your time!
The schema argument is like an allowlist. I'd like a way of disallowing certain things.
My problem is that I'd like to support custom protocols in <a> elements in the rendered HTML. I'm using this plugin through remark-html, together with the default github.json schema along with some minor customisations.
In a nutshell:
Input
This is a [custom protocol](acme://deep/link)
Expected
<p>This is a <a href="acme://deep/link">custom protocol</a></p>
Actual
<p>This is a <a>custom protocol</a></p>
What I'd like is to allow any protocol while still disallowing javascript: prefixes in the href attribute. Essentially:
{
"protocols": {
"href": {
"allow": "*",
"disallow": "javascript:"
}
}
}
Currently, it's not possible to specify allows and disallows at the same time in this plugin.
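Within the existing allowlist model, one way to get the acme:// link through is to extend the list of protocols allowed for href. The sketch below assumes the schema keeps protocols as arrays of protocol names without the trailing colon; baseSchema is an abbreviated stand-in for the real default schema, and isAllowedProtocol is a hypothetical check written only to show the effect.

```javascript
// Abbreviated stand-in for the default schema (assumption, not the full list).
const baseSchema = {
  protocols: {href: ['http', 'https', 'mailto']}
}

// Extend the allowlist instead of mutating the base schema.
const schema = {
  ...baseSchema,
  protocols: {
    ...baseSchema.protocols,
    href: [...baseSchema.protocols.href, 'acme']
  }
}

// Hypothetical check mirroring an allowlist lookup (simplified URL parsing).
function isAllowedProtocol(url, allowed) {
  const match = /^([a-z][a-z0-9+.-]*):/i.exec(url)
  if (!match) return true // no protocol, e.g. a relative URL
  return allowed.includes(match[1].toLowerCase())
}

console.log(isAllowedProtocol('acme://deep/link', schema.protocols.href)) // true
console.log(isAllowedProtocol('javascript:alert(1)', schema.protocols.href)) // false
```

This keeps javascript: out without a deny list, at the cost of having to name every custom protocol up front.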
Sometimes we may need to allow all attributes of a specific tag. Could we add a rule like this?
const schema = {
tagNames: ['svg'],
attributes: [
{ svg: '*' } // Allow all attributes of svg tag
]
}
The alternative is to list all possible attributes, which could be too complicated and would increase bundle size in the browser.
5.0.0
This is a weird one, as <h1>, <h3>, <h4>, ... tags all work. It's just <h2> that has a problem.
In the code below, I'm attempting to sanitize the HTML string <h2 id="foo">Hello, world!</h2>. I want the id to be retained (or at least sanitized into user-content-foo). However, it is removed entirely.
import deepmerge from "deepmerge"
import { defaultSchema } from "hast-util-sanitize"
import rehypeParse from "rehype-parse"
import rehypeSanitize from "rehype-sanitize"
import rehypeStringify from "rehype-stringify"
import { unified } from "unified"
const schema = deepmerge(defaultSchema, { attributes: { "*": ["id"] } })
const file = await unified()
.use(rehypeParse)
.use(rehypeSanitize, schema)
.use(rehypeStringify)
.process('<h2 id="foo">Hello, world!</h2>')
console.log(String(file)) // <h2>Hello, world!</h2>
<h2 id="user-content-foo">Hello, world!</h2>
(The id property should not be removed.)
<h2>Hello, world!</h2>
(The id property is removed.)
node v20.5.1
npm 10.0.0
mac os 13.5.2 (22G91)
I see in the tests that it removes comments, but I'd actually like to keep comments. Is there an option for it, or how would I be able to do that?
I added
{
comment: {value: handleValue}
}
in the node schema and that seems to do what I want. Not sure if that is ideal or if there is a better place for it.
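Recent versions of hast-util-sanitize document an allowComments field on the schema. Whether the version used in this report has it is an assumption, so treat the following as a sketch; baseSchema is an abbreviated stand-in for the real default schema.

```javascript
// Abbreviated stand-in for the default schema (assumption).
const baseSchema = {tagNames: ['p', 'a'], allowComments: false}

// Copy the schema and switch comments on, leaving the base untouched.
const schema = {...baseSchema, allowComments: true}

console.log(schema.allowComments) // true
console.log(baseSchema.allowComments) // false
```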
"rehype-sanitize": "^6.0.0"
react-markdown
rehype-sanitize has changed the id of the elements below:
<a href="#user-content-fn-1" id="user-content-user-content-fnref-1" data-footnote-ref="true" aria-describedby="user-content-footnote-label">1</a>
...
<li id="user-content-user-content-fn-1">
<p>something... <a href="#user-content-fnref-1" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
It should not be changed.
Hello,
we have found an issue in the GitHub schema.
hast-util-sanitize/lib/github.json
Lines 11 to 14 in 71283f0
It sanitizes the li tags without ul or ol ancestors.
GitHub allows that. Should I make a PR to fix it, or is there a rationale for keeping it this way?
The GH schema was authored in June 2016. Since then, some changes have landed in how GH handles HTML.
Some new protocols, attributes, and elements are now supported: https://github.com/jch/html-pipeline/commits/master/lib/html/pipeline/sanitization_filter.rb
Add TypeScript type definitions
It is currently difficult to use this package in a TypeScript project due to a lack of type definitions. This in turn also makes it difficult to add type definitions to a package that depends on this package.
When importing this module into a TypeScript project, the types should be declared automatically.
An alternative to simply adding a .d.ts file would be to actually convert the whole package to TypeScript, however, it is much easier to simply add a .d.ts file.