cheeriojs / dom-serializer
render dom nodes
License: MIT License
Hello and thank you for making these tools :)
If you add a non-string attribute to any node and ask dom-serializer to render it, it treats the attribute value as a string and promptly throws an error.
Here is a condensed example: https://runkit.com/atjn/64073ea61e340b000802a5cd
In my case, I am trying to add a width and height to an SVG node. I know that these values are technically strings, but they are meant to be parsed as numbers, so I think it makes sense that I can set them to a Number without breaking the serializer. This is also how the normal DOM APIs work in JS.
Therefore I would suggest converting attribute values to strings before encoding them.
If you don't think the serializer should support non-string values, then I would suggest adding a check that throws a clearer error. The current one is very hard to decipher:
file:///[...]/node_modules/entities/lib/esm/escape.js:46
return ret + str.substr(lastIdx);
^
TypeError: str.substr is not a function
at encodeXML (file:///[...]/node_modules/entities/lib/esm/escape.js:46:22)
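The string conversion suggested above could look roughly like the sketch below. It is not dom-serializer's code: `formatAttributes` and `escapeAttribute` are hypothetical helpers, the escaping is deliberately minimal, and treating null/undefined as an empty value is an assumption.

```javascript
// Hypothetical helpers, not part of dom-serializer or entities.

// Minimal escaping for a double-quoted attribute value.
function escapeAttribute(value) {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

function formatAttributes(attribs) {
  if (!attribs) return "";
  return Object.entries(attribs)
    .map(([name, value]) => {
      // Coerce to a string first so numbers and booleans cannot reach
      // string-only methods. Treating null/undefined as "" is an assumption.
      const text = value == null ? "" : String(value);
      return `${name}="${escapeAttribute(text)}"`;
    })
    .join(" ");
}

console.log(formatAttributes({ width: 100, height: 50, title: 'a "b"' }));
// width="100" height="50" title="a &quot;b&quot;"
```

If non-string values should be rejected instead, the same spot is where a `typeof value !== "string"` check could throw a TypeError that names the offending attribute.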
http://www.w3.org/html/wg/drafts/html/master/syntax.html#serializing-html-fragments
entities
provides).
We need the code to be licensed to be able to use it.
Either the name or the description of the decodeEntities option looks wrong for what it does:
const elem = parseDocument(`<img src="/foo?bar=bat"e="" width="1" height="1">`).childNodes[0] as Element;
console.log(render(elem)); // <img src="/foo?bar=bat&quote="" width="1" height="1">
console.log(render(elem, { decodeEntities: true })); // <img src="/foo?bar=bat&quote="" width="1" height="1">
console.log(render(elem, { decodeEntities: false })); // <img src="/foo?bar=bat"e="" width="1" height="1">
Per the inline documentation:
Lines 41 to 46 in 45e4123
And the README:
decodeEntities
• Optional decodeEntities: boolean
Encode characters that are either reserved in HTML or XML, or are outside of the ASCII range.
default: true
The description doesn't match the name: it literally says the decodeEntities option will "Encode characters".
The name makes it sound like the default behavior of the render function is to decode entities, when in fact the opposite is true: by default, it encodes the entities in the rendered HTML.
If you want it to encode the entities, you have to pass decodeEntities: true, which is pretty confusing as well.
This option probably should have been named encodeEntities?
The examples above and documentation would make more sense then:
const elem = parseDocument(`<img src="/foo?bar=bat"e="" width="1" height="1">`).childNodes[0] as Element;
console.log(render(elem)); // <img src="/foo?bar=bat&quote="" width="1" height="1">
console.log(render(elem, { encodeEntities: true })); // <img src="/foo?bar=bat&quote="" width="1" height="1">
console.log(render(elem, { encodeEntities: false })); // <img src="/foo?bar=bat"e="" width="1" height="1">
Also, the description could be more accurate: it won't bypass the encoding of entities entirely, only for characters where this still produces HTML that can be parsed. As per my example, it will encode " no matter what.
I would suggest deprecating this option in favor of a correctly named encodeEntities with the same behavior (probably less confusing than changing the default and reversing the behavior, which would be a breaking change).
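One way the proposed deprecation could be handled is sketched below, assuming the new name simply takes precedence over the old one. `resolveEncodeEntities` is a hypothetical helper, not the library's implementation; the default of true is taken from the README excerpt quoted above.

```javascript
// Hypothetical helper: decide whether to encode entities, accepting both
// the proposed `encodeEntities` name and the existing `decodeEntities` name.
function resolveEncodeEntities(opts = {}) {
  // The new name wins when both are given (an assumption of this sketch).
  if (opts.encodeEntities !== undefined) return opts.encodeEntities;
  // Fall back to the old name with the same meaning, so existing callers
  // keep their current behavior.
  if (opts.decodeEntities !== undefined) return opts.decodeEntities;
  // Default documented in the README excerpt.
  return true;
}

console.log(resolveEncodeEntities({}));                        // true
console.log(resolveEncodeEntities({ decodeEntities: false })); // false
console.log(resolveEncodeEntities({ encodeEntities: false })); // false
```

Because the old name keeps its meaning, nothing changes for existing callers; only the documentation would need to steer new code toward the clearer name.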
In HTML that contains self-closing tags like <link>, the tags are not rendered in a self-closing form like <link .... />.
I see that in index.js, in the render function, there is no explicit check to add a self-closing tag when the mode is not xml. Ideally, as per the HTML spec, the link tag and other self-closing tags should be self-closed.
This might very well be me misunderstanding something, but I expected a non-breaking space (U+00a0) to not be encoded as &nbsp; when specifying encodeEntities: 'utf8'. Is this intended behaviour?
const $ = cheerio.load('<script id="data-layer">"<br>"</script>', {
xmlMode: true
});
console.log($.html());
Produces:
<script id="data-layer">"<br>"</br></script>
Expected:
<script id="data-layer">"<br>"</script>
Cheerio version: ^1.0.0-rc.2
There is an issue with quotes: the serializer always forces them to be double quotes and ignores the original input. It should preserve single quotes in attributes.
This is the source of a cheerio issue, described here: cheeriojs/cheerio#1006
Hey everyone! I've found that this library is using the root type from domelementtype. For example, here:
Line 141 in 36e7bd4
I've also noticed that this repo is using domelementtype v2.0.1 (Line 22 in 36e7bd4).
I'm creating this issue because currently that is kinda breaking some builds I've been trying to deploy, but maybe that is my own issue with yarn... Regardless, maybe a bump for this version would be a quick fix for it?
Thanks!
You have .gitignored /node_modules, which prevents people from vendoring your plugin and its dependencies. It would be most helpful if you could either not gitignore it, or change the gitignore to node_modules/*, so it can be overwritten by other gitignore files in parent directories.
Thanks for the work on this project :)
Hi there, thanks for making dom-serializer available, I'm looking forward to using it for years to come.
I encountered a weird issue around self-closing tags. The README clearly states:
example With selfClosingTags: true: <foo />
This is the specific scenario I'm in: I have HTML which may include "weird" tag names like foo or ac:link, and want the resulting HTML to include self-closing tags like <foo />. Note the space before the />.
However, the actual implementation allows self-closing tags only for certain well-known tag names:
Line 228 in 5a05207
Minimal reproducer:
'use strict';
const htmlparser2 = require('htmlparser2');
const render = require('dom-serializer').default;
const myString = '<foo />';
const myDocument = htmlparser2.parseDocument(myString);
const myStringAgain = render(myDocument, {'selfClosingTags': true});
console.log(`${myString}\nbecomes:\n${myStringAgain}`);
Expected output:
<foo />
becomes:
<foo />
(i.e. round-trip identical)
Actual output:
<foo />
becomes:
<foo></foo>
Suggestions:
Hi Team,
Recently, we started facing an issue in IE: a regular expression syntax error. It happens because of recent changes in entities, which introduced ES6 regular expression syntax. So please update the entities version to 2.0.0.
ENTITIES BUG: fb55/entities#200
I had a bit of a problem parsing some HTML content:
RangeError: Maximum call stack size exceeded
at String.replace (<anonymous>)
at Object.encodeXML (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/entities/lib/encode.js:58:6)
at renderText (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:136:21)
at module.exports (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:91:17)
at renderTag (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:116:14)
at module.exports (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:83:17)
at renderTag (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:116:14)
at module.exports (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:83:17)
at renderTag (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:116:14)
at module.exports (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:83:17)
at renderTag (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:116:14)
at module.exports (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:83:17)
at renderTag (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:116:14)
at module.exports (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:83:17)
at renderTag (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:116:14)
at module.exports (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:83:17)
I think the problem is this for loop:
var render = module.exports = function(dom, opts) {
  if (!Array.isArray(dom) && !dom.cheerio) dom = [dom];
  opts = opts || {};
  var output = '';
  for(var i = 0; i < dom.length; i++){
    var elem = dom[i];
    if (elem.type === 'root')
      output += render(elem.children, opts);
    else if (ElementType.isTag(elem))
      output += renderTag(elem, opts);
    else if (elem.type === ElementType.Directive)
      output += renderDirective(elem);
    else if (elem.type === ElementType.Comment)
      output += renderComment(elem);
    else if (elem.type === ElementType.CDATA)
      output += renderCdata(elem);
    else
      output += renderText(elem, opts);
  }
  return output;
};
In my opinion this needs to be async, but how can it be made async without breaking libraries like cheerio that use this library?
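The stack trace above shows render and renderTag calling each other once per nesting level, so one alternative that keeps the API synchronous might be to walk the tree with an explicit stack instead of recursion. The sketch below is an illustration only, not dom-serializer's code: `renderIterative` is a hypothetical name, it assumes a simplified node shape, and it skips attributes, escaping, comments and directives.

```javascript
// Simplified node shapes assumed for this sketch:
//   { type: "tag", name: "div", children: [...] }
//   { type: "text", data: "hello" }
function renderIterative(nodes) {
  const output = [];
  // The stack holds nodes still to be rendered and closing-tag strings
  // still to be emitted, so depth is limited by memory, not the call stack.
  const stack = [];
  for (let i = nodes.length - 1; i >= 0; i--) stack.push(nodes[i]);

  while (stack.length > 0) {
    const item = stack.pop();
    if (typeof item === "string") {
      output.push(item); // a pending closing tag
    } else if (item.type === "text") {
      output.push(item.data); // NOTE: no entity encoding in this sketch
    } else {
      output.push(`<${item.name}>`);
      stack.push(`</${item.name}>`);
      const children = item.children || [];
      // Push children in reverse so they are popped in document order.
      for (let i = children.length - 1; i >= 0; i--) stack.push(children[i]);
    }
  }
  return output.join("");
}

// A tree nested 100,000 levels deep, far beyond a typical recursion limit.
let deep = { type: "text", data: "x" };
for (let i = 0; i < 100000; i++) {
  deep = { type: "tag", name: "b", children: [deep] };
}
console.log(renderIterative([deep]).length);
```

Because the function still returns a plain string, callers such as cheerio would not need to change, which sidesteps the async compatibility question.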
I came across this repository from a random search.
Could the README be improved to mention why this is better than elem.outerHTML || elem.nodeValue?
Maybe the use case is on a server and not in a DOM context; if so, that'd be good to know!
Recently domutils (https://github.com/fb55/domutils) was updated to 1.5.1 (fb55/domutils@7d4bd16). Since this version it uses dom-serializer. domutils is used by htmlparser2 (https://github.com/fb55/htmlparser2). htmlparser2 pins domutils to 1.5, but npm still installs 1.5.1.
dom-serializer had an issue with nodes that have no children (which was fixed by recent PR #18). This issue breaks tools that rely on htmlparser2 and generate nodes with no children, even when all versions are pinned.
So it would be great to publish an updated version of dom-serializer to npm to fix those problems.
Thanks.
Hi. I'm having trouble with HTML source generation using cheerio (and thus dom-serializer). If I generate the source using .html(), then every UTF-8 character is encoded, making the output uselessly long. But if I use .html({decodeEntities: false}), then HTML special characters are not encoded, producing an invalid source that is open to code injection.
var cheerio = require("cheerio");
var $ = cheerio.load('<p name=""e;éé&èè"e;"><ééèè></p>');
// over-encoded: UTF-8 characters are encoded, which makes for a heavier output
console.log($.html({decodeEntities: true}))
// under-encoded, the output is not a valid HTML document
console.log($.html({decodeEntities: false}))
These few characters should ALWAYS be encoded: first, it is required by the standards, and second, it otherwise leaves the door open to code injection (including JavaScript injection).
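The behavior asked for here, always escaping the markup-significant characters while leaving other UTF-8 characters alone, can be sketched as follows. `escapeSpecial` is a hypothetical helper, and the set of four characters (&, <, >, ") is an assumption of this sketch, not a quote from the report or from the library.

```javascript
// Hypothetical helper: escape only characters that are significant in
// HTML markup, and pass every other character (such as é or è) through.
const SPECIAL_CHARS = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
};

function escapeSpecial(text) {
  return text.replace(/[&<>"]/g, (ch) => SPECIAL_CHARS[ch]);
}

console.log(escapeSpecial('<ééèè> & "quoted"'));
// &lt;ééèè&gt; &amp; &quot;quoted&quot;
```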
E.g. using Cheerio, $('<code>"hello"</code>').html() gives back &quot;hello&quot;, which is unnecessarily ugly and doesn't match browsers.
I've examined the source code and see "export default function render", and I understand the example code you give. If I implement the example in an ES module, I get the error message shown below, but if it is implemented in a CJS module, it successfully serializes the DOM into HTML.
$ node --version
v16.13.0
The host is macOS 10.15.7 (Catalina).
Using the ES module, I get this output:
{ default: [Function: render] }
{ default: [Function: render] }
file:///Volumes/Extra/ws/techsparx.com/projects/node.js/htmlparser2/test2.mjs:30
const serilzd = render(dom);
^
TypeError: render is not a function
at file:///Volumes/Extra/ws/techsparx.com/projects/node.js/htmlparser2/test2.mjs:30:17
The first two lines are me printing out the render object to check which module object was retrieved. The full source code is below. I've tried several variants of the program, and I keep getting this message.
Transliterating the same code into a CJS module, it instead executes successfully. The serilzd variable gets the expected HTML text string, which I can print out.
import { default as htmlparser2, Parser } from "htmlparser2";
import { DomHandler } from "domhandler";
import { default as render } from "dom-serializer";
import { default as fs, promises as fsp } from 'fs';
import util from 'util';
const rawHtml = await fsp.readFile(process.argv[2], 'utf8');
const dom = htmlparser2.parseDocument(rawHtml);
console.log(dom);
console.log(render);
console.log(util.inspect(render));
const serilzd = render(dom);
console.log(serilzd);
This is the ES module version
const htmlparser2 = require('htmlparser2');
const render = require('dom-serializer').default;
const fs = require('fs');
const fsp = require('fs').promises;
const util = require('util');
(async () => {
  const rawHtml = await fsp.readFile(process.argv[2], 'utf8');
  const dom = htmlparser2.parseDocument(rawHtml);
  console.log(dom);
  console.log(render);
  console.log(util.inspect(render));
  const serilzd = render(dom);
  console.log(serilzd);
})().catch(err => {
  console.error(err);
});
This is the CJS module version
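The logged value { default: [Function: render] } suggests that, in this setup, the ES module's default import is the whole CommonJS exports object rather than the function itself. A possible workaround on the caller's side is to unwrap the default property defensively. The sketch below uses a stand-in object in place of the real import, and `unwrapDefault` is a hypothetical helper.

```javascript
// Hypothetical helper: accept either a function or an object that wraps
// the function in one or more `default` properties.
function unwrapDefault(mod) {
  let current = mod;
  while (current !== null && typeof current === "object" && "default" in current) {
    current = current.default;
  }
  return current;
}

// Stand-in for the value the ES module received from its default import.
const imported = { default: function render() { return "<p></p>"; } };

const render = unwrapDefault(imported);
console.log(typeof render); // function
console.log(render());      // <p></p>
```

In the ES module above, the equivalent one-off change would be to call render.default(dom) instead of render(dom).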
Hello, can you please publish a new version to npm?
In the previous version, you were depending on domelementtype 1.1.1 explicitly, which didn't have a license file published to npm. In the current domelementtype version (1.3.0), the license file has been published too.
Can you please publish a new minor version, 0.1.1?
Thanks in advance.
Got this error when migrating a Node app that uses cheerio to TypeScript:
Namespace '"/home/.../node_modules/domelementtype/lib/index".ElementType' has no exported member 'Root'.
70 constructor(type: ElementType.Root | ElementType.CDATA | ElementType.Script | ElementType.Style | ElementType.Tag, children: Node[]);
This issue has already been fixed in https://github.com/fb55/domelementtype, but I had to apply a patch with patch-package in my app to get it solved for now.
And obviously the main cheerio package needs to update its dom-serializer dependency to the new version once this fix is applied.
Object.prototype.__proto__ is deprecated and it's not clear why it is set to null.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/proto
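A common reason for giving a lookup table a null prototype is to stop key checks from matching inherited properties such as constructor or toString. The sketch below shows that effect and builds the table with Object.create(null), which avoids the __proto__ key. The tag names are illustrative assumptions, not taken from the library.

```javascript
// A plain object inherits from Object.prototype, so some lookups "hit"
// even though the key was never added.
const plainTable = { style: true, script: true };

// A prototype-less table, built without using the __proto__ key.
const bareTable = Object.assign(Object.create(null), {
  style: true,
  script: true,
});

console.log("constructor" in plainTable);     // true (inherited)
console.log("constructor" in bareTable);      // false
console.log(Boolean(plainTable["toString"])); // true (inherited)
console.log(bareTable["toString"]);           // undefined
console.log(bareTable["style"]);              // true
```

A Set of names would be another option that avoids the question entirely.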
import serialize from "dom-serializer";
import { parseDocument } from "htmlparser2";
const html = `
<div class="container">
<foo />
</div >`;
const result = serialize(parseDocument(html), { selfClosingTags: true, xmlMode: true})
// result = `<div class="container">
// <foo>
// </foo></div>`
// expect `<div class="container">
// <foo />
// </div>`
I don't get what I expect. What configuration error did I make?
when installed with 0.1.1
599 passing (1s)
1 pending
1 failing
1) cheerio .load should render xml in html() when options.xmlMode = true passed to html():
Error: expected '<mixedcasetag uppercaseattribute></mixedcasetag>' to equal '<mixedcasetag uppercaseattribute=""></mixedcasetag>'
at Assertion.assert (node_modules/expect.js/index.js:96:13)
at Assertion.be.Assertion.equal (node_modules/expect.js/index.js:216:10)
at Assertion.(anonymous function) [as be] (node_modules/expect.js/index.js:69:24)
at Context.<anonymous> (test/cheerio.js:375:29)
when installed with 0.1.0 the test suite passes
When testing with Jest and enzyme, I get this:
● Test suite failed to run
TypeError: Cannot set property '__proto__' of undefined
at Object.<anonymous> (node_modules/dom-serializer/index.js:12:37)
This means that paths after the initial one are treated as if they are a child of the path before them, breaking the SVG.
UglifyJs throws Unexpected token: keyword «const» in /node_modules/dom-serializer/index.js:108,0
Because of this, the React build fails with the following error:
npm ERR! errno 2
...
13 verbose stack Exit status 2
13 verbose stack at EventEmitter.<anonymous> (/usr/local/lib/node_modules/npm/node_modules/npm-lifecycle/index.js:326:16)
13 verbose stack at emitTwo (events.js:126:13)
13 verbose stack at EventEmitter.emit (events.js:214:7)
13 verbose stack at ChildProcess.<anonymous> (/usr/local/lib/node_modules/npm/node_modules/npm-lifecycle/lib/spawn.js:55:14)
13 verbose stack at emitTwo (events.js:126:13)
13 verbose stack at ChildProcess.emit (events.js:214:7)
13 verbose stack at maybeClose (internal/child_process.js:925:16)
13 verbose stack at Process.ChildProcess._handle.onexit (internal/child_process.js:209:5)
Please wrap the body of the formatAttrs for loop in an "if(...hasOwnProperty(...))" condition. Otherwise formatAttrs tries to render my Object.prototype methods :(
Here is a direct link: https://github.com/cheeriojs/dom-serializer/blob/master/index.js#L29
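The requested guard could look like the sketch below. `formatOwnAttrs` is a hypothetical stand-in for the library's formatAttrs, and it leaves out escaping and every other detail of the real function. The prototype object only simulates enumerable methods added to a prototype.

```javascript
// Hypothetical stand-in for formatAttrs: render only the object's own
// enumerable properties and skip anything inherited from a prototype.
function formatOwnAttrs(attribs) {
  const parts = [];
  for (const key in attribs) {
    // for...in also visits inherited enumerable keys, so filter them out.
    if (!Object.prototype.hasOwnProperty.call(attribs, key)) continue;
    parts.push(`${key}="${attribs[key]}"`); // NOTE: no escaping in this sketch
  }
  return parts.join(" ");
}

// Simulate an enumerable helper method living on the prototype.
const extendedProto = { inheritedHelper: function () {} };
const attribs = Object.create(extendedProto);
attribs.id = "main";
attribs.class = "wide";

console.log(formatOwnAttrs(attribs)); // id="main" class="wide"
```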