
dom-serializer's People

Contributors

ackar, ajacksified, alexindigo, alloystory, ameliabr, coderaiser, davidchambers, dependabot-preview[bot], dependabot[bot], ericjeney, fb55, greenkeeperio-bot, kevva, lahmatiy, nageshlop, rupindr, samypesse, scratchyone, tinco, tiojoca, tosmolka

dom-serializer's Issues

Support non-String attributes

Hello and thank you for making these tools :)

If you add a non-String attribute to any node and ask dom-serializer to serialize it, it treats the attribute value as a string and throws an error.
Here is a condensed example: https://runkit.com/atjn/64073ea61e340b000802a5cd

In my case, I am trying to add a width and height to an SVG node. I know that these values are technically speaking strings, but they are supposed to be parsed as numbers, so I think it makes sense that I can set them to a Number without breaking the parser. This is also how the normal DOM APIs work in JS.

Therefore I would suggest adding a quick conversion to string before encoding the attributes.

If you don't think the parser should support non-string values, then I would suggest adding a check that throws a more concise error. The current one is very hard to decipher:

file:///[...]/node_modules/entities/lib/esm/escape.js:46
    return ret + str.substr(lastIdx);
                     ^

TypeError: str.substr is not a function
    at encodeXML (file:///[...]/node_modules/entities/lib/esm/escape.js:46:22)
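
The conversion suggested above could be sketched as follows. `stringifyAttribs` is a hypothetical helper, not part of dom-serializer; it only shows the idea of coercing attribute values to strings before they reach string-only escaping code.

```javascript
// Hypothetical helper (not part of dom-serializer): coerce attribute values
// to strings so that escaping code expecting strings does not fail on numbers.
function stringifyAttribs(attribs) {
  const result = {};
  for (const [name, value] of Object.entries(attribs)) {
    // Leave null/undefined untouched; convert everything else with String().
    result[name] = value == null ? value : String(value);
  }
  return result;
}

// Example: numeric width/height, as in the SVG case described above.
const svgAttribs = stringifyAttribs({ width: 100, height: 50, viewBox: "0 0 100 50" });
console.log(svgAttribs);
```

Until something like this happens inside the library, the same helper could be applied to a node's attribs object before rendering.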

Backwards name/description of `decodeEntities` option

Either the name or the description of the decodeEntities option looks wrong for what it does:

  const elem = parseDocument(`<img src="/foo?bar=bat&quote=&quot;" width="1" height="1">`).childNodes[0] as Element;

  console.log(render(elem));                            // <img src="/foo?bar=bat&amp;quote=&quot;" width="1" height="1">

  console.log(render(elem, { decodeEntities: true }));  // <img src="/foo?bar=bat&amp;quote=&quot;" width="1" height="1">

  console.log(render(elem, { decodeEntities: false })); // <img src="/foo?bar=bat&quote=&quot;" width="1" height="1">

Per the inline documentation:

/**
* Encode characters that are either reserved in HTML or XML, or are outside of the ASCII range.
*
* @default true
*/
decodeEntities?: boolean;

And the README:

decodeEntities

Optional decodeEntities: boolean

Encode characters that are either reserved in HTML or XML, or are outside of the ASCII range.

default true

The description doesn't match the name: it literally says the decodeEntities option will "Encode characters".

This makes it sound like the default behavior of the render function is to decode entities, when in fact the opposite is true: by default, it encodes the entities in the rendered HTML.

If you want it to encode the entities, you have to pass decodeEntities: true, which is pretty confusing as well.

This option probably should have been named encodeEntities?

The examples above and documentation would make more sense then:

  const elem = parseDocument(`<img src="/foo?bar=bat&quote=&quot;" width="1" height="1">`).childNodes[0] as Element;

  console.log(render(elem));                            // <img src="/foo?bar=bat&amp;quote=&quot;" width="1" height="1">

  console.log(render(elem, { encodeEntities: true }));  // <img src="/foo?bar=bat&amp;quote=&quot;" width="1" height="1">

  console.log(render(elem, { encodeEntities: false })); // <img src="/foo?bar=bat&quote=&quot;" width="1" height="1">

Also, the description could be more accurate - it won't bypass the encoding of entities entirely, only for characters where this still produces HTML that can be parsed. As per my example, it will encode &quot; no matter what.

I would suggest deprecating this option in favor of a correctly named encodeEntities with the same behavior. (This is probably less confusing than changing the default and reversing the behavior, which would be a breaking change.)
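
A minimal sketch of how such an alias could be resolved, assuming a wrapper around the options object. `normalizeRenderOptions` is hypothetical, and the `encodeEntities` name is used only as proposed in this issue, understood by this wrapper alone; it is not a statement about the library's actual options.

```javascript
// Hypothetical wrapper: maps the proposed `encodeEntities` alias onto the
// existing `decodeEntities` option, keeping the same behavior.
function normalizeRenderOptions(options = {}) {
  const { encodeEntities, ...rest } = options;
  if (encodeEntities === undefined) {
    return rest; // alias not used, nothing to translate
  }
  if (rest.decodeEntities !== undefined && rest.decodeEntities !== encodeEntities) {
    // Both names given with different values: refuse to guess.
    throw new TypeError("encodeEntities and decodeEntities disagree");
  }
  return { ...rest, decodeEntities: encodeEntities };
}

console.log(normalizeRenderOptions({ encodeEntities: false, xmlMode: true }));
```

Keeping the old name working while accepting the new one avoids the breaking change mentioned above.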

Recognize self closing tags.

HTML that contains self-closing tags like <link> is not serialized with a self-closing tag like <link .... />.

I see that the render function in index.js has no explicit check that adds a self-closing tag when the mode is not XML. Ideally, as per the HTML spec, the link tag and other self-closing tags should be self-closed.

Issue with quotes

There is an issue with quotes: the serializer always forces double quotes and completely ignores the original input. It should preserve single quotes in attributes.

This is a source for an issue of cheerio, described here: cheeriojs/cheerio#1006
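
A sketch of one possible approach, assuming the quote style of the original input is not recorded on the parsed node, so the choice has to be made from the value itself. `formatAttribute` is a hypothetical function, not the serializer's implementation.

```javascript
// Hypothetical attribute formatter: use single quotes when the value contains
// double quotes but no single quotes, so nothing needs to become &quot;.
function formatAttribute(name, value) {
  const hasDouble = value.includes('"');
  const hasSingle = value.includes("'");
  if (hasDouble && !hasSingle) {
    return `${name}='${value.replace(/&/g, "&amp;")}'`;
  }
  // Default: double quotes, escaping & and " inside the value.
  const escaped = value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  return `${name}="${escaped}"`;
}

console.log(formatAttribute("data-json", '{"a":1}'));
console.log(formatAttribute("title", "plain"));
```

Truly preserving the input's quotes would additionally require the parser to store the quote character, which this sketch does not assume.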

Root nodes not being properly rendered due to dependency issue

Hey everyone! I've found that this library uses ElementType.Root from domelementtype. For example, here:

case ElementType.Root:

I've also noticed that this repo depends on domelementtype v2.0.1:

"domelementtype": "^2.0.1",

which is kinda weird, because that type doesn't exist in that version of the library: https://github.com/fb55/domelementtype/blob/v2.0.1/src/index.ts

I'm creating this issue because currently that is kinda breaking some builds I've been trying to deploy, but maybe that is my own issue with yarn... Regardless, maybe a bump for this version would be a quick fix for it?

Thanks!

node_modules is in .gitignore

You have .gitignored /node_modules, which prevents people from vendoring your plugin and its dependencies. It would be most helpful if you could either not gitignore it, or change the gitignore to node_modules/*, so it can be overwritten by other gitignore files in parent directories.

Thanks for the work on this project :)

selfClosingTags work differently than advertised

Hi there, thanks for making dom-serializer available, I'm looking forward to using it for years to come.

I encountered a weird issue around self-closing tags: The Readme clearly states:
example With selfClosingTags: true: <foo />
This is the specific scenario I'm in: I have HTML which may include "weird" tag names like foo or ac:link, and want the resulting HTML to include self-closing tags like <foo />. Note the space before the />.

However, the actual implementation allows self-closing tags only for certain well-known tag names:

opts.selfClosingTags && singleTag.has(elem.name))

Minimal reproducer:

'use strict';

const htmlparser2 = require('htmlparser2');
const render = require('dom-serializer').default;

const myString = '<foo />';
const myDocument = htmlparser2.parseDocument(myString);
const myStringAgain = render(myDocument, {'selfClosingTags': true});
console.log(`${myString}\nbecomes:\n${myStringAgain}`);

Expected output:

<foo />
becomes:
<foo />

(i.e. round-trip identical)

Actual output:

<foo />
becomes:
<foo></foo>

Suggestions:

  • Ideally, I'd like a new option, either to enforce self-closing tags independently of the hard-coded list, or a way to add to the list.
  • Or, perhaps, just clear up the documentation that this doesn't actually mean self-closing tags, but rather "self-closing tags but only for those that are a well-known part of HTML", which, eh, I don't know.
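
The first suggestion could look roughly like this. It is a toy renderer over an assumed node shape (`{ name, attribs, children }`), not dom-serializer's code; `DEFAULT_VOID` and the `extraSelfClosing` parameter are hypothetical names.

```javascript
// Toy node shape assumed for illustration: { name, attribs, children }.
// Children are either nested nodes or strings of already-escaped text.
const DEFAULT_VOID = new Set(["br", "hr", "img", "input", "link", "meta"]);

function renderElement(node, extraSelfClosing = new Set()) {
  const attrs = Object.entries(node.attribs || {})
    .map(([k, v]) => ` ${k}="${String(v).replace(/&/g, "&amp;").replace(/"/g, "&quot;")}"`)
    .join("");
  const children = node.children || [];
  // Self-close empty elements from the built-in list or the caller's additions.
  const maySelfClose = DEFAULT_VOID.has(node.name) || extraSelfClosing.has(node.name);
  if (children.length === 0 && maySelfClose) {
    return `<${node.name}${attrs} />`;
  }
  const inner = children
    .map((child) => (typeof child === "string" ? child : renderElement(child, extraSelfClosing)))
    .join("");
  return `<${node.name}${attrs}>${inner}</${node.name}>`;
}

console.log(renderElement({ name: "foo" }, new Set(["foo", "ac:link"])));
```

Passing the extra names per call keeps the default list unchanged for everyone who does not opt in.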

Maximum call stack size problem with render

I had a problem parsing some HTML content:

RangeError: Maximum call stack size exceeded
    at String.replace (<anonymous>)
    at Object.encodeXML (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/entities/lib/encode.js:58:6)
    at renderText (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:136:21)
    at module.exports (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:91:17)
    at renderTag (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:116:14)
    at module.exports (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:83:17)
    at renderTag (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:116:14)
    at module.exports (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:83:17)
    at renderTag (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:116:14)
    at module.exports (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:83:17)
    at renderTag (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:116:14)
    at module.exports (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:83:17)
    at renderTag (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:116:14)
    at module.exports (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:83:17)
    at renderTag (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:116:14)
    at module.exports (/home/ubuntu/crawler/crawler/node_modules/cheerio/node_modules/dom-serializer/index.js:83:17)

I think the problem is this for loop:


var render = module.exports = function(dom, opts) {
  if (!Array.isArray(dom) && !dom.cheerio) dom = [dom];
  opts = opts || {};

  var output = '';

  for(var i = 0; i < dom.length; i++){
    var elem = dom[i];

    if (elem.type === 'root')
      output += render(elem.children, opts);
    else if (ElementType.isTag(elem))
      output += renderTag(elem, opts);
    else if (elem.type === ElementType.Directive)
      output += renderDirective(elem);
    else if (elem.type === ElementType.Comment)
      output += renderComment(elem);
    else if (elem.type === ElementType.CDATA)
      output += renderCdata(elem);
    else
      output += renderText(elem, opts);
  }

  return output;
};

In my opinion this needs to be async, but how can it be made async without breaking libraries like cheerio that use this library?
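
If the overflow comes from deeply nested markup, one alternative to making the function async is to replace the recursion with an explicit stack, which keeps the API synchronous. Note that this is a different technique from the async approach asked about above. The sketch below assumes toy node shapes (`{ type: "tag", name, children }` and `{ type: "text", data }`) and omits escaping and attributes; it is not the library's code.

```javascript
// Iterative serializer sketch: an explicit stack replaces the call stack,
// so nesting depth is limited by memory rather than by recursion depth.
function renderIterative(roots) {
  let output = "";
  // The stack holds nodes still to be rendered and closing-tag strings to emit.
  const stack = [...roots].reverse();
  while (stack.length > 0) {
    const item = stack.pop();
    if (typeof item === "string") {
      output += item; // a pending closing tag
    } else if (item.type === "text") {
      output += item.data; // escaping omitted in this sketch
    } else {
      output += `<${item.name}>`;
      stack.push(`</${item.name}>`);
      // Push children in reverse so they are popped in document order.
      for (let i = item.children.length - 1; i >= 0; i--) {
        stack.push(item.children[i]);
      }
    }
  }
  return output;
}

const sample = {
  type: "tag",
  name: "p",
  children: [
    { type: "text", data: "a" },
    { type: "tag", name: "i", children: [] },
    { type: "text", data: "b" },
  ],
};
console.log(renderIterative([sample])); // <p>a<i></i>b</p>
```

Because the function stays synchronous, callers such as cheerio would not need to change.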

improve readme with motivation

I came across this repository from a random search.

Could the README be improved to mention why this is better than elem.outerHTML || elem.nodeValue ?

Maybe the use case is on the server rather than in a DOM context; if so, that'd be good to know!

Publish new version to npm

Recently domutils (https://github.com/fb55/domutils) was updated to 1.5.1 (fb55/domutils@7d4bd16). Since that version it uses dom-serializer. domutils is used by htmlparser2 (https://github.com/fb55/htmlparser2). htmlparser2 pins domutils to 1.5, but npm still installs 1.5.1.
dom-serializer had an issue with nodes that have no children (which was fixed by the recent PR #18). This issue breaks tools that rely on htmlparser2 and generate nodes with no children, even when all versions are pinned.
So it would be great to publish an updated version of dom-serializer to npm to fix those problems.
Thanks.

special HTML / XML characters are not encoded if decodeEntities is false

Hi. I'm having trouble with HTML source generation using cheerio (and thus dom-serializer). If I generate the source using .html(), then every UTF-8 character is encoded, producing uselessly long output. But if I use .html({decodeEntities: false}), then HTML special characters are not encoded, producing invalid source that is open to code injection.

var cheerio = require("cheerio");
var $ = cheerio.load('<p name="&quote;éé&amp;èè&quote;">&lt;ééèè&gt;</p>');
// over-encoded: UTF-8 characters are encoded, which makes the output heavier
console.log($.html({decodeEntities: true}))
// under-encoded, the output is not a valid HTML document
console.log($.html({decodeEntities: false}))

These few characters should ALWAYS be encoded: first because the standards require it, and second because it otherwise leaves the door open to code injection (including JavaScript injection).
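
A minimal sketch of the behavior requested here: always escape the markup-significant characters and leave everything else, including non-ASCII characters, as it is. `escapeSpecial` is a hypothetical function written for illustration, not the library's implementation.

```javascript
// Escape only the characters that are significant in HTML/XML markup.
// Non-ASCII characters such as "é" pass through unchanged.
const SPECIAL_CHARS = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" };

function escapeSpecial(str) {
  return str.replace(/[&<>"]/g, (ch) => SPECIAL_CHARS[ch]);
}

console.log(escapeSpecial('<ééèè> & "x"')); // &lt;ééèè&gt; &amp; &quot;x&quot;
```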

"s and 's are over-escaped

E.g. using Cheerio, $('<code>"hello"</code>').html() gives back &quot;hello&quot;, which is unnecessarily ugly and doesn't match browsers.
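
One way to avoid this is to escape by context: quotes only need escaping inside quoted attribute values, not in text content. The sketch below assumes double-quoted attributes; `escapeTextContent` and `escapeAttributeValue` are hypothetical names, and the exact character sets are an approximation rather than a copy of what any browser or library does.

```javascript
// Text content: quotes are harmless here, so only &, < and > are escaped.
function escapeTextContent(str) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;" };
  return str.replace(/[&<>]/g, (ch) => map[ch]);
}

// Double-quoted attribute value: & and " must be escaped.
function escapeAttributeValue(str) {
  const map = { "&": "&amp;", '"': "&quot;" };
  return str.replace(/[&"]/g, (ch) => map[ch]);
}

console.log(escapeTextContent('"hello"'));    // "hello"
console.log(escapeAttributeValue('"hello"')); // &quot;hello&quot;
```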

TypeError: render is not a function in ES module, but not in CJS module

I've examined the source code and see "export default function render", and I understand the example code you give. If I implement the example in an ES module, I get the error message shown above, but if implemented in a CJS module, it successfully serializes the DOM into HTML.

$ node --version
v16.13.0

The host is macOS 10.15.7 (Catalina).

Using the ES module, I get this output:

{ default: [Function: render] }
{ default: [Function: render] }
file:///Volumes/Extra/ws/techsparx.com/projects/node.js/htmlparser2/test2.mjs:30
const serilzd = render(dom);
                ^

TypeError: render is not a function
    at file:///Volumes/Extra/ws/techsparx.com/projects/node.js/htmlparser2/test2.mjs:30:17

The first two lines are me printing out the render object to make sure what module object was retrieved. The full source code is below. I've tried several variants to the program, and I keep getting this message.

Transliterating the same code into a CJS module, it instead executes successfully. The serilzd variable gets the expected HTML text string which I can print out.

import { default as htmlparser2, Parser } from "htmlparser2";
import { DomHandler } from "domhandler";
import { default as render } from "dom-serializer";
import { default as fs, promises as fsp } from 'fs';
import util from 'util';

const rawHtml = await fsp.readFile(process.argv[2], 'utf8');

const dom = htmlparser2.parseDocument(rawHtml);

console.log(dom);

console.log(render);
console.log(util.inspect(render));

const serilzd = render(dom);

console.log(serilzd);

This is the ES module version

const htmlparser2 = require('htmlparser2');
const render = require('dom-serializer').default;
const fs = require('fs');
const fsp = require('fs').promises;
const util = require('util');

(async () => {

    const rawHtml = await fsp.readFile(process.argv[2], 'utf8');

    const dom = htmlparser2.parseDocument(rawHtml);
    
    console.log(dom);
    
    console.log(render);
    console.log(util.inspect(render));
    
    const serilzd = render(dom);
    
    console.log(serilzd);
    
})().catch(err => {
    console.error(err);
});

This is the CJS module version
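
The logged value `{ default: [Function: render] }` suggests that, in the ES module, the imported binding is an object wrapping the function rather than the function itself. As a workaround sketch under that assumption, a small helper can unwrap it; `unwrapDefault` is hypothetical and not part of any of the packages involved.

```javascript
// Follow `.default` a few levels until something other than a wrapper object
// is reached. The depth limit guards against self-referencing objects.
function unwrapDefault(mod) {
  let current = mod;
  for (
    let depth = 0;
    depth < 3 && current !== null && typeof current === "object" && "default" in current;
    depth++
  ) {
    current = current.default;
  }
  return current;
}

// Simulates the situation from the log output above.
function fakeRender() {
  return "<p></p>";
}
const imported = { default: fakeRender };
console.log(typeof unwrapDefault(imported)); // function
```

In the ES module version above, this would be applied to the imported `render` binding before calling it.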

Publish new version to npm

Hello, can you please publish a new version to npm ?

In the previous version, you were depending on domelementtype 1.1.1 explicitly, which didn't have a license file published to npm. In the current domelementtype version (1.3.0), the license file has been published too.

Can you please publish a new minor version 0.1.1 ?

Thanks in advance.

domelementtype needs updating to 2.1.0 to avoid typescript errors

Got this error when migrating a Node app that uses cheerio to TypeScript:

Namespace '"/home/.../node_modules/domelementtype/lib/index".ElementType' has no exported member 'Root'.

70 constructor(type: ElementType.Root | ElementType.CDATA | ElementType.Script | ElementType.Style | ElementType.Tag, children: Node[]);

This issue has already been fixed in https://github.com/fb55/domelementtype, but I had to apply a patch-package in my app to get it solved for now.

And obviously the main cheerio package needs to update its dom-serializer dependency to the new version once this fix is applied.

"selfClosingTags" does not work

import serialize from "dom-serializer";
import { parseDocument } from "htmlparser2";

const html = `
  <div class="container">
    <foo  />
  </div >`;

const result = serialize(parseDocument(html), { selfClosingTags: true, xmlMode: true });

// result:
//   <div class="container">
//     <foo>
//   </foo></div>

// expected:
//   <div class="container">
//     <foo />
//   </div>

I don't get what I expect. What configuration error did I make?

Version 0.1.1 has broken the cheerio test suite

when installed with 0.1.1

  599 passing (1s)
  1 pending
  1 failing

  1) cheerio .load should render xml in html() when options.xmlMode = true passed to html():
     Error: expected '<mixedcasetag uppercaseattribute></mixedcasetag>' to equal '<mixedcasetag uppercaseattribute=""></mixedcasetag>'
      at Assertion.assert (node_modules/expect.js/index.js:96:13)
      at Assertion.be.Assertion.equal (node_modules/expect.js/index.js:216:10)
      at Assertion.(anonymous function) [as be] (node_modules/expect.js/index.js:69:24)
      at Context.<anonymous> (test/cheerio.js:375:29)

when installed with 0.1.0 the test suite passes

Version 0.2.1 has broken Jest tests

When testing with Jest and enzyme, I get this:

● Test suite failed to run

TypeError: Cannot set property '__proto__' of undefined

  at Object.<anonymous> (node_modules/dom-serializer/index.js:12:37)

v0.2.1: UglifyJs throws Unexpected token: keyword «const» in /node_modules/dom-serializer/index.js:108,0

UglifyJs throws Unexpected token: keyword «const» in /node_modules/dom-serializer/index.js:108,0

Because of this, the React build fails with the following error:

npm ERR! errno 2

...

13 verbose stack Exit status 2
13 verbose stack     at EventEmitter.<anonymous> (/usr/local/lib/node_modules/npm/node_modules/npm-lifecycle/index.js:326:16)
13 verbose stack     at emitTwo (events.js:126:13)
13 verbose stack     at EventEmitter.emit (events.js:214:7)
13 verbose stack     at ChildProcess.<anonymous> (/usr/local/lib/node_modules/npm/node_modules/npm-lifecycle/lib/spawn.js:55:14)
13 verbose stack     at emitTwo (events.js:126:13)
13 verbose stack     at ChildProcess.emit (events.js:214:7)
13 verbose stack     at maybeClose (internal/child_process.js:925:16)
13 verbose stack     at Process.ChildProcess._handle.onexit (internal/child_process.js:209:5)

