soup

A little library for querying and manipulating tag soup via CSS selectors.

It manipulates the string itself (rather than operating on a parsed DOM and then re-exporting it). So it retains all the syntactic/formatting nuances of the original, such as:

  • attribute quotes or lack thereof,
  • whitespace,
  • invalid-but-parseable stuff,
  • omitted closing tags, etc.
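
To illustrate the idea of editing the string directly, here is a minimal sketch of index-based splicing. It is not Soup's actual implementation: `applyEdits` and the `{ start, end, text }` edit shape are hypothetical names chosen for this example. It shows why untouched regions keep their exact formatting, and why edits are usually applied from the end of the string backwards.

```javascript
// Hypothetical helper (not part of Soup's API): apply index-based edits
// to a string. Each edit replaces source.slice(start, end) with `text`.
// Assumes the edits do not overlap.
function applyEdits(source, edits) {
  // Sort by start index, descending, so that applying one edit never
  // shifts the indices of the edits that are still waiting to be applied.
  var sorted = edits.slice().sort(function (a, b) {
    return b.start - a.start;
  });

  return sorted.reduce(function (result, edit) {
    return result.slice(0, edit.start) + edit.text + result.slice(edit.end);
  }, source);
}

var html = '<div class=thing><img src=cat.jpg></div>';
var start = html.indexOf('cat.jpg');

var edited = applyEdits(html, [
  { start: start, end: start + 'cat.jpg'.length, text: 'dog.jpg' }
]);

console.log(edited); // <div class=thing><img src=dog.jpg></div>
```

Everything outside the replaced range is copied through byte for byte, so unquoted attributes, whitespace and omitted closing tags survive unchanged.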

Use cases:

  • build tasks/plugins that need to manipulate markup without parsing away all the original formatting;
  • GUI webpage design tools that need to combine hand-coded HTML with WYSIWYG-driven edits;
  • anywhere else you need to make automated, light-touch changes to other people's markup.

Usage

$ npm install soup

var Soup = require('soup');

var soup = new Soup('<div class=thing><img src=cat.jpg></div>');

// Change the img src
soup.setAttribute('img', 'src', 'dog.jpg');
soup.toString();    // <div class=thing><img src=dog.jpg></div>

// Add a class to the div
soup.setAttribute('.thing', 'class', function (oldValue) {
  return oldValue + ' another';
});
soup.toString();    // <div class="thing another"><img src=dog.jpg></div>

Selectors

Soup uses Cheerio under the hood for finding elements to update, so you can use any CSS3 selector in the methods below.

Methods

setAttribute(selector, attributeName, newValue)

  • newValue can be:
    • any string – to set the attribute's value
    • true – to set it as a boolean attribute (e.g. required)
    • false – to delete the attribute
    • null – for "no change"
    • a function – which should decide what to do and return the correct value (as a string, boolean, or null). The function will be passed these arguments (NB. indices are relative to the start of the whole HTML string):
      1. the current value
      2. the start index of the attribute
      3. the end index of the attribute
      4. the start index of the element
      5. the end index of the element
  • Soup will respect the original quote style of each attribute it updates whenever possible (but quotes will be added to non-quoted values if necessitated by characters in the new value).
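
The quote-handling rule can be sketched as follows. This is an illustration only, not Soup's internal code: `serializeAttributeValue` is a hypothetical helper, and the choice to fall back to double quotes and to escape embedded quotes as character references is an assumption made for this example.

```javascript
// Hypothetical helper (not Soup's actual code).
// originalQuote is '"', "'", or '' when the attribute was unquoted.
function serializeAttributeValue(newValue, originalQuote) {
  var quote = originalQuote;

  // In HTML, an unquoted value must be non-empty and must not contain
  // whitespace, quotes, "=", "<", ">" or backticks. If the new value
  // breaks that rule, quotes have to be added (assumed: double quotes).
  if (quote === '' && (newValue === '' || /[\s"'=<>`]/.test(newValue))) {
    quote = '"';
  }

  if (quote === '') {
    return newValue; // still safe to leave unquoted
  }

  // Escape any occurrence of the delimiter inside the value.
  var entity = quote === '"' ? '&quot;' : '&#39;';
  return quote + newValue.split(quote).join(entity) + quote;
}

console.log(serializeAttributeValue('dog.jpg', ''));       // dog.jpg
console.log(serializeAttributeValue('thing another', '')); // "thing another"
console.log(serializeAttributeValue("it's", "'"));         // 'it&#39;s'
```

The second call mirrors the Usage example, where `class=thing` gains quotes once the value contains a space.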

Example – adding a query string to all image URLs:

soup.setAttribute('img', 'src', function (oldValue) {
  return oldValue + '?12345';
});
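
The newValue rules listed above can be summarised in a small dispatch sketch. `describeChange` and its returned `action` labels are hypothetical and exist only for this illustration. Only the first callback argument (the current value) is modelled here, whereas Soup also passes the four index arguments.

```javascript
// Hypothetical helper mirroring the documented newValue rules.
// It is not Soup's internal logic and ignores the index arguments.
function describeChange(newValue, oldValue) {
  // A function decides the value itself, based on the current value.
  if (typeof newValue === 'function') {
    newValue = newValue(oldValue);
  }

  if (newValue === null) return { action: 'none' };         // no change
  if (newValue === false) return { action: 'delete' };      // remove attribute
  if (newValue === true) return { action: 'set-boolean' };  // e.g. required
  return { action: 'set', value: String(newValue) };        // plain string
}

console.log(describeChange('dog.jpg', 'cat.jpg'));
// { action: 'set', value: 'dog.jpg' }

console.log(describeChange(function (old) { return old + '?12345'; }, 'cat.jpg'));
// { action: 'set', value: 'cat.jpg?12345' }

console.log(describeChange(false, 'cat.jpg'));
// { action: 'delete' }
```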

getAttribute(selector, attributeName, callback)

Same as .setAttribute(), except that any return value from your callback will be ignored.

setInnerHTML(selector, newHTML)

  • newHTML can be:
    • a string of HTML
    • a function that returns a string of HTML
      • this will be passed the oldHTML
    • null for "no change"

Example – appending new content inside an element:

soup.setInnerHTML('#foo', function (oldHTML) {
  return oldHTML + '<p>appended content</p>';
});

License

Copyright (c) 2014 Callum Locke. Licensed under the MIT license.

soup's Issues

npm audit error

Soup requires an ancient version of cheerio, which in turn requires an obsolete version of lodash, which triggers multiple npm audit security errors ("Prototype Pollution"). Could you please update to require a current version of "cheerio"?

Errors with namespaces and self-closing tags

I don't know if there's support for namespaces and self-closing tags, but Soup isn't working properly with them.

With this code:

var soup = new Soup(htmlContent);
soup.setAttribute('[attr]', 'attr', function(oldValue) {
  return oldValue + '?t=now';
});
soup.toString();

When htmlContent is:

<some:namespace attr="value">
<element attr="other-value">

  • Expected output:

    <some:namespace attr="value?t=now">
    <element attr="other-value?t=now">
  • Actual output:

    <some:namespace attr="value?t=now">
    <element attr="other-value?t=nowother-valueother-value?t=now">

When htmlContent is:

<other attr="value" />
<element attr="other-value">

  • Expected output:

    <other attr="value?t=now" />
    <element attr="other-value?t=now">
  • Actual output:

    <other attr="value?t=now" />
    <element attr="other-value?t=nowother-valueother-value?t=now">

It'd be nice to have support for them. If that's not possible, could you specify it in the docs?

Multiple self-closing tags are incorrectly parsed

Examples of errors

const Soup = require("soup");
const soup = new Soup(`
<image src="123" />
<image src="abc" />
`);
soup.setAttribute("image[src]", "src", (url) => {
    return url + "?test";
});

console.log(soup.toString());

Expected output:

<image src="123?test" />
<image src="abc?test" />

Current output:

<image src="123?test" />
<image src="abc?testabcabc?test" />

Correct example

const Soup = require("soup");
const soup = new Soup(`
<image src="123"></image>
<image src="abc"></image>
`);
soup.setAttribute("image[src]", "src", (url) => {
    return url + "?test";
});

console.log(soup.toString());

/*
<image src="123?test"></image>
<image src="abc?test"></image>
*/
