
html-scrubber's People

Contributors

avereha, leejo, mrcaron, nigelm, ruz, sergeyromanov


html-scrubber's Issues

Adding attributes

It would be nice to be able to extend HTML as well as clean it up. I currently use a sub to manipulate attributes. It would be useful to be able to add attributes too, not just ban or modify them (for example, adding a target attribute depending on the href attribute).

Feature request: Subroutine validators for tags

If I understood the module HTML::Scrubber correctly, tags are either allowed or not allowed. This is unlike attributes where a callback may decide whether an attribute is dropped or modified.

In a webapp that I'm working on, there is a hand-rolled sanitizer that can drop a complete tag based on the existence or value of its attributes. It would be nice to have this functionality in HTML::Scrubber too, as it would allow me to eliminate the hand-rolled sanitizer.

This is a bit related to #22.

Test Dependencies are now excessive

Prior to merging #9, you were using a combination of testing plugins that required end users to run "authorship" tests, and the choice of plugins caused installation to fail if those dependencies weren't satisfied.

In 0.17 you changed that: author tests now ship in xt, and their dependencies are no longer necessary.

However, you still have this set of "user must install these" test dependencies, which is now just silly:

https://github.com/nigelm/html-scrubber/blob/master/dist.ini#L12-L23

Most of these are now only development deps.

HTML entities are encoded unexpectedly

The module appears to apply HTML::Entities::encode_entities to the passed HTML string but does not decode the data afterwards. This is unexpected.

Example:
my $scrubber = HTML::Scrubber->new();
print $scrubber->scrub('2 > 1 is true');
# output is: 2 &gt; 1 is true

IMO if it's going to encode then it should decode when it's done.

This was also mentioned in the 0.15 CPAN rating/review: https://cpanratings.perl.org/dist/HTML-Scrubber

Sample code in synopsis does not reflect how module works

The synopsis has this code:

my $scrubber = HTML::Scrubber->new( allow => [ qw[ p b i u hr br ] ] );
print $scrubber->scrub('<p><b>bold</b> <em>missing</em></p>');
# output is: <p><b>bold</b> </p>

However, the actual output still contains the text "missing", contrary to the comment.

Are there any modules that will remove the content along with the tag?

escaping of < and >

hi nigel,

$ perl -MHTML::Scrubber -e '$s=HTML::Scrubber->new(); print $s->scrub(qq[ Y U escape > & <, HTML::Scrubber? \n])'
Y U escape &gt; & &lt;, HTML::Scrubber?

this behavior caused me trouble. people often use > for quoting in, e.g., email.

i am not the only one who's had this issue:
https://rt.cpan.org/Public/Bug/Display.html?id=69947

it's debatable whether this behavior is "correct": HTML::Scrubber is used to remove HTML tags of various kinds, but here it's escaping characters that, although they might confuse some parser, are not part of any HTML tag. there appears to be no option to turn off this behavior.

the guilty lines appear to be 440-441:
$text =~ s/</&lt;/g; #https://rt.cpan.org/Ticket/Attachment/8716/10332/scrubber.patch
$text =~ s/>/&gt;/g;

The comment suggests this behavior was added to address some issue.

In the changelog for HTML::Scrubber, there is the following:
0.03 Mon Jul 21 07:32:10 2003
- closed http://rt.cpan.org/NoAuth/Bug.html?id=2969
now escape spurious >< in text

I searched for the relevant RT ticket to understand why this change was made, but couldn't find it. It's not 8716, 10332, or 2969.
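
To see the effect of the two substitutions quoted above without installing the module, here is a minimal standalone sketch in plain Perl. The escape_angle_brackets helper is a hypothetical name introduced for illustration. It is not part of HTML::Scrubber, and it mirrors only the two quoted lines, not the module's full text handling.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Scrubber): applies the same two
# substitutions quoted from the module source to a plain string.
sub escape_angle_brackets {
    my ($text) = @_;
    $text =~ s/</&lt;/g;    # every literal "<" becomes "&lt;"
    $text =~ s/>/&gt;/g;    # every literal ">" becomes "&gt;"
    return $text;
}

print escape_angle_brackets('Y U escape > & <, HTML::Scrubber?'), "\n";
# prints: Y U escape &gt; & &lt;, HTML::Scrubber?
```

In this sketch the ampersand is left untouched, which matches the output reported above: only the angle brackets are rewritten.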

good luck.

michael
