mckamey / htmldistiller.net


HtmlDistiller: HTML parser and filter library for .NET

C# 100.00%

htmldistiller.net's Issues

HTML entities don't count toward literal character length

Repro:
1. Parse markup whose literal text contains long runs of Unicode text, e.g.
"<p>АмериканскаяфирмаПерсонaльныйспутник</p>"
2. Specify that non-ASCII chars should be encoded.
3. Attempt to force word wrapping of the text at a length shorter than the literal string.

Actual result:
- The literal length resets whenever the parser encounters an HTML entity (i.e. it treats entities similarly to tags).

Expected result:
- Encoded chars should be counted the same as non-encoded/ASCII chars, i.e. they should count toward the literal string length.
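
The expected counting behavior can be sketched outside the library. The snippet below is a minimal illustration in Python, not HtmlDistiller's code: the function names, the "<wbr />" break marker, and the numeric-entity encoding are all assumptions made for the example. It treats each whole entity as a single character of the current word, so an encoded run is broken at the same positions as an unencoded run of the same length.

```python
import re

# A token is either one whole HTML entity (e.g. "&#1040;" or "&amp;")
# or a single ordinary character.
_TOKEN = re.compile(
    r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);|.",
    re.DOTALL,
)


def encode_non_ascii(text):
    """Hypothetical helper: encode non-ASCII chars as decimal entities."""
    return "".join(ch if ord(ch) < 128 else "&#%d;" % ord(ch) for ch in text)


def break_long_words(encoded_text, max_word_length, marker="<wbr />"):
    """Insert `marker` after every `max_word_length` literal characters.

    An entity counts as exactly one character; whitespace ends the word
    and resets the count. The input is assumed to contain no tags.
    """
    out = []
    run = 0  # literal characters seen in the current word
    for match in _TOKEN.finditer(encoded_text):
        token = match.group(0)
        if token.isspace():
            run = 0
        else:
            if run == max_word_length:
                out.append(marker)
                run = 0
            run += 1
        out.append(token)
    return "".join(out)
```

With these definitions, break_long_words("abcdefgh", 3) returns "abc<wbr />def<wbr />gh", and a string of eight entities passed with the same limit gets its two markers at the same character positions.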

Original issue reported on code.google.com by mckamey on 14 Jul 2009 at 6:30

Occasionally less-than symbols in <script> blocks may be interpreted as tag starts

When a less-than symbol appears inside a <script> block and the text after it looks like a tag, the parser may start parsing it as a tag.

What steps will reproduce the problem?
1. Parse the following: "<script>generics like 'new List<string>()' look like a tag to the parser.</script>"
2. Notice that during parsing a <string> tag appears to have been parsed.

What is the expected output?
- <script> and <style> tags should be treated as CDATA blocks, and their contents should not be parsed further.
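
As a rough illustration of the expected behavior, the following Python sketch scans markup for tags but skips straight to the closing tag once it enters a <script> or <style> element. It is a simplified stand-in rather than HtmlDistiller's parser: the list_tags function and its regular expressions are assumptions for the example, and it ignores comments, quoted attribute values containing '>', and self-closing syntax.

```python
import re

# Elements whose contents are taken as raw text and never parsed for tags.
RAW_TEXT_ELEMENTS = ("script", "style")

# Very small tag matcher: optional "/", a tag name, then anything up to ">".
_TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def list_tags(markup):
    """Return the tag names the scanner sees, e.g. ["p", "/p"]."""
    tags = []
    pos = 0
    while True:
        match = _TAG.search(markup, pos)
        if match is None:
            break
        closing, name = match.group(1), match.group(2).lower()
        tags.append(("/" if closing else "") + name)
        pos = match.end()
        if not closing and name in RAW_TEXT_ELEMENTS:
            # Raw-text element: jump to its closing tag without
            # looking at anything in between.
            end = re.compile(r"</%s\s*>" % name, re.IGNORECASE).search(markup, pos)
            if end is None:
                break  # unterminated block: the rest is raw text
            tags.append("/" + name)
            pos = end.end()
    return tags
```

Run on the repro string from step 1, this scanner reports only the opening and closing script tags; the '<string>' inside the block is never examined.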

What do you see instead?
- Instead, an inner tag is parsed and the rest of the content flow adjusts accordingly (respecting tag-balancing options, etc.).

Workarounds:
- place a space between the less-than character '<' and the starting word
- encode the less-than character as '&lt;'
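
The first workaround can be applied mechanically before the markup reaches the parser. The helper below is a hypothetical Python sketch, not part of the library: it inserts a space after any '<' that is immediately followed by a letter. It is purely textual, so it would also change a '<' inside a string literal in the script, and it is meant to be run on the script contents only, not on the surrounding markup.

```python
import re


def pad_less_than(script_source):
    """Insert a space after each '<' that directly precedes a letter."""
    return re.sub(r"<(?=[A-Za-z])", "< ", script_source)
```

For example, pad_less_than("new List<string>()") returns "new List< string>()", which no longer resembles a tag start.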

Original issue reported on code.google.com by mckamey on 19 Sep 2009 at 12:49
