
Comments (7)

sideshowbarker commented on August 26, 2024

I can't reproduce this when running HTML5 Tidy with the default option settings.

What non-default config options do you have set? You can run "tidy -show-config" to list all your current settings.


mindnektar commented on August 26, 2024

I compared the output of tidy -show-config with the options reference and found no differences, so it appears that no non-default options are set. Here is the output if you'd like to check:

tidy -show-config

Configuration File Settings:

Name                        Type       Current Value                           
=========================== =========  ========================================
accessibility-check         enum       0 (Tidy Classic)                       
add-xml-decl                Boolean    no                                     
add-xml-space               Boolean    no                                     
alt-text                    String                                            
anchor-as-name              Boolean    yes                                    
ascii-chars                 Boolean    no                                     
assume-xml-procins          Boolean    no                                     
bare                        Boolean    no                                     
break-before-br             Boolean    no                                     
char-encoding               Encoding   utf8                                   
clean                       Boolean    no                                     
coerce-endtags              Boolean    yes                                    
css-prefix                  String                                            
decorate-inferred-ul        Boolean    no                                     
doctype                     DocType    auto                                   
doctype-mode                Integer   *2                                      
drop-empty-elements         Boolean    yes                                    
drop-empty-paras            Boolean    yes                                    
drop-font-tags              Boolean    no                                     
drop-proprietary-attributes Boolean    no                                     
enclose-block-text          Boolean    no                                     
enclose-text                Boolean    no                                     
error-file                  String                                            
escape-cdata                Boolean    no                                     
fix-backslash               Boolean    yes                                    
fix-bad-comments            Boolean    yes                                    
fix-uri                     Boolean    yes                                    
force-output                Boolean    no                                     
gnu-emacs                   Boolean    no                                     
gnu-emacs-file              String                                            
hide-comments               Boolean    no                                     
hide-endtags                Boolean    no                                     
indent                      AutoBool   no                                     
indent-attributes           Boolean    no                                     
indent-cdata                Boolean    no                                     
indent-spaces               Integer    2                                      
input-encoding              Encoding   utf8                                   
input-xml                   Boolean    no                                     
join-classes                Boolean    no                                     
join-styles                 Boolean    yes                                    
keep-time                   Boolean    no                                     
language                    String                                            
literal-attributes          Boolean    no                                     
logical-emphasis            Boolean    no                                     
lower-literals              Boolean    yes                                    
markup                      Boolean    yes                                    
merge-divs                  AutoBool   auto                                   
merge-emphasis              Boolean    yes                                    
merge-spans                 AutoBool   auto                                   
ncr                         Boolean    yes                                    
new-blocklevel-tags         Tag names                                         
new-empty-tags              Tag names                                         
new-inline-tags             Tag names                                         
new-pre-tags                Tag names                                         
newline                     enum       LF                                     
numeric-entities            Boolean    no                                     
output-bom                  AutoBool   auto                                   
output-encoding             Encoding   utf8                                   
output-file                 String                                            
output-html                 Boolean    no                                     
output-xhtml                Boolean    no                                     
output-xml                  Boolean    no                                     
preserve-entities           Boolean    no                                     
punctuation-wrap            Boolean    no                                     
quiet                       Boolean    no                                     
quote-ampersand             Boolean    yes                                    
quote-marks                 Boolean    no                                     
quote-nbsp                  Boolean    yes                                    
repeated-attributes         enum       keep-last                              
replace-color               Boolean    no                                     
show-body-only              AutoBool   no                                     
show-errors                 Integer    6                                      
show-warnings               Boolean    yes                                    
slide-style                 String                                            
sort-attributes             enum       none                                   
split                       Boolean    no                                     
tab-size                    Integer    8                                      
tidy-mark                   Boolean    yes                                    
uppercase-attributes        Boolean    no                                     
uppercase-tags              Boolean    no                                     
vertical-space              Boolean    no                                     
word-2000                   Boolean    no                                     
wrap                        Integer    68                                     
wrap-asp                    Boolean    yes                                    
wrap-attributes             Boolean    no                                     
wrap-jste                   Boolean    yes                                    
wrap-php                    Boolean    yes                                    
wrap-script-literals        Boolean    no                                     
wrap-sections               Boolean    yes                                    
write-back                  Boolean    no                                     


Values marked with an *asterisk are calculated 
internally by HTML Tidy
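Comparing this table against the options reference by eye is error-prone. A minimal sketch, assuming only the column layout shown above (header line, "=====" rule, one row per option, a blank line ending the table, and a "*" just before calculated values), that parses saved "tidy -show-config" output and diffs two configurations. The names parse_show_config, diff_configs and ConfigEntry are hypothetical helpers, not part of Tidy.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class ConfigEntry(NamedTuple):
    type: str         # e.g. "Boolean", "Integer", "Tag names"
    value: str        # the "Current Value" column as printed ("" if empty)
    calculated: bool  # True if Tidy marked the value with "*"


def parse_show_config(text: str) -> Dict[str, ConfigEntry]:
    """Parse the table printed by `tidy -show-config` into {option: entry}.

    Column offsets come from the header line rather than being hard-coded.
    """
    lines = text.splitlines()
    header_idx = next(
        (i for i, line in enumerate(lines)
         if line.startswith("Name") and "Current Value" in line),
        None,
    )
    if header_idx is None:
        raise ValueError("no 'Name ... Current Value' header found")
    header = lines[header_idx]
    type_col = header.index("Type")
    value_col = header.index("Current Value")

    entries: Dict[str, ConfigEntry] = {}
    for line in lines[header_idx + 1:]:
        if line.startswith("="):   # the ===== rule under the header
            continue
        if not line.strip():       # the first blank line ends the table
            break
        # The "*" marker (e.g. "Integer   *2") lands just before the value column.
        type_field = line[type_col:value_col].rstrip()
        entries[line[:type_col].strip()] = ConfigEntry(
            type=type_field.rstrip("*").rstrip(),
            value=line[value_col:].strip(),
            calculated=type_field.endswith("*"),
        )
    return entries


def diff_configs(
    a: Dict[str, ConfigEntry], b: Dict[str, ConfigEntry]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {option: (value_in_a, value_in_b)} for every option that differs."""
    result: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for name in sorted(set(a) | set(b)):
        va = a[name].value if name in a else None
        vb = b[name].value if name in b else None
        if va != vb:
            result[name] = (va, vb)
    return result
```

For example, save "tidy -show-config > mine.txt" on each machine, parse both files, and diff_configs then lists only the options whose current values differ, which answers "what non-default options do you have set?" directly.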

For reference, this is the complete output I get:

cat test.html
<span><a href="/">Test</a></span>

tidy test.html
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 1 - Warning: inserting implicit <body>
line 1 column 19 - Warning: inserting implicit <span>
line 1 column 19 - Warning: missing </span> before </a>
line 1 column 1 - Warning: inserting missing 'title' element
Info: Document content looks like HTML5
5 warnings, 0 errors were found!

<!DOCTYPE html>
<html>
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 (experimental) for Linux https://github.com/w3c/tidy-html5/tree/ddb5702">
<title></title>
</head>
<body>
<span><a href="/"><span>Test</span></a></span>
</body>
</html>

About this fork of Tidy: http://w3c.github.com/tidy-html5/
Bug reports and comments: https://github.com/w3c/tidy-html5/issues/
Or send questions and comments to [email protected]
Latest HTML specification: http://dev.w3.org/html5/spec-author-view/
HTML language reference: http://dev.w3.org/html5/markup/
Validate your HTML5 documents: http://validator.w3.org/nu/
Lobby your company to join the W3C: http://www.w3.org/Consortium

I hadn't looked at the warnings properly before, as I should have. The "inserting implicit <span>" warning is obviously the key, but why is Tidy doing that?
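To check mechanically whether a given Tidy build still shows this symptom, one can compare the <span> elements before and after tidying. A minimal sketch using Python's standard html.parser: it records the ancestor path of every <span> start tag, so the extra span inside the <a> shows up as a path like "html/body/span/a/span". The helper names are hypothetical, and counting spans is only a heuristic for "Tidy inserted a span", not a full diff.

```python
from html.parser import HTMLParser
from typing import List

# HTML void elements never get an end tag, so they are not pushed on the stack.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class SpanPathCollector(HTMLParser):
    """Record the ancestor path of every <span> start tag, e.g. "span/a/span"."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []
        self.span_paths: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.span_paths.append("/".join(self.stack + [tag]))
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def span_paths(html: str) -> List[str]:
    parser = SpanPathCollector()
    parser.feed(html)
    parser.close()
    return parser.span_paths


def inserted_span_count(before: str, after: str) -> int:
    """How many more <span> start tags `after` has than `before`."""
    return len(span_paths(after)) - len(span_paths(before))
```

Run against the test.html input and Tidy's output above, this reports one inserted span; a build without the propagation should report zero.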


sideshowbarker commented on August 26, 2024

Please run "tidy -v" to check which version of tidy you're using.

Regardless, you should check out the latest sources and rebuild if possible.
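The build can also be identified from the generator strings quoted in this thread: the older experimental build names a commit ("…/tree/ddb5702"), while the 5.2.0 build names a release ("version 5.2.0"). A minimal sketch, assuming "tidy -v" prints a banner in the same form as these generator strings (an assumption, not something confirmed here); identify_tidy_build is a hypothetical helper.

```python
import re
from typing import Optional, Tuple, Union

# "... version 5.2.0", as in the 5.2.0 generator string quoted in this thread.
RELEASE_RE = re.compile(r"\bversion\s+(\d+)\.(\d+)\.(\d+)")
# ".../tree/ddb5702", as in the older experimental generator string.
COMMIT_RE = re.compile(r"/tree/([0-9a-f]{7,40})\b")

Build = Tuple[str, Union[Tuple[int, ...], str]]


def identify_tidy_build(banner: str) -> Optional[Build]:
    """Return ("release", (major, minor, patch)), ("commit", sha), or None."""
    m = RELEASE_RE.search(banner)
    if m:
        return ("release", tuple(int(g) for g in m.groups()))
    m = COMMIT_RE.search(banner)
    if m:
        return ("commit", m.group(1))
    return None
```

Knowing whether a report comes from a tagged release or from an untagged commit makes it easier to tell whether "check out the latest sources and rebuild" could plausibly change the result.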


sideshowbarker commented on August 26, 2024

No response for a month, so closing this now. Feel free to re-open it if you have new information.


jznf commented on August 26, 2024

@sideshowbarker
I think this issue should be reopened, since it still isn't fixed, even years later.

I've tried version 5.2.0 from http://binaries.html-tidy.org/ on OS X and on the latest Debian, and the behaviour is the same:

<!DOCTYPE html>
<html>
    <head>
        <title></title>
    </head>
    <body>
        <span><button>OK</button></span>
    </body>
</html>

results in

<!DOCTYPE html>
<html>
    <head>
        <meta content="HTML Tidy for HTML5 for Mac OS X version 5.2.0" name="generator">
        <title></title>
    </head>
    <body>
        <span><button><span>OK</span></button></span>
    </body>
</html>

The options I set are:

indent-attributes: 0,
wrap-attributes: 0,
tidy-mark: 1,
drop-empty-elements: 0,
preserve-entities: 1,
indent: 1,
indent-spaces: 4,
hide-comments: 1,
doctype: 'html5',
sort-attributes: 'alpha',
split: 1,
merge-divs: 0,
merge-spans: 0,

I'm using it via pytidylib6==0.2.2, but it behaves the same when tidy is run directly with the default settings.
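To reproduce these exact settings outside pytidylib6, they can be written out as a Tidy configuration file, which uses one "option: value" line per option and is passed with "tidy -config <file>". A minimal sketch: the 0/1 values above are rewritten as Python booleans for the Boolean and AutoBool options, and JZNF_OPTIONS and to_tidy_config are hypothetical names.

```python
from typing import Dict, Union

# jznf's options from the comment above; 0/1 flags written as Python booleans.
JZNF_OPTIONS: Dict[str, Union[bool, int, str]] = {
    "indent-attributes": False,
    "wrap-attributes": False,
    "tidy-mark": True,
    "drop-empty-elements": False,
    "preserve-entities": True,
    "indent": True,
    "indent-spaces": 4,
    "hide-comments": True,
    "doctype": "html5",
    "sort-attributes": "alpha",
    "split": True,
    "merge-divs": False,
    "merge-spans": False,
}


def to_tidy_config(options: Dict[str, Union[bool, int, str]]) -> str:
    """Render options as Tidy config-file text, one "name: value" per line."""
    lines = []
    for name, value in options.items():
        # bool must be checked explicitly: True/False become yes/no,
        # while real integers such as indent-spaces stay numeric.
        if isinstance(value, bool):
            value = "yes" if value else "no"
        lines.append(f"{name}: {value}")
    return "\n".join(lines) + "\n"
```

Saving the result as, say, jznf.conf and running "tidy -config jznf.conf test.html" should show whether the span propagation depends on these settings or, as reported, also occurs with the defaults.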

Edit: correction of markdown.


geoffmcl commented on August 26, 2024

@jznf thanks for the report...

You have certainly found a case where tidy, since sometime before 2009, propagates the <span> into the <button> block, thus producing additional tags...

But as @sideshowbarker reported way back then, this does not happen with the original minimal HTML sample given by @mindnektar, so this is a new case, and I would prefer that we open a new Feature Request issue for it... thanks...

I call it a Feature Request because, in certain circumstances, this propagation has been tidy's behavior for maybe upwards of 10 years. That is, it is deliberately coded behavior, so it is not a bug per se... We would need to explore why this was chosen... why was it thought to be needed? i.e., what case or cases did it fix along the way? And maybe we would need a new tidy option to stop doing this... all of this points to this being a new issue... one that needs its own discussion... thanks...


jznf commented on August 26, 2024

Ok, thanks. I've opened #461
