
tidy-html5's Introduction

HTACG HTML Tidy

All other READMEs and related materials can be found in README/. Although all of our materials should be linked in this README, be sure to check this directory for documents we’ve not yet added to this document.

Building HTML Tidy

Branches and Versions

Learn about which branches are available, which branch you should use, and how HTML Tidy’s versioning scheme works.

Contributing and Development Guides

We gladly accept PRs! Read about some of our contribution guidelines, and check out some of the additional explanatory documents that will aid your understanding of how to accomplish certain things in HTML Tidy.

General Contribution Guidelines

These are some general guidelines that will help you help us when it comes to making your own contributions to HTML Tidy.

Adding Features Guides

When you’re ready to add a great new feature, these write-ups may be useful.

  • Learn how to add new element attributes to HTML Tidy by reading README/ATTRIBUTES.md.
  • Discover how to add new tags to Tidy in README/TAGS.md.
  • If you want to add new messages to Tidy, read README/MESSAGE.md.
  • Configuration options can be added according to README/OPTIONS.md.
  • Pull Requests must pass all existing regression tests, or you must change existing regression test expectations with a good explanation. New features require that you add new regression tests. See README/TESTING.md for more details.

Language Localization Guides

Tidy supports localization, and welcomes translations into various languages. Please read up on how to localize HTML Tidy.

Other Important Links

History

This repository should be considered canonical for HTML Tidy starting from 2015-January-15.

  • This repository was originally transferred from w3c.github.com/tidy-html5, which was later redirected to the current site but is now dead.

  • Before that, the project moved to GitHub from tidy.sourceforge.net. Note that the SourceForge site is kept only for historical reasons and is no longer well maintained.

Tidy is the granddaddy of HTML tools, with support for modern standards. Have fun...

License

HTML Tidy and LibTidy are free and open source software with a permissive license.

tidy-html5's People

Contributors

adammajer, arrmo, balthisar, bdesham, brlin-tw, cqcallaw, ermshiperete, gagern, geoffmcl, halindrome, hugotiburtino, jidanni, johnweldon, jokester, lacombar, ler762, lhchavez, ltx2018, marcoscaceres, nokome, pedromorgan, peterkelly, rffontenelle, seaburg, sideshowbarker, skynet, spk, sria91, stevenle, vielmetti

tidy-html5's Issues

<meta http-equiv="refresh"> fails to validate inside <noscript> in <head>

Minimal test case:

<!DOCTYPE html>
<html>
<head>
  <title>Minimal Test Cast for a meta tag inside noscript</title>

  <noscript>
    <meta http-equiv="refresh" content="0; url=/javascript-disabled/">
  </noscript>

</head>
<body>
</body>
</html>

Also, the output seems to be HTML 4.01 EN? See output below:

└──>>tidy meta-noscript.html 
line 6 column 3 - Warning: inserting implicit <body>
line 7 column 5 - Warning: <meta> isn't allowed in <noscript> elements
line 6 column 3 - Info: <noscript> previously mentioned
line 10 column 1 - Warning: </head> isn't allowed in <body> elements
line 6 column 3 - Info: <body> previously mentioned
line 11 column 1 - Warning: discarding unexpected <body>
line 6 column 3 - Warning: trimming empty <noscript>
Info: Document content looks like HTML 4.01 Strict
Info: No system identifier in emitted doctype
5 warnings, 0 errors were found!

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<meta name="generator" content=
"HTML Tidy for Linux (vers 16 November 2011), see www.w3.org">
<title>Minimal Test Cast for a meta tag inside noscript</title>
<meta http-equiv="refresh" content="0; url=/javascript-disabled/">
</head>
<body>
</body>
</html>

To learn more about HTML Tidy see http://tidy.sourceforge.net
Please fill bug reports and queries using the "tracker" on the Tidy web site.
Additionally, questions can be sent to [email protected]
HTML and CSS specifications are available from http://www.w3.org/
Lobby your company to join W3C, see http://www.w3.org/Consortium

Empty <canvas> tags stripped

Empty <canvas> tags are stripped by tidy.

Before:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Canvas</title>
  </head>
  <body>
    <canvas></canvas>
  </body>
</html>

After:

<!DOCTYPE html>
<html lang="en">
<head>
<meta name="generator" content=
"HTML Tidy for Linux from https://github.com/w3c/tidy-html5">
<meta charset="utf-8">
<title>Canvas</title>
</head>
<body>
</body>
</html>

libtidy output error on OS X 10.8.1

libtidy appears to build and install properly for me using the included steps, but outputs gibberish in use (e.g. @<GS>B<BS><SOH>). The command-line tool works perfectly.

I'm including the result of make install below in case I'm doing something wrong. Any ideas?

Making install in src
 /opt/local/bin/gmkdir -p '/usr/local/lib'
 /bin/sh ../libtool   --mode=install /opt/local/bin/ginstall -c   libtidy.la '/usr/local/lib'
libtool: install: /opt/local/bin/ginstall -c .libs/libtidy-0.99.0.dylib /usr/local/lib/libtidy-0.99.0.dylib
libtool: install: (cd /usr/local/lib && { ln -s -f libtidy-0.99.0.dylib libtidy.dylib || { rm -f libtidy.dylib && ln -s libtidy-0.99.0.dylib libtidy.dylib; }; })
libtool: install: /opt/local/bin/ginstall -c .libs/libtidy.lai /usr/local/lib/libtidy.la
libtool: install: /opt/local/bin/ginstall -c .libs/libtidy.a /usr/local/lib/libtidy.a
libtool: install: chmod 644 /usr/local/lib/libtidy.a
libtool: install: ranlib /usr/local/lib/libtidy.a
make[2]: Nothing to be done for `install-data-am'.
Making install in console
 /opt/local/bin/gmkdir -p '/usr/local/bin'
  /bin/sh ../libtool   --mode=install /opt/local/bin/ginstall -c tidy tab2space '/usr/local/bin'
libtool: install: /opt/local/bin/ginstall -c .libs/tidy /usr/local/bin/tidy
libtool: install: /opt/local/bin/ginstall -c .libs/tab2space /usr/local/bin/tab2space
make[2]: Nothing to be done for `install-data-am'.
Making install in include
make[2]: Nothing to be done for `install-exec-am'.
 /opt/local/bin/gmkdir -p '/usr/local/include'
 /opt/local/bin/ginstall -c -m 644 platform.h tidy.h tidyenum.h buffio.h '/usr/local/include'
make[2]: Nothing to be done for `install-exec-am'.
make[2]: Nothing to be done for `install-data-am'.

Output <video> similar to <ul>

Currently, <video> elements are not output the way <ul> elements are, with each child on its own line.

    For example, before:

    <!DOCTYPE html>
    <html lang="en">
      <head>
        <meta charset="utf-8">
        <title>Video</title>
      </head>
      <body>
        <video width="320" height="240" controls="controls">
          <source src="movie.mp4" type="video/mp4" />
          <source src="movie.ogg" type="video/ogg" />
          Your browser does not support the video tag.
        </video>
      </body>
    </html>

    After:

    <!DOCTYPE html>
    <html lang="en">
    <head>
    <meta name="generator" content=
    "HTML Tidy for Linux from https://github.com/w3c/tidy-html5">
    <meta charset="utf-8">
    <title>Video</title>
    </head>
    <body>
    <video width="320" height="240" controls="controls"><source src=
    "movie.mp4" type="video/mp4"> <source src="movie.ogg" type=
    "video/ogg"> Your browser does not support the video tag.</video>
    </body>
    </html>

    Proposed:

    <!DOCTYPE html>
    <html lang="en">
    <head>
    <meta name="generator" content=
    "HTML Tidy for Linux from https://github.com/w3c/tidy-html5">
    <meta charset="utf-8">
    <title>Video</title>
    </head>
    <body>
    <video width="320" height="240" controls="controls">
    <source src="movie.mp4" type="video/mp4">
    <source src="movie.ogg" type="video/ogg"> 
    Your browser does not support the video tag.
    </video>
    </body>
    </html>

TidyMergeClean merges <font> into <p>

TidyMergeClean merges <font> into <p>, for example:

<p style="background-color: #00ffff;"><font style="background-color: #ffff00">Text</font></p>

and

<P><FONT style="BACKGROUND-COLOR: #00ff00">Green Text</FONT></P>

As a result, the paragraph takes the font's color, which is wrong.

No warnings for missing </p> end tag in XHTML mode

The following XHTML is missing </p> end tags but Tidy validation does not issue any warning about this:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title></title>
</head>

<body>

  <p>one

  <p>two

</body>
</html>

Since the document is XHTML and must be well formed, should Tidy not warn about this?

Could this be added / fixed in this fork?

Create Github page with documentation

There's a lot of nice documentation in the project which is not accessible directly in Github making it impossible to read without doing a clone.

Github provides a super simple way of rendering documentation by creating a branch called gh-pages and committing relevant files in there, see http://pages.github.com.

It would be great to set this up. I'm happy to make a pull request with the relevant files if someone sets the branch up.

See page here: http://w3c.github.com/tidy-html5/

Parser too greedy over <script> blocks

I ran into a known bug, open for 5 years:
http://sourceforge.net/tracker/?func=detail&atid=390963&aid=1642186&group_id=27659

Proposed patch:
http://sourceforge.net/tracker/?func=detail&atid=390965&aid=1644645&group_id=27659

<!DOCTYPE html>
<html>
<head><title></title>
<body>
<script>
var a = '<script';
</script>
</body>
</html>

gives me:

line 7 column 7 - Warning: '<' + '/' + letter not allowed here
line 8 column 5 - Warning: '<' + '/' + letter not allowed here
line 9 column 5 - Warning: '<' + '/' + letter not allowed here
line 5 column 1 - Warning: missing </script>
line 5 column 1 - Warning: missing </script>
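For context, an HTML5-style tokenizer treats the content of a script element as raw text, so a "<script" inside a string literal is not a new tag; only an end tag ends the block. The following is a minimal sketch of that idea, not Tidy's parser and not the proposed patch. The function name find_script_end is hypothetical, and the sketch deliberately ignores the additional escaping rules around "<!--".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first "</script"
 * (case-insensitive) that is followed by whitespace, '/' or '>',
 * or -1 if there is none. Any "<script" inside the text is skipped. */
static long find_script_end(const char *text)
{
    static const char tag[] = "</script";
    const size_t taglen = sizeof tag - 1;
    size_t i;

    assert(text != NULL);
    for (i = 0; text[i] != '\0'; ++i) {
        size_t j = 0;
        /* Stops at the terminating NUL because tag[j] is never NUL here. */
        while (j < taglen && tolower((unsigned char)text[i + j]) == tag[j])
            ++j;
        if (j == taglen) {
            unsigned char next = (unsigned char)text[i + taglen];
            if (next == '>' || next == '/' || isspace(next))
                return (long)i;
        }
    }
    return -1;
}
```

Applied to the script body from the test case above, the scan passes over the quoted '<script' and stops at the real end tag.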

unit tests failing

Hi all

I am getting the following errors.

== 427844 failed (Status received: 0 vs expected: 1)
Info: Doctype given is "-//W3C//DTD HTML 4.0 Transitional//EN"
Info: Document content looks like HTML 4.01 Transitional
Info: No system identifier in emitted doctype
No warnings or errors were found.

== 431719 failed (Status received: 0 vs expected: 1)
Info: Doctype given is "-//W3C//DTD HTML 3.2//EN"
Info: Document content looks like HTML 3.2
No warnings or errors were found.

== 431883 failed (Status received: 0 vs expected: 1)
Info: Doctype given is "-//W3C//DTD HTML 4.0//EN"
Info: Document content looks like HTML 4.01 Strict
Info: No system identifier in emitted doctype
No warnings or errors were found.

== 435909 failed (Status received: 0 vs expected: 1)
Info: Doctype given is "-//W3C//DTD HTML 4.0 Transitional//EN"
Info: Document content looks like HTML 4.01 Transitional
Info: No system identifier in emitted doctype
No warnings or errors were found.

Are these known issues?

Thanks

Chris

Code cleanup

Since we're starting a new project, let's take the time up front to clean up some of the artifacts from the old tidy:

  • Standardize tabs/spaces and indenting, including adding vim mode lines (and emacs if anyone wants them)
  • Clean all spurious whitespace so that we don't have false diffs down the road.
  • Remove all CVS cruft. These have no value to us:
   CVS Info:
     $Author: arnaud02 $
     $Date: 2007/02/11 09:45:08 $
     $Revision: 1.9 $

HTML5 Doctype

Hi, great job on getting HTML5 support into Tidy!

Are there any plans on supporting the HTML5 doctype? It seems that with doctype = "auto" Tidy should recognize this as an HTML5 doctype:

<!DOCTYPE html>

Alas, it gets converted to a 4.01 doctype:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">

<video> issue with </source> tags

Looks like there's an issue when users use <source></source> and multiple <source> tags inside a <video> tag.

Input:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Video</title>
  </head>
  <body>
     <video src="ogg">
       <source class="webm" src="demo.webm" type="video/webm"></source>
       <source class="mp4" src="demo.mp4" type="video/mp4"></source>
     </video>
  </body>
</html>

Errors:

line 8 column 6 - Warning: replacing unexpected source by </source>
line 10 column 60 - Warning: discarding unexpected </source>
line 11 column 6 - Warning: discarding unexpected </video>
Info: Document content looks like HTML5
3 warnings, 0 errors were found!

Output: (notice the second <source> tag appears outside of the <video> tag)

<!DOCTYPE html>
<html lang="en">
<head>
<meta name="generator" content=
"HTML Tidy for Linux from https://github.com/w3c/tidy-html5">
<meta charset="utf-8">
<title>Video</title>
</head>
<body>
<video src="ogg"><source class="webm" src="demo.webm" type=
"video/webm"></video> <source class="mp4" src="demo.mp4" type=
"video/mp4">
</body>
</html>

Release 0.1

We'd like to have 0.1 released by 7/31/2012.

Need a tidyVersion() function

For my Perl code that wraps tidy-html5, I'm going to need a function that returns a version number of the library. This is so my Perl code can know that it's calling an underlying library of some minimum version.

The constant TY_(release_date)[] in src/version.h doesn't help because I can't tell any sequence from that.
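To illustrate why an ordered version number matters to a wrapper, here is a minimal sketch of the check a caller might perform. It assumes a dotted "major.minor.patch" string, which is an assumption and not something the library provided at the time; parse_version and compare_versions are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical: splits "major.minor.patch" into three numbers.
 * Missing components are treated as 0. */
static void parse_version(const char *s, long out[3])
{
    int i;

    assert(s != NULL);
    out[0] = out[1] = out[2] = 0;
    for (i = 0; i < 3; ++i) {
        char *end;
        out[i] = strtol(s, &end, 10);
        if (*end != '.')
            break;              /* no further components */
        s = end + 1;
    }
}

/* Returns -1, 0 or 1 when a is older than, equal to, or newer than b. */
static int compare_versions(const char *a, const char *b)
{
    long va[3], vb[3];
    int i;

    parse_version(a, va);
    parse_version(b, vb);
    for (i = 0; i < 3; ++i) {
        if (va[i] != vb[i])
            return va[i] < vb[i] ? -1 : 1;
    }
    return 0;
}
```

Comparing components numerically, not as text, is the point of the sketch: "1.10.0" is newer than "1.9.0" even though it sorts earlier as a string.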

empty <script> tag wraps but doesn't indent properly

Given this input file:

<!doctype html>
<html>
<head>
<title>test</title>
<link href="foo.css" media="screen" rel="stylesheet" type="text/css">
<script type="text/javascript" src="bar.js"></script>
</head>
<body>
</body>
</html>

I'll run tidy like this:

tidy --indent yes --indent-spaces 4 --wrap 0 --quiet yes --tidy-mark no test.html

And get:

<!DOCTYPE html>
<html>
    <head>
        <title>
            test
        </title>
        <link href="foo.css" media="screen" rel="stylesheet" type="text/css">
        <script type="text/javascript" src="bar.js">
</script>
    </head>
    <body>
    </body>
</html>

Note that the closing script tag is wrapped but not indented. If I put a comment inside the tags:

<!doctype html>
<html>
<head>
<title>test</title>
<link href="foo.css" media="screen" rel="stylesheet" type="text/css">
<script type="text/javascript" src="bar.js"><!-- nothing --></script>
</head>
<body>
</body>
</html>

Then I'll get this:

<!DOCTYPE html>
<html>
    <head>
        <title>
            test
        </title>
        <link href="foo.css" media="screen" rel="stylesheet" type="text/css">
        <script type="text/javascript" src="bar.js">
<!-- nothing -->
        </script>
    </head>
    <body>
    </body>
</html>

I'm thinking that the closing script tag should either not wrap at all (in the case where there's no content) or wrap and be indented to match the opening tag.

Thanks.

Content of script tags without type get wrapped into uncommented CDATA

Hey,

I have run into a quite worrying bug, seems that if there's a <script> tag without a type parameter – which is perfectly valid in HTML5 – Tidy will enclose the contents into an uncommented CDATA :(

Here's a very simple example which reproduces the error for me with Tidy compiled on Mac OS X Lion 10.7.3 using the latest (1st March 2012 – 9412ef6) commit.

Using...

tidy --write-back yes test.html

...on this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB" lang="en-GB">


<head>

</head>

<body class=" x ">
<script>
    try { top.document.domain } catch (e) {
        var f = function() { document.body.innerHTML = ''; }
        setInterval(f,1);
        if (document.body) document.body.unload = f;
    }
</script>



<div id="skip-links">
    <p class="skip-link-p">
        Skip to: 
        <a accesskey="1" class="skip-link" href="#content">content</a>, 
        <a accesskey="2" class="skip-link" href="#nav-links">navigation</a>
    </p>
</div>


</body>
</html>

...results in this:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB" lang=
"en-GB">
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X https://github.com/w3c/tidy-html5/tree/f2825b9" />
<title></title>
</head>
<body class=" x">
<script>
<![CDATA[
        try { top.document.domain } catch (e) {
                var f = function() { document.body.innerHTML = ''; }
                setInterval(f,1);
                if (document.body) document.body.unload = f;
        }
]]>
</script>
<div id="skip-links">
<p class="skip-link-p">Skip to: <a accesskey="1" class="skip-link"
href="#content">content</a>, <a accesskey="2" class="skip-link"
href="#nav-links">navigation</a></p>
</div>
</body>
</html>

Whitespace removed after <a>...</a>

Current master (4ff3234) removes white space following <a>...</a>. This HTML

<html>
<head>
</head>
<body>
<a href="n" name="n">one</a> two
</body>
</html>

is cleaned up to

<!DOCTYPE html>
<html>
<head>
<title></title>
</head>
<body>
<a href="n" name="n" id="n">one</a>two
</body>
</html>

Notice that the single white space before "two" is missing in the output.

This is a regression from the original Tidy from SourceForge.

Add option to force HTML5 doctype

At the moment there's no way to make Tidy convert doctype to the simple HTML5 format, which is a shame as it could be a really useful tool to bulk convert legacy markup.

Adding a forcehtml5 option to --doctype which acts the same as omit, but adds a simple <!DOCTYPE html> back in would be a great start I think.

<ul> inside <address>

From what I understand, HTML5 allows a <ul> element to be a child of an <address> element. However, if I pass the following through Tidy (with an HTML5 doctype):

<address id="contact">
    <ul>
        <li>Item 1</li>
        <li>Item 2</li>
        <li>etc...</li>
    </ul>
</address>

I get these warnings:

Warning: missing </address> before <ul>
Warning: discarding unexpected </address>

and Tidy also decides to change the markup to:

<address id="contact"></address>
<ul>
    <li>Item 1</li>
    <li>Item 2</li>
    <li>etc...</li>
</ul>

I think this behaviour was correct for HTML4 but has since changed in HTML5. Correct me if I'm wrong.

indent directive fails

tidy -i
Assertion failed: (option_defs[ optId ].type == TidyInteger), function prvTidySetOptionInt, file config.c, line 392.

Mac OSX 10.6.8
TextMate uses the -i directive in its HTML-bundle/tidy command.

ISO-8859-1 special characters not translated correctly

When I first tried this fork of tidy, I noticed that the following html code

<!DOCTYPE html>
<html><head><title></title></head>
<body>
  <p>It costs 15&cent;.</p>
</body>
</html>

is translated by tidy into the following:

<!DOCTYPE html>
<html>
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 (experimental) for Mac OS X https://github.com/w3c/tidy-html5/tree/c63cc39">
<title></title>
</head>
<body>
<p>It costs 15¢.</p>
</body>
</html>

Note that &cent; doesn't appear in the output... it has been replaced by an escape sequence. If this is intended behavior, how do I revert to the behavior in the original fork of tidy? Thank you.
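For reference, &cent; names the character U+00A2 (CENT SIGN), and the output above contains that character itself in place of the entity. How it looks on screen depends on the output encoding and on what the viewer assumes: in ISO-8859-1 it is the single byte 0xA2, while in UTF-8 it is two bytes. The sketch below shows only the UTF-8 arithmetic; encode_utf8_small is a hypothetical helper, not part of Tidy.

```c
#include <assert.h>

/* Hypothetical helper: encodes a code point below U+0800 as UTF-8
 * and returns the number of bytes written (1 or 2). */
static int encode_utf8_small(unsigned int cp, unsigned char out[2])
{
    assert(cp < 0x800);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;                 /* plain ASCII */
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));     /* lead byte */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));   /* continuation byte */
    return 2;
}
```

For U+00A2 this yields the bytes 0xC2 0xA2, which a viewer expecting a single-byte encoding would show as two unrelated characters.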

Build fails on Mac OS X 10.6.1

make -C build/gmake/

fails with

if [ ! -d ../../bin ]; then mkdir ../../bin; fi
gcc -g -pedantic -Wall -I ../../include -Wunused-parameter -D_DEBUG=1 -D_MSC_VER=1400 -o ../../bin/tidy ../../console/tidy.c -I../../include ../../lib/libtidy.a
ld: in ../../lib/libtidy.a, archive has no table of contents
collect2: ld returned 1 exit status
make: *** [../../bin/tidy] Error 1

Windows?

I have GOW installed but am still unable to build on Windows. Will support for Windows be added in the future?

Error -

PS C:\wamp\www\1_Resources\w3c-tidy-html5-d194e87\w3c-tidy-html5-d194e87> make -C build/gmake/
C:\Program Files (x86)\Gow\bin\make.exe: Entering directory `C:/wamp/www/1_Resources/w3c-tidy-html5-d194e87/w3c-tidy-html5-d194e87/build/gmake'
if [ ! -d ./obj ]; then mkdir ./obj; fi
gcc -g -pedantic -Wall -I ../../include -Wunused-parameter -D_DEBUG=1 -D_MSC_VER=1400 -o obj/access.o -c ../../src/access.c
process_begin: CreateProcess((null), gcc -g -pedantic -Wall -I ../../include -Wunused-parameter -D_DEBUG=1 -D_MSC_VER=1400 -o obj/access.o -c ../../src/access.c, ...) failed.
make (e=2): The system cannot find the file specified.
C:\Program Files (x86)\Gow\bin\make.exe: *** [obj/access.o] Error 2
C:\Program Files (x86)\Gow\bin\make.exe: Leaving directory `C:/wamp/www/1_Resources/w3c-tidy-html5-d194e87/w3c-tidy-html5-d194e87/build/gmake'

Please update documentation

Is it possible to update the documentation to include any new config options added by the html5 patch?

Is there a "output-html5: yes" or is this patch just to stop tidy from removing html5 elements that it couldn't recognize before?

Incorrect indentation for script tag

Input:

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
        <title></title>
    </head>
    <body>
        <p>Paragraph 1</p>
        <p>Paragraph 2</p>
        <script>
            function() {
                return 'Hello, World!';
            }
        </script>
        <script src="foo.js"></script>
    </body>
</html>

Command (no config file):

tidy --indent 1 --indent-spaces 4 input.html

Output:

<!DOCTYPE html>
<html>
    <head>
        <meta name="generator" content=
        "HTML Tidy for HTML5 (experimental) for Mac OS X https://github.com/w3c/tidy-html5/tree/68a9e74">
        <meta charset="utf-8">
        <title></title>
    </head>
    <body>
        <p>
            Paragraph 1
        </p>
        <p>
            Paragraph 2
        </p><script>
    function() {
                return 'Hello, World!';
            }
        </script> <script src="foo.js">
</script>
    </body>
</html>

Issues:

  • There is no newline between the closing tag for paragraph 2 and the opening script tag
  • The script tag is closed on the wrong line

Format string warnings

When building the following format string warnings are generated.

localize.c: In function ‘prvTidyReportAccessWarning’:
localize.c:1376: warning: format not a string literal and no format arguments
localize.c: In function ‘prvTidyReportAccessError’:
localize.c:1383: warning: format not a string literal and no format arguments
localize.c: In function ‘prvTidyReportWarning’:
localize.c:1402: warning: format not a string literal and no format arguments
localize.c: In function ‘prvTidyReportError’:
localize.c:1483: warning: format not a string literal and no format arguments
localize.c:1502: warning: format not a string literal and no format arguments
localize.c: In function ‘prvTidyReportFatal’:
localize.c:1551: warning: format not a string literal and no format arguments
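GCC emits this warning when a non-literal string is passed as the format argument of a printf-style function with no further arguments. A common remedy is to pass the message through a literal "%s" format. The sketch below shows the pattern in isolation; report_message is a hypothetical helper and is not taken from localize.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies an already composed message into buf.
 * Writing snprintf(buf, size, msg) instead would trigger the warning
 * and would misinterpret any '%' that msg happens to contain. */
static int report_message(char *buf, size_t size, const char *msg)
{
    assert(buf != NULL && msg != NULL);
    /* The literal "%s" keeps msg from being parsed as a format. */
    return snprintf(buf, size, "%s", msg);
}
```

With this form a message such as "100% done" is copied verbatim, and the compiler can check the format string at build time.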

Name it

A new name that is not "tidy" will help differentiate this project from the old HTML4-only tidy.

Write design goals

Create a design document that states the high-level design decisions. Things like "We only validate HTML 5, and not HTML 4 or XHTML", or "We will only run on these compilers: gcc 4.x or above, Visual C whatever," etc etc.

Breaks valid HTML 4 by upgrading HTML5

(downloaded latest version) One single proprietary tag will break tidy-html5. For example:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Testing this</title>
</head>
<body>
<div id="container">

<table width="900" border="0" cellpadding="0" cellspacing="0">
<tr>
<td><a href="/"><img src="image.png" alt="some text" width="274" height="80" border="0"></a></td>
<td width="44%" rowspan="2" align="center" valign="middle" background="images/header-back-blue.png"></td>
<td width="26%" align="right" valign="top">Some text</td>
</tr>
</table>
</div>
</body>
</html>

Simply placing the "background=" attribute in the <td> (it is not allowed on <td> in HTML5) made tidy-html5 upgrade this otherwise valid HTML 4 document into invalid HTML5.

A feature that would be greatly needed by me is the ability to force tidy-html5 to NOT upgrade the document even if it qualifies. You can force your declaration, but I would like the ability to not upgrade using the automatic detection mode.

Make a test suite

Make a comprehensive test suite that is runnable via a simple make test.

trouble building libtidy commit d3440ed9f7 on Mac OS X 10.7.4

I try
sh build/gnuauto/setup.sh && ./configure && make
and get back

Undefined symbols for architecture x86_64:
"_prvTidyCleanGoogleDocument", referenced from:
_tidyCleanAndRepair in tidylib.o
ld: symbol(s) not found for architecture x86_64
collect2: ld returned 1 exit status
make[1]: *** [libtidy.la] Error 1
make: *** [all-recursive] Error 1

Any tips or work-arounds?

Empty <i> tags stripped

Tidy removes empty <i> tags, which Twitter Bootstrap uses to place Glyphicons:

<i class="icon-search"></i>

Is there an option to tell Tidy to never remove tags? If not, maybe there should be.

Preserve original source, only indent

I am building HTML 5 templates which are generated locally using Ruby ERB templates.

The output is not pretty and I have a lot of invalid and specific source code which I need in my final output.

For example, I have inline longhand CSS and <style> tags which need to be preserved within <body>. I do not want any HTML helpers modifying any of this at all.

Is it possible to add a configuration which does not alter any original source code and only indents my markup as it's written?

TITLE-Attribute should not wrap

Even with --wrap-attributes yes, TITLE attributes should not be wrapped (just as ALT attributes are not wrapped right now), because browsers treat the value like preformatted text when showing tooltips.

Proposed patch:

diff --git a/src/pprint.c b/src/pprint.c
index 375abb8..0b0c995 100644
--- a/src/pprint.c
+++ b/src/pprint.c
@@ -1158,7 +1158,7 @@ static void PPrintAttribute( TidyDocImpl* doc, uint indent,
     {
         if ( TY_(IsScript)(doc, name) )
             wrappable = cfgBool( doc, TidyWrapScriptlets );
-        else if (!(attrIsCONTENT(attr) || attrIsVALUE(attr) || attrIsALT(attr)) && wrapAttrs )
+        else if (!(attrIsCONTENT(attr) || attrIsVALUE(attr) || attrIsALT(attr) || attrIsTITLE(attr)) && wrapAttrs )
             wrappable = yes;
     }

An unrelated minor optimization: putting the wrapAttrs first will allow faster evaluation of the if-statement, when wrapAttrs == false.
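The suggestion relies on C's short-circuit evaluation of &&: when the left operand is false, the right operand is never evaluated. The sketch below demonstrates the effect with hypothetical stand-ins (is_special_attr, calls) and does not reproduce the pprint.c code.

```c
#include <assert.h>
#include <stdbool.h>

static int calls = 0;   /* counts how often the attribute check runs */

/* Hypothetical stand-in for the attrIsCONTENT/VALUE/ALT/TITLE chain. */
static bool is_special_attr(int attr_id)
{
    assert(attr_id >= 0);
    ++calls;
    return attr_id == 1;
}

/* Flag first: the attribute check is skipped when wrap_attrs is false. */
static bool wrappable_flag_first(bool wrap_attrs, int attr_id)
{
    return wrap_attrs && !is_special_attr(attr_id);
}

/* Order used in the patch: the attribute check always runs. */
static bool wrappable_flag_last(bool wrap_attrs, int attr_id)
{
    return !is_special_attr(attr_id) && wrap_attrs;
}
```

Both functions return the same result for every input; only the number of calls to the attribute check differs, so the gain is small unless that check is costly.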

Separate "informational messages" from warning/error count summary

I was just looking for a way to switch off the rather long "informational messages" that get sent to stderr and found this:

quiet

Type:    Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0

This  option  specifies if Tidy should output the summary of the
numbers of errors and warnings, or the welcome or  informational
messages.

Using quiet: yes in a config file works as expected. The only problem being that it also disables the summary of the number of warnings/errors along with it, which doesn't really seem to be related at all, and is actually quite useful.

Would it be possible to make these options separate to allow enabling the summary but disabling the messages? Perhaps quiet: yes/no could be kept for the messages and something like summary: yes/no could be used for the warning/error counts?

new-blocklevel-tags doesn't work with XML para tag

This issue is not specific to this version of HTML Tidy, i.e. it was in previous versions as well. Seems like this is a good place to address it however.

Config:
indent: yes
indent-spaces: 4
input-xml: yes
output-xml: yes
new-blocklevel-tags: para
wrap: 0

XML:


<?xml version="1.0" encoding="utf-8"?>
<document>
<topic>
<para>
<b>Name:</b>Torgo.
</para>
</topic>
</document>

Converts to:


<?xml version="1.0" encoding="utf-8"?>
<document>
    <topic>
        <para>
        <b>Name:</b>Torgo.</para>
    </topic>
</document>

Should be:


<?xml version="1.0" encoding="utf-8"?>
<document>
    <topic>
        <para>
            <b>Name:</b>Torgo.
        </para>
    </topic>
</document>

TidyMergeEmphasis: Error in Doc or Code?

The built-in description of TidyMergeEmphasis states that, if it is not set,

<span>foo <b>bar <b>baz</b></b> </span>

is tidied to

<span>foo <b>bar baz</b></span>

I found this to be false. Instead, I get

<span>foo <b>bar <b>baz</b></b></span>

So either the code or the documentation is wrong.

Tested against a61504c, 26. Jun 2012.

Support for empty <span> tags

Hi there,

Our developers often use empty <span> tags for styling, which get stripped by tidy, but there doesn't seem to be an option to override this. I'm wondering if it'd be worthwhile to add an option for this, or to always allow empty <span> tags?

I imagine the change is very trivial (probably just 2 lines of code, similar to f6a3bbe), but I'm curious if there's a reason why this behavior is the way it is currently?

Thanks,

Steven

No context in warning/error messages when using globbing

If I use the command tidy -e -quiet *.html in a directory with many HTML files, any warning/error messages show line numbers but not file names. This makes it very difficult to see which errors belong to which files without going through them again one by one. This also happens with the logging feature, as in: tidy -f tidy.log *.html.

Some Unix utilities (ls, file etc.) solve this by adding a header to each item when dealing with multiple inputs or globbing, for example:

$ ls logo/ site/
logo/:
horizontal.svg  icon.svg  Makefile  vertical.svg

site/:
css  img  contact.html  index.html  portfolio.html

It's not necessarily a major issue, since it's easy to fix by doing something like:

for file in *.html; do
    echo "$file"
    tidy -e -quiet "$file"
done

...but I just thought I'd create an issue here anyway, in case it was an oversight.
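The header convention described above could be sketched inside a C tool roughly as follows. This is an illustration of the idea only: format_file_header is a hypothetical helper, and nothing here reflects how Tidy's console program is written.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "name:\n" into buf when several inputs
 * were given, as ls does, and an empty string for a single input.
 * Returns the number of characters written. */
static int format_file_header(char *buf, size_t size,
                              const char *name, int file_count)
{
    assert(buf != NULL && size > 0 && name != NULL);
    if (file_count <= 1) {
        buf[0] = '\0';          /* single input: no header needed */
        return 0;
    }
    return snprintf(buf, size, "%s:\n", name);
}
```

Emitting the header only for multiple inputs keeps the output for a single file unchanged, which matters for scripts that already parse it.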
