
es-spec-html's Introduction

es-spec - Convert the ECMAScript Language Specification to HTML

To run the program:

./es-spec.py es6-draft.docx

Note: Python 3 is required.

About this program

Architecture: The program is in four parts:

  • Load the Word document (docx.py)
  • Convert it to extremely rough HTML+CSS (transform.py)
  • Apply a series of transformations, ranging from minor tweaks to very fancy algorithms, to the HTML (fixups.py)
  • Dump the resulting HTML document (htmodel.py)

Most of the interesting work, and most of the bugs, are in fixups.py.

Fragility: The script is quite sensitive to the input document and will throw an exception and give up if the document isn't exactly as expected. It's been hard to balance (a) being "liberal in what you accept" with (b) making sure fixups do not break silently, but rather get the user's attention, when the input document changes in unexpected ways.

Debugging: If a directory named _fixup_log exists under the current directory, the script dumps the whole halfway-transformed document to a file in that directory after each fixup.
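
A minimal sketch of how that kind of logging can work, not the actual code in fixups.py. The names `apply_fixups`, `doc`, `fixups`, and `serialize` are hypothetical stand-ins for the real pipeline driver, document model, fixup list, and HTML serializer; only the directory name `_fixup_log` comes from the description above.

```python
import os

LOG_DIR = "_fixup_log"  # directory name taken from the README text

def apply_fixups(doc, fixups, serialize=str, log_dir=LOG_DIR):
    """Apply each fixup in order; if log_dir exists, dump the document after each one.

    `doc`, `fixups`, and `serialize` are hypothetical stand-ins, not the real API.
    """
    logging_enabled = os.path.isdir(log_dir)
    for index, fixup in enumerate(fixups):
        doc = fixup(doc)
        if logging_enabled:
            # Number the files so they sort in the order the fixups ran.
            name = "{:03d}-{}.html".format(index, fixup.__name__)
            with open(os.path.join(log_dir, name), "w", encoding="utf-8") as f:
                f.write(serialize(doc))
    return doc
```

Diffing two consecutive dump files then shows exactly what a single fixup changed.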

es-spec-html's People

Contributors

jorendorff, mathiasbynens, norbertlindenberg

es-spec-html's Issues

Stripped indentation in 11.2.1

Properties are accessed by name, using either the dot notation:
MemberExpression . IdentifierName
CallExpression . IdentifierName

or the bracket notation:
MemberExpression [ Expression ]
CallExpression [ Expression ]

Some of this is supposed to be indented.

Cut the Scrap Heap

The Scrap Heap is 25 screensful of text at the end of the document, containing assorted oddments that have been deleted from the main body. I've been confused by it more than once. ("Oh, wait, none of this is part of the proposed spec, I'm in the Scrap Heap.")

It probably should be a separate Word document. Failing that, how about we just strip it from the HTML version?

List recovery bug in 8.2.4.2

In 8.2.4.2, “Let succeeded be the result of calling the [[Set]] internal method of base” — the document has all this in a single paragraph, but the OOXML markup clearly has two separate paragraphs. Not sure what's going on.

Some algorithm steps mysteriously formatted in Times New Roman

In 8.3.16.7:

          <li><span style="font-family: Times New Roman">Let <i>functionPrototype</i> be the intrinsic object</span>
              %FunctionPrototype%.</li>

In 8.5.1:

    <li><span style="font-family: Times New Roman">Let <i>trap</i> be the result of <a
        href="#sec-9.3.7">GetMethod</a>(<i>handler</i>,</span> "<code>getPrototypeOf</code>").</li>

and so on.

This is not all that mysterious; we just need to do a better job figuring out what the default font for a paragraph is supposed to be. Right now we check the first and last span and see if they use the same font. These paragraphs have a few characters at the end that are the wrong font, but it's not noticeable because it's just punctuation.
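
One possible improvement, sketched here as an assumption rather than what the script does: pick the font that covers the most characters in the paragraph, so a few stray characters at the end cannot outvote the rest. The function name `dominant_font` and the `(font_name, text)` span representation are hypothetical.

```python
from collections import Counter

def dominant_font(spans):
    """Return the font that covers the most characters, or None if there are no spans.

    `spans` is a hypothetical list of (font_name, text) pairs for one paragraph.
    """
    weight = Counter()
    for font, text in spans:
        weight[font] += len(text)
    if not weight:
        return None
    return weight.most_common(1)[0][0]
```

With this rule, the trailing punctuation in the examples above would be the part that carries an explicit font, and the long leading run would become the paragraph default.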

Terms that should be links but aren't

NewFunctionEnvironment

Function environment record

Global environment record (note this will fight with "the global environment")

Function Declaration Instantiation

Empty numbering markers

In 15.1.3 URI Handling Function Properties, Decode Abstract Operation,
step 4.d.vi.2.a and 4.d.vi.3.a, the script prints warnings because it is
misinterpreting the Word document. transform.py seems to be producing
empty markers.

I think the output is fine, just bogus warnings.

Bogus numbering warning in 15.7.3.12

Another bogus warning because we incorrectly interpret the Word document as having wrong numbering.

I think the bug is that numbering state is per-num, but the script is storing it per-abstractNum.

/Users/jorendorff/dev/es-spec-html/fixups.py:507: UserWarning: Word marker is '5.\t', HTML will show '1.\t'
  warn("Word marker is {!r}, HTML will show {!r}".format(marker_str, html_marker_str))
<li style="-ooxml-indentation: 0.0pt">If Type(<span style="font-style: italic">number</span>) is not Number, return <span
    style="font-weight: bold">false</span>.</li>

/Users/jorendorff/dev/es-spec-html/fixups.py:507: UserWarning: Word marker is '6.\t', HTML will show '2.\t'
  warn("Word marker is {!r}, HTML will show {!r}".format(marker_str, html_marker_str))
<li style="-ooxml-indentation: 0.0pt">If <span style="font-style: italic">number</span> is <span style="font-weight:
    bold">NaN</span>, <span style="font-weight: bold">+&infin;</span>, or <span style="font-weight: bold">&minus;&infin;</span>,
    return <span style="font-weight: bold">false</span>.</li>

/Users/jorendorff/dev/es-spec-html/fixups.py:507: UserWarning: Word marker is '7.\t', HTML will show '3.\t'
  warn("Word marker is {!r}, HTML will show {!r}".format(marker_str, html_marker_str))
<li style="-ooxml-indentation: 0.0pt">Otherwise, return <span style="font-weight: bold">true</span>.</li>

Table 31 is not recognized

Table captions are supposed to have the Tabletitle paragraph style. This one doesn't.

I'll email @allenwb about it -- this is perhaps easier to fix in Word than in Python.

Add Google Analytics

http://ecma-international.org/ecma-262/5.1/ has the following in its source:

<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://ecma-international.org/ecma-262/5.1/");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
var pageTracker = _gat._getTracker("UA-6146537-1");
pageTracker._trackPageview();
</script>
<script type="text/javascript">

  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-6146537-1']);
  _gaq.push(['_trackPageview']);

  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();

</script>

As you can see, there are two different Google Analytics snippets — the first of which is being used incorrectly. (The erroneously edited gaJsHost line makes it try to load ga.js from ecma-international.org rather than from Google.)

To prevent mistakes like this in the future, and to keep the manual post-processing to a minimum, it would probably be a good idea to have the HTML generator script automatically insert the correct snippet.

Here’s an optimized version of that script which could be used in this case:

<script>
 window._gaq = [['_setAccount', 'UA-6146537-1'], ['_trackPageview']];
 (function(d, t) {
   var g = d.createElement(t),
       s = d.getElementsByTagName(t)[0];
   g.src = '//www.google-analytics.com/ga.js';
   s.parentNode.insertBefore(g, s);
 }(document, 'script'));
</script>

Update: The online version has now been fixed to include this correct, optimized snippet.
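
For the generator side, a rough sketch of automatic insertion. It assumes the output is available as one string with exactly one `</body>` tag; `insert_analytics` and `ANALYTICS_SNIPPET` are hypothetical names, and the snippet text is the optimized one quoted above.

```python
ANALYTICS_SNIPPET = """<script>
 window._gaq = [['_setAccount', 'UA-6146537-1'], ['_trackPageview']];
 (function(d, t) {
   var g = d.createElement(t),
       s = d.getElementsByTagName(t)[0];
   g.src = '//www.google-analytics.com/ga.js';
   s.parentNode.insertBefore(g, s);
 }(document, 'script'));
</script>
"""

def insert_analytics(html, snippet=ANALYTICS_SNIPPET):
    """Insert the snippet just before </body>; fail loudly if the tag is missing or repeated."""
    marker = "</body>"
    if html.count(marker) != 1:
        raise ValueError("expected exactly one </body> tag")
    return html.replace(marker, snippet + marker)
```

Raising instead of silently skipping matches the script's general preference for getting the user's attention when the document is not as expected.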

In 8.2.4.2, _opt marker is not taken as part of grammar production

    <p><span class="prod"><span class="nt">AssignmentElementList</span> <span class="geq">:</span> <span
    class="nt">Elision</span></span> <span style="font-family: sans-serif"><sub>opt</sub></span> <i>AssignmentElement</i></p>

Hard to imagine where this is coming from.

Link internal data properties

I just realized some of them are described adequately somewhere; for example, see Table 37 in es6 draft rev 18.

[[Map]], [[MapNextKind]], and [[MapIterationKind]] should link to rows in that table.

REF bustedness in 13.1.1

allenwb writes:

BTW, Just in case you didn't notice. The numbering differences cause some of the
cross references to be broken. for example see:

http://people.mozilla.org/~jorendorff/es6-draft.html#sec-13.1.1

Yup, they're busted all right. It's due to the numbering snafu, where the doc now contains section numbers like "13.0.1", on purpose, and my numbering code is generating "13.1.1" for some reason. That's the thing to fix.

Table borders are all wrong

See sections 7.6.1.2, 7.7, 7.8.3, etc.

Sometimes the document uses tables for layout; we should notice that there aren't any headings or borders in such a table and give it a special HTML class.
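
A sketch of that check, assuming the intermediate HTML is well-formed enough to parse with xml.etree.ElementTree and that borders show up as a `border` property in `style` attributes. Both are assumptions; the class name `layout` is made up for illustration.

```python
import xml.etree.ElementTree as ET

def mark_layout_tables(root, class_name="layout"):
    """Set class_name on every <table> that has no <th> cells and no border styling.

    The border test scans the style attributes of the table and everything in it;
    how the real converter records borders is an assumption here.
    """
    for table in root.iter("table"):
        has_heading = table.find(".//th") is not None
        has_border = any("border" in el.get("style", "") for el in table.iter())
        if not has_heading and not has_border:
            table.set("class", class_name)
    return root
```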

Stripped font in 5.1.6

In 5.1.6, “If the phrase “[empty]” appears”, [empty] should be shown in the same font as in actual grammar productions. Same for “[lookahead ∉ set]” in the next paragraph. Also “[no LineTerminator here]”.

not using <h2>

There are a few places where the spec should use <h2>...</h2> but instead has <p><b>...</b></p>:

18.2.6.1 / Runtime Semantics
B.1.1 / Static Semantics
B.1.4 / Syntax
B.1.4 / Pattern Semantics
D / In Edition ... ?

Add link-menus for methods with multiple implementations

Currently [[Call]] is not a link, because there is not a single obvious place for that to link to.

But [[Call]] should be clickable. On consideration I think clicking it should bring up a menu of links:

[[Call]] method
…described (Table 9)
…of an ordinary Function object (8.3.16.1)
…of an exotic bound Function object (8.4.1.1)
…of an exotic Proxy object (8.5.14)

Like an index.

Making a real index is a lot of work; faking it is what fixup_links does.
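
The data behind such a menu could be as simple as a mapping from term to a list of targets. This is only a sketch: `build_link_menus` is a hypothetical name, the `#sec-` prefix follows the `#sec-9.3.7` style href seen in an earlier issue, and the Table 9 entry is left out because it is not a section.

```python
from collections import defaultdict

# Entries copied from the [[Call]] example above: (term, description, section number).
CALL_ENTRIES = [
    ("[[Call]]", "of an ordinary Function object", "8.3.16.1"),
    ("[[Call]]", "of an exotic bound Function object", "8.4.1.1"),
    ("[[Call]]", "of an exotic Proxy object", "8.5.14"),
]

def build_link_menus(definitions):
    """Group (term, description, section) triples into a term -> list of (label, href)."""
    menus = defaultdict(list)
    for term, description, section in definitions:
        menus[term].append((description, "#sec-" + section))
    return dict(menus)
```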

Treat all-bold paragraphs as subheadings

At least some subheadings in the document are not marked with any paragraph style at all; each is just a Normal paragraph with bold text in it. These should be converted to h2 the same as any other paragraph with identical appearance.
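
A sketch of that rule under two assumptions: the intermediate HTML parses with xml.etree.ElementTree, and bold is an explicit <b> element. The real converter may carry bold as a `font-weight: bold` style instead, in which case the test in `is_all_bold` would need to change. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_all_bold(p):
    """True if the paragraph has text and all of it sits inside direct <b> children."""
    if (p.text or "").strip():
        return False  # bare text before the first child
    children = list(p)
    if not children:
        return False
    for child in children:
        if child.tag != "b" or (child.tail or "").strip():
            return False
    return bool("".join(p.itertext()).strip())

def promote_bold_paragraphs(root):
    """Rewrite <p><b>Title</b></p> as <h2>Title</h2> in place."""
    # Copy the matches to a list first, since the loop edits the tree.
    for p in list(root.iter("p")):
        if is_all_bold(p):
            title = "".join(p.itertext()).strip()
            for child in list(p):
                p.remove(child)
            p.tag = "h2"
            p.text = title  # note: inline markup nested inside the <b> is flattened
    return root
```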

Correctly parse one-line grammar productions

When the grammar fixups were originally written, we never had lines like

ClassTail: ClassHeritageopt { ClassBody }

in the document. Grammar productions were always on two lines. Now that we have one-line productions, the grammar fixups need to be fixed up.
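
A rough sketch of recognizing the one-line form on plain text. The regex and the name `parse_one_line_production` are assumptions, not what the grammar fixups do today. It leaves the flattened "opt" marker attached to its token, since in the real document that marker is subscript markup rather than text.

```python
import re

# Nonterminal, one or more colons, then a non-empty right-hand side on the same line.
# The (?!:) lookahead stops the colon run from being split between separator and body.
ONE_LINE_PRODUCTION = re.compile(r"^\s*([A-Z][A-Za-z]*)\s*(:+)(?!:)\s*(\S.*?)\s*$")

def parse_one_line_production(line):
    """Split 'Lhs : rhs tokens' into (lhs, separator, [rhs tokens]); None if it does not match."""
    m = ONE_LINE_PRODUCTION.match(line)
    if m is None:
        return None
    lhs, sep, rhs = m.groups()
    return lhs, sep, rhs.split()
```

A line that ends right after the colon returns None, so the existing two-line handling can still take over for that case.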

Serif and sans-serif fonts appear with different x-height

It's unsightly.

Sans-serif text in the middle of a serif run should get a smaller font-size to compensate for differences in x-height among the fonts we will end up using. Of course, without font-size-adjust, web fonts, or fonts everyone has, it's impossible to get this perfect across platforms.

Permalinks

All the section numbers changed in Revision 18, which breaks all incoming links to sections. We need more resilient section ids.
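
One option is to derive ids from heading text instead of section numbers, in the style of the `#sec-runtime-semantics-keyeddestructuringassignmentevaluation` anchor mentioned in another issue here. A minimal sketch; `section_id` is a hypothetical name and the exact slug rules are an assumption.

```python
import re

def section_id(title):
    """Build a section id from the heading text rather than the section number."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return "sec-" + slug
```

Headings that repeat, such as the many "Runtime Semantics" subsections, would collide under this rule, so a real version would have to qualify them with the enclosing section's title.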

Weird formatting around NOTE in clause 6

The document shows only 1 paragraph beginning

NOTE      ECMAScript differs from the Java programming language...

The OOXML markup in document.xml clearly shows two paragraphs though (with a <del> element at the end of the first one -- maybe that is what causes them to be joined?) and this is currently being rendered as two HTML paragraphs.

Link KeyedDestructuringAssignmentEvaluation

Axel Rauschmayer points out:

“KeyedDestructuringAssignmentEvaluation” is mentioned once and could be a link to #sec-runtime-semantics-keyeddestructuringassignmentevaluation

Opening this because there are two options: 1) just add it to specific_link_source_data_lang; or 2) do something more general (like what the script does for algorithm names; see title_as_algorithm_name).

Generate a multi-page version of the spec

The resulting HTML file is ~1.6 MB, which is enough to freeze some modern browsers (especially if you already have some tabs open).

Would it be possible to generate both a single-page and a multi-page version, i.e. one page per chapter?


P.S. es5.github.io (repo) uses a script that takes the resulting single-page HTML file and splits it up.
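
A very rough sketch of the splitting step, assuming every chapter starts with an <h1> start tag in the body HTML. That assumption, and the name `split_chapters`, are mine; this is not how the es5.github.io script works. Splitting on a zero-width match needs Python 3.7 or later.

```python
import re

def split_chapters(body_html):
    """Split body HTML into (front_matter, [chapter_html, ...]) at each <h1> start tag."""
    parts = re.split(r"(?=<h1[\s>])", body_html)
    return parts[0], parts[1:]
```

Cross-references between chapters would still need their hrefs rewritten to point at the right page, which this sketch does not attempt.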

8.1.2.4 NewFunctionEnvironment is not the latest revision

Bullet 5 in the Word doc is:
5. If F’s has a [[HomeObjectNeedsSuper]] internal slot is true, then
a. Let home be the value of F’s [[HomeObject]] internal slot.
b. If home is undefined, then throw a ReferenceError exception.

a.c. Set envRec’s HomeObject to homethe value of F’s [[HomeObject]] internal slot.

b.d. Set envRec’s MethodName to the value of F’s [[MethodName]] internal slot.

The web version still shows the struck-through text.

Random bullet in 8.1.6

The paragraph "Property keys are used to access properties and their values." has a bullet. This isn't in the Word doc.

no way to link to specific paragraphs

Really there needs to be a way to establish an id= attribute on the particular paragraph that defines something, and link to that.

The links for "Assert:" all point to "5.2 Algorithm conventions" which is a massive wall of text.

List recovery bug in B.2.3.2

In B.2.3.2 some lists are messed up. The indentation and numbering are both wrong. There should be one list nested inside another, and instead we have three separate lists.

We can improve by recovering list structure (partly or entirely) from the appearance of the text (particularly indentation and numbering/bullets), not just from the Word list structures, which can be a bit bizarre.
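
The indentation half of that idea can be sketched with a stack of open list levels. This is an illustration only: `nest_by_indent` and the `(indent, text)` input are hypothetical, and it ignores the numbering/bullet markers that a real recovery pass would also consult.

```python
def nest_by_indent(items):
    """Build a nested list from (indent, text) pairs using a stack of open levels.

    Each node is [text, children]. `indent` is any comparable number, such as
    the paragraph's left indentation in points.
    """
    root = []
    stack = []  # entries are (indent of the level, list that receives its items)
    for indent, text in items:
        # Close every level that is indented deeper than the current item.
        while stack and stack[-1][0] > indent:
            stack.pop()
        if not stack:
            # Nothing open at this depth or shallower: the item goes at top level.
            stack.append((indent, root))
        elif stack[-1][0] < indent:
            # Deeper than the open level: start a sublist under the previous item.
            siblings = stack[-1][1]
            stack.append((indent, siblings[-1][1]))
        stack[-1][1].append([text, []])
    return root
```

For example, items at indents 0, 18, 18, 0 come out as one two-item list whose first item holds a nested two-item list, rather than three separate lists.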

Numbering bug in 15.10.2.6

In 15.10.2.6, last algorithm, the table should be inside the list and the last step should be numbered 4.

We erroneously make a new list and number it "1." instead.

Converter should use UTF-8 for source code and generated HTML

Commit dba375f removed changes that made the Word→HTML converter use UTF-8 in the Python source code and in the generated HTML. I think that commit was a change in the wrong direction:
