whatwg / infra

Infra Standard

Home Page: https://infra.spec.whatwg.org/

License: Other

Makefile 0.08% HTML 99.92%

infra standard whatwg

infra's Issues

Publishing

Logo, see #8
infra.spec.whatwg.org domain, requires @Hixie (put it under "annevankesteren")
Twitter account
Blog post

Immediately after publishing:

Get into Shepherd
Start PRing various specs to use these concepts
Update biblio.json
Update the https://spec.whatwg.org/ index (requires @Hixie)

Anything else? Please modify this list.

Tracking vector tracking

I'm not sure where exactly we'd want to put this. Thoughts?

Mention how to convert between strings and byte sequences

I.e. by using the Encoding Standard. See e.g. w3c/webauthn#258
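
As a rough illustration of the conversion this issue asks Infra to mention, here is a minimal Python sketch. It assumes UTF-8 and lets Python's built-in codec stand in for the Encoding Standard's algorithms; the helper names are hypothetical and do not come from any spec.

```python
def utf8_encode(string: str) -> bytes:
    """String -> byte sequence, using Python's UTF-8 codec as a stand-in."""
    return string.encode("utf-8")


def utf8_decode(byte_sequence: bytes) -> str:
    """Byte sequence -> string; malformed input becomes U+FFFD instead of raising."""
    return byte_sequence.decode("utf-8", errors="replace")
```

The point is only that the direction of the conversion and the encoding are both stated explicitly, which is what a spec would need to do as well.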

Avoid "easy", "simply"

These words add nothing, and they risk making the reader feel dumb for not understanding something that is supposedly simple.

Add VoidFunction

Several specs end up needing to define IDL that accepts a function which is called for its side effects, which means they use something like:

callback VoidFunction = void ();

It would be nice to have this defined in Infra so we didn't have to worry about colliding global names for this trivial concept.

Sketch out prose for algorithm definitions.

It would be lovely if we could agree upon a standard way of describing algorithms in specs. For instance, it's helpful to understand expected inputs and outputs, but there's no commonly shared way of spelling those out. Some examples:

WebAuthn has note blocks describing inputs: https://w3c.github.io/webauthn/#makeCredential, and describes outputs in prose.
CSP describes both inputs and outputs in prose, usually in the form 'Given a request’s cryptographic nonce metadata (nonce) and a source list (source list), this algorithm returns "Matches" if the nonce matches one or more source expressions in the list, and "Does Not Match" otherwise:'.
ECMAScript describes inputs but not outputs: "The abstract operation PerformEval with arguments x, evalRealm, strictCaller, and direct performs the following steps:"

And so on. It would be great if we could align this to enough of an extent that we could start building tooling support for the callsite as well.

Needs a logo!

Ideas:

Something indicating "foundations" (a house? A lego-ish building block?)
Something very abstract (examples)

""For each key → value of map""

https://infra.spec.whatwg.org/commit-snapshots/f817d690ee9f1a7556805d4796a6ebbfe6eb127f/#map-iterate

"For each key → value of map"

The "For each" links to [=list/for each=], rather than [=map/for each=] as intended.

While true / loop until break

See https://github.com/whatwg/fullscreen/pull/72/files#r101013520 for a need for this.

Do we have cases like this in other specs, and what do they say?

Move Web IDL conversions to Web IDL

Infra should not depend on Web IDL IMO; it should stick with things that are universally applicable to all specs, including non-Web IDL-based ones.

Define string sorting by "code unit order"

Background: whatwg/url#199

We want to update specs to be unambiguous about this, so I think infra is a good place to define it. It should include some examples (similar to the URLSearchParams WPTs).
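
To show why the distinction matters, here is a small Python sketch of sorting by code unit order. Python compares strings by code point by default, so the sketch sorts on the UTF-16 big-endian bytes instead, which order the same way as the 16-bit code units. The function names are hypothetical.

```python
def code_unit_key(string: str) -> bytes:
    # UTF-16BE bytes compare lexicographically in the same order as the
    # underlying 16-bit code units; "surrogatepass" tolerates lone surrogates.
    return string.encode("utf-16-be", errors="surrogatepass")


def sort_in_code_unit_order(strings: list[str]) -> list[str]:
    """Return a new list sorted by UTF-16 code unit order."""
    return sorted(strings, key=code_unit_key)
```

For example, U+FB00 and U+1F308 sort differently under the two orders: U+1F308 is encoded as the surrogate pair 0xD83C 0xDF08, and 0xD83C is less than 0xFB00, so it comes first in code unit order but last in code point order.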

Which WebIDL types are maps?

Like #14, but plausible WebIDL maps include at least dictionaries, objects, and records. I care in order to iterate over them.

Using generics for bytes / code units / code points

See #1 for some discussion on code units.

A term like "ASCII digit" and others like it are equally meaningful for all three primitives, since the primitives are defined as integers. Should we define these terms as generics so they can apply to each primitive?

Alternatively, we could change the phrasing, e.g., "An ASCII digit is a byte, code unit, or code point in the range 0x30 to 0x39, inclusive." This would also require slight tweaking of how we define "byte" and "code point".
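
Because all three primitives are integers, a single predicate can serve bytes, code units, and code points alike. The following Python sketch illustrates the idea; the function name is hypothetical.

```python
def is_ascii_digit(value: int) -> bool:
    """True for a byte, code unit, or code point in the range 0x30 to 0x39, inclusive."""
    return 0x30 <= value <= 0x39


# The same predicate applies to a byte and to a code point.
byte_is_digit = is_ascii_digit(b"7"[0])
code_point_is_digit = is_ascii_digit(ord("7"))
```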

WhatsApp encryption

https://infra.spec.whatwg.org/commit-snapshots/8e8d83d4035e82b82e007ec26c1feecb565fb871/

Comparison

Split from #6. I'd rather not define a case-sensitive match: to me it seems like something an "equal" or "is" operation would also cover, and we already use those far more.

URL uses "equal" to define comparison operations for URL and host structs. Should we use "equal" as well here to define it for strings? Or maybe allow both equal and is?

Suggestion:

Allow both "is" and "equal"
Define them for strings (code points; works for JavaScript and scalar value strings) and byte sequences
Use dfn and accept that not all callers will use that (for now)

Do we need this for other data types?

Define list/truncate

[=list/Truncate=] |list| to [=list/size=] |n|.

or even:

[=list/Truncate=] |list| to |n|.

Is a lot more readable than:

[=list/Remove=] all items from |list| except the first |n|, so that |list|'s [=list/size=] is now |n|.
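
A possible reading of the proposed operation, sketched in Python. Leaving a list that is already shorter than n untouched is an assumption; the issue does not say what should happen in that case. The function name is hypothetical.

```python
def truncate(items: list, n: int) -> None:
    """Remove every item after the first n, in place."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Slicing past the end is a no-op, so shorter lists are left unchanged.
    del items[n:]
```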

Add basic JavaScript types

We should add undefined, null, and boolean (true, false). We haven't made much type-value distinction thus far so I'm not quite sure how to formulate this. Any ideas?

Define initializing variables and setting them?

I.e., let and set. We could do that with <dfn>, but I don't think we want to require documents to link instances.

Control flow in algorithms

Definitions for "abort these steps" and "abort these sub-steps" would be useful (unless they're considered bad practice and should be replaced by "return" and "throw," in which case a note saying so would be great).

In particular, while the meaning of "abort these steps" is obvious when it's in the top-level of steps, it's not super explicit what it means when nested.

Similarly, does "abort these sub-steps" return control to the set of steps right above it, or to the caller of the algorithm?

Are Web IDL sequences lists?

Or do you convert lists to Web IDL sequences?

I think the big difference is that as defined here, lists can contain abstract things. Whereas Web IDL sequences can only contain things which are properly part of the Web IDL type system.

Maybe what we want to do here is state something like "often we use lists in a place that expects sequences, or treat sequences like lists. This kind of implicit conversion is OK, as long as the type systems match up."

What was the motive for choosing this algorithm description approach?

The Infra Standard defines a pseudocode and algorithm description approach that differs from what I usually find in academic papers or classic books like Introduction to Algorithms. The standard also notes that the described algorithms aren't intended to be performant:

Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms are intended to be easy to follow, and not intended to be performant.)

What was the motive for choosing this specific style of algorithm description?

Use of "one of"

When referencing one of the items from a list, is "and" or "or" more accurate? Or does it not matter much? Both are in use:

One of "uninstantiated", "errored", or "instantiated", used to prevent reinvocation of ModuleDeclarationInstantiation on modules that failed to instantiate previously.

If header list contains a header whose name is one of If-Modified-Since, If-None-Match, If-Unmodified-Since, If-Match, and If-Range, ...

If origin’s host component matches one of the CIDR notations 127.0.0.0/8 or ::1/128

A job is an abstraction of one of register, update, and unregister request for a service worker registration.

Define numbers (waiting on Number / BigInt)

Should we define numbers and their various notation schemes? (Mathematical operators?)

It might also make sense to define null as being roughly analogous to JavaScript's null and a good initial value for variables.

Lists should contain items, not elements

Otherwise having a list of elements is confusing ("each element of the queue is an HTML element").

Operation to map/transform lists

I frequently want to build a new list based on an old one by modifying the old list's elements using substeps. Perhaps:

Let newList be the result of transforming each item of oldList through the following steps:

Return item + 2.

as shorthand for:

Let newList be a new list.

For each item of oldList:

Append item + 2 to newList.

"Return" could be "include" or "append" or some other term.

The text I'm proposing isn't much shorter, but it keeps the logical operation in a single step instead of spreading it across 2.
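
The proposed shorthand corresponds to the familiar "map" operation. A minimal Python sketch of the longhand it would replace follows; the names transform and steps are hypothetical.

```python
def transform(old_list: list, steps) -> list:
    """Build a new list by running `steps` on each item of old_list."""
    new_list = []
    for item in old_list:
        # The value returned by the steps is appended to the new list.
        new_list.append(steps(item))
    return new_list


new_list = transform([1, 2, 3], lambda item: item + 2)
```

The original list is left unmodified, which matches the "build a new list" intent of the issue.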

Provide a way to iterate/get values of a map

Bikeshed complains if a variable is unused, which happens if you iterate over a map but don't use the key.

This can be worked around with <var ignore>, but maybe it's better to have an explicit way to handle just values? There's already a way to get just the keys.
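
One way to picture the requested operation is a counterpart to "get the keys" that returns the values in the map's order. The Python sketch below assumes an insertion-ordered map; the function name is hypothetical.

```python
def get_values(mapping: dict) -> list:
    """Return the map's values as a list, in the map's (insertion) order."""
    return list(mapping.values())


total = 0
for value in get_values({"a": 1, "b": 2}):
    # No key variable is introduced, so nothing is left unused.
    total += value
```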

Control flow for loops

It would be great to have a less awkward way of saying "run the loop for the next item": https://dom.spec.whatwg.org/#concept-event-listener-inner-invoke. Basically something like "continue".

Rethink strings

I need to study the various dependencies of strings and figure out what we want to do. It seems there are a few kinds of strings that probably need to be distinguished and named somehow:

JavaScript strings - each code point is in the range U+0000 to U+FFFF
scalar value strings - each code point is a scalar value
byte strings - each code point is in the range U+0000 to U+00FF
ASCII strings - each code point is an ASCII code point
strings - each code point is a code point (I don't think we really have this in the platform even though Encoding defines this kind of string; we have a variant of this where valid surrogate pairs are treated as their own code point)
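
Several of these categories can be expressed as simple predicates over code points. The Python sketch below is illustrative only: a Python str is a sequence of code points that may include lone surrogates, and the function names are hypothetical.

```python
def is_scalar_value_string(string: str) -> bool:
    """True if no code point is a surrogate (U+D800 to U+DFFF)."""
    return not any(0xD800 <= ord(c) <= 0xDFFF for c in string)


def is_ascii_string(string: str) -> bool:
    """True if every code point is in the range U+0000 to U+007F."""
    return all(ord(c) <= 0x7F for c in string)


def fits_in_bytes(string: str) -> bool:
    """True if every code point is in the range U+0000 to U+00FF."""
    return all(ord(c) <= 0xFF for c in string)
```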

Define string size

In particular for JavaScript string, see #73, we need something like code-unit length from HTML (and then remove that from HTML and use our new concept).

Either we define size and for JavaScript string it's the number of code units and for scalar value string it's the number of scalar values, or size is always code points and we have code-unit size just for JavaScript strings. The latter is probably slightly better since it makes it more explicit?
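
The two candidate measures differ as soon as a string contains a code point above U+FFFF. A small Python sketch of both follows, with hypothetical function names; Python's len counts code points, so the code unit count is derived from the UTF-16 encoding.

```python
def code_point_length(string: str) -> int:
    """Number of code points in the string."""
    return len(string)


def code_unit_length(string: str) -> int:
    """Number of UTF-16 code units; each code unit is two bytes."""
    return len(string.encode("utf-16-le", errors="surrogatepass")) // 2
```

For the string consisting of "a" followed by U+1F308, the first function gives 2 and the second gives 3, because U+1F308 needs a surrogate pair.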

Deal with list[n] access where n is negative or >= size

We should probably say that it's not possible, or actually define what it would return. If we return something, it would have to be a value like "none" or some such, one that doesn't mean anything else and needs to be dealt with explicitly.
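
The second option, returning a dedicated value, might look like the following Python sketch. The sentinel NONE and the function item_at are hypothetical names. Note that the sketch deliberately does not use Python's own negative indexing, which would count from the end of the list.

```python
# A unique sentinel that cannot be confused with any real list item.
NONE = object()


def item_at(items: list, n: int):
    """Return items[n], or the NONE sentinel if n is negative or >= size."""
    if 0 <= n < len(items):
        return items[n]
    return NONE
```

Callers then have to compare against the sentinel explicitly (for example with `is NONE`) before using the result.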

Byte sequences

We should also mention that byte sequences can be represented using 0x00 0xFF syntax, and maybe flesh out the whole concept a bit more with examples and such.

Credit: @foolip.
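
As an example of the kind of illustration this could use, here is a Python sketch that renders a byte sequence in the space-separated 0x notation mentioned above. The function name is hypothetical.

```python
def to_hex_notation(byte_sequence: bytes) -> str:
    """Render each byte as 0x followed by two uppercase hex digits."""
    return " ".join(f"0x{byte:02X}" for byte in byte_sequence)
```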

Add stacks and queues

I remember now that HTML uses these for custom elements and more. After or as part of #7.

Something like:

A list is sometimes called a stack or a queue. These are just other names for list, but come with their own conventional terminology.

To push onto a stack is to...

To pop from a stack is to...

To enqueue in a queue is to...

To dequeue from a queue is to...

Also be sure it's clearly defined what happens when you pop or dequeue from an empty stack/queue.
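
The four operations can be sketched on top of a plain list, as the issue suggests. In this Python sketch, returning None from an empty stack or queue is an assumption made purely for illustration; the issue leaves that behavior open.

```python
def push(stack: list, item) -> None:
    """Push: append the item to the end of the list."""
    stack.append(item)


def pop(stack: list):
    """Pop: remove and return the last item, or None if the stack is empty."""
    return stack.pop() if stack else None


def enqueue(queue: list, item) -> None:
    """Enqueue: append the item to the end of the list."""
    queue.append(item)


def dequeue(queue: list):
    """Dequeue: remove and return the first item, or None if the queue is empty."""
    return queue.pop(0) if queue else None
```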

Define character as alias of code point or stop using it

We're currently using the term character to define syntax. We should probably stop doing that and use the syntax we outline for code points. Slightly weird to be informal here while we require much more of others.

See also #6 on the topic of whether or not to stop using character altogether as something that means code point all or some of the time (it seems somewhat silly to make it mean code point for something where Unicode says the code point is a non-character, but not out of the question).

Term to reference internal concepts

For discussion: we tend to call definitions of spec concepts a "concept" and their fields an "internal slot", which is an ECMAScript specification device. Can we clarify the terminology used to reference these internal spec definitions?

String / byte sequence instance manipulation

For a byte sequence it probably makes sense, e.g., https://fetch.spec.whatwg.org/#concept-method-normalize (and also the uppercase/lowercase operations), but for strings it might be a little unexpected given JavaScript. We do it all over though so maybe we should just make that a little bit more clear.

"An ASCII lower alpha is a code point in the ran..."

https://infra.spec.whatwg.org/commit-snapshots/208e4e04632d0c8514a8b5f26f99c8472d7e836d/#example-code-point-notation

An ASCII lower alpha
is a code point in the range U+0041 to U+005A, inclusive.

An ASCII upper alpha
is a code point in the range U+0061 to U+007A, inclusive.

U+0041 is Latin Capital Letter A
U+005A is Latin Capital Letter Z
U+0061 is Latin Small Letter a
U+007A is Latin Small Letter z

So the first range should be ASCII upper alpha; the second range should be ASCII lower alpha.
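
The corrected ranges can be checked mechanically. A short Python sketch with hypothetical function names:

```python
def is_ascii_upper_alpha(code_point: int) -> bool:
    """U+0041 (A) to U+005A (Z), inclusive."""
    return 0x41 <= code_point <= 0x5A


def is_ascii_lower_alpha(code_point: int) -> bool:
    """U+0061 (a) to U+007A (z), inclusive."""
    return 0x61 <= code_point <= 0x7A
```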

Data structures section

Distinct from data types, I think.

In all cases we want clear instructions and examples around the verbiage for adding to/removing from/looking up in the collection.

Known used types:

Map (see module map)
List
- Ordered
- Can be indexed into, maybe with some notation
- Maybe re-use ES's notation for "list literals"? Or not; we don't so far.
- Easy conversion to/from Web IDL sequences, as explained in Web IDL somewhat informally.
Set (see ... CustomElementsRegistry? Not sure, that's kind of a map with lots of keys)
- Can also be ordered (see DOMTokenList); default to insertion order
- In an ordered set, does adding something that already exists replace, or does it remove and append at the end?
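
The open question about ordered sets comes down to two possible append behaviors. The Python sketch below shows both side by side on a plain list; the function names are hypothetical, and neither is presented as the behavior Infra should choose.

```python
def append_keep_position(ordered_set: list, item) -> None:
    """If the item already exists, do nothing; otherwise append it."""
    if item not in ordered_set:
        ordered_set.append(item)


def append_move_to_end(ordered_set: list, item) -> None:
    """If the item already exists, remove it first, then append it at the end."""
    if item in ordered_set:
        ordered_set.remove(item)
    ordered_set.append(item)
```

Starting from the set (x, y) and adding x again, the first variant leaves (x, y) while the second produces (y, x).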

Editorial issues I noticed that I need to write a PR for

At least one "must" in a note.

List shouldn't use contents, but just refer to items consistently.

Define increment

Define increment as:

Set |i| to |i| + 1;

So you can say:

[=Increment=] |i|.

or alternatively:

[=Increment=] |i| by 1.

"May" in a note

implementations may optimize based on the fact that the order is not observable.

Define pairs

In #79 @mikewest brought up pairs. I think we should consider defining them as a special case of tuples (fixed size of two) with their own / syntax.

I also think the <dfn> convention he mentions there is expected, but I'm not sure how to put <dfn> conventions into prose.

Suggestions for the typography section

First, maybe separate out block-level styles (definition, requirement, explanation through CSS fragment; maybe also switches) from inline styles (defining instance through variables).

For inline styles, in general all of these would benefit from examples. Maybe multiple constructs per example.

This one I'm less sure about... But I think phrasing like

Other code fragments are marked up like this.

is a bit less good than

Other code fragments are marked up in monospace

with an example showing the actual usage. Otherwise it's kind of like the infrastructure standard is violating itself, by using the monospace style for things that are not actually code fragments :P

byte sequence backtick representation handling of C0 controls

https://infra.spec.whatwg.org/#byte-sequences

In this section is the text:

Byte sequences with bytes in the range 0x00 to 0x7F, inclusive, can alternately be written as a string, but using backticks instead of quotation marks, to avoid confusion with an actual string.

This is intended for showing ASCII byte sequences as strings, but ignores that control characters such as NUL, escape, newline, etc. are not printable or would mess up the display (or show as tofu boxes and cannot be discerned). I'd suggest making the range go from 0x20 to 0x7F instead.

Is the backtick byte sequence representation really that useful anyway?
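
To make the concern concrete, here is a Python sketch of a renderer that uses the backtick form only when every byte is printable and falls back to hex notation otherwise. It excludes 0x7F (DEL) as well as the C0 controls, which goes one step beyond the range suggested above and is an assumption of this sketch; the function name is hypothetical.

```python
def represent(byte_sequence: bytes) -> str:
    """Backtick form for printable ASCII, space-separated hex notation otherwise."""
    # 0x20 to 0x7E covers space through tilde; controls and DEL are excluded.
    if all(0x20 <= byte <= 0x7E for byte in byte_sequence):
        return "`" + byte_sequence.decode("ascii") + "`"
    return " ".join(f"0x{byte:02X}" for byte in byte_sequence)
```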

Move algorithms into their own top-level section

I think there's enough there to warrant that now.

Record-like data structure

I'd like URL record, request, and response to just be some data structure so you can more easily address their members.

They're basically maps with fixed keys or what JavaScript calls records. The values are mostly mutable still, but thus far they don't have things similar to methods.
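
In programming-language terms this resembles a record or struct type: a fixed set of named, mutable members and no methods. A Python sketch using a dataclass follows; the class and its three members are a hypothetical, heavily reduced stand-in for a Fetch request, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    # Members are declared up front and addressed by name.
    method: str = "GET"
    url: str = ""
    header_list: list = field(default_factory=list)


request = Request(url="https://example.com/")
# Member values stay mutable after creation.
request.method = "POST"
```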

Replace for lists

This is a thing DOM does. Reasonable?

https://dom.spec.whatwg.org/#concept-element-attributes-replace step 4.

Describe the switch construct

<dl class=switch>

Add "context object"

I think we should move https://dom.spec.whatwg.org/#context-object to Infra. But consider renaming it at the same time, as has been suggested somewhere, "this" is probably less confusing than "context". HTML uses "this element" etc (without cross-referencing) in some places.

Tuples

We use tuples in a couple places and they're very much like immutable ordered sets. The syntax is typically (element1, element2).

Tracker for things to move here

https://html.spec.whatwg.org/#encoding-terminology
- code unit, character?, Unicode character?, code-unit length
https://html.spec.whatwg.org/#case-sensitivity-and-string-comparison
- case-sensitive comparison, prefix match
- defaulting of string comparisons to case-sensitive
https://html.spec.whatwg.org/#common-parser-idioms
- many things already done but under different names; HTML will need updating
- White_Space characters, control characters, uppercase/lowercase hex digits
- Algorithms like "collect a sequence of characters" and friends
- Update: everything moved except White_Space and control characters
https://html.spec.whatwg.org/#numbers
- Maybe only the definitions, not the parsing algorithms? Since the parsing algorithms seem kind of HTML specific?
https://html.spec.whatwg.org/#dates-and-times ??? wait for a second consumer?
https://html.spec.whatwg.org/#colours honestly this feels like it should go in some CSS spec?
https://html.spec.whatwg.org/#space-separated-tokens (concepts left in HTML, parsing moved here)
https://html.spec.whatwg.org/#comma-separated-tokens (concepts left in HTML, parsing moved here)
https://dom.spec.whatwg.org/#ordered-sets
- Has redundancies with a few other things
https://html.spec.whatwg.org/#namespaces
- shared between HTML and DOM it seems
MIME type stuff at https://html.spec.whatwg.org/#resources
- Moved to MIMESNIFF
https://html.spec.whatwg.org/#terminology
- Definition of "or"