w3c / dpub-pwp

Repository of the W3C DPUB IG on the (Packaged) Web Publications work

HTML 97.92% CSS 1.03% JavaScript 1.04%

publishing digital-publishing web-publications

dpub-pwp's Introduction

The Digital Publishing Interest Group, which managed this repository, is now closed, and so is this repository. The group's activities have been taken up by:

All these groups are part of Publishing@W3C, born out of the merger of IDPF and W3C.

– Ivan Herman ([email protected])

Portable Web Publications for the Open Web Platform

Documents produced by the Digital Publishing Interest Group of the W3C.

See also the paged view for the Editors' Draft served in HTML. Publishing snapshots are also available in paged view:

FPWD

If you are a member of the interest group and you wish to contribute to the content of this repo, please contact Ivan Herman ([email protected]), giving them your GitHub login.

dpub-pwp's People

Contributors

Stargazers

Watchers

Forkers

dauwhe lordzen kleopatra999 bjdmeest alexxnica jsit isabella232

dpub-pwp's Issues

Resources in a PWP need original URLs

When a PWP is created from a WP, it is necessary to record original URLs of all resources in the WP. Without such original URLs, it will become impossible to reconstruct the original WP from the PWP. See What is a Web Publication?.
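One way to picture this requirement is a per-resource record that keeps the original URL next to the package-relative path. The sketch below is only an illustration: the field name `originalUrl` and both helper functions are hypothetical and are not taken from the PWP draft. It also assumes that every resource lives under a single base URL.

```javascript
// Hypothetical sketch: `originalUrl` is an assumed field name, not part of any PWP draft.
// Assumes every resource URL starts with the Web Publication's base URL.
function recordOriginalUrls(wpBaseUrl, resourceUrls) {
  return resourceUrls.map((url) => {
    if (!url.startsWith(wpBaseUrl)) {
      throw new Error(`Resource ${url} is outside ${wpBaseUrl}`);
    }
    // Keep the package-relative path and the original URL side by side.
    return { href: url.slice(wpBaseUrl.length), originalUrl: url };
  });
}

// Recover the original URLs from the recorded entries.
function reconstructOriginalUrls(entries) {
  return entries.map((entry) => entry.originalUrl);
}

const entries = recordOriginalUrls("https://example.org/books/1/", [
  "https://example.org/books/1/index.html",
  "https://example.org/books/1/img/mona_lisa.jpg",
]);
const restored = reconstructOriginalUrls(entries);
```

Resources served from other origins would need a different way of deriving `href`; the sketch rejects them rather than guessing.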

Value of quote in the abstract

Is there really any value to the first paragraph of the quoted text in the abstract? (i.e., are people really unclear what a book is?) I'd suggest keeping it short and sweet and only including the more apt second paragraph.

(Admin: this was originally part of a set of editorial issues raised by @mattgarrish in #28. It has been turned into a separate issue; all the other issues have been covered by a separate PR.)

Change the terminology to "Portable Web Publication"

Should the term "Document" be changed to "Publication"? (See, e.g., the mail from Bill Kasdorf.)

Identifiers and Scope/Context as defined in Web Annotation Data Model

The Web Annotation Data Model (Working Draft) currently provides a means for talking about (as an annotation target) a resource in the context of another resource, e.g., annotating an image in the context of a Web page. See: http://www.w3.org/TR/2015/WD-annotation-model-20151015/#scope-of-a-resource. In this approach, when describing the annotation, the URL of the resource being targeted (e.g., the image) appears as the value of one property and the URL of the containing resource (e.g., the Web page) appears as the value of a different property. They are connected using the annotation SpecificResource structure. (A manifest might be a way of connecting these two properties in a publication.) The two resources need not be retrievable from the same site, and de-referencing is done in the usual way; no special mechanism is needed to parse the identifiers used. It is not clear that this use case is relevant to the identifier issues raised for publications that 'contain' many components at various locations, but it may be of interest in thinking about this issue. In particular, this makes it easy to recognize when two annotations (or potentially two publications) target (or contain) the same resource. As Ivan said on today's call, it would potentially be a more attractive approach if the Annotation SpecificResource approach were generalized in a way that would allow it to be expressed as a fragment identifier, although this may be a bridge too far.
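To make the "same resource, different context" point concrete, here is a rough sketch of two annotation targets and a comparison between them. The property names `source` and `scope` follow the SpecificResource pattern as it is commonly serialized in the Web Annotation Data Model; treat the exact JSON shape as an assumption. All URLs and the helper `targetsSameResource` are made up for illustration.

```javascript
// Assumed shape: a SpecificResource target with `source` (the resource itself)
// and `scope` (the resource it appears in). All URLs are illustrative.
const annotationA = {
  target: {
    type: "SpecificResource",
    source: "https://example.org/img/mona_lisa.jpg",
    scope: "https://example.org/books/1/chapter1.html",
  },
};

const annotationB = {
  target: {
    type: "SpecificResource",
    source: "https://example.org/img/mona_lisa.jpg",
    scope: "https://example.com/gallery/index.html",
  },
};

// Hypothetical helper: two targets refer to the same resource when their
// `source` URLs match, regardless of the scope in which they were annotated.
function targetsSameResource(first, second) {
  return first.target.source === second.target.source;
}

const sameResource = targetsSameResource(annotationA, annotationB);
const sameScope = annotationA.target.scope === annotationB.target.scope;
```

Because the two URLs are kept in separate properties, no special identifier syntax has to be parsed to make the comparison.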

Mitigate the web packaging section

Brady's remark at the F2F: if the reader reopens the book (package) at a later stage but on another machine, and this fact is somehow synchronized, streaming may not help. I.e., the requirement for streaming may not be that strong for PWP-s. This takes away one of the main reasons to go for Web Packaging, even if it is taken up by the Web Platform WG. Worth noting in the document.

Change SW part to use PWP Processors

The section should talk about an abstract PWP processor, with SW being one of the possible implementations. This would make the text more "independent" of implementations.

Is it necessary to have the concept of manifest combination?

The current document uses the concept of manifest combination. However, the algorithm is complex, and requires the definition of how to combine each manifest item. This may become very complex. Also, it goes beyond what Web manifests do and may jeopardize reusing the work elsewhere.

Spec redefines or possibly conflicts with fetch

There seems to be a lot of overlap with fetch. Fetch defines how HTTP(S) resources are obtained in the platform. Please avoid redefining how clients and servers interact over HTTP, or you risk introducing security issues.

Paragraph on the broader Web Platform?

The current document includes a paragraph (in section 2) saying

The broader Web Platform can also be considered to be at a tipping point…

It may not be appropriate in this document to make general statements on the Web platform, even if it is relevant to DPUB. It may also start unnecessary discussions (unnecessary for the topic at hand, that is) which may side-track the necessary discussions around PWP proper.

I would propose replacing this paragraph with text emphasizing the advantages that the traditional Web may gain by bringing the publishing community closer (some of this text was in earlier versions of this document, but was part of the use cases that are now to be removed).

paged view does not work

Leads to 404 error!

Differentiated Usage of Manifest and Manifestation

Polymorphism in the terms "manifest" (in a computing context) and "manifestation" (in a FRBR context).

Confusion is worse when a word like "manifested" is used. This section has at least declared its context:

Section 1.1: "A Web Publication is not just a collection of links— the act of publishing involves obtaining resources and organizing them into a publication, which must be “manifested” (in the FRBR [frbr] sense) by having the resources available on a Web server."

It would be better yet to define both terms, and to remove any variations, as with stemming.

It would be helpful for readability to add a note like this:

The term "manifestation" is used in this section as defined in FRBR; the term "manifest" is used throughout this specification to mean a resource (a file) containing a list of references to other resources, in the manner of a shipping list, or a list of things that need to be downloaded for offline use.

This issue applies equally to https://www.w3.org/TR/pwp-ucr/ which uses two instances of the word "manifestation".

Minor remarks

Some minor (quite personal and optional) remarks about the current document:

introduction:
- fig 1: The figure seems to state that being packaged === being offline; maybe some rewording within the figure would prevent this?
states:
- packed/unpacked dfns: I wouldn't mention the protocols here, to emphasize 'combined into one single unit' vs 'can be directly accessed individually'
locator-dfn:
- canonical locator: remove the sentence 'This is purely conceptual, i.e., the PWP does not have to be published unpacked online.'.
The last note of 4.2.1 is almost identical to the next-to-last note of 4.3; maybe they can be merged into one?

Access of local files from the service worker

As discussed at the Sapporo F2F some additional information/note should be added to the PWP document.

/Cc @danielweck

Add more about manifests

Manifests may be a requirement for PWP-s to describe the content of the package (i.e., the resources it contains) as well as a possible remapping of URL-s. Maybe worth its own section?

Planning for the CSS Fragmentation Module (css-break)

As https://www.w3.org/TR/css-break/ is coming (!), the future PWP and EPUB (v3.3?) releases can say something about it. Example: printable EPUB and HTML css-break recommendations in "paged media".

Spec doesn't justify its raison d'être

A naive reading of the spec doesn't really give a true raison d'être for its existence. The spec is a little hand-wavy about required resources and fonts, etc. but it doesn't prove that current web technologies don't already do everything described.

What would be great would be more clarity about that. That is, a really clear, technical proof that "today, the Web cannot do books: why this spec is really needed here." It seems like DRM would be the only thing (?), as I'm having a hard time thinking of what can't be built today using Web tech with regard to "books" on the Web, and hypermedia in general, which the Web is pretty good at.

Let's frame this differently: let's say you came to Mozilla, Google, or Apple and asked them to implement the spec. What would you want the browser to do differently and why? And how would that be different to "web apps"?

As a web developer, I can already do offline with Service Workers, etc. The fonts issue can also be handled through the fonts API and through the cache API; fetching is handled by the fetch spec. And so on... and the merging of data is trivial too (e.g., Object.assign({}, JSON.parse(a), JSON.parse(b))). So it would be great to identify what the actual gaps are that the spec is trying to standardize.
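For reference, the `Object.assign` merge mentioned here behaves as sketched below. The two JSON strings are invented sample manifests, not taken from the spec. Note that `Object.assign` copies top-level properties only, so when both inputs define the same key, the later value replaces the earlier one rather than being combined with it.

```javascript
// Invented sample manifests; the keys are illustrative only.
const a = '{"name": "Book", "resources": ["index.html"]}';
const b = '{"resources": ["img/mona_lisa.jpg"], "lang": "en"}';

// Shallow merge: properties of later sources overwrite earlier ones.
const merged = Object.assign({}, JSON.parse(a), JSON.parse(b));

// `name` comes from a, `lang` from b, and `resources` is b's array alone:
// the array from a is replaced, not concatenated.
const resourceCount = merged.resources.length;
```

Whether replacement or concatenation is the right behaviour for a given manifest item is exactly the kind of per-item decision a combination algorithm would have to specify.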

It might be that the Web provides all of what is needed already? It would be really cool to do a rundown and see what is missing.

Must web publications be available offline?

Opening this as an issue to have a place for discussion. The current WP draft states:

A Web Publication must be available and functional while the user is offline. A user should, as much as possible, have a seamless experience of interacting with a Web Publication regardless of their network connection. We make no distinction between online and offline when defining web publications.

I don't think we should change must to should in the first sentence. To me, a fundamental aspect of a publication is that it has some sort of permanent existence—it doesn't just go away depending on circumstances. Trying to retain that sense of permanence in a digital world is important.

There's also significant concern in the ebook community (voiced, for example, by Fran Toolan) that "the web" doesn't care about offline, and that the IDPF+W3C combination means that offline use cases will be given less weight.

Adapt 'state' terminology to the rest of the text

The commit on 2015.09.30 has the proposed state definitions, but the rest of the text is not yet adapted to it. This is mainly relevant for section 4.

Various minor nits

Not sure if this document is ready for review, but I thought I'd at least mention a few non-technical items I noticed while skimming it:

Wouldn't it make sense to group sections 1, 2 and 3 under a common heading, like "Introduction"?
"Some Technical Challenges" - "Some" is unnecessary.
"Profiling PWP-s?" - Drop the question mark.
"EPUB4 as PWP" - I don't understand the heading given that there is no epub 4. Perhaps something like "Evolutionary Objective" would be more appropriate, since it talks about being a compatible evolution of epub 3.
Headings drift from title case to sentence case in places
Is there really any value to the first paragraph of the quoted text in the abstract? (i.e., are people really unclear what a book is?) I'd suggest keep it short and sweet and only include the more apt second paragraph.
In 1.1, there's 'webpages' and 'websites', but elsewhere 'Web page' and 'Web site'.
I can't recall seeing acronyms pluralized with hyphens before (API-s, WP-s and PWP-s), but maybe that's just me.

Why "(Packaged)" in the title?

It seems a bit cumbersome to explain, in the title, the states a web publication might be in.

I get that the parentheses mean a web publication might sometimes be packaged, but that's only because I already know what this work is about. If I didn't, I'd probably start to wonder if that was a placeholder or something (which I admit I sort of did having been overwhelmed with epub for the last half year).

Why not just focus on the product being created -- the web publication -- and leave the packaging/portability of that document for the body? Not to go all marketing on you, but it's a much simpler and more powerful name without the parentheses.

If you absolutely need to explain as early as possible that the publication can be packaged, too, a subtitle might be better.

And just to really nitpick, does "for the Open Web Platform" really add anything, since they're called "Web" publications?

Locators and Packages

There's been email discussion of related issues, but it seems worthy of a GitHub issue. Section 4.2.2 gives us numerous locators for unpackaged, packaged, and "canonical" versions of a publication and constituent resources:

unpackaged
https://example.org/books/1/
https://example.org/books/1/img/mona_lisa.jpg
packaged
https://example.org/packed-books/1/package.zip
canonical
https://example.org/published-books/1
https://example.org/published-books/1/img/mona_lisa.jpg

What's not clear to me is, do we need locators for resources inside a packaged publication? To borrow Daniel Weck's old ! notation, do we need to have something like this?:

https://example.org/packed-books/1/package.zip!/img/mona_lisa.jpg
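As a rough illustration of what a processor would have to do with such a locator, the sketch below splits it at the `!/` separator into the package URL and the path inside the package. The function name `splitPackageLocator` and the returned field names are hypothetical, and the `!` notation itself is an informal convention, not a standardized URL syntax.

```javascript
// Hypothetical helper: splits "<package URL>!/<inner path>" at the first "!/".
// The "!" notation is informal; nothing here is defined by a specification.
function splitPackageLocator(locator) {
  const index = locator.indexOf("!/");
  if (index === -1) {
    // No separator: the locator addresses the package as a whole.
    return { packageUrl: locator, innerPath: null };
  }
  return {
    packageUrl: locator.slice(0, index),
    innerPath: locator.slice(index + 2),
  };
}

const parts = splitPackageLocator(
  "https://example.org/packed-books/1/package.zip!/img/mona_lisa.jpg"
);
const whole = splitPackageLocator("https://example.org/packed-books/1/package.zip");
```

Here `parts.packageUrl` is the ZIP locator and `parts.innerPath` is `img/mona_lisa.jpg`, the same relative path a manifest would list.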

Another question is why do we need a separate "canonical" locator? What's the advantage of https://example.org/published-books/1/img/mona_lisa.jpg over https://example.org/books/1/img/mona_lisa.jpg?

In all of the above cases, my manifest would list the image as having an href of img/mona_lisa.jpg. This has got to be one of the best use cases for relative URLs ever! Do we really need hrefsrc?

{ "href": "img/mona_lisa.jpg", "hrefsrc": "https://example.org/books/1/img/mona_lisa.jpg" }

Neither the web nor EPUB seems to need this...
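The relative-URL argument can be checked with the standard `URL` constructor: the same `href` resolves against whichever base the publication is served from. In this sketch the base URLs are the ones from the list above, except that a trailing slash is added to the canonical locator, which is an assumption. Without it, relative resolution replaces the last path segment.

```javascript
const href = "img/mona_lisa.jpg";

// Resolve the same relative href against the unpackaged locator.
const unpackaged = new URL(href, "https://example.org/books/1/").href;

// Canonical locator with a trailing slash added (an assumption for this sketch).
const canonical = new URL(href, "https://example.org/published-books/1/").href;

// Without the trailing slash, the final segment "1" is dropped during resolution.
const withoutSlash = new URL(href, "https://example.org/published-books/1").href;
```

So a manifest that lists only relative `href` values yields the expected absolute URLs for each location, provided the base locator ends with a slash.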

Should we remove the reference to file URI-s from the definitions?

It seems that resources on the local file system are not properly handled through service workers, and some "tricks" must be used to get to them. Is it necessary to refer to the local file system in the text at all? It may make the definitions clearer if only URL-s were used.

Change the section on identification

the current formulation does not reflect the unified view provided by service workers
remove the reference to Robust Anchoring, that document does not really exist any more
add some ideas stemming from the presentation at TPAC
introduce the idea of having a unique id and a separate locator; in general, bring the document in line with the TPAC 2015 discussion.