w3c / webmention Goto Github PK

View Code? Open in Web Editor NEW

111.0 40.0 46.0 544 KB

Webmention spec

Home Page: https://www.w3.org/TR/webmention/

HTML 91.06% PHP 8.75% CSS 0.06% JavaScript 0.04% Hack 0.09%

webmention's Introduction

Webmention

This repo contains the latest editor's draft of the Webmention specification.

Implementation reports are collected in the Implementation Reports folder.

Webmention issues

webmention's People

Contributors

Stargazers

Watchers

webmention's Issues

Add Robustness or Clarification wrt Security and Verification

Right now I am concerned. If the webmention is a public endpoint that is unauthenticated, then the computation it performs should be bounded. That is, you should know when you get a webmention that the effort it would take to verify will be reasonably deterministic. If you have to perform a GET, and then you have to parse, then it seems to be unbounded (unless the size of the source document is capped. etc etc.)

So, webmentions can come from third parties (site C) to tell site A about a link from site B. But, a malicious third party can find some large source page on B and send a bunch of webmentions to A, causing a lot of wasted effort on A (and potentially bandwidth on B) Meaning, depending on how B mitigates: B could ban/block A. B could run out of bandwidth or be hit financially. Site A needs to block third-party C (but C can be any server anywhere, so it has so few burdens) Site A could run out of cpu resources or bandwidth. Site A could not be able to reasonably schedule other webmention verifications among other tasks it needs to do because it doesn't know how much effort a webmention takes. Site A could be hit financially. You can certainly nitpick these, but only one has to be reasonably true.

For verification to be reasonably bounded, you'd sign the message with site A or have some specified way of bounding the GET that is made and refusing to do anything else. That would require A to more actively participate in the protocol, but would limit the amount of data B needs to pull from A. There are already solutions found in other similar protocols. Salmon, for instance, signs the message and therefore assumes that only the origin requires verification, not the semantics of the message.

At any rate, the simplicity of the protocol makes it very fragile. Therefore, some discussion on the implications of verification and how to avoid such pitfalls would be beneficial. Such as: rejecting documents that are too large to verify, limiting redirects, checking for and refusing streamed data, caching popular documents.

(I think, as an implementer, I wouldn't recommend such a potentially open protocol be added to one's software. If it had easy reasonably bounded origin verification, then yes, which I would suggest as a MUST. And then I'd recommend that full-text verification be done only very optionally at idle times for servers I don't trust/know with a long set of restrictions such as those I listed. I'm still thinking about it a lot though. Full-text verification seems a bit too game-able to reliably enforce trust over time.)

Why only "410 GONE" for deleted posts?

Why does the spec (and the test suite) not accept "404 NOT FOUND" as well? In my admittedly limited experience, most CMS platforms don't natively support "remembering" old URLs, thus being able to return a 410. I would think a 404 response would be just as meaningful.

Recommend a backoff strategy for discovering webmention endpoints

If no webmention endpoint is discovered for a target, it would be best to avoid attempting to re-discover an endpoint for the domain until some amount of time has passed, to avoid making a bunch of unnecessary requests to the site. We should create some recommendations about when to throttle back trying to discover a webmention endpoint based on the target domain.

Require same Origin

A number of people here have trouble understanding Cross-Origin issues as discussed in issue 10: Lack of context in WebMention. See the discussion there or @kevinmarks comment explaining forms in issue 1.

One answer to this is to require that WebMention links only ever be to the same origin: ie. that people host their own WebMention end point. It could be argued that in that case we do indeed then find ourselves in the same situation pointed out by @kevinmarks with forms since the beginning of the web, and that extra context is not required:

I'd say that nearly 20 years without a spec more formal than that indicates that we may not need one, given how much has been built on it, theoretical drone strikes notwithstanding.

If an answer cannot be found to #10 then this would be a way to reach a minimal consensus.

Examples

Consider the current use case from the current WebMention spec Protocol Summary section.

Imagine that Aron's webmention statement points to the an endpoint on another origin, that understands the source and target attributes in a perfectly reasonable way for that service. Perhaps someone hacked Aaron's web site, or a man in the middle has altered Aaron's response to Barnaby's agent's request and changed the link to point to the other endpoint, or just simply because Aaron is himself intent on some form of mischief.

As a result Barnaby's webmention enabled agent ( this could be a client or server ) would - having published Barnaby's post referring to Aaron's entry - and having retrieved the mischievous webmention endpoint POST a message, send to this actually non webmention endpoint a message that is interpreted by that agent as meaning something completely different by the server. What kind of meaning could that agent give the source and target attributes? Pretty much anything. Here are some ideas:

it could be a service that people use to list porn sites and people linking to them
it could be a service that people use to list blog posts that are sexist and people linking to them
it could be a service that people use to list blog posts that are have copyright infringements and people linking to them
it could be a service that people use to list blog posts that insult the koran and people linking to them
it could be a service that terrorists use to notify a group of the next attack target
it could be an internal secret service department that puts people on a terrorist watch list, and the webmention agent could be built into the client software ( instead of being on the server )

In 6. even though there is no explicit authentication there is an implicit authentication because Barnaby's web agent somehow finds itself inside the firewall. This links this issue to issue #14 "Webmention MUST be done anonymously".

The examples here are examples that could be used by issue #1 "Introduce Property parameters". But if it is easy to imagine a protocol which specifies the relation between link and target, it is also easy to imagine that other people are already using such systems now, without reference to webmention, or that people could develop some inspired by it, or with mischievous intent.

Caution regarding using form encoded data

Given that webmention is strictly based on form-urlencoded data, it'd be important to caution implementers on what the W3C Recommendation for HTML5 says (see note):

https://www.w3.org/TR/html5/forms.html#url-encoded-form-data

Security consideration: Attacker can cause webmention sender to send arbitrary requests to victim

Assumptions:

The victim is a server that processes query string parameters the same way as post body parameters (many web frameworks allow code to access these parameters the same way, so this is actually relatively common, e.g. PHP's $_REQUEST or Ruby (Sinatra)'s params[:foo])
The server performs some action that the attacker, Chuck, normally can't do from his own server (e.g. a form with no authentication that lives behind a firewall, or an endpoint that uses an IP address as authentication)
Alice has a blog that lives on a server that can trigger this action on the victim server
Alice's blog supports Salmentions (e.g. sending a webmention to previous commenters when a new comment is received), or an alternate attack involves socially engineering Alice to write a post that links to Chuck's malicious URL

The attacker, Chuck, creates a reply to one of Alice's blog posts at http://chuck.example.com/attack and sets the webmention endpoint for the page to the victim's URL, which may be behind a corporate firewall. The URL contains query string parameters that are crafted to cause the victim to perform some operation such as creating a new user account, or some other undesirable operation. An example might be an internal user registration form that creates accounts that can be used from outside the firewall. For example, Chuck sets the webmention endpoint to http://victim.internal/register?userid=chuck&password=1234.

Once Chuck has set the webmention endpoint on his page to the internal server, he then sends a webmention to one of Alice's blog posts that is running on a blog inside the firewall at http://alice.internal/post/100. This causes Alice's server to fetch his reply and show it as a comment.

Chuck then sends a follow-up comment which Alice's server accepts. Then, following the Salmention rules, Alice's server sends webmentions to all previous comments to notify them of the new comment.

Alice's server discovers the maliciously crafted webmention endpoint on Chuck's original comment, and sends a post request to it that looks like the following:

POST /register?userid=chuck&password=1234 HTTP/1.1
Host: victim.internal
Content-type: application/x-www-form-encoded

source=http://alice.internal/post/100&target=http://chuck.example.com/attack

The vulnerable server ignores the source and target parameters but processes the query string parameters as if they were post values.

The summary is that the attacker can cause Alice's server to send arbitrary requests to a server.

Is this something worth addressing or acknowledging in the spec?
Are there too many preconditions for this attack to be worth noting?
Is this really any different from getting Alice to visit a page in her browser that has javascript that submits a carefully crafted html form?

webmention verification should specify using a HEAD request

Section 3.2.2 of https://www.w3.org/TR/webmention/#webmention-verification says

If the receiver is going to use the Webmention in some way, (displaying it as a comment on a post, incrementing a "like" counter, notifying the author of a post), then it must perform an HTTP GET request on source, and follow any HTTP redirects (up to a self-imposed limit such as 20) and confirm that it actually links to the target.

The suggestion of using a GET is wrong IMO because it 1) is doing a full resource request when a HEAD request will suffice at this stage and 2) by requiring a GET my implementation really cannot perform a HEAD (for the first reason)

My suggested change would be to say

" If the receiver is going to use the Webmention in some way, (displaying it as a comment on a post, incrementing a "like" counter, notifying the author of a post), then it SHOULD perform an HTTP HEAD request on source, and follow any HTTP redirects (up to a self-imposed limit such as 20) and confirm that it actually links to the target."

Move mention of specific limit of redirects to Security Considerations section

Change:

it must perform an HTTP GET request on source, and follow any HTTP redirects (up to a self-imposed limit such as 20) and confirm that it actually links to the target.

it must perform an HTTP GET request on source, following any HTTP redirects (and SHOULD limit the number of redirects it follows) to confirm that the target URL actually links to the target.

The suggested limit of 20 should be moved to the "Limits on GET requests" section.

Webmention should use JSON-LD

I do not support this issue. I'm just helping organize here, separating it from #39.

In #39 there seem to be several proposals considered:

OPTION 1. Where Webmention currently uses form-encoding, it should instead use JSON-LD. That is, Webmention receivers would be required to understand JSON-LD and not required to understand form-encoding. This has a big problem: it would be a flag day change., which seems impossible in a decentralized system.

OPTION 2. Webmention receivers should be required to understand both JSON-LD and form-encoding. This has a medium-sized problem: it would make every Webmention receiver more complicated, since it would have to understand two syntaxes, not just one.

I leave it to someone who supports this issue to make the case. Hopefully they can do it with a very simple use case that shows how someone would benefit from one of these OPTIONS being adopted.

My suggestions is that if someone wants to use JSON-LD with webmentions, they instead use one of the non-webmention solutions that works with JSON-LD, like Semantic Pingback, Solid Inbox, or ActivityPub.

Develop exit criteria for CR

Discuss handling of fragments in URLs

As noted here: https://tinokremer.nl/2015/benwerd-i-figured-out-what-goes-wrong-with-webmentions-comments
Webmention handlers should strip fragments before checking if the URL is one they handle. Fragments can be used to direct the webmention to a subsection of the destination URL, and if they are fragmentions, a particular phrase.

section 3.2: no character encoding on response examples

[raised on behalf of Addison Phillips, discussed in i18n telecon]

https://www.w3.org/TR/webmention/#receiving-webmentions

In Section3.2 there are examples of responses. They are giving the Content-Type of text-plain with no charset parameter. Please include a charset=UTF-8.

Should there also be a health warning to use UTF-8?

No JSON based social syntax

The WG charter states:

A JSON-based syntax to allow the transfer of social information

Yet I was unable to locate any JSON based examples in the text (tho perhaps I missed something)

Could this be made available for review?

Webmention verification

The requirement to do an HTTP GET on the source and to verify whether it indeed references the target excludes important use cases, for example in web-based scholarly communication. I will explain by means of a very hot topic: linking publications with datasets. Other cases exist.

For publication/dataset linking, the publication (source) would use Webmention to inform the dataset (target) that it is being referenced in the paper. Typically:

Both the publication and the dataset are identified by a DOI, say, respectively http://dx.doi.org/12.34/567 and http://dx.doi.org/76.54/321. These are the URIs one would be inclined to use as source and target.
When dereferencing these URIs and following all redirects, one ends up on a so-called landing page that provides an abstract of the DOI-identified resource. The actual content is "somehow" linked from that page.
In many cases, the publication (source) is a PDF that sits behind a paywall.

It would be very hard (or even impossible) to perform "Webmention verification" as described in the spec because:

The receiver would have to determine where the actual content (eg PDF file) can be found when ending up on the landing page
The receiver may not be able to access the actual content (eg PDF file) because of the paywall
If the receiver would be able to penetrate the paywall, it would have to parse the PDF file (not impossible, but hey ...)

Even if one were to use the URIs of the actual content (PDF file, dataset) instead of the DOIs as source/target URIs, two of the above problems would remain.

I very much understand that this problem is to a large extent related to the fact that web-based scholarly communication does not necessarily operate in a manner that aligns very well with the way other pockets of the web do. Then again, I assume paywalls and landing pages exist beyond scholarly communication. And, most importantly, I would love if webmention could be used in scholarly communication, see eg slides 45-52 of [http://www.slideshare.net/hvdsomp/reminiscing-about-interoperability].

Hence a suggestion to consider an additional aspect regarding "Webmention verification", which could be along these lines "if the receiver has a trust relationship with the sender, verification is optional".

Cheers

Herbert Van de Sompel
Los Alamos National Laboratory

No document at http://www.w3.org/ns/webmention

There is no document at :

http://www.w3.org/ns/webmention#

Describing the key terms used in webmention

Is there something intended to be there, or a pointer to draft text. That would make it easier to review.

Webmention is not correctly defined in the rel registry (from #3)

The IANA registry is not used for HTML rels.
The existing-rels registry has been updated with the current specification link.

Webmention MUST be done anonymously

One way proposed by @sandhawke and others to reduce the risks of situations described in issue 10: Lack of context in WebMention is to force WebMentions to be anonymous. This way someone POSTing a form can never be taken to be responsible for doing so. This seems to be the argument given by @sandhawke in a comment to that post

It's a POST done without any credentials, so some would say it can't do anything bad, but I think bblfish is imagining that maybe sometimes doing I POST even without credentials could be taken as a commitment. I think there may be occasional poorly designed systems where that's true.

If it is only a problem with poorly designed systems, then they can be blamed for it. On the other hand systems in which the client has authenticated would be well designed. So for those the argument would go through.

Therefore it seems that WebMention cannot allow client to authenticate.

Should verification be a MUST?

The current spec says:

The receiver SHOULD perform a HTTP GET request on source to confirm that it actually links to target (note that the receiver will need to check the Content-type of the entity returned by source to make sure it is a textual response).

Making this a MUST seems a clearer statement of intent for the protocol. Posting unverified links is the Trackback problem.

Definition of 'link'

Abstract:

Webmention is a simple way to notify any URL when you link to it on your site. From the receiver's perspective, it's a way to request notifications when other sites link to it.

Intro:

At a basic level, a Webmention is a notification that one URL links to another.

'link' is debatable given that a plain text file with a URL in it is a valid webmention source. Something like:

"Webmention is a simple way to notify any URL that it appears on your site. From the receiver's perspective, it's a way to request notifications when others sites refer to it."

"At a basic level, a Webmention is a notification that one URL appears at another."

"protocol summary" section is unclear

Section 1.2 of the current draft is called "Protocol Summary" and does seem to describe some sort of protocol, but there is no introduction or context. Shouldn't there be one?

Add conformance requirements section

Section 3.2: no language support

[raised by Addison Phillips, discussed in i18n telecon]

https://www.w3.org/TR/webmention/#receiving-webmentions

Section 3.2 say (in part):

The response body MAY contain content, in which case a human-readable response is recommended.

There is no mention of language negotiation or language identification here. The assumption appears to be that a wad of English is returned? ;-)

The example could include a Content-Language header or might allow for other language identification in the body (complicated)

This is also applicable to at least 3.2.3 Error Responses as well.

If the Webmention was not successful because of something the sender did, it MUST return a 400 Bad Request status code and MAY include a description of the error in the response body.

section 3.1.2 charset

[raised by Addison Phillips, discussed in i18n telecon]

https://www.w3.org/TR/webmention/#sender-notifies-receiver

Section 3.1.2 describes the submission of the source and target URLs in the x-www-form-urlencoded format. There is no mention of character encoding, which normally is an important concern for this format. However, since the strings in question are URLs, they are presumably already "URL encoded" using the character encoding recognized by the host server. I'm raising this issue to point out that the charset issue works in this case. However, if fields were added to the payload in a future revision, the charset might become important.

It is unclear how to convert "source" and "target" parameters to URIs

In section 5 of the protocol

Barnaby's server sends a webmention to Aaron's post's webmention endpoint with

source set to Barnaby's post's permalink
target set to Aaron's post's permalink.

Source and target are form encoded parameter. While this makes sense in that context it suffers from a couple of weaknesses.

Similar semantics may be wished to be sent with another mime type (e.g. using those developed by the social web WG, such as JSON based or other w3c rec)
In other scenarios "source" and "target" as simple strings my be subject to name clashes and collisions

In order to make this more scalable it is advantageous to be able to systematically convert those parameters into URIs. This information could be gained either in a generic way that applies to all form encoded variables (tho none exist today), or described in the spec.

It is undesirable for software to do this ad hoc, as different decisions might be made by different code bases leading to interoperability issues.

Suggestions for this:

use the pingback vocab ( http://www.w3.org/wiki/Pingback#Semantic_Pingback )
use a URN
create a new vocab, e.g. at w3.org/ns or w3id.org
other

If there's no preference here, I'd suggest using the pingback vocab as it was one of the cited motivations for webmention.

A few words in the text of the spec could make clear how software providers could use webmention in the form context and also with linked data based systems.

Webmention message is not clearly mapped to linked data

The webmention spec lack interoperability with linked data.

What is needed is an explicit translation of

source=https://waterpigs.example/post-by-barnaby&
target=https://aaronpk.example/post-by-aaron

Into a format that can be implemented by those working with web standards such as JSON LD, Turtle, AS2 etc.

While the spec clarifies the namespace that could be used in the predicate position. The mapping remains unclear for implementors and needs to be stated explicitly.

It is clear that webmention alone suffers from the "webmention spam" problem leading to possible DDoS attacks. The argument I have heard is that while webmention is small, it's not a problem, but if it becomes a W3C REC extensibility will become critical. The samlention and vouch system essentially are replicating work done elsewhere in the W3C such as digital signatures and verifiable claims. Rather than replicating this work twice, or forking the working group, I would strongly suggest aligning the work now by adding an example to the spec to explicitly show how implementors can send source and target to a server using linked data (JSON-LD might be a good fit).

But this mapping is currently under specified. I suggest using this issue to come up with the closest mapping possible.

Should be source syntax agnostic

I thought the webmention spec was supposed to be agnostic of the markup used in the source. As such, a MUST on microformats2 parsing in 2.2.4 seems inappropriate.

Even though supporting updating mentions is MAY, this implies that if you are going to support it, you have to support mf2, which may deter people who are interested in other kinds of semantic markup, or indeed syntaxes other than html, from supporting updating at all. If someone wants to support updates, shouldn't it be left up to them to decide how they're going to get the updated data? (Which would be consistent with other similar decisions left up to implementers elsewhere in the spec).

DDoS prevention

It occurs to me that all your DDoS concerns that are creeping into the protocol are the same as email has been dealing with for years. In email, there are two main ways:

a trusted third party: SPF puts into DNS records that say "mail from domain example.com is allowed to be sent by [ip1, ip2, ip3, ...]"
signing: DKIM puts into DNS a public key, and a header is stuffed into email containing a signature on various pieces of the headers and content

(http://www.openspf.org/Related_Solutions)

For webmentions, you could spec that:

the receiver asks if sender has allowed the sender IP, either through DNS (which means TXT records because A records are unreliable as a listing of every IP) or through https://sender.com/webmention.txt
2, sender makes a keypair and signs the notification with it. The public key can again either be put into DNS, or at https://sender.com/webmention.gpg
(notice that https:// is used there: TLS authenticates sender.com and therefore receiver knows these values are trustworthy and they're not being MITM'd)

Suggestion: simplifying notification

Webmentions replaced XMLRPC with something RESTish, but they aren't fully RESTful because they still use indirection, by sending encoded sender=...&receiver=....

It would drastically cull the number of errors and be much more web happy if the protocol instead was:

Notification: sender GETs receiver, with X-Webmention: sender
Finding the mention: receiver GETs sender, parses microformats searching for any with a link to receiver

This mixes the GET/POST pair that sender does into a single GET, which will look totally normal to servers that don't understand, without requiring any HTML code changes. As more people come online to webmentions, servers can progressively start to adopt them, but until then nothing untoward happens if a notification is sent when no one is listening. The current spec does a sorta preflight check by asking the receiver if and where its webmentions should go.

Another idea is simply to use the Referer header in step 1, which I think is something at least some blog engines used to do for finding trackbacks: blogs would ride on the back of their surfers for tracing out the web. But by just making the server pretend to be a surfer, you get immediate push notification.

I guess that having a separate webmention endpoint made deployment seem easier, especially to something like Wordpress. But it should be just as simple to write some middleware that catches X-Webmention headers. The longest part is what to do with the mention once you've got it, anyway.

Remove Salmentions?

Aaron and I noticed a problem with the normative references. There's a W3C process rule that specs aren't allowed to normatively refer to unstable technologies, because then conformance could change after-the-fact.

But aside from the process issue, which we could address other ways, I think the Salmention clause in Webmention isn't a good idea. The spec says:

A Webmention update implementation MAY support updating data from children, or other descendant objects of the primary object (e.g. a comment h-entry inside the h-entry of the page). If an implementation does support this, it MUST support it according to the [Salmention] extension specification (AKA a "Salmention implementation").

First off, I find that text very hard to understand. What are "children" and "descendant objects"? Those terms aren't defined in the spec, as far as I can tell, and mean nothing to me. Do I need to understand what they mean? By using "MUST" this text says I do. Are there test cases for this?

I do understand Salmention, I think. It's the practice that when your server receives a webmention and incorporates the content that mentions you into your own content, you should issue your webmentions again, so that "upstream" sites (things you point to) can see the "downstream" content (sites that point to you). But isn't that upsteam relaying is already implied by:

If the source URL was updated, the sender SHOULD re-send any previously sent Webmentions, (including re-sending a Webmention to a URL that may have been removed from the document), and SHOULD send Webmentions for any new links that appear at the URL.

... although maybe I don't understand that correctly. I think it means "If the source Resource was updated".

So, in effect, by following that SHOULD you're doing Salmentions without even knowing it, if you happen to include into your content any content that mentions you (with suitable provenance).

In short, can we remove the Salmentions text? And if not, can someone rephrase it in a way that makes quite plain the entirety of what I need to know in doing a complete and conformant Webmention implementation?

W3C Social Pingback

webmention is cool and I like it.

However, I think we need more to really make progress on the "Federation Protocol" deliverable mentioned in the charter.

Other than webmention, LDP is the other protocol mentioned in the charter as a 'possible input' to that deliverable, but it doesn't help prescribe much that helps with webmentions or pingbacks (that I can tell).

@melvincarvalho points out the similarity between webmention and something many of us have worked with for over a decade

Webmention has been implemented in linked data in the form of "Semantic Pingback" http://www.w3.org/wiki/Pingback#Semantic_Pingback

Notably, (AFAICT) the W3C has never produced a Pingback standard. The closest thing to a standard I can find is "Pingback 1.0" hosted on hixie.ch (Ian Hickson, WHATWG).

hixie is rad, and an editor of HTML5. But link rel pingback is not specified as part of HTML5. http://www.w3.org/TR/html5/links.html#linkTypes

So Pingback is not a standard we have to adhere to.

But I think there are some important questions to answer if we want the "Federation Protocol" to be a CR, and not just a Note: Why throw away 13 years of implementations and start from scratch? Can we salvage Pingback 1.0 into a W3C specification that is similar, but also modern enough to drive adoption of web federation for at least another 13 years (perhaps with a logical major version bump, '2.0')? The answer is not obvious.

The following two sections/lists are meant as starting points. Please suggest additions

Update. I have strikken through points where I was wrong, which were numerous..

Here are things that Pingback deals with that Webmention currently doesn't

~~Discovery via HTTP header, not just HTML~~ (As @aaronpk points out, webmention does have this)
- This is nice because clients can discover via HEAD requests without having to download/parse HTML
- Response headers can sometimes be hard for some publishers to implement. well-known URIs and/or webfinger aim to help with this.

Here are things that I personally don't like about the idea of implementing Pingback 1.0 in this era of the web

"Pingback-enabled HTML and XHTML pages MUST be valid." seems unneccessarily strict, especially considering the regex mentioned in the service discovery section
XML-RPC Interface is the only ping mechanism described, which is a MUST for conformance
- "To claim conformance to this specification a pingback server MUST be able to receive pingback XML-RPC calls"
- Many young web developers have never worked with XML-RPC. Some never even XML. Instead, many developers are more comfortable with these mimetypes:
  - application/x-www-form-urlencoded - Like webmention today
  - LD Syntaxes - As seen in the AS2 and Web Annotation communities, it's possible that all these can be covered with only a syntax-agnostic vocabulary (perhaps >= "http://dssn.org/pingback/ns/namespace.html")
    - application/ld+json - http://www.w3.org/TR/json-ld/
    - application/activity+json - http://www.w3.org/TR/activitystreams-core/
    - text/turtle - http://www.w3.org/TR/turtle/
  - application/json - https://tools.ietf.org/html/rfc7159
    - This could be covered with a recommended way of framing the JSON-LD vocabulary with standard key strings, e.g. just { "source": "url", "target": "url" }. The Activity Vocabulary is doing this nicely.
- If I had to pick one of these to be a MUST for conformance, I'd pick application/x-www-form-urlencoded just like Webmention does today.

I expect this to stoke some strong opinions, but I believe it illustrates the possibility of a Middle Way to a Federated Social Web that doesn't ~~ignore past efforts~~ (webmention didn't ignore it) abandon existing pingback markup, and also considers the current uses and extensibility benefits of linked data.

If this is well-received, I could try to draft something that addresses these issues while attempting to be very close to a superset of Webmention and Pingback 1.0.

Why not allow multiple webmention endpoints?

Currently the spec says:

The webmention endpoint is advertised in the HTTP Link header or a <link> or <a> element with rel="webmention" . If more than one of these is present, the HTTP Link header takes precedence, followed by the <link> element, and finally the <a> element. Clients MUST support all three options and fall back in this order.

However it does not address what should be done if multiple endpoints are indicated for a page using one of these techniques.

I'd suggest that in this case, sending a webmention to each one found makes sense. As webmention does not define what endpoints should do, it is clear that there could be different webmention triggered services - one that creates useful comment threading, and one that caches linked-from pages, for example.

In addition, if you are migrating from one webmention service to another, being able to ping both in parallel is best practice to ensure consistency.

Allow 404 to trigger deletion as well as 410

In the Updating section, 410 is explicitly called out as triggering a deletion. It seems like many situations would result in a 404 to trigger deletion as well, such as the domain name being sold and reused, the content management system changing and becoming unaware of all previous URLs, and so forth.

Equally, simply don't refer to HTTP status codes, as one assumes that the 410 would not mention the target, and hence fall under the second clause that triggers deletion 😸

Register "webmention" with IANA

seeing that http://www.iana.org/assignments/link-relations/link-relations.xhtml currently does not list "webmention" as a registered link relation type, i am proposing to add a section to the draft that registers "webmention" in the IANA registry according to RFC 5988. i'd be more than happy to contribute such a section. this should make it easier for people discovering "webmention" links on the web to find the authoritative document specifying the link relation type.

Support whole-domain delegation

link: https://hypothes.is/a/yHecWAE6SOewee7LbZheEg

Add a fourth discovery layer of whole-domain delegation via .well-known.
http://indiewebcamp.com/irc/2015-11-29#t1448856141695

(this could also just be documenting how to use webfinger for this)

This lets one domain delegate all its webmentioning to another provider without having to adjust server code to add new response headers or HTML

Should webmention ping data support extensibility via something like LD namespaces?

Started on #1, but that issue is specifically to introduce a new parameter in the currently specced application/x-www-form-urlencoded request body content-type.

Webmention has been implemented in linked data in the form of "Semantic Pingback" http://www.w3.org/wiki/Pingback#Semantic_Pingback

Semantic Pingback is worth being familiar with in the webmention discussion. However, that document does not use normative vocabulary, and thus I don't think it is fit as-is.

@melvincarvalho :

I'd suggest to use namespaces for source, target etc. from the pingback vocab if wishing to implement this in the linked data world. Alternatively putting something under w3.org/ns could work.

Which may or may not be a good idea. Regardless, I've never seen LD namespaces in a application/x-www-form-urlencoded string. So perhaps it is a good idea to standardize webmention pings in other Content-Types for use cases where more than just a 'source' and 'target' are known.

Iff webmention does standardize LD-friendly Content-Type ping bodies, I agree that it's a good idea to re-use (or explicitly equivalent) the pingback namespace (esp as it already defines source and target as in current webmention draft).

It seems to me that all the debates this morning about 'namespaces' really imply a request for further discussion on Content-Types.

Move everything from Testing down to a Note

The content that follows "How to test webmentions" is implementation rather than specification focused, and would be great to keep, just in a Note or other mechanism than a spec :)

Consider as:Activity as payload

Being able to transfer an Activity description (to be embedded in an ActivityStream, for example) would be great for integration. As AS has its own content-type, there would be no confusion as to what the payload consists of.

As an strawperson proposal, systems MUST support www-form-encoded, and SHOULD support AS?

This gives an extension point in the future if there are further syntaxes that can also carry the same information (perhaps with additional structure that's impossible in form-encoded), while still establishing form-encoded as the lingua franca for the 99% simple cases.

Location of returned webmention URL

The webmention endpoint will validate and process the request, and return an HTTP status code. Most often, 202 Accepted will be returned, indicating that the request is queued and being processed asynchronously to prevent DoS attacks. The response body SHOULD include a URL that can be used to monitor the status of the request.

-> The Location header SHOULD include a URL that can be used to monitor the status of the request, response body MAY contain this URL?

Introduce the property parameter

Document and incorporate the idea of using properties as a parameter to better employ webmentions.

This was already documented and mentioned here:

http://csarven.ca/webmention

in which the IWC community is well-aware of (let me know if citations are needed). And, here on the Social Web WG mailing list:

https://lists.w3.org/Archives/Public/public-socialweb/2015Jul/0092.html

and several times on IRC #social . Let me know if citations are needed on that.

If this repository is strictly about IWC's webmention, then I propose renaming it.

Add note about limiting the number of redirects to follow

When checking if target is a redirect, there is potentially no end to the 301 redirect chain. Browsers have a limit where they'll stop following redirects after N. While specifying N is not a good idea, the spec should at least have a note about following redirects up to a chosen limit, and possibly recommending something sane such as the default that browsers follow.

Senders and Verifiers should respect robots.txt

If an alleged webmention endpoint is blocked by robots.txt on that domain, senders should not send to it.
If a receiver is about to verify by requesting some endpoint, and that endpoint is blocked by robots.txt, the receiver should not request it.
http://www.robotstxt.org/robotstxt.html

Is this a good idea?

Sending webmentions - discovery - lacks interoperability

Two methods of discovery are listed in section 3.1.1

The sender MUST fetch the target URL (and follow redirects [FETCH]) and check for an HTTP Link header [RFC5988] with a rel value of webmention, or an HTML or element with a rel value of webmention.

Namely

Link Header (MUST)
HTML or tag

Not all implementations on the social web are based on HTML. Other serializations need to be handled in order to achieve interoperability with existing standards.

Suggestions

Move the link header to a MAY
Drop mention of HTML to an example, but reword the text to say that the serialization MUST link to the webmention endpoint in the body of a document, in a standards compliant way

I much prefer this link to be in the body, as not every implementor will have access to link headers, and some may wish to use mentions.

Can target parameter be optional?

Forking this off from #1 as they are independent issues.

Currently the spec says that the target parameter is required, so that a minimal webmention parser can just check that the URL is in the source document. My naïve example:

https://github.com/kevinmarks/mentiontech/blob/master/main.py#L119

result = urlfetch.fetch(mention.source)
if result.status_code == 200:
    mention.sourceHTML = unicode(result.content,'utf-8')
    if mention.target in mention.sourceHTML:
        mention.verified = True
    else:
        mention.verified = False

Clearly, actually parsing the source document for actual links would be an enhancement here.

in #1 (comment) @csarven says

If we want to talk about what is strictly required, it is just the source. If we want to continue talking about how to let the target know precisely why a target was mentioned, then you need both property and target. Both property and target help with the validation process. If you want to discuss in terms of "extensions", then everything outside of source is an extension.

source is a MUST, property is a SHOULD, target is a SHOULD.

This is true in the specific case that the webmention endpoint is tightly coupled to a particular domain, and thus can know a priori which links are within its purview. That is a common case for webmention, but it is not the only possible case, as webmention receivers can support mutiple target sites,

There is another case where only a source can work - if you are sending webmentions on behalf of a page. indiewebify.me does this. However this is more of a webmention supporting service than an implementation of the protocol (it accepts a url parameter, not source)

Further comments from #1:
@rhiaro:

I'd rather see property required than target optional.

@dissolve:

Lets take taking target as optional first. I think @kevinmarks has a perfect example of anything doing webmention handling as a service being a key place where it is needed to always be there. This is a perfectly valid use-case and likely an important one. I can see this being the same issue for any site that has multiple users (silos even),especially for ones that allow custom domains for users. Moreover this would significantly effect processing of anything that refers to more than one URL. If you mention 1000 URLs in a list of "top 1000 URLs on subject X" you have to check ALL 1000 to see if ANY url you are managing is in there. When given the target URL you can easily verify that you actually care about the URL and that it is referenced by the source.

@csarven:

Making target optional was based on the fact that property and target are not absolutely needed, i.e., an endpoint can still be operational (one counter-example to the raised "issue" against that was: my http://csarven.ca/webmention endpoint).

The proper way forward is to provide both property and target. I was not advocating for target being optional any more than property being optional. They are equally valuable, which is why I last suggested that all source, property, and target should be MUSTs: #1 (comment) , on the basis that having all three represents the complete information of the webmention claim. source, property, target are strictly part of the data.

I hope I have captured everyone's arguments on this point. If not, please comment below.

Client is not defined

In section 3.1.1

Clients MUST support all three options and fall back in this order

However, "Client" was not defined.

I found this sentence slightly confusing. What is actually meant by client?

clarifying URLEncoded form meaning

In Lack of context WebMention the problem of the meaning of URLEncoded forms as paramter/values is considered.

This may not be that difficult to do. We could define a new Link: relation say urlencoded that would point to a transformer from urlencoding to rdf. This would allow a client on making a request to a webmention endpoint

GET /webmention HTTP/1.0

To retrieve a result such as this (see Web Linking RFC), where of course the "urlencoded" relation needs to be described and registered correctly.

200 Ok
Link: <http://w3c.org/social/WebMention>; rel="urlencoded"

The document at <http://w3c.org/social/WebMention> would have both an HTML representation and a machine readable representation.

The human readable representation would just explain how webmention works, the header, and some explanations of the mapping.
The machine readable form would have to give a simple method to transform the attribute/values into a graph with well understood, extensible semantics.

What one really wants is the ability to also retrieve a machine readable document from http://w3c.org/social/WebMention that would describe the url encoded form. It would have some yet to be determined mime type (that is not html), and would return something like this:

PREFIX ping: <http://purl.org/net/pingback/>
CONSTRUCT { 
  [] ping:source ?source;
     ping:target ?target . 
} WITH ?source ?target

Where ?source and ?target are the attribute names of the form. This would allow the WebMention enabled clients to continue sending the attribue value pairs as they do now,

source=http://joe.name/card
target=http://jane.name/other

and would allow a robot to interpret that to be equivalent to the rdf graph written out in Turtle as

@prefix ping: <http://purl.org/net/pingback/> .

[] ping:source <http://joe.name/card>;
   ping:target <http://jane.name/other> .

( clearly there is a piece of syntax still missing in the sketched language to turn the ?source and ?target strings into URLs) This is not that complicated and would allow us to de-siloeify all forms on the web.

This would allow the IndieWeb folk to increase the security of their protocol while retaining their principle of remaining accessible, and it would allow this to be integrated generically into the SoLiD platform, so as to reduce configuration mistakes, and make it easier to automatically create such resources. This would require from the LDPnext side to work out how one can increase the mime type to such a urlencoded form.

Consider expanding verification step 2

The discovery step is very clearly described, and allows for non-HTML content to participate using the protocol via HTTP Link headers. The verification step, where the source is retrieved and it is confirmed that it links to the target, is less clear as to what constitutes a valid link. Further processing requirements would be valuable here.

As a strawperson:

Check if any of the Link headers have the target as the IRI
If the content-type of the delivered representation is (X)HTML, check for linking elements including link, a, img, object and so forth
If the content-type is JSON or JSON-LD, check all of the values of they keys
If the content-type is any other text, check as a string search in the entity-body
Other content-types to be handled at the implementer's discretion

Update HTTP reference to point at RFC723*

Just a minor issue, RFC2616 has been superseded by an array of new specs, RFC7230 and onwards. The reference should be updated to reflect that.

What constitutes the idea of source 'linking to' target?

The intention of this issue is to clarify the following in the current spec in order to give insight into other issues.

The receiver SHOULD perform a HTTP GET request on source to confirm that it actually links to target (note that the receiver will need to check the Content-type of the entity returned by source to make sure it is a textual response).

Two simplified examples at source http://example.org/foo:

<a href="http://example.org/bar">:

a. If the endpoint receives http://example.org/bar, there is a match.
b. If it receives a different value, there is no match, but may want to process further at its own discretion.
c. If it receives no value, receiver has to make a decision on what may be of interest at source.
<a href="http://t.co/abc"> (a short URL or any URL which may or may not eventually resolve at one of target's URLs):

a. If the endpoint receives http://t.co/abc, there is a match.
b. If it receives a different value, there is no match, but may want to process further at its own discretion.
c. If it receives no value, receiver makes a decision on what may be of interest at source.

Options b and c are similar. I've intentionally kept them distinct to have a clear discussion. The intention here is not to necessarily lay down how verifications should be conducted but to understand the moving components better.

Note: This issue emerged from issue #16 . It may also reflect and incorporate the output of issue #1 , #5 , and #12 . See also proposed possible POSTs with interpretations.

urlencoded attribute/values have incomplete semantics

Webmention's urlencoded form communication mechanism has a problem when crossing contexts ( eg, between servers owned by different organisations or individuals). This is also known as Cross Origin communication, and special rules apply to it in Web Browsers (eg. CORS).

For most html Forms on the web this has not been a problem, as the organisation writing the page containing the form is the same as the one writing the program that parses the POSTed name/value pairs and does something with that information. But WebMention is designed to cross contexts.

The problem with urlencoded key/value pair forms when crossing origins, is that the keys lack a clear interpretation, when crossing namespace boundaries. The same key eg: source and target as proposed by the current WebMention spec, can have different meanings in different contexts. Since agents can come to a resource from any other server - we are in a p2p global information space after all - there has to be a way for the client to be able to be clear when he is posting something, what the meaning of what is being posted is, so that there is agreement on the client and the server.

As an example of what could go wrong: the army could quite plausibly set up a joining the army form, and by accident use exactly the same parameter names as webmention. Some people could by mistake or for profit publish links pointing to those forms, leading thereby a lot of people to join the army against their will, when they actually only wanted to send someone a ping message. This was known as taking the king's shilling.

This becomes prevalent when we are building User Agents that follow relations around Web Origins, say following a distributed social network, as these simple web agents won't be able to take the context, aesthetics and meaning of the page into account before acting. These agents therefore need to know before posting what the meaning of the content they will POST is from the point of view of the receiver.

One way to do this is to make sure the mime type of what is POSTed has a well understood interpretation.
- Semantic Pingback seems to suggest that adding RDFa to the form does the job of specifying the meaning of the form. (Is that actually specified somewhere?)
- Using Activity Streams has been proposed as they have their own mime type
Another way is to extend urlencoding by turning attributes into URLs as proposed by @melvincarvalho . This could function in that it could be argued that the meaning of urlencoded forms until now was always a client/server relation and that the parameters therefore were always interpreted relative to the forms base url. But see limitations discussed in #11
The other way to do this, is for the resource receiving the urlencoded form (an endpoint in the webmention, a Container in LDP), to specify the interpretation of the meaning of what will be sent by mapping it into a well defined semantics (eg. RDF). This could be done the server providing an appropriate Link Header. This is considered in more detail in Issue 11: clarifying URLEncoded form meaning.

All of these answers make it easier to integrate with the SoLiD, if only for the simple reason that it then becomes possible for a POST to an LDPC to create a resource that can return a number of different representations.

Differentiate new class "proxy receiver" and how it differs from "receiver"

There are currently two types of webmention receivers implemented. The simple receiver exists on a single domain and accepts webmentions for URLs at that domain or URLs that redirect to that domain (such as shortURLs or aliases). Proxy receivers accept webmentions for domains other than themselves. Examples of these include:

Parts of the spec that apply differently to proxy receivers include:

"Verifies that target is a valid resource for which the receiver accepts Webmentions"

Feel free to comment with other parts of the spec that differ for this class.

w3c / webmention Goto Github PK

webmention's Introduction

Webmention

webmention's People

Contributors

Stargazers

Watchers

Forkers

webmention's Issues

Examples

Recommend Projects

Recommend Topics

Recommend Org