Webmention spec
Home Page: https://www.w3.org/TR/webmention/
This repo contains the latest editor's draft of the Webmention specification.
Implementation reports are collected in the Implementation Reports folder.
Right now I am concerned. If the webmention endpoint is public and unauthenticated, then the computation it performs should be bounded. That is, when you receive a webmention, the effort it takes to verify should be reasonably deterministic. If you have to perform a GET, and then you have to parse, the effort seems unbounded (unless the size of the source document is capped, and so on).
So, webmentions can come from third parties (site C) to tell site A about a link from site B. But a malicious third party can find some large source page on B and send a bunch of webmentions to A, causing a lot of wasted effort on A (and potentially bandwidth on B). Depending on how B mitigates:
- B could ban/block A.
- B could run out of bandwidth or be hit financially.
- Site A needs to block third-party C (but C can be any server anywhere, so it bears almost no burden).
- Site A could run out of CPU resources or bandwidth.
- Site A could be unable to reasonably schedule webmention verifications among the other tasks it needs to do, because it doesn't know how much effort a webmention takes.
- Site A could be hit financially.
You can certainly nitpick these, but only one has to be reasonably true.
For verification to be reasonably bounded, you'd sign the message with site A or have some specified way of bounding the GET that is made and refusing to do anything else. That would require A to more actively participate in the protocol, but would limit the amount of data B needs to pull from A. There are already solutions found in other similar protocols. Salmon, for instance, signs the message and therefore assumes that only the origin requires verification, not the semantics of the message.
At any rate, the simplicity of the protocol makes it very fragile. Therefore, some discussion on the implications of verification and how to avoid such pitfalls would be beneficial. Such as: rejecting documents that are too large to verify, limiting redirects, checking for and refusing streamed data, caching popular documents.
(I think, as an implementer, I wouldn't recommend such a potentially open protocol be added to one's software. If it had easy reasonably bounded origin verification, then yes, which I would suggest as a MUST. And then I'd recommend that full-text verification be done only very optionally at idle times for servers I don't trust/know with a long set of restrictions such as those I listed. I'm still thinking about it a lot though. Full-text verification seems a bit too game-able to reliably enforce trust over time.)
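To make the concern concrete, here is a minimal sketch (not spec text) of what a bounded verifier might look like: it refuses non-textual responses and caps how many bytes it will read before giving up. The cap value, function names, and content-type screen are all illustrative assumptions.

```python
import io
import urllib.request

MAX_BYTES = 1_000_000  # arbitrary illustrative cap; pick per deployment


def read_capped(stream, max_bytes):
    """Read at most max_bytes; reject oversized (or streamed) bodies."""
    body = stream.read(max_bytes + 1)
    if len(body) > max_bytes:
        raise ValueError("source document too large to verify")
    return body


def verify_source(url, target, max_bytes=MAX_BYTES, timeout=10):
    """Bounded check that `url` mentions `target`. Returns False rather
    than spending unbounded effort on non-textual or huge responses."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        ctype = resp.headers.get("Content-Type", "")
        if not ctype.startswith(("text/", "application/xhtml")):
            return False  # refuse non-textual sources outright
        body = read_capped(resp, max_bytes)
    return target.encode("utf-8") in body
```

An endpoint built this way knows its worst case up front: one request, at most `MAX_BYTES` read, one substring scan.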
Why does the spec (and the test suite) not accept "404 NOT FOUND" as well? In my admittedly limited experience, most CMS platforms don't natively support "remembering" old URLs, thus being able to return a 410. I would think a 404 response would be just as meaningful.
If no webmention endpoint is discovered for a target, it would be best to avoid attempting to re-discover an endpoint for the domain until some amount of time has passed, to avoid making a bunch of unnecessary requests to the site. We should create some recommendations about when to throttle back trying to discover a webmention endpoint based on the target domain.
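As one possible shape for such a recommendation, here is a hypothetical per-domain cooldown sketch. The one-day interval and the in-memory table are illustrative assumptions, not spec text:

```python
import time
from urllib.parse import urlsplit

DISCOVERY_COOLDOWN = 24 * 60 * 60  # illustrative: back off for one day

_last_failed = {}  # domain -> timestamp of last failed discovery


def should_attempt_discovery(target_url, now=None):
    """True if we have not recently failed to discover an endpoint
    for this target's domain."""
    now = time.time() if now is None else now
    last = _last_failed.get(urlsplit(target_url).netloc)
    return last is None or (now - last) >= DISCOVERY_COOLDOWN


def record_discovery_failure(target_url, now=None):
    """Remember that discovery failed so we stop hammering the domain."""
    now = time.time() if now is None else now
    _last_failed[urlsplit(target_url).netloc] = now
```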
A number of people here have trouble understanding Cross-Origin issues as discussed in issue 10: Lack of context in WebMention. See the discussion there or @kevinmarks comment explaining forms in issue 1.
One answer to this is to require that WebMention links only ever be to the same origin: i.e., that people host their own WebMention endpoint. It could be argued that in that case we do indeed then find ourselves in the same situation pointed out by @kevinmarks with forms since the beginning of the web, and that extra context is not required:
I'd say that nearly 20 years without a spec more formal than that indicates that we may not need one, given how much has been built on it, theoretical drone strikes notwithstanding.
If an answer cannot be found to #10 then this would be a way to reach a minimal consensus.
Consider the current use case from the current WebMention spec Protocol Summary section.
Imagine that Aaron's webmention statement points to an endpoint on another origin that understands the source and target attributes in a perfectly reasonable way for that service. Perhaps someone hacked Aaron's web site, or a man in the middle has altered Aaron's response to Barnaby's agent's request and changed the link to point to the other endpoint, or simply Aaron is himself intent on some form of mischief.
As a result Barnaby's webmention-enabled agent (this could be a client or server) would - having published Barnaby's post referring to Aaron's entry, and having retrieved the mischievous webmention endpoint - POST to this non-webmention endpoint a message that is interpreted by that server as meaning something completely different. What kind of meaning could that server give the source and target attributes? Pretty much anything. Here are some ideas:
In 6. even though there is no explicit authentication there is an implicit authentication because Barnaby's web agent somehow finds itself inside the firewall. This links this issue to issue #14 "Webmention MUST be done anonymously".
The examples here are examples that could be used by issue #1 "Introduce Property parameters". But if it is easy to imagine a protocol which specifies the relation between link and target, it is also easy to imagine that other people are already using such systems now, without reference to webmention, or that people could develop some inspired by it, or with mischievous intent.
Given that webmention is strictly based on form-urlencoded data, it'd be important to caution implementers on what the W3C Recommendation for HTML5 says (see note):
https://www.w3.org/TR/html5/forms.html#url-encoded-form-data
Assumptions: the vulnerable server treats query string parameters and POST body parameters interchangeably (e.g. PHP's $_REQUEST or Ruby (Sinatra)'s params[:foo]).
The attacker, Chuck, creates a reply to one of Alice's blog posts at http://chuck.example.com/attack and sets the webmention endpoint for the page to the victim's URL, which may be behind a corporate firewall. The URL contains query string parameters that are crafted to cause the victim to perform some operation such as creating a new user account, or some other undesirable operation. An example might be an internal user registration form that creates accounts that can be used from outside the firewall. For example, Chuck sets the webmention endpoint to http://victim.internal/register?userid=chuck&password=1234.
Once Chuck has set the webmention endpoint on his page to the internal server, he then sends a webmention to one of Alice's blog posts that is running on a blog inside the firewall at http://alice.internal/post/100. This causes Alice's server to fetch his reply and show it as a comment.
Chuck then sends a follow-up comment which Alice's server accepts. Then, following the Salmention rules, Alice's server sends webmentions to all previous comments to notify them of the new comment.
Alice's server discovers the maliciously crafted webmention endpoint on Chuck's original comment, and sends a post request to it that looks like the following:
POST /register?userid=chuck&password=1234 HTTP/1.1
Host: victim.internal
Content-Type: application/x-www-form-urlencoded
source=http://alice.internal/post/100&target=http://chuck.example.com/attack
The vulnerable server ignores the source and target parameters but processes the query string parameters as if they were post values.
The summary is that the attacker can cause Alice's server to send arbitrary requests to a server.
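One mitigation that has been suggested for this class of attack (an illustration, not something the spec currently requires) is for senders to refuse to deliver webmentions to endpoints that resolve to private, loopback, or link-local addresses. This only helps when internal hosts actually sit in those address ranges, and the function name is hypothetical:

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def endpoint_is_safe(endpoint_url):
    """Refuse endpoints that resolve to non-public addresses, so a
    discovered webmention endpoint cannot be pointed at internal hosts."""
    host = urlsplit(endpoint_url).hostname
    if host is None:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False  # unresolvable: do not send
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            return False
    return True
```

A check like this belongs on every URL the server fetches or POSTs to on behalf of a third party, not just the endpoint.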
Section 3.2.2 of https://www.w3.org/TR/webmention/#webmention-verification says
If the receiver is going to use the Webmention in some way, (displaying it as a comment on a post, incrementing a "like" counter, notifying the author of a post), then it must perform an HTTP GET request on source, and follow any HTTP redirects (up to a self-imposed limit such as 20) and confirm that it actually links to the target.
The suggestion of using a GET is wrong IMO because 1) it does a full resource request when a HEAD request would suffice at this stage, and 2) by mandating a GET, the spec prevents my implementation from performing a HEAD at all (which matters because of the first reason).
My suggested change would be to say
" If the receiver is going to use the Webmention in some way, (displaying it as a comment on a post, incrementing a "like" counter, notifying the author of a post), then it SHOULD perform an HTTP HEAD request on source, and follow any HTTP redirects (up to a self-imposed limit such as 20) and confirm that it actually links to the target."
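For comparison, a HEAD-first flow might screen a source on headers alone before deciding whether a full GET is worthwhile. Note that a HEAD by itself can never confirm that source links to target; it can only rule requests out cheaply. The helper below is a hypothetical sketch operating on a dict of response headers (e.g. obtained via `urllib.request.Request(url, method="HEAD")`); names and limits are assumptions:

```python
def headers_look_verifiable(headers, max_bytes=1_000_000):
    """Cheap screen using HEAD response headers only: reject sources
    that are too large or clearly non-textual before any GET is made."""
    length = headers.get("Content-Length")
    if length is not None and int(length) > max_bytes:
        return False
    ctype = headers.get("Content-Type", "")
    return ctype.startswith("text/") or "html" in ctype
```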
Change:
it must perform an HTTP GET request on source, and follow any HTTP redirects (up to a self-imposed limit such as 20) and confirm that it actually links to the target.
to
it must perform an HTTP GET request on source, following any HTTP redirects (and SHOULD limit the number of redirects it follows) to confirm that the target URL actually links to the target.
The suggested limit of 20 should be moved to the "Limits on GET requests" section.
I do not support this issue. I'm just helping organize here, separating it from #39.
In #39 there seem to be several proposals considered:
OPTION 1. Where Webmention currently uses form-encoding, it should instead use JSON-LD. That is, Webmention receivers would be required to understand JSON-LD and not required to understand form-encoding. This has a big problem: it would be a flag-day change, which seems impossible in a decentralized system.
OPTION 2. Webmention receivers should be required to understand both JSON-LD and form-encoding. This has a medium-sized problem: it would make every Webmention receiver more complicated, since it would have to understand two syntaxes, not just one.
I leave it to someone who supports this issue to make the case. Hopefully they can do it with a very simple use case that shows how someone would benefit from one of these OPTIONS being adopted.
My suggestion is that if someone wants to use JSON-LD with webmentions, they instead use one of the non-webmention solutions that works with JSON-LD, like Semantic Pingback, Solid Inbox, or ActivityPub.
As noted here: https://tinokremer.nl/2015/benwerd-i-figured-out-what-goes-wrong-with-webmentions-comments
Webmention handlers should strip fragments before checking if the URL is one they handle. Fragments can be used to direct the webmention to a subsection of the destination URL, and if they are fragmentions, a particular phrase.
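A sketch of the suggested behaviour, using Python's standard urldefrag; the function name and return shape are illustrative:

```python
from urllib.parse import urldefrag


def handles_target(target_url, known_urls):
    """Strip the fragment before matching against URLs we handle, but
    keep the fragment so the mention can still be tied to a subsection
    (or, for fragmentions, a particular phrase)."""
    url, fragment = urldefrag(target_url)
    return url in known_urls, fragment
```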
[raised on behalf of Addison Phillips, discussed in i18n telecon]
https://www.w3.org/TR/webmention/#receiving-webmentions
In Section 3.2 there are examples of responses. They give a Content-Type of text/plain with no charset parameter. Please include a charset=UTF-8.
Should there also be a health warning to use UTF-8?
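For example, such a response could be written as follows (the body wording is illustrative, not taken from the spec):

```http
HTTP/1.1 202 Accepted
Content-Type: text/plain; charset=utf-8

Webmention accepted and queued for processing.
```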
The WG charter states:
A JSON-based syntax to allow the transfer of social information
Yet I was unable to locate any JSON-based examples in the text (though perhaps I missed something)
Could this be made available for review?
The requirement to do an HTTP GET on the source and to verify whether it indeed references the target excludes important use cases, for example in web-based scholarly communication. I will explain by means of a very hot topic: linking publications with datasets. Other cases exist.
For publication/dataset linking, the publication (source) would use Webmention to inform the dataset (target) that it is being referenced in the paper. Typically:
It would be very hard (or even impossible) to perform "Webmention verification" as described in the spec because:
Even if one were to use the URIs of the actual content (PDF file, dataset) instead of the DOIs as source/target URIs, two of the above problems would remain.
I very much understand that this problem is to a large extent related to the fact that web-based scholarly communication does not necessarily operate in a manner that aligns very well with the way other pockets of the web do. Then again, I assume paywalls and landing pages exist beyond scholarly communication. And, most importantly, I would love if webmention could be used in scholarly communication, see eg slides 45-52 of [http://www.slideshare.net/hvdsomp/reminiscing-about-interoperability].
Hence a suggestion to consider an additional aspect regarding "Webmention verification", which could be along these lines "if the receiver has a trust relationship with the sender, verification is optional".
Cheers
Herbert Van de Sompel
Los Alamos National Laboratory
There is no document at:
http://www.w3.org/ns/webmention#
describing the key terms used in webmention.
Is there something intended to be there, or a pointer to draft text? That would make it easier to review.
The IANA registry is not used for HTML rels.
The existing-rels registry has been updated with the current specification link.
One way proposed by @sandhawke and others to reduce the risks of situations described in issue 10: Lack of context in WebMention is to force WebMentions to be anonymous. This way someone POSTing a form can never be taken to be responsible for doing so. This seems to be the argument given by @sandhawke in a comment to that post
It's a POST done without any credentials, so some would say it can't do anything bad, but I think bblfish is imagining that maybe sometimes doing a POST even without credentials could be taken as a commitment. I think there may be occasional poorly designed systems where that's true.
If it is only a problem with poorly designed systems, then they can be blamed for it. On the other hand systems in which the client has authenticated would be well designed. So for those the argument would go through.
Therefore it seems that WebMention cannot allow clients to authenticate.
The current spec says:
- The receiver SHOULD perform a HTTP GET request on source to confirm that it actually links to target (note that the receiver will need to check the Content-type of the entity returned by source to make sure it is a textual response).
Making this a MUST seems a clearer statement of intent for the protocol. Posting unverified links is the Trackback problem.
Abstract:
Webmention is a simple way to notify any URL when you link to it on your site. From the receiver's perspective, it's a way to request notifications when other sites link to it.
Intro:
At a basic level, a Webmention is a notification that one URL links to another.
'link' is debatable given that a plain text file with a URL in it is a valid webmention source. Something like:
"Webmention is a simple way to notify any URL that it appears on your site. From the receiver's perspective, it's a way to request notifications when other sites refer to it."
"At a basic level, a Webmention is a notification that one URL appears at another."
Section 1.2 of the current draft is called "Protocol Summary" and does seem to describe some sort of protocol, but there is no introduction or context. Shouldn't there be one?
[raised by Addison Phillips, discussed in i18n telecon]
https://www.w3.org/TR/webmention/#receiving-webmentions
Section 3.2 says (in part):
The response body MAY contain content, in which case a human-readable response is recommended.
There is no mention of language negotiation or language identification here. The assumption appears to be that a wad of English is returned? ;-)
The example could include a Content-Language header or might allow for other language identification in the body (complicated)
This is also applicable to at least 3.2.3 Error Responses as well.
If the Webmention was not successful because of something the sender did, it MUST return a 400 Bad Request status code and MAY include a description of the error in the response body.
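An illustrative error response carrying both a charset and a language tag (the wording is hypothetical, not from the spec):

```http
HTTP/1.1 400 Bad Request
Content-Type: text/plain; charset=utf-8
Content-Language: en

source does not link to target
```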
[raised by Addison Phillips, discussed in i18n telecon]
https://www.w3.org/TR/webmention/#sender-notifies-receiver
Section 3.1.2 describes the submission of the source and target URLs in the x-www-form-urlencoded format. There is no mention of character encoding, which is normally an important concern for this format. However, since the strings in question are URLs, they are presumably already "URL encoded" using the character encoding recognized by the host server. I'm raising this issue to point out that the charset issue works out in this case. However, if fields were added to the payload in a future revision, the charset might become important.
In section 5 of the protocol
Barnaby's server sends a webmention to Aaron's post's webmention endpoint with
source set to Barnaby's post's permalink
target set to Aaron's post's permalink.
Source and target are form-encoded parameters. While this makes sense in that context, it suffers from a couple of weaknesses.
In order to make this more scalable it is advantageous to be able to systematically convert those parameters into URIs. This information could be gained either in a generic way that applies to all form-encoded variables (though none exists today), or described in the spec.
It is undesirable for software to do this ad hoc, as different decisions might be made by different code bases leading to interoperability issues.
Suggestions for this:
If there's no preference here, I'd suggest using the pingback vocab as it was one of the cited motivations for webmention.
A few words in the text of the spec could make clear how software providers could use webmention in the form context and also with linked data based systems.
The webmention spec lacks interoperability with linked data.
What is needed is an explicit translation of
source=https://waterpigs.example/post-by-barnaby&
target=https://aaronpk.example/post-by-aaron
into a format that can be implemented by those working with web standards such as JSON-LD, Turtle, AS2, etc.
While the spec clarifies the namespace that could be used in the predicate position, the mapping remains unclear for implementors and needs to be stated explicitly.
It is clear that webmention alone suffers from the "webmention spam" problem, leading to possible DDoS attacks. The argument I have heard is that while webmention is small, it's not a problem, but if it becomes a W3C REC extensibility will become critical. The salmention and vouch systems essentially replicate work done elsewhere in the W3C, such as digital signatures and verifiable claims. Rather than duplicating this work, or forking the working group, I would strongly suggest aligning the work now by adding an example to the spec that explicitly shows how implementors can send source and target to a server using linked data (JSON-LD might be a good fit).
But this mapping is currently under specified. I suggest using this issue to come up with the closest mapping possible.
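As one possible starting point, the form-encoded pair above might map to JSON-LD along these lines. The context prefix and term names are assumptions borrowed from the Semantic Pingback discussion; nothing here is normative:

```json
{
  "@context": { "pingback": "http://purl.org/net/pingback/" },
  "pingback:source": { "@id": "https://waterpigs.example/post-by-barnaby" },
  "pingback:target": { "@id": "https://aaronpk.example/post-by-aaron" }
}
```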
I thought the webmention spec was supposed to be agnostic of the markup used in the source. As such, a MUST on microformats2 parsing in 2.2.4 seems inappropriate.
Even though supporting updating mentions is MAY, this implies that if you are going to support it, you have to support mf2, which may deter people who are interested in other kinds of semantic markup, or indeed syntaxes other than html, from supporting updating at all. If someone wants to support updates, shouldn't it be left up to them to decide how they're going to get the updated data? (Which would be consistent with other similar decisions left up to implementers elsewhere in the spec).
It occurs to me that all your DDoS concerns that are creeping into the protocol are the same as email has been dealing with for years. In email, there are two main ways:
(http://www.openspf.org/Related_Solutions)
For webmentions, you could spec that:
Webmentions replaced XMLRPC with something RESTish, but they aren't fully RESTful because they still use indirection, by sending encoded sender=...&receiver=....
It would drastically cull the number of errors and be much more web happy if the protocol instead was:
This mixes the GET/POST pair that the sender does into a single GET, which will look totally normal to servers that don't understand it, without requiring any HTML code changes. As more people come online to webmentions, servers can progressively start to adopt them, but until then nothing untoward happens if a notification is sent when no one is listening. The current spec does a sorta preflight check by asking the receiver if and where its webmentions should go.
Another idea is simply to use the Referer header in step 1, which I think is something at least some blog engines used to do for finding trackbacks: blogs would ride on the back of their surfers for tracing out the web. But by just making the server pretend to be a surfer, you get immediate push notification.
I guess that having a separate webmention endpoint made deployment seem easier, especially to something like Wordpress. But it should be just as simple to write some middleware that catches X-Webmention headers. The longest part is what to do with the mention once you've got it, anyway.
Aaron and I noticed a problem with the normative references. There's a W3C process rule that specs aren't allowed to normatively refer to unstable technologies, because then conformance could change after-the-fact.
But aside from the process issue, which we could address other ways, I think the Salmention clause in Webmention isn't a good idea. The spec says:
A Webmention update implementation MAY support updating data from children, or other descendant objects of the primary object (e.g. a comment h-entry inside the h-entry of the page). If an implementation does support this, it MUST support it according to the [Salmention] extension specification (AKA a "Salmention implementation").
First off, I find that text very hard to understand. What are "children" and "descendant objects"? Those terms aren't defined in the spec, as far as I can tell, and mean nothing to me. Do I need to understand what they mean? By using "MUST" this text says I do. Are there test cases for this?
I do understand Salmention, I think. It's the practice that when your server receives a webmention and incorporates the content that mentions you into your own content, you should issue your webmentions again, so that "upstream" sites (things you point to) can see the "downstream" content (sites that point to you). But isn't that upstream relaying already implied by:
If the source URL was updated, the sender SHOULD re-send any previously sent Webmentions, (including re-sending a Webmention to a URL that may have been removed from the document), and SHOULD send Webmentions for any new links that appear at the URL.
... although maybe I don't understand that correctly. I think it means "If the source Resource was updated".
So, in effect, by following that SHOULD you're doing Salmentions without even knowing it, if you happen to include into your content any content that mentions you (with suitable provenance).
In short, can we remove the Salmentions text? And if not, can someone rephrase it in a way that makes quite plain the entirety of what I need to know in doing a complete and conformant Webmention implementation?
webmention is cool and I like it.
However, I think we need more to really make progress on the "Federation Protocol" deliverable mentioned in the charter.
Other than webmention, LDP is the other protocol mentioned in the charter as a 'possible input' to that deliverable, but it doesn't help prescribe much that helps with webmentions or pingbacks (that I can tell).
@melvincarvalho points out the similarity between webmention and something many of us have worked with for over a decade:
Webmention has been implemented in linked data in the form of "Semantic Pingback" http://www.w3.org/wiki/Pingback#Semantic_Pingback
Notably, (AFAICT) the W3C has never produced a Pingback standard. The closest thing to a standard I can find is "Pingback 1.0" hosted on hixie.ch (Ian Hickson, WHATWG).
hixie is rad, and an editor of HTML5. But link rel pingback is not specified as part of HTML5. http://www.w3.org/TR/html5/links.html#linkTypes
So Pingback is not a standard we have to adhere to.
But I think there are some important questions to answer if we want the "Federation Protocol" to be a CR, and not just a Note: Why throw away 13 years of implementations and start from scratch? Can we salvage Pingback 1.0 into a W3C specification that is similar, but also modern enough to drive adoption of web federation for at least another 13 years (perhaps with a logical major version bump, '2.0')? The answer is not obvious.
The following two sections/lists are meant as starting points. Please suggest additions
Update: I have struck through the points where I was wrong, which were numerous.
Here are things that Pingback deals with that Webmention currently doesn't:
- HEAD requests without having to download/parse HTML

Here are things that I personally don't like about the idea of implementing Pingback 1.0 in this era of the web:
- application/x-www-form-urlencoded - like webmention today
- application/ld+json - http://www.w3.org/TR/json-ld/
- application/activity+json - http://www.w3.org/TR/activitystreams-core/
- text/turtle - http://www.w3.org/TR/turtle/
- application/json - https://tools.ietf.org/html/rfc7159, e.g. { "source": "url", "target": "url" }. The Activity Vocabulary is doing this nicely.
- application/x-www-form-urlencoded just like Webmention does today.

I expect this to stoke some strong opinions, but I believe it illustrates the possibility of a Middle Way to a Federated Social Web that doesn't ignore past efforts (webmention didn't ignore it) or abandon existing pingback markup, and also considers the current uses and extensibility benefits of linked data.
If this is well-received, I could try to draft something that addresses these issues while attempting to be very close to a superset of Webmention and Pingback 1.0.
Currently the spec says:
The webmention endpoint is advertised in the HTTP Link header or a <link> or <a> element with rel="webmention". If more than one of these is present, the HTTP Link header takes precedence, followed by the <link> element, and finally the <a> element. Clients MUST support all three options and fall back in this order.
However it does not address what should be done if multiple endpoints are indicated for a page using one of these techniques.
I'd suggest that in this case, sending a webmention to each one found makes sense. As webmention does not define what endpoints should do, it is clear that there could be different webmention triggered services - one that creates useful comment threading, and one that caches linked-from pages, for example.
In addition, if you are migrating from one webmention service to another, being able to ping both in parallel is best practice to ensure consistency.
In the Updating section, 410 is explicitly called out as triggering a deletion. It seems like many situations would result in a 404 to trigger deletion as well, such as the domain name being sold and reused, the content management system changing and becoming unaware of all previous URLs, and so forth.
Equally, simply don't refer to HTTP status codes, as one assumes that the 410 would not mention the target, and hence fall under the second clause that triggers deletion 😸
Seeing that http://www.iana.org/assignments/link-relations/link-relations.xhtml currently does not list "webmention" as a registered link relation type, I am proposing to add a section to the draft that registers "webmention" in the IANA registry according to RFC 5988. I'd be more than happy to contribute such a section. This should make it easier for people discovering "webmention" links on the web to find the authoritative document specifying the link relation type.
link: https://hypothes.is/a/yHecWAE6SOewee7LbZheEg
Add a fourth discovery layer of whole-domain delegation via .well-known.
http://indiewebcamp.com/irc/2015-11-29#t1448856141695
(this could also just be documenting how to use webfinger for this)
This lets one domain delegate all its webmentioning to another provider without having to adjust server code to add new response headers or HTML
Started on #1, but that issue is specifically to introduce a new parameter in the currently specced application/x-www-form-urlencoded request body content-type.
Webmention has been implemented in linked data in the form of "Semantic Pingback" http://www.w3.org/wiki/Pingback#Semantic_Pingback
Semantic Pingback is worth being familiar with in the webmention discussion. However, that document does not use normative vocabulary, and thus I don't think it is fit as-is.
I'd suggest using namespaces for source, target, etc. from the pingback vocab if wishing to implement this in the linked data world. Alternatively, putting something under w3.org/ns could work.
Which may or may not be a good idea. Regardless, I've never seen LD namespaces in an application/x-www-form-urlencoded string. So perhaps it is a good idea to standardize webmention pings in other Content-Types for use cases where more than just a 'source' and 'target' are known.
Iff webmention does standardize LD-friendly Content-Type ping bodies, I agree that it's a good idea to re-use (or be explicitly equivalent to) the pingback namespace (especially as it already defines source and target as in the current webmention draft).
It seems to me that all the debates this morning about 'namespaces' really imply a request for further discussion on Content-Types.
The content that follows "How to test webmentions" is implementation rather than specification focused, and would be great to keep, just in a Note or other mechanism than a spec :)
Being able to transfer an Activity description (to be embedded in an ActivityStream, for example) would be great for integration. As AS has its own content-type, there would be no confusion as to what the payload consists of.
As a strawperson proposal: systems MUST support www-form-encoded, and SHOULD support AS?
This gives an extension point in the future if there are further syntaxes that can also carry the same information (perhaps with additional structure that's impossible in form-encoded), while still establishing form-encoded as the lingua franca for the 99% simple cases.
The webmention endpoint will validate and process the request, and return an HTTP status code. Most often, 202 Accepted will be returned, indicating that the request is queued and being processed asynchronously to prevent DoS attacks. The response body SHOULD include a URL that can be used to monitor the status of the request.
-> The Location header SHOULD include a URL that can be used to monitor the status of the request; the response body MAY contain this URL?
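For example (the status URL is hypothetical):

```http
HTTP/1.1 202 Accepted
Location: https://example.org/webmention/status/1234
Content-Type: text/plain; charset=utf-8

https://example.org/webmention/status/1234
```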
Document and incorporate the idea of using properties as a parameter to better employ webmentions.
This was already documented and mentioned here:
which the IWC community is well aware of (let me know if citations are needed). And, here on the Social Web WG mailing list:
https://lists.w3.org/Archives/Public/public-socialweb/2015Jul/0092.html
and several times on IRC #social . Let me know if citations are needed on that.
If this repository is strictly about IWC's webmention, then I propose renaming it.
When checking whether the target is a redirect, there is potentially no end to a 301 redirect chain. Browsers have a limit where they'll stop following redirects after N. While specifying N is not a good idea, the spec should at least have a note about following redirects up to a chosen limit, and possibly recommend something sane such as the default that browsers follow.
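A sketch of such a note in code form: a chain-follower with a hard cap. `fetch` is a stand-in for whatever HTTP client the implementation uses, and the helper name and limit are illustrative:

```python
def follow_redirects(fetch, url, limit=20):
    """Follow Location hops up to `limit`, then give up rather than
    loop forever. `fetch(url)` must return (status, value), where value
    is the Location header for 3xx responses and the body otherwise."""
    hops = 0
    while True:
        status, value = fetch(url)
        if status in (301, 302, 303, 307, 308):
            hops += 1
            if hops > limit:
                raise RuntimeError("redirect limit exceeded")
            url = value
            continue
        return url, status, value
```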
If an alleged webmention endpoint is blocked by robots.txt on that domain, senders should not send to it.
If a receiver is about to verify by requesting some endpoint, and that endpoint is blocked by robots.txt, the receiver should not request it.
http://www.robotstxt.org/robotstxt.html
Is this a good idea?
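If robots.txt is honoured, the check can be done with the stdlib parser. In this sketch the robots.txt body is passed in as lines for illustration; a real sender or receiver would first fetch http://&lt;host&gt;/robots.txt, and the `webmention-bot` agent name is a made-up example.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt_lines, url, agent="webmention-bot"):
    """Return True if robots.txt permits `agent` to fetch `url`."""
    rp = RobotFileParser()
    rp.parse(robots_txt_lines)
    return rp.can_fetch(agent, url)
```

A sender would run this against the endpoint URL before POSTing; a receiver would run it against the source URL before verifying.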
Two methods of discovery are listed in section 3.1.1
The sender MUST fetch the target URL (and follow redirects [FETCH]) and check for an HTTP Link header [RFC5988] with a rel value of webmention, or an HTML `<link>` or `<a>` element with a rel value of webmention.
Namely: not all implementations on the social web are based on HTML. Other serializations need to be handled in order to achieve interoperability with existing standards.
Suggestions
I much prefer this link to be in the body, as not every implementor will have access to link headers, and some may wish to use mentions.
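For reference, both HTML-side discovery options (and the Link header) can be handled with the stdlib alone. This is a simplified sketch: the Link-header parsing handles a single link-value and ignores quoting edge cases, and relative URL resolution is left out.

```python
import re
from html.parser import HTMLParser

def endpoint_from_link_header(value):
    """Extract a webmention endpoint from a simple HTTP Link header value."""
    m = re.search(r'<([^>]+)>\s*;\s*rel="?([^";]*)"?', value)
    if m and "webmention" in m.group(2).split():
        return m.group(1)
    return None

class _EndpointFinder(HTMLParser):
    """Find the first <link> or <a> element with rel=webmention."""
    def __init__(self):
        super().__init__()
        self.endpoint = None
    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if self.endpoint is None and tag in ("link", "a") \
                and "webmention" in (a.get("rel") or "").split():
            self.endpoint = a.get("href")

def endpoint_from_html(html):
    p = _EndpointFinder()
    p.feed(html)
    return p.endpoint
```

Non-HTML serializations would still need their own discovery rules, which is exactly the gap this issue points at.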
Forking this off from #1 as they are independent issues.
Currently the spec says that the target parameter is required, so that a minimal webmention parser can just check that the URL is in the source document. My naïve example:
https://github.com/kevinmarks/mentiontech/blob/master/main.py#L119
```python
result = urlfetch.fetch(mention.source)
if result.status_code == 200:
    mention.sourceHTML = unicode(result.content, 'utf-8')
    if mention.target in mention.sourceHTML:
        mention.verified = True
    else:
        mention.verified = False
```
Clearly, actually parsing the source document for actual links would be an enhancement here.
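A sketch of that enhancement, using only the stdlib: instead of a substring check, parse the source document and collect the actual `<a href>` targets. A production receiver would additionally resolve relative URLs and check the response content-type first.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values of <a> elements in an HTML document."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def source_links_to(source_html, target):
    """True only if the source actually links to target, not merely mentions it."""
    c = LinkCollector()
    c.feed(source_html)
    return target in c.links
```

Unlike the substring check, this rejects a source that merely mentions the target URL as plain text.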
in #1 (comment) @csarven says
If we want to talk about what is strictly required, it is just the source. If we want to continue talking about how to let the target know precisely why a target was mentioned, then you need both property and target. Both property and target help with the validation process. If you want to discuss in terms of "extensions", then everything outside of source is an extension.
source is a MUST, property is a SHOULD, target is a SHOULD.
This is true in the specific case that the webmention endpoint is tightly coupled to a particular domain, and thus can know a priori which links are within its purview. That is a common case for webmention, but it is not the only possible case, as webmention receivers can support multiple target sites.
There is another case where only a source can work: if you are sending webmentions on behalf of a page. indiewebify.me does this. However, this is more of a webmention-supporting service than an implementation of the protocol (it accepts a `url` parameter, not `source`).
Further comments from #1:
@rhiaro:
I'd rather see property required than target optional.
Let's take target being optional first. I think @kevinmarks has a perfect example: anything doing webmention handling as a service is a key place where target needs to always be there. This is a perfectly valid use-case and likely an important one. I can see this being the same issue for any site that has multiple users (silos even), especially ones that allow custom domains for users. Moreover, this would significantly affect processing of anything that refers to more than one URL. If you mention 1000 URLs in a list of "top 1000 URLs on subject X", you have to check ALL 1000 to see if ANY URL you are managing is in there. When given the target URL, you can easily verify that you actually care about the URL and that it is referenced by the source.
Making target optional was based on the fact that property and target are not absolutely needed, i.e., an endpoint can still be operational (one counter-example to the raised "issue" against that was: my http://csarven.ca/webmention endpoint).
The proper way forward is to provide both property and target. I was not advocating for target being optional any more than property being optional. They are equally valuable, which is why I last suggested that all source, property, and target should be MUSTs: #1 (comment) , on the basis that having all three represents the complete information of the webmention claim. source, property, target are strictly part of the data.
I hope I have captured everyone's arguments on this point. If not, please comment below.
In section 3.1.1
Clients MUST support all three options and fall back in this order
However, "Client" was not defined.
I found this sentence slightly confusing. What is actually meant by client?
In "Lack of context WebMention", the problem of the meaning of URL-encoded forms as parameter/values is considered.
This may not be that difficult to do. We could define a new Link relation, say `urlencoded`, that would point to a transformer from urlencoding to RDF. This would allow a client, on making a request to a webmention endpoint,
GET /webmention HTTP/1.0
To retrieve a result such as this (see Web Linking RFC), where of course the "urlencoded" relation needs to be described and registered correctly.
200 Ok
Link: <http://w3c.org/social/WebMention>; rel="urlencoded"
The document at <http://w3c.org/social/WebMention> would have both an HTML representation and a machine readable representation.
What one really wants is the ability to also retrieve a machine readable document from http://w3c.org/social/WebMention that would describe the url encoded form. It would have some yet to be determined mime type (that is not html), and would return something like this:
PREFIX ping: <http://purl.org/net/pingback/>
CONSTRUCT {
[] ping:source ?source;
ping:target ?target .
} WITH ?source ?target
Where ?source and ?target are the attribute names of the form. This would allow WebMention-enabled clients to continue sending the attribute value pairs as they do now,
source=http://joe.name/card
target=http://jane.name/other
and would allow a robot to interpret that to be equivalent to the rdf graph written out in Turtle as
@prefix ping: <http://purl.org/net/pingback/> .
[] ping:source <http://joe.name/card>;
ping:target <http://jane.name/other> .
(Clearly there is a piece of syntax still missing in the sketched language to turn the ?source and ?target strings into URLs.) This is not that complicated, and would allow us to de-siloeify all forms on the web.
This would allow the IndieWeb folk to increase the security of their protocol while retaining their principle of remaining accessible, and it would allow this to be integrated generically into the SoLiD platform, so as to reduce configuration mistakes and make it easier to automatically create such resources. On the LDPnext side, this would require working out how such a urlencoded mime type can be supported.
The discovery step is very clearly described, and allows for non-HTML content to participate using the protocol via HTTP Link headers. The verification step, where the source is retrieved and it is confirmed that it links to the target, is less clear as to what constitutes a valid link. Further processing requirements would be valuable here.
As a strawperson:
Just a minor issue: RFC 2616 has been superseded by an array of new specs, RFC 7230 and onwards. The reference should be updated to reflect that.
The intention of this issue is to clarify the following in the current spec in order to give insight into other issues.
The receiver SHOULD perform an HTTP GET request on source to confirm that it actually links to target (note that the receiver will need to check the Content-Type of the entity returned by source to make sure it is a textual response).
Two simplified examples at source http://example.org/foo:

1. The source contains `<a href="http://example.org/bar">`:
a. If the endpoint receives http://example.org/bar as target, there is a match.
b. If it receives a different value, there is no match, but it may want to process further at its own discretion.
c. If it receives no value, the receiver has to make a decision on what may be of interest at source.

2. The source contains `<a href="http://t.co/abc">` (a short URL, or any URL which may or may not eventually resolve at one of target's URLs):
a. If the endpoint receives http://t.co/abc as target, there is a match.
b. If it receives a different value, there is no match, but it may want to process further at its own discretion.
c. If it receives no value, the receiver makes a decision on what may be of interest at source.
Options b and c are similar. I've intentionally kept them distinct to have a clear discussion. The intention here is not to necessarily lay down how verifications should be conducted but to understand the moving components better.
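The moving components above can be made concrete with a small matcher. This is only a sketch: `resolve_short` is a hypothetical injected callable standing in for following a short URL like http://t.co/abc to its final location, and cases (b)/(c) are deliberately collapsed into a discretionary "no-match" result, mirroring the text.

```python
def match_target(links_in_source, target, resolve_short=None):
    """Classify a webmention claim against the links found in the source."""
    if target in links_in_source:            # case (a): direct match
        return "match"
    if resolve_short is not None:
        for link in links_in_source:
            if resolve_short(link) == target:
                return "match-via-redirect"  # short URL resolved to target
    return "no-match"                        # cases (b)/(c): receiver's discretion
```

Whether a receiver resolves short URLs at all is itself a policy decision, and a bounded one at that (see the redirect-limit discussion elsewhere in these issues).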
Note: This issue emerged from issue #16 . It may also reflect and incorporate the output of issue #1 , #5 , and #12 . See also proposed possible POSTs with interpretations.
Webmention's urlencoded form communication mechanism has a problem when crossing contexts ( eg, between servers owned by different organisations or individuals). This is also known as Cross Origin communication, and special rules apply to it in Web Browsers (eg. CORS).
For most HTML forms on the web this has not been a problem, as the organisation writing the page containing the form is the same as the one writing the program that parses the POSTed name/value pairs and does something with that information. But WebMention is designed to cross contexts.
The problem with urlencoded key/value pair forms when crossing origins is that the keys lack a clear interpretation when crossing namespace boundaries. The same keys, e.g. `source` and `target` as proposed by the current WebMention spec, can have different meanings in different contexts. Since agents can come to a resource from any other server - we are in a p2p global information space after all - there has to be a way for the client to be clear, when posting something, about the meaning of what is being posted, so that client and server agree.
As an example of what could go wrong: the army could quite plausibly set up a join-the-army form and by accident use exactly the same parameter names as webmention. Some people could, by mistake or for profit, publish links pointing to those forms, thereby leading a lot of people to join the army against their will when they actually only wanted to send someone a ping message. This was known as taking the king's shilling.
This becomes prevalent when we are building user agents that follow relations around Web Origins, say following a distributed social network, as these simple web agents won't be able to take the context, aesthetics and meaning of the page into account before acting. These agents therefore need to know, before posting, what the meaning of the content they will POST is from the point of view of the receiver. One answer is to ensure that what is POSTed has a well understood interpretation.
Another is adding RDFa to the form, which does the job of specifying the meaning of the form. (Is that actually specified somewhere?) All of these answers make it easier to integrate with SoLiD, if only for the simple reason that it then becomes possible for a POST to an LDPC to create a resource that can return a number of different representations.
There are currently two types of webmention receivers implemented. The simple receiver exists on a single domain and accepts webmentions for URLs at that domain, or URLs that redirect to that domain (such as short URLs or aliases). Proxy receivers accept webmentions for domains other than themselves. Examples of these include:
Parts of the spec that apply differently to proxy receivers include:
Feel free to comment with other parts of the spec that differ for this class.