This has been sent to the MASQUE WG mailing list in the hope of getting some discussion there. However, I also want an issue that tracks it:
While writing up our Connect-IP proposal (https://datatracker.ietf.org/doc/draft-kuehlewind-masque-connect-ip/) we looked into how to deal with MTU issues effectively. Our conclusion was that this is going to be a general problem for any user of HTTP Datagrams. Thus, we would like to propose that MTU handling be done within HTTP Datagrams.
This email starts by explaining what we see as the requirements for an MTU signaling solution for HTTP Datagrams, then proposes a potential solution.
Let's start with a figure that gives us a framework for discussing the requirements:
+--------+ Path#1A +--------+        +-------+        +--------+
|Client A|<------->| HTTP   | Path#2 | MASQUE| Path#3 |        |
+--------+         | Inter- |<------>| Server|<------>| Target |
+--------+ Path#1B | mediary|        |(proxy)|        |        |
|Client B|<------->|        |        |       |        |        |
+--------+         +--------+        +-------+        +--------+
I think what makes this a bit more complex is that we need to consider HTTP intermediaries, such as a front-end load balancer that terminates a first QUIC and HTTP/3 connection between the client and that intermediary. From that intermediary, another HTTP connection is used towards the HTTP server that consumes and produces the HTTP Datagrams, and in the case of CONNECT-UDP and CONNECT-IP that server also has a third path, towards the target, to consider. The figure includes two clients as a reminder that the HTTP intermediary may aggregate the HTTP requests, responses and HTTP Datagrams of several clients onto one HTTP/2 or HTTP/3 connection over Path#2. For MTU this complicates things further, as the proxy cannot assume that all HTTP requests on an HTTP/3 connection have the same MTU for their datagrams. The intermediary is the entity that has direct knowledge of both the client-facing MTU and the next-hop MTU of the HTTP connections, all of which may differ.
We also have to consider that the underlying transport connection may at any time be subject to an IP MTU change due to a route change on the path between the nodes. In addition, if PMTUD is enabled in TCP or QUIC, a larger MTU on an individual path could become available and, in some cases, be desirable to use. Thus, we need to consider dynamic changes during the HTTP connection's lifetime and during each HTTP request/response pair's use of HTTP Datagrams.
When using HTTP/3 Datagrams there is a strict MTU limit on each individual datagram if it is to be sent as a QUIC datagram rather than being encapsulated in a capsule on the reliable stream. Falling back to capsules is clearly possible, but it means the datagrams are sent reliably and in order for each HTTP request, i.e. each CONNECT-UDP or CONNECT-IP request. Also, if some end-to-end payloads fit in HTTP Datagrams and others don't, there is potential for reordering among the payloads. To avoid this, the client and the proxy need to determine the lowest currently supported HTTP Datagram size on the path.
For each QUIC connection, the endpoint will know the initial MTU value for the path once the HTTP/3 connection has been successfully established. However, that knowledge is not available if one attempts to construct and send the HTTP request before connection establishment has concluded.
So the requirements we see for an MTU handling solution for HTTP Datagrams are the following:
- Hop-by-hop signaling across the HTTP entities of the lowest MTU of any sub-path
- The signaled value needs to be associated with a particular HTTP request or end-to-end path, to support aggregation by HTTP intermediaries
- Endpoints need to be able to initiate an update of the MTU value upon detecting any change during the HTTP Datagram stream's lifetime
- HTTP intermediaries need to be able to initiate updates of the MTU value upon detecting MTU changes on the individual HTTP connections
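To make these requirements a bit more concrete, here is a minimal Python sketch of the state an HTTP intermediary might keep, for one direction of the path only. All names (`Intermediary`, `RequestPathState`, `on_mtu_signal`, `on_local_mtu_change`) are hypothetical and not taken from any draft; the only assumption is that the value forwarded for a request is the minimum of the value received from the previous hop and the MTU of the next-hop connection.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RequestPathState:
    """MTU state kept for one forwarded HTTP request (hypothetical model)."""
    received_mtu: Optional[int] = None    # last value signalled by the previous hop
    advertised_mtu: Optional[int] = None  # last value this node sent onward


@dataclass
class Intermediary:
    """One direction of an HTTP intermediary that aggregates several requests."""
    next_hop_mtu: int  # max HTTP Datagram payload on the outgoing connection
    requests: Dict[int, RequestPathState] = field(default_factory=dict)

    def on_mtu_signal(self, request_id: int, mtu: int) -> Optional[int]:
        """Handle a value from the previous hop; return the value to forward, or None."""
        state = self.requests.setdefault(request_id, RequestPathState())
        state.received_mtu = mtu
        return self._refresh(state)

    def on_local_mtu_change(self, new_mtu: int) -> Dict[int, int]:
        """The outgoing connection's MTU changed; return the updates to send per request."""
        self.next_hop_mtu = new_mtu
        updates: Dict[int, int] = {}
        for request_id, state in self.requests.items():
            value = self._refresh(state)
            if value is not None:
                updates[request_id] = value
        return updates

    def _refresh(self, state: RequestPathState) -> Optional[int]:
        """Recompute the lowest MTU seen so far; report it only if it changed."""
        if state.received_mtu is None:
            return None
        value = min(state.received_mtu, self.next_hop_mtu)
        if value == state.advertised_mtu:
            return None
        state.advertised_mtu = value
        return value


if __name__ == "__main__":
    node = Intermediary(next_hop_mtu=1200)
    print(node.on_mtu_signal(1, 1400))     # request 1 is limited by the next hop
    print(node.on_mtu_signal(2, 1100))     # request 2 is limited by the previous hop
    print(node.on_local_mtu_change(1000))  # a route change lowers the next-hop MTU
```

Keeping the state per request rather than per connection is what lets two clients with different client-facing MTUs share the connection over Path#2.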
Solution proposal
A new HTTP Datagram capsule is defined for MTU value exchange. It is intended to be interpreted, updated or initiated by HTTP intermediaries, so it needs to be a fixed registered type that can be easily processed. It can also be exchanged in parallel with the Register_Datagram_* capsules, as at that point the underlying HTTP connection is established and the initial HTTP Datagram values are known. A capsule will also travel all the way to the far end of the request path, and an intermediary can initiate one in each direction. The only downside I see is that one capsule is required per open stream whenever an MTU change occurs. Maybe someone has an idea of how to handle signaling when aggregating multiple endpoints' streams onto one HTTP/3 connection.
To make it more efficient, rather than sending one MTU capsule per stream, an MTU capsule could list all streams it is applicable to. That way the number of MTU capsules would be no more than the number of end-to-end paths actually used.
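As a sketch of what such a capsule could look like on the wire, the Python below encodes a capsule as type, length and value, with QUIC variable-length integers throughout. The capsule type value, the payload layout (an MTU followed by a list of stream IDs) and the function names are assumptions made up for illustration; nothing here is a registered or proposed format.

```python
from typing import List, Tuple

# Hypothetical placeholder, NOT a registered capsule type.
MTU_CAPSULE_TYPE = 0xFF00A1


def encode_varint(value: int) -> bytes:
    """Encode a QUIC variable-length integer (1, 2, 4 or 8 bytes)."""
    if value < 0:
        raise ValueError("varint must be non-negative")
    if value < 1 << 6:
        return value.to_bytes(1, "big")
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")
    if value < 1 << 30:
        return (value | 0x8000_0000).to_bytes(4, "big")
    if value < 1 << 62:
        return (value | 0xC000_0000_0000_0000).to_bytes(8, "big")
    raise ValueError("varint out of range")


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one varint at offset; return (value, offset after it)."""
    if offset >= len(data):
        raise ValueError("truncated varint")
    length = 1 << (data[offset] >> 6)  # top two bits select the length
    if offset + length > len(data):
        raise ValueError("truncated varint")
    value = data[offset] & 0x3F
    for byte in data[offset + 1:offset + length]:
        value = (value << 8) | byte
    return value, offset + length


def encode_mtu_capsule(mtu: int, stream_ids: List[int]) -> bytes:
    """Assumed layout: type, length, then MTU followed by the stream IDs."""
    value = encode_varint(mtu) + b"".join(encode_varint(s) for s in stream_ids)
    return encode_varint(MTU_CAPSULE_TYPE) + encode_varint(len(value)) + value


def decode_mtu_capsule(data: bytes) -> Tuple[int, List[int]]:
    """Parse a capsule produced by encode_mtu_capsule."""
    capsule_type, pos = decode_varint(data)
    if capsule_type != MTU_CAPSULE_TYPE:
        raise ValueError("not an MTU capsule")
    length, pos = decode_varint(data, pos)
    if pos + length > len(data):
        raise ValueError("truncated capsule")
    body = data[pos:pos + length]
    mtu, cursor = decode_varint(body)
    stream_ids: List[int] = []
    while cursor < len(body):
        stream_id, cursor = decode_varint(body, cursor)
        stream_ids.append(stream_id)
    return mtu, stream_ids


if __name__ == "__main__":
    capsule = encode_mtu_capsule(1200, [0, 4, 8])
    print(capsule.hex(), decode_mtu_capsule(capsule))
```

One thing this leaves open is that QUIC stream IDs are local to a connection, so an intermediary forwarding such a capsule would have to rewrite the list for the next hop.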