
ops-drafts's Issues

Provide guidance on use of PING

The transport draft says:

"The PING frame can be used to keep a connection alive when an application or application protocol wishes to prevent the connection from timing out. An application protocol SHOULD provide guidance about the conditions under which generating a PING is recommended. This guidance SHOULD indicate whether it is the client or the server that is expected to send the PING. Having both endpoints send PING frames without coordination can produce an excessive number of packets and poor performance."

Probably we should state this in the applicability statement as well...
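
For illustration, a minimal sketch of what such guidance might boil down to on the client side, assuming a hypothetical connection API (send_ping, is_idle, closed) and an idle timeout known from the transport parameters:

    import asyncio

    # Minimal sketch of client-driven keep-alives. The connection API is
    # hypothetical; the point is that only one endpoint generates PINGs, at a
    # period comfortably below the negotiated idle timeout.
    async def keep_alive(conn, idle_timeout_s: float):
        period = idle_timeout_s / 2              # headroom for loss and delay
        while not conn.closed:
            await asyncio.sleep(period)
            if conn.is_idle():                   # no application traffic recently
                conn.send_ping()                 # elicits an ACK, resetting the peer's idle timer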

Connection IDs and ICMP Error messages (for PMTU discovery)

Any endpoint that identifies connections using a Connection ID needs that Connection ID to be present in the packets it handles, including the packets quoted inside ICMP error messages. Hence, that endpoint will not only request the peer to send Connection IDs but will also include Connection IDs on outgoing packets (in case they are returned in an ICMP error packet). Such ICMP error packets are required for PMTU discovery, among other things.

Such Connection IDs will be sent even if the peer gave its permission to omit Connection IDs using Transport Parameter omit_connection_id. See quicwg/base-drafts#953 for more.
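
For illustration, a sketch of why the CID matters here: when an ICMP error arrives, the quoted original packet is all the endpoint has to map the error back to a connection. This assumes the QUIC v1 long/short header layouts and a fixed locally used CID length; the lookup table is hypothetical.

    # Sketch: map an ICMP error back to a connection via the DCID in the quoted
    # packet. `connections` maps the DCIDs this endpoint puts on outgoing packets
    # to connection state. If outgoing packets omitted the CID, there would be
    # nothing here to match on.
    LOCAL_DCID_LEN = 8

    def connection_for_icmp_error(quoted_udp_payload: bytes, connections: dict):
        if not quoted_udp_payload:
            return None
        if quoted_udp_payload[0] & 0x80:             # long header: DCID length is on the wire
            dcid_len = quoted_udp_payload[5]
            dcid = quoted_udp_payload[6:6 + dcid_len]
        else:                                        # short header: length known only locally
            dcid = quoted_udp_payload[1:1 + LOCAL_DCID_LEN]
        return connections.get(dcid)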

Handshake Illustration

The manageability document should illustrate all the various packet sequences in a handshake, preferably with pretty pictures.
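
As a starting point, a rough, non-normative sketch of the common 1-RTT sequence (ignoring coalescing, loss, Retry, Version Negotiation, and 0-RTT, each of which probably deserves its own picture):

    Client                                        Server
    Initial: CRYPTO[ClientHello]        ->
                                        <-  Initial: ACK, CRYPTO[ServerHello]
                                        <-  Handshake: CRYPTO[EE, CERT, CV, FIN]
    Initial: ACK                        ->
    Handshake: ACK, CRYPTO[FIN]         ->
    1-RTT: STREAM[...]                  ->
                                        <-  1-RTT: HANDSHAKE_DONE, ACK, STREAM[...]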

Improve and generalize text on DoS detection and mitigation

Current practices in detection and mitigation of Distributed Denial of Service (DDoS) attacks generally involve passive measurement using network flow data {{?RFC7011}}, classification of traffic into "good" (productive) and "bad" (DoS) flows, and filtering of these bad flows in a "scrubbing" environment

This is not how it works. A few examples:

Setup CI

Someone with ownership rights needs to hit the button on Circle (not Travis).

GQUIC -> QUIC migration

From @martinduke in quicwg/base-drafts#1006:

We're pretty close to settling on a wire image, I think. It would be useful for the transport draft (and eventually the RFC) to have an appendix covering issues with simultaneous support of GQUIC versions and QUIC v1.

I believe all the entities actually trying to support GQUIC are heavily involved in the working group. However, I imagine there are quite a lot of middleboxes out there doing some basic ID/classification (if not more) on GQUIC today, and they will need some guidance on how to simultaneously handle two packet header formats, etc.

If people are opposed to an appendix, I suppose a short-lived internet draft would also get the job done. In any case, I'd like to see a placeholder sooner rather than later.

Operational guidelines for reducing timing-linkability across CID migration

Linkability across CID changes is, in the common case, so trivial that protocol features meant to defeat linkability through other means risk being useless.

"Find CID y where delay < d between last packet for CID x and first packet for CID y on 2-tuple {a,p}, given {x,a,p}" is an operation which requires zero additional state and a simple O(kn log n) search for any large on-path (passive surveillance) device that's halfway smart about keeping per-flow state -- i.e., it's basically a free operation, and its utility is baked into the physics of CIDs -- after all, this is what CIDs are for.

The ease of this analysis can only be mitigated by increasing the size of the anonymity set: ensuring that for any given delay window d, a minimum number of CIDs x transition on any given {a,p}. This seems like good operational advice for servers with enough traffic to build such anonymity sets (should they have interest in preventing client linkability, of course) -- small servers are probably out of luck though.

Applicability should call this out as a problem, manageability should suggest a solution space.

Update -manageability for BKK spin bit consensus

Review and update text in -manageability to reflect BKK spin bit consensus:

  • single spin bit is in the protocol spec now
  • endpoints may opt out (discuss why; this is for any reason, not just proxies as suggested in #55),
  • endpoints should probabilistically opt out to provide cover for endpoints that opt out deliberately (a rough sketch follows)
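
A rough sketch of what the per-connection logic plus the probabilistic opt-out could look like (the opt-out probability and the reordering handling are illustrative, not recommendations):

    import random

    OPT_OUT_PROBABILITY = 0.1        # illustrative value only

    # Sketch: the client inverts the last spin value it saw from the server; the
    # server echoes the last value it saw from the client (the spec tracks the
    # highest packet number; reordering is ignored here). Endpoints that opt out
    # set the bit to a random or fixed value.
    class SpinState:
        def __init__(self, is_client: bool):
            self.is_client = is_client
            self.enabled = random.random() >= OPT_OUT_PROBABILITY
            self.value = 0

        def on_packet_received(self, peer_spin: int):
            if self.enabled:
                self.value = (1 - peer_spin) if self.is_client else peer_spin

        def spin_bit_to_send(self) -> int:
            return self.value if self.enabled else random.randint(0, 1)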

Considerations for NATs

The manageability draft should talk a little bit about NATs for QUIC.

The laziest thing to do is just continue to have NAT mapping be by 4-tuple, as it is today. This is actually fine.

However, someone might attempt to be clever and use the CID as well. For example, client A and client B could share the same server-side address and port, and the NAT could initially distinguish between the two because they have different CIDs. This appears to save address space, but doesn't actually work, because the server could switch to a new client connection ID. In this case, the NAT won't be able to determine which client to route to.
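
A tiny sketch of the failure mode, with dictionaries standing in for NAT state (everything here is illustrative):

    # Sketch: a NAT demultiplexing inbound packets on a shared external port by
    # Destination Connection ID. It only knows CIDs it has already observed; once
    # the endpoints switch to a CID issued inside the encrypted connection (e.g.
    # via NEW_CONNECTION_ID), the lookup fails and the binding is lost.
    cid_to_internal_client = {}          # dcid bytes -> (internal_ip, internal_port)

    def learn_outbound(dcid_used_toward_client, internal_addr):
        # Populated from CIDs the NAT can see during the handshake.
        cid_to_internal_client[dcid_used_toward_client] = internal_addr

    def route_inbound(dcid: bytes):
        return cid_to_internal_client.get(dcid)   # None for any CID the NAT never saw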

Applicability introduction is outdated

This is editorial, but the whole introduction is written as if the QUIC drafts were some early concept kicking around in the working group. This will not age well, and the intro should be worded as if it were already a published RFC.

The intro still talks about "HTTP/2 over QUIC" when it's now HTTP/3.

Write subsection "Session resumption versus keep-alive"

Add text to the section on tradeoffs for using 0-RTT session resumption rather than sending keep-alives. This was discussed at the interim in Paris, and it turns out that this might not be a great idea; tracking the open editor's note in the doc.

Streams as message-framing

It's useful to talk about the use of streams as messages, since they're designed to be lightweight. This is a feature of QUIC, and it's worth articulating.
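
A sketch of the pattern from the application's point of view, using an entirely hypothetical stream API: one stream per message, with FIN marking the message boundary.

    # Sketch: one QUIC stream per application message. The connection/stream API
    # is hypothetical; the pattern is what matters: open a stream, write the
    # message, finish it, and let the stream boundary do the framing.
    def send_message(conn, payload: bytes):
        stream = conn.open_unidirectional_stream()   # streams are cheap to create
        stream.write(payload)
        stream.finish()                              # FIN = end of this message

    def on_stream_finished(stream):
        handle_message(stream.read_all())            # one complete, ordered message

    def handle_message(message: bytes):
        print(len(message), "byte message received") # placeholder application logic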

Note about small packets

Middlebox vendors (and others as well) accustomed to TCP tend to assume that the smallest packets are ACK packets. This is not true in QUIC, and the manageability draft may want to say something about it.

Point out how troubleshooting gets easier

The section on "Passive network performance measurement and troubleshooting" is probably a bit too pessimistic.

It's probably worth pointing out that one part of network troubleshooting -- "find out which box to blame" -- is substantially easier. Thanks to integrity protection, most middleboxes can drop, delay/rate-limit, or corrupt packets, and that's about it. These are fairly easy behaviors to diagnose.

I've had some conversations with customers afraid of not having their TCP headers anymore, and I believe this fear is largely misguided.

If people agree this belongs in the draft, I can file a PR.

Maybe say something about in-order delivery (in streams)

See also section on Streams in transport draft:
"Stream offsets allow for the octets on a stream to be placed in order. An endpoint MUST be capable of delivering data received on a stream in order. Implementations MAY choose to offer the ability to deliver data out of order. There is no means of ensuring ordering between octets on different streams."

Applicability 0RTT section

While TFO has some 0RTT-like properties, it is not a full replacement, not only because the SYN data or the TFO option often gets scrubbed, but also because you're limited to one packet of payload.

Also, this ought to mention TLS stacks that offer replay protection.

I would think a modern QUIC interface would offer a signal to separate 0RTT from 1RTT data, so that both senders and receivers can manage their application data appropriately.
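
Something like the following, with entirely hypothetical names; the only point is that 0-RTT data is labeled as such on both the sending and the receiving side:

    # Hypothetical interface sketch: the application marks what may go out as
    # (replayable) 0-RTT data, and learns on receipt whether data arrived in
    # early data, so idempotent and non-idempotent operations can be treated
    # differently.
    def send_request(conn, request: bytes, idempotent: bool):
        if idempotent and conn.early_data_accepted():
            conn.send(request, early_data=True)      # may be replayed by an attacker
        else:
            conn.send(request, early_data=False)     # waits for 1-RTT keys

    def on_data(data: bytes, received_in_early_data: bool):
        if received_in_early_data:
            check_replay_policy(data)                # application-level policy (placeholder)
        process(data)                                # normal handling (placeholder)

    def check_replay_policy(data: bytes):
        pass

    def process(data: bytes):
        pass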

CID and length

Note in the section about CIDs that, as the server chooses the CIDs, it could use a scheme of its own to embed the length of the CID in the CID itself instead of remembering it, e.g. use the first few bits to indicate which of the supported sizes was chosen.
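
A minimal sketch of that idea, assuming the server supports a small fixed set of CID lengths and spends the top two bits of every CID it issues on a length code (all values illustrative):

    import os

    SUPPORTED_LENGTHS = [4, 8, 12, 18]          # index 0..3 fits in two bits

    # Sketch: encode which supported length was chosen in the first two bits of
    # the CID, so the server never has to remember a per-connection CID length.
    def new_cid(length_index: int = 1) -> bytes:
        body = bytearray(os.urandom(SUPPORTED_LENGTHS[length_index]))
        body[0] = (length_index << 6) | (body[0] & 0x3F)   # top two bits = length code
        return bytes(body)

    def cid_length_from_first_byte(first_byte: int) -> int:
        return SUPPORTED_LENGTHS[first_byte >> 6]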

Advice on network state idle timeouts

For discussion: should the manageability draft advise that idle timeouts should be longer for flows inferred/detected to be QUIC flows?

I think the answer here is "it depends": for NATs, yes, state at the edge is not all that scarce anymore. For firewalls? Maybe not so much. Are there other applications to consider here, and is there any generalized advice this document could give?

If so, is there any generalized advice for how long the timeout should be?

Document the challenges of creating a 'transparent' QUIC MITM

Feel free to close this if it's out of scope, but creating a proxy that attempts to passively observe QUIC is much more complex and potentially problematic than TLS over TCP.

For example, all major OS's have mature TCP implementations, whereas a MITM would have to pull in its own QUIC implementation. If it does so, it is important that it is both conformant to the IETF specifications and has good performance. This is the full L7 termination case, which is possible, but a lot of work, and we should heavily encourage anyone doing this to use an existing ("good") implementation.

A potential MITM option is to attempt to terminate the handshake, but then only observe ApplicationData packets. However, this could go wrong in many ways, so I believe it should be strongly discouraged. QUIC is designed as an end-to-end protocol and efforts to change that are likely to end very poorly.

For example, if the length of the CID changes, then the offset of the rest of the header changes, and a middlebox that doesn't track all the CIDs issued by both peers may be unable to decrypt packets. I'm sure there are more gotchas, particularly with the introduction of extensions. It may also be worth pointing out that QUIC frames are not TLV, so any frames (i.e., extensions) that are not understood cause all other frames to be unparseable.
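
To make the CID-length pitfall concrete, a sketch of parsing a short-header packet: the DCID length is not carried on the wire, so an observer that misses a length change misreads every field after the first byte from then on (this is a sketch, not a conformant parser):

    # Sketch: short-header parsing depends on out-of-band knowledge of the DCID
    # length. Guess it wrong and the packet number/payload offsets are wrong too
    # (the packet number length itself is hidden by header protection).
    def parse_short_header(packet: bytes, assumed_dcid_len: int):
        if packet[0] & 0x80:
            raise ValueError("long header; CID lengths are explicit there")
        dcid = packet[1:1 + assumed_dcid_len]
        protected_rest = packet[1 + assumed_dcid_len:]
        return dcid, protected_rest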

Thanks to @DavidSchinazi for the specific examples of ways this could fail.

Mention accidental invariants and the calculus of blocking

From Spencer on the list:

Before I read this ^ paragraph, I was planning to ask you whether "blocking QUIC" merited a mention in https://tools.ietf.org/html/draft-ietf-quic-manageability-06, because we are in the awkward position with HTTP/3 that operators will face less blowback blocking HTTP/3-QUIC and forcing a fallback to HTTP/2-TCP than they will face blocking almost any non-HTTP/3 QUIC protocol (with less clearly defined fallbacks) (see: MASQUE).
ISTM that at a minimum, the "accidental invariants" might be mentioned in https://tools.ietf.org/html/draft-ietf-quic-manageability-06, which targets operators.

Guidance for port number use

Copying from quicwg/base-drafts#495

There's no requirement that servers use a particular UDP port for HTTP/QUIC or any other QUIC traffic. Using Alt-Svc, the server is able to pick and advertise any port number and a compliant client will handle it just fine. That's already the case, and isn't part of this issue. #424 updates the HTTP draft to highlight this, increasing the odds that implementations will test and support that case.
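
For example (values made up), an origin can advertise HTTP/3 on an arbitrary UDP port via Alt-Svc, and a compliant client will use it:

    Alt-Svc: h3=":50781"; ma=86400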

This issue is to track that it might actually be desirable from a privacy standpoint for servers to pick arbitrary port numbers and perhaps even rotate them periodically (though that requires coordination with their Alt-Svc advertisements and lifetimes, which could be challenging) in order to make it more difficult for a network observer to classify traffic (and therefore more difficult to ossify).

On the other hand, as we're wrestling with in each of these privacy/manageability debates, removing easy network visibility into the likely protocol by using arbitrary port numbers means that middleboxes will probably resort to other means of attempting to identify protocols and potentially doing it badly, which could result in even worse ossification. (E.g. indexing into the TLS ClientHello to find the ALPN list, then panicking on a different handshake protocol.)

There's further discussion on the original issue, but this belongs in the ops drafts, not the protocol drafts.

Document how SNI from QUIC handshakes is presented

SNI is currently mentioned in the Manageability document, but you'd have to read a substantial chunk of the transport and tls drafts to understand how to extract it.

I'd suggest a short section that describes how it's presented on the wire with references to the appropriate sections on transport and tls.

Currently, the section on client Initial says: "The Client Hello datagram exposes version number, source and destination connection IDs, and information in the TLS Client Hello message, including any TLS Server Name Indication (SNI) present, in the clear."

I think saying the SNI is in the clear is a bit misleading to a casual observer.
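
Roughly, extracting the SNI means: derive the version-specific Initial secrets from the client's Destination Connection ID, remove header protection, decrypt the Initial payload, reassemble the CRYPTO frames, and parse the TLS ClientHello for the server_name extension. A sketch of just the key-derivation step (salt and labels as specified for QUIC v1; the DCID is an arbitrary example):

    import hashlib, hmac

    # Sketch: deriving the client Initial packet-protection secrets from the DCID.
    # Everything needed is public, which is why "in the clear" really means
    # "encrypted under keys any on-path observer can derive".
    INITIAL_SALT_V1 = bytes.fromhex("38762cf7f55934b34d179ae6a4c80cadccbb7f0a")

    def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
        return hmac.new(salt, ikm, hashlib.sha256).digest()

    def hkdf_expand_label(secret: bytes, label: bytes, length: int) -> bytes:
        full = b"tls13 " + label
        info = length.to_bytes(2, "big") + bytes([len(full)]) + full + b"\x00"
        out, block, counter = b"", b"", 1
        while len(out) < length:
            block = hmac.new(secret, block + info + bytes([counter]), hashlib.sha256).digest()
            out += block
            counter += 1
        return out[:length]

    dcid = bytes.fromhex("8394c8f03e515708")                  # example client DCID
    initial_secret = hkdf_extract(INITIAL_SALT_V1, dcid)
    client_secret  = hkdf_expand_label(initial_secret, b"client in", 32)
    key = hkdf_expand_label(client_secret, b"quic key", 16)   # AEAD key (AES-128-GCM)
    iv  = hkdf_expand_label(client_secret, b"quic iv", 12)
    hp  = hkdf_expand_label(client_secret, b"quic hp", 16)    # header-protection key
    # Not shown: remove header protection, AEAD-decrypt the payload, reassemble
    # CRYPTO frames, and parse the ClientHello for server_name.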

Double check use of normative language

It's not clear whether these documents should use normative language at all. The applicability statement also sometimes uses normative language to recommend the implementation of certain interfaces; I don't think this is the right document for that, but since we don't have another document, we should decide whether we want to do that or not!

Server-Generated Connection ID should include a cryptographic MAC

The guidance recommends that server-generated connection IDs should include a cryptographic MAC when used for load balancing. This sounds like a good suggestion in principle, but the Connection ID is 64 bits in size; I'm not a cryptographer, but I don't see how this is possible. If it is possible, it would perhaps be good to include guidance within the documents?

My logic is as follows (hopefully I've made a mistake/misunderstood the concept somewhere)

As a CDN operator I'd probably want to segment the connection ID into 4 spaces:
POP id|machine id|unique connection|MAC

You could combine your POP id and machine ID into a single space but it is likely to be beneficial to have two tiers, the POP routes the request to the right general location and the machine id then identifies a machine within that space. Other ways to split this are also possible.

If 12 bits are used for POP id, that would give 4096 distinct locations. Trends toward deploying deeper into networks (particularly within mobile/cell base stations) mean this number is not totally infeasible.
If 8 bits are then used for a machine id within a POP, that gives 256 possible machines.

This means 20 bits might be used for load balancing purposes. This gives a bit (lot?) of room for flexibility, and it is unlikely all one million combinations are used, but it is probably the right order of magnitude for a large, widely distributed CDN (even if we drop down to 16 bits it doesn't significantly impact the analysis below).

The actual unique connection pool needs to be large enough to avoid repeating too quickly - even if you have a lot of clients taking advantage of 0RTT to aggressively close/re-open connections. There also presumably needs to be some randomization of the ids so that they aren't created sequentially. I've assumed 24 bits (which is only 16 million unique ids per server).

This only leaves 20 bits left for the MAC, that is about 1 million possible values. As I've said I'm not a cryptographer but to me this looks small enough that a brute force attack guided by some valid inputs might be feasible even on relatively low power machines?

For that matter isn't a simple replay attack feasible? The MAC can't be based upon the full 5-tuple as we need to support connection mobility. You could embed a timestamp into the id somewhere to protect against this but that would use up even more bits, otherwise you could slowly gather up valid (but presumably closed) connection ids and then aggressively replay them all in a short time period.
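
To make the bit budget concrete, a sketch of packing a 64-bit CID as laid out above, with the 20-bit MAC computed as a truncated HMAC over the routing and connection fields (field widths and key handling are illustrative only):

    import hashlib, hmac

    MAC_KEY = b"example-key"        # illustrative; real deployments rotate and protect keys

    # Sketch: 64-bit CID = 12-bit POP id | 8-bit machine id | 24-bit connection id
    # | 20-bit truncated HMAC. With only 20 MAC bits, an attacker who can test
    # guesses needs on the order of 2^20 tries per forged CID, which is the
    # concern raised above.
    def make_cid(pop_id: int, machine_id: int, conn_id: int) -> bytes:
        routed = (pop_id << 32) | (machine_id << 24) | conn_id            # 44 bits
        mac = hmac.new(MAC_KEY, routed.to_bytes(6, "big"), hashlib.sha256).digest()
        mac20 = int.from_bytes(mac[:3], "big") >> 4                       # keep 20 bits
        return ((routed << 20) | mac20).to_bytes(8, "big")

    def verify_cid(cid: bytes) -> bool:
        value = int.from_bytes(cid, "big")
        routed, mac20 = value >> 20, value & 0xFFFFF
        mac = hmac.new(MAC_KEY, routed.to_bytes(6, "big"), hashlib.sha256).digest()
        return mac20 == int.from_bytes(mac[:3], "big") >> 4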

Write subsection "Thinking in zero RTT"

Jana noted at the interim in Paris that we should point out
that applications need to be re-thought slightly to get the benefits of zero
RTT. Add a little text to discuss this and why it's worth the effort,
before we go straight into the dragons.
