python-hyper / h2 Goto Github PK
View Code? Open in Web Editor NEWHTTP/2 State-Machine based protocol implementation
Home Page: https://h2.readthedocs.io/en/stable
License: MIT License
HTTP/2 State-Machine based protocol implementation
Home Page: https://h2.readthedocs.io/en/stable
License: MIT License
Since get_stream_by_id
got removed from the public API, I would like to get some way to check if a stream is closed, or does not yet exist, e.g. this distinction.
The problem is because (for memory cleanup reasons?) closed streams are deleted from h2_connection.streams
- and this information is therefore not accessible anymore.
A discussion on the HTTPBis mailing list has made me aware of the fact that, strictly speaking, changes in the decoder table size should be notified in-band using HPACK, rather than out-of-band via the SETTINGS frame. For this reason, we should not update the decoder table size when we receive a local settings acknowledgement: HPACK should do it instead.
It would be interesting to see the documentation dive into suggested design patterns when using Hyper-h2. These are potentially quite educational in general, in addition to providing more insight into how Hyper-h2 works. See this comment for more discussion.
Hi,
I want to create a new stream without knowing or generating a new stream_id myself.
The current problem is that:
begin_new_stream
get_or_create_stream
send_headers
all require a stream_id
- which I don't know at this point.
I would expect hyper-h2 to provide something like: send_headers(headers, end_stream=True)
(note there is no stream_id
argument!) or simply stream_id = h2conn.get_next_stream_id()
as well, as long as it is the next valid id available.
HTTP/2 is extendable: it's one of its great strengths. It would be really nice if H2 was relatively easily extensible. We should aim to start supporting pluggable HTTP/2 extensions.
What do we need to do this? Well, the answer is: what can extensions do?
This will also require work in hyperframe.
Initially we were being very defensive about the possibility that the user-level function calls (e.g. send_headers
and friends) might want to return events. None of them do, and it seems really unlikely that any will. We should consider abandoning that pattern.
It's just inconvenient, and leaks a library that really should be an implementation detail.
As shown above.
The first time a stream is reset due to the actions of the remote peer (e.g. invalid frames) and transitions to the CLOSED state, we need to fire an event to let them know. Ideally we'd do this only once, so that users don't have to keep track of too much state on their end.
Some implementations now object quite strongly to receiving header names that are not all-lowercase. We should lowercase our header names before we pass them down to HPACK.
We have lots of examples of HTTP/2 servers, but none of clients. We should fix that!
Asyncio is the new hotness: why not have an implementation like the Twisted server but for asyncio?
Calling h2.reset_stream(...)
on an already closed streams gives me an ProtocolError
, but since there is a StreamClosedError
in hyper-h2, I would have expected to get this.
Trace of error:
File ".../hyper-h2/h2/connection.py", line 680, in reset_stream
frames = stream.reset_stream(error_code)
File ".../hyper-h2/h2/stream.py", line 730, in reset_stream
self.state_machine.process_input(StreamInputs.SEND_RST_STREAM)
File ".../hyper-h2/h2/stream.py", line 98, in process_input
"Invalid input %s in state %s" % (input_, old_state)
ProtocolError: Invalid input StreamInputs.SEND_RST_STREAM in state StreamState.CLOSED
Is it possible to handle/check all errors for the correct exception class and emit the best fit?
Otherwise I need to catch multiple error classes, where a single one would have done the job.
Found when testing proposed Twisted HTTP/2 server.
6.1. DATA
✓ Sends a DATA frame with 0x0 stream identifier
✓ Sends a DATA frame on the stream that is not opend
× Sends a DATA frame with invalid pad length
- The endpoint MUST treat this as a connection error of type PROTOCOL_ERROR.
Expected: GOAWAY frame (ErrorCode: PROTOCOL_ERROR)
RST_STREAM frame (ErrorCode: PROTOCOL_ERROR)
Connection close
Actual: DATA frame (Length: 4541, Flags: 0)
Currently hyper-h2 doesn't really support continuation frames properly. It should.
While the HPACK library we use is quite liberal (and that seems sensible), we should be a bit more restrictive in this library and do some header validation. The following things need fixing:
trailers
(PROTOCOL_ERROR) (§8.1.2.2).So, we know roughly how fast we go on CPython and PyPy:
Twisted server, CPython, 100 clients, 100000 requests: 289.65 seconds, 276.191 req/s, 1.49 MB/s. /cc @tatsuhiro_t
— Cory Benfield (@Lukasaoz) October 19, 2015
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
Twisted server, PyPy, 100 clients, 100000 requests: 40.42 seconds, 2449.33 req/s, 13.2 MB/s. 10x faster than CPython. @tatsuhiro_t
— Cory Benfield (@Lukasaoz) October 19, 2015
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
We should take advantage of this tool to start profiling Hyper-h2 and find where our bottlenecks are. Then, we should aim to target them to improve our performance where possible.
Why leave Tornado out of the fun of example implementations?
Hi,
In your docs it is stated:
In HTTP/2, settings changes need to be acknowledged. hyper-h2 does not
automatically acknowledge them, because it is possible that the caller may
not be happy with the changed setting (or would like to know about it).
Could you elaborate on why the caller "might not be happy" with the changes?
There is no negotiation of settings, either you ACK them, or you get a GoAway or something, as far as I understand it? Creating a RemoteSettingsChanged
by itself is nice to have, but manually having to ACK them is IMHO not necessary, it would be cleaner to handle that the same way pings are auto-handled.
Or am I missing some important edge-case?
Thanks for the explanation.
P.S. I know it is only 2 LOCs, but still - if we could put these lines into hyper-h2
I think it is easier to use.
Sending GoAway is currently forbidden in the CLOSED state, but should be allowed.
We should consider whether hyperframe needs an update here to throw its own exception instead of using the struct module, but either way we should wrap it as a ProtocolError in hyper-h2.
We want to take performance reasonably seriously, so we should have some regressible benchmarks. Twisted et. al. seem to be using codespeed with reasonable success, so we should consider doing the same.
We have a Twisted example server, it would be great if we could have an eventlet example as well.
We should enforce the flow control windows remotely. We need to keep track of what we believe the window is, and abort connections if the remote party misunderstands it.
Even when a stream is closed, H2Connection
keeps the object around indefinitely. In the short term this is useful because it allows us to handle late frame arrival, but in the long term this costs memory, and also hurts performance when we scan over the streams to check which ones are open, as we will need to scan over every stream that has ever been sent.
We should come up with a policy for expiring old streams.
A few notes from IRC:
Right now we don't handle priority at all. Of course, exactly what's done with priority information will be up to the full server implementation, but for now we at least need to be able to signal the priority information up the stack.ls
Goals:
Earlier tonight I ran h2load against our example Twisted server, serving its own source file, to compare it between CPython and PyPy, but also against nghttp2. This is because nghttp2 is a great comparison point: the closer we can get to its performance the better we're doing.
So let's have a website that compares them! Let's use something like codespeed to build a website that benchmarks all the HTTP/2 servers that I can find, in all their permutations. This would let us build up a corpus of results that we can aim to constantly improve over time. It'd also be really insightful, and probably would lead to some fun competition amongst different implementations.
@tatsuhiro-t @kazuho When I get this up and running I'll point you at it: you may find it fun! 😁
Right now H2Connection has a few too many methods as part of its public API, including begin_new_stream
and get_or_create_stream
. These are really artifacts of the internal structure of the class, and so we should "privatise" them by adding a leading underscore.
Right now the state machine implementation doesn't handle trailers properly because it gets them confused with responses, but it's not really good about handling the logic. We should write some tests that enforce the trailers logic correctly and then make the necessary adjustments to the state machine function to handle trailers.
They also need a new event to signal them correctly.
Per section 5.1 of RFC 7540:
If an endpoint receives additional frames, other than WINDOW_UPDATE, PRIORITY, or RST_STREAM, for a stream that is in this state, it MUST respond with a stream error (Section 5.4.2) of type STREAM_CLOSED.
We should more rigorously enforce this requirement.
Found by h2spec:
5.1. Stream States
× half closed (remote): Sends a HEADERS frame
- The endpoint MUST respond with a stream error (Section 5.4.2) of type STREAM_CLOSED.
Expected: GOAWAY frame (ErrorCode: STREAM_CLOSED)
RST_STREAM frame (ErrorCode: STREAM_CLOSED)
Connection close
Actual: GOAWAY frame (Length: 8, Flags: 0, ErrorCode: PROTOCOL_ERROR)
Right now we don't police that the flow control window of a stream or the connection avoids getting too high. If it does, we need to either tear down the connection or close the stream with a flow control error.
This is an interesting idea: should we reject invalid content-lengths? RFC 7540 § 8.1.2.6 says so:
A request or response that includes a payload body can include a content-length header field. A request or response is also malformed if the value of a content-length header field does not equal the sum of the DATA frame payload lengths that form the body.
Malformed requests or responses that are detected MUST be treated as a stream error (Section 5.4.2) of type PROTOCOL_ERROR.
This suggests that we should be keeping track of this. However, I'm a bit nervous that tools like mitmproxy may want to let the invalid stream stand, despite RFC 7540 warning against this:
Intermediaries that process HTTP requests or responses (i.e., any intermediary not acting as a tunnel) MUST NOT forward a malformed request or response.
The other option would be to require that the policing of the content length be done by the tools that wrap hyper-h2.
We should make sure that we handle unknown frame types correctly, i.e. by ignoring them.
When we receive a GOAWAY frame we have a tendency to choke on it, rather than correctly and cleanly transition into the closed state. We should fix that.
When the GOAWAY frame is received, we should:
Hyperframe emits a ValueError when invalid values in the stream header are found. Hyper-h2 does tear the connection down when that happens, but it doesn't really handle it properly and lets the exception leak instead of raising a ProtocolError. We should fix that.
Found when using h2spec.
Example traceback:
File "/Users/cory/Documents/Python/Twisted/twisted/internet/tcp.py", line 215, in _dataReceived
rval = self.protocol.dataReceived(data)
File "/Users/cory/Documents/Python/Twisted/twisted/protocols/tls.py", line 422, in dataReceived
self._flushReceiveBIO()
File "/Users/cory/Documents/Python/Twisted/twisted/protocols/tls.py", line 392, in _flushReceiveBIO
ProtocolWrapper.dataReceived(self, bytes)
File "/Users/cory/Documents/Python/Twisted/twisted/protocols/policies.py", line 120, in dataReceived
self.wrappedProtocol.dataReceived(data)
File "/Users/cory/Documents/Python/Twisted/twisted/web/http.py", line 2093, in dataReceived
return self._obj.dataReceived(data)
File "/Users/cory/Documents/Python/Twisted/twisted/web/http2.py", line 92, in dataReceived
events = self.conn.receive_data(data)
File "/Users/cory/Documents/Python/hyper-h2/h2/connection.py", line 700, in receive_data
for frame in self.incoming_buffer:
File "/Users/cory/Documents/Python/hyper-h2/h2/frame_buffer.py", line 53, in next
f, length = Frame.parse_frame_header(self.data[:9])
File "/Users/cory/Documents/Python/Twisted/build/lib/python2.7/site-packages/hyperframe/frame.py", line 86, in parse_frame_header
frame = FRAMES[type](stream_id)
File "/Users/cory/Documents/Python/Twisted/build/lib/python2.7/site-packages/hyperframe/frame.py", line 220, in __init__
super(DataFrame, self).__init__(stream_id, **kwargs)
File "/Users/cory/Documents/Python/Twisted/build/lib/python2.7/site-packages/hyperframe/frame.py", line 147, in __init__
super(Padding, self).__init__(stream_id, **kwargs)
File "/Users/cory/Documents/Python/Twisted/build/lib/python2.7/site-packages/hyperframe/frame.py", line 52, in __init__
raise ValueError('Stream ID must be non-zero')
exceptions.ValueError: Stream ID must be non-zero
H2Spec correctly flags that we don't really do any validation of settings values:
6.5.2. Defined SETTINGS Parameters
× SETTINGS_ENABLE_PUSH (0x2): Sends the value other than 0 or 1
- The endpoint MUST respond with a connection error of type PROTOCOL_ERROR.
Expected: GOAWAY frame (ErrorCode: PROTOCOL_ERROR)
Connection close
Actual: SETTINGS frame (Length: 0, Flags: 1)
× SETTINGS_INITIAL_WINDOW_SIZE (0x4): Sends the value above the maximum flow control window size
- The endpoint MUST respond with a connection error of type FLOW_CONTROL_ERROR.
Expected: GOAWAY frame (ErrorCode: FLOW_CONTROL_ERROR)
Connection close
Actual: SETTINGS frame (Length: 0, Flags: 1)
× SETTINGS_MAX_FRAME_SIZE (0x5): Sends the value below the initial value
- The endpoint MUST respond with a connection error of type PROTOCOL_ERROR.
Expected: GOAWAY frame (ErrorCode: PROTOCOL_ERROR)
Connection close
Actual: SETTINGS frame (Length: 0, Flags: 1)
× SETTINGS_MAX_FRAME_SIZE (0x5): Sends the value above the maximum allowed frame size
- The endpoint MUST respond with a connection error of type PROTOCOL_ERROR.
Expected: GOAWAY frame (ErrorCode: PROTOCOL_ERROR)
Connection close
Actual: SETTINGS frame (Length: 0, Flags: 1)
Found by running H2Spec against proposed Twisted web server:
5.1.1. Stream Identifiers
× Sends even-numbered stream identifier
- The endpoint MUST respond with a connection error of type PROTOCOL_ERROR.
Expected: GOAWAY frame (ErrorCode: PROTOCOL_ERROR)
Connection close
Actual: DATA frame (Length: 4541, Flags: 0)
The continuation frame buffer in the frame buffer class is currently unbounded. That represents a DoS vector. We need to bound the number of CONTINUATION frames we're prepared to receive.
When receiving overlarge frames we currently tear the whole connection down as a PROTOCOL_ERROR. We should either do that but signal the real problem (FRAME_SIZE_ERROR) or, even better, send RST_STREAM instead, per Section 4.2 of RFC 7540.
Found by h2spec:
4.2. Frame Size
× Sends large size frame that exceeds the SETTINGS_MAX_FRAME_SIZE
- The endpoint MUST send a FRAME_SIZE_ERROR error.
Expected: GOAWAY frame (ErrorCode: FRAME_SIZE_ERROR)
RST_STREAM frame (ErrorCode: FRAME_SIZE_ERROR)
Connection close
Actual: GOAWAY frame (Length: 8, Flags: 0, ErrorCode: PROTOCOL_ERROR)
It's really important that this state machine be able to log, but it's also really important that we don't break our rule about I/O. This means we can't just transparently use Python's standard logging
module: on systems like gevent this will implicitly cause context switches, and on systems like Twisted everything goes very wrong.
The best approach is probably to allow injection of a logger that conforms to the rules of this module (must not do I/O, must not context switch), and push that problem outwards. This logger needs to use the same interface as the standard library's logging
module, but otherwise everything else is fine.
This is tricky, but we can also defer it from 1.0.0 I think.
For sending GOAWAY frames we do an ad-hoc calculation of the last stream ID based on the active streams at the time. This is obviously a flawed calculation, and we should be able to do better.
We need to reject these both when we receive them and when a user tries to emit them. They're an error of type PROTOCOL_ERROR, and in both cases we'll treat them as a connection error (we could nominally count the "on a stream" case of this as a stream error, but being a bit stern with misbehaving peers doesn't hurt too badly).
In all cases, we want to raise a ProtocolError.
Received via email:
We have to keep track of two different kinds of settings: the settings that belong to the remote peer and the ones that belong to us. We must not violate the limits imposed by the remote peer, and we must enforce that the remote peer does not violate our limits.
To do:
Hi,
are there any plans to include flow control in hyper-h2
?
As of now, the user has to keep track of the inbound_flow_control_window
. Would it be possible to just generate a WindowUpdate for each incoming frame with the same increment, to keep the window always at the same size? Or are there any other viable solutions to include such functionality?
I guess dealing with outbound_flow_control_window
is a bit trickier because hyper-h2
does not actually control the socket, but we could include window_updates in data_to_send
interleaved with regular data frames? Splitting of too large data blogs into smaller frames (chunking) would also be a nice things to be included.
I'm happy to send PRs if we find a good approach!
We clearly need some documentation. Let's start having some!
Like all my best ideas, this one is actually @shazow's: why use threads and sockets when we can fake the whole lot out and use coroutines for the tests in test_interacting_stacks.py
? From IRC:
[19:11:34] shazow: if i had a ~month of funded work, i'd love to rewrite urllib3's testing framework
[19:11:37] shazow: maybe collab with you on that
[19:12:18] shazow: ideally release it as a standalone thing that you can use to test any client/server protocol thing
[19:12:27] lukasa: Oh that'd be super cool.
[19:12:48] lukasa: I think I want to rewrite this without an actual socket, instead providing a 'socket-like' that is actually a queue
[19:13:01] lukasa: Because all this messing about with events is fundamentally about ensuring that I don't leave data in sockets
[19:13:09] lukasa: Or get stuck waiting for data that will never come
[19:13:26] shazow: yes, exactly
[19:13:34] shazow: a mock socket would be really powerful
[19:13:37] shazow: and mock async-ness
[19:13:48] lukasa: Such a neat idea
[19:13:55] lukasa: OH CRAP
[19:13:56] shazow: (rather than threads)
[19:13:58] lukasa: Why am I not doing this with coroutines?
[19:14:02] lukasa: Alright
[19:14:02] shazow: :)
[19:14:08] lukasa: Tearing this up and doing new stuff
Right now we handle flow control locally by emitting a WindowUpdated event. That's not really enough, because we take no action to prevent users from sending more data than they should, and we don't provide the tools to allow external callers to check flow control state themselves.
We need:
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.