Giter VIP home page Giter VIP logo

h2's People

Contributors

alexwlchan avatar annevk avatar apolcyn avatar berkerpeksag avatar brettcannon avatar chhsiao90 avatar claws avatar drpoggi avatar etataurov avatar felixonmars avatar fredthomsen avatar glyph avatar hanaasagi avatar hellysmile avatar jwilk avatar kazimir-malevich avatar kracekumar avatar kriechi avatar lukasa avatar lurst avatar mhils avatar mlvnd avatar nateprewitt avatar pgjones avatar requires avatar sam-bristow avatar sigmavirus24 avatar standy66 avatar sunu avatar toffer avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

h2's Issues

Restrict the public API of H2Connection

Right now H2Connection has a few too many methods as part of its public API, including begin_new_stream and get_or_create_stream. These are really artifacts of the internal structure of the class, and so we should "privatise" them by adding a leading underscore.

Create website benchmarking popular HTTP/2 implementations on sample workloads.

Earlier tonight I ran h2load against our example Twisted server, serving its own source file, to compare it between CPython and PyPy, but also against nghttp2. This is because nghttp2 is a great comparison point: the closer we can get to its performance the better we're doing.

So let's have a website that compares them! Let's use something like codespeed to build a website that benchmarks all the HTTP/2 servers that I can find, in all their permutations. This would let us build up a corpus of results that we can aim to constantly improve over time. It'd also be really insightful, and probably would lead to some fun competition amongst different implementations.

@tatsuhiro-t @kazuho When I get this up and running I'll point you at it: you may find it fun! 😁

Validate settings.

H2Spec correctly flags that we don't really do any validation of settings values:

    6.5.2. Defined SETTINGS Parameters
      × SETTINGS_ENABLE_PUSH (0x2): Sends the value other than 0 or 1
        - The endpoint MUST respond with a connection error of type PROTOCOL_ERROR.
          Expected: GOAWAY frame (ErrorCode: PROTOCOL_ERROR)
                    Connection close
            Actual: SETTINGS frame (Length: 0, Flags: 1)
      × SETTINGS_INITIAL_WINDOW_SIZE (0x4): Sends the value above the maximum flow control window size
        - The endpoint MUST respond with a connection error of type FLOW_CONTROL_ERROR.
          Expected: GOAWAY frame (ErrorCode: FLOW_CONTROL_ERROR)
                    Connection close
            Actual: SETTINGS frame (Length: 0, Flags: 1)
      × SETTINGS_MAX_FRAME_SIZE (0x5): Sends the value below the initial value
        - The endpoint MUST respond with a connection error of type PROTOCOL_ERROR.
          Expected: GOAWAY frame (ErrorCode: PROTOCOL_ERROR)
                    Connection close
            Actual: SETTINGS frame (Length: 0, Flags: 1)
      × SETTINGS_MAX_FRAME_SIZE (0x5): Sends the value above the maximum allowed frame size
        - The endpoint MUST respond with a connection error of type PROTOCOL_ERROR.
          Expected: GOAWAY frame (ErrorCode: PROTOCOL_ERROR)
                    Connection close
            Actual: SETTINGS frame (Length: 0, Flags: 1)

Add support for trailers.

Right now the state machine implementation doesn't handle trailers properly because it gets them confused with responses, but it's not really good about handling the logic. We should write some tests that enforce the trailers logic correctly and then make the necessary adjustments to the state machine function to handle trailers.

They also need a new event to signal them correctly.

WindowUpdates and Flow Control and Large Frames

Hi,

are there any plans to include flow control in hyper-h2?
As of now, the user has to keep track of the inbound_flow_control_window. Would it be possible to just generate a WindowUpdate for each incoming frame with the same increment, to keep the window always at the same size? Or are there any other viable solutions to include such functionality?

I guess dealing with outbound_flow_control_window is a bit trickier because hyper-h2 does not actually control the socket, but we could include window_updates in data_to_send interleaved with regular data frames? Splitting of too large data blogs into smaller frames (chunking) would also be a nice things to be included.

I'm happy to send PRs if we find a good approach!

Reject overlarge frames with RST_STREAM(FRAME_SIZE_ERROR), not GOAWAY(PROTOCOL_ERROR)

When receiving overlarge frames we currently tear the whole connection down as a PROTOCOL_ERROR. We should either do that but signal the real problem (FRAME_SIZE_ERROR) or, even better, send RST_STREAM instead, per Section 4.2 of RFC 7540.

Found by h2spec:

  4.2. Frame Size
    × Sends large size frame that exceeds the SETTINGS_MAX_FRAME_SIZE
      - The endpoint MUST send a FRAME_SIZE_ERROR error.
        Expected: GOAWAY frame (ErrorCode: FRAME_SIZE_ERROR)
                  RST_STREAM frame (ErrorCode: FRAME_SIZE_ERROR)
                  Connection close
          Actual: GOAWAY frame (Length: 8, Flags: 0, ErrorCode: PROTOCOL_ERROR)

Reject HEADERS frame received in HALF_CLOSED_REMOTE_ state.

Per section 5.1 of RFC 7540:

If an endpoint receives additional frames, other than WINDOW_UPDATE, PRIORITY, or RST_STREAM, for a stream that is in this state, it MUST respond with a stream error (Section 5.4.2) of type STREAM_CLOSED.

We should more rigorously enforce this requirement.

Found by h2spec:

  5.1. Stream States
    × half closed (remote): Sends a HEADERS frame
      - The endpoint MUST respond with a stream error (Section 5.4.2) of type STREAM_CLOSED.
        Expected: GOAWAY frame (ErrorCode: STREAM_CLOSED)
                  RST_STREAM frame (ErrorCode: STREAM_CLOSED)
                  Connection close
          Actual: GOAWAY frame (Length: 8, Flags: 0, ErrorCode: PROTOCOL_ERROR)

Keep better track of last stream ID

For sending GOAWAY frames we do an ad-hoc calculation of the last stream ID based on the active streams at the time. This is obviously a flawed calculation, and we should be able to do better.

Priority implementation.

Right now we don't handle priority at all. Of course, exactly what's done with priority information will be up to the full server implementation, but for now we at least need to be able to signal the priority information up the stack.ls

Goals:

  • Have events for priority changes.
  • Track priority information.
  • Have an API for constructing and returning the correct priority tree.
  • Correctly handle priority-carrying frames, both PRIORITY frames themselves and other frames that can carry priority information.

Server must reject even-numbered stream identifiers.

Found by running H2Spec against proposed Twisted web server:

    5.1.1. Stream Identifiers
      × Sends even-numbered stream identifier
        - The endpoint MUST respond with a connection error of type PROTOCOL_ERROR.
          Expected: GOAWAY frame (ErrorCode: PROTOCOL_ERROR)
                    Connection close
            Actual: DATA frame (Length: 4541, Flags: 0)

Logging

It's really important that this state machine be able to log, but it's also really important that we don't break our rule about I/O. This means we can't just transparently use Python's standard logging module: on systems like gevent this will implicitly cause context switches, and on systems like Twisted everything goes very wrong.

The best approach is probably to allow injection of a logger that conforms to the rules of this module (must not do I/O, must not context switch), and push that problem outwards. This logger needs to use the same interface as the standard library's logging module, but otherwise everything else is fine.

This is tricky, but we can also defer it from 1.0.0 I think.

Fix leaking ValueError from hyperframe.

Hyperframe emits a ValueError when invalid values in the stream header are found. Hyper-h2 does tear the connection down when that happens, but it doesn't really handle it properly and lets the exception leak instead of raising a ProtocolError. We should fix that.

Found when using h2spec.

Example traceback:

      File "/Users/cory/Documents/Python/Twisted/twisted/internet/tcp.py", line 215, in _dataReceived
        rval = self.protocol.dataReceived(data)
      File "/Users/cory/Documents/Python/Twisted/twisted/protocols/tls.py", line 422, in dataReceived
        self._flushReceiveBIO()
      File "/Users/cory/Documents/Python/Twisted/twisted/protocols/tls.py", line 392, in _flushReceiveBIO
        ProtocolWrapper.dataReceived(self, bytes)
      File "/Users/cory/Documents/Python/Twisted/twisted/protocols/policies.py", line 120, in dataReceived
        self.wrappedProtocol.dataReceived(data)
      File "/Users/cory/Documents/Python/Twisted/twisted/web/http.py", line 2093, in dataReceived
        return self._obj.dataReceived(data)
      File "/Users/cory/Documents/Python/Twisted/twisted/web/http2.py", line 92, in dataReceived
        events = self.conn.receive_data(data)
      File "/Users/cory/Documents/Python/hyper-h2/h2/connection.py", line 700, in receive_data
        for frame in self.incoming_buffer:
      File "/Users/cory/Documents/Python/hyper-h2/h2/frame_buffer.py", line 53, in next
        f, length = Frame.parse_frame_header(self.data[:9])
      File "/Users/cory/Documents/Python/Twisted/build/lib/python2.7/site-packages/hyperframe/frame.py", line 86, in parse_frame_header
        frame = FRAMES[type](stream_id)
      File "/Users/cory/Documents/Python/Twisted/build/lib/python2.7/site-packages/hyperframe/frame.py", line 220, in __init__
        super(DataFrame, self).__init__(stream_id, **kwargs)
      File "/Users/cory/Documents/Python/Twisted/build/lib/python2.7/site-packages/hyperframe/frame.py", line 147, in __init__
        super(Padding, self).__init__(stream_id, **kwargs)
      File "/Users/cory/Documents/Python/Twisted/build/lib/python2.7/site-packages/hyperframe/frame.py", line 52, in __init__
        raise ValueError('Stream ID must be non-zero')
    exceptions.ValueError: Stream ID must be non-zero

Get next available stream_id to begin a new stream

Hi,

I want to create a new stream without knowing or generating a new stream_id myself.
The current problem is that:

  • begin_new_stream
  • get_or_create_stream
  • send_headers

all require a stream_id - which I don't know at this point.

I would expect hyper-h2 to provide something like: send_headers(headers, end_stream=True) (note there is no stream_id argument!) or simply stream_id = h2conn.get_next_stream_id() as well, as long as it is the next valid id available.

Reject DATA frames with invalid padding length.

Found when testing proposed Twisted HTTP/2 server.

  6.1. DATA
    ✓ Sends a DATA frame with 0x0 stream identifier
    ✓ Sends a DATA frame on the stream that is not opend
    × Sends a DATA frame with invalid pad length
      - The endpoint MUST treat this as a connection error of type PROTOCOL_ERROR.
        Expected: GOAWAY frame (ErrorCode: PROTOCOL_ERROR)
                  RST_STREAM frame (ErrorCode: PROTOCOL_ERROR)
                  Connection close
          Actual: DATA frame (Length: 4541, Flags: 0)

Correctly validate headers.

While the HPACK library we use is quite liberal (and that seems sensible), we should be a bit more restrictive in this library and do some header validation. The following things need fixing:

  • Reject header fields with uppercase characters in the name (PROTOCOL_ERROR) (§8.1.2).
  • Reject pseudo-header fields appearing after regular header fields (PROTOCOL_ERROR) (§8.1.2.1).
  • Reject any request containing the connection header field (PROTOCOL_ERROR) (§8.1.2.2).
  • Reject a TE header field containing any value other than trailers (PROTOCOL_ERROR) (§8.1.2.2).
  • Reject duplicate pseudo-header fields (PROTOCOL_ERROR) (§8.1.2.3).

Consider policing content-length.

This is an interesting idea: should we reject invalid content-lengths? RFC 7540 § 8.1.2.6 says so:

A request or response that includes a payload body can include a content-length header field. A request or response is also malformed if the value of a content-length header field does not equal the sum of the DATA frame payload lengths that form the body.

Malformed requests or responses that are detected MUST be treated as a stream error (Section 5.4.2) of type PROTOCOL_ERROR.

This suggests that we should be keeping track of this. However, I'm a bit nervous that tools like mitmproxy may want to let the invalid stream stand, despite RFC 7540 warning against this:

Intermediaries that process HTTP requests or responses (i.e., any intermediary not acting as a tunnel) MUST NOT forward a malformed request or response.

The other option would be to require that the policing of the content length be done by the tools that wrap hyper-h2.

@Kriechi @mhils thoughts?

Documentation improvements

Received via email:

  • According to this, Hyper-h2 is an HTTP/2 implementation, so does it make sense to say that it's for "writing fully-fledged HTTP/2 implementations".
  • At first reading, it's not at all clear to me what you mean by the "top" and "bottom" of an H2Connection objection.
  • It seems odd to me that things like "send headers" and "send data" are formatted as code.
  • The five points describing the loop sort of make sense -- but I'm not sure whether it describes anything beyond a normal event loop
  • Before 3.5, socket.listen() requires a backlog argument, so the code in your example doesn't work
  • You should make it clear that the reader should install hyper-h2 alongside installing hyper.
  • When you said "But what we didn’t see was hyper making a request, and hyper was clearly hanging, waiting for something. Why?" it's not clear what the relationship is between us not seeing hyper making a request, and hyper hanging.
  • I reckon it's worth making it clear right at the top that Hyper-h2 doesn’t do any I/O, and it'd be interesting to talk about why.

Complete local flow control handling.

Right now we handle flow control locally by emitting a WindowUpdated event. That's not really enough, because we take no action to prevent users from sending more data than they should, and we don't provide the tools to allow external callers to check flow control state themselves.

We need:

  • To enable users to query the flow control window for a given stream, which must take the connection-level flow control window into account.
  • To throw exceptions when users attempt to send more data than they can fit into the flow control window.

Correctly enforce flow control window size.

Right now we don't police that the flow control window of a stream or the connection avoids getting too high. If it does, we need to either tear down the connection or close the stream with a flow control error.

Prevent DoS attack by CONTINUATION frame.

The continuation frame buffer in the frame buffer class is currently unbounded. That represents a DoS vector. We need to bound the number of CONTINUATION frames we're prepared to receive.

Design Patterns Documentation

It would be interesting to see the documentation dive into suggested design patterns when using Hyper-h2. These are potentially quite educational in general, in addition to providing more insight into how Hyper-h2 works. See this comment for more discussion.

State machine emits too generic errors

Calling h2.reset_stream(...) on an already closed streams gives me an ProtocolError, but since there is a StreamClosedError in hyper-h2, I would have expected to get this.

Trace of error:

  File ".../hyper-h2/h2/connection.py", line 680, in reset_stream
    frames = stream.reset_stream(error_code)
  File ".../hyper-h2/h2/stream.py", line 730, in reset_stream
    self.state_machine.process_input(StreamInputs.SEND_RST_STREAM)
  File ".../hyper-h2/h2/stream.py", line 98, in process_input
    "Invalid input %s in state %s" % (input_, old_state)
ProtocolError: Invalid input StreamInputs.SEND_RST_STREAM in state StreamState.CLOSED

Is it possible to handle/check all errors for the correct exception class and emit the best fit?
Otherwise I need to catch multiple error classes, where a single one would have done the job.

Use h2load to find bottlenecks.

So, we know roughly how fast we go on CPython and PyPy:

Twisted server, CPython, 100 clients, 100000 requests: 289.65 seconds, 276.191 req/s, 1.49 MB/s. /cc @tatsuhiro_t

— Cory Benfield (@Lukasaoz) October 19, 2015
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>

Twisted server, PyPy, 100 clients, 100000 requests: 40.42 seconds, 2449.33 req/s, 13.2 MB/s. 10x faster than CPython. @tatsuhiro_t

— Cory Benfield (@Lukasaoz) October 19, 2015
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>

We should take advantage of this tool to start profiling Hyper-h2 and find where our bottlenecks are. Then, we should aim to target them to improve our performance where possible.

Add a Curio example

Why not? @dabeaz has written a 3.5-only async library called curio. He claims it's "barely in a prerelease state", but why not give it a chance to show us what it can do by having an example for it. More examples can only be better!

Improve basic documentation.

A few notes from IRC:

  1. Consider adding a "Step 6: putting it all together" section before the final code example.
  2. Link to the examples in the repository (or maybe include them outright?)

Reject flow control increments of zero.

We need to reject these both when we receive them and when a user tries to emit them. They're an error of type PROTOCOL_ERROR, and in both cases we'll treat them as a connection error (we could nominally count the "on a stream" case of this as a stream error, but being a bit stern with misbehaving peers doesn't hurt too badly).

In all cases, we want to raise a ProtocolError.

Complete remote flow control handling.

We should enforce the flow control windows remotely. We need to keep track of what we believe the window is, and abort connections if the remote party misunderstands it.

Fully-functional settings logic

We have to keep track of two different kinds of settings: the settings that belong to the remote peer and the ones that belong to us. We must not violate the limits imposed by the remote peer, and we must enforce that the remote peer does not violate our limits.

To do:

  • Add an API for making changes to local settings and emitting the SETTINGS frame necessary for those changes.
  • Add an underlying data structure or interface that lets us tell the difference between the settings that the user requested and the ones that have been ACK'd by the remote peer (put another way, tell the difference between the settings we want and the ones the peer is currently meant to be obeying)
  • Add a SettingsChangeAcknowledged event to signal to the user when a settings change is ACKd.
  • Add enforcement for all settings.

Stream closed or not yet created?

Since get_stream_by_id got removed from the public API, I would like to get some way to check if a stream is closed, or does not yet exist, e.g. this distinction.

The problem is because (for memory cleanup reasons?) closed streams are deleted from h2_connection.streams - and this information is therefore not accessible anymore.

We incorrectly update our table size in _local_settings_acked

A discussion on the HTTPBis mailing list has made me aware of the fact that, strictly speaking, changes in the decoder table size should be notified in-band using HPACK, rather than out-of-band via the SETTINGS frame. For this reason, we should not update the decoder table size when we receive a local settings acknowledgement: HPACK should do it instead.

H2Connection never deletes old streams.

Even when a stream is closed, H2Connection keeps the object around indefinitely. In the short term this is useful because it allows us to handle late frame arrival, but in the long term this costs memory, and also hurts performance when we scan over the streams to check which ones are open, as we will need to scan over every stream that has ever been sent.

We should come up with a policy for expiring old streams.

Notify users when streams are subtly reset.

The first time a stream is reset due to the actions of the remote peer (e.g. invalid frames) and transitions to the CLOSED state, we need to fire an event to let them know. Ideally we'd do this only once, so that users don't have to keep track of too much state on their end.

Benchmarks

We want to take performance reasonably seriously, so we should have some regressible benchmarks. Twisted et. al. seem to be using codespeed with reasonable success, so we should consider doing the same.

Auto-ACK Settings

Hi,

In your docs it is stated:

In HTTP/2, settings changes need to be acknowledged. hyper-h2 does not
automatically acknowledge them, because it is possible that the caller may
not be happy with the changed setting (or would like to know about it).

Could you elaborate on why the caller "might not be happy" with the changes?
There is no negotiation of settings, either you ACK them, or you get a GoAway or something, as far as I understand it? Creating a RemoteSettingsChanged by itself is nice to have, but manually having to ACK them is IMHO not necessary, it would be cleaner to handle that the same way pings are auto-handled.

Or am I missing some important edge-case?
Thanks for the explanation.

P.S. I know it is only 2 LOCs, but still - if we could put these lines into hyper-h2 I think it is easier to use.

Extensibility framework

HTTP/2 is extendable: it's one of its great strengths. It would be really nice if H2 was relatively easily extensible. We should aim to start supporting pluggable HTTP/2 extensions.

What do we need to do this? Well, the answer is: what can extensions do?

  • They can add new frames. This means we need to be able to add:
    • new frame types that can be successfully parsed;
    • new transitions in the state machine, both connection and stream, that occur when a new frame is received;
    • specific frame callbacks, both connection and stream, that allow manipulating the connection and stream objects;
    • allow new frames to follow the state machine transitions and callbacks of a pre-existing frame;
  • They can add new state machine transitions in the absence of new frames, or adjust the existing state machine transitions;
  • They can add new settings. We need to be able to inject useful representations of settings, and allow callbacks for those settings if they're changed;
  • They can add new error codes.

This will also require work in hyperframe.

Rewrite threaded integration tests using coroutines.

Like all my best ideas, this one is actually @shazow's: why use threads and sockets when we can fake the whole lot out and use coroutines for the tests in test_interacting_stacks.py? From IRC:

[19:11:34] shazow:  if i had a ~month of funded work, i'd love to rewrite urllib3's testing framework
[19:11:37] shazow:  maybe collab with you on that
[19:12:18] shazow:  ideally release it as a standalone thing that you can use to test any client/server protocol thing
[19:12:27] lukasa:  Oh that'd be super cool.
[19:12:48] lukasa:  I think I want to rewrite this without an actual socket, instead providing a 'socket-like' that is actually a queue
[19:13:01] lukasa:  Because all this messing about with events is fundamentally about ensuring that I don't leave data in sockets
[19:13:09] lukasa:  Or get stuck waiting for data that will never come
[19:13:26] shazow:  yes, exactly
[19:13:34] shazow:  a mock socket would be really powerful
[19:13:37] shazow:  and mock async-ness
[19:13:48] lukasa:  Such a neat idea
[19:13:55] lukasa:  OH CRAP
[19:13:56] shazow:  (rather than threads)
[19:13:58] lukasa:  Why am I not doing this with coroutines?
[19:14:02] lukasa:  Alright
[19:14:02] shazow:  :)
[19:14:08] lukasa:  Tearing this up and doing new stuff

Failure to safely handle received GOAWAY frames.

When we receive a GOAWAY frame we have a tendency to choke on it, rather than correctly and cleanly transition into the closed state. We should fix that.

When the GOAWAY frame is received, we should:

  • transition to CLOSED
  • clear the outbound data buffer

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.