envoyproxy / nighthawk
L7 (HTTP/HTTPS/HTTP2/HTTP3) performance characterization tool
License: Apache License 2.0
Incorporate a full-fledged URL parser (e.g. Chromium's URL libs) so we can correctly parse and validate those.
See if we can excite that behavior and address it. We may need to globally enforce rate limiting while doing this and introduce locks, so discussing the design of the rate limiter would be good.
Also see #38
Disable libevent threading support while/when we don’t need it.
Earlier experiments with this showed a few percent gain in maximum throughput and in measurement accuracy.
Currently Nighthawk supports a CLI which allows for quick and easy execution of single test runs.
But it would be really neat to have a service next to that, which would accept configuration (over gRPC or via the CLI) and (re-)configure on the fly as requested.
Recently Envoy added an ASSERT that sometimes fires when the integration-test server calls sleep() from its constructor before affected Nighthawk tests fork() the integration test server. The time-system enforces that sleep() gets called on a single thread, but this isn't aware of forks. The fork() was done as a workaround because without it we get into a fight with Envoy's integration test server about who owns the Runtime. This needs to be addressed in Nighthawk to deflake test-runs.
For reference, the offending code (take a deep breath):
Line 44 in 0783204
One idea is to implement a series of tests in Python that start up release builds of Envoy + another server (e.g. nginx) with the same configuration, and run a predefined series of Nighthawk tests against them. Adding this may also deprecate client_test.cc, which does something hideous to fork the integration test server into its own process in the test framework while avoiding an assert (both NH and Envoy try to own the runtime loader singleton).
We either need to:
- add zlib as a dependency in bazel, or
- document apt install zlib1g-dev as a dependency in README.md
Add the capability to load-test the gRPC protocol.
We should add the delay as a proto field, and synthesize a request header from it which the fault-filter will understand.
It would be great to synthesize request headers based on the test-server's own configuration to control the delays that Envoy's fault filter induces, instead of having to send a separate request header for that.
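A minimal sketch of the header synthesis described above. It assumes the fault filter is configured for header-controlled delays, in which case the x-envoy-fault-delay-request header carries the delay in milliseconds; the function name and proto-to-duration plumbing are illustrative, not Nighthawk's actual API.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>

// Derive the request header Envoy's fault filter understands from a delay
// configured on the test server's own proto. The header value is the delay
// expressed in milliseconds.
std::pair<std::string, std::string>
synthesizeDelayHeader(std::chrono::milliseconds delay) {
  return {"x-envoy-fault-delay-request", std::to_string(delay.count())};
}
```

With this, the test server could attach the header to upstream requests itself instead of requiring the client to send a separate header for every request.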
We might want to have a global rate limit, since different workers might be moving at different effective rates (e.g. due to hyper-threading imbalance).
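One way a global rate limit could work is a single wall-clock-derived token budget that all workers claim from, so a hot worker cannot exceed the global pace while a slow one falls behind. This is a minimal lock-free sketch under that assumption, not Nighthawk's actual rate limiter (which is the design discussion the issue asks for).

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Shared token bucket: tokens accrue with elapsed wall-clock time at
// `rate_per_sec`, independent of how fast individual workers run.
class GlobalRateLimiter {
public:
  GlobalRateLimiter(uint64_t rate_per_sec, std::chrono::steady_clock::time_point start)
      : rate_per_sec_(rate_per_sec), start_(start) {}

  // Returns true if a request may be initiated at `now`.
  bool tryAcquire(std::chrono::steady_clock::time_point now) {
    const auto elapsed =
        std::chrono::duration_cast<std::chrono::nanoseconds>(now - start_);
    // Total number of tokens the elapsed time entitles all workers to.
    const uint64_t entitled =
        static_cast<uint64_t>(elapsed.count()) * rate_per_sec_ / 1000000000ULL;
    uint64_t acquired = acquired_.load(std::memory_order_relaxed);
    while (acquired < entitled) {
      // CAS loop so concurrent workers never over-claim a token.
      if (acquired_.compare_exchange_weak(acquired, acquired + 1,
                                          std::memory_order_relaxed)) {
        return true;
      }
    }
    return false;
  }

private:
  const uint64_t rate_per_sec_;
  const std::chrono::steady_clock::time_point start_;
  std::atomic<uint64_t> acquired_{0};
};
```

An atomic counter avoids introducing a lock on the fast path, which matters here because the point of the tool is to measure latency, not to add it.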
Currently Nighthawk both relies on Envoy's statistics concepts and implements its own. These two have rapidly converged to do the same thing, and it would be great to unify them. One key difference is that Nighthawk uses HdrHistogram_c, a feature which would be good to preserve. One step in unifying Envoy's statistics and Nighthawk's would be to reuse https://github.com/envoyproxy/envoy/blob/master/api/envoy/admin/v2alpha/metrics.proto instead of defining its own version.
Envoy and NH header includes started overlapping after the transfer from envoy-perf/nighthawk to the root of this repo, resulting in potential conflicts. Tidy this up.
Running bazel/gen_compilation_database.sh fails with:
oschaaf@burst:~/code/envoy-perf-vscode/nighthawk$ bazel/gen_compilation_database.sh
DEBUG: /home/oschaaf/code/envoy-perf-vscode/nighthawk/bazel-compilation-database-0.3.1/aspects.bzl:99:9: Rule with no sources: @com_google_protobuf//:cc_wkt_protos
ERROR: /home/oschaaf/.cache/bazel/_bazel_oschaaf/385722e931c3493bb3c210a3b1bab888/external/com_lyft_protoc_gen_validate/validate/BUILD:39:1: in //bazel-compilation-database-0.3.1:aspects.bzl%compilation_database_aspect aspect on cc_library rule @com_lyft_protoc_gen_validate//validate:cc_validate:
Traceback (most recent call last):
File "/home/oschaaf/.cache/bazel/_bazel_oschaaf/385722e931c3493bb3c210a3b1bab888/external/com_lyft_protoc_gen_validate/validate/BUILD", line 39
//bazel-compilation-database-0.3.1:aspects.bzl%compilation_database_aspect(...)
File "/home/oschaaf/code/envoy-perf-vscode/nighthawk/bazel-compilation-database-0.3.1/aspects.bzl", line 120, in _compilation_database_aspect_impl
target.cc
<target @com_lyft_protoc_gen_validate//validate:cc_validate> (rule 'cc_library') doesn't have provider 'cc'
ERROR: Analysis of aspect '//bazel-compilation-database-0.3.1:aspects.bzl%compilation_database_aspect of //api/client:benchmark_options_cc' failed; build aborted: Analysis of target '@com_lyft_protoc_gen_validate//validate:cc_validate' failed; build aborted
INFO: Elapsed time: 1.335s
INFO: 0 processes.
Some sleuthing pinpointed this breaking when bazel 0.25 was released. After reverting to 0.24 this works again. The docker image we use for clang-tidy also has bazel 0.24, so that still works.
Figure out max throughput under set latency. A variable might be degree of concurrency.
Find the points to sample to draw the QPS vs latency curve with minimal sample points based on examining gradients between existing sampled points
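One possible heuristic for the adaptive sampling above: probe next where the curve bends most, i.e. bisect an interval flanking the sampled point with the largest change in slope. This is a sketch of that idea only; the function name and the min-gap termination rule are assumptions, not an existing Nighthawk API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <map>
#include <vector>

// Given latency measurements keyed by QPS, propose the next QPS to probe.
// For each interior sample we compute the change in slope (a discrete second
// difference); the wider interval flanking the sharpest bend is bisected.
// Returns 0 when fewer than three samples exist or no interval is wide
// enough (>= 2 * min_gap) to be worth splitting.
uint64_t nextQpsToSample(const std::map<uint64_t, double>& latency_by_qps,
                         uint64_t min_gap) {
  if (latency_by_qps.size() < 3) {
    return 0;
  }
  const std::vector<std::pair<uint64_t, double>> v(latency_by_qps.begin(),
                                                   latency_by_qps.end());
  double best_curvature = -1.0;
  uint64_t best_mid = 0;
  for (size_t i = 1; i + 1 < v.size(); ++i) {
    const double slope_left =
        (v[i].second - v[i - 1].second) / double(v[i].first - v[i - 1].first);
    const double slope_right =
        (v[i + 1].second - v[i].second) / double(v[i + 1].first - v[i].first);
    const double curvature = std::fabs(slope_right - slope_left);
    // Bisect the wider of the two intervals around the bend.
    const bool left_wider =
        (v[i].first - v[i - 1].first) >= (v[i + 1].first - v[i].first);
    const uint64_t lo = left_wider ? v[i - 1].first : v[i].first;
    const uint64_t hi = left_wider ? v[i].first : v[i + 1].first;
    if (curvature > best_curvature && hi - lo >= 2 * min_gap) {
      best_curvature = curvature;
      best_mid = lo + (hi - lo) / 2;
    }
  }
  return best_mid;
}
```

Iterating this until it returns 0 concentrates samples around the knee of the QPS-vs-latency curve while spending few runs on its flat regions.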
Currently we rely on tclap and fmt::format in C++ to interface with the benchmarking libs. Alternatively, we could use Python or some such to implement a specialized front-end tool (e.g. CLI, HTTP server).
Re-use and apply Envoy's configuration for tuning all available pool settings and limits
Currently workers in Nighthawk spend two seconds in a spin/yield loop, polling the clock for assessing time-to-start. This should be configurable, and the default can probably be much shorter.
There seems to be some inaccuracy in code-coverage measurement: GCOV_EXCLUDE_XXX is ignored, and some code in headers is always flagged as not run (inlined code?).
Envoy is anticipated to switch to native coverage as well in the future, at which point it makes sense to revisit this.
Currently the test server will apply per-request configuration, specified in a request header, on top of the process-level configuration by performing a proto-level Merge(). This doesn't always work well: it's not possible to override the server configuration with type-specific defaults. For example, response-size (int, default=0) cannot be overridden to 0 when the server-level configuration is non-zero. A bool-valued field called clear could be helpful here, to allow the client to indicate it wants to fully specify the configuration (and not inherit from what the server has configured).
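A small sketch of the semantics the clear bit would give. Field presence is modeled with std::optional rather than real protobufs to keep it self-contained, and the field names are illustrative, not the actual test-server proto.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Process-level configuration the test server was started with.
struct TestServerConfig {
  uint64_t response_size = 0;
};

// Per-request configuration carried in a request header.
struct RequestLevelConfig {
  std::optional<uint64_t> response_size;
  bool clear = false;  // When true: do not inherit server-level configuration.
};

// With clear set, the request starts from an empty (type-default) config
// instead of the server's, so the client fully specifies the result.
TestServerConfig effectiveConfig(const TestServerConfig& server,
                                 const RequestLevelConfig& request) {
  TestServerConfig out = request.clear ? TestServerConfig{} : server;
  if (request.response_size.has_value()) {
    out.response_size = *request.response_size;
  }
  return out;
}
```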
Today, header -I paths are set up so we have the following situation:
#include "common/api/api_impl.h"
#include "common/common/cleanup.h"
#include "common/common/thread_impl.h"
#include "common/event/dispatcher_impl.h"
#include "common/event/real_time_system.h"
#include "common/filesystem/filesystem_impl.h"
#include "common/frequency.h"
#include "common/network/utility.h"
#include "common/runtime/runtime_impl.h"
#include "common/thread_local/thread_local_impl.h"
#include "common/uri_impl.h"
#include "common/utility.h"
Some of these headers, e.g. real_time_system.h, come from Envoy's tree (effectively @envoy//source/common/event/real_time_system.h), while others come from NH, e.g. //source/common:uri_impl.h.
Ideally we have a cleaner way to visually disambiguate this in the header block.
Snapshot the Envoy/NH stats periodically from the workers so they can be plotted through time. What happens upon shutdown of the pool is interesting as well, so capturing what the stats look like pre- and post-shutdown would be useful too.
Currently the output lists histograms in seconds, which is hard to read. Defaulting to milliseconds would be better.
Copied from envoyproxy/envoy-perf#32, this issue tracks high priority items and technical debt. This needs to be split out, but for now it is copied over so we can close the original over at envoy-perf.
- using ::testing::XXX in tests.
- BenchmarkClientTest: get a reusable integration test base.
- client_test.cc: does something hideous to fork the integration test server into its own process in the test framework.

Explicit control of TLS ciphers and session-reuse makes it easier when it comes to comparing server-to-server performance. Additionally, it would be nice to be able to test just the overhead of setting up these connections (and not perform any requests) and/or track times of specific milestones during the connection/TLS setup process.
Update the image we use in CI to whatever is being used on Envoy's master branch.
After doing so, see if we can get rid of a hack we did to force-override the linker used during the build process in ASAN/TSAN runs, as there have been changes upstream.
Hack in Nighthawk that would be great to get rid of: https://github.com/envoyproxy/nighthawk/blob/master/ci/do_ci.sh#L40
See envoyproxy/envoy#6314 (comment)
Currently Nighthawk mostly uses a single http/2 connection to issue requests. This may lead to hotspotting processes on the benchmark target. The expectation is that this will be fixed upstream in Envoy.
Currently Nighthawk is capable of doing closed-loop testing, which means that when configured resource limits are met (e.g. max connections, max streams), no new requests will be issued, even when that means not reaching the requested request pacing (this will show up in the output as a time-spent-blocked histogram).
In real life, clients will not wait like that, and supporting open-loop testing will help measure latencies under these circumstances.
An important part of this feature is that load generation should auto-terminate upon detecting a certain amount of in-flight requests, and maybe when it detects certain resource shortages (like running out of file descriptors).
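The pacing decision described above could look roughly like this. It is a sketch under stated assumptions: the decision enum, the abort-on-in-flight bound, and the function shape are all illustrative, not Nighthawk's actual scheduler.

```cpp
#include <cassert>
#include <cstdint>

enum class PacingDecision { kSendNow, kWait, kAbort };

// Open-loop pacing: requests are initiated on schedule regardless of
// completions. Once in-flight requests exceed a bound, the target is not
// keeping up and the run aborts before the client exhausts resources such
// as file descriptors.
PacingDecision openLoopDecision(uint64_t elapsed_ns, uint64_t rate_per_sec,
                                uint64_t sent, uint64_t completed,
                                uint64_t max_in_flight) {
  const uint64_t in_flight = sent - completed;
  if (in_flight >= max_in_flight) {
    return PacingDecision::kAbort;
  }
  // Number of requests the schedule says should have been sent by now.
  const uint64_t due = elapsed_ns * rate_per_sec / 1000000000ULL;
  return sent < due ? PacingDecision::kSendNow : PacingDecision::kWait;
}
```

The key contrast with the closed-loop mode: the decision never consults connection or stream limits, only the schedule and the in-flight safety bound.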
Currently workers use a yield/spin loop to determine when it is time for a request to be initiated. Add an option that, when enabled, makes the client use the dispatcher to sleep() as an alternative.
For example, it would be nice to be able to separately track latencies for connection-setup and tls-handshake stages.
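A minimal sketch of what per-connection milestone tracking could look like: record a timestamp at each stage, and derive the per-stage latencies from adjacent timestamps. The struct and member names are illustrative.

```cpp
#include <cassert>
#include <chrono>

// Timestamps captured as a connection progresses; each stage latency is the
// difference between adjacent milestones, so connect time and TLS handshake
// time can be reported as separate histograms.
struct ConnectionMilestones {
  using Clock = std::chrono::steady_clock;
  Clock::time_point start;      // Connection attempt initiated.
  Clock::time_point connected;  // TCP connection established.
  Clock::time_point tls_done;   // TLS handshake completed.

  std::chrono::nanoseconds connectLatency() const { return connected - start; }
  std::chrono::nanoseconds tlsHandshakeLatency() const { return tls_done - connected; }
};
```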
Creating a stream decoder pool helps reducing allocations in the fast path, as well as gives us an easy way to check for leaked decoders upon quiescence.
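The pool idea above can be sketched as a simple object pool with an outstanding counter. StreamDecoder here is a stand-in for the real class, and the leak check via outstanding() is the assumption being illustrated, not an existing API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct StreamDecoder {
  void reset() {}  // Clear per-request state before reuse.
};

// Completed decoders are returned for reuse, so the fast path avoids
// per-request allocation after warm-up.
class DecoderPool {
public:
  std::unique_ptr<StreamDecoder> acquire() {
    ++outstanding_;
    if (!free_.empty()) {
      auto decoder = std::move(free_.back());
      free_.pop_back();
      return decoder;
    }
    return std::make_unique<StreamDecoder>();
  }

  void release(std::unique_ptr<StreamDecoder> decoder) {
    decoder->reset();
    --outstanding_;
    free_.push_back(std::move(decoder));
  }

  // At quiescence this should read 0; nonzero means decoders leaked.
  size_t outstanding() const { return outstanding_; }

private:
  std::vector<std::unique_ptr<StreamDecoder>> free_;
  size_t outstanding_ = 0;
};
```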
clang-tidy-8 provides more checks, which is great. Perhaps more importantly, it seems to be more robust (at least on my workstation).
It would be awesome to have support for QUIC eventually. It is anticipated that once this feature is landed in Envoy, adding it here will be low hanging fruit.
Allow control of request attributes for requests sent by Nighthawk:
Proactively prefetching connections that are lost while running a benchmark test would be a nice enhancement on top of the initial prefetching.