Comments (13)

bojand commented on August 12, 2024
  • -n - The total number of requests to run.
  • -c - The number of requests to run concurrently. We basically have a worker for each of these, so if n is 18000 and c is 50, each worker will do 18000 / 50 = 360 requests.
  • -q - Rate limit, in queries per second (QPS). This limits each worker to roughly q requests per second by spacing out the requests: a wait is added before each request attempt. For example, -q 100 would add a 10 ms wait before each attempt.
  • -x - Maximum duration of the run, with the n setting still respected. If this duration is reached before n requests are completed, the application stops and exits. It is essentially an upper bound on the test duration: the application stops when we hit n requests OR x time is reached, whichever comes first.
  • -z - Duration of the run. When the duration is reached, the application stops and exits; if a duration is specified, n is ignored. Use this when you just want to make as many requests as possible (given the other settings) in a given amount of time. (Example invocations combining these flags follow this list.)
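
For example, putting these together (reusing the score.proto service from later in this thread; the address and data payload are illustrative):

./ghz -insecure -proto ./score.proto -call adapter.ScoreService.GetScore -d '{"body":"test"}' -n 18000 -c 50 -x 3m localhost:50051

runs 18000 requests across 50 workers, stopping early if 3 minutes elapse first, while

./ghz -insecure -proto ./score.proto -call adapter.ScoreService.GetScore -d '{"body":"test"}' -n 18000 -c 50 -q 2 localhost:50051

caps each of the 50 workers at 2 req/s, for at most 50 × 2 = 100 req/s overall.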

So with -n 18000 -x 3m we would run 50 workers in parallel, each doing 360 requests. Whichever event is reached first, 18000 requests or 3 m, triggers the end of the test. In your case the test takes 1.2 s because all 18000 requests have been performed and completed. If you want each worker to do roughly 100 requests/s, use the -q 100 argument. Note that this still would not produce an RPS report of 100 req/s unless the server actually performs that way. To get the report you want, you would use -c 50 -n 18000 and your server would have to literally respond to each request in 500 ms. Just add a 500 ms sleep to your test handler and run the test with -n 18000:
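
(Sanity-checking the numbers below: 18000 requests / 50 workers = 360 requests per worker; at ~500 ms per request that is 360 × 0.5 s ≈ 180 s of total run time, and each worker sustains 1 request / 0.5 s = 2 req/s, so 50 × 2 = 100 req/s, which is what the report shows.)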

Summary:
  Count:	18000
  Total:	181728.18 ms
  Slowest:	760.45 ms
  Fastest:	500.39 ms
  Average:	504.17 ms
  Requests/sec:	99.05

Response time histogram:
  500.386 [1]	|
  526.393 [17977]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  552.400 [6]	|
  578.406 [0]	|
  604.413 [0]	|
  630.419 [0]	|
  656.426 [0]	|
  682.432 [0]	|
  708.439 [0]	|
  734.446 [10]	|
  760.452 [6]	|

Latency distribution:
  10% in 501.33 ms
  25% in 502.18 ms
  50% in 503.89 ms
  75% in 505.43 ms
  90% in 506.59 ms
  95% in 507.37 ms
  99% in 508.98 ms
Status code distribution:
  [OK]	18000 responses

bojand commented on August 12, 2024

Hello,

Hmm, really not sure; I think we need more details, as I am able to get it working fine. A quick, simple server in Node.js:

const grpc = require('grpc')
const protoLoader = require('@grpc/proto-loader')

const PROTO_PATH = __dirname + '/score.proto'

// Load the service definition dynamically from the .proto file
const packageDefinition = protoLoader.loadSync(PROTO_PATH)

const scoreProto = grpc.loadPackageDefinition(packageDefinition).adapter

let score = 0

// Unary handler: log the incoming request and reply with an incremented score
function getScore (call, fn) {
  console.log(call.request)
  score++
  fn(null, { score })
}

// Create the server, register the service, and listen insecurely on port 50051
function main () {
  const server = new grpc.Server()
  server.addService(scoreProto.ScoreService.service, { getScore })
  server.bind('0.0.0.0:50051', grpc.ServerCredentials.createInsecure())
  server.start()
}

main()
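
(The score.proto file itself isn't shown in the thread; a definition consistent with the handler above and the call below might look like the following. The message names are my guesses, not taken from the actual file.)

syntax = "proto3";

package adapter;

service ScoreService {
  rpc GetScore (ScoreRequest) returns (ScoreReply) {}
}

message ScoreRequest {
  string body = 1;
  map<string, string> fields = 2;
}

message ScoreReply {
  int32 score = 1;
}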

and then the ghz call:

./ghz -insecure -proto ./score.proto -call adapter.ScoreService.GetScore -d '{"body":"test", "fields": {"key1":"test1", "key2":"test2"}}' localhost:50051

Summary:
  Count:	200
  Total:	48.51 ms
  Slowest:	23.22 ms
  Fastest:	3.32 ms
  Average:	11.13 ms
  Requests/sec:	4123.03

Response time histogram:
  3.324 [1]	|
  5.314 [3]	|∎
  7.303 [39]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  9.292 [82]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  11.281 [25]	|∎∎∎∎∎∎∎∎∎∎∎∎
  13.271 [1]	|
  15.260 [0]	|
  17.249 [14]	|∎∎∎∎∎∎∎
  19.238 [3]	|∎
  21.228 [0]	|
  23.217 [32]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎

Latency distribution:
  10% in 7.12 ms
  25% in 7.42 ms
  50% in 8.59 ms
  75% in 12.88 ms
  90% in 22.44 ms
  95% in 22.64 ms
  99% in 22.91 ms
Status code distribution:
  [OK]	200 responses

indusbull commented on August 12, 2024

Thanks for the quick response and for confirming that the command works.

My test client code is able to communicate with the same gRPC server instance. I will clean up my server code and share it.

My server is hosted via java-grpc, and I am running ghz from a Windows machine, if that makes any difference.

indusbull commented on August 12, 2024

I can confirm that when I run ghz from a Linux box and connect to the gRPC endpoint, it works fine. But when I try from a Windows machine and hit the same gRPC endpoint, I get the above exception.

Separate question: does ghz create a new connection for each request, or does it use a persistent connection? Earlier I was measuring response time using custom client/server gRPC interceptors and was getting responses in 1-2 ms when using the same persistent connection, but ghz reports an average response time of 20.32 ms. It would be great if you could clarify this.

bojand commented on August 12, 2024

Hi, thanks for looking further into this. I had a suspicion that it was related to Windows, but unfortunately it's not a platform I have easy access to. Just to confirm... where are you seeing this error show up? On ghz side? Or server side?

As for the average, it is strictly the mathematical average: the total response time of all successful requests divided by the number of successful requests. If you are testing services under load, with concurrent requests and lots of requests over time, some requests may take longer than others depending on what your service is doing. Additionally, strict mathematical averages can be especially susceptible to outliers / extreme values. So that may answer the question regarding the average value. If you suspect we are reporting something incorrectly, please provide some more details; a reproducible example would be very helpful and I can dig deeper.
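
(To illustrate with made-up numbers: nine responses at 2 ms plus one outlier at 200 ms average to (9 × 2 + 200) / 10 = 21.8 ms, even though 90% of the requests finished in 2 ms.)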

Just a note on a little bit of theory... Which measurements matter, and how to interpret them, is very tied to context, so it's important to know what you care about. But in general, with regard to latency measurements, the average is normally not considered as important or reliable a measurement as the 95th or, more usually, the 99th percentile. This is especially true in a microservice architecture, where gRPC would likely be used, and where a single top-level request may fan out and touch lots of services. An extreme example that demonstrates the concept, from the article:

Satisfying a search request can involve thousands of machines. The core idea is that small performance hiccups on a few machines cause higher overall latencies, and the more machines, the worse the tail latency. The statistics of the situation are highly unintuitive. Only 1% of requests will take over a second with a server that has a 1 ms average response time and a one-second 99th-percentile latency. If a request has to access 100 servers, now 63% of all requests will take over a second.
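
(The arithmetic behind that last figure: if each of the 100 servers independently responds within a second 99% of the time, the probability that all 100 are fast is 0.99^100 ≈ 0.37, so 1 − 0.99^100 ≈ 0.63, i.e. about 63% of fanned-out requests take over a second.)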

So just something to keep in mind as you explore your metrics! :)

indusbull commented on August 12, 2024

Hi, thanks for looking further into this. I had a suspicion that it was related to Windows, but unfortunately it's not a platform I have easy access to. Just to confirm... where are you seeing this error show up? On ghz side? Or server side?

It was on the ghz side, when I ran it in the command prompt.

My question was more about whether ghz opens a separate connection per request. One of the benefits of gRPC is the persistent connection, thanks to its reliance on HTTP/2. Opening a separate connection per request makes sense for testing apps where you have millions of clients connecting (i.e. front-end or mobile apps). We are instead using gRPC in a microservice architecture, where we expect the connection between two service hosts to be persistent. If ghz opens a separate connection per request, rather than reusing the same persistent connection, it doesn't give a clear picture for a microservice architecture.

Thanks for sharing your thoughts on measurement metrics. I understand the difference between the average and the 95th/99th percentiles, and not to rely on the average as the sole barometer. I just wrote a quick & dirty solution to give us some latency measurements before I found ghz.

That article seems interesting. I will read it later today. Thanks.

bojand commented on August 12, 2024

Hello, sorry, I misunderstood and didn't address that specific point. We open only 1 connection, established at the start of the test run, and that single connection is used for all requests. This is the gRPC recommendation, to the best of my knowledge (Edit: additional info). Hope that adds more detail on how ghz works, performs the tests, and reports results.
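
(For illustration, here is the same single-connection pattern in a hand-rolled Node.js client against the score.proto service from earlier; one client object means one underlying channel that is reused for every call, mirroring what ghz does during a run:)

const grpc = require('grpc')
const protoLoader = require('@grpc/proto-loader')

const packageDefinition = protoLoader.loadSync(__dirname + '/score.proto')
const scoreProto = grpc.loadPackageDefinition(packageDefinition).adapter

// One client = one channel/connection, reused across all the requests below
const client = new scoreProto.ScoreService('localhost:50051', grpc.credentials.createInsecure())

for (let i = 0; i < 100; i++) {
  client.getScore({ body: 'test', fields: { key1: 'test1', key2: 'test2' } }, (err, res) => {
    if (err) console.error(err)
    else console.log(res.score)
  })
}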

indusbull commented on August 12, 2024

Thanks for the clarification. This helps.

I will have to dig deeper. Overall I see response times much higher with ghz than with my custom interceptor, which is modeled after the code in java-grpc-prometheus.

One more question :-)
I was running a test to make 18000 requests over a 3-minute period. I tried with the options below, but ghz returns immediately without spacing out the requests. Is anything wrong with the options?

./ghz -insecure -proto ./score.proto -call adapter.ScoreService.GetScore -d '{"body":"test", "fields": {"key1":"test1", "key2":"test2"}}' -n 18000 -x 3m localhost:5300

I love the simplicity of using this tool. Thanks.

bojand commented on August 12, 2024

Hmm, I am not sure why it would behave that way. It works fine for me against my test server from above (even after adding a random 0-250 ms delay to each response).
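
(Something like this variant of the getScore handler from the server above, replying after a random delay:)

function getScore (call, fn) {
  console.log(call.request)
  score++
  // reply after a random 0-250 ms delay to simulate variable server latency
  setTimeout(() => fn(null, { score }), Math.random() * 250)
}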

./ghz -insecure -proto score.proto -call adapter.ScoreService.GetScore -d '{"body":"test", "fields": {"key1":"test1", "key2":"test2"}}' -n 18000 -x 3m localhost:50051

Summary:
  Count:	18000
  Total:	48193.07 ms
  Slowest:	382.44 ms
  Fastest:	0.51 ms
  Average:	125.97 ms
  Requests/sec:	373.50

Response time histogram:
  0.513 [1]	|
  38.706 [2660]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  76.898 [2804]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  115.091 [2777]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  153.283 [2803]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  191.476 [2681]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  229.668 [2759]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  267.861 [1489]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  306.053 [11]	|
  344.246 [10]	|
  382.438 [5]	|

Latency distribution:
  10% in 27.25 ms
  25% in 63.47 ms
  50% in 125.20 ms
  75% in 188.13 ms
  90% in 225.51 ms
  95% in 238.03 ms
  99% in 248.69 ms
Status code distribution:
  [OK]	18000 responses

I just released 0.18.0, which does a slightly better job of reporting errors that happen when we try to perform the test requests. Perhaps give that a try and see if ghz prints any errors now.

With regard to the discrepancy between your interceptor observations and the ghz reporting... some more detail and context would be helpful. Are you running client requests concurrently? The default concurrency for ghz runs is 50:

-c  Number of requests to run concurrently. Total number of requests cannot
      be smaller than the concurrency level. Default is 50.

That is, we have 50 workers running concurrently, sending requests (the 18000 divided among the 50 workers) to the service. This would certainly put more load on the service than exercising the call one at a time, for example.

indusbull commented on August 12, 2024

I just tried 0.19.0 and it still shows the same behavior when running on a Linux box. It makes 14588 reqs/sec, as can be seen in the result below.

./ghz -insecure -proto score.proto -call adapter.ScoreService.GetScore -d '{"body":"test", "fields": {"key1":"test1", "key2":"test2"}}' -n 18000 -x 3m localhost:5300


Summary:
  Count:        18000
  Total:        1233.88 ms
  Slowest:      149.34 ms
  Fastest:      0.33 ms
  Average:      3.24 ms
  Requests/sec: 14588.18

Response time histogram:
  0.328 [1]     |
  15.230 [17948]        |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  30.131 [1]    |
  45.033 [0]    |
  59.934 [0]    |
  74.835 [0]    |
  89.737 [0]    |
  104.638 [0]   |
  119.540 [0]   |
  134.441 [0]   |
  149.343 [50]  |

Latency distribution:
  10% in 1.36 ms
  25% in 1.81 ms
  50% in 2.49 ms
  75% in 3.56 ms
  90% in 4.67 ms
  95% in 5.69 ms
  99% in 8.45 ms
Status code distribution:
  [OK]  18000 responses

With regard to the discrepancy between your interceptor observations and the ghz reporting... some more detail and context would be helpful. Are you running client requests concurrently? The default concurrency for ghz runs is 50:

Actually, I had a for loop on a single thread making requests at an interval of every x ms, so there was no real concurrency. This tool will help me achieve it once I resolve the above issue.

bojand commented on August 12, 2024

Ohh, you're referring to the "Requests/sec" measurement in the report? So the actual test performed 18000 requests using the parameters specified. The Requests/sec in the report is a theoretical RPS for your service based on the results: it literally just takes the total request count / total time of the run. So in this case: 18000 requests / 1.23388 s = 14588 requests/s. Depending on the arguments used (especially the -c and -q options, for example), this measurement may be less meaningful. I will work to document the output a bit better.

indusbull commented on August 12, 2024

Sorry if I am not stating my issue clearly or am misunderstanding the config options.

I am trying to make a total of 18000 requests over a 3-minute period (with the default -c 50 worker threads). I used the -n 18000 -x 3m options and assumed they would spread 18000 requests over 3 minutes (approximately 18000 reqs / 180 s = 100 req/s), but I am getting the above results.

Do I need to use the -c and -q options as well to achieve 100 req/s?

indusbull commented on August 12, 2024

It completely makes sense now. I am able to run tests per my requirements using a combination of settings. Thank you for the detailed response. Great nifty tool!!
