Comments (10)
Below are a few recommendations regarding the code:

- Use `gofmt` for formatting Go source code.
- Import `github.com/pkg/profile` instead of `github.com/davecheney/profile` - see http://dave.cheney.net/2014/10/22/simple-profiling-package-moved-updated for the reasoning.
- There are no benefits in using the `reuseport` listener for a single-process setup.
- Prefer the memory profile over the CPU profile and analyze it with `go tool pprof --alloc_objects`.
- The CPU profile you provided has a significant discrepancy compared to my profile. Make sure you passed the correct executable to `go tool`. Below is my CPU profile after removing the `[]byte` -> `string` conversion for `etag` (see below for details):
```
(pprof) top
10600ms of 17580ms total (60.30%)
Dropped 203 nodes (cum <= 87.90ms)
Showing top 10 nodes out of 116 (cum >= 270ms)
      flat  flat%   sum%        cum   cum%
    8160ms 46.42% 46.42%     8560ms 48.69%  syscall.Syscall
     410ms  2.33% 48.75%      660ms  3.75%  github.com/valyala/fasthttp.(*ResponseHeader).parseHeaders
     380ms  2.16% 50.91%      380ms  2.16%  runtime.epollwait
     290ms  1.65% 52.56%    16120ms 91.70%  github.com/valyala/fasthttp.(*Server).serveConn
     270ms  1.54% 54.10%      270ms  1.54%  runtime.memmove
     230ms  1.31% 55.40%      230ms  1.31%  runtime/internal/atomic.Cas
     220ms  1.25% 56.66%     2880ms 16.38%  net.(*netFD).Read
     220ms  1.25% 57.91%      280ms  1.59%  runtime.deferreturn
     210ms  1.19% 59.10%      210ms  1.19%  runtime.indexbytebody
     210ms  1.19% 60.30%      270ms  1.54%  runtime.netpollblock
```
This profile shows that more than 46% of all the time is spent in system calls. The `peek` command shows that two syscalls were used - `read` and `write`:
```
(pprof) peek syscall.Syscall
15.46s of 17.58s total (87.94%)
Dropped 203 nodes (cum <= 0.09s)
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                             6.50s 77.20% |   syscall.write
                                             1.92s 22.80% |   syscall.read
     8.16s 46.42% 46.42%      8.56s 48.69%                | syscall.Syscall
                                             0.31s 77.50% |   runtime.entersyscall
                                             0.09s 22.50% |   runtime.exitsyscall
----------------------------------------------------------+-------------
```
It looks like there are no significant bottlenecks in the code. It could be optimized further by pipelining buffered requests to the server in order to minimize the number of `read` and `write` syscalls. Currently the fasthttp client doesn't provide request pipelining, though it is in the TODO. So the only option at the moment is to implement it yourself on top of `Request` and `Response` objects.
- 40% of CPU time in `runtime.memclr` in your CPU profile may indicate that you proxy quite big responses. Currently the fasthttp client isn't optimized for big responses, since it reads the whole response body into memory before passing it to the caller. The better solution is to stream big responses directly to the client.
- The following code may lead to an unnecessary memory allocation and copy during the `[]byte` -> `string` conversion:

```go
etag := string(ctx.Response.Header.Peek("Etag"))
ctx.Response.Header.Del("Etag")
ctx.Response.Header.Set("ETag", etag)
```

So it would be better to rewrite it in a zero-alloc fashion:

```go
h.SetBytesV("ETag", h.Peek("Etag"))
h.Del("Etag")
```
- Make sure you send requests to the proxy from a dedicated set of machines. If you run load tests on the same machine where the proxy is located, your results will be skewed, since load tests may eat a significant share of CPU time.
- Make sure you have enough network bandwidth for the proxy. It would be better to have two distinct physical network interfaces on the proxy machine - the first one for incoming requests to the proxy and the second one for outgoing requests to the server. If you have only a single network interface on the proxy, results may be skewed: a proxy usually doubles the load on the network, so the network may become a bottleneck.
- A proxy isn't free. It always eats CPU and network resources.
The final code, which I profiled above:

```go
package main

import (
	"flag"
	"log"
	"time"

	"github.com/pkg/profile"
	"github.com/valyala/fasthttp"
)

var (
	addr = flag.String("addr", ":10000", "TCP address to listen to")

	c = &fasthttp.HostClient{
		Addr:            "127.0.0.1:80",
		ReadTimeout:     30 * time.Second,
		WriteTimeout:    30 * time.Second,
		ReadBufferSize:  64 * 1024,
		WriteBufferSize: 64 * 1024,
	}
)

func main() {
	flag.Parse()

	defer profile.Start(profile.CPUProfile).Stop()

	s := &fasthttp.Server{
		Handler:                       requestHandler,
		DisableHeaderNamesNormalizing: true,
	}
	if err := s.ListenAndServe(*addr); err != nil {
		log.Fatalf("Error in ListenAndServe: %s", err)
	}
}

func requestHandler(ctx *fasthttp.RequestCtx) {
	if err := c.Do(&ctx.Request, &ctx.Response); err != nil {
		log.Printf("Error: %s", err)
	}
	h := &ctx.Response.Header
	h.SetBytesV("ETag", h.Peek("Etag"))
	h.Del("Etag")
}
```
from fasthttp.
@djannot , FYI, I fixed the problem in fasthttp which could reduce its throughput when working with big bodies in the request and/or response.
Try verifying the proxy throughput now.
Thanks. I'll check it and let you know.
I've checked and only get a slight improvement.
I'm trying to build a reverse proxy and I'll have to handle requests with both small and large bodies.
Do you plan to implement pipelining soon?
> Do you plan to implement pipelining soon?
I have no near-term plans regarding request pipelining. Actually, I tried implementing it in our internal project, but the results weren't very good because of the following problems:
- Certain servers don't support pipelined requests.
- Pipelined requests usually have higher response times because of head-of-line blocking, so they must be used with caution if response latency is a priority.
@djannot , I'd recommend starting with `nginx` or `haproxy` and measuring their throughput in proxy mode for your case. Since both apps are highly optimized at the lowest level possible, it is unlikely `fasthttp` will beat them without request pipelining. Moreover, `haproxy` may skip request and response parsing and just proxy http connections to the upstream server. The results collected from these apps will show the maximum throughput possible in your setup. Then compare these results to `fasthttp`.

While `haproxy` and `nginx` usually outperform `fasthttp` in proxy mode, `fasthttp` allows implementing arbitrary custom logic in Go. This is much easier compared to customizing the low-level C inside the event loops and state machines present in `nginx` and `haproxy`.
Closing this issue. Feel free to open a new one if throughput problems related to fasthttp occur again.
@djannot , just FYI, fasthttp now supports pipelined requests via `PipelineClient`.
@valyala Awesome. Thanks