Comments (10)

valyala commented on July 25, 2024

Below are a few recommendations regarding the code:

  • Use gofmt for formatting go source code.
  • Import github.com/pkg/profile instead of github.com/davecheney/profile - see http://dave.cheney.net/2014/10/22/simple-profiling-package-moved-updated for the reasoning.
  • There is no benefit to using a reuseport listener in a single-process setup.
  • Prefer a memory profile over a CPU profile and analyze it with go tool pprof --alloc_objects (see the note after the final code below).
  • The CPU profile you provided differs significantly from mine. Make sure you passed the correct executable to go tool. Below is my CPU profile after removing the []byte -> string conversion for the etag (see below for details):
(pprof) top
10600ms of 17580ms total (60.30%)
Dropped 203 nodes (cum <= 87.90ms)
Showing top 10 nodes out of 116 (cum >= 270ms)
      flat  flat%   sum%        cum   cum%
    8160ms 46.42% 46.42%     8560ms 48.69%  syscall.Syscall
     410ms  2.33% 48.75%      660ms  3.75%  github.com/valyala/fasthttp.(*ResponseHeader).parseHeaders
     380ms  2.16% 50.91%      380ms  2.16%  runtime.epollwait
     290ms  1.65% 52.56%    16120ms 91.70%  github.com/valyala/fasthttp.(*Server).serveConn
     270ms  1.54% 54.10%      270ms  1.54%  runtime.memmove
     230ms  1.31% 55.40%      230ms  1.31%  runtime/internal/atomic.Cas
     220ms  1.25% 56.66%     2880ms 16.38%  net.(*netFD).Read
     220ms  1.25% 57.91%      280ms  1.59%  runtime.deferreturn
     210ms  1.19% 59.10%      210ms  1.19%  runtime.indexbytebody
     210ms  1.19% 60.30%      270ms  1.54%  runtime.netpollblock

This profile shows that more than 46% of the total time is spent in system calls. The peek command shows that two syscalls are used - read and write:

(pprof) peek syscall.Syscall
15.46s of 17.58s total (87.94%)
Dropped 203 nodes (cum <= 0.09s)
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context          
----------------------------------------------------------+-------------
                                             6.50s 77.20% |   syscall.write
                                             1.92s 22.80% |   syscall.read
     8.16s 46.42% 46.42%      8.56s 48.69%                | syscall.Syscall
                                             0.31s 77.50% |   runtime.entersyscall
                                             0.09s 22.50% |   runtime.exitsyscall
----------------------------------------------------------+-------------

It looks like there are no significant bottlenecks in the code. It could be optimized further by pipelining buffered requests to the server, minimizing the number of read and write syscalls. Currently the fasthttp client doesn't provide request pipelining, though it is on the TODO list. So the only option at the moment is to implement it yourself on top of Request and Response objects, as sketched below.
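For illustration, here is a minimal sketch of such manual pipelining, assuming the upstream server supports pipelined requests; the address and batch size are placeholders. It writes a batch of requests over one buffered connection before flushing, so the batch shares far fewer write syscalls, then reads the responses in order.

package main

import (
        "bufio"
        "log"
        "net"

        "github.com/valyala/fasthttp"
)

func main() {
        conn, err := net.Dial("tcp", "127.0.0.1:80")
        if err != nil {
                log.Fatal(err)
        }
        defer conn.Close()

        br := bufio.NewReader(conn)
        bw := bufio.NewWriter(conn)

        // Write several requests back-to-back before flushing, so they
        // share a single buffered write.
        const n = 10
        for i := 0; i < n; i++ {
                var req fasthttp.Request
                req.SetRequestURI("http://127.0.0.1:80/")
                if err := req.Write(bw); err != nil {
                        log.Fatal(err)
                }
        }
        if err := bw.Flush(); err != nil {
                log.Fatal(err)
        }

        // HTTP/1.1 pipelining guarantees responses arrive in request
        // order, so read them back sequentially.
        for i := 0; i < n; i++ {
                var resp fasthttp.Response
                if err := resp.Read(br); err != nil {
                        log.Fatal(err)
                }
                log.Printf("response %d: status %d", i, resp.StatusCode())
        }
}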

  • 40% of CPU time in runtime.memclr in your CPU profile may indicate that you proxy quite big responses. Currently the fasthttp client isn't optimized for big responses, since it reads the whole response body into memory before passing it to the caller. A better solution is to stream big responses directly to the client (a minimal sketch follows this list).
  • The following code may lead to an unnecessary memory allocation and copy during the []byte -> string conversion:
  etag := string(ctx.Response.Header.Peek("Etag"))
  ctx.Response.Header.Del("Etag")
  ctx.Response.Header.Set("ETag", etag)

So it would be better to rewrite it in a zero-alloc fashion:

        h.SetBytesV("ETag", h.Peek("Etag"))
        h.Del("Etag")
  • Make sure you send requests to the proxy from a dedicated set of machines. If you run load tests on the same machine where the proxy is located, your results will be skewed, since the load tests may eat a significant share of CPU time.
  • Make sure you have enough network bandwidth for the proxy. It would be better to have two distinct physical network interfaces on the proxy machine - the first for incoming requests to the proxy and the second for outgoing requests to the server. If you have only a single network interface on the proxy, results may be skewed: a proxy roughly doubles the load on the network, so the network may become a bottleneck.
  • A proxy isn't free. It always consumes CPU and network resources.
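As a minimal sketch of the streaming approach mentioned above, the handler below serves a big body via SetBodyStream instead of buffering it in memory. The file path and listen address are placeholders; in a real proxy the reader would wrap the upstream response instead of a local file.

package main

import (
        "log"
        "os"

        "github.com/valyala/fasthttp"
)

func streamHandler(ctx *fasthttp.RequestCtx) {
        // Placeholder source of a big body; a proxy would use the
        // upstream connection here instead.
        f, err := os.Open("/path/to/big/file")
        if err != nil {
                ctx.Error("not found", fasthttp.StatusNotFound)
                return
        }
        fi, err := f.Stat()
        if err != nil {
                f.Close()
                ctx.Error("stat failed", fasthttp.StatusInternalServerError)
                return
        }
        // SetBodyStream takes ownership of the reader; since *os.File
        // implements io.Closer, fasthttp closes it after the body is sent.
        ctx.SetBodyStream(f, int(fi.Size()))
}

func main() {
        if err := fasthttp.ListenAndServe(":10001", streamHandler); err != nil {
                log.Fatal(err)
        }
}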

The final code I profiled above:

package main

import (
        "flag"
        "github.com/pkg/profile"
        "github.com/valyala/fasthttp"
        "log"
        "time"
)

var (
        addr = flag.String("addr", ":10000", "TCP address to listen to")
        c    = &fasthttp.HostClient{
                Addr:            "127.0.0.1:80",
                ReadTimeout:     30 * time.Second,
                WriteTimeout:    30 * time.Second,
                ReadBufferSize:  64 * 1024,
                WriteBufferSize: 64 * 1024,
        }
)

func main() {
        flag.Parse()
        defer profile.Start(profile.CPUProfile).Stop()

        s := &fasthttp.Server{
                Handler: requestHandler,
                DisableHeaderNamesNormalizing: true,
        }
        if err := s.ListenAndServe(*addr); err != nil {
                log.Fatalf("Error in ListenAndServe: %s", err)
        }
}

func requestHandler(ctx *fasthttp.RequestCtx) {
        // Proxy the request to the upstream host, reusing the ctx request
        // and response objects to avoid extra allocations.
        if err := c.Do(&ctx.Request, &ctx.Response); err != nil {
                log.Printf("Error: %s", err)
                ctx.Error("Bad Gateway", fasthttp.StatusBadGateway)
                return
        }
        // Rename Etag to ETag without a []byte -> string conversion.
        h := &ctx.Response.Header
        h.SetBytesV("ETag", h.Peek("Etag"))
        h.Del("Etag")
}
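To capture the memory profile recommended above, the same program can be started with the MemProfile option of pkg/profile instead of CPUProfile:

        defer profile.Start(profile.MemProfile).Stop()

pkg/profile logs the path of the written profile; pass that file together with the executable to go tool pprof --alloc_objects to see where objects are allocated.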

valyala commented on July 25, 2024

@djannot, FYI, I fixed the problem in fasthttp that could reduce its throughput when working with big request and/or response bodies.

valyala commented on July 25, 2024

Try verifying the proxy throughput now.

djannot commented on July 25, 2024

Thanks. I'll check it and let you know

djannot commented on July 25, 2024

I've checked and only get a slight improvement.
I'm trying to build a reverse proxy, and I'll have to handle requests with both small and large bodies.
Do you plan to implement pipelining soon?

valyala commented on July 25, 2024

Do you plan to implement pipelining soon?

I have no near-term plans regarding request pipelining. I actually tried implementing it in an internal project, but the results weren't very good because of the following problems:

  • Certain servers don't support pipelined requests.
  • Pipelined requests usually have higher response times because of head-of-line blocking, so they must be used with caution if response latency is a priority.

valyala commented on July 25, 2024

@djannot, I'd recommend starting with nginx or haproxy and measuring their throughput in proxy mode for your case. Since both apps are highly optimized at the lowest level possible, it is unlikely fasthttp will beat them without request pipelining. Moreover, haproxy can skip request and response parsing and just proxy HTTP connections to the upstream server. The results collected from these apps will show the maximum throughput possible in your setup. Then compare those results to fasthttp.

While haproxy and nginx usually outperform fasthttp in proxy mode, fasthttp allows implementing arbitrary custom logic in Go. This is much easier than customizing the low-level C inside the event loops and state machines of nginx and haproxy.

valyala commented on July 25, 2024

Closing this issue. Feel free to open a new one if throughput problems related to fasthttp occur again.

valyala commented on July 25, 2024

@djannot, just FYI, fasthttp now supports pipelined requests via PipelineClient; a minimal usage sketch follows.
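As a quick reference, here is a minimal sketch of PipelineClient usage, with a placeholder address and tuning values:

package main

import (
        "log"
        "time"

        "github.com/valyala/fasthttp"
)

func main() {
        c := &fasthttp.PipelineClient{
                Addr:               "127.0.0.1:80",
                MaxConns:           4,
                MaxPendingRequests: 1024,
                ReadTimeout:        30 * time.Second,
                WriteTimeout:       30 * time.Second,
        }

        req := fasthttp.AcquireRequest()
        resp := fasthttp.AcquireResponse()
        defer fasthttp.ReleaseRequest(req)
        defer fasthttp.ReleaseResponse(resp)

        req.SetRequestURI("http://127.0.0.1:80/")
        // Do sends the request over a shared pipelined connection; the
        // response may be delayed by earlier requests in the pipeline
        // (head-of-line blocking).
        if err := c.Do(req, resp); err != nil {
                log.Fatal(err)
        }
        log.Printf("status: %d", resp.StatusCode())
}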

djannot commented on July 25, 2024

@valyala Awesome. Thanks
