Comments (10)
Below are a few recommendations regarding the code:

- Use `gofmt` for formatting Go source code.
- Import `github.com/pkg/profile` instead of `github.com/davecheney/profile` - see http://dave.cheney.net/2014/10/22/simple-profiling-package-moved-updated for the reasoning.
- There are no benefits in using the `reuseport` listener for a single-process setup.
- Prefer the memory profile over the CPU profile and analyze it with `go tool pprof --alloc_objects`.
- The CPU profile you provided has a significant discrepancy compared to my profile. Make sure you passed the correct executable to `go tool`. Below is my CPU profile after removing the `[]byte` -> `string` conversion for `etag` (see below for details):
```
(pprof) top
10600ms of 17580ms total (60.30%)
Dropped 203 nodes (cum <= 87.90ms)
Showing top 10 nodes out of 116 (cum >= 270ms)
      flat  flat%   sum%        cum   cum%
    8160ms 46.42% 46.42%     8560ms 48.69%  syscall.Syscall
     410ms  2.33% 48.75%      660ms  3.75%  github.com/valyala/fasthttp.(*ResponseHeader).parseHeaders
     380ms  2.16% 50.91%      380ms  2.16%  runtime.epollwait
     290ms  1.65% 52.56%    16120ms 91.70%  github.com/valyala/fasthttp.(*Server).serveConn
     270ms  1.54% 54.10%      270ms  1.54%  runtime.memmove
     230ms  1.31% 55.40%      230ms  1.31%  runtime/internal/atomic.Cas
     220ms  1.25% 56.66%     2880ms 16.38%  net.(*netFD).Read
     220ms  1.25% 57.91%      280ms  1.59%  runtime.deferreturn
     210ms  1.19% 59.10%      210ms  1.19%  runtime.indexbytebody
     210ms  1.19% 60.30%      270ms  1.54%  runtime.netpollblock
```
This profile shows that more than 46% of all the time is spent in system calls. The `peek` command shows that two syscalls were used - `read` and `write`:
```
(pprof) peek syscall.Syscall
15.46s of 17.58s total (87.94%)
Dropped 203 nodes (cum <= 0.09s)
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                             6.50s 77.20% |   syscall.write
                                             1.92s 22.80% |   syscall.read
     8.16s 46.42% 46.42%      8.56s 48.69%                | syscall.Syscall
                                             0.31s 77.50% |   runtime.entersyscall
                                             0.09s 22.50% |   runtime.exitsyscall
----------------------------------------------------------+-------------
```
It looks like there are no significant bottlenecks in the code. It could be optimized further by pipelining buffered requests to the server in order to minimize the number of `read` and `write` syscalls. Currently the fasthttp client doesn't provide request pipelining, though it is in the TODO. So the only option at the moment is to implement it yourself on top of `Request` and `Response` objects.
- 40% of CPU time in `runtime.memclr` in your CPU profile may indicate that you proxy quite big responses. Currently the fasthttp client isn't optimized for big responses, since it reads the whole response body into memory before passing it to the caller. The better solution is to stream big responses directly to the client.
- The following code may lead to an unnecessary memory allocation and copy during the `[]byte` -> `string` conversion:

```go
etag := string(ctx.Response.Header.Peek("Etag"))
ctx.Response.Header.Del("Etag")
ctx.Response.Header.Set("ETag", etag)
```

So it would be better to rewrite it in a zero-alloc fashion:

```go
h.SetBytesV("ETag", h.Peek("Etag"))
h.Del("Etag")
```
- Make sure you send requests to the proxy from a dedicated set of machines. If you run load tests on the same machine where the proxy is located, your results will be skewed, since load tests may eat a significant share of CPU time.
- Make sure you have enough network bandwidth for the proxy. It would be better to have two distinct physical network interfaces on the proxy machine - the first one for incoming requests to the proxy and the second one for outgoing requests to the server. If you have only a single network interface on the proxy, results may be skewed: a proxy usually doubles the load on the network, so the network may become a bottleneck.
- A proxy isn't free. It always eats CPU and network resources.
The final code, which I profiled above:

```go
package main

import (
	"flag"
	"log"
	"time"

	"github.com/pkg/profile"
	"github.com/valyala/fasthttp"
)

var (
	addr = flag.String("addr", ":10000", "TCP address to listen to")

	c = &fasthttp.HostClient{
		Addr:            "127.0.0.1:80",
		ReadTimeout:     30 * time.Second,
		WriteTimeout:    30 * time.Second,
		ReadBufferSize:  64 * 1024,
		WriteBufferSize: 64 * 1024,
	}
)

func main() {
	flag.Parse()

	defer profile.Start(profile.CPUProfile).Stop()

	s := &fasthttp.Server{
		Handler:                       requestHandler,
		DisableHeaderNamesNormalizing: true,
	}
	if err := s.ListenAndServe(*addr); err != nil {
		log.Fatalf("Error in ListenAndServe: %s", err)
	}
}

func requestHandler(ctx *fasthttp.RequestCtx) {
	if err := c.Do(&ctx.Request, &ctx.Response); err != nil {
		log.Printf("Error: %s", err)
	}
	h := &ctx.Response.Header
	h.SetBytesV("ETag", h.Peek("Etag"))
	h.Del("Etag")
}
```
from fasthttp.
@djannot , FYI, I fixed the problem in fasthttp which could reduce its throughput when working with big bodies in the request and/or response.
Try verifying the proxy throughput now.
Thanks. I'll check it and let you know.
I've checked and only get a slight improvement.
I'm trying to build a reverse proxy and I'll have to handle requests with both small and large bodies.
Do you plan to implement pipelining soon?
> Do you plan to implement pipelining soon?
I have no near-term plans regarding request pipelining. Actually, I tried implementing it in our internal project, but the results weren't very good because of the following problems:
- Certain servers don't support pipelined requests.
- Pipelined requests usually have higher response times because of head-of-line blocking, so they must be used with caution if response latency is a priority.
@djannot , I'd recommend starting with `nginx` or `haproxy` and measuring their throughput in proxy mode for your case. Since both apps are highly optimized at the lowest level possible, it is unlikely `fasthttp` will beat them without request pipelining. Moreover, `haproxy` may skip request and response parsing and just proxy http connections to the upstream server. The results collected from these apps will show the maximum throughput possible in your setup. Then compare these results to `fasthttp`.

While `haproxy` and `nginx` usually outperform `fasthttp` in proxy mode, `fasthttp` allows implementing arbitrary custom logic in Go. This is much easier compared to customizing the low-level C inside the event loops and state machines present in `nginx` and `haproxy`.
Closing this issue. Feel free to open a new one if throughput problems related to fasthttp occur again.
@djannot , just FYI, fasthttp now supports pipelined requests via `PipelineClient`.
@valyala Awesome. Thanks