Question GoPool seems to perform poorly from the benchmarks I ran<

Could you please try the same test as <a href="https://github.com/bytedance/gopkg/blob

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Got these results <div class="snippet-clipboard-content notranslate position-relat

it also saves a lot of memory <a class="u

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

GoPool's performance in comparison to other goroutine pool implementations about gopkg HOT 12 OPEN

bytedance commented on September 26, 2024 1

GoPool's performance in comparison to other goroutine pool implementations

from gopkg.

Comments (12)

PureWhiteWu commented on September 26, 2024 1

Could you please try the same test as https://github.com/bytedance/gopkg/blob/develop/util/gopool/pool_test.go？

from gopkg.

alphadose commented on September 26, 2024 1

@lesismal

But as I know, lock-free is only suitable for a few scenarios such as a queue's push and pop operations. It can't be used to optimize a process with multiple steps which have beyond the atomic operations' ability.

I went through the golang's scheduling code and there is room for improvement by switching to a lock-free stack in some places. I got the idea itself from a gophercon talk delivered by a member of golang core team and I tested it out, got some good results. The major performance improvement by switching to a lock free stack would be scheduling all operations entirely in user-space and avoid lock()/unlock() system call context switches.

Since it's unsafe, the benchmark result should not be used to compare with std other goroutine pools, it doesn't make sense and would make misleading.

Agreed, I will update the readme so that people are aware that its experimental and unsafe.

from gopkg.

alphadose commented on September 26, 2024

Got these results

2022/07/05 16:20:06.801184 logger.go:190: [Error] GOPOOL: panic in pool: test: test: goroutine 2025 [running]:
runtime/debug.Stack()
	/opt/homebrew/Cellar/go/1.18/libexec/src/runtime/debug/stack.go:24 +0x68
github.com/bytedance/gopkg/util/gopool.(*worker).run.func1.1.1()
	/Users/alphadose/assignment/go-threadpool-benchmarks/vendor/github.com/bytedance/gopkg/util/gopool/worker.go:64 +0x80
panic({0x100601700, 0x100623510})
	/opt/homebrew/Cellar/go/1.18/libexec/src/runtime/panic.go:838 +0x204
command-line-arguments.testPanicFunc()
	/Users/alphadose/assignment/go-threadpool-benchmarks/gopool_test.go:27 +0x30
github.com/bytedance/gopkg/util/gopool.(*worker).run.func1.1(0x1400010c2b0?, 0x10060c8a0?)
	/Users/alphadose/assignment/go-threadpool-benchmarks/vendor/github.com/bytedance/gopkg/util/gopool/worker.go:69 +0x5c
github.com/bytedance/gopkg/util/gopool.(*worker).run.func1()
	/Users/alphadose/assignment/go-threadpool-benchmarks/vendor/github.com/bytedance/gopkg/util/gopool/worker.go:70 +0x1a8
created by github.com/bytedance/gopkg/util/gopool.(*worker).run
	/Users/alphadose/assignment/go-threadpool-benchmarks/vendor/github.com/bytedance/gopkg/util/gopool/worker.go:41 +0x64
%!(EXTRA []interface {}=[])
goos: darwin
goarch: arm64
BenchmarkPool-8      	     367	   3841601 ns/op	  179468 B/op	   10227 allocs/op
BenchmarkGo-8        	     100	  11113182 ns/op	  167427 B/op	   10016 allocs/op
BenchmarkItogami-8   	     369	   3109478 ns/op	  174749 B/op	   10073 allocs/op
PASS
ok  	command-line-arguments	4.771s

https://github.com/alphadose/go-threadpool-benchmarks/blob/master/gopool_test.go

from gopkg.

PureWhiteWu commented on September 26, 2024

Thanks for your tests!
For the first test, I think that's not the correct way to use goroutine pools, it's better to use the native goroutines instead.
For the second test, seems that Itogami outperforms gopool in cpu-bound tasks. I took a quick glance at the code, and found that it makes good use of the runtime's internal funcs, that's great!
I'll investigate if we could optimize the performance of gopool without the usage of runtime internal funcs.

from gopkg.

alphadose commented on September 26, 2024

The first test was to mimic 1000s of HTTP requests each of which typically last within milliseconds, but yes there should be some computation and variance between requests for relation to an actual use case

it's better to use the native goroutines instead.

if we remove the limit on itogami then it essentially becomes an infinite goroutine pool but it also saves a lot of memory (due to recycling via the stack) as compared to using native goroutines, hence I think its beneficial to use itogami in this case too with some modifications

from gopkg.

lesismal commented on September 26, 2024

I think it's not enough to test cpu cost handler only, in most of the real scenarios, blocking is the much more usual operation than pure cpu cost, eg: http handler, rpc handler, and there is a write syscall at least.

I tried with a short sleep and get a much worse performance from gopool than std:

func testFunc() {
	// DoCopyStack(0, 0)
	time.Sleep(time.Microsecond * 10)
}

result:

BenchmarkPool-8   	       1	13349818686 ns/op	  727840 B/op	   20125 allocs/op
BenchmarkGo-8     	      82	  13735426 ns/op	  968846 B/op	   20042 allocs/op

This much worse result may be because the default pool size is only 8 in my env(runtime.GOMAXPROCS(0)==8).
I change the pool size to 1000 or 10000 and gopool runs much better but still worse than std:

# pool size: 1000
BenchmarkPool-8   	      10	 109975368 ns/op	  466008 B/op	   12162 allocs/op
BenchmarkGo-8     	      85	  13984710 ns/op	  970441 B/op	   20082 allocs/op

# pool size: 10000
BenchmarkPool-8   	      68	  17622188 ns/op	 1421435 B/op	   30038 allocs/op
BenchmarkGo-8     	      86	  14227710 ns/op	  971272 B/op	   20074 allocs/op

from gopkg.

lesismal commented on September 26, 2024

it also saves a lot of memory

@alphadose
I think the memory cost for each Go operation can be ignored since it runs 10000 times in each b.N loop, it's a much smaller memory cost for each Go operation:
https://github.com/bytedance/gopkg/blob/develop/util/gopool/pool_test.go#L24

But indeed we need to limit the pool size to control the total memory cost of the process.

from gopkg.

lesismal commented on September 26, 2024

I took a quick glance at the code, and found that it makes good use of the runtime's internal funcs, that's great!

@alphadose Cool!

from gopkg.

lesismal commented on September 26, 2024

@alphadose
I took a look at your test and found that it's unfair because itogami's pool size is 5e4 but gopool size is much smaller. For itogami's pool it's similar to using std goroutine but your implementation performs a little better.
https://github.com/alphadose/go-threadpool-benchmarks/blob/master/gopool_test.go#L89

And the same point with gopool's default cpu cost handler, if I limit the pool size and change to a testFunc with a short sleep, itogami doesn't perform better than std in my env:

# pool size: 1000
BenchmarkPool-8      	      10	 110218422 ns/op	  461439 B/op	   12138 allocs/op
BenchmarkGo-8        	      84	  13746197 ns/op	  969096 B/op	   20052 allocs/op
BenchmarkItogami-8   	      10	 104207091 ns/op	  200136 B/op	   10413 allocs/op

It's only faster than std when the PoolSize >= benchmarkTimes or the blocking time is shorter than b.N calling loop cost, else it's worse than std.

if we remove the limit on itogami then it essentially becomes an infinite goroutine pool but it also saves a lot of memory (due to recycling via the stack) as compared to using native goroutines, hence I think its beneficial to use itogami in this case too with some modifications

It's no use if we don't limit the pool size and I think users prefer to use std goroutine rather than use another implementation like std.

from gopkg.

lesismal commented on September 26, 2024

@alphadose

I met a SIGSEGV from Itogami when running the benchmark:

goos: linux
goarch: amd64
pkg: github.com/alphadose/go-threadpool-benchmarks
cpu: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
BenchmarkGo-8         	     328	   3554240 ns/op	  972669 B/op	   20124 allocs/op
BenchmarkPool-8       	     206	   6353972 ns/op	  590759 B/op	   19165 allocs/op
BenchmarkItogami-8    	fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x330 pc=0x4372a7]

......

from gopkg.

alphadose commented on September 26, 2024

@lesismal I made itogami as a proof of concept for a better golang scheduler, and as it uses golang runtime internals it is unsafe because of some edge cases here and there. Hence, its also unsafe to use for real use cases.

If you are interested here is the corresponding issue in golang core golang/go#53398
^ the benchmarking in the above case is based on a no-limit version of itogami which can be viewed in this branch https://github.com/alphadose/itogami/tree/golang-scheduler

from gopkg.

lesismal commented on September 26, 2024

If you are interested here is the corresponding issue in golang core golang/go#53398 ^ the benchmarking in the above case is based on a no-limit version of itogami which can be viewed in this branch https://github.com/alphadose/itogami/tree/golang-scheduler

I took a quick look at 53398, It's good trying to use lock-free to optimize performance.
But as I know, lock-free is only suitable for a few scenarios such as a queue's push and pop operations. It can't be used to optimize a process with multiple steps which have beyond the atomic operations' ability.

@lesismal I made itogami as a proof of concept for a better golang scheduler, and as it uses golang runtime internals it is unsafe because of some edge cases here and there. Hence, its also unsafe to use for real use cases.

Since it's unsafe, the benchmark result should not be used to compare with std other goroutine pools, it doesn't make sense and would make misleading.

About http performance, I agree with you that fasthttp is really fast, but gnet's http benchmark is not a fully implementation of http protocol, details here, that also makes misleading. I used to opend an issue to advise not to use that benchmark result to claim gnet's performance but wasn't accepted.. But I would still be glad to see if someday the author deletes the http benchmark result, or if he fully supports the protocol.

I have another lib that does these tls/http/websocket things with goroutine num and memory usage under controlled: https://github.com/lesismal/nbio
There's also a sub-package that implements goroutine pools with limited size, it can't perform better than std but can limit pool size, and it seems the benchmark with short sleep performs a little better than I attached in the previous comments, if interested, welcome to try.

from gopkg.

GoPool's performance in comparison to other goroutine pool implementations about gopkg HOT 12 OPEN

Comments (12)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent