eapache / go-resiliency Goto Github PK
View Code? Open in Web Editor NEWResiliency patterns for golang
Home Page: https://godoc.org/github.com/eapache/go-resiliency
License: MIT License
Resiliency patterns for golang
Home Page: https://godoc.org/github.com/eapache/go-resiliency
License: MIT License
Missing a number of new fixes, including RunCtx :)
if Run()
returns ErrTimeOut
, the work function would block on sending to channel result
.
Greetings,
I'm a user of the retry package, mostly because it's a dependency of the shopify/sarama package - so it's already available in my vendor directory.
I'm interested to know if you'd be in favor of supporting the context package for retrier -- I imagine the interface to be
(r *Retrier) RunCtx(context.Context, func() error) error
This would allow the retrier to exit early if the provided context is cancelled prior to the retry wait period.
If the context is cancelled, it could return the error of the last failure, or ctx.Err()
- I'm unsure which would be preferred.
I would imagine the retier struct would need to be extended to include a time.Timer that would be reset upon each call to Run/RunCtx, and each iteration of attempts.
Exiting early would me a simple matter of having a function such as
// sleep will return true if the time channel has a message
// sent on it before the context expires
//
// accepts a <-chan time.Time for more deterministic testing
// no need to rely on the runtime to fire the timer
func sleep(ctx context.Context, t <-chan time.Time) bool {
select {
case <-t:
return true
case <-ctx.Done():
return false
}
}
the call to time.Sleep() could then be replaced with
if !sleep(ctx, r.timer.C) {
// return previous error?
return ctx.Err()
}
The implementation of Run
could then become
(r *Retrier) Run(fn func() error) error {
return r.RunCtx(context.Background(), fn)
}
This approach is seen in the standard library, and is a nice way to maintain backwards compatibility.
Please let me know your thoughts - if this is something you'd approve of, I could likely find the time to do the implementation.
Hello
we discover a possible race condition on batcher since version go-resiliency 1.4 and 1.5
Here is a minimal example with a test that trigger this issue
I tested with latest go (1.22) and go-resiliency 1.5
source code is available here https://github.com/peczenyj/example-race
package: https://github.com/peczenyj/example-race/blob/main/foo/foo.go
test: https://github.com/peczenyj/example-race/blob/main/foo/foo_test.go
$ go test -race ./foo/...
==================
WARNING: DATA RACE
Write at 0x00c00009eec8 by goroutine 7:
github.com/eapache/go-resiliency/batcher.(*Batcher).submitWork()
/home/weborama.office/tiago/work/go/src/github.com/peczenyj/example-race/vendor/github.com/eapache/go-resiliency/batcher/batcher.go:74 +0xed
github.com/eapache/go-resiliency/batcher.(*Batcher).Run()
/home/weborama.office/tiago/work/go/src/github.com/peczenyj/example-race/vendor/github.com/eapache/go-resiliency/batcher/batcher.go:55 +0x20e
github.com/peczenyj/example-race/foo.(*Foo).Get()
/home/weborama.office/tiago/work/go/src/github.com/peczenyj/example-race/foo/foo.go:36 +0x2e7
github.com/peczenyj/example-race/foo_test.TestBatcher()
/home/weborama.office/tiago/work/go/src/github.com/peczenyj/example-race/foo/foo_test.go:23 +0x2e8
testing.tRunner()
/usr/local/go1.22.0/src/testing/testing.go:1689 +0x21e
testing.(*T).Run.gowrap1()
/usr/local/go1.22.0/src/testing/testing.go:1742 +0x44
Previous read at 0x00c00009eec8 by goroutine 8:
github.com/eapache/go-resiliency/batcher.(*Batcher).batch()
/home/weborama.office/tiago/work/go/src/github.com/peczenyj/example-race/vendor/github.com/eapache/go-resiliency/batcher/batcher.go:100 +0x3d8
github.com/eapache/go-resiliency/batcher.(*Batcher).submitWork.gowrap2()
/home/weborama.office/tiago/work/go/src/github.com/peczenyj/example-race/vendor/github.com/eapache/go-resiliency/batcher/batcher.go:76 +0x33
Goroutine 7 (running) created at:
testing.(*T).Run()
/usr/local/go1.22.0/src/testing/testing.go:1742 +0x825
testing.runTests.func1()
/usr/local/go1.22.0/src/testing/testing.go:2161 +0x85
testing.tRunner()
/usr/local/go1.22.0/src/testing/testing.go:1689 +0x21e
testing.runTests()
/usr/local/go1.22.0/src/testing/testing.go:2159 +0x8be
testing.(*M).Run()
/usr/local/go1.22.0/src/testing/testing.go:2027 +0xf17
main.main()
_testmain.go:49 +0x2bd
Goroutine 8 (finished) created at:
github.com/eapache/go-resiliency/batcher.(*Batcher).submitWork()
/home/weborama.office/tiago/work/go/src/github.com/peczenyj/example-race/vendor/github.com/eapache/go-resiliency/batcher/batcher.go:76 +0x1ef
github.com/eapache/go-resiliency/batcher.(*Batcher).Run()
/home/weborama.office/tiago/work/go/src/github.com/peczenyj/example-race/vendor/github.com/eapache/go-resiliency/batcher/batcher.go:55 +0x20e
github.com/peczenyj/example-race/foo.(*Foo).Get()
/home/weborama.office/tiago/work/go/src/github.com/peczenyj/example-race/foo/foo.go:36 +0x283
github.com/peczenyj/example-race/foo_test.TestBatcher()
/home/weborama.office/tiago/work/go/src/github.com/peczenyj/example-race/foo/foo_test.go:22 +0x284
testing.tRunner()
/usr/local/go1.22.0/src/testing/testing.go:1689 +0x21e
testing.(*T).Run.gowrap1()
/usr/local/go1.22.0/src/testing/testing.go:1742 +0x44
==================
--- FAIL: TestBatcher (2.01s)
testing.go:1398: race detected during execution of test
FAIL
FAIL github.com/peczenyj/example-race/foo 2.017s
FAIL
while using go-resiliency 1.3 there is no race
$ go test -race ./foo/...
ok github.com/peczenyj/example-race/foo 3.016s
regards
The sole current release is missing years of fixes
As the title says, it might be handy to have a package for retriable actions with different back-off strategies etc.
Hello,
Can you please tag and version this project?
I am the Debian Maintainer for go-resiliency and versioning would help Debian keep up with development.
Hello, I've some feature request regarding the title
Basically, I propose we use errors.Is()
from standard library to check for the blacklisted/whitelisted error types (the err == pass
bit)
go-resiliency/retrier/classifier.go
Lines 40 to 44 in b98ce28
The reason is that currently we need to specify exacly what the error kind is, without checking if an error actually contains ("wraps") another error which are actually listed
Say we've implements a custom type to better handle specific error usecase
type HTTPError struct {
error
host string
path string
}
func (h HTTPError) Error() string { // implements Go's `error` type
return "failed on HTTP request to " + h.host + h.path + " reason: " + h.error.Error()
}
func (h HTTPError) Unwrap() error { // will be called from standard library error unwrapping
return h.error
}
And that the error type wraps context.Canceled
, which is defined as a blacklist
backoff := // ...
class := retrier.BlacklistClassifier{
context.Canceled,
}
retry := retrier.New(backoff, class)
// ...
retry.Run(func() {
// ...
return HTTPError{error: context.Cancelled, host: "...", path: "..."}
})
The blacklist check would fail, because err == pass
won't know what the "internal" value of the error actually is
The deadline example says:
// check stopper function and give up if timed out
What is the stopper function? Stopper is a channel. Perhaps the example can be expanded to show how to do this?
Imagine that I have an application that is an event processor that connects to some event log and for every event that arrives I want to do something with it. Also imagine that I have a Retrier
with exponential backoff (100ms initial backoff time) and 5 retries.
We can get errors because of network issues (server not ready to accept connections, server disconnected for some reason like server LoadBalancer downscaling instances) or errors processing events.
If the app fails 5 times to process an event the retry loop should end and return error. This is good.
If my app fails to connect 5 times in a row the same should happen.
But if my app connects to the server, happily processes messages for 2hours then gets a server disconnection. It will retry the connection and (if successful) continue processing events. In the next few hours we receive 4 more server disconnections, but after each one we're able to reconnect and continue processing messages for some time. It does not make sense to say "we've failed 5 times so let's just finish here and return an error".
So basically the Retrier
should expose a Reset()
method which would cause the retrier to "forget" that it encountered errors before so that the next time that it encounters an error, instead of sleeping for a long time due to exponential backoff it will just sleep for the initial 100ms (as per the example) and it could tolerate 5 more errors later.
Here's a very simple skeleton of code that shows how this feature could be used:
r := retrier.New(retrier.ExponentialBackoff(5, 100 * time.Milliseconds), nil)
err := r.Run(func() error {
server, err := connect(address)
if err != nil {
return err
}
for {
event, err := server.getNextEvent()
if err != nil {
return err
}
err = processEvent(event)
if err != nil {
return err
}
// Because we've been able to process an event it seems that all is working correctly
// so it doesn't matter that in the past we had (for example 2 errors), the next time we
// see an error should be like it was the first error we've seen, and the backoff time to
// to wait until next try should be the initial one passed by the user.
r.Reset()
}
}
From what I see in this repository, this circuit breaker does not have a function to reset itself, so that the treshold all resets but without creating new CircuitBreaker. This help so much when the circuit breaker is intended to do multiple breaker on diffrent function in one file.
I like to use retrier to wrap calls to external services one thing that is generating a bit of verbose code right now is that the work functions do not return any value in their signature (totally understandable) .
The idea here would be to create a new Run/RunCtx signature using generics and make the work function return the type that is passed to the function...
func (r *Retrier) RunR[R any](work func() (R, error)) error
func (r *Retrier) RunCtxR[R any](ctx context.Context, work func(ctx context.Context) (R, error)) error
This way the caller can just wrap their normal function without having to var the return...
I understand this would force the library to jump all the way to go 1.18 but just wondering if you would be open to me PRing this..
I can also try to use some type of interface{} approach so that there is no need to go go 1.18
Hello,
I recently made a change to the library and I would like new tag to be so I can rely on it, in my project. The specific change is the introduction of a new option called "WithInfiniteRetry()" allowing users to configure the retrier for infinite retry attempts.
New option: WithInfiniteRetry()
Users can now set this option to enable infinite retries for scenarios where continuous retry attempts are desired.
Infinite retries can be useful in scenarios where a user wants to continuously attempt an operation until success without a predefined limit. This change enhances the flexibility and adaptability of the retrier, catering to a wider range of use cases.
IMHO, it's supposed to be an alternativ to handle in a more proper way a panicOnError, in a transactional context.
Can you please create a new tag for this enhancement, making it easier for users to integrate this feature into their projects. I understand and appreciate your time and effort in maintaining this library. ๐
Thank you for considering this request. If there are any additional details or steps required from my end, please let me know.
Thanks a lot !
This minimizes the risk of correlated retries. Compare to goback's jitter: https://github.com/carlescere/goback#jitter-backoff
I feel that you should add log when some occur in function func (b *Breaker) doWork(state uint32, work func() error) error
, it convenient for us to find error
Hello!
We are getting the following error in the Debian CI:
--- FAIL: TestBreakerAsyncStateTransitions (4.03s)
breaker_test.go:173: circuit breaker is open
FAIL
exit status 1
FAIL gopkg.in/eapache/go-resiliency.v1/breaker 14.081s
=== RUN TestDeadline
--- FAIL: TestDeadline (0.04s)
deadline_test.go:27: timed out waiting for function to finish
FAIL
exit status 1
FAIL gopkg.in/eapache/go-resiliency.v1/deadline 0.080s
=== RUN TestConstantBackoff
Looks like it only happens some of the time.
When ctx
passed to RunCtx(ctx, func(ctx ...
is cancelled, the RunCtx method does not return early, instead it waits until the current backoff time passes.
This is caused by the following code:
go-resiliency/retrier/retrier.go
Lines 98 to 103 in b912f08
While researching the proper API usage I stumbled upon a blog post that proposes such construct.
In my opinion this blog post is a bit misleading, because it applies a solution for time.After
to a time.NewTimer
usage, which makes no sense.
So I used sourcegraph.com/search to search for some of the usages of the time.NewTimer
which will confirm my above conclusion.
So I found the following NewTimer usage in the net/http
package of the standard library:
timer := time.NewTimer(nextPollInterval())
defer timer.Stop()
for {
if srv.closeIdleConns() {
return lnerr
}
select {
case <-ctx.Done():
return ctx.Err()
case <-timer.C:
timer.Reset(nextPollInterval())
}
}
As shown above, there is no need to wait on timer.C
when ctx
is cancelled, we just need to ensure the timer is stopped when it goes out of scope.
So combining all the above, the issue I am reporting (that RunCtx does not return immadiately when ctx is cancel) could be fixed by the following patch:
diff --git a/retrier/retrier.go b/retrier/retrier.go
index e1b4349..9914a29 100644
--- a/retrier/retrier.go
+++ b/retrier/retrier.go
@@ -69,6 +69,7 @@ func (r *Retrier) RunCtx(ctx context.Context, work func(ctx context.Context) err
// is returned to the caller regardless. The work function takes 2 args, the context and
// the number of attempted retries.
func (r *Retrier) RunFn(ctx context.Context, work func(ctx context.Context, retries int) error) error {
+ var timer *time.Timer
retries := 0
for {
ret := work(ctx, retries)
@@ -81,7 +82,11 @@ func (r *Retrier) RunFn(ctx context.Context, work func(ctx context.Context, retr
return ret
}
- timer := time.NewTimer(r.calcSleep(retries))
+ if d := r.calcSleep(retries); timer == nil {
+ timer = time.NewTimer(d)
+ } else {
+ timer.Reset(d)
+ }
if err := r.sleep(ctx, timer); err != nil {
return err
}
@@ -96,10 +101,7 @@ func (r *Retrier) sleep(ctx context.Context, timer *time.Timer) error {
case <-timer.C:
return nil
case <-ctx.Done():
- if !timer.Stop() {
- <-timer.C
- }
-
+ timer.Stop()
return ctx.Err()
}
}
@eapache If you agree with the above reasoning, would you want me to send the above as PR?
Usage of <-time.After(c.Timeout) here may lead to memory leak.
timeout := time.After(r.calcSleep(retries))
According to the official time package docs:
After waits for the duration to elapse and then sends the current time on the returned channel. It is equivalent to NewTimer(d).C. The underlying Timer is not recovered by the garbage collector until the timer fires. If efficiency is a concern, use NewTimer instead and call Timer.Stop if the timer is no longer needed.
Possible solution:
func (r *Retrier) RunCtx(ctx context.Context, work func(ctx context.Context) error) error {
retries := 0
for {
ret := work(ctx)
switch r.class.Classify(ret) {
case Succeed, Fail:
return ret
case Retry:
if retries >= len(r.backoff) {
return ret
}
timer := time.NewTimer(r.calcSleep(retries))
if err := r.sleep(ctx, timer); err != nil {
return err
}
retries++
}
}
}
func (r *Retrier) sleep(ctx context.Context, timer *time.Timer) error {
select {
case <-timer.C:
return nil
case <-ctx.Done():
if !timer.Stop() {
<-timer.C
}
return ctx.Err()
}
}
References:
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.