By the way, my approach to reproducing "a request is sent but just never gets a response" might be totally invalid. I wasn't really sure how to recreate that when testing against a memcached daemon on localhost, since I didn't want the client to get a connection reset from the OS when shutting the serverSocket down (in my original case there was no OS on the other side to send reset packets back). I am not familiar enough with Netty to understand what would happen to its channels if the server on the other end suddenly went away without closing connections properly.
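For what it's worth, here is one way to simulate "connected but never responding" without tearing the socket down, using only the JDK (the class name, port choice, and 500 ms timeout are all arbitrary; this is a sketch, not folsom's test code):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class BlackholeServer {

    // Returns "timeout" if the read timed out (server silent but still
    // connected), or "response" if any byte unexpectedly came back.
    static String probe() throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            // Accept the connection and drain the request bytes,
            // but never write a response and never close the socket,
            // so the client sees no FIN and no RST.
            Thread acceptor = new Thread(() -> {
                try (Socket s = server.accept()) {
                    while (s.getInputStream().read() != -1) { /* swallow */ }
                } catch (IOException ignored) { }
            });
            acceptor.setDaemon(true);
            acceptor.start();

            try (Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
                client.setSoTimeout(500); // fail the read instead of hanging forever
                OutputStream out = client.getOutputStream();
                out.write("get somekey\r\n".getBytes(StandardCharsets.US_ASCII));
                out.flush();
                try {
                    client.getInputStream().read();
                    return "response";
                } catch (SocketTimeoutException expected) {
                    return "timeout";
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(probe());
    }
}
```

The write succeeds and the connection stays fully open; only the read ever fails, which is the "sent but never answered" shape described above.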
from folsom.
What version of folsom were you using? Without knowing this, debugging is pretty much impossible.
With the latest master (and 0.6.1) I expect this to happen:
The memcached server stops responding - which would lead to the timeout checker to kill the connection (or the memcached server breaks the tcp connection directly). All pending requests should immediately fail with MemcacheClosedException. All new requests will also fail with this. The DefaultRawMemcacheClient is now closed and won't ever be used again.
Then the reconnectingclient will try to reconnect periodically. Once the server is available again, a new DefaultRawMemcacheClient is created.
But perhaps there's a bug in here somewhere which both keeps the old instance around and the DefaultRawMemcacheClient doesn't consider itself closed.
@danielnorberg or @protocol7, do you have time to investigate? I may not have much time at the moment.
Maybe we could try reproing this using SIGSTOP?
Hm, maybe this bugfix is important enough to justify a new release:
47f9826
Also, DefaultRawMemcacheClient doesn't check if it's closed in its send.
Does that mean that we don't get the failure callback in the call to:
channel.write(request, new RequestWritePromise(channel, request)); ?
Or maybe it just means that we don't actually decrement the counter from that codepath.
@danielnorberg - That seems like your area of expertise :)
I guess it's not really a big problem if a disconnected client keeps failing with overloaded instead of closed (though that should be fixed too). The real problem in this case was probably that it failed to reconnect, which is likely related to the bugfix in master (but we'll need to verify that).
Writing on a closed channel always fails.
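The same property can be illustrated with a plain JDK socket (not Netty, but the underlying behavior is analogous, and in Netty the failure surfaces on the write promise instead of as an exception; the class name here is made up for the sketch):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class ClosedWrite {

    // Returns true if writing after close fails, as expected:
    // the write does not silently vanish, it errors out.
    static boolean writeAfterCloseFails() throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            Socket client = new Socket("127.0.0.1", server.getLocalPort());
            try (Socket peer = server.accept()) {
                client.close();
                try {
                    client.getOutputStream()
                          .write("get key\r\n".getBytes(StandardCharsets.US_ASCII));
                    return false; // write unexpectedly succeeded
                } catch (IOException expected) {
                    return true; // "Socket is closed"
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAfterCloseFails());
    }
}
```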
> What version of folsom were you using?

0.6.1.
@danielnorberg Right, so we increment pending, then write the request which immediately fails, but then we never decrement pending.
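To make the suspected leak concrete, here is a minimal sketch of that pattern, with Netty's write promise stood in for by a CompletableFuture. The names (PendingCounter, limit, send) are illustrative, not folsom's actual code; the point is that the decrement must run on failure as well as success:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

public class PendingCounter {
    final AtomicInteger pending = new AtomicInteger();
    final int limit = 2;

    // Simulates send(): reject when over the limit, otherwise count the
    // request as pending and uncount it when the write completes.
    CompletableFuture<String> send(CompletableFuture<String> writeResult) {
        if (pending.get() >= limit) {
            CompletableFuture<String> overloaded = new CompletableFuture<>();
            overloaded.completeExceptionally(
                new IllegalStateException("too many outstanding requests"));
            return overloaded;
        }
        pending.incrementAndGet();
        // The suspected bug: if this decrement only ran on success, every
        // failed write would leak one permanently incremented slot, and once
        // `limit` writes had failed the client would report "overloaded"
        // forever even after reconnecting.
        writeResult.whenComplete((value, error) -> pending.decrementAndGet());
        return writeResult;
    }
}
```

With the decrement in `whenComplete`, a write that fails (e.g. because the channel closed underneath it) still returns the slot to the pool.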
> With the latest master (and 0.6.1) I expect this to happen:
> The memcached server stops responding - which would lead to the timeout checker to kill the connection (or the memcached server breaks the tcp connection directly). All pending requests should immediately fail with MemcacheClosedException. All new requests will also fail with this. The DefaultRawMemcacheClient is now closed and won't ever be used again. Then the reconnectingclient will try to reconnect periodically. Once the server is available again, a new DefaultRawMemcacheClient is created.
I am not 100% sure yet, but I think what happened in my case was that the memcached server never broke the connection. I am going to go back through the logs from when this happened to see whether requests were getting MemcacheClosedException during the downtime or something else. I will also try to create a test that better reproduces what I saw.
Huge apologies, but after double-checking our logs it turns out I misread / misremembered them: after the memcached host came back online, we were not actually getting folsom's MemcacheOverloadedException as originally claimed, but instead a "too many outstanding requests" exception from a downstream service (which we only call on memcache misses).
Really sorry for the false alarm.
I think there might still be some merit to the original test case, where a server that is still "connected" but simply not responding to requests causes MemcacheOverloadedExceptions to be thrown even after the "stuck" requests time out. But that seems tangential to my original report (where the server was down for hours, and there would have to be some really confusing behavior in Netty and/or the OS for the socket to still look open).
I think part of the reason I conflated the two exceptions is that our Metrics listener was reporting "failures" for all GETs after the server came back. That might be because we attach transform functions directly to the ListenableFuture from MemcacheClient and call the downstream service on the same-thread executor, so an exception thrown in the attached transform function can bubble up to the onFailure method of a FutureCallback attached to the original ListenableFuture. But that is entirely our fault.
Apologies again for the mixup. Feel free to close this one.
While the original report may be incorrect, I still think there might be some more things worth investigating, so I suggest leaving it open for now.
I just saw this issue come up with one of our services in production. At some point we hit the too-many-outstanding-requests limit within folsom and the client effectively died: the request queue never seems to flush once it hits the limit. During this time folsom eats up a ton of resources, causing my service to return 503s for a brief period. Eventually the connection to memcached dies completely and folsom never successfully reconnects.
I have tried to find the cause but haven't been able to nail it down yet; however, this seems like a pretty serious issue.
Stack trace:

```
Mar 30, 2015 1:41:19 PM com.google.common.util.concurrent.Futures$CombinedFuture setExceptionAndMaybeLog
SEVERE: input future failed.
com.spotify.folsom.MemcacheOverloadedException: too many outstanding requests
	at com.spotify.folsom.client.DefaultRawMemcacheClient.send(DefaultRawMemcacheClient.java:162)
	at com.spotify.folsom.reconnect.ReconnectingClient.send(ReconnectingClient.java:92)
	at com.spotify.folsom.ketama.KetamaMemcacheClient.sendSplitRequest(KetamaMemcacheClient.java:98)
	at com.spotify.folsom.ketama.KetamaMemcacheClient.send(KetamaMemcacheClient.java:69)
	at com.spotify.folsom.retry.RetryingClient.send(RetryingClient.java:46)
	at com.spotify.folsom.client.binary.DefaultBinaryMemcacheClient.multiget(DefaultBinaryMemcacheClient.java:188)
	at com.spotify.folsom.client.binary.DefaultBinaryMemcacheClient.getAndTouch(DefaultBinaryMemcacheClient.java:198)
	at com.spotify.folsom.client.binary.DefaultBinaryMemcacheClient.get(DefaultBinaryMemcacheClient.java:157)
```
Thanks for the report, we should investigate further. One known problem is that we never decrement the pending counter for some types of failed requests (failed writes), but since a failed write only happens if it disconnects, I didn't think that was important enough to fix.
I am on parental leave now, but I will take a look when I get back to work next week.
@danielnorberg Maybe this is something you want to investigate? I think you understand the netty integration best.
Sounds bad. Would be nice if we had some way to reproduce it. I'll take a stab at putting together a repro by connecting a client to a memcached mockup, hit the outstanding request limit and then verify that it can recover.
@danielnorberg in case you missed it I attached a test case in my original report up top.
Release 0.6.2 should be out now and fix this issue.
Can this issue be closed?
Yep, I think #43 tests the same condition that I tried to reproduce in the test attached to this issue. Thanks