Comments (18)
First try upgrading the gnatsd to the current version of 0.5.2 and see if
the error can be reproduced there. Also, what is the configuration and
setup for gnatsd? Does it run in clustered mode?
On Tue, Jun 17, 2014 at 6:17 AM, JimmyMa [email protected] wrote:
In our CloudFoundry deployment, the CloudController (the NATS client) is
getting below error intermittently when the CloudController receives the
"dea.advertise" message from NATS (gnatsd) server{"id":"10-7ee2570336fe4839974e49ee412ccdcb","stacks":["lucid64"],"available_memory":6336,"available_disk":441760,"app_id_to_count":{"2beae3d0-238c-4ec6-a191-890c32a12554":1,"8fbd89d3-d26c-47b2-94de-9b774bb9687f":1,"a432a6b2-c60a-4801-ba28-cf52811be799":1,"7a98cdb9-630d-4a21-b2b9-8c2012eacc43":1,"3736f818-a39b-4e18-a0a0-afe7f748bd2a":1,"dcf171b3-c13c-4ed9-9INFO
{"server_id":"5093c491bc3d437e75408f144dd3643f","version":"0.5.0","host":"xx.xx.xx.xx","port":4222,"auth_required":true,"ssl_required":false,"max_payload":1048576}The "INFO" message gets messed with the "dea.advertise" message.
Now I'm guessing below is the scenario which has the above problem:
- DEA publishes the below "dea.advertise" message to NATS server:
{"id":"10-7ee2570336fe4839974e49ee412ccdcb","stacks":["lucid64"],"available_memory":6336,"available_disk":441760,"app_id_to_count":{"2beae3d0-238c-4ec6-a191-890c32a12554":1,"8fbd89d3-d26c-47b2-94de-9b774bb9687f":1,"a432a6b2-c60a-4801-ba28-cf52811be799":1,"7a98cdb9-630d-4a21-b2b9-8c2012eacc43":1,"3736f818-a39b-4e18-a0a0-afe7f748bd2a":1}}
- NATS server tries to relay the message to CC, but during the
communication with CC, the network gets into problem, so only part of the
message reaches to CC, as below:{"id":"10-7ee2570336fe4839974e49ee412ccdcb","stacks":["lucid64"],"available_memory":6336,"available_disk":441760,"app_id_to_count":{"2beae3d0-238c-4ec6-a191-890c32a12554":1,
CC NATS client receives the part of the message and stores it into @buf
https://github.com/buf (
https://github.com/derekcollison/nats/blob/v0.5.0.beta.12/lib/nats/client.rb#L521
).
2.Due to the network problem, CC NATS client loses the connection to
NATS server, and it tries to re-connect to NATS server.
3.NATS server receives the new connection from the CC, and sends below
"INFO" message back to CC NATS client. (NATS server always sends an "INFO"
message when a client connection gets created
https://github.com/apcera/gnatsd/blob/master/server/server.go#L373).INFO
{"server_id":"5093c491bc3d437e75408f144dd3643f","version":"0.5.0","host":"xx.xx.xx.xx","port":4222,"auth_required":true,"ssl_required":false,"max_payload":1048576}
- CC NATS client receives the "INFO" message and appends it to @buf
https://github.com/buf, then the content of @buf
https://github.com/buf is below:
{"id":"10-7ee2570336fe4839974e49ee412ccdcb","stacks":["lucid64"],"available_memory":6336,"available_disk":441760,"app_id_to_count":{"2beae3d0-238c-4ec6-a191-890c32a12554":1,INFO
{"server_id":"5093c491bc3d437e75408f144dd3643f","version":"0.5.0","host":"xx.xx.xx.xx","port":4222,"auth_required":true,"ssl_required":false,"max_payload":1048576}Maybe the @buf https://github.com/buf needs to be clean when the client
do the reconnection:
https://github.com/derekcollison/nats/blob/v0.5.0.beta.12/lib/nats/client.rb#L720—
Reply to this email directly or view it on GitHub
#68.
from nats.rb.
The same problem happens with 0.5.2, see below error log:
/var/vcap/sys/log/cloud_controller_ng/cloud_controller_ng.stderr.log:343343:/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/nats-0.5.0.beta.12/lib/nats/client.rb:618:in `block in connection_completed': Unknown protocol: {"id":"22-574d3c89e5e248d6b7e64b9119970bf9","stacks":["lucid64"],"available_memory":14592,"available_disk":379296,"app_id_to_count":{"27532d8b-90f1-43e8-9409-0d45b4a1e9ad":1,"1fad2097-68de-4fd6-84d5-ff7ae0d2919INFO {"server_id":"b7a0fe6b1f65433671c7840c2bbaa956","version":"0.5.2","host":"xx.xx.xx.xx","port":4222,"auth_required":true,"ssl_required":false,"max_payload":1048576} (NATS::ServerError)
Our gnatsd is in clustered mode with 2 nodes.
Please let me know if you need more info. Thanks.
from nats.rb.
Can you send me the conf file for the 2 gnatsd servers? Thanks.
On Tue, Jun 17, 2014 at 7:45 AM, JimmyMa [email protected] wrote:
The same problem happens with 0.5.2, see below error log:
/var/vcap/sys/log/cloud_controller_ng/cloud_controller_ng.stderr.log:343343:/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/nats-0.5.0.beta.12/lib/nats/client.rb:618:in
`block in connection_completed': Unknown protocol:
{"id":"22-574d3c89e5e248d6b7e64b9119970bf9","stacks":["lucid64"],"available_memory":14592,"available_disk":379296,"app_id_to_count":{"27532d8b-90f1-43e8-9409-0d45b4a1e9ad":1,"1fad2097-68de-4fd6-84d5-ff7ae0d2919INFO
{"server_id":"b7a0fe6b1f65433671c7840c2bbaa956","version":"0.5.2","host":"xx.xx.xx.xx","port":4222,"auth_required":true,"ssl_required":false,"max_payload":1048576}
(NATS::ServerError)Our gnatsd is in clustered mode with 2 nodes.
Please let me know if you need more info. Thanks.
—
Reply to this email directly or view it on GitHub
#68 (comment).
from nats.rb.
Here are the conf file,
ip of 0.nats.default.stage0.microbosh is xx.xx.xx.197
ip of 1.nats.default.stage0.microbosh is xx.xx.xx.19
GNATSD 1:
net: "xx.xx.xx.197"
port: 4222
prof_port: 0
monitor_port: 0
pid_file: "/var/vcap/sys/run/nats/nats.pid"
log_file: "/var/vcap/sys/log/nats/nats.log"
debug: false
trace: false
authorization {
user: "nats"
password: "time4fun"
timeout: 15
}
cluster {
host: "xx.xx.xx.197"
port: 4223
authorization {
user: "nats"
password: "xxxxx"
timeout: 15
}
routes = [
nats-route://nats:[email protected]:4223
nats-route://nats:[email protected]:4223
]
}
GNATSD 2:
net: "xx.xx.xx.19"
port: 4222
prof_port: 0
monitor_port: 0
pid_file: "/var/vcap/sys/run/nats/nats.pid"
log_file: "/var/vcap/sys/log/nats/nats.log"
debug: false
trace: false
authorization {
user: "nats"
password: "xxxxx"
timeout: 15
}
cluster {
host: "xx.xx.xx.19"
port: 4223
authorization {
user: "nats"
password: "xxxxx"
timeout: 15
}
routes = [
nats-route://nats:[email protected]:4223
nats-route://nats:[email protected]:4223
]
}
from nats.rb.
In the routes section you should only have one route to the other gnatsd
server, not an entry to the same server as well.
On Tue, Jun 17, 2014 at 8:40 AM, JimmyMa [email protected] wrote:
Here are the conf file,
ip of 0.nats.default.stage0.microbosh is xx.xx.xx.197
ip of 1.nats.default.stage0.microbosh is xx.xx.xx.19GNATSD 1:
net: "xx.xx.xx.197"
port: 4222
prof_port: 0
monitor_port: 0pid_file: "/var/vcap/sys/run/nats/nats.pid"
log_file: "/var/vcap/sys/log/nats/nats.log"
debug: false
trace: falseauthorization {
user: "nats"
password: "time4fun"
timeout: 15
}cluster {
host: "xx.xx.xx.197"
port: 4223authorization {
user: "nats"
password: "xxxxx"
timeout: 15
}routes = [
nats-route://nats:[email protected]:4223
nats-route://nats:[email protected]:4223
]
}GNATSD 2:
net: "xx.xx.xx.19"
port: 4222
prof_port: 0
monitor_port: 0pid_file: "/var/vcap/sys/run/nats/nats.pid"
log_file: "/var/vcap/sys/log/nats/nats.log"
debug: false
trace: falseauthorization {
user: "nats"
password: "xxxxx"
timeout: 15
}cluster {
host: "xx.xx.xx.19"
port: 4223authorization {
user: "nats"
password: "xxxxx"
timeout: 15
}routes = [
nats-route://nats:[email protected]:4223
nats-route://nats:[email protected]:4223
]
}—
Reply to this email directly or view it on GitHub
#68 (comment).
from nats.rb.
Hi Derek,
So the wrong configuration about routes caused the "Unknown Protocol" error in nats client?
BTW, the "Unknown Protocol" error not happens always, it only happens in 2-3 of our cloudcontrollers, and 0 - 5 times each day.
from nats.rb.
Unclear, just trying to eliminate things. Is it possible to have a nats
client accidentally connect to the route port on 4223?
On Tue, Jun 17, 2014 at 8:59 AM, JimmyMa [email protected] wrote:
Hi Derek,
So the wrong configuration about routes caused the "Unknown Protocol"
error in nats client?BTW, the "Unknown Protocol" error not happens always, it only happens in
2-3 of our cloudcontrollers, and 0 - 5 times each day.—
Reply to this email directly or view it on GitHub
#68 (comment).
from nats.rb.
Hi Derek,
Only 4222 is configured in the nats client, so I don't think it could connect on 4223 accidentally.
I think the root cause is the intermittent network issue in our environment. nats client has "re-connect" capability, and I suspect the "Unknown Protocol" error happens after the "re-connect".
Thanks.
from nats.rb.
What version of the nats client are you running?
On Jun 17, 2014, at 5:28 PM, JimmyMa [email protected] wrote:
Hi Derek,
Only 4222 is configured in the nats client, so I don't think it could connect on 4223 accidentally.
I think the root cause is the intermittent network issue in our environment. nats client has "re-connect" capability, and I suspect the "Unknown Protocol" error happens after the "re-connect".
Thanks.
—
Reply to this email directly or view it on GitHub.
from nats.rb.
According to the below log, the nats client version is 0.5.0.beta.12.
/var/vcap/sys/log/cloud_controller_ng/cloud_controller_ng.stderr.log:343343:/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/nats-0.5.0.beta.12/lib/nats/client.rb:
from nats.rb.
Thanks, I will try to dig in a bit deeper on the initial description.
On Tue, Jun 17, 2014 at 9:04 PM, JimmyMa [email protected] wrote:
According to the below log, the nats client version is 0.5.0.beta.12.
/var/vcap/sys/log/cloud_controller_ng/cloud_controller_ng.stderr.log:343343:/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/nats-0.5.0.beta.12/lib/nats/client.rb:
—
Reply to this email directly or view it on GitHub
#68 (comment).
from nats.rb.
Hi Derek,
One more thing I want to mention is that all messed messages contain "INFO" message, hope this could give you more clues. Thanks.
from nats.rb.
Ok thanks. Why is connection being dropped?
On Jun 20, 2014, at 5:35 AM, JimmyMa [email protected] wrote:
Hi Derek,
One more thing I want to mention is that all messed messages contain "INFO" message, hope this could give you more clues. Thanks.
—
Reply to this email directly or view it on GitHub.
from nats.rb.
Hi Derek,
We suspect our network is not stable, we are investigating/confirming the network instability issue now ...
from nats.rb.
You should be able to close this issue now with the latest changes merged.
from nats.rb.
Still trying to get some flaky tests passing reliably, will release to
rubygems today. Thanks again.
On Fri, Jun 27, 2014 at 4:27 AM, Michael Fraenkel [email protected]
wrote:
You should be able to close this issue now with the latest changes merged.
—
Reply to this email directly or view it on GitHub
#68 (comment).
from nats.rb.
From what I can tell, the connections dropped are always preceeded by a
"Slow Consumer Detected" message on the NATS server.
On Jun 20, 2014 8:35 AM, "JimmyMa" [email protected] wrote:
Hi Derek,
One more thing I want to mention is that all messed messages contain
"INFO" message, hope this could give you more clues. Thanks.—
Reply to this email directly or view it on GitHub
#68 (comment).
from nats.rb.
ok I pushed 0.5.0-beta.14 to rubygems. Thanks again for the PR.
from nats.rb.
Related Issues (20)
- support custom certs in TLS HOT 1
- SRV record support HOT 2
- correctly handling retries HOT 5
- timing issue around @pending queue HOT 4
- unneeded dependencies HOT 4
- Further concurrency related corruption in @pending HOT 3
- Support for NATS Streaming HOT 6
- allow discovery of servers to be disabled HOT 6
- undefined method `<<' for nil:NilClass HOT 2
- Replied data might be lost in request-reply processing. HOT 2
- update eventmachine HOT 1
- Compiled binary? HOT 2
- Endless loop - EM release machine -> unbind -> reconnect HOT 5
- Ruby NATS client doesn't verify hostname when connecting over TLS HOT 2
- Nats server closed connection with Stale Client Connection HOT 1
- TLS Peer verification: `ssl_verify_peer` does not support multiple CAs HOT 9
- TLS connection error `oversize record received` HOT 9
- Fix IPv6 address being sent to `gethostname` still containing []'s
- does not resend auto unsubscribe with updated value
- Support NATS JetStream HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from nats.rb.