Giter VIP home page Giter VIP logo

Comments (18)

derekcollison avatar derekcollison commented on May 18, 2024

First try upgrading the gnatsd to the current version of 0.5.2 and see if
the error can be reproduced there. Also, what is the configuration and
setup for gnatsd? Does it run in clustered mode?

On Tue, Jun 17, 2014 at 6:17 AM, JimmyMa [email protected] wrote:

In our CloudFoundry deployment, the CloudController (the NATS client) is
getting below error intermittently when the CloudController receives the
"dea.advertise" message from NATS (gnatsd) server

{"id":"10-7ee2570336fe4839974e49ee412ccdcb","stacks":["lucid64"],"available_memory":6336,"available_disk":441760,"app_id_to_count":{"2beae3d0-238c-4ec6-a191-890c32a12554":1,"8fbd89d3-d26c-47b2-94de-9b774bb9687f":1,"a432a6b2-c60a-4801-ba28-cf52811be799":1,"7a98cdb9-630d-4a21-b2b9-8c2012eacc43":1,"3736f818-a39b-4e18-a0a0-afe7f748bd2a":1,"dcf171b3-c13c-4ed9-9INFO
{"server_id":"5093c491bc3d437e75408f144dd3643f","version":"0.5.0","host":"xx.xx.xx.xx","port":4222,"auth_required":true,"ssl_required":false,"max_payload":1048576}

The "INFO" message gets messed with the "dea.advertise" message.

Now I'm guessing below is the scenario which has the above problem:

  1. DEA publishes the below "dea.advertise" message to NATS server:

{"id":"10-7ee2570336fe4839974e49ee412ccdcb","stacks":["lucid64"],"available_memory":6336,"available_disk":441760,"app_id_to_count":{"2beae3d0-238c-4ec6-a191-890c32a12554":1,"8fbd89d3-d26c-47b2-94de-9b774bb9687f":1,"a432a6b2-c60a-4801-ba28-cf52811be799":1,"7a98cdb9-630d-4a21-b2b9-8c2012eacc43":1,"3736f818-a39b-4e18-a0a0-afe7f748bd2a":1}}

  1. NATS server tries to relay the message to CC, but during the
    communication with CC, the network gets into problem, so only part of the
    message reaches to CC, as below:

{"id":"10-7ee2570336fe4839974e49ee412ccdcb","stacks":["lucid64"],"available_memory":6336,"available_disk":441760,"app_id_to_count":{"2beae3d0-238c-4ec6-a191-890c32a12554":1,

CC NATS client receives the part of the message and stores it into @buf
https://github.com/buf (
https://github.com/derekcollison/nats/blob/v0.5.0.beta.12/lib/nats/client.rb#L521
).
2.

Due to the network problem, CC NATS client loses the connection to
NATS server, and it tries to re-connect to NATS server.
3.

NATS server receives the new connection from the CC, and sends below
"INFO" message back to CC NATS client. (NATS server always sends an "INFO"
message when a client connection gets created
https://github.com/apcera/gnatsd/blob/master/server/server.go#L373).

INFO
{"server_id":"5093c491bc3d437e75408f144dd3643f","version":"0.5.0","host":"xx.xx.xx.xx","port":4222,"auth_required":true,"ssl_required":false,"max_payload":1048576}

  1. CC NATS client receives the "INFO" message and appends it to @buf
    https://github.com/buf, then the content of @buf
    https://github.com/buf is below:
    {"id":"10-7ee2570336fe4839974e49ee412ccdcb","stacks":["lucid64"],"available_memory":6336,"available_disk":441760,"app_id_to_count":{"2beae3d0-238c-4ec6-a191-890c32a12554":1,INFO
    {"server_id":"5093c491bc3d437e75408f144dd3643f","version":"0.5.0","host":"xx.xx.xx.xx","port":4222,"auth_required":true,"ssl_required":false,"max_payload":1048576}

Maybe the @buf https://github.com/buf needs to be clean when the client
do the reconnection:
https://github.com/derekcollison/nats/blob/v0.5.0.beta.12/lib/nats/client.rb#L720


Reply to this email directly or view it on GitHub
#68.

from nats.rb.

JimmyMa avatar JimmyMa commented on May 18, 2024

The same problem happens with 0.5.2, see below error log:
/var/vcap/sys/log/cloud_controller_ng/cloud_controller_ng.stderr.log:343343:/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/nats-0.5.0.beta.12/lib/nats/client.rb:618:in `block in connection_completed': Unknown protocol: {"id":"22-574d3c89e5e248d6b7e64b9119970bf9","stacks":["lucid64"],"available_memory":14592,"available_disk":379296,"app_id_to_count":{"27532d8b-90f1-43e8-9409-0d45b4a1e9ad":1,"1fad2097-68de-4fd6-84d5-ff7ae0d2919INFO {"server_id":"b7a0fe6b1f65433671c7840c2bbaa956","version":"0.5.2","host":"xx.xx.xx.xx","port":4222,"auth_required":true,"ssl_required":false,"max_payload":1048576} (NATS::ServerError)

Our gnatsd is in clustered mode with 2 nodes.

Please let me know if you need more info. Thanks.

from nats.rb.

derekcollison avatar derekcollison commented on May 18, 2024

Can you send me the conf file for the 2 gnatsd servers? Thanks.

On Tue, Jun 17, 2014 at 7:45 AM, JimmyMa [email protected] wrote:

The same problem happens with 0.5.2, see below error log:
/var/vcap/sys/log/cloud_controller_ng/cloud_controller_ng.stderr.log:343343:/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/nats-0.5.0.beta.12/lib/nats/client.rb:618:in
`block in connection_completed': Unknown protocol:
{"id":"22-574d3c89e5e248d6b7e64b9119970bf9","stacks":["lucid64"],"available_memory":14592,"available_disk":379296,"app_id_to_count":{"27532d8b-90f1-43e8-9409-0d45b4a1e9ad":1,"1fad2097-68de-4fd6-84d5-ff7ae0d2919INFO
{"server_id":"b7a0fe6b1f65433671c7840c2bbaa956","version":"0.5.2","host":"xx.xx.xx.xx","port":4222,"auth_required":true,"ssl_required":false,"max_payload":1048576}
(NATS::ServerError)

Our gnatsd is in clustered mode with 2 nodes.

Please let me know if you need more info. Thanks.


Reply to this email directly or view it on GitHub
#68 (comment).

from nats.rb.

JimmyMa avatar JimmyMa commented on May 18, 2024

Here are the conf file,

ip of 0.nats.default.stage0.microbosh is xx.xx.xx.197
ip of 1.nats.default.stage0.microbosh is xx.xx.xx.19

GNATSD 1:

net: "xx.xx.xx.197"
port: 4222
prof_port: 0
monitor_port: 0

pid_file: "/var/vcap/sys/run/nats/nats.pid"

log_file: "/var/vcap/sys/log/nats/nats.log"
debug: false
trace: false

authorization {
user: "nats"
password: "time4fun"
timeout: 15
}

cluster {
host: "xx.xx.xx.197"
port: 4223

authorization {
user: "nats"
password: "xxxxx"
timeout: 15
}

routes = [

nats-route://nats:[email protected]:4223

nats-route://nats:[email protected]:4223

]
}

GNATSD 2:

net: "xx.xx.xx.19"
port: 4222
prof_port: 0
monitor_port: 0

pid_file: "/var/vcap/sys/run/nats/nats.pid"

log_file: "/var/vcap/sys/log/nats/nats.log"
debug: false
trace: false

authorization {
user: "nats"
password: "xxxxx"
timeout: 15
}

cluster {
host: "xx.xx.xx.19"
port: 4223

authorization {
user: "nats"
password: "xxxxx"
timeout: 15
}

routes = [

nats-route://nats:[email protected]:4223

nats-route://nats:[email protected]:4223

]
}

from nats.rb.

derekcollison avatar derekcollison commented on May 18, 2024

In the routes section you should only have one route to the other gnatsd
server, not an entry to the same server as well.

On Tue, Jun 17, 2014 at 8:40 AM, JimmyMa [email protected] wrote:

Here are the conf file,

ip of 0.nats.default.stage0.microbosh is xx.xx.xx.197
ip of 1.nats.default.stage0.microbosh is xx.xx.xx.19

GNATSD 1:

net: "xx.xx.xx.197"
port: 4222
prof_port: 0
monitor_port: 0

pid_file: "/var/vcap/sys/run/nats/nats.pid"

log_file: "/var/vcap/sys/log/nats/nats.log"
debug: false
trace: false

authorization {
user: "nats"
password: "time4fun"
timeout: 15
}

cluster {
host: "xx.xx.xx.197"
port: 4223

authorization {
user: "nats"
password: "xxxxx"
timeout: 15
}

routes = [

nats-route://nats:[email protected]:4223

nats-route://nats:[email protected]:4223

]
}

GNATSD 2:

net: "xx.xx.xx.19"
port: 4222
prof_port: 0
monitor_port: 0

pid_file: "/var/vcap/sys/run/nats/nats.pid"

log_file: "/var/vcap/sys/log/nats/nats.log"
debug: false
trace: false

authorization {
user: "nats"
password: "xxxxx"
timeout: 15
}

cluster {
host: "xx.xx.xx.19"
port: 4223

authorization {
user: "nats"
password: "xxxxx"
timeout: 15
}

routes = [

nats-route://nats:[email protected]:4223

nats-route://nats:[email protected]:4223

]
}


Reply to this email directly or view it on GitHub
#68 (comment).

from nats.rb.

JimmyMa avatar JimmyMa commented on May 18, 2024

Hi Derek,

So the wrong configuration about routes caused the "Unknown Protocol" error in nats client?

BTW, the "Unknown Protocol" error not happens always, it only happens in 2-3 of our cloudcontrollers, and 0 - 5 times each day.

from nats.rb.

derekcollison avatar derekcollison commented on May 18, 2024

Unclear, just trying to eliminate things. Is it possible to have a nats
client accidentally connect to the route port on 4223?

On Tue, Jun 17, 2014 at 8:59 AM, JimmyMa [email protected] wrote:

Hi Derek,

So the wrong configuration about routes caused the "Unknown Protocol"
error in nats client?

BTW, the "Unknown Protocol" error not happens always, it only happens in
2-3 of our cloudcontrollers, and 0 - 5 times each day.


Reply to this email directly or view it on GitHub
#68 (comment).

from nats.rb.

JimmyMa avatar JimmyMa commented on May 18, 2024

Hi Derek,

Only 4222 is configured in the nats client, so I don't think it could connect on 4223 accidentally.

I think the root cause is the intermittent network issue in our environment. nats client has "re-connect" capability, and I suspect the "Unknown Protocol" error happens after the "re-connect".

Thanks.

from nats.rb.

derekcollison avatar derekcollison commented on May 18, 2024

What version of the nats client are you running?

On Jun 17, 2014, at 5:28 PM, JimmyMa [email protected] wrote:

Hi Derek,

Only 4222 is configured in the nats client, so I don't think it could connect on 4223 accidentally.

I think the root cause is the intermittent network issue in our environment. nats client has "re-connect" capability, and I suspect the "Unknown Protocol" error happens after the "re-connect".

Thanks.


Reply to this email directly or view it on GitHub.

from nats.rb.

JimmyMa avatar JimmyMa commented on May 18, 2024

According to the below log, the nats client version is 0.5.0.beta.12.
/var/vcap/sys/log/cloud_controller_ng/cloud_controller_ng.stderr.log:343343:/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/nats-0.5.0.beta.12/lib/nats/client.rb:

from nats.rb.

derekcollison avatar derekcollison commented on May 18, 2024

Thanks, I will try to dig in a bit deeper on the initial description.

On Tue, Jun 17, 2014 at 9:04 PM, JimmyMa [email protected] wrote:

According to the below log, the nats client version is 0.5.0.beta.12.

/var/vcap/sys/log/cloud_controller_ng/cloud_controller_ng.stderr.log:343343:/var/vcap/packages/cloud_controller_ng/cloud_controller_ng/vendor/bundle/ruby/1.9.1/gems/nats-0.5.0.beta.12/lib/nats/client.rb:


Reply to this email directly or view it on GitHub
#68 (comment).

from nats.rb.

JimmyMa avatar JimmyMa commented on May 18, 2024

Hi Derek,

One more thing I want to mention is that all messed messages contain "INFO" message, hope this could give you more clues. Thanks.

from nats.rb.

derekcollison avatar derekcollison commented on May 18, 2024

Ok thanks. Why is connection being dropped?

On Jun 20, 2014, at 5:35 AM, JimmyMa [email protected] wrote:

Hi Derek,

One more thing I want to mention is that all messed messages contain "INFO" message, hope this could give you more clues. Thanks.


Reply to this email directly or view it on GitHub.

from nats.rb.

JimmyMa avatar JimmyMa commented on May 18, 2024

Hi Derek,

We suspect our network is not stable, we are investigating/confirming the network instability issue now ...

from nats.rb.

fraenkel avatar fraenkel commented on May 18, 2024

You should be able to close this issue now with the latest changes merged.

from nats.rb.

derekcollison avatar derekcollison commented on May 18, 2024

Still trying to get some flaky tests passing reliably, will release to
rubygems today. Thanks again.

On Fri, Jun 27, 2014 at 4:27 AM, Michael Fraenkel [email protected]
wrote:

You should be able to close this issue now with the latest changes merged.


Reply to this email directly or view it on GitHub
#68 (comment).

from nats.rb.

nastacio avatar nastacio commented on May 18, 2024

From what I can tell, the connections dropped are always preceeded by a
"Slow Consumer Detected" message on the NATS server.
On Jun 20, 2014 8:35 AM, "JimmyMa" [email protected] wrote:

Hi Derek,

One more thing I want to mention is that all messed messages contain
"INFO" message, hope this could give you more clues. Thanks.


Reply to this email directly or view it on GitHub
#68 (comment).

from nats.rb.

derekcollison avatar derekcollison commented on May 18, 2024

ok I pushed 0.5.0-beta.14 to rubygems. Thanks again for the PR.

from nats.rb.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.