nats-io / nats.rb Goto Github PK

View Code? Open in Web Editor NEW

880.0 47.0 134.0 1.39 MB

Ruby client for NATS, the cloud native messaging system.

Home Page: https://nats.io

License: Apache License 2.0

Ruby 99.82% Shell 0.18%

ruby nats client eventmachine messaging pubsub cncf

nats.rb's People

Contributors

Stargazers

Watchers

Forkers

mtodd wilson ezmobius edwardt agilemethods usutani reinh dsabeti achied migue pmenglund shouta-dev nivertech stefanschneider slimyang kkas wujiee jayon lgn21st gruis paloaltocp brockgr ttilley duankai stevezuo mouse3150 c0mpsc1 joshlong-attic liyi-at-gz zhoudengpeng huangciyin xiaoshengaimm resouer takeshi einsitang nsnt atheorist bnugman chenkaigithub cf-frameworks geku nlab-kinyatomita tsujimotohiroaki silverballs svoflee aduan cloudfoundry-attic democraci pwd jackytu ipandanet tsuyo vito flyingliang corlettb rodney-vin pombredanne yeyanbo andybrier aravind-pavuluri truemany ronakbanka krobertson imace zhuqling bletzer michft championl shumei crafet nagyist cliffyin deepakkumarnd keyofspectator x-plat yinhongquan alisheikh jon-n brickstack hafent lyt23 rsanders ruanbinfeng aosalias wheelcomplex rtacconi jason-weng cmy924 fanyeren subicura sunshine-zhd1229 wallyqs dreamrivulet lyz06221022 fraenkel aforward-oss klauskeidel benvanstaveren r3labs zfjoy520

nats.rb's Issues

In sublist, a.>.b won't be matched by anything

a.>.>, a.>.* as well.
It is very likely, this is just by design. So, just a notice.

cannot see published messages when useing Go NATS server

I am using gnats server 0.5.6, ruby-nats 0.5.0.beta.12 or 0.5.0.beta.16. When I use the Go pub and sub examples it works. If I use the examples provided here https://github.com/derekcollison/ruby-nats/blob/cluster/examples/auth_pub.rb and here https://github.com/derekcollison/ruby-nats/blob/cluster/examples/auth_sub.rb I can see that the message is published (by using the -d option in the nat server) but the subscriber does not print anything. I have a feeling the the ruby-nats gem is working with some old version of the gnatsd server but probably server 0.5.6 is not working.

Differentiate between errors that close the connection & not

There is no difference between a NATS::ServerError that closes the connection (like AUTH_REQUIRED) and one that keeps the connection open - this makes it hard to deal with this on the client side.

It would be nice to have something like NATS::ServerError::Fatal that means the client has to retry the connection.

"Lock EM to 0.12.10" is a no-go

eventmachine 0.12.0 fails to build on a list of distributions, and the build is fixed in their newer release (eventmachine/eventmachine#345).

Implement NATS.flush

I use NATS.publish('done') { NATS.stop } type of functionality all the time, but the NATS.publish is confusing when all we want is to get a callback when the server has processed all the current input from the client.

config file authorization:password

Also allow pass

Use random removal for sublist cache

Random removal should offer a good tradeoff between code simplicity, algorithmic complexity, and cache efficiency.

How to deal with "Unknown Protocal"

Hi,derek!
We came across some problems when using nats and want to ask you for advice.I am confused,why the client will crack when receiving unknown protocal immediately? Will it be better the client wait for some time?
Thanks！

Gem with reconnect fixes

We just ran across issues that require your reconnect fixes.
Can you release a new version?

Thanks.

MAX_PAYLOAD_SIZE should be available in the client

so one knows what size not to exceed when sending messages

Publish a stream from within a subscribe block

I'm having difficulty understanding the pending of messages that happens with NATS.publish. I want to do something like;

NATS.start do
  NATS.subscribe('stream') do |msg, _, sub|
    RealtimeStream.each do |line|
      NATS.publish('output', line)
    end
  end
end

But no matter what I do NATS.publish doesn't send the messages straight away. The only way I can do it is to wrap the publishing in Process.fork like so;

NATS.start do
  NATS.subscribe('stream') do |msg, _, sub|
    RealtimeStream.each do |line|
      Process.fork do
        NATS.start{NATS.publish('output', line){NATS.stop}}
      end
    end
  end
end

Which just seems crazy! What am I missing?

meet error in connection.rb

in version 0.5.0.beta.12, we meet below error while running nats:
/home/work/nats/bin/nats/lib/nats/server/connection.rb:23:in `unpack_sockaddr_in': can't convert nil into String (TypeError)

our team dig into the code, the following code may not run right:

def client_info
@client_info ||= (get_peername.nil? ? 'N/A' : Socket.unpack_sockaddr_in(get_peername))
end

as "getpeername" is not a stable function , the result of the function-calling can not get the right address and port , so our team consider below code is a better one

def client_info
cur_peername = get_peername
@client_info ||= (cur_peername.nil? ? 'N/A' : Socket.unpack_sockaddr_in(cur_peername))
end

NATS client forgets all servers and never reconnects

While this seems to be the expected behavior as coded, you questioned whether its correct.
https://github.com/nats-io/ruby-nats/blob/master/lib/nats/client.rb#L793

For us, this is the wrong behavior since

its expected. A rolling upgrade of gnatsd servers will cause a sequence of failed connects. The servers should not be removed.
the only way to detect this situation is to add my own err callback and look at the server_pool. However, it seems that process_disconnect may be called multiple times, but not in all cases which makes it very difficult to know when it's safe to rebuild a nats client. Although in our case I believe our only option is to restart our process since we cannot hand out a new NATS client.

I would like to either remove this behavior or at least add an option to never remove a server regardless of the situation and just cycle through them.

NATS.subscribe(channel) { |msg, reply| ... } - reply is nil

I'm running a standalone CloudFoundry DEA and playing with it via nats. I'm attempting to do a publish/reply 'dea.announce' request, but the reply part doesn't seem to be working.

https://gist.github.com/4057377 is my attempt to reproduce it.

The DEA subscribes to dea.discover, and can reply via the reply variable.

# dea/agent.rb

NATS.subscribe('dea.discover') { |msg, reply| process_dea_discover(msg, reply) }

But, reply is nil.

The NATS.publish call I'm making is:

#!/usr/bin/env ruby

require "nats/client"
require "json"

NATS.start do
  NATS.subscribe('>') { |msg, reply, sub| puts "Msg received on [#{sub}] : '#{msg}'" }
  message = {
    'runtime_info' => {
      'name' => 'ruby19',
      'executable' => 'ruby',
      'version_output' => 'ruby 1.9.3p286'
    },
    'limits' => {
      'mem' => 256
    },
    'droplet' => 'DROPLET_ID_1234'
  }
  NATS.publish('dea.discover', message.to_json) do |response|
    puts "Got dea.discover response: #{response.inspect}"
  end
end

I can reproduce the problem with a node.js client equivalent:

var nats = require('nats').connect();

// Simple Subscriber
nats.subscribe('>', function(msg, reply, subject) {
  console.log('Msg received on [' + subject + '] : ' + msg);
});

var message = {
  'runtime_info': {
    'name': 'ruby19',
    'executable': 'ruby',
    'version_output': 'ruby 1.9.3p286'
  },
  'limits': {
    'mem': 256
  },
  'droplet': 'DROPLET_ID_1234'
};
nats.publish('dea.discover', JSON.stringify(message), function(response) {
  console.log("Got dea.discover response: " + response);
});

The dea reply is nil in both scenarios; and hence response is nil in both ruby/javascript client examples.

What am I possibly doing wrong?

undefined method `<<' for nil:NilClass

I am working on a logstash-output-plugin for nats and am hitting this error in ruby-nats:

{
  "timestamp": "2016-08-15T15:27:46.019000-0700",
  "message": "nats: Failed to send event to Nats",
  "event": { ... },
  "exception": "undefined method `<<' for nil:NilClass",
  "backtrace": [
    "/usr/local/Cellar/logstash/5.0.0-alpha3/libexec/vendor/bundle/jruby/1.9/gems/nats-0.8.0/lib/nats/client.rb:909:in `send_command'",
    "/usr/local/Cellar/logstash/5.0.0-alpha3/libexec/vendor/bundle/jruby/1.9/gems/nats-0.8.0/lib/nats/client.rb:378:in `publish'",
    "/usr/local/Cellar/logstash/5.0.0-alpha3/libexec/vendor/bundle/jruby/1.9/gems/nats-0.8.0/lib/nats/client.rb:256:in `publish'",
    "/usr/local/Cellar/logstash/5.0.0-alpha3/libexec/vendor/local_gems/bd0cc268/logstash-output-nats-1.0.0/lib/logstash/outputs/nats.rb:108:in `send_to_nats'",
    "org/jruby/RubyProc.java:281:in `call'",
    "/usr/local/Cellar/logstash/5.0.0-alpha3/libexec/vendor/bundle/jruby/1.9/gems/logstash-codec-json-3.0.1/lib/logstash/codecs/json.rb:42:in `encode'",
    "/usr/local/Cellar/logstash/5.0.0-alpha3/libexec/vendor/local_gems/bd0cc268/logstash-output-nats-1.0.0/lib/logstash/outputs/nats.rb:52:in `receive'",
    "/usr/local/Cellar/logstash/5.0.0-alpha3/libexec/vendor/bundle/jruby/1.9/gems/logstash-core-5.0.0.alpha3-java/lib/logstash/outputs/base.rb:87:in `multi_receive'",
    "org/jruby/RubyArray.java:1613:in `each'",
    "/usr/local/Cellar/logstash/5.0.0-alpha3/libexec/vendor/bundle/jruby/1.9/gems/logstash-core-5.0.0.alpha3-java/lib/logstash/outputs/base.rb:87:in `multi_receive'",
    "/usr/local/Cellar/logstash/5.0.0-alpha3/libexec/vendor/bundle/jruby/1.9/gems/logstash-core-5.0.0.alpha3-java/lib/logstash/output_delegator.rb:142:in `worker_multi_receive'",
    "/usr/local/Cellar/logstash/5.0.0-alpha3/libexec/vendor/bundle/jruby/1.9/gems/logstash-core-5.0.0.alpha3-java/lib/logstash/output_delegator.rb:123:in `multi_receive'",
    "/usr/local/Cellar/logstash/5.0.0-alpha3/libexec/vendor/bundle/jruby/1.9/gems/logstash-core-5.0.0.alpha3-java/lib/logstash/pipeline.rb:315:in `output_batch'",
    "org/jruby/RubyHash.java:1342:in `each'",
    "/usr/local/Cellar/logstash/5.0.0-alpha3/libexec/vendor/bundle/jruby/1.9/gems/logstash-core-5.0.0.alpha3-java/lib/logstash/pipeline.rb:315:in `output_batch'",
    "/usr/local/Cellar/logstash/5.0.0-alpha3/libexec/vendor/bundle/jruby/1.9/gems/logstash-core-5.0.0.alpha3-java/lib/logstash/pipeline.rb:243:in `worker_loop'",
    "/usr/local/Cellar/logstash/5.0.0-alpha3/libexec/vendor/bundle/jruby/1.9/gems/logstash-core-5.0.0.alpha3-java/lib/logstash/pipeline.rb:204:in `start_workers'"
  ],
  "level": "warn"
}

ruby-nats https://github.com/nats-io/ruby-nats/blob/v0.8.0/lib/nats/client.rb#L909

  def send_command(command, priority = false) #:nodoc:
    needs_flush = (connected? && @pending.nil?)

    @pending ||= []
    @pending << command unless priority
    @pending.unshift(command) if priority
    @pending_size += command.bytesize

Is the case that @pending is being set to nil somehow after it is ||=ed to []?

I tried to test this with irb but it seems to work fine (assuming that, given the error message, pending is nil coming into this function):

irb(main):010:0> pending = nil
=> nil
irb(main):010:0> pending.nil?
=> true
irb(main):011:0> pending ||= []
=> []
irb(main):012:0> pending
=> []
irb(main):013:0> pending << 'command' unless false
=> ["command"]

I am new to ruby and not at all familiar with eventmachine, maybe something in there allowing this to happen?

Of course it could be something I am doing wrong in my plugin at https://github.com/nickcarenza/logstash-output-nats.

Feature Request: Non-EM client

Have you considered creating a a client that does not depend on EventMachine?

update eventmachine

Eventmachine 1.2.1 is out, please consider updating the gemspec to allow it?

Further concurrency related corruption in @pending

So unfortunately this is not quite fixed yet, and this time I do not know enough about EM to fix it.

Am doing some tests sending large messages as quick as I possibly can, my broker is a bit far away network wise so the pending queue can grow a bit between drains, and draining can take a bit (milliseconds, but still significant).

So flush_pending can take a while to finish and at the same time send_command can be adding stuff to it while flush_pending is happening. flush_pending will still nil the Array which messes it all up. Given that a user of the gem want to publish at any time stuff is happening concurrently and this Array ends up nil at times.

Best option is to never nil the array so appending to it always works. To do this a bunch has to be rewritten, if this was not EM I would use the Queue class that's thread safe and have a mode to have a blocking listener on it. One thread would sit and blocking listen on the Queue and publish soon as there's a message and any other thread would just append to the Queue - it's built in thread safety makes that fine. There has to be some EM equivelant of that, restructuring this code so that that @pending is always appendable is the only sane thing I think.

@q = Queue.new

Thread.new { loop { publish_to_network(@q.pop) } }

# other threads can at any time safely << to @q and message is dispatched soon as its written

I have no idea what the equivelant is in EM world, but this to me seems like the right structure rather than a plain Array and especially key that we never nil the Array holding var.

NoMethodError: undefined method `<<' for nil:NilClass
        from /opt/puppetlabs/puppet/lib/ruby/gems/2.1.0/gems/nats-0.7.1/lib/nats/client.rb:874:in `send_command'
        from /opt/puppetlabs/puppet/lib/ruby/gems/2.1.0/gems/nats-0.7.1/lib/nats/client.rb:375:in `publish'
        from /opt/puppetlabs/puppet/lib/ruby/gems/2.1.0/gems/nats-0.7.1/lib/nats/client.rb:253:in `publish'

can not connect to nats server in docker

ruby version 1.9.1
Docker version 1.3.2
code:
require 'nats/client'
EM.run do
NATS.start(:uri => "nats://localhost:4222/")
end
report :EXITING! NATS connection failed: Could not connect to server on nats://localhost:4222/

allow discovery of servers to be disabled

The new feature to discover servers to connect to is great but not universally desirable.

NATS daemons that goes via NAT or some other form of intermediate hop/proxy network device will announce their private address and not the public 'service' address.

These may well be unroutable to clients and without any way of mapping those to routeable addresses this feature will introduce uneeded delays in reconnects while clients attempt to connect to things they never can.

So on by default but disablable would be great.

For example nodes behind EC2 or Scaleways clouds are assigned private IPs and you might want to have other infrastructure connect to these nodes, there's a cloud provider level NAT, but other nodes lets say on Google Cloud would never be able to connect to the announced server list.

nats-server does not die when given SIGTERM by foreman

Is it deliberate to not listen for SIGTERM; or can this be added?

Suspicious memory leak in NATS server

We've been operating a private Cloud Foundry for months, and found suspicios memory leak in NATS server's memory usage recently.

The image attached is a Munin memory usage view of the VM on which the NATS server is running. The usage of 'apps' has been growing, and the current value is approx. 1.3 GiB.

We assigned one VM for a NATS server, so the NATS server is almost only an application on the VM. Actually, the NATS server is using 1.3 GiB of memory. Here is the result of 'ps' command (aligned).

$ ps up 1109
USER       PID %CPU %MEM     VSZ     RSS TTY STAT START    TIME COMMAND
root      1109  1.3 66.2 1427528 1358728 ?   Sl   2012  1177:17 nats-server

We are running the NATS server on Ubuntu 12.04 and Ruby 1.9.2-p180.

Our questions are:

What is the cause of this? Do you have any experience like this? Is this a memory leak or not?
How can we specify the cause? How can we know this is a memory leak or not? Where should we check to analyze this phenomenon?
Is this a problematic situation or not? If it is, how can we solve the problem?

Thanks in advance.

nsnt

timing issue around @pending queue

I cannot show you code that reproduce this since simple code just works in most cases, seems to happen only on a very slow under spec VM I have when running a number of things on it...which suggests a timing issue.

But the symptoms are:

published messages just vanish
published messages get stuck internally to the Nats::Client and only get sent at a later stage

The client is written around an array of pending messages and a flush that happens in EM.next_tick, this is problematic since it assumes the work in the method is done before EM.next_tick happens.

https://github.com/nats-io/ruby-nats/blob/d0556a0715c04eb395ade2cf0a9b7b806cd87ff3/lib/nats/client.rb#L737

In that code should adding the command to the array not complete by the time next_tick happens - which could really be any time - messages will be lost or delayed.

I added some debug puts messages and can definitely see the next_tick driven flush happen mid adding of the command to the pending list - and flush will nil the pending list so it looses messages.

So depending on when exactly the next_tick fires it either nils the queue - so messages vanish, or it does nothing leaving the messages on the queue and no flush gets scheduled until for some other reason in the future a flush is called.

Changing the send_command method like this:

def send_command(command, priority = false) #:nodoc:
  needs_flush = connected? && @pending.nil?

  @pending ||= []
  @pending << command unless priority
  @pending.unshift(command) if priority
  @pending_size += command.bytesize

  EM.next_tick { flush_pending } if needs_flush

  flush_pending if (connected? && @pending_size > MAX_PENDING_SIZE)
  if (@options[:fast_producer_error] && pending_data_size > FAST_PRODUCER_THRESHOLD)
    err_cb.call(NATS::ClientError.new("Fast Producer: #{pending_data_size} bytes outstanding"))
  end
  true
end

which ensures the next_tick driven flush_pending doesnt happen mid creating of the data seems to resolve it reliably for me on my slow machine.

Can't connect error should not print user/pass

Sample script not working in 2.1.3

I run this:

require "nats/client"

NATS.start do

  # Simple Subscriber
  NATS.subscribe('foo') { |msg| puts "Msg received : '#{msg}'" }

  # Simple Publisher
  NATS.publish('foo.bar.baz', 'Hello World!')

  # Unsubscribing
  sid = NATS.subscribe('bar') { |msg| puts "Msg received : '#{msg}'" }
  NATS.unsubscribe(sid)

  # Requests
  NATS.request('help') { |response| puts "Got a response: '#{response}'" }

  # Replies
  NATS.subscribe('help') { |msg, reply| NATS.publish(reply, "I'll help!") }

  # Stop using NATS.stop, exits EM loop if NATS.start started the loop
  NATS.stop

end

But this happens:

dyld: lazy symbol binding failed: Symbol not found: _rb_enable_interrupt
  Referenced from: /Volumes/Low/David/.rbenv/versions/2.1.3/lib/ruby/gems/2.1.0/extensions/x86_64-darwin-14/2.1.0-static/eventmachine-0.12.10/rubyeventmachine.bundle
  Expected in: flat namespace

dyld: Symbol not found: _rb_enable_interrupt
  Referenced from: /Volumes/Low/David/.rbenv/versions/2.1.3/lib/ruby/gems/2.1.0/extensions/x86_64-darwin-14/2.1.0-static/eventmachine-0.12.10/rubyeventmachine.bundle
  Expected in: flat namespace

[1]    44007 trace trap  ruby nats.rb

Ruby version is ruby 2.1.3p242 (2014-09-19 revision 47630) [x86_64-darwin14.0]

I suppose this is related to another gem or something so maybe it's just a matter of updating the manifest of the gem, but I don't know how to solve it and it's the most basic example in the readme so any help is much appreciated :)

Warnings on 1.9.2 on Windows

BTW, I kept getting the following warnings until I hacked the client.rb file - not sure if this matters or not, but I thought I'd mention it (I simply did "gem install nats" under Windows 7):

Ruby192/lib/ruby/gems/1.9.1/gems/nats-0.4.2/lib/nats/client.rb:391: warning: instance variable @buf not initialized
Ruby192/lib/ruby/gems/1.9.1/gems/nats-0.4.2/lib/nats/client.rb:397: warning: character class has duplicated range

Thanks
Peter

@pkukol

NATS.connected? and add NATS.reconnecting?

auto unsubscribe can make memory leak

Hi, I'm Kim

While i was testing Cloudfoundry, I found some memory leak. Its from NATS client.

Cloudfoundry cc sends msg via NATS client's request method with max expected response. Because having max expected response, NATS client sets it to auto unsubscribe. It makes work well for NATS server.. It will remove the subscribtion when its reponses hits expected number.. But NATS client is not.. Although getting hit its expected response number, it do not try to remove that subscription... So number of subscriptions grows in the client... it makes cc's memory leak..

If you put below codes before the end of on_msg method in the client.rb, it no longer makes any leak.. But Im not sure if it is proper location..

def on_msg
....
unsubscribe(sid) if (sub[:max] && (sub[:received] >= sub[:max])) # if auto unsubscribe, cancel subscription.
end

I expect public patch for it from next releases..

Thanks..

Replied data might be lost in request-reply processing.

Hi, thank you for providing so exciting software. I report a problem that I found.

I wrote a following example which send request to server and get replied data from it.
https://gist.github.com/userlocalhost2000/d6695bc8f9ec8984e6e3435f92d09b1d

I tried to check that all of the replied messages could be received accurately with it.

And these are results of it.

vagrant@local-ubuntu0:~$ for _i in `seq 1 20`
> do
> ruby check.rb
> done
Failed [100,43]
OK
Failed [100,87]
Failed [100,87]
Failed [100,86]
OK
OK
Failed [100,73]
Failed [100,86]
OK
OK
Failed [100,86]
OK
Failed [100,86]
Failed [100,87]
OK
Failed [100,87]
Failed [100,87]
OK
Failed [100,86]
vagrant@local-ubuntu0:~$

Some results failed to get whole replied data.
And here is my environment to execute it.

vagrant@local-ubuntu0:~$ ruby --version
ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux]
vagrant@local-ubuntu0:~$ gnatsd --version
nats-server version 0.9.4
vagrant@local-ubuntu0:~$ gem search -l nats

*** LOCAL GEMS ***

nats (0.8.0)
vagrant@local-ubuntu0:~$

Thank you.

Allow use of eventmachine greater than 1.0.7

We have used git bisect to determine that this commit in eventmachine is what leads to test failures on OS X.

We currently have conflicting dependencies on eventmachine, and would like to update to a newer version, but are stuck because of nats' hard dependency of = 1.0.7.

SRV record support

hello,

I am considering adding SRV record support for finding the nats servers.

I'm thinking someting like NATS.start(:srv_domain => "example.net") would then lookup _nats-server._tcp.example.net and connect to whatever it finds there. In the order sorted by priority.

If given along with :servers it will try SRV first and then fall back to the hardcoded list.

Would you accept such a patch if I wrote it - don't want to spend a bunch of time and then discover you won't want it.

support custom certs in TLS

the NATS daemon supports specific certs and verified mode but in order to enable that the client must be able to set cert and keys

Nats-server holds messages with queue group

Hi.

I would like to know if the following situation is an issue that needs to be fixed or the way it is implemented.

Here's the situation I ran into.

When one of the subscribers' network is disconnected, the nats-server seems to hold the messages that are seemed to be sent to the disconnected subscriber until the network is reconnected.

e.g.
Hosts:
#1. Nats-server
#2. Nats-publisher
#3. Nats-subscriber with queue-group("QG1")
#4. Nats-subscriber with queue-group("QG1")

(#3 and #4 are the subscribers of #2's message and are in the same queue group)

#3 and #4 start connection to #1.
#2 start publishing messages that are listened by #3 and #4
(seems the subscribers are randomly picked up from a list)
#3's network gets unreachable
(#4 does NOT receive all the messages, only some)
#3's network gets back to reachable
#3 receives all the messages that are held by the server at once.

This causes some problems:

the held messages may get lost if the subscriber's process gets down after the network is disconnected.
the held messages may be received after the messages that are sent after.
e.g. subscriber gets the message#2 before #1
timeline:
publish msg #1 (one that is held)
publish msg #2 (one that is published after #1)

I think the server should send messages to subscribers that are really connected to the server:
in the above example, the server should send all messages to #4 as soon as #3's network gets unreachable.

Please let me know if this is something needs to be fixed or not.

Thank you,
kkas

Release a new version

d9fa437 looks delightful. Can we get a new version of the gem that includes it?

I'd love to see ".gem/ruby/2.2.4/gems/nats-0.5.1/lib/nats/client.rb:154: warning: epoll is not supported on this platform" show up less frequently in my logs.

Thanks!

unneeded dependencies

I presume thin is needed for the old ruby based nats server, its certainly not needed for the client.

What's the plan with the old server code, thin is a huge dependency that brings in a bunch of other dependencies it would be really great to avoid having to carry those around for people only after the client?

Servers are randomized after the first connect

When you provide a list of servers, and dont_randomize_servers = false, all clients connect to the first server and then randomize the list. Shouldn't the randomize occur before we connect to the "first" server?

What happens if the cluster forms cycle?

Hi, derek

Could u plz give me some tips about what happens if the cluster forms cycle? I did some tests but can not figure out the reason.

correctly handling retries

hello,

My use care requires the client to just infinitely reconnect as long as it can, ideally I want to also show useful logs on failure, retry and reconnection.

Your README shows:

NATS.start(:servers => ['nats://127.0.0.1:4222', 'nats://127.0.0.1:4223']) do |c|
  puts "NATS is connected to #{c.connected_server}"
  c.on_reconnect do
    puts "Reconnected to server at #{c.connected_server}"
  end
end

opts = {
  :dont_randomize_servers => true,
  :reconnect_time_wait => 0.5,
  :max_reconnect_attempts = 10,
  :servers => ['nats://127.0.0.1:4222', 'nats://127.0.0.1:4223', 'nats://127.0.0.1:4224']
}

NATS.connect(opts) do |c|
  puts "NATS is connected!"
end

This cannot possibly be right. When on_reconnect is called connected? is false and connected_server will return nil.

So I've been fruitlessly trying to write something that will:

Attempt to connect
Attempt to reconnect for ever
Log every time when its failed - indicating what server failed
Log every time when its retrying - what server its trying
Log every time when its connected - what server it connected to

These logs are essential for someone trying to debug whats failed and whats working.

Appears to me this should be enough (except maybe logging which server failed):

options = {
  :max_reconnect_attempts => -1,
  :reconnect_time_wait => 1,
  :dont_randomize_servers => true,
  :servers => server_list
}

NATS.on_error do |e|
   Log.error("Error in NATS connection: %s: %s" % [e.class, e.to_s])
   next unless e.is_a?(NATS::ConnectError)
   raise(e)
end

 NATS.start(options) do |c|
   Log.info("NATS is connected to %s" % c.connected_server)

   c.on_reconnect do
      Log.info("Reconnecting after connection failure '%s'" % c.connected_server)
   end
end

As above in the on_reconnect block c.connected_server is nil, and NATS.start sets the block to be the on_connect block. This block does not get called once successfully reconnected because you track this via @conn_cb_called and only call it once.

So what's the right way to achieve the logging I desire? This is no doubt easy for someone who knows EM but it's a bit of a spaghetti monster if you don't :)

Bi-directional route setup

Two servers referencing each other form a type of cycle in current cluster setup.

No documentation about having to start `nats-server`

I'm on the cluster branch and had trouble getting the examples to work. I didn't realise that nats-server had to be running before you could do anything at all with pub/sub stuff. Is this step missing from the docs or is there something wrong with my installation?

Is there any way to do a blocking request

I would like to do a blocking (synchronous) NATS#request. Can it be done at all?

symbolize on connect

This could cause resource exhaustion attacks while symbols are not garbage collected.

Issue routing to queue subscribers overs routes with reply subjects

0.5.2-beta.2

Reported via email..

Hi Derek:
I’am an Engineer of Baidu Inc，and doing research on CloudFoundry, which based on the NATS cluster v0.5.0.beta.2 lately. But there seems a bug @line 116 of server/server.rb:

Should there be a SPACE between #{reply} and #{msg.bytesize}?
When push an app to the Cloud Foundry Cluster，the NATS client can‘t parse the message when #receive_data(data)，and the original exception is :
MSG staging 3 _INBOX.76193b95e4945dfdf44f1a56451018
.* /home/work/local/cloudfoundry/stager/bin/stager/lib/vcap/stager/server.rb:82:in `block in install_error_handlers': Unknown protocol: MSG staging 3 _INBOX.76193b95e4945dfdf44f1a56451018 (NATS::ServerError)

I figured that the received data should had matched the MSG case，but matched to the UNKNOWN case due to the missing SPACE， Am I right？
well，when I add a SPACE to the right place，It works!

Waiting for your reply &
Thanks for your time reading this letter ~ ：）

Support for NATS Streaming

I would love to use NATS Streaming but it would appear the current ruby client doesn't support it; I tried it but the client dumped core!

How much work would this be? I have used protobuf extensively and have a good deal of experience with EventMachine & Rabbitmq (I'm the author of Smith). I don't have a great deal of spare time but I would be interested in helping out.

Autostart default behavior should be suppressed by ENV

By default the client will autostart a server if it needs to, we should allow an ENV to suppress that unless overridden.

@olegshaldybin

Client-only gem

It'll be nice to be able to install the gem and not have to install the dependencies for the server if you're only using the client. I'd be willing to do the work to split this out into a nats-client gem or something similar if that's cool with you.

Compiled binary?

Hello. I'm new to Ruby, but is there a way to make a compiled binary out of your nats-pub and nats-sub scripts in bin? ATM on alpine linux I need to download around 300 MB in packages to get to it, i.e. apk --update add ruby ruby-dev ruby-rdoc ruby-irb gcc libc-dev make g++ then gem install nats. It'd be nice if there is a released binary of the order of a few Megabytes that I can download and run .. something along the lines of gosu's releases (in golang)

Edit:

I tried to use rubyscript2exe.rb:

/shadi # gem install nats
/shadi # ruby rubyscript2exe.rb /usr/bin/nats-sub
/tmp/tar2rubyscript.d.60.1/rubyscript2exe/init.rb:103:in `<top (required)>': uninitialized constant Config (NameError)
Did you mean?  RbConfig
        from rubyscript2exe.rb:626:in `load'
        from rubyscript2exe.rb:626:in `block in <main>'
        from rubyscript2exe.rb:577:in `block in newlocation'
        from rubyscript2exe.rb:505:in `block in newlocation'
        from rubyscript2exe.rb:472:in `newlocation'
        from rubyscript2exe.rb:505:in `newlocation'
        from rubyscript2exe.rb:577:in `newlocation'
        from rubyscript2exe.rb:619:in `<main>'

Trying to build rubyscript2exe, it seems its dependency tar2rubyscript is broken (I got same error from tar2rubyscript issue), and it's stale (last edit is 8 years ago).

Fix race condition on unsubscribe

You can receive data for unsubscribed sid and current behavior raises exception.

Do not throw a warning when epoll is not supported.

Using the client on OSX results in the following warning: nats-0.5.1/lib/nats/client.rb:154: warning: epoll is not supported on this platform. It would be great to suppress this warning.

Here is a PR that does something similar: tweetstream/tweetstream#182

"Unknown Protocol" error in client

In our CloudFoundry deployment, the CloudController (the NATS client) is getting below error intermittently when the CloudController receives the "dea.advertise" message from NATS (gnatsd) server

{"id":"10-7ee2570336fe4839974e49ee412ccdcb","stacks":["lucid64"],"available_memory":6336,"available_disk":441760,"app_id_to_count":{"2beae3d0-238c-4ec6-a191-890c32a12554":1,"8fbd89d3-d26c-47b2-94de-9b774bb9687f":1,"a432a6b2-c60a-4801-ba28-cf52811be799":1,"7a98cdb9-630d-4a21-b2b9-8c2012eacc43":1,"3736f818-a39b-4e18-a0a0-afe7f748bd2a":1,"dcf171b3-c13c-4ed9-9INFO {"server_id":"5093c491bc3d437e75408f144dd3643f","version":"0.5.0","host":"xx.xx.xx.xx","port":4222,"auth_required":true,"ssl_required":false,"max_payload":1048576}

The "INFO" message gets messed with the "dea.advertise" message.

Now I'm guessing below is the scenario which has the above problem:

DEA publishes the below "dea.advertise" message to NATS server:

NATS server tries to relay the message to CC, but during the communication with CC, the network gets into problem, so only part of the message reaches to CC, as below:

{"id":"10-7ee2570336fe4839974e49ee412ccdcb","stacks":["lucid64"],"available_memory":6336,"available_disk":441760,"app_id_to_count":{"2beae3d0-238c-4ec6-a191-890c32a12554":1,

CC NATS client receives the part of the message and stores it into @buf (https://github.com/derekcollison/nats/blob/v0.5.0.beta.12/lib/nats/client.rb#L521).
Due to the network problem, CC NATS client loses the connection to NATS server, and it tries to re-connect to NATS server.
NATS server receives the new connection from the CC, and sends below "INFO" message back to CC NATS client. (NATS server always sends an "INFO" message when a client connection gets created https://github.com/apcera/gnatsd/blob/master/server/server.go#L373）.

INFO {"server_id":"5093c491bc3d437e75408f144dd3643f","version":"0.5.0","host":"xx.xx.xx.xx","port":4222,"auth_required":true,"ssl_required":false,"max_payload":1048576}

CC NATS client receives the "INFO" message and appends it to @buf, then the content of @buf is below:
{"id":"10-7ee2570336fe4839974e49ee412ccdcb","stacks":["lucid64"],"available_memory":6336,"available_disk":441760,"app_id_to_count":{"2beae3d0-238c-4ec6-a191-890c32a12554":1,INFO {"server_id":"5093c491bc3d437e75408f144dd3643f","version":"0.5.0","host":"xx.xx.xx.xx","port":4222,"auth_required":true,"ssl_required":false,"max_payload":1048576}

Maybe the @buf needs to be clean when the client do the reconnection: https://github.com/derekcollison/nats/blob/v0.5.0.beta.12/lib/nats/client.rb#L720