shopify / semian
:monkey: Resiliency toolkit for Ruby for failing fast
License: MIT License
Using the `quota` option effectively sets the `tickets` option to a percentage of the total tickets available. How is this total ticket count determined by semian if using, for instance, the `net_http` adapter?
Currently, we express error thresholds as the number of failures (`error_threshold`) in a certain time period (`error_timeout`). After that threshold is reached, we open the circuit, and only close it again after a certain number of successful requests (`success_threshold`) is reached.
This requires intimate knowledge of your request patterns. A more flexible model is to use an error percentage threshold to determine when to open the circuit. Instead of saying 3 failures in 5 seconds, one might say over 10% of requests failed.
Either add a new parameter, `error_percent_threshold`, or allow `error_threshold` to be expressed as a percentage (e.g. `"10%"`).
Maintain either a large sliding window of successes and errors to compute percentages, or perhaps a set of counters to reduce the overall size of the windows.
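A minimal sketch of the sliding-window variant, assuming one boolean outcome per request. `ErrorPercentWindow` and `minimum_requests` are hypothetical names, not Semian options; the minimum keeps a single early failure from reading as 100%. A bucketed set of counters could replace the array to bound memory, as suggested above.

```ruby
# Hypothetical sketch (not Semian's API): trip when the failure percentage
# over the last `window_size` requests exceeds `error_percent_threshold`.
class ErrorPercentWindow
  def initialize(window_size:, error_percent_threshold:, minimum_requests: 1)
    @window_size = window_size
    @threshold = error_percent_threshold
    @minimum_requests = minimum_requests
    @outcomes = [] # true = failure, false = success
  end

  # Record one request outcome, dropping the oldest once the window is full.
  def record(failed)
    @outcomes << failed
    @outcomes.shift while @outcomes.size > @window_size
  end

  def error_percent
    return 0.0 if @outcomes.empty?

    100.0 * @outcomes.count(true) / @outcomes.size
  end

  # Only trip once enough requests have been observed.
  def trip?
    @outcomes.size >= @minimum_requests && error_percent > @threshold
  end
end
```

With a window of 10 and a 10% threshold, one failure in ten requests does not trip the window, while a second failure (20%) does.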
Is there any reason you are not using the semaphore implementation from concurrent-ruby?
https://github.com/ruby-concurrency/concurrent-ruby/blob/master/lib/concurrent/atomic/semaphore.rb
It would give semian support for more rubies.
We should increment the error count of the circuit breaker for MySQL connection errors (similar to #37).
But MySQL has only one exception class, so it's slightly dirty. Maybe we could improve the situation upstream.
We are trying to use Semian for our application. The `Net::HTTP` adapter works perfectly, and I also wrote a custom adapter for logstash-logger.
Unfortunately, the MySQL adapter is behaving strangely. I was able to reproduce it in a small script:
require 'active_record'
require 'semian'
require 'semian/mysql2'
db_config = {
  adapter: 'mysql2',
  pool: 8,
  timeout: 2,
  host: 'toxiproxy',
  port: 3307,
  database: '...',
  username: '...',
  password: '...',
  reconnect: true,
  connect_timeout: 10,
  read_timeout: 10,
  semian: {
    name: 'test-db',
    bulkhead: false,
    success_threshold: 1,
    error_threshold: 3,
    error_timeout: 10
  }
}
ActiveRecord::Base.establish_connection(db_config)
loop do
  begin
    sleep 1
    puts ActiveRecord::Base.connection.execute('select now()')
  rescue StandardError => exc
    puts exc
  end
end
The output is:
# ruby test-semian.rb
[mysql_test-db] Can't connect to MySQL server on 'toxiproxy' (111)
[mysql_test-db] Can't connect to MySQL server on 'toxiproxy' (111)
I, [2018-12-07T09:59:32.641238 #65] INFO -- : [Semian::CircuitBreaker] State transition from closed to open. success_count=0 error_count=3 success_count_threshold=1 error_count_threshold=3 error_timeout=10 error_last_at="1544173172"
[mysql_test-db] Can't connect to MySQL server on 'toxiproxy' (111)
[mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Can't connect to MySQL server on 'toxiproxy' (111)
[mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Can't connect to MySQL server on 'toxiproxy' (111)
[mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Can't connect to MySQL server on 'toxiproxy' (111)
[mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Can't connect to MySQL server on 'toxiproxy' (111)
[mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Can't connect to MySQL server on 'toxiproxy' (111)
[mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Can't connect to MySQL server on 'toxiproxy' (111)
[mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Can't connect to MySQL server on 'toxiproxy' (111)
[mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Semian::OpenCircuitError caused by [mysql_test-db] Can't connect to MySQL server on 'toxiproxy' (111)
^Ctest-semian.rb:35:in `sleep': Interrupt
The circuit breaker opened nicely as configured, but then the exception message keeps getting longer and longer. This doesn't happen with the `Net::HTTP` adapter.
Is anything wrong with the configuration? Or is it because we are using ActiveRecord?
I am using the following versions:
ruby 2.3.6
semian (0.8.5)
mysql2 (0.5.1)
activerecord (5.1.6)
(I also tested with ruby 2.5.3 and mysql2 0.5.2 with the same result.)
I am having trouble understanding how to use the Semian gem to implement a circuit breaker around an HTTP call. I went through the README multiple times. As far as I understand, I need to use the NetHTTP adapter, but where do I put the code?
The HTTP URLs I want to wrap in a circuit breaker are in separate models like the following:
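require 'net/http'
class Demo
  # Net::HTTP.get_response needs a URI (with scheme), not a bare host string
  def call_service_1
    Net::HTTP.get_response(URI('http://service1_url.com')).body
  end
end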
I have multiple models like these calling service 1, 2, 3, etc. I just want to enable the circuit breaker for the URLs, not turn the whole classes into adapters. How can I achieve this?
Note: I am new to Ruby, so maybe you need to give me some more context.
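A sketch of the per-host approach shown in the README-style configuration elsewhere in this document, assuming the semian gem's Net::HTTP adapter: requiring `semian/net_http` hooks the adapter into `Net::HTTP` itself, so the calls inside `Demo` stay unchanged and only a host-to-options mapping is registered once (for example in `config/initializers/semian.rb`). The host names, resource names, and threshold values are placeholders, and `SEMIAN_HOST_CONFIG` is a hypothetical constant name.

```ruby
# Placeholder thresholds; tune them for your own traffic.
SEMIAN_DEFAULTS = {
  tickets: 2,
  success_threshold: 1,
  error_threshold: 3,
  error_timeout: 10,
}.freeze

# Maps a host to Semian options; returning nil leaves that host unprotected.
SEMIAN_HOST_CONFIG = proc do |host, _port|
  case host
  when "service1_url.com" then SEMIAN_DEFAULTS.merge(name: "service_1")
  when "service2_url.com" then SEMIAN_DEFAULTS.merge(name: "service_2")
  end
end

begin
  require "semian"
  require "semian/net_http"
  Semian::NetHTTP.semian_configuration = SEMIAN_HOST_CONFIG
rescue LoadError
  warn "semian gem not installed; Net::HTTP calls are not protected"
end
```

Every `Net::HTTP` request to `service1_url.com` then goes through the `service_1` circuit breaker, whichever model makes it.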
Getting good resiliency parameters is hard. I have some ideas on the maths here (especially for bulkheads), but writing something to simulate traffic, an architecture, and failing components could be an interesting way to optimize them. Or maybe just being better at maths than I am.
I've run into an issue with a semaphore array disappearing from the system while the app is running. I can't find any details in the logs about what would cause it, but it started happening pretty much as we moved from Ubuntu 14.04 to 18.04. There are no other changes that I could see that would be related here.
The system is running with ruby 2.5.5. The exception we get is:
Semian::SyscallError: semop() failed, errno: 22 (Invalid argument)
File .../vendor/bundle/ruby/2.5.0/gems/semian-0.8.8/lib/semian/protected_resource.rb line 50 in acquire
File .../vendor/bundle/ruby/2.5.0/gems/semian-0.8.8/lib/semian/protected_resource.rb line 50 in acquire_bulkhead
File .../vendor/bundle/ruby/2.5.0/gems/semian-0.8.8/lib/semian/protected_resource.rb line 24 in block in acquire
File .../vendor/bundle/ruby/2.5.0/gems/semian-0.8.8/lib/semian/protected_resource.rb line 38 in block in acquire_circuit_breaker
File .../vendor/bundle/ruby/2.5.0/gems/semian-0.8.8/lib/semian/circuit_breaker.rb line 141 in maybe_with_half_open_resource_timeout
File .../vendor/bundle/ruby/2.5.0/gems/semian-0.8.8/lib/semian/circuit_breaker.rb line 30 in acquire
File .../vendor/bundle/ruby/2.5.0/gems/semian-0.8.8/lib/semian/protected_resource.rb line 37 in acquire_circuit_breaker
File .../vendor/bundle/ruby/2.5.0/gems/semian-0.8.8/lib/semian/protected_resource.rb line 23 in acquire
File .../vendor/bundle/ruby/2.5.0/gems/semian-0.8.8/lib/semian/adapter.rb line 34 in acquire_semian_resource
File .../vendor/bundle/ruby/2.5.0/gems/semian-0.8.8/lib/semian/net_http.rb line 83 in connect
This is with the latest released semian.
The issue starts occurring a number of hours after the deployment, without any obvious pattern of traffic.
I tracked the call down to:
10574.300 ( 0.015 ms): ruby/21041 semtimedop(semid: 131072, tsops: 0x7ffff92379c2, nsops: 1, timeout: 0x7ffff9237aa8) = -1 EINVAL Invalid argument
where `semid: 131072` doesn't exist on the system (normally we have 2 semaphore arrays, but this system had only 1). This was validated using `ipcs -s`.
Please let me know if there's any more debugging information I can provide.
Noticed that toxiproxy is defined twice in the Gemfile:
This may cause some side effects with version bumps moving forward.
Heya,
I've implemented the instrumentation for keeping an eye on the status of semian adapters; however, one event that would be helpful to add is the 'timed out waiting for resource' exception, which is raised from the C extension. This would be handy to see when configuring ticket counts and trying to find that sweet spot for your application.
I'm not sure how best to achieve this (or even if it's possible), but if someone would like to point me in the right direction, I'm more than happy to take a swing at it.
Thanks!
If a project is running Ruby 2.7, there are some deprecation warnings. Here are some that pop up in my project for keyword parameters:
/Users/michaelmenanno/.gem/ruby/2.7.1/gems/semian-0.10.1/lib/semian/mysql2.rb:120: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/Users/michaelmenanno/.gem/ruby/2.7.1/gems/semian-0.10.1/lib/semian/adapter.rb:32: warning: The called method `acquire_semian_resource' is defined here
/Users/michaelmenanno/.gem/ruby/2.7.1/gems/semian-0.10.1/lib/semian.rb:251: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/Users/michaelmenanno/.gem/ruby/2.7.1/gems/semian-0.10.1/lib/semian.rb:290: warning: The called method `require_keys!' is defined here
/Users/michaelmenanno/.gem/ruby/2.7.1/gems/semian-0.10.1/lib/semian/simple_sliding_window.rb:48: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/Users/michaelmenanno/.gem/ruby/2.7.1/gems/semian-0.10.1/lib/semian/simple_sliding_window.rb:15: warning: The called method `initialize' is defined here
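These warnings come from Ruby 2.7's keyword-argument separation. A minimal, generic illustration of the pattern and its usual fix (an explicit `**`), using a hypothetical `acquire` method rather than Semian's own code:

```ruby
# Hypothetical method taking keyword arguments, standing in for methods
# such as the ones named in the warnings above.
def acquire(scope:, adapter:)
  "#{adapter}/#{scope}"
end

options = { scope: :query, adapter: :mysql2 }

# `acquire(options)` warns on Ruby 2.7 and raises ArgumentError on Ruby 3.0+,
# because a positional Hash is no longer implicitly treated as keywords.
# The double splat makes the conversion explicit and works on both.
result = acquire(**options)
puts result # prints "mysql2/query"
```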
Currently, when the `error_timeout` expires, the next acquisition request for a circuit will cause a transition from `open` to `half_open`. In this state, workers will attempt to access the resource with a modified timeout of `half_open_resource_timeout`. The motivation here is that the modified timeout is much lower than the client timeout, so if the resource is still unhealthy, it will fail fast(er).
In the current implementation, every available worker (subject to the bulkhead configuration) will attempt the `half_open` -> `closed` transition. This means that if the resource is still unhealthy, all the workers could potentially block for `half_open_resource_timeout` seconds, reducing overall node capacity.
Mathematically, this means that a fraction $t_{\text{half-open}} / (t_{\text{half-open}} + t_{\text{error\_timeout}})$ of capacity will be spent attempting to close the circuit again. If $t_{\text{half-open}}$ is 1.0 s and $t_{\text{error\_timeout}}$ is 5.0 s (our MySQL defaults), then $1.0 / 6.0 \approx 16.7\%$ of our capacity will go toward re-closing the circuit. If bulkheads are in place with a quota of 0.5, that number will be 8.3%.
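The arithmetic above can be checked with a small helper. `half_open_capacity_fraction` is a hypothetical name; it assumes each cycle spends `half_open_timeout` blocked on the probe and `error_timeout` failing fast, and that only a `quota` share of workers can be blocked.

```ruby
# Worst-case share of worker capacity blocked on half-open attempts.
def half_open_capacity_fraction(half_open_timeout, error_timeout, quota: 1.0)
  quota * half_open_timeout / (half_open_timeout + error_timeout)
end

puts format("%.1f%%", 100 * half_open_capacity_fraction(1.0, 5.0))             # prints 16.7%
puts format("%.1f%%", 100 * half_open_capacity_fraction(1.0, 5.0, quota: 0.5)) # prints 8.3%
```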
When a circuit opens, the number of available tickets should immediately drop to 1. This shields the rest of the workers from this unhealthy resource. This is marginally faster than the open circuit error, since bulkhead acquisition is attempted before circuit-breaker acquisition, but that's likely not a big deal.
When the transition happens from `open` to `half_open`, we can raise the number of available tickets to `success_threshold`, to allow parallel re-closing of the circuit. Once the circuit is finally re-closed, we can raise the number of available tickets back to the original `tickets`/`quota` value.
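The proposal amounts to making the ticket count a function of the circuit state. A minimal sketch, with `tickets_for_state` as a hypothetical helper; capping the half-open count at the configured tickets is an added assumption, not part of the proposal.

```ruby
# Hypothetical mapping from circuit state to available bulkhead tickets.
def tickets_for_state(state, configured_tickets:, success_threshold:)
  case state
  when :open then 1 # shield the other workers; one probe at a time
  when :half_open then [success_threshold, configured_tickets].min # parallel re-closing
  when :closed then configured_tickets # back to the original tickets/quota value
  else raise ArgumentError, "unknown circuit state: #{state.inspect}"
  end
end
```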
Need to finish 4.1 and do 4.2, 4.3 and 5.0
We should use SysV shared memory or SysV semaphores to share the state of the circuit breakers between all the workers on the host.
I want to take a crack at it this weekend
cc @csfrancis @byroot
/Users/larouxn/.gem/ruby/3.0.3/gems/semian-0.11.6/lib/semian/mysql2.rb:115: warning: rb_tainted_str_new_cstr is deprecated and will be removed in Ruby 3.2
This occurs at runtime. It appears to be the same issue as brianmario/mysql2#1232.
Is there any reason for not having an adapter for PostgreSQL?
We can likely just use something built into libc to hash the resources. It'd be nice to have collision detection, but it's not really a must-have since it's so unlikely.
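To illustrate the idea of hashing a resource name down to an integer key, here is a Ruby sketch that swaps in CRC32 from the stdlib `zlib` binding (not libc) as a stand-in for whatever hash the C extension would use; `resource_key` is a hypothetical name. Distinct names can still collide, which is the trade-off mentioned above.

```ruby
require "zlib"

# Hypothetical: derive an unsigned 32-bit integer key from a resource name.
def resource_key(name)
  Zlib.crc32(name)
end

puts resource_key("nethttp_site_1")
```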
Hi,
I have a question about using Semian.
I configured a simple NetHTTP adapter (like in the README):
config/initializers/semian.rb
require 'semian'
require 'semian/net_http'
SEMIAN_PARAMETERS = {
  tickets: ENV['SEMIAN_TICKETS'].to_i,
  success_threshold: ENV['SEMIAN_SUCCESS_THRESHOLD'].to_i,
  error_threshold: ENV['SEMIAN_ERROR_THRESHOLD'].to_i,
  error_timeout: ENV['SEMIAN_ERROR_TIMEOUT'].to_i
}.freeze
# Also count TLS failures as errors for the circuit breaker
Semian::NetHTTP.exceptions += [::OpenSSL::SSL::SSLError]
Semian::NetHTTP.semian_configuration = proc do |host, _port|
  case host
  when 'site1.com'
    SEMIAN_PARAMETERS.merge(name: 'site_1')
  when 'site2.com'
    SEMIAN_PARAMETERS.merge(name: 'site_2')
  else
    nil # Semian is disabled for any other host
  end
end
I tested it in development, and everything looked good. But the problem showed up after the production deployment. When I try to execute an HTTP request from the Rails console, I get this error:
2.3.6 :090 > RestClient.get('https://site1.com')
Net::ResourceBusyError: [nethttp_site_1] semget() failed, errno: 13 (Permission denied)
from /usr/local/rvm/gems/ruby-2.3.6/gems/semian-0.8.3/lib/semian/adapter.rb:40:in `rescue in acquire_semian_resource'
from /usr/local/rvm/gems/ruby-2.3.6/gems/semian-0.8.3/lib/semian/adapter.rb:32:in `acquire_semian_resource'
from /usr/local/rvm/gems/ruby-2.3.6/gems/semian-0.8.3/lib/semian/net_http.rb:83:in `connect'
from /usr/local/rvm/rubies/ruby-2.3.6/lib/ruby/2.3.0/net/http.rb:863:in `do_start'
from /usr/local/rvm/rubies/ruby-2.3.6/lib/ruby/2.3.0/net/http.rb:852:in `start'
from /usr/local/rvm/gems/ruby-2.3.6/gems/rest-client-2.0.2/lib/restclient/request.rb:715:in `transmit'
from /usr/local/rvm/gems/ruby-2.3.6/gems/rest-client-2.0.2/lib/restclient/request.rb:145:in `execute'
from /usr/local/rvm/gems/ruby-2.3.6/gems/rest-client-2.0.2/lib/restclient/request.rb:52:in `execute'
from /usr/local/rvm/gems/ruby-2.3.6/gems/rest-client-2.0.2/lib/restclient.rb:67:in `get'
from (irb):90
from /usr/local/rvm/gems/ruby-2.3.6/gems/railties-4.2.5/lib/rails/commands/console.rb:110:in `start'
from /usr/local/rvm/gems/ruby-2.3.6/gems/railties-4.2.5/lib/rails/commands/console.rb:9:in `start'
from /usr/local/rvm/gems/ruby-2.3.6/gems/railties-4.2.5/lib/rails/commands/commands_tasks.rb:68:in `console'
from /usr/local/rvm/gems/ruby-2.3.6/gems/railties-4.2.5/lib/rails/commands/commands_tasks.rb:39:in `run_command!'
from /usr/local/rvm/gems/ruby-2.3.6/gems/railties-4.2.5/lib/rails/commands.rb:17:in `<top (required)>'
from bin/rails:4:in `require'
from bin/rails:4:in `<main>'
At this point, `Semian.resources` returns an empty hash.
Could you explain what is wrong? I don't understand this error.
@sirupsen @csfrancis I was trying to use semian on an x64 Ubuntu 14.10 installation and I got `Semian is not supported on x86_64-linux-gnu - all operations will no-op`.
The current check is just `end_with?('-linux')`, which clearly isn't sufficient, though I don't know enough about the format of `RUBY_PLATFORM` to say what else is needed.
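One possible, more tolerant check, sketched as a hypothetical helper (not Semian's code): look for "linux" anywhere in the platform string, so suffixed variants such as `x86_64-linux-gnu` are accepted too. Defaulting to `RbConfig::CONFIG["host_os"]` (for example `"linux-gnu"`) instead of `RUBY_PLATFORM` is a design choice of this sketch.

```ruby
require "rbconfig"

# Hypothetical replacement for RUBY_PLATFORM.end_with?('-linux').
def sysv_semaphores_supported?(platform = RbConfig::CONFIG["host_os"])
  platform.to_s.include?("linux")
end

puts sysv_semaphores_supported?("x86_64-linux-gnu") # prints true
```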
#230 surfaced a failure that Semian did not map to a `ResolveError`.
#230 (review) suggested grepping to find more of these messages, but there's little confidence that the result would be 100% complete.
We need to come up with a more systematic way of covering these failures.
This warning is present in current builds:
This job ran on our legacy infrastructure. Please read our docs on how to upgrade
Semian instruments state changes in circuit breakers (semian/lib/semian/circuit_breaker.rb, line 147 in 7a5adb7), but the current documentation does not reflect this:
# `event` is `success`, `busy`, `circuit_open`.
# `resource` is the `Semian::Resource` object
# `scope` is `connection` or `query` (others can be instrumented too from the adapter)
# `adapter` is the name of the adapter (mysql2, redis, ..)
Semian.subscribe do |event, resource, scope, adapter|
  StatsD.increment("semian.#{event}", 1, tags: {
    resource: resource.name,
    adapter: adapter,
    type: scope,
  })
end
I might prepare a PR as soon as I find some spare time.
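A sketch of how the documented handler could also cover state changes, written as a pure-Ruby proc so it runs without Semian or StatsD. The `:state_change` event name and its `{ state: ... }` payload are assumptions to verify against circuit_breaker.rb for your Semian version; `SemianMetric` is a hypothetical stand-in for a StatsD call.

```ruby
# Stand-in for a metrics call so the sketch runs without StatsD.
SemianMetric = Struct.new(:name, :tags)

# A proc (not a lambda) tolerates an extra or missing trailing payload argument.
# Assumption: state changes arrive as :state_change with payload[:state].
SEMIAN_METRIC = proc do |event, resource, scope, adapter, payload = {}|
  tags = { resource: resource.name, adapter: adapter, type: scope }
  tags[:state] = payload[:state] if event == :state_change && payload.is_a?(Hash)
  SemianMetric.new("semian.#{event}", tags.compact) # drop nil scope/adapter
end

# Wiring it up in an app might look like:
#   Semian.subscribe do |*args|
#     metric = SEMIAN_METRIC.call(*args)
#     StatsD.increment(metric.name, 1, tags: metric.tags)
#   end
```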
Adapter for the httprb (https://github.com/httprb/http) gem
I made the adapter but couldn't submit a PR:
require 'uri'
require 'semian/adapter'
require 'http'
module Semian
  module HTTPrb
    include Semian::Adapter
    class SemianError < ::HTTP::Error
      def initialize(semian_identifier, *args)
        super(*args)
        @semian_identifier = semian_identifier
      end
    end
    # Raised for 5xx responses so they count as circuit breaker failures
    class HTTPResponseError < ::HTTP::Error
      attr_reader :response
      def initialize(response)
        super("#{response.code} #{response.reason}")
        @response = response
      end
    end
    ResourceBusyError = Class.new(SemianError)
    CircuitOpenError = Class.new(SemianError)
    class SemianConfigurationChangedError < RuntimeError
      def initialize(msg = "Cannot re-initialize semian_configuration")
        super
      end
    end
    def semian_identifier
      "httprb_#{raw_semian_options[:name]}"
    end
    DEFAULT_ERRORS = [
      ::SocketError,
      ::HTTP::ConnectionError,
      ::HTTP::RequestError,
      ::HTTP::ResponseError,
      ::HTTP::StateError,
      ::HTTP::TimeoutError,
      ::HTTP::HeaderError,
      ::EOFError,
      ::IOError,
      ::SystemCallError, # includes ::Errno::EINVAL, ::Errno::ECONNRESET, ::Errno::ECONNREFUSED, ::Errno::ETIMEDOUT, and more
      Semian::HTTPrb::HTTPResponseError,
    ].freeze
    class << self
      attr_accessor :exceptions
      attr_reader :semian_configuration
      def semian_configuration=(configuration)
        raise Semian::HTTPrb::SemianConfigurationChangedError unless @semian_configuration.nil?
        @semian_configuration = configuration
      end
      def retrieve_semian_configuration(host, port)
        @semian_configuration.call(host, port) if @semian_configuration.respond_to?(:call)
      end
      def reset_exceptions
        self.exceptions = Semian::HTTPrb::DEFAULT_ERRORS.dup
      end
    end
    Semian::HTTPrb.reset_exceptions
    # Looks up the configuration by host + path and port of the request URI.
    # Memoized per client instance, so one HTTP::Client should target one host.
    def raw_semian_options
      @raw_semian_options ||= begin
        uri = URI.parse(@uri.to_s) # @uri may be a String or an HTTP::URI
        options = Semian::HTTPrb.retrieve_semian_configuration("#{uri.host}#{uri.path}", uri.port)
        options&.dup
      end
    end
    def resource_exceptions
      Semian::HTTPrb.exceptions
    end
    def disabled?
      raw_semian_options.nil?
    end
    def request(verb, uri, opts = {})
      @uri = uri
      return super(verb, uri, opts) if disabled?
      acquire_semian_resource(adapter: :http, scope: :connection) do
        response = super(verb, uri, opts)
        raise HTTPResponseError.new(response) if response.status.server_error?
        response
      end
    end
    private
    def handle_error_responses(result)
      if raw_semian_options.fetch(:open_circuit_server_errors, false)
        semian_resource.mark_failed(result) if result.is_a?(::HTTP::Error)
      end
      result
    end
  end
end
HTTP::Client.prepend(Semian::HTTPrb)
Because we can now disable bulkheads or circuit breakers for a protected_resource, we should raise a specific exception when trying to access methods of a nil delegate.
For example, with `bulkhead: false`, `resource.count` should raise a `BulkheadDisabledError` rather than a `NoMethodError`, that kind of thing.
We should also add something to the docs that recommends using `resource.bulkhead.count` over `resource.count`.
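A minimal sketch of that behavior, assuming the delegate is simply `nil` when the bulkhead is disabled. `BulkheadDisabledError` is the name proposed above; `ProtectedResourceSketch` is a hypothetical class, not Semian's `ProtectedResource`.

```ruby
class BulkheadDisabledError < StandardError; end

class ProtectedResourceSketch
  attr_reader :bulkhead

  def initialize(bulkhead: nil)
    @bulkhead = bulkhead
  end

  # Delegate to the bulkhead, failing with a descriptive error when disabled.
  def count
    raise BulkheadDisabledError, "bulkhead is disabled for this resource" if bulkhead.nil?

    bulkhead.count
  end
end
```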
It would be super nice to have an instrumentation point for this.
When using semian with Puma and enabling bulkheads, some errors are raised. How about disabling bulkheads by default to avoid this problem?
A conversation with @rafaelfranca turned into: Why not have the "built-in" adapters load automatically?
This would be a breaking change but would reduce the barrier to verifying that Semian is being used correctly.
WDYT?
cc: @sirupsen, @csfrancis, @fw42
While trying to figure out how to use the Semian NetHTTP adapter, I found this INFO log:
INFO -- : Semian sysv semaphores are not supported on x86_64-darwin21 - all operations will no-op
I was not sure whether this is concerning, so I opened an issue.
Hi guys,
Since there have been quite a few changes between v0.6.2...a58a1c7, will you release a new version soon?
If Semian were a proxy, it'd be simpler to use, and it could be opt-out instead of opt-in for apps in an organization.
Some inspiration could be taken from: https://github.com/vektra/templar
It possibly would be re-implemented in something like Go. This also solves issues such as sharing state between containers.
Following the monitoring instructions in the readme:
# `event` is `success`, `busy`, `circuit_open`.
# `resource` is the `Semian::Resource` object
# `scope` is `connection` or `query` (others can be instrumented too from the adapter)
# `adapter` is the name of the adapter (mysql2, redis, ..)
Semian.subscribe do |event, resource, scope, adapter|
  StatsD.increment("Shopify.#{adapter}.semian.#{event}", 1, tags: [
    "resource:#{resource.name}",
    "total_tickets:#{resource.tickets}",
    "type:#{scope}",
  ])
end
This results in an error:
NoMethodError: undefined method `tickets' for #<Semian::CircuitBreaker:0x0000558e2ec77ba0>
It seems the `resource` is a `Semian::CircuitBreaker`, and not a `Semian::Resource`!
Introduced by: #238
Semian Version: 0.8.9
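Until the documentation and the event payload agree, a subscriber can guard the call. A sketch with a hypothetical `semian_tags` helper that only reads `tickets` when the reported object responds to it:

```ruby
# Build tags without assuming every event source exposes `tickets`
# (the Semian::CircuitBreaker reported above does not).
def semian_tags(resource, scope)
  tags = ["resource:#{resource.name}", "type:#{scope}"]
  tags << "total_tickets:#{resource.tickets}" if resource.respond_to?(:tickets)
  tags
end
```

Inside the subscriber, `tags: semian_tags(resource, scope)` would then replace the inline array.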
I was poking around and found what I think are a few (hypothetical) issues when some semian methods are called concurrently. It's possible I'm missing something in the architecture (e.g. are adapter methods expected to always be mutexed by the parent driver, the way redis is?), but if not, here are some things I noticed:
- `retrieve_or_register` creating multiple instances of the same circuit breaker and protected resource
- `request_allowed?` can "lose" successful responses by triggering multiple transitions to the half-open state
I've been reading about fuse, a mature circuit breaker library for Erlang (a platform known for "resiliency by default").
In their circuit breaker configuration, they have two fuse types (you can think of them as similar to toxics in Toxiproxy):
- `{standard, MaxR, MaxT}`: these are fuses which tolerate `MaxR` melt attempts in a `MaxT` window before they break down.
- `{fault_injection, Rate, MaxR, MaxT}`: this fuse type sets up a fault injection scheme where the fuse fails at rate `Rate`, a floating point value between `0.0` and `1.0`. If you enter, say, `1 / 500`, then roughly every 500th request will see a `blown` fuse, even if the fuse is okay. This can be used to add noise to the system and verify that calling systems support the failure modes appropriately. The values `MaxR` and `MaxT` work as in a standard fuse.
IMO, the idea of injecting faults through a circuit breaker is brilliant. Not every organization has adopted chaos engineering yet, but this could be a first step towards that, at least on the application level.
We should think about adopting this idea in Semian. The biggest concern would probably be development environment vs production: do we inject faults when it's running locally or on CI? If yes, how do we prevent test flakiness? Or should we do this only in production?
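A sketch of what a fuse-style fault injector could look like in Ruby; `FaultInjector` is hypothetical and not a Semian API. Injecting the random source is one possible answer to the flakiness question: tests can pass a seeded `Random` (or a rate of `0.0`) to get deterministic behavior.

```ruby
# Hypothetical fuse-style fault injector, not part of Semian.
class FaultInjector
  InjectedFault = Class.new(StandardError)

  def initialize(rate:, random: Random.new)
    raise ArgumentError, "rate must be between 0.0 and 1.0" unless (0.0..1.0).cover?(rate)

    @rate = rate
    @random = random
  end

  # Fails roughly `rate` of the time; otherwise runs the protected block.
  def call
    raise InjectedFault, "injected fault" if @random.rand < @rate

    yield
  end
end

# Roughly every 500th call fails, as in the fuse example above.
injector = FaultInjector.new(rate: 1.0 / 500)
```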
Something I've been looking into lately is how we can combat the stampeding herd effect we occasionally incur once a system has recovered and is able to receive traffic again. One approach I've explored is using exponential backoff, and I was looking to find out if this is something you'd consider adding to semian? I think semian is a sensible place to put this because it already has knowledge of the tickets/quotas and error rates, and could use its already available data to decide how far to push out the backoff without needing to query another resource.
Also open to hearing about how you've addressed this at Shopify if you've got a good handle on it in other ways.
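For reference, one common shape of exponential backoff is "full jitter": the ceiling doubles per attempt up to a cap, and the actual delay is drawn uniformly below it, so recovering clients spread out instead of retrying in lockstep. `backoff_delay` and its defaults are hypothetical, not part of Semian; feeding it Semian's error rates is left open.

```ruby
# Hypothetical helper: exponential backoff with full jitter.
def backoff_delay(attempt, base: 0.1, cap: 10.0, random: Random.new)
  ceiling = [cap, base * (2**attempt)].min
  random.rand * ceiling
end

delays = (0..5).map { |attempt| backoff_delay(attempt).round(3) }
```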
Do we have a Node.js library for Semian and circuit breakers, so that we can use it in a Node.js environment with Toxiproxy?
We should pass the ticket count to #acquire so we can instrument it
A big pain point of configuring semian is that you might have hosts with different numbers of processes.
It could be interesting to investigate a way to configure Semian to dynamically define the number of tickets based on the process count.
e.g. I want `ceil(0.2 * process_count)` tickets.
cc @sirupsen
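The desired rule can be written as a small helper. `tickets_for` is hypothetical, and how the process count is obtained is left open; the floor of one ticket is an added assumption so a resource is never fully blocked. `rationalize` turns `0.2` into `1/5`, so the ceiling is not skewed by floating-point error.

```ruby
# Hypothetical: ceil(quota * process_count) tickets, never fewer than 1.
def tickets_for(process_count, quota)
  raise ArgumentError, "quota must be in (0, 1]" unless quota > 0 && quota <= 1

  [(quota.rationalize * process_count).ceil, 1].max
end

puts tickets_for(10, 0.2) # prints 2
```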
cc @sirupsen
Hello all!
I've been looking through the docs and googling and couldn't figure out what these errors actually mean, so I wanted to just ask.
I'm happy to make a PR to update the readme if anyone has time to help me figure these out. I think it would help others as well!
Net::ReadTimeout/Net::OpenTimeout
This one below, I'm fairly confident, is caused by the connection exceeding the `read_timeout` or `open_timeout` options in net/http.
[nethttp_example.com_443] Semian::OpenCircuitError caused by Net::ReadTimeout
timed out waiting for resource
This one is related to https://github.com/Shopify/semian/blob/master/ext/semian/resource.c#L60, but I'm not totally certain what EAGAIN actually means in this context or what would cause it. https://stackoverflow.com/a/28868162.
My best guess is that when trying to acquire the semaphore, the underlying OS didn't give one in time, so Semian gave up. Is this something one can even fix?
[nethttp_example.com_443] Semian::OpenCircuitError caused by timed out waiting for resource 'nethttp_example.com_443'
execution expired
Okay, this last one I'm super stumped by. It seems to be related to https://github.com/ruby/ruby/blob/master/lib/timeout.rb#L94, but I'm really struggling to track back to what could possibly cause this if it's not Net::HTTP. Unless maybe it's Rack timing out the entire Ruby process or something? Super open to ideas.
[nethttp_example.com_443] Semian::OpenCircuitError caused by execution expired
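One possible lead, offered as a pointer rather than a diagnosis: "execution expired" is the default message of Ruby's `Timeout.timeout`, including when it is told to raise a specific exception class. Some Ruby versions of net/http implement `open_timeout` as `Timeout.timeout(@open_timeout, Net::OpenTimeout)`, in which case an open timeout would surface with exactly this message. The snippet only demonstrates the message, using a deliberately short timeout.

```ruby
require "net/http"
require "timeout"

# A timeout raised with a custom class keeps the default message.
error =
  begin
    Timeout.timeout(0.05, Net::OpenTimeout) { sleep 1 }
  rescue Net::OpenTimeout => e
    e
  end

puts "#{error.class}: #{error.message}" # prints "Net::OpenTimeout: execution expired"
```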
Thank you for any pointers or advice. Once I feel like I have a basic understanding, I'll be happy to make a PR to describe these and do all the wordsmithing! Thank you!
$ uname -a
Linux orion.dev 5.5.7-200.fc31.ppc64le #1 SMP Fri Feb 28 17:07:46 UTC 2020 ppc64le ppc64le ppc64le GNU/Linux
$ gcc --version
gcc (GCC) 10.0.1 20200216 (Red Hat 10.0.1-0.8)
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ ld --version
GNU ld version 2.34-2.fc32
Copyright (C) 2020 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
$ make --version
GNU Make 4.2.1
Built for powerpc64le-redhat-linux-gnu
Copyright (C) 1988-2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ make
compiling semian.c
compiling tickets.c
linking shared-object semian/semian.so
/usr/bin/ld: semian.o:(.bss+0x60): multiple definition of `eSyscall'; resource.o:(.bss+0x28): first defined here
/usr/bin/ld: semian.o:(.bss+0x58): multiple definition of `eTimeout'; resource.o:(.bss+0x20): first defined here
/usr/bin/ld: semian.o:(.bss+0x50): multiple definition of `eInternal'; resource.o:(.bss+0x18): first defined here
/usr/bin/ld: semian.o:(.bss+0x48): multiple definition of `id_wait_time'; resource.o:(.bss+0x10): first defined here
/usr/bin/ld: semian.o:(.bss+0x40): multiple definition of `id_timeout'; resource.o:(.bss+0x8): first defined here
/usr/bin/ld: semian.o:(.bss+0x38): multiple definition of `system_max_semaphore_count'; resource.o:(.bss+0x0): first defined here
/usr/bin/ld: sysv_semaphores.o:(.bss+0x10): multiple definition of `eSyscall'; resource.o:(.bss+0x28): first defined here
/usr/bin/ld: sysv_semaphores.o:(.bss+0x0): multiple definition of `eInternal'; resource.o:(.bss+0x18): first defined here
/usr/bin/ld: sysv_semaphores.o:(.bss+0x8): multiple definition of `eTimeout'; resource.o:(.bss+0x20): first defined here
/usr/bin/ld: tickets.o:(.bss+0x10): multiple definition of `eSyscall'; resource.o:(.bss+0x28): first defined here
/usr/bin/ld: tickets.o:(.bss+0x8): multiple definition of `eTimeout'; resource.o:(.bss+0x20): first defined here
/usr/bin/ld: tickets.o:(.bss+0x0): multiple definition of `eInternal'; resource.o:(.bss+0x18): first defined here
collect2: error: ld returned 1 exit status
make: *** [Makefile:261: semian.so] Error 1
The program compiles if I set `LDFLAGS` explicitly with `--allow-multiple-definition`.
I wonder if the team could resolve these multiple-definition errors so that semian can be compiled without the `--allow-multiple-definition` flag?
math
Need docs on how to configure Semian for use with Sidekiq.
cc: @Shopify/servcomm, @ericroberts
It would be good to be able to configure semian without having to specify a ticket count, or to be able to explicitly disable the ticket count feature altogether.
You might not want to limit the number of concurrent requests to a resource. This is the case in shopify-app-store, where we don't want to limit the number of concurrent requests to the Shopify API.
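Other reports in this document configure resources with `bulkhead: false` (for example the MySQL script above), which skips the ticket-based bulkhead while keeping the circuit breaker. A sketch of such an options hash, assuming a Semian version that supports that option; the constant name, resource name, and thresholds are placeholders.

```ruby
# No `tickets` or `quota`: concurrency is not limited, but the
# circuit breaker thresholds still apply.
SHOPIFY_API_SEMIAN_OPTIONS = {
  name: "shopify_api",
  bulkhead: false,
  success_threshold: 1,
  error_threshold: 3,
  error_timeout: 10,
}.freeze
```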
Hi, I'm trying to get my head around the relationship between the "SEMIAN_SEMAPHORES_DISABLED" environment variable and whether semaphores are actually used by the library. I wholeheartedly admit that most of my confusion is due to my lack of understanding about Ruby's C extensions, so my apologies for that. Here's my question, though...
If I set SEMIAN_SEMAPHORES_DISABLED=1 then the following if statement should execute the else block:
https://github.com/Shopify/semian/blob/master/lib/semian.rb#L177
if Semian.semaphores_enabled?
  require 'semian/semian'
else
  Semian::MAX_TICKETS = 0
end
If that's the case, does that mean that the C extension code is not loaded? Assuming yes, does that mean that semaphore calls will not be performed?
Ultimately, I ask this to determine whether the SEM_UNDO threading issue would be avoided in this case. Am I barking up the wrong tree?
The gist of what is proposed here is to allow us to eliminate the assumption that there are a fixed number of workers (resource consumers) on a particular host. In a more dynamic scheduling environment (think: kubernetes), we cannot be certain of the number of resource consumers on a given host.
This is problematic, because under the current model we would have a ticket quota of a fixed size for a fixed number of workers. For illustration:
So, since W is no longer static, we need T to be able to react to it, so as to preserve Q at 0.5.
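The illustration referred to above did not survive. Reading the symbols as W workers on a host, T tickets, and quota Q = T/W (an assumption based on the surrounding text, as is rounding up), keeping Q fixed means recomputing T whenever W changes:

$$
Q = \frac{T}{W} \quad\Longrightarrow\quad T = \lceil Q \cdot W \rceil, \qquad Q = 0.5:\; W = 8 \Rightarrow T = 4, \quad W = 12 \Rightarrow T = 6
$$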
The proposed implementation is as follows:
A nuance to this is:
When workers unregister themselves (they're killed or stop and SEM_UNDO does its thing), the worker count needs to be adjusted by something. We can cache the worker count in a semaphore in the semaphore set for the resource. On #acquire if it's different, we call update_ticket_count. It seems better for this reason to do it at #acquire time rather than #register time.
The version hosted on RubyGems is quite out of date, and there are problems getting new builds to pass on CI: #168
Hi guys, love the library, but I wanted to see if you would be open/interested in adding support for error percent in addition to the current absolute error threshold? I find dealing with error rates in terms of percentages much more flexible than absolute values. If you're open to it I would be interested in trying to tackle it and send you a pull request. However, I don't want to maintain a fork of the project, so only want to go down this path if it is likely to get integrated back into the main code line.