voxpupuli / puppet-corosync
Sets up and manages Corosync.
Home Page: https://forge.puppet.com/puppet/corosync
License: Apache License 2.0
I followed the example to set up some primitives with the current master of puppetlabs-corosync.
Unless I explicitly declare the sample primitive with an empty utilization hash, as follows:
cs_primitive { 'nginx_vip':
  primitive_class => 'ocf',
  primitive_type  => 'IPaddr2',
  provided_by     => 'heartbeat',
  parameters      => { 'ip' => '172.16.210.100', 'cidr_netmask' => '24' },
  operations      => { 'monitor' => { 'interval' => '10s' } },
  utilization     => {},
}
I get the following error:
err: Cs_primitive[nginx_vip]: Could not evaluate: undefined method `empty?' for nil:NilClass
The key is to explicitly define
utilization => {},
The error happens here, and I suppose it was introduced with #41; however, I don't really understand why the same problem doesn't exist for the metadata attribute.
Using Puppet 2.6.
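The error pattern suggests the provider calls .empty? on a property value that is nil whenever utilization is omitted from the manifest. A minimal sketch of the failure and of a defensive guard (the variable name is illustrative, not the provider's actual code):

```ruby
# When the manifest omits utilization, the provider sees nil here.
utilization = nil

# utilization.empty? would raise:
#   NoMethodError: undefined method `empty?' for nil:NilClass
# Falling back to an empty hash avoids the error:
safe = (utilization || {})
puts safe.empty?  # => true
```

The metadata attribute presumably does not hit this because its value is defaulted somewhere before .empty? is called.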
Corosync doesn't manage resources. Corosync provides reliable communication between nodes, manages cluster membership and determines quorum. Pacemaker is a cluster resource manager (CRM) that manages the resources that make up the cluster, such as IP addresses, mount points, file systems, DRBD devices, services such as MySQL or Apache and so on. Basically everything that can be monitored, stopped, started and moved around between nodes.
Pacemaker does not depend on Corosync, it could use Heartbeat (v3) for communication, membership and quorum instead. Corosync could also work without Pacemaker, for example with Red Hat's CMAN.
This is a documentation problem but also reflected in the names of the types this module provides. The Linux HA stack and its history as well as various cluster components are already confusing enough so it is important to not mix up terms.
I'll submit a PR for the Readme but I don't think it will be possible to rename the types this module provides at this point.
The following function is huge, not very Ruby-like, and could be split into at least two functions (between the if and the else) plus multiple helpers.
def self.instances
  block_until_ready
  instances = []
  cmd = [command(:crm), 'configure', 'show', 'xml']
  if Puppet::PUPPETVERSION.to_f < 3.4
    raw, status = Puppet::Util::SUIDManager.run_and_capture(cmd)
  else
    raw = Puppet::Util::Execution.execute(cmd)
    status = raw.exitstatus
  end
  doc = REXML::Document.new(raw)
  doc.root.elements['configuration'].elements['constraints'].each_element('rsc_colocation') do |e|
    items = e.attributes
    if items['rsc']
      # The colocation is defined as a single rsc_colocation element. This
      # means the format is rsc and with-rsc. In the type we chose to always
      # deal with ordering in a sequential way, which is why we reverse their
      # order.
      if items['rsc-role']
        rsc = "#{items['rsc']}:#{items['rsc-role']}"
      else
        rsc = items['rsc']
      end
      if items['with-rsc-role']
        with_rsc = "#{items['with-rsc']}:#{items['with-rsc-role']}"
      else
        with_rsc = items['with-rsc']
      end
      # Put primitives in chronological order: first 'with-rsc', then 'rsc'.
      primitives = [with_rsc, rsc]
    else
      # The colocation is defined as a rsc_colocation element wrapped around a
      # single resource_set. This happens automatically when you configure a
      # colocation between more than 2 primitives. Note that we can only
      # interpret colocations of single sets, not multiple sets combined.
      # In Pacemaker speak, this means we can support "A B C" but not e.g.
      # "A B (C D) E". Feel free to contribute a patch for this.
      primitives = []
      e.each_element('resource_set') do |rset|
        # If the resource set has a role, it applies to all referenced
        # resources; attributes['role'] is nil when absent.
        rsetrole = rset.attributes['role']
        # Add all referenced resources to the primitives array.
        rset.each_element('resource_ref') do |rref|
          rrefitems = rref.attributes
          if rsetrole
            # Strip a possible role from the reference and always reuse the
            # resource set role.
            rrefprimitive = rrefitems['id'].split(':')[0]
            primitives.push("#{rrefprimitive}:#{rsetrole}")
          else
            # No resource_set role was set: just push the complete reference.
            primitives.push(rrefitems['id'])
          end
        end
      end
    end
    colocation_instance = {
      :name       => items['id'],
      :ensure     => :present,
      :primitives => primitives,
      :score      => items['score'],
      :provider   => self.name
    }
    instances << new(colocation_instance)
  end
  instances
end
My perception is that those helpers could also be used in other cs_* providers.
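A possible split along those lines, as a sketch: one helper per branch of the big if/else, each returning the primitives array. The helper names are suggestions, not existing provider API, and the first helper accepts anything hash-like (REXML::Attributes or a plain Hash):

```ruby
require 'rexml/document'

# Hypothetical helper for the simple rsc/with-rsc colocation form.
def primitives_from_attributes(items)
  rsc      = items['rsc-role'] ? "#{items['rsc']}:#{items['rsc-role']}" : items['rsc']
  with_rsc = items['with-rsc-role'] ? "#{items['with-rsc']}:#{items['with-rsc-role']}" : items['with-rsc']
  # Chronological order: first 'with-rsc', then 'rsc'.
  [with_rsc, rsc]
end

# Hypothetical helper for the resource_set colocation form.
def primitives_from_resource_sets(element)
  primitives = []
  element.each_element('resource_set') do |rset|
    role = rset.attributes['role'] # set-level role applies to every reference
    rset.each_element('resource_ref') do |rref|
      id = rref.attributes['id']
      primitives << (role ? "#{id.split(':')[0]}:#{role}" : id)
    end
  end
  primitives
end

puts primitives_from_attributes('rsc' => 'apache', 'with-rsc' => 'vip').inspect
# => ["vip", "apache"]
```

With helpers like these, self.instances shrinks to the XML walk plus a two-way dispatch, and other cs_* providers that parse constraints could reuse them.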
@fghaas commented that he had to resort to Puppet 3.6 when deploying on CentOS 7, because with 3.8 there was severe breakage.
This task needs work after #209 has been merged.
Acceptance tests contain:
unless fact('osfamily') == 'RedHat' # Something's wrong with the pcs provider
We need to track down what is wrong with the pcs provider and fix it.
https://github.com/puppet-community/puppet-corosync/blob/0.7.0/templates/corosync.conf.erb uses a reference to threads_real, which is never set. This creates a corosync.conf file where the threads parameter has an empty value; under these circumstances corosync will refuse to start. This is fixed in master, but the 0.7.0 release should be retracted from the Puppet Forge, or at least be marked as broken.
This module does not yet support Ruby 2.2.x or above
I created a little class and would like to have a simple VIP. This is puppet-corosync 2.0.1 on CentOS 7 with Puppet 3.7.4 and Ruby 2.0.0:
cs_primitive { 'haproxy_vip':
  primitive_class => 'ocf',
  primitive_type  => 'IPaddr2',
  provided_by     => 'heartbeat',
  parameters      => {
    'ip'           => '1.2.3.4',
    'cidr_netmask' => '24',
  },
  operations      => {
    'monitor' => {
      'interval' => '30s',
    },
  },
}
But all I get on the puppet run is:
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not autoload puppet/type/cs_primitive: /etc/puppet/modules/corosync/lib/puppet/type/cs_primitive.rb:68: syntax error, unexpected ':', expecting ')'
newparam(:unmanaged_metadata, array_matching: :all) do
^
/etc/puppet/modules/corosync/lib/puppet/type/cs_primitive.rb:104: syntax error, unexpected ':', expecting ')'
newproperty(:operations, array_matching: :all) do
^ on node foo.bar
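The "unexpected ':'" syntax error is what Ruby 1.8.7 (as found on older Puppet masters) raises when it encounters the Ruby 1.9+ hash shorthand used in cs_primitive.rb; the error is raised on the server, so the master's Ruby is what matters, not the agent's. The two hash spellings are equivalent on newer Rubies:

```ruby
# Ruby >= 1.9 shorthand, as used in cs_primitive.rb:
new_style = { array_matching: :all }

# Ruby 1.8-compatible hash-rocket form:
old_style = { :array_matching => :all }

puts new_style == old_style  # => true
```

So either the master needs a newer Ruby, or the type would have to be rewritten using the hash-rocket form throughout to stay 1.8-compatible.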
Hi,
when Puppet tries to apply this piece of code:

cs_order { 'order-msPostgresql-master-group-INFINITY':
  first  => 'promote msPostgresql:Master',
  second => 'start master-group',
  score  => 'INFINITY',
}

the following error appears:
Error: /Stage[main]/Profile::Db::Bi_db/Cs_order[order-msPostgresql-master-group-INFINITY]: Could not evaluate: Execution of '/sbin/pcs constraint order Master promote msPostgresql then start master-group INFINITY kind=Mandatory id=order-msPostgresql-master-group-INFINITY symmetrical=false' returned 1: Usage: pcs constraint [constraints]...
It looks like the pcs command call has the wrong syntax; it is supposed to be:
/sbin/pcs constraint order promote msPostgresql then start master-group INFINITY kind=Mandatory id=order-msPostgresql-master-group-INFINITY symmetrical=false
(the Master token is superfluous)
What should we do?
Regards,
cs_primitive has an autorequire against the corosync service. This makes no sense except when you are running with corosync::service { 'pacemaker': version => 0 }; under all other circumstances the autorequire should be against pacemaker. For context, see also #143.
When trying to clone a group I see the error:
Error: Could not prefetch cs_clone provider 'crm': undefined method `attributes' for nil:NilClass
The configuration looks like:
group g_nfs p_rpcbind p_nfs_kernel_server
clone cl_nfs g_nfs \
meta interleave="true"
And the XML:
<clone id="cl_nfs">
...
<group id="g_nfs">
<primitive id="p_rpcbind" class="lsb" type="rpcbind">
So the primitive is a level deeper than it would be with a clone of a primitive. The following code in https://github.com/puppet-community/puppet-corosync/blob/master/lib/puppet/provider/cs_clone/crm.rb#L41 expects the primitive to be directly under clone:
doc.root.elements['configuration'].elements['resources'].each_element('clone') do |e|
  primitive_id = e.elements['primitive'].attributes['id']
It appears to be the same for pcs too, although I've not tested it.
I don't have a fix, so just documenting the issue so others will be aware.
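One possible direction for a fix, sketched here as an assumption rather than a tested patch: fall back to looking one level deeper when the clone wraps a group instead of a primitive.

```ruby
require 'rexml/document'

# Reproduce the reported CIB shape: a clone wrapping a group.
xml = <<XML
<clone id="cl_nfs">
  <group id="g_nfs">
    <primitive id="p_rpcbind" class="lsb" type="rpcbind"/>
  </group>
</clone>
XML

clone = REXML::Document.new(xml).root

# The current provider only checks elements['primitive'], which is nil
# here. Falling back to the group's first primitive avoids the
# NoMethodError on nil:
child = clone.elements['primitive'] || clone.elements['group/primitive']
puts child.attributes['id']  # => p_rpcbind
```

Whether the provider should then record the group id or the primitive id as the cloned resource is a separate design question.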
From the discussion in #170, it would probably be nice if the cs providers implemented the post_resource_eval hook in order to work faster through batch syncing.
It is possible for each cluster node to have multiple rings; the needed config would look like this:
rrp_mode: active

interface {
  ringnumber: 0
  bindnetaddr: 192.168.254.1
  mcastaddr: 239.1.1.1
  mcastport: 5405
}
interface {
  ringnumber: 1
  bindnetaddr: 192.168.255.1
  mcastaddr: 239.2.1.1
  mcastport: 5405
}
...
nodelist {
  node {
    ring0_addr: 192.168.254.1
    ring1_addr: 192.168.255.1
    nodeid: 1
  }
  node {
    ring0_addr: 192.168.254.2
    ring1_addr: 192.168.255.2
    nodeid: 2
  }
}
We already have support for multiple interfaces, but not for several rings per node. We also need to validate rrp_mode: it must be set to active or passive, since none isn't allowed with multiple rings.
https://github.com/puppet-community/puppet-corosync/blob/master/lib/puppet/provider/cs_colocation/pcs.rb#L39 says that the order of primitives in colocation constraints does not matter. It does.
<rsc_colocation id="c_rbd_volume2_on_target1" rsc="g_target1" score="INFINITY" with-rsc="g_rbd_volume2"/>
<rsc_colocation id="c_rbd_volume1_on_target1" rsc="g_rbd_volume1" score="INFINITY" with-rsc="g_target1"/>
These two constraints are not identical, but were generated from equivalent cs_colocation
resources.
The latest version of this module on the Forge is 0.7.0, released December 2014. In the project on GitHub, however, there have been several releases since. It looks like the current release is 1.2.1, released in the last few days.
Why the discrepancy between the Forge and the GitHub project?
Thanks,
Lance
Error Message:
Could not autoload puppet/type/cs_property: Could not autoload puppet/provider/cs_property/pcs: cannot load such file -- puppet_x/voxpupuli/corosync/provider/pcs on node XXX
We use crm, not pcs.
It would be nice to have acceptance tests for Ubuntu 12.04 and CentOS 6.
The default token value of 3000 ms is not a configuration supported by Red Hat. This could result in clusters being installed that would not be supported.
As per the article https://access.redhat.com/solutions/300953, for Red Hat Enterprise Linux 5 and 6 using cman:
The default timeout is 10 seconds (10000 ms).
The following are the limits of the range tested by Red Hat Cluster Engineering:
Minimum: 5 seconds (5000 ms)
Maximum: 300 seconds (300000 ms)
There are known issues with values outside the tested range. If an issue is determined to be associated with values outside the tested range, Red Hat Support may require you to reproduce the issue with a token timeout within the tested range.
My cluster is running on RHEL 6.
Pacemaker allows resources to be associated with some utilization of arbitrary resources. Nodes can then be said to have so much of those resources, and Pacemaker will take care to place resources only where there is enough capacity. For example, one can declare that a resource representing a VM requires 512 MB of RAM, and that some VM host has 16 GB of RAM. See the relevant Pacemaker documentation for more.
I've started working on a branch to add the capability of managing these attributes.
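As a sketch of what that could look like in manifest form once the branch lands (the utilization property shown here is an assumption based on this issue's description, not a released API; the resource names are illustrative):

```puppet
cs_primitive { 'vm_webserver':
  primitive_class => 'ocf',
  primitive_type  => 'VirtualDomain',
  provided_by     => 'heartbeat',
  parameters      => { 'config' => '/etc/libvirt/qemu/webserver.xml' },
  # Hypothetical: declare that this VM consumes 512 units of the
  # cluster-defined 'ram' utilization attribute.
  utilization     => { 'ram' => '512' },
}
```

Pacemaker would then only place the VM on nodes whose own ram utilization attribute (set via node attributes) leaves enough headroom.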
In puppet-corosync 1.2.1, I am missing location rules configuration (for example a ping-gateway configuration). It would be great if this existed.
Acceptance tests have a race condition with Ubuntu 14.04 and Puppet 4.
To be investigated.
When running crm configure load update in the providers' flush methods, the command still returns 0 even if there are syntax errors, and the resource shows that the changes were applied.
I haven't figured out a good way to validate the configuration other than crm_verify, but that command requires XML, while we are currently emitting normal crm commands. I don't have other ideas of how to fix it at the moment.
On RHEL 6.4:
err: /Stage[main]/Corosync/Exec[enable corosync]/returns: change from notrun to 0 failed: sed -i s/START=no/START=yes/ /etc/default/corosync returned 2 instead of one of [0] at /etc/puppetlabs/puppet/modules/corosync/manifests/init.pp:198
The file /etc/default/corosync does not exist on RHEL 6.4 after installing corosync.
I've worked around the issue for now with:
file { '/etc/default/corosync':
  owner   => 'root',
  group   => 'root',
  mode    => '600',
  content => "START=yes",
}
It complains about a missing should method.

[Error] Error: /Stage[main]/Profile_corosync/Profile_corosync::Daemon[pt-heartbeat-1]/Cs_colocation[mysql-1_pt-heartbeat-1]: Could not evaluate: undefined method `include?' for nil:NilClass
See https://bugzilla.redhat.com/show_bug.cgi?id=878508.
Apparently Red Hat chose to remove the crm command from their RHEL 6.4 release in favor of pcs. This Puppet module relies on crm for all operations, which means it is currently broken on RHEL 6.4 based systems, e.g. CentOS 6.4.
Solutions:
- Write a pcs provider to keep this module working on 6.4 based systems.
- The crm command has moved to a new crmsh package, available from the SuSE repo http://download.opensuse.org/repositories/network:/ha-clustering/RedHat_RHEL-6/x86_64/
I would prefer the first option as this is probably the best supported way forward.
This module has no proper way to enable Pacemaker without a service plugin. The documented way of enabling Pacemaker as per https://github.com/puppet-community/puppet-corosync/blob/master/README.md is to add

corosync::service { 'pacemaker':
  version => 0,
}

which is a bad idea to begin with, since version 1 had long been the preferred service plugin. However, this configuration mode makes no sense at all on systems past Corosync 2.0, which did away with service plugins. This means that you currently have to do

service { 'pacemaker':
  ensure  => running,
  require => Class['corosync'],
}

to get Pacemaker to run with Corosync 2.x, and then have all cs_* resources require Service['pacemaker'], which really ought to be an auto-dependency.
When PCCI runs the tests, it first sends a failure because tests fail on CentOS, then it sends a success because tests pass on Ubuntu, so we only see the Ubuntu success at the end.
With the pcs provider (not sure whether the crm provider does the same), cs_primitive resources are committed immediately. This usually causes the resource to be started immediately, regardless of explicit order and colocation constraints, or implicit constraints from cs_groups.
This means resources start prematurely, in the wrong order, or in the wrong place, requiring the user to run pcs resource cleanup after the fact.
This can be worked around by running exactly that via an exec at the end of each run, but that's exceedingly ugly, so I am wondering whether others have come up with better workarounds.
Please check the following items before submitting an issue -- thank you!
Note that this project is released with a Contributor Code of Conduct.
By participating in this project you agree to abide by its terms.
Optional, but makes our lives much easier:
Puppet 3.8.7, Ruby 2.1.5p273, Debian 8.5 Jessie, Module from master branch
Using the cs_rsc_defaults type gives an error:
Error: Invalid parameter cib(:cib)
Error: /Stage[main]/Zivit::Cluster/Cs_rsc_defaults[resource-stickiness]/ensure: change from absent to present failed: Invalid parameter cib(:cib)
No error.
cs_rsc_defaults { 'resource-stickiness':
  value => '100',
}
see above
It seems the type is missing the cib parameter, which is referenced in the crm provider implementation (line 84). The parameter should probably be added to the type.
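Once the parameter exists on the type, usage would presumably look like this (a sketch under that assumption; the shadow CIB name is hypothetical):

```puppet
cs_rsc_defaults { 'resource-stickiness':
  value => '100',
  # Hypothetical: apply the change to a named shadow CIB instead of
  # the live configuration, matching what the crm provider reads.
  cib   => 'puppet_shadow',
}
```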
After the last updates were merged to master, this code:
cs_order { 'order-msPostgresql-master-group-INFINITY':
  first       => 'msPostgresql:promote',
  second      => 'master-group:start',
  score       => 'INFINITY',
  symmetrical => 'false',
}
throws this error:
Error: /Stage[main]/Profile::Db::Bi_db/Cs_order[order-msPostgresql-master-group-INFINITY]: Could not evaluate: undefined method `first' for Cs_orderorder-msPostgresql-master-group-INFINITY:Puppet::Type::Cs_order::ProviderPcs
It did not appear earlier. It looks like someone introduced a new bug while trying to fix an old one. :)
Hi,
I tried to implement types and providers for cs_clone, cs_location and cs_rsc_defaults. Some examples to explain the syntax:
cs_rsc_defaults { 'migration-threshold':
  value => '2',
}

cs_clone { 'pingclone':
  ensure    => present,
  primitive => 'ping',
  metadata  => { 'globally-unique' => 'false', 'clone-max' => '2', 'target-role' => 'Started' },
}

cs_location { 'monitoring_on_connected_node':
  ensure => present,
  rsc    => 'cluster-ip',
  rules  => [
    {
      'score'       => '-INFINITY',
      'operation'   => 'or',
      'expressions' => ['not_defined pingd', 'pingd lte 0'],
    },
  ],
}

cs_location { 'monitoring_on_preferred_node':
  ensure => present,
  rsc    => 'cluster-ip',
  host   => 'fcil02v231',
  score  => '1000',
}
Feedback is welcome. Contact me for the source code (lennart.betz(at)netways.de).
Ciao, Lennart.
ssia.
The failures in PCCI are actually bubbling up a real error:
A few things you could check:
Make sure that Corosync and Pacemaker start at boot (or at least start them both manually) on both nodes:
$ sudo systemctl enable corosync
$ sudo systemctl enable pacemaker
There is a known bug which appears at boot on RHEL 7 or CentOS 7; I reported a workaround in Red Hat Bugzilla bug #1030583, but it's no longer public.
The workaround is to let Corosync wait for 10 s at boot, so it doesn't start before the interfaces are completely available (ugly workaround, I know :))
Change /usr/lib/systemd/system/corosync.service to include the ExecStartPre:
…
[Service]
ExecStartPre=/usr/bin/sleep 10
ExecStart=/usr/share/corosync/corosync start
…
Then, reload systemd:
$ sudo systemctl daemon-reload
You can also look in /var/log/pacemaker.log or look for something related in /var/log/messages.
If these steps don't help, I will redo the tutorial myself and see if I missed or forgot to write something.
Keep me posted :)
http://jensd.be/156/linux/building-a-high-available-failover-cluster-with-pacemaker-corosync-pcs
https://github.com/puppet-community/puppet-corosync/blob/master/lib/puppet/provider/cs_clone/pcs.rb#L59 sets the clone's CIB ID to <primitive>-clone, which is an arbitrary naming convention. The CIB ID should simply honor the cs_clone resource's namevar, name.
We need to migrate the crmsh commands to that new helper, which manages executions in a better way.
This task needs work after #209 has been merged.
The validation check in https://github.com/puppet-community/puppet-corosync/blob/master/lib/puppet/type/cs_group.rb#L30 is overzealous. It is a perfectly valid use case to have groups with just 1 primitive. For example, you might want to define a set of 3, 2 or even just 1 resource that you then reference from a constraint. The easiest way to do that is to use a group, and to always have the constraint point to the group. If there is only one primitive in the group, so be it.
The alternative is to have manifests full of ifs and unlesses to either point to a group or to a standalone primitive, and that's just silly.
Use case: when scaling a Corosync cluster (adding/removing nodes given in the quorum_members parameter), existing node IDs should be unique and (optionally) preserved for a node's lifetime.
Because IDs are assigned by auto-increment (https://github.com/voxpupuli/puppet-corosync/blob/master/templates/corosync.conf.udpu.erb#L90-L96), the IDs of old and new nodes can shift, which is a major issue for the dynamic config feature of a Corosync 2 cluster, which relies on the node ID mappings in the runtime.totem.pg.mrp.srp.members namespace.
The solution is to make ID generation configurable: either by auto-increment (the default), or from given quorum_members_ids data, if specified by the user.
Example:
A) Source cluster: quorum_members = [ node-1, node-3, node-22 ].
The resulting corosync.conf nodelist will be generated by the ERB template as follows:

nodelist {
  node {
    ring0_addr: node-1
    nodeid: 1
  }
  node {
    ring0_addr: node-3
    nodeid: 2
  }
  node {
    ring0_addr: node-22
    nodeid: 3
  }
}
B) Destination cluster: quorum_members = [ node-1, node-22, node-4 ].
The nodelist IDs are expected to be preserved for existing nodes:

node {
  ring0_addr: node-1
  nodeid: 1
}
node {
  ring0_addr: node-22
  nodeid: 3
}
node {
  ring0_addr: node-4
  nodeid: 4 (or anything else but 2)
}
Actual:

node {
  ring0_addr: node-1
  nodeid: 1
}
node {
  ring0_addr: node-22
  nodeid: 2
}
node {
  ring0_addr: node-4
  nodeid: 3
}

which causes IDs 2 and 3 to be mapped to the wrong nodes.
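The proposed behaviour can be sketched in plain Ruby (quorum_members_ids is the hypothetical parameter name from this issue, not an existing module parameter):

```ruby
# Returns [address, nodeid] pairs for the nodelist template.
# With no explicit IDs, fall back to the current positional
# auto-increment; with explicit IDs, preserve the given mapping.
def node_ids(members, explicit_ids = nil)
  return members.zip(explicit_ids) if explicit_ids
  members.each_with_index.map { |m, i| [m, i + 1] }
end

# Auto-increment remaps node-22 from 3 to 2 after node-3 is removed:
puts node_ids(%w[node-1 node-22 node-4]).inspect
# => [["node-1", 1], ["node-22", 2], ["node-4", 3]]

# Explicit IDs preserve node-22's original mapping:
puts node_ids(%w[node-1 node-22 node-4], [1, 3, 4]).inspect
# => [["node-1", 1], ["node-22", 3], ["node-4", 4]]
```

The ERB template would then iterate over these pairs instead of using the loop index as the nodeid.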
We need to drop self.instances if we plan to support the cib parameter.
Hi all,
after #233 was merged, the following code:
cs_primitive { 'pgsql':
  primitive_class => 'ocf',
  primitive_type  => 'pgsql',
  provided_by     => 'heartbeat',
  promotable      => 'true',
  parameters      => { 'pgctl' => '/bin/pg_ctl', 'psql' => '/bin/psql', 'pgdata' => '/var/lib/pgsql/data/', 'rep_mode' => 'sync', 'node_list' => inline_template("<%= @node_list %>"), 'restore_command' => 'cp /var/lib/pgsql/pg_archive/%f %p', 'primary_conninfo_opt' => 'keepalives_idle=60 keepalives_interval=5 keepalives_count=5', 'master_ip' => inline_template("<%= @vip_slave %>"), 'restart_on_promote' => 'true' },
  operations      => {
    'start'   => { 'interval' => '0s', 'timeout' => '60s', 'on-fail' => 'restart' },
    'monitor' => { 'interval' => '4s', 'timeout' => '60s', 'on-fail' => 'restart' },
    'monitor' => { 'interval' => '3s', 'timeout' => '60s', 'on-fail' => 'restart', 'role' => 'Master' },
    'promote' => { 'interval' => '0s', 'timeout' => '60s', 'on-fail' => 'restart' },
    'demote'  => { 'interval' => '0s', 'timeout' => '60s', 'on-fail' => 'stop' },
    'stop'    => { 'interval' => '0s', 'timeout' => '60s', 'on-fail' => 'block' },
    'notify'  => { 'interval' => '0s', 'timeout' => '60s' },
  },
  ms_metadata     => { 'master-max' => '1', 'master-node-max' => '1', 'clone-max' => '2', 'clone-node-max' => '1', 'notify' => 'true' },
  require         => Cs_primitive['vip-master', 'vip-slave'],
}
throws the following error:
Error: /Stage[main]/Profile::Db::Bi_db/Cs_primitive[pgsql]: Could not evaluate: Execution of '/sbin/pcs resource op remove pgsql monitor:Master interval=3s on-fail=restart timeout=60s' returned 1: Error: Unable to find operation matching: monitor:Master interval=3s on-fail=restart timeout=60s
port #174 to the pcs provider
[Error] Error: /Stage[main]/Profile_corosync/Profile_corosync::Daemon[messagebus]/Cs_order[nfs_before_messagebus]: Could not evaluate: Execution of '/usr/sbin/pcs constraint order nfs-mount-clone then messagebus INFINITY kind=Mandatory id=nfs_before_messagebus symmetrical=true' returned 1: Error: Resource 'nfs-mount-clone' does not exist
Hi community,
after the latest changes were merged, I got this warning:
Warning: cs_primitive.rb[operations]: Role in the operations name is now deprecated. Please use an array of hashes and put the role in the values.
(at /opt/puppetlabs/puppet/cache/lib/puppet/type/cs_primitive.rb:103:in `block (4 levels) in <top (required)>')
I know it's not a major error, just be advised.
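For reference, a sketch of the array-of-hashes form the warning asks for, moving the role out of the operation name and into the values (the exact accepted format is an assumption based on the warning text; the intervals are adapted from the pgsql example above):

```puppet
operations => [
  # Two monitor operations can coexist because each is its own hash;
  # the role lives in the values, not in the operation name.
  { 'monitor' => { 'interval' => '3s', 'timeout' => '60s', 'role' => 'Master' } },
  { 'monitor' => { 'interval' => '4s', 'timeout' => '60s' } },
],
```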
The cs_shadow and cs_commit resources create notice messages on every run. This shouldn't happen. /var/lib/heartbeat/crm/shadow.* files are left after each run, so we could perhaps use those? The crm command must have a way to know which CIBs stick around, since trying to create a new one with the same name says that it already exists.
root@puppet-failover-secondary:~# crm configure cib new puppet_ha
A shadow instance 'puppet_ha' already exists.
To prevent accidental destruction of the cluster, the --force flag is required in order to proceed.
root@puppet-failover-secondary:~# wc /var/lib/heartbeat/crm/shadow.puppet_ha
70 199 4724 /var/lib/heartbeat/crm/shadow.puppet_ha