ansible-community / ansible-nomad
:watch: Ansible role for Nomad
Home Page: https://galaxy.ansible.com/brianshumate/nomad
License: BSD 2-Clause "Simplified" License
This is version 1.8.0, commit: b8163a1.
If an IPv6 address is specified for nomad_advertise_address, Nomad errors out with a bind: invalid argument error.
As can be seen in base.hcl, the problem is that the ports are simply appended to nomad_advertise_address. IPv6 syntax requires that the IP portion be placed within brackets, e.g.
[fdd3:ecf5:726d:5fbf:2e99:93f6:ba25:8570]:4646 rather than fdd3:ecf5:726d:5fbf:2e99:93f6:ba25:8570:4646
Followup:
I worked around this by changing base.hcl.j2 to add the ipwrap filter, as described in the Ansible docs here.
bind_addr = "{{ nomad_bind_address }}"
advertise {
  http = "{{ nomad_advertise_address | ipwrap }}:{{ nomad_ports.http }}"
  rpc  = "{{ nomad_advertise_address | ipwrap }}:{{ nomad_ports.rpc }}"
  serf = "{{ nomad_advertise_address | ipwrap }}:{{ nomad_ports.serf }}"
}
Note that the ipwrap filter requires the netaddr package to be installed on the Ansible control node. It does not need to be installed on the remote client.
I tested it with both IPv6 and IPv4 addresses and it seems to work. Your mileage may vary :-)
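For reference, the bracket-wrapping behavior of the ipwrap filter can be sketched in plain Python. This is an illustration only — the real filter is backed by the netaddr package, not this code:

```python
import ipaddress

def ipwrap(address: str) -> str:
    """Wrap IPv6 literals in brackets; pass IPv4 addresses and hostnames through.

    A plain-Python illustration of what Ansible's ipwrap filter does.
    """
    try:
        if isinstance(ipaddress.ip_address(address), ipaddress.IPv6Address):
            return f"[{address}]"
    except ValueError:
        pass  # not an IP literal (e.g. a hostname)
    return address

assert ipwrap("fdd3:ecf5:726d:5fbf:2e99:93f6:ba25:8570") == "[fdd3:ecf5:726d:5fbf:2e99:93f6:ba25:8570]"
assert ipwrap("10.1.42.1") == "10.1.42.1"
```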
I have a fix for this, could you allow me to send it as a new branch?
This is what the fix looks like:
+{% if nomad_version | replace(".", "0000") | int >= 10000000000 %}
+ # Deprecated in 1.0.0
+ backwards_compatible_metrics = "{{ nomad_telemetry_backwards_compatible_metrics | default(false) | bool | lower }}"
+ disable_tagged_metrics = "{{ nomad_telemetry_disable_tagged_metrics | default(false) | bool | lower }}"
+{% endif %}
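The version gate in the diff above relies on padding each dot with zeros so that version strings compare as integers. A quick sanity check of that trick (a standalone sketch, not part of the role):

```python
def version_key(version: str) -> int:
    """Mimic the Jinja2 expression: nomad_version | replace(".", "0000") | int."""
    return int(version.replace(".", "0000"))

# "1.0.0" pads to "10000000000", so the >= 10000000000 gate matches
# 1.0.0 and later, while 0.x releases map to much smaller integers.
assert version_key("1.0.0") == 10000000000
assert version_key("0.12.1") == 1200001
assert version_key("1.1.0") >= 10000000000
```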
v0.12.1 was released today! This has some desirable bug fixes. Let's update to the latest/greatest for the defaults.
Hello.
When I start the playbook on a CentOS 7 host where SELinux is already disabled:
cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
and:
setenforce 0 || echo $?
setenforce: SELinux is disabled
1
I get the following:
TASK [brianshumate.nomad : Stop SELinux enforcing] ******************************************************************
fatal: [myhost]: FAILED! => {"changed": true, "cmd": "setenforce 0", "delta": "0:00:00.005533", "end": "2019-05-22 17:02:33.749074", "msg": "non-zero return code", "rc": 1, "start": "2019-05-22 17:02:33.743541", "stderr": "setenforce: SELinux is disabled", "stderr_lines": ["setenforce: SELinux is disabled"], "stdout": "", "stdout_lines": []}
PLAY RECAP **********************************************************************************************************
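One possible fix: guard the task with Ansible's SELinux facts so setenforce is skipped on hosts where SELinux is already disabled. A sketch (the task body mirrors the role's existing command; the fact path assumes fact gathering is enabled):

```yaml
- name: Stop SELinux enforcing
  command: setenforce 0
  when:
    - ansible_selinux is defined
    - ansible_selinux.status == "enabled"
```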
Nomad 0.11.x is now GA
The role on Galaxy is at 1.8.0, which is broken for me because it lacks this fix: e45d6b6
As a result I get errors like:
AnsibleUndefinedVariable: 'nomad_ports_http' is undefined
Hi Brian,
I don't see a way to provide custom / local configs, like a Consul token, since the daemon is started with explicit config files rather than pointing to the config directory.
If the client and server configs were only placed based on the value of nomad_node_role, it would seem we could point to the config directory instead, and I could just create a consul.hcl file which would automatically be picked up.
Alternatively, a second config dir could be passed on to the daemon, leaving the existing setup intact.
Thoughts?
I'm trying to build a procedure that will allow me to add instances (servers and clients) to the cluster as we go along. I've noticed that if I start with a single machine configured as both client and server and then add another such server, one of them declares itself as the leader while the other one gets stuck in a "no leader" loop.
Expected: the two instances agree on a leader and the cluster is fully functional.
Actual: one instance declares itself as the leader while the other is stuck in a "no leader" loop.
I've found that the root cause of this issue is a duplicate restart of the service. When a new machine is added, it is started twice: first in the main task when enabling the service, and then again in the handler because its configuration has changed.
Only enable the service in the main task and let the handler start it.
I have a working solution for systemd-based systems which keeps other systems unchanged. I can open a PR for it.
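A sketch of the proposed split (task and handler names are illustrative):

```yaml
# tasks/main.yml -- enable only; never start or restart here
- name: Enable nomad service
  systemd:
    name: nomad
    enabled: yes

# handlers/main.yml -- the single place the service is (re)started,
# triggered once by the configuration change on first install
- name: restart nomad
  systemd:
    name: nomad
    state: restarted
    daemon_reload: yes
```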
In the rendering of the server.hcl file, server-specific variables are evaluated and the python-netaddr package is required, both of which may not be available on the client, requiring them to be defined or installed and then ignored.
I will create a merge request that addresses this by not installing the server.hcl file when nomad_node_role == 'client'.
Hi,
I have used your Ansible role to deploy Nomad on VirtualBox (my Vagrantfile).
On the VirtualBox VMs, each server has two network interfaces:
I have configured "nomad_iface" to "eth1", but my jobs bind only on eth0.
I also have "nomad_bind_address" => "0.0.0.0", because without this my server only listens on the loopback device.
Do I need to do anything else?
Thanks,
Bruno
The download and unzip steps of the Nomad installation are delegated to localhost and will fail when unzip is not available.
This role fails to run from the container version of ansible/awx.
The workaround was to install unzip in the ansible/awx container from the playbook, before the call to this role.
A patch for ansible-nomad would be non-trivial, because the ability to install unzip on localhost from a package is not guaranteed. Perhaps the unzip dependency on localhost could be mentioned in the README.md.
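Until the README documents this, the workaround can live in the calling playbook (a sketch; the nomad_instances group name is illustrative, and it assumes the control node's package manager can install unzip):

```yaml
- hosts: nomad_instances
  pre_tasks:
    - name: Ensure unzip is available on the control node
      package:
        name: unzip
        state: present
      delegate_to: localhost
      run_once: true
  roles:
    - brianshumate.nomad
```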
Hello!
Similar to what we did for logging in the ansible-consul role, shall we enhance syslog logging for ansible-nomad?
Ownership of {{ nomad_log_dir }} would be set to syslog:adm to enable rsyslogd to write to that directory.
We would like to be able to define the syslog facility used for logging messages. I suggest we define the variable nomad_syslog_facility for this purpose.
Also, in order to have rsyslogd write Nomad logs to {{ nomad_log_dir }}, we need a corresponding rule for that syslog facility. I suggest this happens via /etc/rsyslog.d/00-nomad.conf.
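The rsyslog rule itself could be as small as the following (illustrative; the facility would come from the proposed nomad_syslog_facility variable and the path from the role's log directory):

```
# /etc/rsyslog.d/00-nomad.conf
local0.*    /var/log/nomad/nomad.log
```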
I will submit a PR for your review.
Please let me know if you have any questions.
Thanks!
-yash
https://github.com/brianshumate/ansible-nomad/blame/master/tasks/install.yml#L32
I noticed that my shell was running /bin/sh and was failing on the pipefail flag.
As a quick workaround I added this to the task:
args:
executable: /bin/bash
error:
TASK [nomad : Get Nomad package checksum] ******************************************************************************************************************************************************************
fatal: [armdocker-1.angrybear2.local -> 127.0.0.1]: FAILED! => {"changed": true, "cmd": "set -o pipefail\n grep \"nomad_0.9.1_linux_arm.zip\" \"/home/lane/GIT/lanesible/roles/nomad/files/nomad_0.9.1_SHA25
6SUMS\" | awk '{print $1}'", "delta": "0:00:00.008978", "end": "2019-05-17 03:48:05.544637", "msg": "non-zero return code", "rc": 2, "start": "2019-05-17 03:48:05.535659", "stderr": "/bin/sh: 1: set: Ille
gal option -o pipefail", "stderr_lines": ["/bin/sh: 1: set: Illegal option -o pipefail"], "stdout": "", "stdout_lines": []}
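For reference, the full task with the workaround applied might look like this (a sketch; the grep/awk pipeline is the role's own, while the nomad_pkg and nomad_shasums variable names are illustrative placeholders):

```yaml
- name: Get Nomad package checksum
  shell: |
    set -o pipefail
    grep "{{ nomad_pkg }}" "{{ role_path }}/files/{{ nomad_shasums }}" | awk '{print $1}'
  args:
    # /bin/sh is often dash, which rejects "set -o pipefail"
    executable: /bin/bash
  register: nomad_sha256
```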
On Ubuntu 16.04, both the SYSV init script and the systemd unit get installed. Moreover, the service is not enabled automatically.
The part installing the scripts works well in the Consul playbook, it could be copied from there.
Is it possible to run Nomad as a user other than root? If Nomad functionality is restricted to running jobs (with raw_fork/exec) it is fine.
When running the role on Ubuntu 19.10, I get the following error while ensuring the required OS packages are installed:
failed: [lab] (item=cgroup-bin) => {"ansible_loop_var": "item", "changed": false, "item": "cgroup-bin", "msg": "No package matching 'cgroup-bin' is available"}
After a bit of digging I found that the problem appears to be caused by cgroup-bin being deprecated and subsequently removed in 19.10 (see here).
Looking around, it seems that the packages are hard-coded in vars/Debian.yaml:
nomad_os_packages:
- cgroup-bin
- curl
- git
- libcgroup1
- unzip
I've tried many ways of overriding these variables, but it looks like I'm stuck either using --extra-vars from the command line or importing and changing the role locally to replace cgroup-bin with cgroup-tools. While the former is OK, the latter feels very wrong. I would put in a PR to update them, but I'm not sure what the best approach is, as I'm a relative newcomer to Ansible.
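One low-risk fix might be to make the package name conditional in vars/Debian.yaml using Ansible's version test (a sketch; the exact release where cgroup-bin disappeared should be double-checked):

```yaml
nomad_os_packages:
  - "{{ 'cgroup-tools' if ansible_distribution_version is version('19.10', '>=') else 'cgroup-bin' }}"
  - curl
  - git
  - libcgroup1
  - unzip
```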
Hi,
that's a very clean role, thank you for all the effort you put into it!
I have a question, though:
In tasks/install.yml there's the Download nomad task with this conditional check on line 45:
when: consul_package.stat.exists == False
Does that mean the role is not meant to be used with Consul integration?
I'm asking this because I would like to use this role together with Consul as described here for automatic service discovery of the tasks executed by Nomad.
Hey Brian,
The logging arg is ignored, and the & used to background the process backgrounds the entire daemon call. These could be fixed by adding &>> in front of the LOG_FILE and an escape in front of the &, like:
--pidfile="$PID_FILE" \
"$nomad" agent -"{{ nomad_node_role }}" -config "${CONFIG_PATH}"/base.hcl -config "${CONFIG_PATH}/{{ nomad_node_role }}.hcl" &>> "${LOG_FILE}" \&
Hi Brian!
I've noticed that here you are stopping Nomad with the SIGKILL signal.
Is it a recommended way to stop nomad?
I'm running the example Vagrantfile (using BOX_NAME=centos/6; debian works fine, BTW) and am getting the following output:
TASK [brianshumate.nomad : Start Nomad] ****************************************
fatal: [nomad1.local]: FAILED! => {"changed": false, "msg": "failed determining service state, possible typo of service name?"}
fatal: [nomad2.local]: FAILED! => {"changed": false, "msg": "failed determining service state, possible typo of service name?"}
fatal: [nomad3.local]: FAILED! => {"changed": false, "msg": "failed determining service state, possible typo of service name?"}
RUNNING HANDLER [brianshumate.nomad : restart nomad] ***************************
to retry, use: --limit @/Users/********/ansible-nomad/examples/site.retry
PLAY RECAP *********************************************************************
nomad1.local : ok=17 changed=12 unreachable=0 failed=1
nomad2.local : ok=16 changed=12 unreachable=0 failed=1
nomad3.local : ok=16 changed=12 unreachable=0 failed=1
Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.
I have a hunch that the /etc/init.d/nomad script doesn't use the binary with the correct arguments.
If I run vagrant ssh nomad1 to enter the instance shell and subsequently run sudo service nomad status, I get the following response:
[vagrant@nomad1 ~]$ sudo service nomad status
Usage: nomad [-version] [-help] [-autocomplete-(un)install] <command> [args]
Common commands:
run Run a new job or update an existing job
stop Stop a running job
status Display the status output for a resource
alloc Interact with allocations
job Interact with jobs
node Interact with nodes
agent Runs a Nomad agent
Other commands:
acl Interact with ACL policies and tokens
agent-info Display status information about the local agent
deployment Interact with deployments
eval Interact with evaluations
namespace Interact with namespaces
operator Provides cluster-level tools for Nomad operators
quota Interact with quotas
sentinel Interact with Sentinel policies
server Interact with servers
ui Open the Nomad Web UI
version Prints the Nomad version
If I look at the implementation of /etc/init.d/nomad, I see that the status section calls info, which doesn't exist as a command... valid options are status and agent-info.
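A minimal correction to the init script's status case might be (a fragment; $nomad is the variable the script already uses for the binary path):

```sh
status)
    # "nomad info" does not exist; agent-info queries the local agent
    "$nomad" agent-info > /dev/null 2>&1
    ;;
```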
This is where both centos and debian don't seem to work... not sure if this is just related to my virtualbox installation.
root@nomad1:~# nomad status
Error querying jobs: Get http://127.0.0.1:4646/v1/jobs: dial tcp 127.0.0.1:4646: connect: connection refused
root@nomad1:~# nomad agent-info
Error querying agent info: failed querying self endpoint: Get http://127.0.0.1:4646/v1/agent/self: dial tcp 127.0.0.1:4646: connect: connection refused
[dnk8n@localhost brianshumate.nomad]$ cd examples/
[dnk8n@localhost examples]$ ls
bin README_VAGRANT.md site.yml Vagrantfile vagrant_hosts
[dnk8n@localhost examples]$ pwd
/home/dnk8n/.ansible/roles/brianshumate.nomad/examples
[dnk8n@localhost examples]$ ./bin/preinstall
✅ nomad VM node information present in /etc/hosts
✅ Vagrant Hosts plugin is installed
[dnk8n@localhost examples]$ vagrant up
Bringing machine 'nomad1' up with 'virtualbox' provider...
Bringing machine 'nomad2' up with 'virtualbox' provider...
Bringing machine 'nomad3' up with 'virtualbox' provider...
==> nomad1: Importing base box 'debian/jessie64'...
==> nomad1: Matching MAC address for NAT networking...
==> nomad1: Checking if box 'debian/jessie64' version '8.11.1' is up to date...
==> nomad1: Setting the name of the VM: nomad-node1
==> nomad1: Clearing any previously set network interfaces...
==> nomad1: Preparing network interfaces based on configuration...
nomad1: Adapter 1: nat
nomad1: Adapter 2: hostonly
==> nomad1: Forwarding ports...
nomad1: 22 (guest) => 2222 (host) (adapter 1)
==> nomad1: Running 'pre-boot' VM customizations...
==> nomad1: Booting VM...
==> nomad1: Waiting for machine to boot. This may take a few minutes...
nomad1: SSH address: 127.0.0.1:2222
nomad1: SSH username: vagrant
nomad1: SSH auth method: private key
nomad1:
nomad1: Vagrant insecure key detected. Vagrant will automatically replace
nomad1: this with a newly generated keypair for better security.
nomad1:
nomad1: Inserting generated public key within guest...
nomad1: Removing insecure key from the guest if it's present...
nomad1: Key inserted! Disconnecting and reconnecting using new SSH key...
==> nomad1: Machine booted and ready!
==> nomad1: Checking for guest additions in VM...
nomad1: No guest additions were detected on the base box for this VM! Guest
nomad1: additions are required for forwarded ports, shared folders, host only
nomad1: networking, and more. If SSH fails on this machine, please install
nomad1: the guest additions and repackage the box to continue.
nomad1:
nomad1: This is not an error message; everything may continue to work properly,
nomad1: in which case you may ignore this message.
==> nomad1: Setting hostname...
==> nomad1: Configuring and enabling network interfaces...
==> nomad1: Installing rsync to the VM...
==> nomad1: Rsyncing folder: /home/dnk8n/.ansible/roles/brianshumate.nomad/examples/ => /vagrant
==> nomad1: Running provisioner: hosts...
==> nomad2: Importing base box 'debian/jessie64'...
==> nomad2: Matching MAC address for NAT networking...
==> nomad2: Checking if box 'debian/jessie64' version '8.11.1' is up to date...
==> nomad2: Setting the name of the VM: nomad-node2
==> nomad2: Fixed port collision for 22 => 2222. Now on port 2200.
==> nomad2: Clearing any previously set network interfaces...
==> nomad2: Preparing network interfaces based on configuration...
nomad2: Adapter 1: nat
nomad2: Adapter 2: hostonly
==> nomad2: Forwarding ports...
nomad2: 22 (guest) => 2200 (host) (adapter 1)
==> nomad2: Running 'pre-boot' VM customizations...
==> nomad2: Booting VM...
==> nomad2: Waiting for machine to boot. This may take a few minutes...
nomad2: SSH address: 127.0.0.1:2200
nomad2: SSH username: vagrant
nomad2: SSH auth method: private key
nomad2:
nomad2: Vagrant insecure key detected. Vagrant will automatically replace
nomad2: this with a newly generated keypair for better security.
nomad2:
nomad2: Inserting generated public key within guest...
nomad2: Removing insecure key from the guest if it's present...
nomad2: Key inserted! Disconnecting and reconnecting using new SSH key...
==> nomad2: Machine booted and ready!
==> nomad2: Checking for guest additions in VM...
nomad2: No guest additions were detected on the base box for this VM! Guest
nomad2: additions are required for forwarded ports, shared folders, host only
nomad2: networking, and more. If SSH fails on this machine, please install
nomad2: the guest additions and repackage the box to continue.
nomad2:
nomad2: This is not an error message; everything may continue to work properly,
nomad2: in which case you may ignore this message.
==> nomad2: Setting hostname...
==> nomad2: Configuring and enabling network interfaces...
==> nomad2: Installing rsync to the VM...
==> nomad2: Rsyncing folder: /home/dnk8n/.ansible/roles/brianshumate.nomad/examples/ => /vagrant
==> nomad2: Running provisioner: hosts...
==> nomad3: Importing base box 'debian/jessie64'...
==> nomad3: Matching MAC address for NAT networking...
==> nomad3: Checking if box 'debian/jessie64' version '8.11.1' is up to date...
==> nomad3: Setting the name of the VM: nomad-node3
==> nomad3: Fixed port collision for 22 => 2222. Now on port 2201.
==> nomad3: Clearing any previously set network interfaces...
==> nomad3: Preparing network interfaces based on configuration...
nomad3: Adapter 1: nat
nomad3: Adapter 2: hostonly
==> nomad3: Forwarding ports...
nomad3: 22 (guest) => 2201 (host) (adapter 1)
==> nomad3: Running 'pre-boot' VM customizations...
==> nomad3: Booting VM...
==> nomad3: Waiting for machine to boot. This may take a few minutes...
nomad3: SSH address: 127.0.0.1:2201
nomad3: SSH username: vagrant
nomad3: SSH auth method: private key
nomad3:
nomad3: Vagrant insecure key detected. Vagrant will automatically replace
nomad3: this with a newly generated keypair for better security.
nomad3:
nomad3: Inserting generated public key within guest...
nomad3: Removing insecure key from the guest if it's present...
nomad3: Key inserted! Disconnecting and reconnecting using new SSH key...
==> nomad3: Machine booted and ready!
==> nomad3: Checking for guest additions in VM...
nomad3: No guest additions were detected on the base box for this VM! Guest
nomad3: additions are required for forwarded ports, shared folders, host only
nomad3: networking, and more. If SSH fails on this machine, please install
nomad3: the guest additions and repackage the box to continue.
nomad3:
nomad3: This is not an error message; everything may continue to work properly,
nomad3: in which case you may ignore this message.
==> nomad3: Setting hostname...
==> nomad3: Configuring and enabling network interfaces...
==> nomad3: Installing rsync to the VM...
==> nomad3: Rsyncing folder: /home/dnk8n/.ansible/roles/brianshumate.nomad/examples/ => /vagrant
==> nomad3: Running provisioner: hosts...
==> nomad3: Running provisioner: ansible...
nomad3: Running ansible-playbook...
PLAY [Installing Nomad] ********************************************************
TASK [Gathering Facts] *********************************************************
[WARNING]: Platform linux on host nomad2.local is using the discovered Python
interpreter at /usr/bin/python, but future installation of another Python
interpreter could change this. See https://docs.ansible.com/ansible/2.9/referen
ce_appendices/interpreter_discovery.html for more information.
ok: [nomad2.local]
[WARNING]: Platform linux on host nomad3.local is using the discovered Python
interpreter at /usr/bin/python, but future installation of another Python
interpreter could change this. See https://docs.ansible.com/ansible/2.9/referen
ce_appendices/interpreter_discovery.html for more information.
ok: [nomad3.local]
[WARNING]: Platform linux on host nomad1.local is using the discovered Python
interpreter at /usr/bin/python, but future installation of another Python
interpreter could change this. See https://docs.ansible.com/ansible/2.9/referen
ce_appendices/interpreter_discovery.html for more information.
ok: [nomad1.local]
TASK [brianshumate.nomad : Check distribution compatibility] *******************
skipping: [nomad2.local]
skipping: [nomad1.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Fail if not a new release of Red Hat / CentOS] ******
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Fail if not a new release of Debian] ****************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Fail if not a new release of Ubuntu] ****************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Check nomad_group_name is included in groups] *******
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Include OS variables] *******************************
ok: [nomad1.local]
ok: [nomad2.local]
ok: [nomad3.local]
TASK [brianshumate.nomad : Gather facts from other servers] ********************
TASK [brianshumate.nomad : Expose bind_address, advertise_address and node_role as facts] ***
ok: [nomad1.local]
ok: [nomad2.local]
ok: [nomad3.local]
TASK [brianshumate.nomad : Add Nomad group] ************************************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Add Nomad user] *************************************
changed: [nomad2.local]
changed: [nomad3.local]
changed: [nomad1.local]
TASK [brianshumate.nomad : Install dmsetup for Ubuntu 16.04] *******************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Run dmsetup for Ubuntu 16.04] ***********************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Add Nomad user to docker group] *********************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : OS packages] ****************************************
changed: [nomad1.local] => (item=cgroup-bin)
changed: [nomad2.local] => (item=cgroup-bin)
changed: [nomad3.local] => (item=cgroup-bin)
changed: [nomad1.local] => (item=curl)
changed: [nomad2.local] => (item=curl)
changed: [nomad3.local] => (item=curl)
changed: [nomad3.local] => (item=git)
changed: [nomad2.local] => (item=git)
ok: [nomad3.local] => (item=libcgroup1)
ok: [nomad2.local] => (item=libcgroup1)
changed: [nomad1.local] => (item=git)
ok: [nomad1.local] => (item=libcgroup1)
changed: [nomad3.local] => (item=unzip)
changed: [nomad2.local] => (item=unzip)
changed: [nomad1.local] => (item=unzip)
TASK [brianshumate.nomad : Check Nomad package checksum file] ******************
ok: [nomad1.local]
TASK [brianshumate.nomad : Get Nomad package checksum file] ********************
changed: [nomad1.local]
TASK [brianshumate.nomad : Get Nomad package checksum] *************************
changed: [nomad3.local]
changed: [nomad1.local]
changed: [nomad2.local]
TASK [brianshumate.nomad : Check Nomad package file] ***************************
ok: [nomad2.local]
ok: [nomad1.local]
ok: [nomad3.local]
TASK [brianshumate.nomad : Download Nomad] *************************************
changed: [nomad2.local]
ok: [nomad3.local]
ok: [nomad1.local]
TASK [brianshumate.nomad : Create Temporary Directory for Extraction] **********
changed: [nomad2.local]
changed: [nomad1.local]
changed: [nomad3.local]
TASK [brianshumate.nomad : Unarchive Nomad] ************************************
changed: [nomad3.local]
changed: [nomad1.local]
changed: [nomad2.local]
TASK [brianshumate.nomad : Install Nomad] **************************************
changed: [nomad2.local]
changed: [nomad3.local]
changed: [nomad1.local]
TASK [brianshumate.nomad : Cleanup] ********************************************
changed: [nomad1.local]
changed: [nomad2.local]
changed: [nomad3.local]
TASK [brianshumate.nomad : Disable SELinux for Docker Driver] ******************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Create directories] *********************************
changed: [nomad2.local] => (item=/var/nomad)
changed: [nomad1.local] => (item=/var/nomad)
changed: [nomad3.local] => (item=/var/nomad)
TASK [brianshumate.nomad : Create config directory] ****************************
changed: [nomad2.local]
changed: [nomad1.local]
changed: [nomad3.local]
TASK [brianshumate.nomad : Base configuration] *********************************
changed: [nomad2.local]
changed: [nomad1.local]
changed: [nomad3.local]
TASK [brianshumate.nomad : Server configuration] *******************************
skipping: [nomad3.local]
changed: [nomad2.local]
changed: [nomad1.local]
TASK [brianshumate.nomad : Client configuration] *******************************
skipping: [nomad1.local]
skipping: [nomad2.local]
changed: [nomad3.local]
TASK [brianshumate.nomad : Custom configuration] *******************************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : SYSV init script] ***********************************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Debian init script] *********************************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : extract systemd version] ****************************
ok: [nomad3.local]
ok: [nomad1.local]
ok: [nomad2.local]
TASK [brianshumate.nomad : systemd script] *************************************
changed: [nomad1.local]
changed: [nomad3.local]
changed: [nomad2.local]
TASK [brianshumate.nomad : reload systemd daemon] ******************************
ok: [nomad3.local]
ok: [nomad2.local]
ok: [nomad1.local]
TASK [brianshumate.nomad : Start Nomad] ****************************************
changed: [nomad1.local]
changed: [nomad3.local]
changed: [nomad2.local]
TASK [Start nomad] *************************************************************
ok: [nomad1.local]
ok: [nomad2.local]
ok: [nomad3.local]
RUNNING HANDLER [brianshumate.nomad : restart nomad] ***************************
changed: [nomad3.local]
changed: [nomad2.local]
changed: [nomad1.local]
PLAY RECAP *********************************************************************
nomad1.local : ok=24 changed=15 unreachable=0 failed=0 skipped=15 rescued=0 ignored=0
nomad2.local : ok=22 changed=15 unreachable=0 failed=0 skipped=15 rescued=0 ignored=0
nomad3.local : ok=22 changed=14 unreachable=0 failed=0 skipped=15 rescued=0 ignored=0
==> nomad1: Machine 'nomad1' has a post `vagrant up` message. This is a message
==> nomad1: from the creator of the Vagrantfile, and not from Vagrant itself:
==> nomad1:
==> nomad1: Vanilla Debian box. See https://app.vagrantup.com/debian for help and bug reports
==> nomad2: Machine 'nomad2' has a post `vagrant up` message. This is a message
==> nomad2: from the creator of the Vagrantfile, and not from Vagrant itself:
==> nomad2:
==> nomad2: Vanilla Debian box. See https://app.vagrantup.com/debian for help and bug reports
==> nomad3: Machine 'nomad3' has a post `vagrant up` message. This is a message
==> nomad3: from the creator of the Vagrantfile, and not from Vagrant itself:
==> nomad3:
==> nomad3: Vanilla Debian box. See https://app.vagrantup.com/debian for help and bug reports
[dnk8n@localhost examples]$ vagrant ssh nomad1
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu Nov 19 12:56:38 2020 from 10.1.42.1
vagrant@nomad1:~$ nomad node status
Error querying node status: Get http://127.0.0.1:4646/v1/nodes: dial tcp 127.0.0.1:4646: connect: connection refused
I am getting an "illegal char error" with my client.hcl when using nomad_chroot_env.
This line seems to be incorrect:
https://github.com/brianshumate/ansible-nomad/blob/b37e501c5ff320e85fc10bc4d638285aec027867/templates/client.hcl.j2#L35
client {
enabled = true
chroot_env = {
"/etc/local.resolv.conf": "/etc/resolv.conf",
"/run/systemd/resolve": "/run/systemd/resolve"
}
}
That output is invalid HCL; the map should use = as the key/value separator:
client {
enabled = true
chroot_env = {
"/etc/local.resolv.conf" = "/etc/resolv.conf",
"/run/systemd/resolve" = "/run/systemd/resolve"
}
}
Upgrade to 0.12, and add new configuration vars as needed.
PR #88 added support for configuring Nomad's telemetry options (as previously requested in #59).
The conditional check on line 86 of base.hcl, which controls whether to output the telemetry {} stanza, contains a reference to an undefined nomad_telemetry variable, which causes the playbook to fail.
TASK [brianshumate.nomad : Base configuration] ************************************************************************
fatal: [rpi4-6.local]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'nomad_telemetry' is undefined"}
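A defensive fix is to apply a default inside the template's guard so hosts that never define nomad_telemetry skip the stanza cleanly (a sketch against base.hcl; only the guard line changes, the stanza body stays as in the template):

```jinja
{% if nomad_telemetry | default(false) | bool %}
telemetry {
  # existing telemetry options rendered here
}
{% endif %}
```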
It would be great to have automatic gossip encryption key generation and setup for nomad similar to the consul role. As of now it seems that the way to add encryption is by manually generating a key and setting the nomad_encryption variable.
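Mirroring the consul role, generation could be sketched like this (task names are illustrative; nomad operator keygen is the CLI command that produces a gossip key, and nomad_encryption is the variable mentioned above):

```yaml
- name: Generate gossip encryption key
  command: nomad operator keygen
  register: nomad_keygen_result
  run_once: true
  when: nomad_encryption is not defined

- name: Use the generated key
  set_fact:
    nomad_encryption: "{{ nomad_keygen_result.stdout }}"
  when: nomad_encryption is not defined
```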
In README.md, you mentioned that the default user is "nomad", but when I checked the defaults, it is defined as "root".
Hey Brian,
the nomad_sysvinit.j2 file is passing -pid-file "${PID_FILE}", which Nomad doesn't seem to recognize; I can't find that arg in any 0.5.6 docs.
Additionally, restart doesn't seem to work, but stop, sleep for some period, then start does. I've seen this behavior in other processes like consul, and I'm not sure what's causing it.
When we upgrade from no TLS to TLS, we need the parameter rpc_upgrade_mode = true in the TLS config.
https://learn.hashicorp.com/nomad/transport-security/enable-tls#rpc-upgrade-mode-for-nomad-servers
https://nomadproject.io/docs/configuration/tls/
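In the rendered config, the transition would look something like this (illustrative paths and values):

```hcl
tls {
  http = true
  rpc  = true

  ca_file   = "/etc/nomad.d/tls/ca.pem"
  cert_file = "/etc/nomad.d/tls/server.pem"
  key_file  = "/etc/nomad.d/tls/server-key.pem"

  # accept plaintext RPC from peers that have not been upgraded yet;
  # remove once every server speaks TLS
  rpc_upgrade_mode = true
}
```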
Namespacing is an enterprise feature that is offered in Vault. Nomad offers this feature too, but this Ansible role currently does not expose support for it.
Furthermore, the Nomad documentation specifies that a vault token does not need to be added for a client node: https://www.nomadproject.io/docs/configuration/vault#nomad-client
In order to reduce the spread of tokens, it is recommended that vault tokens only be provided directly to the server.
Nomad + Consul connect require the CNI plugins to be installed in /opt/cni/bin
As the extraction is performed locally on the ansible control node, there should be a check to install the correct packages.
Extracting the Nomad archive requires unzip (at least on RedHat/CentOS). https://github.com/brianshumate/ansible-nomad/blob/26626ea9d40cc31877f3518870f824c54d6ea61c/tasks/install.yml#L68
Without unzip installed, an error occurs:
TASK [brianshumate.nomad : Unarchive Nomad] *************************************************************************************************************************************************************************************************
fatal: [project-shared-hs03 -> 127.0.0.1]: FAILED! => {"changed": false, "msg": "Failed to find handler for \"/home/user/.ansible/tmp/ansible-tmp-1571734629.36-215237641599247/source\". Make sure the required command to extract the file is
installed. Command \"unzip\" not found. Command \"/usr/bin/gtar\" could not handle archive."}
fatal: [project-shared-hs02 -> 127.0.0.1]: FAILED! => {"changed": false, "msg": "Failed to find handler for \"/home/user/.ansible/tmp/ansible-tmp-1571734629.29-222481350635223/source\". Make sure the required command to extract the file is
installed. Command \"unzip\" not found. Command \"/usr/bin/gtar\" could not handle archive."}
fatal: [project-shared-hs01 -> 127.0.0.1]: FAILED! => {"changed": false, "msg": "Failed to find handler for \"/home/user/.ansible/tmp/ansible-tmp-1571734629.19-86320697202974/source\". Make sure the required command to extract the file is
installed. Command \"unzip\" not found. Command \"/usr/bin/gtar\" could not handle archive."}
to retry, use: --limit @/home/user/project-automation/playbooks/hashistack_2_apps.retry
PLAY RECAP **********************************************************************************************************************************************************************************************************************************
project-shared-hs01 : ok=38 changed=3 unreachable=0 failed=1
project-shared-hs02 : ok=33 changed=3 unreachable=0 failed=1
project-shared-hs03 : ok=33 changed=3 unreachable=0 failed=1
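Since the unarchive step runs on the control node (note the "-> 127.0.0.1" in the errors above), a guard task delegated to localhost would avoid this. A rough sketch; the task name and placement are my own, not the role's current code:

- name: Ensure unzip is available on the Ansible control node
  package:
    name: unzip
    state: present
  delegate_to: 127.0.0.1
  run_once: true
  become: true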
With this commit: 8828144
We lost the ability to auto-configure nomad based on consul. Let's bring that back, unless there was a good reason to drop it.
If you agree, I'll submit a merge request.
No cgroup-bin package on Debian Bullseye -- same as #75
failed: [mcsolo-3] (item=cgroup-bin) => {"ansible_loop_var": "item", "changed": false, "item": "cgroup-bin", "msg": "No package matching 'cgroup-bin' is available"}
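One way to handle this, assuming the package name is driven by a variable, is to pick cgroup-tools (the Bullseye replacement) based on the Debian major version. The variable name here is illustrative:

# hypothetical vars sketch: Debian 11 (Bullseye) ships cgroup-tools instead of cgroup-bin
nomad_cgroup_package: "{{ 'cgroup-tools' if ansible_distribution_major_version | int >= 11 else 'cgroup-bin' }}"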
nomad_user is set to root and nomad_group is set to bin by default. Therefore, the code below will change the primary group for the root user to bin.
- name: Add Nomad user
user:
name: "{{ nomad_user }}"
comment: "Nomad user"
group: "{{ nomad_group }}"
system: yes
when:
- nomad_manage_user | bool
This caused an issue with snap on my lab server: after I installed microk8s, the commands failed with the following error:
/snap/bin/microk8s
permanently dropping privs did not work: File exists
It took a while to figure out that the error above occurred because the GID for the root user was not 0. After setting the group for the root user back to root (gid 0), the issue was resolved.
The following shows how snap throws the error above when uid and gid are not 0:
https://github.com/snapcore/snapd/blob/master/cmd/snap-confine/snap-confine.c#L503-L506
Proposal: set nomad_user to nomad instead of root by default.
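A sketch of the proposed default, with a guard so the task can never modify an existing root user. The added when condition is my suggestion, not the role's current code:

# assumes defaults of nomad_user: nomad and nomad_group: nomad
- name: Add Nomad user
  user:
    name: "{{ nomad_user }}"
    comment: "Nomad user"
    group: "{{ nomad_group }}"
    system: yes
  when:
    - nomad_manage_user | bool
    - nomad_user != 'root'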
nomad_group_name
How should I use nomad_group_name? If I use the default value, I get this error: nomad_group_name must be included in groups.
All of the startup scripts are hard-coded to assume they are running on Nomad servers, but I would think they should run with -client when nomad_node_role == client. Something like:
"$nomad" agent {% if nomad_node_role == 'server' %}-server{% else %}-client{% endif %}
Will the existing nomad_options parameter work? Should we define plugin options as separate blocks and render a separate config file?
With the new support for host_volumes, this needs to be added to client.hcl.
Alternatively, maybe we can consider adding arbitrary extra text to append to base.hcl, server.hcl, and client.hcl, so features in newer versions can be configured without implementing support for everything?
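For the host_volumes part, an addition to client.hcl.j2 could look roughly like this, assuming a hypothetical nomad_host_volumes list variable (name, path, read_only per entry):

{# sketch: nomad_host_volumes is an assumed variable, not an existing role default #}
client {
{% for volume in nomad_host_volumes | default([]) %}
  host_volume "{{ volume.name }}" {
    path      = "{{ volume.path }}"
    read_only = {{ volume.read_only | default(false) | bool | lower }}
  }
{% endfor %}
}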
Need CI testing like ansible-consul.
Tasks using Jinja tests as filters will generate deprecation warnings such as these starting in Ansible 2.5. The documented requirement is Ansible 2.5, so it is worth updating to the correct syntax. I have submitted a PR.
-charles.
We need authoritative_region in server config, using for multi-region ACL.
https://nomadproject.io/guides/security/acl/#configuring-acls
https://nomadproject.io/docs/configuration/server/#authoritative_region
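A minimal sketch for server.hcl.j2, rendering the setting only when set; the nomad_authoritative_region variable name is my suggestion, not an existing role default:

{# sketch: render authoritative_region for multi-region ACL setups #}
server {
{% if nomad_authoritative_region is defined %}
  authoritative_region = "{{ nomad_authoritative_region }}"
{% endif %}
}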
Need to transition the ansible-galaxy entry to the new repo location.
All of the startup scripts are hard-coded for /usr/local/bin/nomad, but should instead use:
{{ nomad_bin_dir }}/nomad
And the install.yml task (Install Nomad) also installs to /usr/local/bin, rather than "{{ nomad_bin_dir }}".
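The install task would then reference the variable, roughly like this (the exact task shape and source path in install.yml may differ; this is a sketch):

- name: Install Nomad
  copy:
    src: "{{ nomad_checksum_file_url is defined | ternary(omit, omit) }}"  # placeholder: use the role's actual unpacked binary path here
    dest: "{{ nomad_bin_dir }}/nomad"
    owner: "{{ nomad_user }}"
    group: "{{ nomad_group }}"
    mode: "0755"
  become: true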
Hi, when I install this role via ansible-galaxy, the source appears to be missing the latest commit, which fixes a linting issue. Unfortunately this causes Ansible to fail with a syntax error:
- name: Add Nomad user to docker group
when: "{{ nomad_user }}" != "root"
^ here
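For reference, the corrected condition simply drops the Jinja delimiters, since when: already evaluates its value as a raw Jinja expression:

- name: Add Nomad user to docker group
  # (module arguments unchanged, omitted here)
  when: nomad_user != 'root'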
If you run ansible-galaxy role info brianshumate.nomad, it appears like the commit is included:
Role: brianshumate.nomad
description: Nomad cluster role
active: True
commit: 6946dfed63663709ab87dbd269bf35ce2ca1f1bc
commit_message: Fix linting issue in Docker tasks
commit_url: https://api.github.com/repos/brianshumate/ansible-nomad/git/commits/6946dfed63663709ab87dbd269bf35ce2ca1f1bc
company: Brian Shumate
created: 2017-02-23T23:14:07.668561Z
dependencies: []
download_count: 4509
forks_count: 58
galaxy_info:
author: Brian Shumate
company: Brian Shumate
galaxy_tags: ['clustering', 'monitoring', 'networking', 'scheduling', 'system']
license: BSD
min_ansible_version: 2.5
platforms: [{'name': 'Archlinux', 'versions': ['all']}, {'name': 'EL', 'versions': [6, 7]}, {'name': 'Ubuntu', 'versions': ['vivid', 'xenial']}, {'name': 'Debian', 'versions': ['jessie']}, {'name': 'Windows', 'versions': ['2012R2']}]
github_branch: master
github_repo: ansible-nomad
github_user: brianshumate
id: 15834
imported: 2019-12-17T16:44:28.033546-05:00
install_date: Sun Jan 5 19:19:35 2020
installed_version: v1.9.3
is_valid: True
issue_tracker_url: https://github.com/brianshumate/ansible-nomad/issues
license: BSD
min_ansible_version: 2.5
modified: 2019-12-17T21:44:28.045301Z
open_issues_count: 8
path: (u'/root/.ansible/roles', u'/usr/share/ansible/roles', u'/etc/ansible/roles')
role_type: ANS
stargazers_count: 103
travis_status_url: https://travis-ci.org/brianshumate/ansible-nomad.svg?branch=master
However, I can verify the incorrect code is included in the release package at: https://github.com/brianshumate/ansible-nomad/archive/v1.9.3.zip.
I'm trying to get the role to configure nomad to log in /var/log/nomad rather than /var/log/messages, just for convenience of monitoring the cluster logs. I've set group_vars for my nomad group to:
nomad_syslog_enable: false
and otherwise am using the default location for logging. Regardless of the syslog setting though, all log messages from nomad are written to /var/log/messages rather than the default location specified in the role as:
nomad_log_dir: /var/log/nomad
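For what it's worth, newer Nomad versions (0.10.4+) can write their own log file via the log_file config option, which the role's base template could render from nomad_log_dir. A sketch, assuming nomad_log_dir keeps its default; whether the role wires this up is the open question here:

{# sketch: have Nomad itself log to the directory the role creates #}
log_file = "{{ nomad_log_dir }}/nomad.log"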
The task Create directories ignores the directory variables, and should look something like:
file:
dest: "{{ item }}"
state: directory
owner: "{{ nomad_user }}"
group: "{{ nomad_group }}"
with_items:
- /opt/nomad
- /var/run/nomad
- "{{ nomad_data_dir }}"
- "{{ nomad_config_dir }}"
- "{{ nomad_log_dir }}"
Hi Brian, first of all, amazing role, thanks for this.
I'm wondering if you could add support for telemetry also? This way we can also get metrics from Nomad nodes.
Thanks,
Bruno
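A telemetry stanza in base.hcl.j2 might look like this. The nomad_telemetry_* variable names are hypothetical, not existing role defaults, and the HCL keys follow Nomad's telemetry stanza:

{% if nomad_telemetry | default(false) | bool %}
telemetry {
  prometheus_metrics         = {{ nomad_telemetry_prometheus_metrics | default(false) | bool | lower }}
  publish_allocation_metrics = {{ nomad_telemetry_publish_allocation_metrics | default(false) | bool | lower }}
  publish_node_metrics       = {{ nomad_telemetry_publish_node_metrics | default(false) | bool | lower }}
}
{% endif %}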
Hello there!
I'm trying to set up a small development cluster, but have encountered some problems with the Nomad nodes not being able to elect a leader. The important factor contributing to this problem is that I'm using Consul – without it, the setup is smooth and painless.
I've narrowed down the issue to the following code parts:
nomad_servers will be set to the length of the array containing the hosts in the nomad_group_name group.
https://github.com/brianshumate/ansible-nomad/blob/40fa2bbe95c592d215f36be72d480accbe151fb2/templates/server.hcl.j2#L4-L6
However, the array will only be populated if nomad_use_consul=no, which means that if Consul is enabled, the bootstrap_expect parameter will become 0, which confuses Nomad a lot and prevents it from electing a leader (0 = no leader should be elected).
https://github.com/brianshumate/ansible-nomad/blob/418fa2ede73ed7240b37ef7cbc5af4c4dcf6405f/defaults/main.yml#L92-L104
Wouldn't it make sense to remove the surrounding if-statement in the snippet above? Even if you use Consul, you will want to maintain a list of the server nodes, so populating nomad_servers regardless of Consul's presence feels fine to me. What do you think?
nomad_servers: "\
{% set _nomad_servers = [] %}\
{% for host in groups[nomad_group_name] %}\
{% set _nomad_node_role = hostvars[host]['nomad_node_role'] | default('client', true) %}\
{% if ( _nomad_node_role == 'server' or _nomad_node_role == 'both') %}\
{% if _nomad_servers.append(host) %}{% endif %}\
{% endif %}\
{% endfor %}\
{{ _nomad_servers }}"
If you are fine with the proposed change, I will submit a PR with a fix.
Cheers!