ansible-community / ansible-nomad
:watch: Ansible role for Nomad
Home Page: https://galaxy.ansible.com/brianshumate/nomad
License: BSD 2-Clause "Simplified" License
This is version 1.8.0, commit: b8163a1.
If an IPv6 address is specified for nomad_advertise_address, Nomad errors out with a bind: invalid argument error.
As can be seen in base.hcl, the problem is that the ports are simply appended to nomad_advertise_address. IPv6 syntax requires that the IP portion be placed within brackets, e.g.
[fdd3:ecf5:726d:5fbf:2e99:93f6:ba25:8570]:4646 rather than fdd3:ecf5:726d:5fbf:2e99:93f6:ba25:8570:4646
Followup:
I worked around this by changing base.hcl.j2 to add the ipwrap filter, as described in the Ansible docs here.
bind_addr = "{{ nomad_bind_address }}"
advertise {
  http = "{{ nomad_advertise_address | ipwrap }}:{{ nomad_ports.http }}"
  rpc  = "{{ nomad_advertise_address | ipwrap }}:{{ nomad_ports.rpc }}"
  serf = "{{ nomad_advertise_address | ipwrap }}:{{ nomad_ports.serf }}"
}
Note that the ipwrap filter requires the netaddr package to be installed on the Ansible control node. It does not need to be installed on the remote client.
I tested it with both IPv6 and IPv4 addresses and it seems to work. Your mileage may vary :-)
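For reference, the bracket-wrapping behavior of the ipwrap filter can be sketched in plain Python. This is an illustration only — the real filter is backed by the netaddr package, not this code:

```python
import ipaddress

def ipwrap(address: str) -> str:
    """Wrap IPv6 literals in brackets; pass IPv4 addresses and hostnames through.

    A plain-Python illustration of what Ansible's ipwrap filter does.
    """
    try:
        if isinstance(ipaddress.ip_address(address), ipaddress.IPv6Address):
            return f"[{address}]"
    except ValueError:
        pass  # not an IP literal (e.g. a hostname)
    return address

assert ipwrap("fdd3:ecf5:726d:5fbf:2e99:93f6:ba25:8570") == "[fdd3:ecf5:726d:5fbf:2e99:93f6:ba25:8570]"
assert ipwrap("10.1.42.1") == "10.1.42.1"
```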
I have a fix for this, could you allow me to send it as a new branch?
This is what the fix looks like:
+{% if nomad_version | replace(".", "0000") | int >= 10000000000 %}
+ # Deprecated in 1.0.0
+ backwards_compatible_metrics = "{{ nomad_telemetry_backwards_compatible_metrics | default(false) | bool | lower }}"
+ disable_tagged_metrics = "{{ nomad_telemetry_disable_tagged_metrics | default(false) | bool | lower }}"
+{% endif %}
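The version gate in the diff above relies on padding each dot with zeros so that version strings compare as integers. A quick sanity check of that trick (a standalone sketch, not part of the role):

```python
def version_key(version: str) -> int:
    """Mimic the Jinja2 expression: nomad_version | replace(".", "0000") | int."""
    return int(version.replace(".", "0000"))

# "1.0.0" pads to "10000000000", so the >= 10000000000 gate matches
# 1.0.0 and later, while 0.x releases map to much smaller integers.
assert version_key("1.0.0") == 10000000000
assert version_key("0.12.1") == 1200001
assert version_key("1.1.0") >= 10000000000
```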
v0.12.1 was released today! This has some desirable bug fixes. Let's update to the latest/greatest for the defaults.
Hello.
When I start the playbook on a CentOS 7 host where SELinux is already disabled:
cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
and:
setenforce 0 || echo $?
setenforce: SELinux is disabled
1
I get the following:
TASK [brianshumate.nomad : Stop SELinux enforcing] ******************************************************************
fatal: [myhost]: FAILED! => {"changed": true, "cmd": "setenforce 0", "delta": "0:00:00.005533", "end": "2019-05-22 17:02:33.749074", "msg": "non-zero return code", "rc": 1, "start": "2019-05-22 17:02:33.743541", "stderr": "setenforce: SELinux is disabled", "stderr_lines": ["setenforce: SELinux is disabled"], "stdout": "", "stdout_lines": []}
PLAY RECAP **********************************************************************************************************
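One possible fix: guard the task with Ansible's SELinux facts so setenforce is skipped on hosts where SELinux is already disabled. A sketch (the task body mirrors the role's existing command; the fact path assumes fact gathering is enabled):

```yaml
- name: Stop SELinux enforcing
  command: setenforce 0
  when:
    - ansible_selinux is defined
    - ansible_selinux.status == "enabled"
```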
Nomad 0.11.x is now GA
The role on Galaxy is at 1.8.0, which is broken for me because it lacks this fix: e45d6b6
As a result I get errors like:
AnsibleUndefinedVariable: 'nomad_ports_http' is undefined
Hi Brian,
I don't see a way to provide custom / local configs, like a Consul token, since the daemon is started with explicit config files rather than pointing to the config directory.
If the client and server configs were only placed based on the value of nomad_node_role, it would seem we could point to the config directory instead, and I could just create a consul.hcl file which would automatically be picked up.
Alternatively, a second config dir could be passed on to the daemon, leaving the existing setup intact.
Thoughts?
I'm trying to build a procedure that will allow me to add instances (servers and clients) to the cluster as we go along. I've noticed that if I start with a single machine configured as both client and server and then add another such server, one of them declares itself as the leader while the other one gets stuck in a "no leader" loop.
Expected: the two instances agree on a leader and the cluster is fully functional.
Actual: one instance declares itself as the leader while the other is stuck in a "no leader" loop.
I've found that the root cause of this issue is a duplicate restart of the service. When a new machine is added, it is started twice: first in the main task when enabling the service, and then again in the handler because its configuration has changed.
Only enable the service in the main task and let the handler start it.
I have a working solution for systemd-based systems which keeps other systems unchanged. I can open a PR for it.
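A sketch of the proposed split (task and handler names are illustrative):

```yaml
# tasks/main.yml -- enable only; never start or restart here
- name: Enable nomad service
  systemd:
    name: nomad
    enabled: yes

# handlers/main.yml -- the single place the service is (re)started,
# triggered once by the configuration change on first install
- name: restart nomad
  systemd:
    name: nomad
    state: restarted
    daemon_reload: yes
```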
In the rendering of the server.hcl file, server-specific variables are evaluated and the python-netaddr package is required, both of which may not be available on the client, requiring them to be defined or installed and then ignored.
I will create a merge request that addresses this by not installing the server.hcl file when nomad_node_role == 'client'.
Hi,
I have used your Ansible role to deploy Nomad on VirtualBox (my Vagrantfile).
On the VirtualBox VMs, each server has two network interfaces:
I have configured "nomad_iface" to "eth1", but my jobs bind only on eth0.
I also have "nomad_bind_address" => "0.0.0.0", because without this my server only listens on the loopback device.
Do I need to do anything else?
Thanks,
Bruno
The download and unzip steps of the Nomad installation are delegated to localhost and will fail when unzip is not available.
This role fails to run from the container version of ansible/awx.
The workaround was to install unzip in the ansible/awx container from the playbook, before the call to this role.
A patch for ansible-nomad would be non-trivial, because the ability to install unzip on localhost from a package is not guaranteed. Perhaps the unzip dependency on localhost could be mentioned in the README.md.
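Until the README documents this, the workaround can live in the calling playbook (a sketch; the nomad_instances group name is illustrative, and it assumes the control node's package manager can install unzip):

```yaml
- hosts: nomad_instances
  pre_tasks:
    - name: Ensure unzip is available on the control node
      package:
        name: unzip
        state: present
      delegate_to: localhost
      run_once: true
  roles:
    - brianshumate.nomad
```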
Hello!
Similar to what we did for logging in the ansible-consul role, shall we enhance syslog logging for ansible-nomad?
Ownership of {{ nomad_log_dir }} would be set to syslog:adm to enable rsyslogd to write to that directory.
We would like to be able to define the syslog facility used for logging messages. I suggest we define the variable nomad_syslog_facility for this purpose.
Also, in order to have rsyslogd write Nomad logs to {{ nomad_log_dir }}, we need a corresponding rule for that syslog facility. I suggest this happens via /etc/rsyslog.d/00-nomad.conf.
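The rsyslog rule itself could be as small as the following (illustrative; the facility would come from the proposed nomad_syslog_facility variable and the path from the role's log directory):

```
# /etc/rsyslog.d/00-nomad.conf
local0.*    /var/log/nomad/nomad.log
```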
I will submit a PR for your review.
Please let me know if you have any questions.
Thanks!
-yash
https://github.com/brianshumate/ansible-nomad/blame/master/tasks/install.yml#L32
I noticed that my shell was running /bin/sh and was failing on the pipefail flag.
As a quick workaround I added this to the task:
args:
executable: /bin/bash
error:
TASK [nomad : Get Nomad package checksum] ******************************************************************************************************************************************************************
fatal: [armdocker-1.angrybear2.local -> 127.0.0.1]: FAILED! => {"changed": true, "cmd": "set -o pipefail\n grep \"nomad_0.9.1_linux_arm.zip\" \"/home/lane/GIT/lanesible/roles/nomad/files/nomad_0.9.1_SHA25
6SUMS\" | awk '{print $1}'", "delta": "0:00:00.008978", "end": "2019-05-17 03:48:05.544637", "msg": "non-zero return code", "rc": 2, "start": "2019-05-17 03:48:05.535659", "stderr": "/bin/sh: 1: set: Ille
gal option -o pipefail", "stderr_lines": ["/bin/sh: 1: set: Illegal option -o pipefail"], "stdout": "", "stdout_lines": []}
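For reference, the full task with the workaround applied might look like this (a sketch; the grep/awk pipeline is the role's own, while the nomad_pkg and nomad_shasums variable names are illustrative placeholders):

```yaml
- name: Get Nomad package checksum
  shell: |
    set -o pipefail
    grep "{{ nomad_pkg }}" "{{ role_path }}/files/{{ nomad_shasums }}" | awk '{print $1}'
  args:
    # /bin/sh is often dash, which rejects "set -o pipefail"
    executable: /bin/bash
  register: nomad_sha256
```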
On Ubuntu 16.04, both the SYSV init script and the systemd unit get installed. Moreover, the service is not enabled automatically.
The part installing the scripts works well in the Consul playbook, it could be copied from there.
Is it possible to run Nomad as a user other than root? If Nomad functionality is restricted to running jobs (with raw_fork/exec) it is fine.
When running the role on Ubuntu 19.10, I get the following error while ensuring the required OS packages are installed:
failed: [lab] (item=cgroup-bin) => {"ansible_loop_var": "item", "changed": false, "item": "cgroup-bin", "msg": "No package matching 'cgroup-bin' is available"}
After a bit of digging I found that the problem appears to be caused by cgroup-bin being deprecated and subsequently removed in 19.10 (see here).
Looking around, it seems that the packages are hard-coded in vars/Debian.yaml:
nomad_os_packages:
- cgroup-bin
- curl
- git
- libcgroup1
- unzip
I've tried many ways of overriding these variables, but it looks like I'm stuck either using --extra-vars from the command line or importing and changing the role locally to replace cgroup-bin with cgroup-tools. While the former is OK, the latter feels very wrong. I would put in a PR to update them, but I'm not sure what the best approach is, as I'm a relative newcomer to Ansible.
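One low-risk fix might be to make the package name conditional in vars/Debian.yaml using Ansible's version test (a sketch; the exact release where cgroup-bin disappeared should be double-checked):

```yaml
nomad_os_packages:
  - "{{ 'cgroup-tools' if ansible_distribution_version is version('19.10', '>=') else 'cgroup-bin' }}"
  - curl
  - git
  - libcgroup1
  - unzip
```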
Hi,
that's a very clean role, thank you for all the effort you put into it!
I have a question, though:
In tasks/install.yml there's the Download nomad task with this conditional check on line 45:
when: consul_package.stat.exists == False
Does that mean the role is not meant to be used with Consul integration?
I'm asking this because I would like to use this role together with Consul as described here for automatic service discovery of the tasks executed by Nomad.
Hey Brian,
The logging arg is ignored, and the & used to background the process backgrounds the entire daemon call. These could be fixed by adding &>> in front of the LOG_FILE and an escape in front of the &, like:
--pidfile="$PID_FILE" \
"$nomad" agent -"{{ nomad_node_role }}" -config "${CONFIG_PATH}"/base.hcl -config "${CONFIG_PATH}/{{ nomad_node_role }}.hcl" &>> "${LOG_FILE}" \&
Hi Brian!
I've noticed that here you are stopping Nomad with the SIGKILL signal.
Is it a recommended way to stop nomad?
I'm running the example Vagrantfile (using BOX_NAME=centos/6; debian works fine, BTW) and am getting the following output:
TASK [brianshumate.nomad : Start Nomad] ****************************************
fatal: [nomad1.local]: FAILED! => {"changed": false, "msg": "failed determining service state, possible typo of service name?"}
fatal: [nomad2.local]: FAILED! => {"changed": false, "msg": "failed determining service state, possible typo of service name?"}
fatal: [nomad3.local]: FAILED! => {"changed": false, "msg": "failed determining service state, possible typo of service name?"}
RUNNING HANDLER [brianshumate.nomad : restart nomad] ***************************
to retry, use: --limit @/Users/********/ansible-nomad/examples/site.retry
PLAY RECAP *********************************************************************
nomad1.local : ok=17 changed=12 unreachable=0 failed=1
nomad2.local : ok=16 changed=12 unreachable=0 failed=1
nomad3.local : ok=16 changed=12 unreachable=0 failed=1
Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.
I have a hunch that the /etc/init.d/nomad script doesn't use the binary with the correct arguments.
If I run vagrant ssh nomad1 to enter the instance shell and subsequently run sudo service nomad status, I get the following response:
[vagrant@nomad1 ~]$ sudo service nomad status
Usage: nomad [-version] [-help] [-autocomplete-(un)install] <command> [args]
Common commands:
run Run a new job or update an existing job
stop Stop a running job
status Display the status output for a resource
alloc Interact with allocations
job Interact with jobs
node Interact with nodes
agent Runs a Nomad agent
Other commands:
acl Interact with ACL policies and tokens
agent-info Display status information about the local agent
deployment Interact with deployments
eval Interact with evaluations
namespace Interact with namespaces
operator Provides cluster-level tools for Nomad operators
quota Interact with quotas
sentinel Interact with Sentinel policies
server Interact with servers
ui Open the Nomad Web UI
version Prints the Nomad version
If I look at the implementation of /etc/init.d/nomad, I see that the status section calls info, which doesn't exist as a command... valid options are status and agent-info.
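A minimal correction to the init script's status case might be (a fragment; $nomad is the variable the script already uses for the binary path):

```sh
status)
    # "nomad info" does not exist; agent-info queries the local agent
    "$nomad" agent-info > /dev/null 2>&1
    ;;
```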
This is where both centos and debian don't seem to work... not sure if this is just related to my virtualbox installation.
root@nomad1:~# nomad status
Error querying jobs: Get http://127.0.0.1:4646/v1/jobs: dial tcp 127.0.0.1:4646: connect: connection refused
root@nomad1:~# nomad agent-info
Error querying agent info: failed querying self endpoint: Get http://127.0.0.1:4646/v1/agent/self: dial tcp 127.0.0.1:4646: connect: connection refused
[dnk8n@localhost brianshumate.nomad]$ cd examples/
[dnk8n@localhost examples]$ ls
bin README_VAGRANT.md site.yml Vagrantfile vagrant_hosts
[dnk8n@localhost examples]$ pwd
/home/dnk8n/.ansible/roles/brianshumate.nomad/examples
[dnk8n@localhost examples]$ ./bin/preinstall
✅ nomad VM node information present in /etc/hosts
✅ Vagrant Hosts plugin is installed
[dnk8n@localhost examples]$ vagrant up
Bringing machine 'nomad1' up with 'virtualbox' provider...
Bringing machine 'nomad2' up with 'virtualbox' provider...
Bringing machine 'nomad3' up with 'virtualbox' provider...
==> nomad1: Importing base box 'debian/jessie64'...
==> nomad1: Matching MAC address for NAT networking...
==> nomad1: Checking if box 'debian/jessie64' version '8.11.1' is up to date...
==> nomad1: Setting the name of the VM: nomad-node1
==> nomad1: Clearing any previously set network interfaces...
==> nomad1: Preparing network interfaces based on configuration...
nomad1: Adapter 1: nat
nomad1: Adapter 2: hostonly
==> nomad1: Forwarding ports...
nomad1: 22 (guest) => 2222 (host) (adapter 1)
==> nomad1: Running 'pre-boot' VM customizations...
==> nomad1: Booting VM...
==> nomad1: Waiting for machine to boot. This may take a few minutes...
nomad1: SSH address: 127.0.0.1:2222
nomad1: SSH username: vagrant
nomad1: SSH auth method: private key
nomad1:
nomad1: Vagrant insecure key detected. Vagrant will automatically replace
nomad1: this with a newly generated keypair for better security.
nomad1:
nomad1: Inserting generated public key within guest...
nomad1: Removing insecure key from the guest if it's present...
nomad1: Key inserted! Disconnecting and reconnecting using new SSH key...
==> nomad1: Machine booted and ready!
==> nomad1: Checking for guest additions in VM...
nomad1: No guest additions were detected on the base box for this VM! Guest
nomad1: additions are required for forwarded ports, shared folders, host only
nomad1: networking, and more. If SSH fails on this machine, please install
nomad1: the guest additions and repackage the box to continue.
nomad1:
nomad1: This is not an error message; everything may continue to work properly,
nomad1: in which case you may ignore this message.
==> nomad1: Setting hostname...
==> nomad1: Configuring and enabling network interfaces...
==> nomad1: Installing rsync to the VM...
==> nomad1: Rsyncing folder: /home/dnk8n/.ansible/roles/brianshumate.nomad/examples/ => /vagrant
==> nomad1: Running provisioner: hosts...
==> nomad2: Importing base box 'debian/jessie64'...
==> nomad2: Matching MAC address for NAT networking...
==> nomad2: Checking if box 'debian/jessie64' version '8.11.1' is up to date...
==> nomad2: Setting the name of the VM: nomad-node2
==> nomad2: Fixed port collision for 22 => 2222. Now on port 2200.
==> nomad2: Clearing any previously set network interfaces...
==> nomad2: Preparing network interfaces based on configuration...
nomad2: Adapter 1: nat
nomad2: Adapter 2: hostonly
==> nomad2: Forwarding ports...
nomad2: 22 (guest) => 2200 (host) (adapter 1)
==> nomad2: Running 'pre-boot' VM customizations...
==> nomad2: Booting VM...
==> nomad2: Waiting for machine to boot. This may take a few minutes...
nomad2: SSH address: 127.0.0.1:2200
nomad2: SSH username: vagrant
nomad2: SSH auth method: private key
nomad2:
nomad2: Vagrant insecure key detected. Vagrant will automatically replace
nomad2: this with a newly generated keypair for better security.
nomad2:
nomad2: Inserting generated public key within guest...
nomad2: Removing insecure key from the guest if it's present...
nomad2: Key inserted! Disconnecting and reconnecting using new SSH key...
==> nomad2: Machine booted and ready!
==> nomad2: Checking for guest additions in VM...
nomad2: No guest additions were detected on the base box for this VM! Guest
nomad2: additions are required for forwarded ports, shared folders, host only
nomad2: networking, and more. If SSH fails on this machine, please install
nomad2: the guest additions and repackage the box to continue.
nomad2:
nomad2: This is not an error message; everything may continue to work properly,
nomad2: in which case you may ignore this message.
==> nomad2: Setting hostname...
==> nomad2: Configuring and enabling network interfaces...
==> nomad2: Installing rsync to the VM...
==> nomad2: Rsyncing folder: /home/dnk8n/.ansible/roles/brianshumate.nomad/examples/ => /vagrant
==> nomad2: Running provisioner: hosts...
==> nomad3: Importing base box 'debian/jessie64'...
==> nomad3: Matching MAC address for NAT networking...
==> nomad3: Checking if box 'debian/jessie64' version '8.11.1' is up to date...
==> nomad3: Setting the name of the VM: nomad-node3
==> nomad3: Fixed port collision for 22 => 2222. Now on port 2201.
==> nomad3: Clearing any previously set network interfaces...
==> nomad3: Preparing network interfaces based on configuration...
nomad3: Adapter 1: nat
nomad3: Adapter 2: hostonly
==> nomad3: Forwarding ports...
nomad3: 22 (guest) => 2201 (host) (adapter 1)
==> nomad3: Running 'pre-boot' VM customizations...
==> nomad3: Booting VM...
==> nomad3: Waiting for machine to boot. This may take a few minutes...
nomad3: SSH address: 127.0.0.1:2201
nomad3: SSH username: vagrant
nomad3: SSH auth method: private key
nomad3:
nomad3: Vagrant insecure key detected. Vagrant will automatically replace
nomad3: this with a newly generated keypair for better security.
nomad3:
nomad3: Inserting generated public key within guest...
nomad3: Removing insecure key from the guest if it's present...
nomad3: Key inserted! Disconnecting and reconnecting using new SSH key...
==> nomad3: Machine booted and ready!
==> nomad3: Checking for guest additions in VM...
nomad3: No guest additions were detected on the base box for this VM! Guest
nomad3: additions are required for forwarded ports, shared folders, host only
nomad3: networking, and more. If SSH fails on this machine, please install
nomad3: the guest additions and repackage the box to continue.
nomad3:
nomad3: This is not an error message; everything may continue to work properly,
nomad3: in which case you may ignore this message.
==> nomad3: Setting hostname...
==> nomad3: Configuring and enabling network interfaces...
==> nomad3: Installing rsync to the VM...
==> nomad3: Rsyncing folder: /home/dnk8n/.ansible/roles/brianshumate.nomad/examples/ => /vagrant
==> nomad3: Running provisioner: hosts...
==> nomad3: Running provisioner: ansible...
nomad3: Running ansible-playbook...
PLAY [Installing Nomad] ********************************************************
TASK [Gathering Facts] *********************************************************
[WARNING]: Platform linux on host nomad2.local is using the discovered Python
interpreter at /usr/bin/python, but future installation of another Python
interpreter could change this. See https://docs.ansible.com/ansible/2.9/referen
ce_appendices/interpreter_discovery.html for more information.
ok: [nomad2.local]
[WARNING]: Platform linux on host nomad3.local is using the discovered Python
interpreter at /usr/bin/python, but future installation of another Python
interpreter could change this. See https://docs.ansible.com/ansible/2.9/referen
ce_appendices/interpreter_discovery.html for more information.
ok: [nomad3.local]
[WARNING]: Platform linux on host nomad1.local is using the discovered Python
interpreter at /usr/bin/python, but future installation of another Python
interpreter could change this. See https://docs.ansible.com/ansible/2.9/referen
ce_appendices/interpreter_discovery.html for more information.
ok: [nomad1.local]
TASK [brianshumate.nomad : Check distribution compatibility] *******************
skipping: [nomad2.local]
skipping: [nomad1.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Fail if not a new release of Red Hat / CentOS] ******
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Fail if not a new release of Debian] ****************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Fail if not a new release of Ubuntu] ****************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Check nomad_group_name is included in groups] *******
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Include OS variables] *******************************
ok: [nomad1.local]
ok: [nomad2.local]
ok: [nomad3.local]
TASK [brianshumate.nomad : Gather facts from other servers] ********************
TASK [brianshumate.nomad : Expose bind_address, advertise_address and node_role as facts] ***
ok: [nomad1.local]
ok: [nomad2.local]
ok: [nomad3.local]
TASK [brianshumate.nomad : Add Nomad group] ************************************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Add Nomad user] *************************************
changed: [nomad2.local]
changed: [nomad3.local]
changed: [nomad1.local]
TASK [brianshumate.nomad : Install dmsetup for Ubuntu 16.04] *******************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Run dmsetup for Ubuntu 16.04] ***********************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Add Nomad user to docker group] *********************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : OS packages] ****************************************
changed: [nomad1.local] => (item=cgroup-bin)
changed: [nomad2.local] => (item=cgroup-bin)
changed: [nomad3.local] => (item=cgroup-bin)
changed: [nomad1.local] => (item=curl)
changed: [nomad2.local] => (item=curl)
changed: [nomad3.local] => (item=curl)
changed: [nomad3.local] => (item=git)
changed: [nomad2.local] => (item=git)
ok: [nomad3.local] => (item=libcgroup1)
ok: [nomad2.local] => (item=libcgroup1)
changed: [nomad1.local] => (item=git)
ok: [nomad1.local] => (item=libcgroup1)
changed: [nomad3.local] => (item=unzip)
changed: [nomad2.local] => (item=unzip)
changed: [nomad1.local] => (item=unzip)
TASK [brianshumate.nomad : Check Nomad package checksum file] ******************
ok: [nomad1.local]
TASK [brianshumate.nomad : Get Nomad package checksum file] ********************
changed: [nomad1.local]
TASK [brianshumate.nomad : Get Nomad package checksum] *************************
changed: [nomad3.local]
changed: [nomad1.local]
changed: [nomad2.local]
TASK [brianshumate.nomad : Check Nomad package file] ***************************
ok: [nomad2.local]
ok: [nomad1.local]
ok: [nomad3.local]
TASK [brianshumate.nomad : Download Nomad] *************************************
changed: [nomad2.local]
ok: [nomad3.local]
ok: [nomad1.local]
TASK [brianshumate.nomad : Create Temporary Directory for Extraction] **********
changed: [nomad2.local]
changed: [nomad1.local]
changed: [nomad3.local]
TASK [brianshumate.nomad : Unarchive Nomad] ************************************
changed: [nomad3.local]
changed: [nomad1.local]
changed: [nomad2.local]
TASK [brianshumate.nomad : Install Nomad] **************************************
changed: [nomad2.local]
changed: [nomad3.local]
changed: [nomad1.local]
TASK [brianshumate.nomad : Cleanup] ********************************************
changed: [nomad1.local]
changed: [nomad2.local]
changed: [nomad3.local]
TASK [brianshumate.nomad : Disable SELinux for Docker Driver] ******************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Create directories] *********************************
changed: [nomad2.local] => (item=/var/nomad)
changed: [nomad1.local] => (item=/var/nomad)
changed: [nomad3.local] => (item=/var/nomad)
TASK [brianshumate.nomad : Create config directory] ****************************
changed: [nomad2.local]
changed: [nomad1.local]
changed: [nomad3.local]
TASK [brianshumate.nomad : Base configuration] *********************************
changed: [nomad2.local]
changed: [nomad1.local]
changed: [nomad3.local]
TASK [brianshumate.nomad : Server configuration] *******************************
skipping: [nomad3.local]
changed: [nomad2.local]
changed: [nomad1.local]
TASK [brianshumate.nomad : Client configuration] *******************************
skipping: [nomad1.local]
skipping: [nomad2.local]
changed: [nomad3.local]
TASK [brianshumate.nomad : Custom configuration] *******************************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : SYSV init script] ***********************************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : Debian init script] *********************************
skipping: [nomad1.local]
skipping: [nomad2.local]
skipping: [nomad3.local]
TASK [brianshumate.nomad : extract systemd version] ****************************
ok: [nomad3.local]
ok: [nomad1.local]
ok: [nomad2.local]
TASK [brianshumate.nomad : systemd script] *************************************
changed: [nomad1.local]
changed: [nomad3.local]
changed: [nomad2.local]
TASK [brianshumate.nomad : reload systemd daemon] ******************************
ok: [nomad3.local]
ok: [nomad2.local]
ok: [nomad1.local]
TASK [brianshumate.nomad : Start Nomad] ****************************************
changed: [nomad1.local]
changed: [nomad3.local]
changed: [nomad2.local]
TASK [Start nomad] *************************************************************
ok: [nomad1.local]
ok: [nomad2.local]
ok: [nomad3.local]
RUNNING HANDLER [brianshumate.nomad : restart nomad] ***************************
changed: [nomad3.local]
changed: [nomad2.local]
changed: [nomad1.local]
PLAY RECAP *********************************************************************
nomad1.local : ok=24 changed=15 unreachable=0 failed=0 skipped=15 rescued=0 ignored=0
nomad2.local : ok=22 changed=15 unreachable=0 failed=0 skipped=15 rescued=0 ignored=0
nomad3.local : ok=22 changed=14 unreachable=0 failed=0 skipped=15 rescued=0 ignored=0
==> nomad1: Machine 'nomad1' has a post `vagrant up` message. This is a message
==> nomad1: from the creator of the Vagrantfile, and not from Vagrant itself:
==> nomad1:
==> nomad1: Vanilla Debian box. See https://app.vagrantup.com/debian for help and bug reports
==> nomad2: Machine 'nomad2' has a post `vagrant up` message. This is a message
==> nomad2: from the creator of the Vagrantfile, and not from Vagrant itself:
==> nomad2:
==> nomad2: Vanilla Debian box. See https://app.vagrantup.com/debian for help and bug reports
==> nomad3: Machine 'nomad3' has a post `vagrant up` message. This is a message
==> nomad3: from the creator of the Vagrantfile, and not from Vagrant itself:
==> nomad3:
==> nomad3: Vanilla Debian box. See https://app.vagrantup.com/debian for help and bug reports
[dnk8n@localhost examples]$ vagrant ssh nomad1
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu Nov 19 12:56:38 2020 from 10.1.42.1
vagrant@nomad1:~$ nomad node status
Error querying node status: Get http://127.0.0.1:4646/v1/nodes: dial tcp 127.0.0.1:4646: connect: connection refused
I am getting an "illegal char error" with my client.hcl when using nomad_chroot_env.
This line seems to be incorrect:
https://github.com/brianshumate/ansible-nomad/blob/b37e501c5ff320e85fc10bc4d638285aec027867/templates/client.hcl.j2#L35
client {
enabled = true
chroot_env = {
"/etc/local.resolv.conf": "/etc/resolv.conf",
"/run/systemd/resolve": "/run/systemd/resolve"
}
}
That output is invalid HCL; the map should use = as the key/value separator:
client {
enabled = true
chroot_env = {
"/etc/local.resolv.conf" = "/etc/resolv.conf",
"/run/systemd/resolve" = "/run/systemd/resolve"
}
}
Upgrade to 0.12, and add new configuration vars as needed.
PR #88 added support for configuring Nomad's telemetry options (as previously requested in #59).
The conditional check on line 86 of base.hcl, which controls whether to output the telemetry {} stanza, contains a reference to an undefined nomad_telemetry variable, which causes the playbook to fail.
TASK [brianshumate.nomad : Base configuration] ************************************************************************
fatal: [rpi4-6.local]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'nomad_telemetry' is undefined"}
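A defensive fix is to apply a default inside the template's guard so hosts that never define nomad_telemetry skip the stanza cleanly (a sketch against base.hcl; only the guard line changes, the stanza body stays as in the template):

```jinja
{% if nomad_telemetry | default(false) | bool %}
telemetry {
  # existing telemetry options rendered here
}
{% endif %}
```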
It would be great to have automatic gossip encryption key generation and setup for nomad similar to the consul role. As of now it seems that the way to add encryption is by manually generating a key and setting the nomad_encryption variable.
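Mirroring the consul role, generation could be sketched like this (task names are illustrative; nomad operator keygen is the CLI command that produces a gossip key, and nomad_encryption is the variable mentioned above):

```yaml
- name: Generate gossip encryption key
  command: nomad operator keygen
  register: nomad_keygen_result
  run_once: true
  when: nomad_encryption is not defined

- name: Use the generated key
  set_fact:
    nomad_encryption: "{{ nomad_keygen_result.stdout }}"
  when: nomad_encryption is not defined
```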
In README.md, you mentioned that the default user is "nomad", but when I checked the defaults, it is defined as "root".
Hey Brian,
the nomad_sysvinit.j2 file is passing -pid-file "${PID_FILE}", which Nomad doesn't seem to recognize; I can't find that arg in any 0.5.6 docs.
Additionally, restart doesn't seem to work, but stop, sleep for some period, then start does. I've seen this behavior in other processes like consul, and I'm not sure what's causing it.
When we upgrade from no TLS to TLS, we need the parameter rpc_upgrade_mode = true in the TLS config.
https://learn.hashicorp.com/nomad/transport-security/enable-tls#rpc-upgrade-mode-for-nomad-servers
https://nomadproject.io/docs/configuration/tls/
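In the rendered config, the transition would look something like this (illustrative paths and values):

```hcl
tls {
  http = true
  rpc  = true

  ca_file   = "/etc/nomad.d/tls/ca.pem"
  cert_file = "/etc/nomad.d/tls/server.pem"
  key_file  = "/etc/nomad.d/tls/server-key.pem"

  # accept plaintext RPC from peers that have not been upgraded yet;
  # remove once every server speaks TLS
  rpc_upgrade_mode = true
}
```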
Namespacing is an enterprise feature that is offered in Vault. Nomad offers this feature too, but this Ansible role currently does not expose support for it.
Furthermore, the Nomad documentation specifies that a vault token does not need to be added for a client node: https://www.nomadproject.io/docs/configuration/vault#nomad-client
In order to reduce the spread of tokens, it is recommended that vault tokens only be provided directly to the server.
Nomad + Consul connect require the CNI plugins to be installed in /opt/cni/bin
As the extraction is performed locally on the ansible control node, there should be a check to install the correct packages.
Extracting the Nomad archive requires unzip (at least on RedHat/CentOS). https://github.com/brianshumate/ansible-nomad/blob/26626ea9d40cc31877f3518870f824c54d6ea61c/tasks/install.yml#L68
Without unzip installed, an error occurs:
TASK [brianshumate.nomad : Unarchive Nomad] *************************************************************************************************************************************************************************************************
fatal: [project-shared-hs03 -> 127.0.0.1]: FAILED! => {"changed": false, "msg": "Failed to find handler for \"/home/user/.ansible/tmp/ansible-tmp-1571734629.36-215237641599247/source\". Make sure the required command to extract the file is
installed. Command \"unzip\" not found. Command \"/usr/bin/gtar\" could not handle archive."}
fatal: [project-shared-hs02 -> 127.0.0.1]: FAILED! => {"changed": false, "msg": "Failed to find handler for \"/home/user/.ansible/tmp/ansible-tmp-1571734629.29-222481350635223/source\". Make sure the required command to extract the file is
installed. Command \"unzip\" not found. Command \"/usr/bin/gtar\" could not handle archive."}
fatal: [project-shared-hs01 -> 127.0.0.1]: FAILED! => {"changed": false, "msg": "Failed to find handler for \"/home/user/.ansible/tmp/ansible-tmp-1571734629.19-86320697202974/source\". Make sure the required command to extract the file is
installed. Command \"unzip\" not found. Command \"/usr/bin/gtar\" could not handle archive."}
to retry, use: --limit @/home/user/project-automation/playbooks/hashistack_2_apps.retry
PLAY RECAP **********************************************************************************************************************************************************************************************************************************
project-shared-hs01 : ok=38 changed=3 unreachable=0 failed=1
project-shared-hs02 : ok=33 changed=3 unreachable=0 failed=1
project-shared-hs03 : ok=33 changed=3 unreachable=0 failed=1
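Since the unarchive step runs on the control node (note the "-> 127.0.0.1" in the errors above), a guard task delegated to localhost would avoid this. A rough sketch; the task name and placement are my own, not the role's current code:

- name: Ensure unzip is available on the Ansible control node
  package:
    name: unzip
    state: present
  delegate_to: 127.0.0.1
  run_once: true
  become: true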
With this commit: 8828144
We lost the ability to auto-configure nomad based on consul. Let's bring that back, unless there was a good reason to drop it.
If you agree, I'll submit a merge request.
No cgroup-bin package on Debian Bullseye -- same as #75
failed: [mcsolo-3] (item=cgroup-bin) => {"ansible_loop_var": "item", "changed": false, "item": "cgroup-bin", "msg": "No package matching 'cgroup-bin' is available"}
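One way to handle this, assuming the package name is driven by a variable, is to pick cgroup-tools (the Bullseye replacement) based on the Debian major version. The variable name here is illustrative:

# hypothetical vars sketch: Debian 11 (Bullseye) ships cgroup-tools instead of cgroup-bin
nomad_cgroup_package: "{{ 'cgroup-tools' if ansible_distribution_major_version | int >= 11 else 'cgroup-bin' }}"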
nomad_user is set to root and nomad_group is set to bin by default. Therefore, the code below will change the primary group for the root user to bin.
- name: Add Nomad user
user:
name: "{{ nomad_user }}"
comment: "Nomad user"
group: "{{ nomad_group }}"
system: yes
when:
- nomad_manage_user | bool
This caused an issue with snap on my lab server: after I installed microk8s, the commands failed with the following error:
/snap/bin/microk8s
permanently dropping privs did not work: File exists
It took a while to figure out that the error above occurred because the GID for the root user was not 0. After setting the group for the root user back to root (gid 0), the issue was resolved.
The following shows how snap throws the error above when uid and gid are not 0:
https://github.com/snapcore/snapd/blob/master/cmd/snap-confine/snap-confine.c#L503-L506
Proposal: set nomad_user to nomad instead of root by default.
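A sketch of the proposed default, with a guard so the task can never modify an existing root user. The added when condition is my suggestion, not the role's current code:

# assumes defaults of nomad_user: nomad and nomad_group: nomad
- name: Add Nomad user
  user:
    name: "{{ nomad_user }}"
    comment: "Nomad user"
    group: "{{ nomad_group }}"
    system: yes
  when:
    - nomad_manage_user | bool
    - nomad_user != 'root'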
nomad_group_name
How should I use nomad_group_name? If I use the default value, I get this error: nomad_group_name must be included in groups.
All of the startup scripts are hard-coded to assume they are running on Nomad servers, but I would think they should run with -client when nomad_node_role == client. Something like:
"$nomad" agent {% if nomad_node_role == 'server' %}-server{% else %}-client{% endif %}
Will the existing nomad_options parameter work? Should we define plugin options as separate blocks and render a separate config file?
With the new support for host_volumes, this needs to be added to client.hcl.
Alternatively, maybe we can consider adding arbitrary extra text to append to base.hcl, server.hcl, and client.hcl, so features in newer versions can be configured without implementing support for everything?
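For the host_volumes part, an addition to client.hcl.j2 could look roughly like this, assuming a hypothetical nomad_host_volumes list variable (name, path, read_only per entry):

{# sketch: nomad_host_volumes is an assumed variable, not an existing role default #}
client {
{% for volume in nomad_host_volumes | default([]) %}
  host_volume "{{ volume.name }}" {
    path      = "{{ volume.path }}"
    read_only = {{ volume.read_only | default(false) | bool | lower }}
  }
{% endfor %}
}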
Need CI testing like ansible-consul.
Tasks using Jinja tests as filters will generate deprecation warnings such as these starting in Ansible 2.5. The documented requirement is Ansible 2.5, so it is worth updating to the correct syntax. I have submitted a PR.
-charles.
We need authoritative_region in server config, using for multi-region ACL.
https://nomadproject.io/guides/security/acl/#configuring-acls
https://nomadproject.io/docs/configuration/server/#authoritative_region
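A minimal sketch for server.hcl.j2, rendering the setting only when set; the nomad_authoritative_region variable name is my suggestion, not an existing role default:

{# sketch: render authoritative_region for multi-region ACL setups #}
server {
{% if nomad_authoritative_region is defined %}
  authoritative_region = "{{ nomad_authoritative_region }}"
{% endif %}
}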
Need to transition the ansible-galaxy entry to the new repo location.
All of the startup scripts are hard-coded for /usr/local/bin/nomad, but should instead use:
{{ nomad_bin_dir }}/nomad
And the install.yml task (Install Nomad) also installs to /usr/local/bin, rather than "{{ nomad_bin_dir }}".
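The install task would then reference the variable, roughly like this (the exact task shape and source path in install.yml may differ; this is a sketch):

- name: Install Nomad
  copy:
    src: "{{ nomad_checksum_file_url is defined | ternary(omit, omit) }}"  # placeholder: use the role's actual unpacked binary path here
    dest: "{{ nomad_bin_dir }}/nomad"
    owner: "{{ nomad_user }}"
    group: "{{ nomad_group }}"
    mode: "0755"
  become: true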
Hi, when I install this role via ansible-galaxy, the source appears to be missing the latest commit, which fixes a linting issue. Unfortunately this causes Ansible to fail with a syntax error:
- name: Add Nomad user to docker group
when: "{{ nomad_user }}" != "root"
^ here
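For reference, the corrected condition simply drops the Jinja delimiters, since when: already evaluates its value as a raw Jinja expression:

- name: Add Nomad user to docker group
  # (module arguments unchanged, omitted here)
  when: nomad_user != 'root'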
If you run ansible-galaxy role info brianshumate.nomad, it appears like the commit is included:
Role: brianshumate.nomad
description: Nomad cluster role
active: True
commit: 6946dfed63663709ab87dbd269bf35ce2ca1f1bc
commit_message: Fix linting issue in Docker tasks
commit_url: https://api.github.com/repos/brianshumate/ansible-nomad/git/commits/6946dfed63663709ab87dbd269bf35ce2ca1f1bc
company: Brian Shumate
created: 2017-02-23T23:14:07.668561Z
dependencies: []
download_count: 4509
forks_count: 58
galaxy_info:
author: Brian Shumate
company: Brian Shumate
galaxy_tags: ['clustering', 'monitoring', 'networking', 'scheduling', 'system']
license: BSD
min_ansible_version: 2.5
platforms: [{'name': 'Archlinux', 'versions': ['all']}, {'name': 'EL', 'versions': [6, 7]}, {'name': 'Ubuntu', 'versions': ['vivid', 'xenial']}, {'name': 'Debian', 'versions': ['jessie']}, {'name': 'Windows', 'versions': ['2012R2']}]
github_branch: master
github_repo: ansible-nomad
github_user: brianshumate
id: 15834
imported: 2019-12-17T16:44:28.033546-05:00
install_date: Sun Jan 5 19:19:35 2020
installed_version: v1.9.3
is_valid: True
issue_tracker_url: https://github.com/brianshumate/ansible-nomad/issues
license: BSD
min_ansible_version: 2.5
modified: 2019-12-17T21:44:28.045301Z
open_issues_count: 8
path: (u'/root/.ansible/roles', u'/usr/share/ansible/roles', u'/etc/ansible/roles')
role_type: ANS
stargazers_count: 103
travis_status_url: https://travis-ci.org/brianshumate/ansible-nomad.svg?branch=master
However, I can verify the incorrect code is included in the release package at: https://github.com/brianshumate/ansible-nomad/archive/v1.9.3.zip.
I'm trying to get the role to configure nomad to log in /var/log/nomad rather than /var/log/messages, just for convenience of monitoring the cluster logs. I've set group_vars for my nomad group to:
nomad_syslog_enable: false
and otherwise am using the default location for logging. Regardless of the syslog setting though, all log messages from nomad are written to /var/log/messages rather than the default location specified in the role as:
nomad_log_dir: /var/log/nomad
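For what it's worth, newer Nomad versions (0.10.4+) can write their own log file via the log_file config option, which the role's base template could render from nomad_log_dir. A sketch, assuming nomad_log_dir keeps its default; whether the role wires this up is the open question here:

{# sketch: have Nomad itself log to the directory the role creates #}
log_file = "{{ nomad_log_dir }}/nomad.log"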
The task Create directories ignores the directory variables, and should look something like:
file:
dest: "{{ item }}"
state: directory
owner: "{{ nomad_user }}"
group: "{{ nomad_group }}"
with_items:
- /opt/nomad
- /var/run/nomad
- "{{ nomad_data_dir }}"
- "{{ nomad_config_dir }}"
- "{{ nomad_log_dir }}"
Hi Brian, first of all, amazing role, thanks for this.
I'm wondering if you could add support for telemetry also? This way we can also get metrics from Nomad nodes.
Thanks,
Bruno
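A telemetry stanza in base.hcl.j2 might look like this. The nomad_telemetry_* variable names are hypothetical, not existing role defaults, and the HCL keys follow Nomad's telemetry stanza:

{% if nomad_telemetry | default(false) | bool %}
telemetry {
  prometheus_metrics         = {{ nomad_telemetry_prometheus_metrics | default(false) | bool | lower }}
  publish_allocation_metrics = {{ nomad_telemetry_publish_allocation_metrics | default(false) | bool | lower }}
  publish_node_metrics       = {{ nomad_telemetry_publish_node_metrics | default(false) | bool | lower }}
}
{% endif %}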
Hello there!
I'm trying to set up a small development cluster, but have encountered some problems with the Nomad nodes not being able to elect a leader. The important factor contributing to this problem is that I'm using Consul – without it, the setup is smooth and painless.
I've narrowed down the issue to the following code parts:
nomad_servers will be set to the length of the array containing the hosts in the nomad_group_name group.
https://github.com/brianshumate/ansible-nomad/blob/40fa2bbe95c592d215f36be72d480accbe151fb2/templates/server.hcl.j2#L4-L6
However, the array will only be populated if nomad_use_consul=no, which means that if Consul is enabled, the bootstrap_expect parameter will become 0, which confuses Nomad a lot and prevents it from electing a leader (0 = no leader should be elected).
https://github.com/brianshumate/ansible-nomad/blob/418fa2ede73ed7240b37ef7cbc5af4c4dcf6405f/defaults/main.yml#L92-L104
Wouldn't it make sense to remove the surrounding if-statement in the snippet above? Even if you use Consul, you will want to maintain a list of the server nodes, so populating nomad_servers regardless of Consul's presence feels fine to me. What do you think?
nomad_servers: "\
{% set _nomad_servers = [] %}\
{% for host in groups[nomad_group_name] %}\
{% set _nomad_node_role = hostvars[host]['nomad_node_role'] | default('client', true) %}\
{% if ( _nomad_node_role == 'server' or _nomad_node_role == 'both') %}\
{% if _nomad_servers.append(host) %}{% endif %}\
{% endif %}\
{% endfor %}\
{{ _nomad_servers }}"
If you are fine with the proposed change, I will submit a PR with a fix.
Cheers!