
ansible-role-for-splunk: An Ansible role for Splunk admins


This repository contains Splunk's official Ansible role for performing Splunk administration of remote hosts over SSH. This role can manage Splunk Enterprise and Universal Forwarders that are on Linux-based platforms (CentOS/Redhat/Ubuntu/Amazon Linux/OpenSUSE), as well as deploy configurations from Git repositories. Example playbooks and inventory files are also provided to help new Ansible users make the most out of this project.

ansible-role-for-splunk is used by the Splunk@Splunk team to manage Splunk's corporate deployment of Splunk.


Table of Contents

  1. Purpose
  2. Getting Started
  3. Extended Documentation
  4. Frequently Asked Questions
  5. Support
  6. License

Purpose

What is ansible-role-for-splunk?

ansible-role-for-splunk is a single Ansible role for deploying and administering production Splunk deployments. It supports all Splunk deployment roles (Universal Forwarder, Heavy Forwarder, Indexer, Search Head, Deployment Server, Cluster Master, SHC Deployer, DMC, License Master) as well as management of all apps and configurations (via git repositories).

This codebase is used by the Splunk@Splunk team internally to manage our deployment, so it has been thoroughly vetted since it was first developed in late 2018. For more information about Ansible best practices, check out our related .conf20 session for this project.

Design Philosophy

A few different design philosophies have been applied in the development of this project.

First, ansible-role-for-splunk was designed under the "Don't Repeat Yourself (DRY)" philosophy. This means that the project contains minimal code redundancy. If you want to fork this project and change any functionality, you only need to update the code in one place.

Second, ansible-role-for-splunk was designed to be idempotent. This means that if the system is already in the desired state that Ansible expects, it will not make any changes. This even applies to our app management code, which can update apps on search heads without modifying existing local/ files that may have been created through actions in Splunk Web. For example, if you want to upgrade an app on a search head, and your repository does not contain a local/ folder, Ansible will not touch the existing local/ folder on the search head. This is accomplished using the synchronize module. For more information on that, refer to the configure_apps.yml task description.
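To illustrate, here is a minimal sketch of how an rsync-based sync can update an app while leaving local/ untouched; the app name, staging path, and exclude list are illustrative assumptions rather than the role's exact task:

    - name: Sync app from the Ansible host to the search head, preserving local/ (illustrative)
      ansible.posix.synchronize:
        src: "/tmp/ansible_apps/my_app/"            # hypothetical staging path on the Ansible host
        dest: "{{ splunk_home }}/etc/apps/my_app/"
        delete: true                                 # remove files no longer present in the repo
        rsync_opts:
          - "--exclude=local"                        # keep changes made via Splunk Web
          - "--exclude=local.meta"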

Third, ansible-role-for-splunk was designed to manage all Splunk configurations as code. What do I mean by that? You're not going to find tasks for installing web certificates, templating indexes.conf, or managing every Splunk configuration possible. Instead, you will find that we have a generic configure_apps.yml task which can deploy any version of any git repository to any path under $SPLUNK_HOME on the hosts in your inventory. We believe that having all configurations in git repositories is the best way to perform version control and configuration management for Splunk deployments. That said, we've made a handful of exceptions:

  1. Creation of the local splunk admin user. We are able to do this securely using ansible-vault to encrypt splunk_admin_password so that we can create a user-seed.conf during the initial installation. Please note that if you do not configure the splunk_admin_password variable with a new value, an admin account will not be created when deploying a new Splunk installation via check_splunk.yml.
  2. Configuring deploymentclient.conf for Deployment Server (DS) clients. We realize that some environments may have hundreds of clientNames configured and that creating a git repository for each variation would be pretty inefficient. Therefore, we support configuring deploymentclient.conf for your Ansible-managed forwarders using variables. The current version is based on a single template that supports only the clientName and targetUri keys. However, this can be easily extended with additional variables (or static content) of your choosing.
  3. Deployment of a new search head cluster. In order to initialize a new search head cluster, we cannot rely solely on creating backend files. Therefore, the role supports deploying a new search head cluster using provided variable values that are stored in your Ansible configurations (preferably via group_vars, although host_vars or inventory variables will also work). A sketch of the variables involved follows this list.
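For example, a minimal group_vars sketch for a new search head cluster might look like the following; the values are illustrative, and the authoritative variable list lives in roles/splunk/defaults/main.yml:

    splunk_shc_label: my_shc
    splunk_shc_key: "{{ vault_shc_pass4symmkey }}"        # encrypt with ansible-vault
    splunk_shc_deployer: "https://my-deployer.example.com:8089"
    splunk_shc_rf: 3
    splunk_shc_rep_port: 9887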

Getting Started

Getting started with this role requires you to:

  1. Install Ansible (version >=v2.7 is supported and should work through v2.10)
  2. Setup your inventory correctly
  3. Configure the appropriate variables to describe the desired state of your environment
  4. Create a playbook or leverage one of the included example playbooks that specifies the deployment_task you'd like to run

Ansible Setup

Ansible only needs to be installed on the host that you want to use to manage your Splunk deployments. We recommend having a dedicated server that is used only for Ansible orchestration, but technically you can run Ansible from any host, including your laptop, as long as you have the network connectivity and credentials required to SSH into hosts that are in your Ansible inventory.

Inventory

The layout of your inventory is critical for the tasks included in ansible-role-for-splunk to run correctly. The "role" of your host is determined by it being a member of one or more inventory groups that define its Splunk role. Ansible expects each host to be a member of one of these groups and uses that membership to determine the package that should be used, the installation path, the default deployment path for app deployments, and several other things. The following group names are currently supported:

  • full
  • uf
  • clustermanager
  • deploymentserver
  • indexer
  • licensemaster
  • search
  • shdeployer
  • dmc

Note that in Ansible you may nest groups within groups, and groups within those groups, and so on. We depend on this heavily to differentiate a full Splunk installation vs a Universal Forwarder (UF) installation, and to map variables in group_vars to specific groups of hosts. You will see examples of this within the sample inventory.yml files that are included in the "environments" folder of this project.
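As a minimal illustration (host and group names are placeholders), a nested inventory might look like:

    all:
      children:
        full:
          children:
            search:
              hosts:
                sh1.example.com:
                sh2.example.com:
        uf:
          hosts:
            uf1.example.com: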

Variables

As proper usage of this role requires a thorough understanding of variables, familiarity with Ansible variable precedence is highly recommended. Almost all variables used in this role have been added to roles/splunk/defaults/main.yml (lowest precedence) for reference. Default values of "unconfigured" are automatically ignored at the task level.

Although a number of variables ship with this role, many of them configure themselves automatically when the play is executed. For example, during the upgrade check, the desired version of Splunk is based solely upon the value of splunk_package_url_full or splunk_package_url_uf. We extract the version and build numbers from the URL automagically, and then compare those values to the output of the "splunk version" command during the check_splunk.yml task to determine if an upgrade is required or not.
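For example, given a package URL like the one below (version and build are illustrative), the role would parse out the version 9.0.4 and the build hash and compare them to the installed version:

    splunk_package_url_full: "https://download.splunk.com/products/splunk/releases/9.0.4/linux/splunk-9.0.4-de405f4a7979-Linux-x86_64.tgz"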

There are a few variables that you need to configure out of the box to use this role with your environment (a minimal sketch follows the list):

  • splunk_uri_lm - The URI for your license master (e.g. https://my_license_master:8089)
  • ansible_user - The username that you want Ansible to connect as for SSH access
  • ansible_ssh_private_key_file - The file path to the private key that the Ansible user should use for SSH access authentication
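For instance, a minimal group_vars/all.yml covering these three might look like the following sketch (values are placeholders):

    splunk_uri_lm: "https://my_license_master:8089"
    ansible_user: ansible
    ansible_ssh_private_key_file: ~/.ssh/ansible_id_rsa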

In addition, you may want to configure some of the optional variables that are mentioned in roles/splunk/defaults/main.yml to manage things like splunk.secret, send Slack notifications, automatically install useful scripts or additional Linux packages, etc. For a full description of the configurable variables, refer to the comments in roles/splunk/defaults/main.yml and be sure to read up on the task descriptions in this README file.

As of the v1.0.4 release for this role, an additional variable called target_shc_group_name must be defined in the host_vars for each SHC Deployer host. This variable tells Ansible which group of hosts in the inventory contain the SHC members that the SHC Deployer host is managing. This change improves the app deployment process for SHCs by performing a REST call to the first SH in the list from the inventory group whose name matches the value of target_shc_group_name. If the SHC is not in a ready state, then the play will halt and no changes will be made. It will also automatically grab the captain URI and use the captain as the deploy target for the apply shcluster-bundle handler. An example of how target_shc_group_name should be used has been included in the sample inventory at environments/production/inventory.yml.

In order to use the app management functionality, you will need to configure the following additional variables:

git_server: ssh://[email protected]
git_key: ~/.ssh/mygit.key
git_project: FOO
git_version: bar
git_apps:
  - name: my_app
    version: master

You will find additional examples in the included sample group_vars and host_vars files. Note that you may also specify git_server, git_key, git_project, and git_version within git_apps down to the repository (name) level. You may also override the auto-configured splunk_app_deploy_path at the repository level, for example to deploy apps to $SPLUNK_HOME/etc/apps on a deployment server rather than the default of $SPLUNK_HOME/etc/deployment-apps. If not set, configure_apps.yml will determine the app deployment path based on the host's group membership within the inventory. Tip: If you only use one git server, you may want to define the git_server and related values in an all.yml group_var file.
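As an illustrative sketch (repository names are placeholders, and the relative-path form of splunk_app_deploy_path shown here is an assumption), a per-repository override might look like:

    git_apps:
      - name: org_all_indexes
        version: production
        splunk_app_deploy_path: etc/apps    # assumed override of the default deployment-apps path on a DS
      - name: org_all_forwarder_outputs     # inherits git_server/git_project/git_version from group_vars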

Configure local splunk admin password at install

splunk_admin_username: youradminusername (optional, defaults to admin)
splunk_admin_password: yourpassword (required, but see note below about encryption)

Note: If you do not configure these 2 variables, new Splunk installations will be installed without an admin account present. This has no impact on upgrades to existing installations.

Configure splunk admin password for existing installations

We recommend that the splunk_admin_username (if not using "admin") and splunk_admin_password variables be configured in either group_vars or host_vars. If you use the same username and/or password across your deployment, then an all.yml group_vars file is a great location. If you have different passwords for different hosts, then place these variables in a corresponding group_vars or host_vars file. You can then encrypt the password to use in-line with other unencrypted variables by using the following command: ansible-vault encrypt_string --ask-vault-pass 'var_value_to_encrypt' --name 'splunk_admin_password'. Once that is done, use either the --ask-vault-pass or --vault-password-file argument when running the playbook to have Ansible automatically decrypt the value for the play to use.
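The encrypt_string command prints a block that can be pasted directly into a vars file; the result looks roughly like this (ciphertext replaced with a placeholder):

    splunk_admin_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          <ciphertext lines produced by encrypt_string>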

Playbooks

The following example playbooks have been included in this project for your reference (a sample invocation follows the list):

  • splunk_app_install.yml - Install or upgrade apps on Splunk hosts using the configure_apps.yml task in the splunk role. Note that the apps you want to deploy should be defined in either host_vars or group_vars, along with a splunk_app_deploy_path. Refer to the documentation for app deployment for details.
  • splunk_install_or_upgrade.yml - Install or upgrade Splunk (or Splunk UFs) on hosts using the check_splunk.yml task in the splunk role.
  • splunk_shc_deploy.yml - Installs Splunk and initializes search head clustering on a shdeployer and group of hosts that will serve as a new search head cluster.
  • splunk_upgrade_full_stack.yml - Example playbook that demonstrates how to upgrade an entire Splunk deployment with a single-site indexer cluster and a search head cluster using the splunk role. Note: This playbook does not upgrade forwarders, although you could easily add an extra play to do that.
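As referenced above, a typical invocation of one of these playbooks might look like the following; the inventory path and vault flag depend on your setup:

    ansible-playbook -i environments/production/inventory.yml playbooks/splunk_install_or_upgrade.yml --ask-vault-pass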

Extended Documentation

This section contains additional reference documentation.

Task File Descriptions

  • add_crashlog_script.yml - Installs a bash script and cron job that will automatically clean-up splunkd crash log files. By default, every night at midnight, it will find any crash logs that are more than 7 days old and will delete them. You may change how many days of crash logs are retained by editing the cleanup_crashlogs.sh.j2 template.
  • add_diag_script.yml - Installs a bash script and cron job that will automatically clean-up splunk diag files. By default, every night at midnight, it will find any diags that are more than 30 days old and will delete them. You may change how many days of splunk diags are retained by editing the cleanup_diags.sh.j2 template.
  • add_pstack_script.yml - Copies the genpstacks.sh script to $SPLUNK_HOME/genpstacks.sh. This file is useful to have on all of your Splunk servers for when Splunk Support asks you to capture pstacks.

Note: Any task with an adhoc prefix means that it can be used independently as a deployment_task in a playbook. You can use the tasks to resolve various Splunk problems or perform one-time activities, such as decommissioning an indexer from an indexer cluster.
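For example, a one-off playbook that runs an adhoc task follows the same pattern as the included example playbooks; this sketch (group name and serial value are illustrative) decommissions indexers one at a time:

    ---
    # Illustrative: run an adhoc task against the indexer group, one host at a time
    - hosts: indexer
      serial: 1
      roles:
        - ../roles/splunk
      vars:
        deployment_task: adhoc_decom_indexer.yml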

  • adhoc_clean_dispatch.yml - This task is intended to be used for restoring service to search heads should the dispatch directory become full. You should not need to use this task in a healthy environment, but it is at your disposal should the need arise. The task will stop splunk, remove all files in the dispatch directory, and then start splunk.
  • adhoc_configure_hostname.yml - Configure a Splunk server's hostname using the value from inventory_hostname. It configures the system hostname, serverName in server.conf, and host in inputs.conf. All Splunk configuration changes are made using the ini_file module, which will preserve any other existing configurations that may exist in server.conf and/or inputs.conf.
  • adhoc_decom_indexer.yml - Executes a splunk offline --enforce-counts command. This is useful when decommissioning one or more indexers from an indexer cluster.
  • adhoc_fix_mongo.yml - Use when Splunk is in a stopped state to fix mongodb/kvstore issues. This task ensures that permissions are set correctly on mongo's splunk.key file and deletes mongod.lock if it exists.
  • adhoc_fix_server_certificate.yml - Use to delete an expired server.pem and generate a new one (default certs). Useful if your server.pem certificate has expired and you are using Splunk's default certificate for splunkd. Note that default certificates present a security risk and that their use should be avoided, if possible.
  • adhoc_kill_splunkd.yml - Some releases of Splunk have a "feature" that leaves zombie splunkd processes after a 'splunk stop'. Use this task after a 'splunk stop' to make sure that it's really stopped. Useful for upgrades on some of the 7.x releases, and automatically called by the upgrade_splunk.yml task.
  • check_splunk.yml - Check if Splunk is installed. If Splunk is not installed, it will be installed on the host. If Splunk is already installed, the task will execute a "splunk version" command on the host, and then compare the version and build number of Splunk to the version and build number of the expected version of Splunk. Note that the expected version of Splunk does not need to be statically defined; the expected Splunk version and build are automatically extracted from the value of splunk_package_url_full or splunk_package_url_uf using Jinja regex filters. This task will work for both the Universal Forwarder and full Splunk Enterprise packages. You define which host uses which package by organizing it under the appropriate group ('full' or 'uf') in your Ansible inventory.
  • check_decrypted_secret.yml - Check the decrypted value of a given pass4SymmKey. This can be called by a task to compare the desired value with the currently configured value to see if they match. This prevents unnecessary changes from being applied.
  • configure_apps.yml - This task should be called directly from a playbook in order to deploy apps or configurations (from git repositories) to Splunk hosts. Tip: Add this task to a playbook after the check_splunk.yml play. Doing so will perform an "install (or upgrade) and deploy apps" run, all in one playbook (see the example playbook after this task list).
  • configure_auditd.yml - Configure auditd filtering rules to exclude splunk launched executables. Disabled by default, but can be enabled by setting splunk_auditd_configure to true.
  • configure_authentication.yml - Uses the template identified by the splunk_authenticationconf variable to install an authentication.conf file to $SPLUNK_HOME/etc/system/local/authentication.conf. We are including this task here since Ansible is able to securely deploy an authentication.conf configuration by using ansible-vault to encrypt sensitive values such as the value of the ad_bind_password variable. Note: If you are using a common splunk.secret file, you can omit this task and instead use configure_apps.yml to deploy an authentication.conf file from a Git repository containing an authentication.conf app with pre-hashed credentials.
  • configure_bash.yml - Configures bashrc and bash_profile files for the splunk user. Please note that the templates included with this role will overwrite any existing files for the splunk user (if they exist). The templates will define a custom PS1 at the bash prompt, configure the $SPLUNK_HOME environment variable so that you can issue "splunk " without specifying the full path to the Splunk binary, and will enable auto-completion of Splunk CLI commands in bash.
  • configure_deploymentclient.yml - Generates a new deploymentclient.conf file from the deploymentclient.conf.j2 template and installs it to $SPLUNK_HOME/etc/system/local/deploymentclient.conf. This task is included automatically during new installations when values have been configured for the clientName and splunk_uri_ds variables.
  • configure_dmc.yml - Configures the DMC as an Indexer Peer in SH mode, adds the inventory hosts to the DMC as search peers, and configures the monitoring console in auto mode.
  • configure_facl.yml - Configure file system access control lists (FACLs) to allow the splunk user to read /var/log files and add the splunk user's group to /etc/audit/auditd.conf to read /var/log/audit/ directory. This allows the splunk user to read privileged files from a non-privileged system account. Note: This task is performed automatically during new installations when splunk is installed as a non-root user.
  • configure_idxc_manager.yml - Configures a Splunk host to act as a manager node using splunk_idxc_rf, splunk_idxc_sf, splunk_idxc_key, and splunk_idxc_label.
  • configure_idxc_member.yml - Configures a Splunk host as an indexer cluster member using splunk_uri_cm, splunk_idxc_rep_port, and splunk_idxc_key.
  • configure_idxc_sh.yml - Configures a search head to join an existing indexer cluster using splunk_uri_cm and splunk_idxc_key.
  • configure_license.yml - Configures the license group defined in the splunk_license_group variable. Default is Trial. Available values are Trial, Free, Enterprise, Forwarder, Manager, or Peer. If set to Peer, splunk_uri_lm must also be defined. Note: This could also be accomplished using configure_apps.yml with a git repository.
  • configure_os.yml - Increases ulimits for the splunk user and disables Transparent Huge Pages (THP) per Splunk implementation best practices.
  • configure_serverclass.yml - Generates a new serverclass.conf file from the serverclass.conf.j2 template and installs it to $SPLUNK_HOME/etc/system/local/serverclass.conf.
  • configure_shc_captain.yml - Perform a bootstrap shcluster-captain using the server list provided in splunk_shc_uri_list.
  • configure_shc_deployer.yml - Configures a Splunk host to act as a search head deployer by configuring the pass4SymmKey contained in splunk_shc_key and the shcluster_label contained in splunk_shc_label.
  • configure_shc_members.yml - Initializes search head clustering on Splunk hosts that will be participating in a new search head cluster. Relies on the values of: splunk_shc_key, splunk_shc_label, splunk_shc_deployer, splunk_shc_rf, splunk_shc_rep_port, splunkd_port, splunk_admin_username, and splunk_admin_password. Be sure to review the default values for the role for these and configure them appropriately in your group_vars.
  • configure_splunk_forwarder_meta.yml - Configures a new indexed field called splunk_forwarder and sets its default value to the value of ansible_hostname. Note that you will need to install a fields.conf on your search head(s) if you wish to use this custom indexed field.
  • configure_splunk_boot.yml - Used during installation to automatically configure splunk boot-start to the desired state. This task can also be used to enable boot-start on an existing host that does not have it enabled, or to switch from init.d to systemd, or vice-versa. The desired boot-start method is determined using the boolean value of splunk_use_initd (true=initd, false=systemd). In addition, it is possible for splunk to create a polkit rule, if using systemd, that allows the splunk_nix_user to manage the splunk service without authentication. You may also set the systemd_unit_full or the systemd_unit_uf variables to customize the service name systemd will use.
  • configure_splunk_secret.yml - Configures a common splunk.secret file from the files/authentication/splunk.secret so that pre-hashed passwords can be securely deployed. Note that changing splunk.secret will require re-encryption of any passwords that were encrypted using the previous splunk.secret since Splunk will no longer be able to decrypt them successfully.
  • configure_systemd.yml - Updates Splunk's systemd file using best practices and tips from the community. Also allows Splunk to start successfully using systemd after an upgrade without the need to run splunk ftr --accept-license.
  • configure_thp.yml - Installs a new systemd service (disable-thp) that disables THP for RedHat|CentOS systems 6.0+. This task is automatically called by the configure_os.yml task. Optionally, you can set use_tuned_thp to configure THP via tuned instead of a service. Default is false. Note: Make sure your host does not require a specific tuned profile before applying this one.
  • download_and_unarchive.yml - Downloads the appropriate Splunk package using splunk_package_url (derived automatically from the values of splunk_package_url_full or splunk_package_url_uf variables). The package is then installed to splunk_install_path (derived automatically in main.yml using the splunk_install_path and the host's membership of either a uf or full group in the inventory).
    You can set whether the download/unarchive process uses the Ansible host or whether each host downloads and unarchives the package individually by setting splunk_download_local.
    The default is true, which will download the package to the Ansible host once and unarchive it to each host from there.
    If set to false, the package will be downloaded and unarchived on each host individually; immediately after unarchiving, the package will be removed from the host.
  • install_apps.yml - Do not call install_apps.yml directly! Use configure_apps.yml - Called by configure_apps.yml to perform app installation on the Splunk host.
  • install_splunk.yml - Do not call install_splunk.yml directly! Use check_splunk.yml - Called by check_splunk.yml to install/upgrade Splunk and Splunk Universal Forwarders, as well as perform any initial configurations. This task is called by check_splunk.yml when the check determines that Splunk is not currently installed. This task will create the splunk user and splunk group, configure the bash profile for the splunk user (by calling configure_bash.yml), configure THP and ulimits (by calling configure_os.yml), download and install the appropriate Splunk package (by calling download_and_unarchive.yml), configure a common splunk.secret (by calling configure_splunk_secret.yml, if configure_secret is defined), create a deploymentclient.conf file with the splunk_ds_uri and clientName (by calling configure_deploymentclient.yml, if clientName is defined), install a user-seed.conf with a prehashed admin password (if used_seed is defined), and will then call the post_install.yml task. See post_install.yml entry for details on post-installation tasks.
  • install_utilities.yml - Installs Linux packages that are useful for troubleshooting Splunk-related issues when install_utilities: true and linux_packages is defined with a list of packages to install.
  • configure_dmesg.yml - Some distros restrict access to read dmesg for non-root users. This allows the splunk user to run the dmesg command. Defaults to false.
  • main.yml - This is the main task that will always be called when executing this role. This task sets the appropriate variables for full vs uf packages, sends a Slack notification about the play if the slack_token and slack_channel are defined, checks the current boot-start configuration to determine if it's in the expected state, and then includes the task from the role to execute against, as defined by the value of the deployment_task variable. The deployment_task variable should be defined in your playbook(s). Refer to the included example playbooks to see this in action.
  • post_install.yml - Executes post-installation tasks. Performs a touch on the .ui_login file which disables the first-time login prompt to change your password, ensures that splunk_home is owned by the correct user and group, and optionally configures three scripts to: cleanup crash logs and old diags (by calling add_crashlog_script.yml and add_diag_script.yml, respectively), and a pstack generation shell script for troubleshooting purposes (by calling add_pstack_script.yml). This task will install various Linux troubleshooting utilities (by calling install_utilities.yml) when install_utilities: true.
  • set_maintenance_mode.yml - Enables or disables maintenance mode on a cluster manager. Intended to be called by playbooks for indexer cluster upgrades/maintenance. Requires the state variable to be defined. Valid values: enabled, disabled
  • set_upgrade_state.yml - Executes a splunk upgrade-{{ peer_state }} cluster-peers command on the cluster manager. This task can be used for upgrading indexer clusters with new minor and maintenance releases of Splunk (assuming you are at Splunk v7.1.0 or higher). Refer to https://docs.splunk.com/Documentation/Splunk/latest/Indexer/Searchablerollingupgrade for more information.
  • splunk_offline.yml - Runs a splunk offline CLI command. Useful for bringing down indexers non-intrusively by allowing searches to complete before stopping splunk.
  • splunk_restart.yml - Restarts splunk via the service module. Used when waiting for a handler to run at the end of the play would be inappropriate.
  • splunk_start.yml - Starts splunk via the service module. Used when waiting for a handler to run at the end of the play would be inappropriate.
  • splunk_stop.yml - Stops splunk via the service module. Used when waiting for a handler to run at the end of the play would be inappropriate.
  • upgrade_splunk.yml - Do not call upgrade_splunk.yml directly! Use check_splunk.yml - Called by check_splunk.yml. Performs an upgrade of an existing splunk installation. Configures .bash_profile and .bashrc for splunk user (by calling configure_bash.yml), disables THP and increases ulimits (by calling configure_os.yml), kills any stale splunkd processes present when splunk_force_kill is set to True (by calling adhoc_kill_splunkd.yml). Note: You should NOT run the upgrade_splunk.yml task directly from a playbook. check_splunk.yml will call upgrade_splunk.yml if it determines that an upgrade is needed; it will then download and unarchive the new version of Splunk (by calling download_and_unarchive.yml), ensure that mongod is in a good stopped state (by calling adhoc_fix_mongo.yml), and will then perform post-installation tasks using the post_install.yml task.
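As referenced in the configure_apps.yml description above, a single playbook can chain the install/upgrade check and app deployment. A sketch of that pattern, modeled on the included example playbooks:

    ---
    # Illustrative: install/upgrade Splunk, then deploy apps, in one playbook
    - hosts: full
      roles:
        - ../roles/splunk
      vars:
        deployment_task: check_splunk.yml

    - hosts: full
      roles:
        - ../roles/splunk
      vars:
        deployment_task: configure_apps.yml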

Frequently Asked Questions

Q: What is the difference between this and splunk-ansible?

A: The splunk-ansible project was built for the docker-splunk project, which is a completely different use case. The way that docker-splunk works is by spinning-up an image that already has splunk-ansible inside of it, and then any arguments provided to Docker are passed into splunk-ansible so that it can run locally inside of the container to install and configure Splunk there. While it's a cool use case, we didn't feel that splunk-ansible met our needs as Splunk administrators to manage production Splunk deployments, so we wrote our own.

Q: When using configure_apps.yml, the play fails on the synchronize module. What gives?

A: This is due to a known Ansible bug related to password-based authentication. To work around this issue, use a key pair for SSH authentication instead by setting the ansible_user and ansible_ssh_private_key_file variables.

Support

If you have questions or need support, you can:

License

Copyright 2018-2021 Splunk.

Distributed under the terms of the Apache 2.0 license, ansible-role-for-splunk is free and open-source software.

ansible-role-for-splunk's Issues

Error: Enable splunk boot-start via systemd failing

The configure_splunk_boot.yml playbook fails with the error below for a new VM:

fatal: [lm-host]: FAILED! => {"changed": true, "cmd": ["/srv/splunk/splunk/bin/splunk", "enable", "boot-start", "-user", "splunk", "-systemd-managed", "1", "--answer-yes", "--auto-ports", "--no-prompt", "--accept-license"], "delta": "0:00:00.890923", "end": "2021-10-06 15:42:54.740175", "msg": "non-zero return code", "rc": 4, "start": "2021-10-06 15:42:53.849252", "stderr": "Copying '/srv/splunk/splunk/etc/openldap/ldap.conf.default' to '/srv/splunk/splunk/etc/openldap/ldap.conf'.\nbtool returned something in stderr: 'Couldn't open \"/srv/splunk/splunk/bin/../etc/splunk-launch.conf\": Permission denied\nCouldn't open \"/srv/splunk/splunk/bin/../etc/splunk-launch.conf\": Permission denied\nCouldn't open \"/srv/splunk/splunk/etc/splunk-launch.conf\": Permission denied\ncouldn't run \"(null)/bin/splunkd\" \"btool\": Permission denied\n'\nCouldn't open \"/srv/splunk/splunk/bin/../etc/splunk-launch.conf\": Permission denied\nCouldn't open \"/srv/splunk/splunk/bin/../etc/splunk-launch.conf\": Permission denied\nCouldn't open \"/srv/splunk/splunk/etc/splunk-launch.conf\": Permission denied\n\n\nAn error occurred: Could not create audit keys (returned 8).\nFirst-time run failed!", "stderr_lines": ["Copying '/srv/splunk/splunk/etc/openldap/ldap.conf.default' to '/srv/splunk/splunk/etc/openldap/ldap.conf'.", "btool returned something in stderr: 'Couldn't open \"/srv/splunk/splunk/bin/../etc/splunk-launch.conf\": Permission denied", "Couldn't open \"/srv/splunk/splunk/bin/../etc/splunk-launch.conf\": Permission denied", "Couldn't open \"/srv/splunk/splunk/etc/splunk-launch.conf\": Permission denied", "couldn't run \"(null)/bin/splunkd\" \"btool\": Permission denied", "'", "Couldn't open \"/srv/splunk/splunk/bin/../etc/splunk-launch.conf\": Permission denied", "Couldn't open \"/srv/splunk/splunk/bin/../etc/splunk-launch.conf\": Permission denied", "Couldn't open \"/srv/splunk/splunk/etc/splunk-launch.conf\": Permission denied", "", "", "An error occurred: Could not create audit keys (returned 8).", "First-time run failed!"], "stdout": "ERROR: Couldn't read \"/srv/splunk/splunk/etc/splunk-launch.conf\" -- maybe $SPLUNK_HOME or $SPLUNK_ETC is set wrong?\n\nThis appears to be your first time running this version of Splunk.", "stdout_lines": ["ERROR: Couldn't read \"/srv/splunk/splunk/etc/splunk-launch.conf\" -- maybe $SPLUNK_HOME or $SPLUNK_ETC is set wrong?", "", "This appears to be your first time running this version of Splunk."]}

Any thoughts on what the problem is or how it can be fixed?

Upgrade disables bootstart and does not enable after upgrade.

When running splunk_upgrade_full_stack.yml, the upgrade_splunk.yml task disables bootstart and states that it will be re-enabled by a handler. There is no handler that re-enables bootstart.

The download_and_unarchive.yml task notifies the "start splunk" handler after unarchiving the Splunk upgrade. However, after the Splunk daemon has started, bootstart can no longer be enabled.

In looking at the install_splunk.yml task there is a call to configure_splunk_boot.yml.

useradd appears to fail on CentOS 7 - "useradd: invalid home directory 'auto_determined'"

ansible-role-for-splunk fails to install Splunk for me on a CentOS 7 box with the error: "useradd: invalid home directory 'auto_determined'"

I'm using Vagrant to spin up the CentOS 7 box, so should be relatively easy to repro.

Vagrant.configure("2") do |config|

  config.vm.box = "centos/7"
  
  # config.ssh.insert_key = false
  # config.ssh.private_key_path = "/home/graeme/.ssh/id_rsa"

  config.vm.network "public_network", ip: "192.168.0.100"
  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "./ansible-role-for-splunk-backup/playbooks/splunk_install_or_upgrade.yml"
    ansible.extra_vars = "./ansible-role-for-splunk-backup/environments/development/group_vars/all.yml"
    ansible.inventory_path = "/etc/ansible/hosts"
    ansible.limit = "all"
  end
end

environments\development\inventory.yml

---
all:
  children:

    # uf: # Note that you can nest additional groups under here to use with group_vars
    #   hosts:
    #     my_first_uf:
    #     my_second_uf:

    full:
      children:
        # hf:
        #   hosts:
        #     some_hf:

        search:
          hosts:
            192.168.0.100:

environments\development\group_vars\all.yml

---
splunk_uri_lm: https://192.168.0.100:8089
ansible_user: vagrant
ansible_ssh_private_key_file: /home/graeme/vagrant/splunk/.vagrant/machines/default/virtualbox/private_key
git_server: ssh://git@mygithost:1234
git_key: ~/.ssh/my-git-key
splunk_admin_username: administrator
splunk_admin_password: testinglol

Terminal output extract:

TASK [../roles/splunk : Check if Splunk is installed] ******************************************************************
ok: [192.168.0.100]

TASK [../roles/splunk : Install Splunk if not installed] ***************************************************************
included: /home/graeme/vagrant/splunk/ansible-role-for-splunk-backup/roles/splunk/tasks/install_splunk.yml for 192.168.0.100

TASK [../roles/splunk : Add nix splunk group] **************************************************************************
ok: [192.168.0.100]

TASK [../roles/splunk : Add nix splunk user] ***************************************************************************
fatal: [192.168.0.100]: FAILED! => {"changed": false, "msg": "useradd: invalid home directory 'auto_determined'\n", "name": "splunk", "rc": 3}

PLAY RECAP *************************************************************************************************************
192.168.0.100              : ok=8    changed=0    unreachable=0    failed=1    skipped=10   rescued=0    ignored=0

Detailed extract from the debug output:

<192.168.0.100> (1, b'\r\n{"msg": "useradd: invalid home directory \'auto_determined\'\\n", "failed": true, "rc": 3, "name": "splunk", "invocation": {"module_args": {"comment": null, "ssh_key_bits": 0, "update_password": "always", "non_unique": false, "force": false, "skeleton": null, "create_home": true, "password_lock": null, "ssh_key_passphrase": null, "uid": null, "home": "auto_determined", "append": true, "ssh_key_type": "rsa", "ssh_key_comment": "ansible-generated on localhost.localdomain", "group": null, "system": false, "state": "present", "role": null, "hidden": null, "local": null, "authorization": null, "profile": null, "shell": "/bin/bash", "expires": null, "ssh_key_file": null, "groups": ["splunk"], "move_home": false, "password": null, "name": "splunk", "seuser": null, "remove": false, "login_class": null, "generate_ssh_key": null}}}\r\n', b'Shared connection to 192.168.0.100 closed.\r\n')
<192.168.0.100> Failed to connect to the host via ssh: Shared connection to 192.168.0.100 closed.
<192.168.0.100> ESTABLISH SSH CONNECTION FOR USER: vagrant
<192.168.0.100> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o 'IdentityFile="/home/graeme/vagrant/splunk/.vagrant/machines/default/virtualbox/private_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="vagrant"' -o ConnectTimeout=10 -o ControlPath=/home/graeme/.ansible/cp/095f9ea864 192.168.0.100 '/bin/sh -c '"'"'rm -f -r /home/vagrant/.ansible/tmp/ansible-tmp-1629794650.6224759-196759785565184/ > /dev/null 2>&1 && sleep 0'"'"''
<192.168.0.100> (0, b'', b'')
fatal: [192.168.0.100]: FAILED! => {
    "changed": false,
    "invocation": {
        "module_args": {
            "append": true,
            "authorization": null,
            "comment": null,
            "create_home": true,
            "expires": null,
            "force": false,
            "generate_ssh_key": null,
            "group": null,
            "groups": [
                "splunk"
            ],
            "hidden": null,
            "home": "auto_determined",
            "local": null,
            "login_class": null,
            "move_home": false,
            "name": "splunk",
            "non_unique": false,
            "password": null,
            "password_lock": null,
            "profile": null,
            "remove": false,
            "role": null,
            "seuser": null,
            "shell": "/bin/bash",
            "skeleton": null,
            "ssh_key_bits": 0,
            "ssh_key_comment": "ansible-generated on localhost.localdomain",
            "ssh_key_file": null,
            "ssh_key_passphrase": null,
            "ssh_key_type": "rsa",
            "state": "present",
            "system": false,
            "uid": null,
            "update_password": "always"
        }
    },
    "msg": "useradd: invalid home directory 'auto_determined'\n",
    "name": "splunk",
    "rc": 3
}

PLAY RECAP *************************************************************************************************************
192.168.0.100              : ok=8    changed=0    unreachable=0    failed=1    skipped=10   rescued=0    ignored=0

Bug: Slight tab issue with roles/splunk/meta/main.yml (breaking commit)

In commit 94177ac, the following lines (among others) were added to roles/splunk/meta/main.yml:

  platforms:
    - name: EL
      versions:
        - 6
        - 7
    - name: Ubuntu
      versions:
        - xenial
        - bionic
      - name: Debian
        versions:
        - jessie
        - stretch

The following lines have two extra spaces that need to be deleted:

      - name: Debian
        versions:

This is the error when attempting to run a playbook without deleting the extra lines:

/etc/ansible/ansible-role-for-splunk/playbooks$ ansible-playbook splunk_install_or_upgrade.yml -C
ERROR! Syntax Error while loading YAML.
  did not find expected key

The error appears to be in '/etc/ansible/ansible-role-for-splunk/roles/splunk/meta/main.yml': line 19, column 7, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

        - bionic
      - name: Debian
      ^ here

Thank you for all of your hard work @mason-splunk ! I really appreciate it, and I'm sure many others do too.

Enhancement: Add support for customizing the service name for UF

Could we let the user customize the service name when enabling systemd?
Currently, it is hardcoded to "SplunkForwarder" as the default.
We would just need to add the "-systemd-unit-file-name" option to the "Enable splunk boot-start via systemd" task (configure_splunk_boot.yml), expose a variable in main.yml, and pass it down to configure_splunk_boot.yml.

Unrelated question - Ansible Collection

I already posted this in splunk/splunk-ansible#644 but didn't get an answer yet, so I'm trying here too (sorry for double posting).

Ahoi there,

As we are currently on our journey to implement ITSI, I had a look into whether and what we can manage with Ansible. And "sadly" I only found this project (which isn't what we need) and ansible-collections/splunk.es (https://github.com/ansible-collections/splunk.es), which is also not what we want ... BUT: as I am already somewhat experienced with collections (I've done some things on the grafana and zabbix collections), I thought it would be awesome to kick off a splunk.itsi collection, especially as splunk.es already did a great job on the base (httpapi).
I already have started with base_service_template.

I know this is not directly about this repo, but I hope one of you can point me to the correct contacts to get this started.
The other way would be to poke the ansible-collections community leads to open a new community collection for me; I just wanted to check back here first.


Additionally, I want to add this here too: as a customer it is confusing to see two "official" approaches which may or may not do the same thing, without clarifying WHAT they do or don't do. What I would like to see are official Ansible Collections with the appropriate roles, plays, modules, and what-not. This way you could even handle versioning more directly by publishing version-appropriate collection releases.

Bug: 'changed_when' is not a valid attribute for a TaskInclude in roles/splunk/tasks/configure_splunk_boot.yml'

Minor mistake in configure_splunk_boot.yml.
We should remove the changed_when attribute.

    - name: Check if Splunk needs to be stopped if boot-start isn't configured as Ansible expects (or boot-start is not configured at all)
      include_tasks: check_splunk_status.yml
      when: >
        ((systemd_boot and splunk_use_initd) or
        (initd_boot.stat.exists and not splunk_use_initd) or
        (not systemd_boot and not initd_boot.stat.exists and not splunk_use_initd))

Enhancement: Resolve ansible-lint errors

Linting role splunk via ansible-lint... 
roles/splunk/handlers/main.yml:24: [E305] Use shell only when shell functionality is required 
roles/splunk/handlers/main.yml:29: [E305] Use shell only when shell functionality is required 
roles/splunk/handlers/main.yml:36: [E305] Use shell only when shell functionality is required 
roles/splunk/handlers/main.yml:43: [E305] Use shell only when shell functionality is required 
roles/splunk/handlers/main.yml:50: [E303] systemctl used in place of systemd module 
roles/splunk/handlers/main.yml:50: [E305] Use shell only when shell functionality is required 
roles/splunk/handlers/main.yml:55: [E305] Use shell only when shell functionality is required 
roles/splunk/tasks/adhoc_clean_dispatch.yml:5: [E301] Commands should not change things if nothing needs doing 
roles/splunk/tasks/adhoc_clean_dispatch.yml:5: [E302] rm used in place of argument state=absent to file module 
roles/splunk/tasks/adhoc_decom_indexer.yml:2: [E305] Use shell only when shell functionality is required 
roles/splunk/tasks/adhoc_fix_server_certificate.yml:4: [E301] Commands should not change things if nothing needs doing 
roles/splunk/tasks/adhoc_fix_server_certificate.yml:4: [E305] Use shell only when shell functionality is required 
roles/splunk/tasks/adhoc_kill_splunkd.yml:2: [E301] Commands should not change things if nothing needs doing 
roles/splunk/tasks/adhoc_kill_splunkd.yml:2: [E306] Shells that use pipes should set the pipefail option 
roles/splunk/tasks/check_splunk.yml:12: [E601] Don't compare to literal True/False 
roles/splunk/tasks/check_splunk.yml:21: [E305] Use shell only when shell functionality is required 
roles/splunk/tasks/check_splunk.yml:37: [E305] Use shell only when shell functionality is required 
roles/splunk/tasks/check_splunk.yml:54: [E601] Don't compare to literal True/False 
roles/splunk/tasks/configure_apps.yml:43: [E504] Do not use 'local_action', use 'delegate_to: localhost' 
roles/splunk/tasks/configure_bash.yml:4: [E206] Variables should have spaces before and after: {{ var_name }} 
roles/splunk/tasks/configure_splunk_boot.yml:2: [E305] Use shell only when shell functionality is required 
roles/splunk/tasks/configure_splunk_boot.yml:9: [E305] Use shell only when shell functionality is required 
roles/splunk/tasks/download_and_unarchive.yml:12: [E504] Do not use 'local_action', use 'delegate_to: localhost' 
roles/splunk/tasks/install_apps.yml:34: [E204] Lines should be no longer than 160 chars 
roles/splunk/tasks/install_apps.yml:61: [E601] Don't compare to literal True/False 
roles/splunk/tasks/install_splunk.yml:46: [E201] Trailing whitespace 
roles/splunk/tasks/install_splunk.yml:52: [E201] Trailing whitespace 
roles/splunk/tasks/install_utilities.yml:3: [E403] Package installs should not use latest 
roles/splunk/tasks/set_maintenance_mode.yml:2: [E305] Use shell only when shell functionality is required 
roles/splunk/tasks/set_maintenance_mode.yml:5: [E206] Variables should have spaces before and after: {{ var_name }} 
roles/splunk/tasks/set_upgrade_state.yml:2: [E305] Use shell only when shell functionality is required 
roles/splunk/tasks/set_upgrade_state.yml:5: [E206] Variables should have spaces before and after: {{ var_name }} 
roles/splunk/tasks/slack_messenger.yml:4: [E504] Do not use 'local_action', use 'delegate_to: localhost' 
roles/splunk/tasks/slack_messenger.yml:12: [E204] Lines should be no longer than 160 chars 
roles/splunk/tasks/splunk_offline.yml:2: [E301] Commands should not change things if nothing needs doing 
roles/splunk/tasks/splunk_offline.yml:2: [E305] Use shell only when shell functionality is required 
roles/splunk/tasks/splunk_restart.yml:2: [E301] Commands should not change things if nothing needs doing 
roles/splunk/tasks/splunk_restart.yml:2: [E305] Use shell only when shell functionality is required 
roles/splunk/tasks/splunk_start.yml:2: [E301] Commands should not change things if nothing needs doing 
roles/splunk/tasks/splunk_start.yml:2: [E305] Use shell only when shell functionality is required 
roles/splunk/tasks/splunk_stop.yml:2: [E301] Commands should not change things if nothing needs doing 
roles/splunk/tasks/splunk_stop.yml:2: [E305] Use shell only when shell functionality is required 

Enhancement: Change splunk status checks to return code

The stdout text from the splunk status command could change in the future. Checking the return code instead will be more future-proof:

[root@ip-10-202-21-127 ~]# /opt/splunkforwarder/bin/splunk status
splunkd is not running.
[root@ip-10-202-21-127 ~]# echo $?
3

[root@ip-10-202-21-127 ~]# /opt/splunkforwarder/bin/splunk status
splunkd is running (PID: 31021).
splunk helpers are running (PIDs: 31027).
[root@ip-10-202-21-127 ~]# echo $?
0
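For illustration, a task along these lines could treat rc 0 (running) and rc 3 (stopped) as valid outcomes and branch on the value; this is a sketch of the suggestion, not the role's current code:

    - name: Check splunkd status via return code
      command: "{{ splunk_home }}/bin/splunk status"
      register: splunk_status
      changed_when: false
      failed_when: splunk_status.rc not in [0, 3]

    - name: Start splunk only when it is stopped
      command: "{{ splunk_home }}/bin/splunk start"
      when: splunk_status.rc == 3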

Fix hardcoded splunk home path in adhoc_fix_server_certificate.yml

Please could you replace the code below :

"{{ splunk_home }}/bin/splunk createssl server-cert -d /opt/splunk/etc/auth -n server.pem"

With

"{{ splunk_home }}/bin/splunk createssl server-cert -d {{splunk_home }}/etc/auth -n server.pem"

it should be more flexible.
Thanks.

Enhancement: Move splunk.secret to a variable

Use case: Customer may want to use multiple splunk.secret files in their environment (e.g. one for UFs, another for indexers, etc.). Currently it's hardcoded to copy a file named splunk.secret from the role's files directory. Convert to a variable so that multiple files can be used/assigned via group_vars.

Reset keys when they change or Splunk secret changes

At the moment, changing splunk.secret will break everything, and rerunning the Ansible playbook will always force configs to be updated and Splunk restarted, even though the password values are exactly the same.

We should update the tasks related to passwords to do the following:

  1. Run a shell command that:
  • runs btool to get the value for the key
  • if the value is empty, exits 1
  • elif the value starts with $, uses the show-decrypted command to decrypt it and returns the value
  • else returns the value
  • registers the value into pass_result
  2. Run an ini_file task that:
  • runs only when pass_result is not the same as the password in the inventory
  • sets the password

This should automatically update the password if it or splunk.secret has changed.
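A rough sketch of the proposed flow; the btool stanza, variable names, and handler name are illustrative assumptions, and show-decrypted is assumed to be available in the installed Splunk version:

    - name: Get the currently configured pass4SymmKey, decrypted
      shell: |
        set -o pipefail
        value=$({{ splunk_home }}/bin/splunk btool server list clustering | awk '/^pass4SymmKey/ {print $3}')
        if [ -z "$value" ]; then
          exit 1
        elif [ "${value#\$}" != "$value" ]; then
          {{ splunk_home }}/bin/splunk show-decrypted --value "$value"
        else
          echo "$value"
        fi
      register: pass_result
      changed_when: false
      failed_when: false

    - name: Update pass4SymmKey only when it differs from the inventory value
      ini_file:
        path: "{{ splunk_home }}/etc/system/local/server.conf"
        section: clustering
        option: pass4SymmKey
        value: "{{ splunk_idxc_key }}"
      when: pass_result.stdout != splunk_idxc_key
      notify: restart splunk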

UF Service Restart

When installing the UF, or deploying an app to it, the role notifies the restart splunk handler. However, since the service is called splunkforwarder rather than Splunkd, the restart fails.

I suggest we move the service name to a variable based on the group membership.
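A sketch of that suggestion: derive the service name from group membership and use it in the handler (the variable name is hypothetical):

    # group-derived service name (hypothetical variable)
    splunk_service_name: "{{ 'SplunkForwarder' if 'uf' in group_names else 'Splunkd' }}"

    - name: restart splunk
      service:
        name: "{{ splunk_service_name }}"
        state: restarted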

example yaml playbooks do not pass ansible lint in vscode.

The extra "-" in splunk_install_or_upgrade.yml, for example. Also, Ansible supports include_role with tasks_from. Not sure I understand the convoluted way of calling a task from a role.

---
# Example playbook to perform either a splunk installation or upgrade
- hosts:
    - all
  roles:
    - ../roles/splunk
  serial: 50
  vars:
    - deployment_task: check_splunk.yml

Prefix group names

We need an example variable like groupname_prefix=splunk_ that, if configured, would make the group names we need to use look like splunk_uf, splunk_full, and so on.

We want to use ansible-role-for-splunk in a larger ansible repository, but we are afraid of group-name collisions

Enhancement: Install/upgrade from RPM/deb OS packages when compatible with target OS

Hi,

thanks for publishing this ansible role!

We noticed you use the tgz files regardless of whether the target OS is supported by Splunk's RPM or Debian packages.
Would you be open to a PR that uses Ansible facts to detect the target OS and uses the packages instead of the raw tar files?

We prefer using RPM packages over tgz files whenever OS packages are available.

thanks!

Splunk secret installation should happen before accepting the license

During Splunk fresh install, the splunk_license_accept.yml task is called. It creates the initial splunk.secret and the file $SPLUNK_HOME/etc/system/local/server.conf. The server.conf file has the [sslConfig] stanza and sslPassword which is encrypted with the initial splunk.secret.
The task configure_splunk_secret.yml is called afterwards, which replaces the initial splunk.secret. However, when Splunk starts, it will be unable to decrypt the sslPassword because it was encrypted with the initial splunk.secret, rendering the Splunk install broken.

The configure_splunk_secret.yml task should be moved before the splunk_license_accept.yml task.

Feature: Renaming the Splunk groups with a prefix

I have a feature request (I can make a PR too) to rename the groups in the inventory file (and in the tasks, respectively) to more Splunk-specific names, prefixed with splunk_.

I had a discussion with a customer and he had a good point: you can have multiple hosts in an inventory which are not related to Splunk. As this is a role and you can have multiple of them, group names should not be as generic as "full" or "search", since inventories are not role-dependent.

Git Best Practices

First off, this is awesome!

We're now bringing Ansible into our environment and starting git fresh. I don't want to screw this up, so I am asking for some advice.

We have three environments, DEV, PAT, and PROD, and three separate repos. This is mostly because configs that make sense in DEV may not make sense in PROD (directories are different, servers are different, etc.).

What’s the best way to set up my directory structure for git? Would it be to use a single git repo per environment? Or should I maybe use a repo just for the SHC and another repo for HF?

Looking for just some general advice and best practices. Trying to set us up so we don't pull our hair out in the future.

Thanks!

Bug: Apply Indexer Cluster Bundle - Rolling restart in progress

When using the 'apply indexer cluster bundle' handler, a run can falsely be registered as changed when the return is actually a warning that a rolling restart of the peers is in progress.

changed: [<hostname>] => {"attempts": 1, "changed": true, "cmd": "/opt/splunk/bin/splunk apply cluster-bundle --answer-yes --skip-validation -auth <creds>", "delta": "0:00:01.473747", "end": "2021-02-23 14:22:13.291236", "rc": 0, "start": "2021-02-23 14:22:11.817489", "stderr": "Cannot apply (or) validate configuration settings. Rolling restart of the peers is in progress.", "stderr_lines": ["Cannot apply (or) validate configuration settings. Rolling restart of the peers is in progress."], "stdout": "\nEncountered some errors while applying the bundle.", "stdout_lines": ["", "Encountered some errors while applying the bundle."]}

We could handle this by using changed_when to look for the string below.

Please run 'splunk show cluster-bundle-status' for checking the status of the applied bundle.
OK
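A sketch of that approach, keying changed_when off the success message (the -auth value is a placeholder, and the matched string may vary by Splunk version):

    - name: Apply indexer cluster bundle
      command: "{{ splunk_home }}/bin/splunk apply cluster-bundle --answer-yes --skip-validation -auth {{ splunk_auth }}"
      register: bundle_result
      changed_when: "'cluster-bundle-status' in bundle_result.stdout"
      failed_when: bundle_result.rc != 0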

Enhancement: Add support to perform rolling upgrades for shc and idx

Splunk has documentation on how to successfully perform rolling upgrades for both the search and indexing tiers.

Would it be possible to implement these into the splunk_upgrade_full_stack.yml, upgrade_splunk.yml, or potentially a brand new playbook/tasks file?

Splunk Documentation

https://docs.splunk.com/Documentation/Splunk/8.2.2/DistSearch/SHCrollingupgrade
https://docs.splunk.com/Documentation/Splunk/8.2.2/Indexer/Searchablerollingupgrade

Enhancement: Verify service_ready_flag before running apply shcluster-bundle

To follow the best practice described here, we should check the service_ready_flag before doing a shcluster-bundle apply.

The challenge we face is that the app repo is synced to the deployer, while we need to call the API of one of the search heads to verify the state. Looping over the 'search' host group would result in x number of queries, which isn't very helpful.

For our situation we created a host group called 'cluster_captain' which holds our preferred captain. I am aware that this isn't a perfect solution but it works.
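The target_shc_group_name mechanism described earlier in this README now covers this; a minimal sketch of such a readiness check (the REST endpoint and JSON path are assumptions based on the shcluster/status output) might look like:

    - name: Check that the SHC is ready before applying the bundle
      uri:
        url: "https://{{ groups[target_shc_group_name][0] }}:8089/services/shcluster/status?output_mode=json"
        user: "{{ splunk_admin_username }}"
        password: "{{ splunk_admin_password }}"
        validate_certs: false
      register: shc_status
      failed_when: shc_status.json.entry[0].content.captain.service_ready_flag not in [true, "1"]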

Bug: Git(lab) groups

The current setup doesn't allow the use of gitlab groups. Gitlab groups result in the following ssh string:

ssh://<user_name>@<my_instance>.gitlab.host:<group_name>/<project_name>/<repo_name>/app name>.git

In configure_apps.yml the repo string is created using the different vars supplied. When adding the colon to the git_server var, the git module takes it as part of the DNS name, thus failing to resolve it. The Ansible git module does support the use of the colon, and I have verified this works by putting the full string in there manually. I do not yet understand why it doesn't work when merging strings.

repo: "{{ item.git_server | default(git_server) }}/{{ item.git_project | default(git_project) }}/{{ item.app_name }}.git"

failed: [localhost] (item={'app_name': '<my_app>'}) => {"ansible_loop_var": "item", "changed": false, "cmd": "/usr/bin/git clone --origin origin 'ssh:********@<my_instance>.gitlab.host:<group_name>/<project_name>/<my_app>.git' /tmp/git", "item": {"app_name": "<my_app>"}, "rc": 128, "msg": "Cloning into '/tmp/git'...
ssh: Could not resolve hostname <my_instance>.gitlab.host:<group_name>: Name or service not known
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists."}

Issue: Current systemd implementation may not support workload management per the Splunk Docs systemd configuration

The systemd Splunkd.service configuration uses the following values:

    - { option: "ExecStart", value: "{{ splunk_home }}/bin/splunk start --accept-license --answer-yes --no-prompt" }
    - { option: "ExecStop", value: "{{ splunk_home }}/bin/splunk start stop" }
    - { option: "ExecReload", value: "{{ splunk_home }}/bin/splunk restart" }
    - { option: "Restart", value: "on-failure" }
    - { option: "RestartSec", value: "30s" }
    - { option: "TimeoutStopSec", value: "10min" }
    - { option: "Type", value: "forking" }
    - { option: "RemainAfterExit", value: "False" }
    - { option: "User", value: "{{ splunk_nix_user }}" }
    - { option: "Group", value: "{{ splunk_nix_group }}" }
    - { option: "PIDFile", value: "{{ splunk_home }}/var/run/splunk/splunkd.pid" }
    - { option: "LimitNOFILE", value: "1024000" }
    - { option: "LimitNPROC", value: "512000" }
    - { option: "LimitFSIZE", value: "infinity" }
    - { option: "LimitDATA", value: "infinity" }
    - { option: "LimitCORE", value: "infinity" }
    - { option: "TasksMax", value: "infinity" }

whereas the Splunk documentation asks for:

#This unit file replaces the traditional start-up script for systemd
#configurations, and is used when enabling boot-start for Splunk on
#systemd-based Linux distributions.

[Unit]
Description=Systemd service file for Splunk, generated by 'splunk enable boot-start'
After=network.target

[Service]
Type=simple
Restart=always
ExecStart=/opt/splunk/bin/splunk _internal_launch_under_systemd
KillMode=mixed
KillSignal=SIGINT
TimeoutStopSec=360
LimitNOFILE=65536
SuccessExitStatus=51 52
RestartPreventExitStatus=51
RestartForceExitStatus=52
User=splunker
Group=splunker
Delegate=true
CPUShares=1024
MemoryLimit=<value>
PermissionsStartOnly=true
ExecStartPost=/bin/bash -c "chown -R splunker:splunker /sys/fs/cgroup/cpu/system.slice/%n"
ExecStartPost=/bin/bash -c "chown -R splunker:splunker /sys/fs/cgroup/memory/system.slice/%n"

[Install]
WantedBy=multi-user.target

The current configuration does not allow workload management to be enabled via an Ansible installation (at least ExecStart and Restart need to be changed).

I'm not a Unix expert, so I don't really know the full impact of this.
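
A hedged sketch of the minimal changes in the role's option/value format, derived from the documented unit file above (not a tested configuration):

    - { option: "Type", value: "simple" }
    - { option: "Restart", value: "always" }
    - { option: "ExecStart", value: "{{ splunk_home }}/bin/splunk _internal_launch_under_systemd" }
    - { option: "KillMode", value: "mixed" }
    - { option: "KillSignal", value: "SIGINT" }
    - { option: "Delegate", value: "true" }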

Enhancement: Support multiple apps within a single git repo in configure_apps.yml

On my GitLab, I have a repo named splunk_apps that contains all the apps for the same "project":

- app_1
- app_2
- ...

Maybe I've missed something in my configuration, but it does not seem possible to install only app_1 on one server and app_2 on a second.

For me, the issue lies in the task "Download defined Git repos to local Ansible host":

    - name: Download defined Git repos to local Ansible host
      git:
        accept_hostkey: true
        repo: "{{ item.git_server | default(git_server) }}/{{ item.git_project | default(git_project) }}/{{ item.name }}"
        version: "{{ item.git_version | default(git_version) }}"
        dest: "{{ git_local_clone_path }}{{ ansible_nodename }}/{{ item.name }}"
        key_file: "{{ git_key }}"
        force: true
      loop: "{{ git_apps }}"
      delegate_to: localhost
      changed_when: false
      check_mode: false

It loops over the git_apps parameter and expects each entry to be a standalone git repo, whereas in this layout each app lives in a subdirectory of the repo configured by the git_project parameter.

Did I miss something? Maybe we can support both possibilities with a new parameter, git_multiple_app_per_repo (see the sketch below).
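
A hypothetical sketch of how such a git_multiple_app_per_repo flag could work: clone the shared repo once, then treat each git_apps entry as a subdirectory of that clone rather than a standalone repository (the variable names reuse the role's existing defaults; the flag itself is the proposal):

    - name: Download the shared apps repo to the local Ansible host
      git:
        accept_hostkey: true
        repo: "{{ git_server }}/{{ git_project }}.git"
        version: "{{ git_version }}"
        dest: "{{ git_local_clone_path }}{{ ansible_nodename }}/{{ git_project }}"
        key_file: "{{ git_key }}"
        force: true
      delegate_to: localhost
      run_once: true
      when: git_multiple_app_per_repo | default(false)

The per-app synchronize step could then source {{ git_local_clone_path }}{{ ansible_nodename }}/{{ git_project }}/{{ item.name }} for each entry.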

Enhancement: Download logic and checksums

Use Case
I want to deploy different versions of Splunk to different hosts in the same Ansible play, but not perform unnecessary downloads if there is already a local copy of the package available.

Current Implementation
download_and_unarchive.yml

- name: "Download Splunk {{ splunk_install_type }} package"
  get_url:
    url: "{{ splunk_package_url }}"
    dest: "{{ splunk_package_path }}/{{ splunk_file }}"
  delegate_to: localhost
  register: download_result
  retries: 3
  delay: 10
  until: download_result is success
  run_once: true

With different versions per host, run_once: true is undesirable because the play would fail if a local copy of the other package was not already available. The role also does not currently clean up old tarballs post-installation from the default path in the user's home directory.

Suggested Implementation (a sketch of the download step follows this list)

  1. Remove run_once: true from the download task.
  2. Before the download task, check via stat whether the desired tarball already exists locally.
  3. If the desired tarball does not exist locally, download it.
  4. If the desired tarball exists locally, check for an existing local .sha512 hash file for the tarball.
  5. If a local .sha512 file does not exist, download it.
  6. Compare the SHA-512 hash of the local tarball to the expected hash in the .sha512 file.
  7. If they do not match, remove the existing tarball and download it again.
  8. Compare the hash values again to ensure they are the same.
  9. If the hashes match, proceed.
  10. If the hashes still do not match (this should not happen), fail the play.
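
A minimal sketch of how steps 2-8 could collapse into get_url's checksum parameter, which accepts a URL to a checksum file (the .sha512 URL pattern is an assumption based on Splunk's published checksum files):

    - name: "Download Splunk {{ splunk_install_type }} package if missing or corrupt"
      get_url:
        url: "{{ splunk_package_url }}"
        dest: "{{ splunk_package_path }}/{{ splunk_file }}"
        checksum: "sha512:{{ splunk_package_url }}.sha512"
      delegate_to: localhost
      register: download_result
      retries: 3
      delay: 10
      until: download_result is success

With a checksum set, get_url skips the download when the existing local file already matches, re-downloads when it does not, and fails the play if the freshly downloaded file still mismatches. Dropping run_once: true lets each distinct version in the play get its own check.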

Enhancement: Join search heads to an indexer cluster

We recently started using this project for our deployments.

We were able to successfully get our indexer cluster and search head cluster up and running using the provided tasks, but noticed that on the manager the search heads are not shown in the indexer clustering UI; only the manager node itself is shown under the search heads.

I will happily attach our playbook and inventory if that would be helpful. It is pretty basic.
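
In case it helps the discussion, a hedged sketch of the CLI step that registers a search head with the cluster manager (the cluster_manager_host, secret, and auth variables are assumptions):

    - name: Join the search head to the indexer cluster
      command: >
        {{ splunk_home }}/bin/splunk edit cluster-config -mode searchhead
        -master_uri https://{{ cluster_manager_host }}:8089
        -secret {{ splunk_idxc_secret }}
        -auth {{ splunk_auth }}
      notify: restart splunk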

Issue: main.yml is reloading systemctl even if we use init.d

Maybe I've done something wrong on my box, but if I use init.d I should not see a trigger of the systemctl daemon reload.

Solution: trigger the reload only when we are not using init.d:

- name: reload systemctl daemon
  systemd:
    daemon_reload: true
  become: true
  when: not splunk_use_initd

Enhancement: Make sure splunk is running to allow indexer bundle push

When updating apps on the master (app_dest = 'etc/master-apps'), we should make sure Splunk is running, or the 'apply indexer cluster bundle' handler will fail.

In install_apps.yml, after the last handler-setting task (we need to make sure handler has a value). The first part is unchanged, just for reference:

- name: "Set default restart splunk handler for all other paths (e.g. etc/auth)"
  set_fact:
    handler: "restart splunk"
  when:
    - app_dest != 'etc/shcluster/apps'
    - app_dest != 'etc/deployment-apps'
    - app_dest != 'etc/master-apps'

- name: Make sure splunk is running when handler is 'apply indexer cluster bundle'
  include_tasks: splunk_start.yml
  when: handler == "apply indexer cluster bundle"
  run_once: true

Bug: SHC Rolling Restart Fails

Hi Splunkers!

I have deployed a SHC using the role; however, I am facing a strange issue when performing a rolling restart. It starts by stopping the first node but doesn't continue to start it again.

When I manually start it (systemctl start Splunkd), it continues to the next node, which faces the same issue. I am not sure how to troubleshoot this, as I can't find anyone else facing it, but I am open to taking any action.

I am installing Splunk Enterprise v8.1.3
