usmqe / usmqe-setup

QE ansible playbooks for initial and test setup

Home Page: https://usmqe-tests.readthedocs.io/en/latest/

License: Apache License 2.0

Python 89.30% Shell 10.70%

ansible qe tendrl usmqe

usmqe-setup's People

Contributors

dahorak mkudlej mbukatov fbalak ltrilety ebondare

usmqe-setup's Issues

ceph-ansible role contains hardcoded path /root/ceph-ansible-keys

The ceph-ansible role contains a hardcoded path, /root/ceph-ansible-keys, which prevents a different user (jenkins or usmqe) from using this role.

We need to find out whether this role, and ceph-ansible as a whole, can be used in a non-root environment.

Merge qe-server and qe-server-user roles into single qe-server role

Merge the qe-server and qe-server-user roles into a single qe-server role using the Ansible playbook block feature. This feature has been available since Ansible 2.3.

Move template for usm.ini file into usmqe-tests repository

We have a template file for usmqe ini file here: https://github.com/usmqe/usmqe-setup/blob/master/templates/usm.ini.j2 but this means that the defaults are duplicated in setup and tests repositories. It would be much better to convert the default example file in tests repo into the template we could reuse here in usmqe-setup.

Current defaults, examples and templates:

Related to usmqe/usmqe-tests#89

Remove Workaround: SELinux should not be disabled

Revert pull request #49
Workaround based on https://github.com/Tendrl/documentation/wiki/Tendrl-Package-Installation-Reference#selinux-configuration (commit https://github.com/Tendrl/documentation/wiki/Tendrl-Package-Installation-Reference/60ab08c13ec3240d247c71c0c6c9690ef8a47ecd)

Deprecation warning because of result|success

In some roles the syntax "result|success" is used. Ansible 2.6.0 issues a warning that this syntax is deprecated and "result is success" should be used instead.

Here is the list of roles that have that issue:

./qe-server/tasks/main.yml
./rh-python36/tasks/main.yml
./rh-python35/tasks/main.yml
./ceph-centos-repo/tasks/main.yml
./gluster-client/tasks/main.yml
./qe-munin-node/tasks/main.yml
./firewall-gluster/tasks/main.yml
./qe-evidence-probe-journald/tasks/main.yml
./qe-evidence-probe/tasks/main.yml
./epel/tasks/main.yml
./gluster-server/tasks/main.yml
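
The rewrite described above is mechanical, so it could be scripted. The following is a minimal sketch, assuming the deprecated filter is always applied to a plain variable name (optionally with dotted attributes); the function name `modernize_tests` is hypothetical and not part of this repository.

```python
import re

# Matches "<variable>|success", with optional spaces around the pipe.
# Assumption: the left-hand side is a simple (possibly dotted) variable name.
_DEPRECATED_SUCCESS = re.compile(r"\b([A-Za-z_][\w.]*)\s*\|\s*success\b")


def modernize_tests(text: str) -> str:
    """Rewrite 'result|success' style filters to the 'result is success' test syntax."""
    return _DEPRECATED_SUCCESS.sub(r"\1 is success", text)


if __name__ == "__main__":
    print(modernize_tests("  when: result|success"))
```

A regular expression cannot tell a Jinja2 expression from a shell pipe that happens to look the same, so any change produced this way should be reviewed before committing.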

Remove Workaround: switch node-agent install task with cluster creation

We should not rely on a particular placement of the node-agent installation within the CI default playbook.

The workaround was introduced in #99.

Add setup of Firefox environment for Selenium tests

Role qe-server should install and configure a Firefox/Selenium environment for web test runs. This includes a new systemd service, which would be used to control this environment.

Replace meaningless gluster-centos role with one for gluster server, gluster client and so on

We have one role which installs gluster packages, but it would be better to have separate roles for:

gluster server machine (aka storage node)
gluster client machine (aka client)

Also note that the roles don't need to do anything beyond installing packages and enabling services; the rest of the configuration would be handled by gdeploy.

etcd expects an IP address to bind to

In a system deployed with tendrl_server.yml I see this in the logs:

Mar 29 20:32:59 tendrl etcd[3499]: expected IP in URL for binding (http://tendrl:2380)
Mar 29 20:32:59 tendrl etcd[3499]: expected IP in URL for binding (http://tendrl:2379)

I suspect that name-based binding will lead to problems, since it gives less control. Should we not bind to all addresses (0.0.0.0:2379)? The only downside I can see is security, but that should be handled by the Tendrl API itself, not through network measures.
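
The condition etcd complains about can be checked before deployment. This is a minimal sketch using only the Python standard library; the helper name `binds_to_ip` is hypothetical, and the URLs are the ones from the log above plus the 0.0.0.0 alternative suggested in this issue.

```python
import ipaddress
from urllib.parse import urlsplit


def binds_to_ip(url: str) -> bool:
    """Return True if the host part of the URL is a literal IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # The host is a name such as "tendrl", not an IP address.
        return False
    return True


if __name__ == "__main__":
    for url in ("http://tendrl:2380", "http://tendrl:2379", "http://0.0.0.0:2379"):
        print(url, binds_to_ip(url))
```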

remove installation from source code

The setup playbooks contain an option for installing Tendrl from source code, which can be enabled by setting install_from = source, as described in Details On Installation From Sources.

Since this feature is no longer needed, we should remove it entirely, as originally planned.

remove firewall.tendrl.yml

remove firewall.tendrl.yml playbook

playbook qe_server.yml fails during installation of ceph-ansible package

We have a package conflict in qe_server playbook, which fails during installation of ceph-ansible package:

TASK [qe-server : Install ceph-ansible] **************************************************************************************************************************************************************************************************************************************************************************************
Thursday 29 June 2017  14:47:32 +0200 (0:00:00.537)       0:00:41.122 ********* 
fatal: [mbukatov.example.com]: FAILED! => {"changed": true, "failed": true, "msg": "Error: Package: ceph-ansible-2.2.10-38.g7ef908a.el7.noarch (ceph-ansible)\n           Requires: ansible >= 2.2.0.0\n           Installed: ansible-2.1.2.0-1.el7.noarch (@epel)\n               ansible = 2.1.2.0-1.el7\n", "rc": 1, "results": ["Loaded plugins: fastestmirror\nLoading mirror speeds from cached hostfile\n * base: mirror.slu.cz\n * extras: ftp.agh.edu.pl\n * updates: mirror.slu.cz\nResolving Dependencies\n--> Running transaction check\n---> Package ceph-ansible.noarch 0:2.2.10-38.g7ef908a.el7 will be installed\n--> Processing Dependency: ansible >= 2.2.0.0 for package: ceph-ansible-2.2.10-38.g7ef908a.el7.noarch\n--> Finished Dependency Resolution\n You could try using --skip-broken to work around the problem\n You could try running: rpm -Va --nofiles --nodigest\n"]}
	to retry, use: --limit @/home/martin/.ansible/retry-files/qe_server.retry

Expanded error message:

Error: Package: ceph-ansible-2.2.10-38.g7ef908a.el7.noarch (ceph-ansible)
           Requires: ansible >= 2.2.0.0
           Installed: ansible-2.1.2.0-1.el7.noarch (@epel)
               ansible = 2.1.2.0-1.el7
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.slu.cz
 * extras: ftp.agh.edu.pl
 * updates: mirror.slu.cz
Resolving Dependencies
--> Running transaction check
---> Package ceph-ansible.noarch 0:2.2.10-38.g7ef908a.el7 will be installed
--> Processing Dependency: ansible >= 2.2.0.0 for package: ceph-ansible-2.2.10-38.g7ef908a.el7.noarch
--> Finished Dependency Resolution
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

I'm running the playbook on my qe server with CentOS:

$ cat /etc/redhat-release 
CentOS Linux release 7.3.1611 (Core)
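
The failure comes down to a version comparison: the installed ansible 2.1.2.0 does not satisfy the requirement ansible >= 2.2.0.0. The sketch below illustrates this with a simplified comparison of dotted numeric versions; it is not the algorithm rpm or yum actually use (which also handle epochs, releases and non-numeric segments), and the function names are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Split a purely numeric dotted version such as '2.1.2.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(installed: str, required: str) -> bool:
    """Return True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    # Versions taken from the error message above.
    print(satisfies_minimum("2.1.2.0", "2.2.0.0"))
```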

Remove Workaround: update ansible

The required version of Ansible should be installed automatically (as a dependency of the particular package).
The workaround was introduced in #202 and #205.

tendrl_server.yml and tendrl_node.yml use root user for all operations

The option to use a non root user with "become: True" should be allowed but perhaps commented out. This gives better security and logging via the system sudo facilities.

Wrong tendrl-monitoring-integration port in firewall.tendrl.yml

Port opened for tendrl-monitoring-integration should be 8789, not 8989 (firewall.tendrl.yml):

  - name: Enable port for tendrl-monitoring-integration
    firewalld:
      port=8989/tcp
      zone=public permanent=true state=enabled immediate=true

see https://github.com/Tendrl/documentation/wiki/Tendrl-firewall-settings

Gluster Client setup may not survive reboot

When I reboot all machines (including the Tendrl machine, GlusterFS servers and a client), it can happen that a client is not able to mount the volume during boot, so the volume ends up not mounted.

Since gdeploy creates an fstab entry, running just mount /mnt/volume_usmqe_alpha_distrep_4x2/ would be enough.

There are several ways to address this:

add extra task to mount the volume after reboot (not sure if a good idea, as it would require to be able to list all volumes)
have a playbook to do this mount on a client and run it via pytest fixture (so that a test which requires to work with data on a volume would be sure that the volume is mounted)
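
Whichever option is chosen, the core logic is "mount via the fstab entry only if the volume is not mounted yet". The following is a minimal sketch of that logic, assuming it runs with sufficient privileges on the client itself; `ensure_mounted` is a hypothetical helper, and the two injectable parameters exist only so the logic can be exercised without touching real mounts.

```python
import os
import subprocess


def ensure_mounted(mount_point, is_mounted=os.path.ismount, run=subprocess.run):
    """Mount mount_point via its fstab entry unless it is already mounted.

    Returns True if a mount command was issued, False if nothing had to be done.
    """
    if is_mounted(mount_point):
        return False
    # Relies on the fstab entry created by gdeploy, so no device or options are given.
    run(["mount", mount_point], check=True)
    return True
```

A pytest fixture or a small playbook could wrap the same check, so that a test working with data on a volume only starts once the mount is in place.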

Add gdeploy config file to setup bitrot detection on a volume

Add gdeploy config file to enable bitrot detection on a volume.

References

https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#bitrot-detection
gluster/gdeploy#451

ansible2gdeploy - enhance hosts replacement functionality

It would be useful to be able to limit or specify the number of "used" hosts. I'm thinking of the following two specifications:

a precise number,
maximum available multiple.
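
One possible reading of these two specifications is sketched below: either take exactly N hosts, or take the largest number of hosts that is a multiple of N and still fits into the available hosts. The function `limit_hosts` and its parameters are hypothetical and are not part of ansible2gdeploy.

```python
def limit_hosts(hosts, exact=None, multiple_of=None):
    """Select a subset of hosts, either an exact count or the largest fitting multiple."""
    hosts = list(hosts)
    if exact is not None and multiple_of is not None:
        raise ValueError("specify either exact or multiple_of, not both")
    if exact is not None:
        if exact > len(hosts):
            raise ValueError("not enough hosts available")
        return hosts[:exact]
    if multiple_of is not None:
        if multiple_of < 1:
            raise ValueError("multiple_of must be a positive integer")
        # Largest multiple of multiple_of that does not exceed the host count.
        usable = (len(hosts) // multiple_of) * multiple_of
        return hosts[:usable]
    return hosts


if __name__ == "__main__":
    available = ["node%d" % i for i in range(1, 8)]  # 7 hypothetical hosts
    print(limit_hosts(available, exact=3))
    print(limit_hosts(available, multiple_of=2))
```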

Switch gluster setup from loadtheaccumulator/ansible-gluster to gdeploy

Replace the current ansible-gluster based roles and playbooks with gdeploy config files, so that gdeploy will be the only way we configure and set up gluster.

Reasoning

Based on the recent refocus on Gluster, we need to automate much more complex Gluster configurations. To be able to do that, we need to unify Gluster setup so that there is a single way to set up and configure gluster, one which can handle all the gluster configurations we need to test with.

Gdeploy is tested and used by Gluster QE team, who verified that this approach is reasonable.

I would rather not execute gdeploy via ansible, because gdeploy is itself an ansible wrapper, so such integration would be error prone, hard to debug and would invalidate core ansible assumptions.

Details

So instead of playbooks such as:

gluster_peers_bricks.yml

we will have a gdeploy config file:

gluster_peers_bricks.conf

which will be executed via gdeploy:

# gdeploy -c gluster_peers_bricks.conf

Add gdeploy config file to setup dm-cache for bricks

Details pending.

update evidence playbook roles to match tendrl wiki

Check the current evidence roles and add new checks wherever information required by https://github.com/Tendrl/documentation/wiki/Information-required-for-debugging-issues-on-the-Tendrl-stack is not already collected.

Remove Workaround: start tendrl-monitoring-integration

Remove workaround introduced in commit 1442d95 of pull request #128.

Remove Workaround: add manually user tendrl-user

Remove the workaround from pull request #107.
It manually adds the user tendrl-user to all hosts.
It is a workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1456552#c5

Remove Workaround: tendrl-node-agent should not be explicitly stopped and started again

Remove workaround for issue described in Tendrl/api#86 (comment) introduced in commit f45cb68 of pull request #57.

Update test config template for changes introduced in usmqe/usmqe-tests/pull/203

Based on PR usmqe/usmqe-tests/pull/203, usmqe-tests configuration will be stored in yaml files instead of the current usm.ini configuration file.
We have to update the configuration template (and related links): https://github.com/usmqe/usmqe-setup/blob/master/templates/usm.ini.j2

Install webstr in qe-server role

Role qe-server should install webstr module, which will be used by our web test code.

Remove Workaround: stop firewalld on Tendrl server and Tendrl nodes

Remove workaround introduced in commit 76e6e4f of pull request #73.

Where and how to configure and launch ceph-ansible

Currently ceph-ansible is configured on qe_server, which requires qe_server to be listed in the inventory file like this:

[qe_server]
localhost ansible_connection=local ansible_user=jenkins

This needs to be changed, and qe_server should not be listed in the inventory file of a particular cluster.

Remove Workaround: exclude grafana-4.6.* in tendrl_unit_test_setup.yml #168

This workaround (PR #168) is for Tendrl/monitoring-integration#319 and should be removed once the relevant issue is fixed.

add public key fails with ssl error

When I run the tendrl_server playbook I get:
TASK [tendrl-repo : Add public key for master repo] ****************************
fatal: [tendrl]: FAILED! => {"changed": false, "failed": true, "msg": "Failed to validate the SSL certificate for copr-be.cloud.fedoraproject.org:443. Make sure your managed systems have a valid CA certificate installed. You can use validate_certs=False if you do not need to confirm the servers identity but this is unsafe and not recommended. Paths checked for this platform: /etc/ssl/certs, /etc/pki/ca-trust/extracted/pem, /etc/pki/tls/certs, /usr/share/ca-certificates/cacert.org, /etc/ansible"}

Create test setup playbook for FIPS mode

We need playbooks for enabling ~~and disabling~~ FIPS mode on RHEL 7 machines, called something like test_setup.fips.yml ~~and test_teardown.fips.yml~~. If this procedure is complicated, we can maintain the common tasks via ansible role in roles/fips.

Note that:

it's a little tricky, as it requires regeneration of initramfs and reboot
~~for a reliable teardown, we will need to make a backup of original initramfs during setup~~
before starting the actual work, we should check whether it's already automated, e.g. via ansible-rhel7-nist-800-171-cui-role

Details

Documentation: Chapter 8. Federal Standards and Regulations

Add firewall setup for Tendrl

Add firewall setup (playbook and a role) for Tendrl based on upstream https://github.com/Tendrl/documentation/wiki/Tendrl-firewall-settings

Remove Workaround: run partprobe on ceph OSD nodes

Remove workaround for issue described in ceph/ceph-ansible#1403 introduced in commit 7c69a3d of pull request #71.

Enhance FIPS test setup

Edit /etc/sysconfig/prelink and set prelinking to no if prelink rpm is installed.

This doesn't affect our setup, as we don't have it enabled.
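
The edit itself is a one-line change in a shell-style configuration file. The sketch below shows it as a pure text transformation, assuming the setting is stored as a `PRELINKING=` line in /etc/sysconfig/prelink (the variable name is an assumption here, not taken from this issue); `disable_prelinking` is a hypothetical helper.

```python
import re

# Assumption: prelinking is controlled by a line of the form PRELINKING=yes|no.
_PRELINKING_LINE = re.compile(r"^PRELINKING=.*$", re.MULTILINE)


def disable_prelinking(config_text: str) -> str:
    """Return the config text with prelinking switched off."""
    if _PRELINKING_LINE.search(config_text):
        return _PRELINKING_LINE.sub("PRELINKING=no", config_text)
    # No existing setting: append one, keeping the file newline-terminated.
    separator = "" if not config_text or config_text.endswith("\n") else "\n"
    return config_text + separator + "PRELINKING=no\n"
```

In a playbook, the same change would more naturally be a single task guarded by a check that the prelink package is installed.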

Test setup for testing alerting via Tendrl REST call (idea)

Create a python2 script (in the bin directory of this repo) which will call the Tendrl REST call for alerts in a loop, saving each response in a log file (identified by a timestamp), so that it can be rechecked later.

Create a systemd unit file which will control the script, making it a daemon/service. This way, we will make sure that it's running.

Create a test setup playbook, which will install this script on a client machine (into /usr/local/bin, similar to how the test setup for wikitarball is done).

Reasoning

We have a concept of workload fixtures, which perform some workload and return the expected values along with a time range during which the expected values should be reported. A test case can then later ask for data from this time range and check whether it matches. The same run of a workload fixture can be reused for test cases covering alerting via snmp or smtp, but not via the Tendrl API/Web, because by the time the test case is running, the status has long since changed.

To make it possible to test Tendrl API representation of alerting information in the same way as we can do it for snmp or smtp (reusing workload fixtures without a need to rerun the workload from the test itself), I propose to create a script which will poll the Tendrl API and store the results for later inspection by the test case.
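
A minimal sketch of such a polling script follows. It is written for Python 3 rather than python2, performs no authentication, and leaves the alerts URL to the caller; the function names, the log file naming scheme and the default interval are all assumptions made for illustration.

```python
import json
import time
import urllib.request
from pathlib import Path


def fetch_json(url):
    """Fetch a URL and decode the response body as JSON (no authentication)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def poll_once(url, log_dir, fetch=fetch_json, now=time.time):
    """Store one response in a log file named after the current Unix timestamp."""
    timestamp = int(now())
    log_file = Path(log_dir) / "alerts-{}.json".format(timestamp)
    log_file.write_text(json.dumps(fetch(url)))
    return log_file


def poll_forever(url, log_dir, interval=10.0):
    """Poll in a loop; intended to be started and supervised by a systemd unit."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    while True:
        poll_once(url, log_dir)
        time.sleep(interval)
```

Because each file name carries a timestamp, a test case could later select the files that fall into the time range returned by a workload fixture and check the alerts recorded in them.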
