litmuschaos / litmus Goto Github PK

Litmus helps SREs and developers practice chaos engineering in a Cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes is at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q

Home Page: https://litmuschaos.io

License: Apache License 2.0

Makefile 0.94% Dockerfile 0.34% Go 85.01% JavaScript 0.06% Shell 0.48% SCSS 13.16%

chaos-engineering kubernetes chaos-experiments cloud-native chaoshub hacktoberfest cncf operator-sdk site-reliability-engineering golang

litmus's People

Contributors

Stargazers

Watchers

Forkers

kmova deepakjayaprakash purnima9040 amitkumardas epowell101 shubhambhattar dipanshkhandelwal utkarsh2102 testaccount2912 ibreakthecloud wallnerryan ashishranjan738 smartfreedom a4abhishek slalwani97 rake7h singhmeghna79 sagarkrsd satyamz avishnu qiell yudaykiran hombre9 0xflotus aslafy-z geethmaka vrutkovs doruboyina arunb007 tharudaya diegous mrl1605 arpt-svt vickxy andream16 prateekpandey14 chirag200666 anil-matcha dhairyapatel7 kaustumbh7 prabhu43 vharsh shyamjalan shivani-1521 venkatvani alerika y1026 utkarshmani1997 vishnuitta mittachaitu mynktl deepav16 shenry07 nsathyaseelan linus5 dargasudarshan utkarsh-devops xcke rampreeth nicaurybenitez ksatchit-org avaussant colstuwjx shenjiangbo talits linuxerwang taitd vasu-chowdary amitbhatt818 rogerhmar sushma1118 fr34k8 sakshamkatiyar anupriya0703 shendrekbharath y1027 perry-contribs erikbfeeley naveenbellary aswathkk mathagician rakeshonrediff loldevelopr gavinaf patoconnor43 shivlondon tpanthera kmjayadeep sinithh parikshit-hooda shivangkeshari ankitaugalmugale eylonronen sakshamtaneja21 wangzihao3 rahulchheda pseudonerd venoodkhatuva12 anupisdeveloper kaiiyer

litmus's Issues

Create YSCB benchmark k8s jobs to test MongoDB on OpenEBS

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

What happened:

YCSB is a popular benchmark tool for NoSQL. It have ready adapters for different NoSQL DB like Cassandra, Mongo, Redis and others.

The K8s YAML files to deploy a cassandra statefulset can be found here : https://github.com/openebs/openebs/tree/master/k8s/demo/cassandra

Benchmarking MongoDB on OpenEBS using popular a BM tool like YCSB will help us identify bottlenecks and optimize OpenEBS for better application performance.

What you expected to happen:

Create the YSCB test container with requisite adapters for MongoDB and also the K8s job YAMLs to run the benchmark tests & capture metrics.

ansible-runner Dockerfile missing copying playbooks

BUG REPORT

When executing `kubectl apply -f apps/percona/tests/mysql_storage_benchmark/run_litmus_test.yaml" the ansible-runner container (inside litmus pod) returns:

ERROR! the playbook: ./percona/tests/mysql_storage_benchmark/test.yaml could not be found

You can retrieve that message with kubectl logs <litmus pod id> ansibletest -n litmus

This is because, recently, files in openebs/litmus have been restructured and the Dockerfile of ansible-runner got broken and stopped copying the playbooks.

I'm working on a PR to fix this.

The run_litmus_test job should have a consistent name

The run_litmus_test.yaml - which is the main deployment corresponding to each automated test under "tests" folder should use a common job name:

apiVersion: v1
kind: Job
metadata:
  name: litmus
  namespace: litmus

This is useful for the executor framework which tracks the job lifecycle by the name.

Create a sysbench-mongodb test container to perform mongdb benchmarks with OpenEBS storage

If you’re not familiar with sysbench, it’s a great project developed by Alexey Kopytov that lets you run different types of benchmarks (referred to as “tests” by the tool), including database benchmarks. The database tests are implemented in Lua scripts, which means you can customize them as needed (or even write new ones from scratch) – something useful for simulating specific workloads.

PerconaLab has tweaked sysbench to integrate mongodb support. We need to containerize this to be readily available to be used as part of a Kubernetes Job to perform mongodb benchmarks.

Create a custom percona image which is integrated with the percona-monitoring & management (pmm) client

Create a custom percona image that is packaged with the percona-monitoring & management client (pmm-client) . Currently, the pmm-server is available as a docker image, while most instructions for pmm-client show its setup as a manual process - i.e., install package after configuring the percona repo on the DB server.

This is needed to enable easy and comprehensive analysis of the application benchmarks like (tpcc/sysbench oltp tests) on the databases.

Add asciinema/gif to the README describing a sample ansible-based chaos litmus test

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

What happened:

The Readme consists of a demo of the godog-based minio deployment test run, which communicates the philosophy of "running a test as a job"
However, the Readme doesn't stress on OR contain any demo on litmus as a chaos test framework

What you expected to happen:

Another such asciinema/gif can be added for an ansible-based chaos test run.

How to fix it?

Create/embed a new git which can cover the following aspects:
- Show existing stateful application deployed (say, percona)
- Navigate to dir & open the chaos test litmusbook and highlight the ENV (such as APP_LABEL, APP_NAMESPACE etc.,)
- Run litmusbook (`kubectl create -f apps/percona/chaos/openebs_volume_replica_failure/run_litmus_test.yaml)
- Have a parallel tab/terminal session showing running pods in litmus & app namespace
- View the logs of the ansibletest container of litmus pod to view the test steps
- View/Describe the result CR upon test completion

Anything else we need to know?:

Reference to asciinema usage: https://asciinema.org/docs/usage

reference to writing good scenarios - best practices

http://itsadeliverything.com/declarative-vs-imperative-gherkin-scenarios-for-cucumber

As a user, I want to obtain and visualize test logs to debug my litmus test failure

Currently, the litmus tests use the OpenEBS "logger" utility, which runs as a sidecar to the main test container in the litmus job to capture the pod logs (based on regex passed in the job spec). Also, the playbook/console execution log is captured using the "log_plays" stdout callback plugin. However, following major issues are present:

The logger takes a hard duration value instead of ending/terminating with the test job. This forces us to estimate test duration and provide buffers
The logger alone is not capable of providing useful visualization of collected logs for a given timestamp. This is necessary for useful debugging and should be available w/ Litmus

Add checks in chaos-based tests to confirm actual fault/failure injection

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

What happened:

Currently, failures are being injected in chaos-based tests using tools such as pumba, kubectl etc., It would be of value to "confirm" the faults were actually injected, using certain mechanisms - such as restart counts, deployment resource versions etc., which are impacted by the chaos.

This is especially helpful in trusting the results when litmus suites are executed in an automated manner in CI pipelines.

e2e test case to verify labels of an openebs volume

Is this a BUG REPORT or FEATURE REQUEST?

Feature Request

Describe the Feature in detail

Every OpenEBS volume creates one or more Kubernetes Deployments. It is important to
verify if these Pods are labelled appropriately.

Some of the uses of these labels are:

Filtering OpenEBS based Deployments/Pods
Integrating Prometheus to monitor OpenEBS specific K8s objects
so on.

The e2e test case should be run to verify if appropriate OpenEBS labels are assigned to
OpenEBS Volume Deployments.

NOTE:

New labels may be added in newer releases of OpenEBS
Old labels may be removed in newer releases of OpenEBS

Simplify the executor (bulk/suite run of litmus jobs) by performing ENV setup via a "Install container"

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

What happened:

Currently, the execution of litmus jobs in the "suite" mode involves setting up ansible on the control node with inventory file to access the cluster nodes.
This is an anti-pattern wrt Litmus, which is supposed to simplify the process of testing stateful workloads.
This can be resolved by running a "setup" container that can prepare the desired configuration with machine info.

What you expected to happen:

The pre-requisites to run a bulk/suite run of the test jobs should be as minimal/simple as possible.

As a user, I want to know the result of my litmus test (pass/fail)

Currently, the litmus test does not have a standard way of recording/displaying the test result. This info is being derived by the log/console output of the test job, which is not efficient. The result should be available in a set format at a specified location - this will also be useful in the context of the executor.

As a user, I want to obtain the mysql benchmark numbers for Local PV

The objective of the Litmus project is to enable the user to compare test behaviour for different storage solutions. This US will track the implementation of the logic necessary to setup local PV and use it as the the storage provider in the ansible-based MySQL benchmark test.

This will enable the user to obtain tpmC values for the TPC-C test for Local PV

Create a sysbench-mysql test container for cpu bound tests on OpenEBS storage

Sysbench lets you stress many of the fundamental components of your hardware and infrastructure, such as your disk subsystem, along with your CPUs and memory. An additional option exists that is designed to perform synthetic stress testing of MySQL

Create a openebs/tests-sysbench-client container image with a respective test config file (on the lines of the openebs/tests-tpcc-client & tpcc.conf) to able to readily run sysbench benchmarks on OpenEBS

Ability to recover data from a MySQL server by creating a read-only copy from older snapshot.

This is to test the use case described here: https://github.com/kubernetes-incubator/external-storage/blob/master/snapshot/doc/volume-snapshotting-proposal.md#alice-wants-to-backup-her-mysql-database-data

The example is copied here:

Example Use Case

Alice wants to backup her MySQL database data

Alice is a DB admin who runs a MySQL database and needs to backup the data on a remote server prior to the database upgrade. She has a short maintenance window dedicated to the operation that allows her to pause the dabase only for a short while. Alice will therefore stop the database, create a snapshot of the data, re-start the database and after that start time-consuming network transfer to the backup server.

The database is running in a pod with the data stored on a persistent volume:

apiVersion: v1
kind: Pod
metadata:
  name: mysql
  labels:
    name: mysql
spec:
  containers:
    - resources:
        limits :
          cpu: 0.5
      image: openshift/mysql-55-centos7
      name: mysql
      env:
        - name: MYSQL_ROOT_PASSWORD
          value: rootpassword
        - name: MYSQL_USER
          value: wp_user
        - name: MYSQL_PASSWORD
          value: wp_pass
        - name: MYSQL_DATABASE
          value: wp_db
      ports:
        - containerPort: 3306
          name: mysql
      volumeMounts:
        - name: mysql-persistent-storage
          mountPath: /var/lib/mysql/data
  volumes:
    - name: mysql-persistent-storage
      persistentVolumeClaim:
      claimName: claim-mysql

The persistent volume is bound to the claim-mysql PVC which needs to be snapshotted. Since Alice has some downtime allowed she may lock the database tables for a moment to ensure the backup would be consistent:

mysql> FLUSH TABLES WITH READ LOCK;

Now she is ready to create a snapshot of the claim-mysql PVC. She creates a vs.yaml:

apiVersion: v1
kind: VolumeSnapshot
metadata:
  name: mysql-snapshot
  namespace: default
spec:
  persistentVolumeClaim: claim-mysql

$ kubectl create -f vs.yaml

This will result in a new snapshot being created by the controller. Alice would wait until the snapshot is complete:

$ kubectl get volumesnapshots

NAME             STATUS
mysql-snapshot   ready

Now it's OK to unlock the database tables and the database may return to normal operation:

mysql> UNLOCK TABLES;

Alice can now get to the snapshotted data and start syncing them to the remote server. First she needs to promote the snapshot to a PV by creating a new PVC. To use the external provisioner a new storage class must be created:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: snapshot-promoter
provisioner: volumesnapshot.external-storage.k8s.io/snapshot-promoter

Now Alice can create the PVC referencing the snapshot in the annotations.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: snapshot-data-claim
annotations:
    snapshot.alpha.kubernetes.io/snapshot: mysql-snapshot
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: snapshot-promoter

Once the claim is bound to a persistent volume Alice creates a job to sync the data with a remote backup server:

apiVersion: batch/v1
kind: Job
metadata:
  name: mysql-sync
spec:
  template:
    metadata:
      name: mysql-sync
    spec:
      containers:
      - name: mysql-sync
        image: rsync
        command: "rsync -av /mnt/data [email protected]:mysql_backups"
      restartPolicy: Never
      volumeMounts:
        - name: snapshot-data
          mountPath: /mnt/data
  volumes:
    - name: snapshot-data
      persistentVolumeClaim:
      claimName: snapshot-data-claim

Alice will wait for the job to finish and then may delete both the snapshot-data-claim PVC as well as mysql-snapshot request (which will delete also the snapshot object):

$ kubectl delete pvc snapshot-data-claim
$ kubectl delete volumesnapshot mysql-snapshot

Reduce the turn-around time for testing apps with production like data.

Here is my usecase:

Felix is a DevOps admin who is responsible for maintaining Staging Databases for a large enterprise corporation with 400+ developers working on 200+ applications. The Staging database contains a pruned (for user information) and is constantly updated with production data. When developers make some data schema changes, they would like to test them out on the Staging setup with real data before pushing the changes for Review.

The staging database PV, PVC and the associated application are created in a separate namespace called “staging”. Only Felix has access to this namespace. He creates snapshots of the production database volume. Along with creating the snapshots, he appends some information into the snapshots that will be helpful for developers like the: like the version of the applications that are running in the staging database when this snapshot was taken.
Each developer has their own namespace. For example Simon, runs his development application in “dev-simon-app” namespace.
The cluster admin authorize Simon to access (read/get) the snapshots from the staging setup.
Simon gets the list of snapshots that are available. Picks up the snapshot or snapshots that are best suited for testing his application.
Simon creates a PVC / PV with the select snapshot and launches his applications with modified changes on it.
Simon then runs the integration tests on his application which is now accessing production like data - which helps him to identify issues with different types of data and running at scale.
After completing the tests, Simon deletes the application and the associated cloned volumes.

Add support for chaos tests on kafka

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

What happened:

This is a feature request based on community feedback to include support for application-specific functional & chaos on kafka. Scenarios include:

deploying confluent helm chart for kafka and checking brokers gets Storage bound successfully
Kill broker and ensure it comes back
kill zookeeper pod and ensure it comes back
kill consumer pods (my own apps) and ensure they come a back and doesn't miss message
kill producer pod and ensure it comes back and ensure all messages get sent.

Reference for health-checks/monitoring: https://github.com/andreas-schroeder/kafka-health-check

reference to a curated list of 'testing distributed systems'

https://asatarin.github.io/testing-distributed-systems/

We can read through the items & raise issues if they are of any help to litmus!!

OpenEBS stern-based logger improvements in Litmus test jobs

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

What happened:

Sync the logger run duration with test container execution

Currently, the litmus job launches two pods : The ansible test container, followed by the stern-based logger. The logger takes a "duration arguments (default at 10m), which signifies for how long the pod logs need to collected, at the end of which the systemd (kubelet logs) are collected. However, in certain "slow" systems - like vagrant based environments where the time taken to complete a test are far more than, say GKE, the logger terminates before the test business logic is executed.

This needs to be fixed to ensure that the logger is in sync with the test container, i.e., the duration should auto-tune till the test execution actually completes.

The sonobuoy-based systemd/kubelet log collection takes place outside the testcase directory in the litmus node.

This is because the mount point used in the logger pod spec is not propagated to "nodelogger.yaml" job, called internally by logger (nodelogger is used to deploy the daemonset to collect kubelet logs). This has to be fixed by modifying the nodelogger YAML with the specified mount path before deploying it.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

The mysql-slave dockerfile uses the same server-id as the mysql-master

The mysql-slave dockerfile uses the same server-id as the mysql-master in the following error during replication:

"Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work "

The dockerfile needs to be edited to use a different server id for slave

Add integration test to verify workflow of redis with OpenEBS

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

What happened:

OpenEBS can be used as a Persistent Volume for Redis. The same can be setup using the YAML file provided at https://github.com/openebs/openebs/tree/master/k8s/demo/redis

What you expected to happen:

Implement a test suite in ansible to deploy redis standalone & redis cluster on Kubernetes using OpenEBS as volume.

What should the litmus container image contain

litmus should provide some guidelines with regards to the binaries that goes into its container.
Should the litmus container have following:

godog
ansible
go runtime

Should there be different litmus containers, one each for godog & ansible?

create cluster playbook should have set flags with block and rescue

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

What happened:
For better error handling block and rescue is needed. Gitlab job should parse for failure flags returned by the playbook and appropriately mark the job/stage as failed. Use result Custom Resource that can be updated with test/playbook result.
Block and Rescue needs to be incorporated with create-k8s-cluster.yml playbook with set_facts options, that will setup the flag as Test Passed and Test Failed and Gitlab or any CI tool can mark the job as passed or failed a/c to the Flags set

Anything else we need to know?:
Currently the playbook is not handelled with any sort of error. It unsuccesfully terminates when per se any tasks fails.

Test playbook of EBS Volume & GPD Disk Creation work from a container

FEATURE REQUEST?

What happened:

Test create-ebs-volume.yml & create-gpd-disk.yml playbook work from a container as like as cluster-creation working

What you expected to happen:

After running playbook ebs volume in AWS and gpd disk in GCP successfully attached and mounted

Anything else we need to know?:

AWS ebs-volume file location :- https://github.com/openebs/litmus/tree/master/k8s/aws/ebs-volumes
GCP gpd-dsik file location :-

Container should take output directory as another mountpoint

This will allow the invokers to run with the requirement parameters, parse the results for success, failure or generating graphs.

Make the active dataset size configurable for the workloads

The workloads mentioned in the vdbench files, use 10M as the active dataset size. This should be configurable via the parameters passed to the container.

--active-datasize <10[M|G|T]>

Update the kubectl binary in the ansible-runner & godog-runner images

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

What happened:

The kubectl binary used in the litmus runner images (ansible-runner, godog-runners) are of older versions & need to be updated to the latest stable version (these will be revisited as the usage of litmus across (newer) versions increases/litmus capabilities or features necessitate changes)
The dockerfiles for the runners are available here: https://github.com/openebs/litmus/tree/master/tools

What you expected to happen:

Use the latest "stable" kubectl binaries for litmus tests.

How to fix it:

Update the dockerfile with the appropriate KUBE_LATEST_VERSION environment variable

Create a ISSUE_TEMPLATE.md

Add issue template,then project contributors will automatically see the template's contents in the issue form body. Templates customize and standardize the information we'd like included when contributors open issues.

Upgrade the openebs/tests-tpcc-client container to perform parallel database load in tpcc-runner.sh

The openebs/tests-tpcc-client container currently performs the following step to load the database with entries (as per the number of warehouses specified in the tpcc.conf benchmark config file) before running the actual benchmark test (tpcc_start) :-

./tpcc_load -h <DB_SERVER_IP> -P3306 -d<DB_NAME> -u <DB_USER> -p <DB_PASSWORD> -w

This is a sequential process that increases in duration with the number of warehouses specified, thereby delaying the start of the actual benchmark.

This process can be made quicker by running multiple parallel tpcc_load operations on a subset of warehouses. There is already a script (load.sh) provided by percona in the tpcc-mysql repo which does this - we need to use it in the tpcc-runner.sh with the logic to identify completion of all tpcc_load processes before proceeding to start the benchmark

Create a litmusbook to setup the local PV infrastructure (provider) on Kubernetes cluster

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

What happened:

Litmus, as per its objective, is designed to be generic & run the functional/chaos tests on any storage providers (storage classes) (by adding the chaoslib/funclib corresponding to the provider). The pre-requisite for this is to ensure that the provider's control-plane is setup on the cluster before the tests are run.
Litmus simplifies the process of setting up the provider, by converting these tasks into a "litmusbook" ( i.e., a litmus job that runs the provider "setup test").
Currently, litmusbooks are only available for setup of OpenEBS (refer: https://github.com/openebs/litmus/tree/master/providers/openebs/installers/operator/master). It would be desirable to include litmusbook for setup of other providers.
Kubernetes Local PersistentVolume is a popular storage option for applications that are self-replicating. It is available as a beta release from 1.10. The local PV can be setup via a static provisioner OR by manually by applying the local volume (PV) spec.
The local volume provider setup logic (simple local PV setup using PV spec), with necessary supporting artifacts is available as ansible-playbooks here: https://github.com/openebs/litmus/tree/master/executor/ansible/provider/local-pv with associated role here: https://github.com/openebs/litmus/tree/master/executor/ansible/roles/k8s-local-pv
This issue tracks creation of a litmusbook (a job that will run ansible code similar to the above) to setup the same. The comparative benefit of the litmusbook lies in being an independent self-contained job whose simple deploy will setup the provider (thereby avoiding dependencies such as ansible install, inventory files etc.,).

What you expected to happen:

Local PV can be setup (local storage class as well as PV) by deploying a litmusbook

How to fix it:

This litmusbook can be made up of following components:
- Ansible playbooks to execute the steps to install local PV components (typically, involves a test_vars.yml, test_prerequisites.yml, test.yml)
- Local PV specification files (PV spec, storage class templates)
- Kubernetes job with right set of ENVs describing the PV attributes passed to the ansibletest container (such as say, storage class name, storage size, path of the mounted disk etc.,)

How to test it?:

The successful setup of local PV provider can be validated by deploying any application with the local storage class created by the litmusbook.

Create YSCB benchmark k8s jobs to test Cassandra DB on OpenEBS

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

What happened:

YCSB is a popular benchmark tool for NoSQL. It have ready adapters for different NoSQL DB like Cassandra, Mongo, Redis and others.

The K8s YAML files to deploy a cassandra statefulset can be found here : https://github.com/openebs/openebs/tree/master/k8s/demo/cassandra

Benchmarking cassandra on OpenEBS using popular a BM tool like YCSB will help us identify bottlenecks and optimize OpenEBS for better application performance.

What you expected to happen:

Create the YSCB test container with requisite adapters for cassandra and also the K8s job YAMLs to run the benchmark tests & capture metrics.

Update "app" and "ChaosType" fields in the result custom resource spec in the litmus tests

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

What happened:

Litmus result custom resource jinja template: https://github.com/openebs/litmus/blob/master/hack/litmus-result.j2)
Currently, the result custom resource updates (which are done at the beginning of a test's execution, i.e., before proceeding to the business logic (SoT) - and - upon completion of test business logic (EoT)) which are performed in a litmus test playbooks (refer test.yml of any given litmus test) only update the test name, test phase & test result fields of the CR.
It is desirable to update the app & chaostype fields of the CR in order to facilitate filtering of results for a given type of app or chaos test.

What you expected to happen:

Litmus test results can be queried on the basis of app type or chaos test type.

How to fix it:

The fix will involve:
- Passing the {{app}} & {{chaostype}} variables in the ansible template task, similar to how the other params such as {{test}}, {{phase}} & {{verdict}} are being passed. Ex: https://github.com/openebs/litmus/blob/master/apps/percona/chaos/openebs_volume_replica_failure/test.yml#L36
- The values to be provided for {{app}} is the app-label (which is derived from the pod ENV & set as a test_var called {{label}} (https://github.com/openebs/litmus/blob/master/apps/percona/chaos/openebs_volume_replica_failure/test_vars.yml#L3)
- The value to be provided for the {{chaostype}} is the name of the {{chaosutil}} derived from execution of this template (https://github.com/openebs/litmus/blob/master/apps/percona/chaos/openebs_volume_replica_failure/chaosutil.j2) - minus the (.yaml)

Test folders should contain test-specific "setup" artifacts with consistent names

While each test will be executed as an individual Kubernetes job, there may be certain pre-requisites that need to be executed/resources created (such as creation of config maps, mounted into the job pod). These have to be placed in the respective test folder with a standard naming convention, say:

..setup_<cm_1>.yaml
..setup_<cm_2>.yaml
..setup_<secret_1>.yaml

Currently, the executor identifies test jobs by its name (run_litmus_test.yaml) and picks it for execution. This can be enhanced to identify the presence of "setup" tasks like described above and execute them before running the test job.

Make it easy to debug the build (CI) failures with stateful apps

Here is my usecase:

Tim is a DevOps engineer at a large Retail store who is responsible for running a complex build pipeline that involves several mico-services. The microservices that implement a order and supply management functionalities - store the states in a set of common datastores. The Jenkins CI pipeline simulates real world interactions with the system that begin with simulating customers placing the orders to the backend systems optimizing the supply and delivery of these orders to the customers. Time has setup the Job execution pipeline in such a way that, if there are failures, the developers can back trace the state of the database and the logs associated with each stage.

The build (or job) logs are saved onto OpenEBS PV, say Logs PV
The datastores are created on OpenEBS Volumes, say Datastore PVs.
At the end of each job, either on success of failure, snapshots are taken of the Logs PV and the Datastore PVs.
When there is a build failure, the volume snapshot information is sent to all the developers whose service were running when the job was getting executed.
Each developer can bring up their own debug session in their namespace by creating a environment with cloned volumes. Either they re-run the tests manually by going back to the previous state with higher debug level or analyze the data currently available that is causing the issue.

Add integration test to verify workflow for Mongo DB in OpenEBS

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

What happened:
OpenEBS can be used as a Persistant Volume for Mongo DB. The same can be setup using the YAML file provided at https://github.com/openebs/openebs/tree/master/k8s/demo/mongodb.

What you expected to happen:
Implement a test suite in ansible to deploy Mongo DB on Kubernetes using OpenEBS as volume.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

kubectl get nodes
kubectl get pods --all-namespaces
kubectl get services
kubectl get sc
kubectl get pv
kubectl get pvc
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Others:

Enhance the EBS disk attach playbooks to factor in existing device names on VM instance

FEATURE REQUEST?

Attach multiple EBS Volume to AWS instances by looking for available disk name inside instances.
Recommended disk name for EBS volume is - /dev/sd[f-p]

for example: -

At the time of attaching EBS Volume in instances if dev/sdf disk name already present then use the next available likedev/sdg

Currently used ansible module for creating EBS Volume

        - name: Creating and attaching EBS Volume in AWS
          ec2_vol:
            instance: i-0820e863967d14dc0
            device_name: /dev/xvdb
            region: eu-west-2
            state: present
            volume_size: 50
            volume_type: gp2
            zone: eu-west-2a

Disk name is auto generated by ansible module if device_name is not specified but currently it not check for available disk name.

Reference :- https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html

Include litmus sample test demo video in the Readme

An asciinema media file in the readme explaining the steps to run a sample litmus test will be really helpful for users and contributors.

Litmus job pod should have the functionality to execute commands on remote hosts

FEATURE REQUEST:
-> Currently litmus tests can run only commands only on localhost pods.
-> Litmus should have the functionality to run shell commands on remote hosts by doing ssh.
-> Running commands directly on remote hosts will give us more control over chaos tests.

Create custom mysql images to be used as master and slaves in a replication cluster

Regular MySQL dockerfiles need additional steps specifying replication details - specific to its role as master and slave container. These images can be used while setting up the mysql replication cluster on kubernetes.

Validate the size of the volumes being tested

The volumes provided to the test containers could be thin provisioned. Ensure atleast the active dataset size is available for writing.

Litmus Tests should be available as modules and should not be packed into image

FEATURE REQUEST

-> Litmus Tests should be available as modules and should not be packed into the ansible runner image itself.
-> Currently litmus tests are packed on ansible-runner image which means for running any test the user has to download all the tests which may not be of any use for him.
-> Litmus Tests can be converted to ansible roles and can be pushed to the galaxy or can be packed into a zip and pushed to some other hosting platform.
-> Since Litmus tests are ansible playbooks which means these playbooks should be reusable.

What happened:
-> Currently when we run any litmus job the ansible-runner image is pulled which contains the test.

What you expected to happen:
-> Ansible-runner image should have the business logic to get any specific test and execute it.
-> Separating tests from the image will help managing tests in a better way.

Integrate basic performance tests to execute relative benchmarks on OpenEBS storage in CI workflow

Integrate standard workload-based performance benchmark tests to execute build/image-wise relative performance comparison

Make the chaos-injection test modules reusable across tests

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

What happened:

The chaos modules are supported by a single test at this point (mysql_data_persistence : refer PR openebs-archive/e2e-tests#68

What you expected to happen:

These chaos modules should be reusable across tests. Change the folder structure & the affected paths in playbook to enable this (the taskfiles to induce chaos are already generic in nature)

Isolate test execution & strengthen cleanup routines

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

What happened:

The litmus tests are all executed in a single namespace today : "litmus". In case of automated suite-run, it is possible that failure to cleanup applications or other test-specific objects can cause failures. This needs to be avoided. The tests should also contain "always-execute" sections which are capable of forced cleanup operations.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Litmus should have some standard to qualify ansible-playbooks as litmus book

FEATURE REQUEST
-> There should be some standard maintained by the litmus to judge ansible playbooks and qualify them as litmus book.
-> Currently not all ansible playbook can be converted to litmus book.
-> Some binary like litmus lint checker should be available through which users can check their ansible-playbooks and know whether their playbook is litmus parsable or not.
-> Further those tools can also automate the step of converting ansible-playbook to litmus.

All workloads

--run-wl=all

or a set of workloads

--run-wl=Basic,File

or a specific workloads

--run-wl=Basic/File-SimpleReadWrite

Make the test duration configurable

The duration of test run should be configurable via docker args.

Ensure the same image tags are used in the ansible-runner images across litmus jobs

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

What happened:

Currently, some of the litmus jobs contain the ansible-runner image with tags "ci" & others use "latest" (no tag on the image implies latest). The Travis CI has been setup to build & push "ci" (litmus still doesn't have release tags, no releases yet. As per best-practices, "latest" tag should be pushed along w/ releases. While these will be implemented & recommended for users once the 0.1 release is out, the jobs in the master branch should have all jobs with "ci" tags).

What you expected to happen:

Use the "ci" image for ansible-runner across jobs

How to fix it?:

The fix will involve:
- Performing an audit of the available litmus jobs (typically named run_litmus_test.yml) across app-deployers, liveness, loadgen, chaos & test folders & updating the image tags

litmuschaos / litmus Goto Github PK

litmus's People

Contributors

Stargazers

Watchers

Forkers

litmus's Issues

Is this a BUG REPORT or FEATURE REQUEST?

Is this a BUG REPORT or FEATURE REQUEST?

Is this a BUG REPORT or FEATURE REQUEST?

Is this a BUG REPORT or FEATURE REQUEST?

Describe the Feature in detail

Is this a BUG REPORT or FEATURE REQUEST?

Example Use Case

Alice wants to backup her MySQL database data

Is this a BUG REPORT or FEATURE REQUEST?

Is this a BUG REPORT or FEATURE REQUEST?

Is this a BUG REPORT or FEATURE REQUEST?

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST?

Is this a BUG REPORT or FEATURE REQUEST?

Is this a BUG REPORT or FEATURE REQUEST?

Is this a BUG REPORT or FEATURE REQUEST?

Is this a BUG REPORT or FEATURE REQUEST?

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST?

Is this a BUG REPORT or FEATURE REQUEST?

Is this a BUG REPORT or FEATURE REQUEST?

Is this a BUG REPORT or FEATURE REQUEST?

Recommend Projects

Recommend Topics

Recommend Org