
gadmin's Introduction

gadmin - the Gluster CLI experience

Why?

History

Through the recent releases of Gluster, glusterd has been the preferred method to accomplish Day 1 and Day 2 operations, as well as to publish data about Day 3. However, glusterd was designed to be the configuration management layer that completes transactions in the Gluster Filesystem, such as expand and shrink operations.

As the project expanded and other tools emerged to do the configuration management better, several projects appeared to handle different stages of a Gluster deployment, such as Day 1 (gdeploy) and Day 3 (gstatus, gluster-prometheus-plugin).

It also became necessary for Gluster to integrate with projects such as SMB, NFS-Ganesha, etc. This integration was achieved via hooks provided by glusterd.

Present

The glusterd2 project is the successor to glusterd and addresses scaling concerns as Gluster deployments grow in size. While it provides more programmatic access to its own management functionality via a RESTful interface and plugins, its main focus is on setting up and managing a Gluster cluster.

With the scope of glusterd2 defined, the design question to be addressed by a Gluster management tool is: "Among the various administrative tasks performed by a Storage Administrator, which are Gluster specific and which are broader Storage Management concerns?"

Specifically, in the context of gadmin, it is necessary to address high-level, workflow-based Storage Administration concerns for a Gluster based storage infrastructure, since tools such as gluster-ansible and glustercli from glusterd2 already enable Gluster specific low-level administration.

What?

Gadmin has been conceptualised as a unified CLI tool that enables a Storage Administrator to work with a Gluster based storage infrastructure. The focus is on enabling an end-to-end management experience, without the need to delve into Gluster specific implementation details.

The tool is intended as part of the implementation of a Gluster management CLI experience that must be uniform across deployment platforms, infrastructures and scenarios. In some cases, details of the experience may differ to accommodate the specifics of a platform or a scenario. However, it is key to ensure that the administrator does not need to change the thought process behind how Gluster infrastructure is managed. The project will be developed and will work in conjunction with the gluster-ansible project, which automates the administration tasks required to set up Gluster infrastructure.

Features

  • Gadmin is a portable and lightweight CLI tool whose dependencies are kept to a minimum. gluster-ansible and its various roles are the primary dependencies; they need to be installed and invokable on the same system as Gadmin.

  • Gadmin presents a virsh-like shell session to the user.

  • Gadmin has a scripting mode which can be used to execute one-off workflows, enabling it to be used in a programmable manner.

  • Gadmin is a higher level tool that is concerned with the Gluster deployment as a whole. As such, it natively supports Samba, NFS-Ganesha etc. deployments and includes the relevant information about these components in conjunction with native GlusterFS information wherever applicable.

  • Gadmin supports deployment of a glusterd2 based GlusterFS cluster and various supporting components such as Samba and NFS-Ganesha on different platforms, including various pre-baked scenarios (Day 1).

  • Gadmin presents workflows that enable day-to-day administration of a Gluster deployment (Day 2).

  • Gadmin supports monitoring the status of a Gluster deployment. However, this is on-demand rather than perpetual. Gadmin is a stateless and agent-less CLI tool; it cannot function as a full-time monitoring and metrics stack. However, it may be possible to include some limited continuous monitoring functionality, such as that provided by tools like top.

How?

  • Gadmin is stateless towards Gluster. This means that each time the status of a Gluster component is required, Gadmin makes an appropriate request for it. Gadmin does not store the state of the infrastructure anywhere.

  • Gadmin is directly coupled with gluster-ansible. Much of Gadmin’s functionality requires gluster-ansible. However, the opposite is not true. gluster-ansible must be able to function irrespective of whether Gadmin is available or being used.

  • Gadmin supports both synchronous ('requests') and asynchronous ('jobs') tasks.

  • Requests are tasks that comprise a single action (e.g. an API call).

  • Requests are point-in-time and their output is only for display purposes.

  • Gadmin supports multi-step tasks. These are executed as jobs in a background process.

  • Steps can be invocations of gluster-ansible playbooks.

  • gluster-ansible invocation is always a job.

  • Gadmin creates a directory structure per job into which all of the job’s output is written.

  • Gadmin executes gluster-ansible such that it writes its own output to disk, in a format Gadmin can understand, in the job’s directory structure.

  • Gadmin does not try to monitor the stdout or stderr streams of a step being executed. Instead, it reads the output files created by the steps. This ensures that even if Gadmin itself dies, the step being executed can continue. (A sketch of this approach follows the list.)

  • Every executed job and each step of the job has the full context in which to carry out its operation.
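A minimal sketch, in Go, of this approach. The names (runStep, jobDir) are illustrative rather than gadmin's actual API: the step's command is started in its own session with its output redirected to a file in the job's directory, so the step outlives gadmin and its output can be read from disk at any time.

package main

import (
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
    "syscall"
)

// runStep launches a step's command with stdout and stderr redirected to
// a file in the job's directory. The child is placed in its own session,
// so it keeps running even if the gadmin process dies.
func runStep(jobDir, name string, argv ...string) (*os.Process, error) {
    out, err := os.Create(filepath.Join(jobDir, name+".out"))
    if err != nil {
        return nil, err
    }
    defer out.Close() // the child keeps its own duplicated descriptor

    cmd := exec.Command(argv[0], argv[1:]...)
    cmd.Stdout = out
    cmd.Stderr = out
    cmd.SysProcAttr = &syscall.SysProcAttr{Setsid: true} // detach from gadmin's session

    if err := cmd.Start(); err != nil {
        return nil, err
    }
    return cmd.Process, nil
}

func main() {
    dir, _ := os.MkdirTemp("", "gadmin-job-")
    proc, err := runStep(dir, "01_example", "sh", "-c", "echo step done")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    // For the demo we wait; the real monitor would wait in a goroutine
    // and read the output files on demand.
    state, _ := proc.Wait()
    fmt.Println("exit:", state.ExitCode(), "output in:", dir)
}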

gadmin's People

Contributors

amarts, brainfunked, sankarshanmukhopadhyay


gadmin's Issues

Background job handling

Background jobs are executed as a subprocess which is spawned with all the necessary information for it to be able to carry out all the necessary actions, even if the main gadmin process dies while the job is being executed. It is a requirement that all input and output of the job be logged to disk. This approach ensures that complete data is available for any future debugging. The files also provide an IPC mechanism whereby the main gadmin process can simply show any part of the job's output to the user, on demand, by reading and/or tailing the specific files.

All output of a job is written to a separate directory under $GADMIN_HOME/jobs/. The various files in this directory, referenced below, are described in #15.

The implementation would consist of the following:

  • Commands would be implemented to indicate whether they're a background job.
  • When a command that is a background job is executed, the following happens:
    • A goroutine is spawned as a monitor.
    • The goroutine writes the command arguments to the input file.
    • The goroutine spawns a separate process for the actual job execution and waits on it. The only communication between the monitor and the job process is to check whether the process is running and, once the process exits, to gather its exit status (#17). (A sketch of this monitor follows the list.)
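A minimal sketch, in Go, of such a monitor, with hypothetical names and a simplified input file format; the real file layout is defined in #15 and the exit statuses in #17.

package main

import (
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
)

// monitor writes the command's arguments to the job's input file, spawns
// the job as a separate process and reports its exit status on done.
func monitor(jobDir string, args []string, done chan<- int) {
    input := filepath.Join(jobDir, "input.yaml")
    if err := os.WriteFile(input, []byte(fmt.Sprintf("args: %q\n", args)), 0644); err != nil {
        done <- 254 // unknown error, per the exit statuses in #17
        return
    }

    cmd := exec.Command(args[0], args[1:]...)
    if err := cmd.Run(); err != nil {
        if ee, ok := err.(*exec.ExitError); ok {
            done <- ee.ExitCode()
            return
        }
        done <- 254
        return
    }
    done <- 0
}

func main() {
    dir, _ := os.MkdirTemp("", "gadmin-job-")
    done := make(chan int, 1)
    go monitor(dir, []string{"true"}, done) // the monitor runs as a goroutine
    fmt.Println("job exit status:", <-done)
}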

gluster-ansible integration

Gadmin has a dependency on gluster-ansible. gluster-ansible implements various system tasks that are needed by Gadmin to provide end user workflows. These tasks include setting up a Gluster cluster in various scenarios, as well as system level operations required for some management functionality.

The integration with gluster-ansible is proposed to function as follows:

  • Playbook YAML templates need to be maintained for various workflows that are enabled via gadmin.
  • During the execution of a command, gadmin would populate a playbook file by filling in the variables based on user inputs.
  • This filled-in playbook file is stored in the job's directory structure.
  • Gadmin executes ansible against this playbook.
  • gluster-ansible needs to ship with a plugin that enables ansible to directly write a parsable output (YAML, JSON etc.) in a defined format, per playbook, in the job's directory structure. Gadmin would need to supply the filename to write the output to at the time of execution.
  • Gadmin monitors the exit status of the ansible execution to figure out the success or failure.
  • Gadmin shows the output of the job being executed by reading and parsing the output file written by gluster-ansible. (A sketch of this flow follows the list.)
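A minimal sketch, in Go, of the fill-in-and-execute flow. The playbook template, the variable names and the gluster.infra role are illustrative assumptions; only the ansible-playbook -i <inventory> <playbook> invocation is standard ansible.

package main

import (
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
    "text/template"
)

// A hypothetical playbook template; the real templates would be shipped
// by gluster-ansible or gadmin (see the open questions below).
const playbookTmpl = `- hosts: {{ .Cluster }}
  roles:
    - role: {{ .Role }}
`

// renderAndRun fills in the playbook from user inputs, stores it in the
// job's directory and executes ansible-playbook against it.
func renderAndRun(jobDir, inventory string, vars map[string]string) error {
    path := filepath.Join(jobDir, "playbook.yaml")
    f, err := os.Create(path)
    if err != nil {
        return err
    }
    if err := template.Must(template.New("pb").Parse(playbookTmpl)).Execute(f, vars); err != nil {
        f.Close()
        return err
    }
    f.Close()

    cmd := exec.Command("ansible-playbook", "-i", inventory, path)
    cmd.Stdout = os.Stdout // the real gadmin would have ansible write to a file instead
    cmd.Stderr = os.Stderr
    return cmd.Run() // the exit status signals success or failure
}

func main() {
    dir, _ := os.MkdirTemp("", "gadmin-job-")
    err := renderAndRun(dir, "hosts.yaml", map[string]string{"Cluster": "cluster0", "Role": "gluster.infra"})
    fmt.Println("result:", err)
}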

The following details need to be figured out:

  • Which project maintains and ships the playbooks? To me, gluster-ansible seems like the right place. It may be possible that in some scenarios, gadmin specific playbooks need to be maintained because gluster-ansible's default playbook doesn't fit into Gadmin's command structure. The actual playbooks themselves would need to be used/developed on a per-command basis in Gadmin.
  • Plugin implementation needs to be done in gluster-ansible. The format for the file written needs to be documented.
  • Are ansible's exit statuses well defined?

@sac @devyanikota

Execution as a non-privileged user with no system configuration

When gadmin is run by a non-privileged user, the following needs to happen:

  • Check if $GADMIN_HOME is set. If yes:
    • Inform the user that it is set and that its value will be used as the work directory.
    • Check if the directory is writable. If not, inform the user and exit.
  • If $GADMIN_HOME is not set, inform the user and exit.

Job exit statuses

Specifically defined exit statuses are used by the monitor to show the success or failure of a finished background job to the user (#16). The following exit statuses are required to start with (a sketch of their definition follows the list):

  • 0: Job completed successfully.
  • 1: Job failed, no rollback possible.
  • 2: Job failed, rollback executed, rollback succeeded.
  • 3: Job failed, rollback executed, rollback failed.
  • 254: Unknown error.
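A sketch of how these statuses could be defined in Go; the constant names are illustrative.

package main

import "fmt"

// Job exit statuses as listed above.
const (
    JobSucceeded         = 0   // job completed successfully
    JobFailedNoRollback  = 1   // job failed, no rollback possible
    JobFailedRolledBack  = 2   // job failed, rollback executed and succeeded
    JobFailedRollbackErr = 3   // job failed, rollback executed and failed
    JobUnknownError      = 254 // unknown error
)

// describe maps an exit status to the user-facing message shown by the
// monitor (#16).
func describe(status int) string {
    switch status {
    case JobSucceeded:
        return "Job completed successfully."
    case JobFailedNoRollback:
        return "Job failed, no rollback possible."
    case JobFailedRolledBack:
        return "Job failed, rollback executed, rollback succeeded."
    case JobFailedRollbackErr:
        return "Job failed, rollback executed, rollback failed."
    default:
        return "Unknown error."
    }
}

func main() {
    fmt.Println(describe(JobFailedRolledBack))
}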

System-wide execution privileges and the corresponding installation procedure

The default method of running gadmin should be as a non-privileged user. The only requirement is for gadmin to be able to invoke gluster-ansible roles. This requires root access to the target systems to be configured for the user that would run gadmin. Additionally, the user gadmin runs as needs to be able to write to a specific directory (referenced throughout as $GADMIN_HOME).

When installed from a package, it would be possible to set up the following environment on the system to enable the above:

  • Create a system user and group gadmin with the following configuration:
    • Disable password based login.
    • Enable shell.
    • Set home directory to /var/lib/gadmin.
    • Set home directory permissions to 0755 with ownership root:root.
    • Install a .bashrc in /var/lib/gadmin to set $GADMIN_HOME to /var/lib/gadmin/work.
    • Create /var/lib/gadmin/work with permissions 0755 with ownership gadmin:gadmin.

Cluster inventory refresh

The inventory structure has been defined in #8.

gadmin would determine the list of available clusters by checking for the following:

  • Get the list of directories under $GADMIN_HOME/ansible/inventory/.
  • For each of the directories, check if the file hosts.yaml exists.

The above steps constitute an inventory 'refresh'. The refresh would need to be carried out each time a new cluster is defined or an existing cluster is removed. A sketch of the refresh follows.
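A minimal sketch, in Go; the function name refreshClusters is illustrative and $GADMIN_HOME is assumed to be set.

package main

import (
    "fmt"
    "os"
    "path/filepath"
)

// refreshClusters scans $GADMIN_HOME/ansible/inventory/ and returns the
// names of the directories that contain a hosts.yaml file.
func refreshClusters() ([]string, error) {
    root := filepath.Join(os.Getenv("GADMIN_HOME"), "ansible", "inventory")
    entries, err := os.ReadDir(root)
    if err != nil {
        return nil, err
    }
    var clusters []string
    for _, e := range entries {
        if !e.IsDir() {
            continue
        }
        if _, err := os.Stat(filepath.Join(root, e.Name(), "hosts.yaml")); err == nil {
            clusters = append(clusters, e.Name())
        }
    }
    return clusters, nil
}

func main() {
    clusters, err := refreshClusters()
    if err != nil {
        fmt.Fprintln(os.Stderr, "refresh failed:", err)
        os.Exit(1)
    }
    fmt.Println("available clusters:", clusters)
}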

Execution module interface

Execution modules enable gadmin to communicate with external systems. Four types of modules are currently envisioned (a sketch of a possible interface follows the list):

  • Local command
  • Remote command via ansible
  • GD2 API call
  • Ansible role execution via playbooks
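A sketch, in Go, of one possible shape for the interface. Since the actual interface is still being defined (#19), every name here is hypothetical.

package main

import "fmt"

// Module is a hypothetical execution module interface. Each module
// executes one step of a job and writes its output to the given file in
// the job's directory structure.
type Module interface {
    Name() string
    Execute(outputPath string, args map[string]string) error
}

// LocalCommand is a stub for the "local command" module type; the other
// three types (remote command via ansible, GD2 API call, ansible role
// execution) would implement the same interface.
type LocalCommand struct{}

func (LocalCommand) Name() string { return "local_command" }

func (LocalCommand) Execute(outputPath string, args map[string]string) error {
    // A real implementation would run the command and write its output
    // to outputPath.
    return fmt.Errorf("not implemented")
}

func main() {
    var m Module = LocalCommand{}
    fmt.Println(m.Name(), m.Execute("/tmp/out", nil))
}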

Cluster list

List all the available clusters in the inventory.

Startup checks

  • Are we running as root? If yes, warn and exit.
  • Is $GADMIN_HOME set and the corresponding work directory writable?
    • System-wide configuration: #5
    • Non-privileged user with no system-wide configuration: #6
  • Is ansible installed? #7
  • Are the necessary ansible roles installed? #7
  • Does the host inventory exist? If yes, refresh it. #9
  • If only one cluster has been defined, verify (#10) and select it (#12).
  • If more than one cluster has been defined, ask the user (#11) to select one (#12).
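A minimal sketch, in Go, of the first two checks above (the root check and $GADMIN_HOME writability); the function name is illustrative.

package main

import (
    "fmt"
    "os"
    "path/filepath"
)

// startupChecks refuses to run as root and requires a writable
// $GADMIN_HOME, per #5 and #6.
func startupChecks() error {
    if os.Geteuid() == 0 {
        return fmt.Errorf("gadmin should not be run as root")
    }
    home := os.Getenv("GADMIN_HOME")
    if home == "" {
        return fmt.Errorf("$GADMIN_HOME is not set")
    }
    // Probe writability by creating and removing a temporary file.
    probe := filepath.Join(home, ".gadmin_write_probe")
    f, err := os.Create(probe)
    if err != nil {
        return fmt.Errorf("$GADMIN_HOME is not writable: %w", err)
    }
    f.Close()
    return os.Remove(probe)
}

func main() {
    if err := startupChecks(); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Println("startup checks passed")
}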

Background job definition

Commands implemented in gadmin could indicate whether they're a background job or not. This is to ensure that all the necessary handling steps are executed when a command is run by the user (#16).

Jobs themselves are transactions consisting of multiple definable steps, executed in a serialised sequence. Each step calls an execution module with arguments. The execution modules are built into gadmin. To start with, the following modules could be shipped: local command, remote command (using ansible), gd2 api client, ansible role. Generic module structure and interface are addressed in separate issues.

Every step can also be accompanied by rollback, verification and rollback verification steps. Any or all of these three types can be defined. Essentially, they're regular steps, but their execution is based on specifically defined conditions.

Multiple rollback steps can be defined; they are executed in a serialised sequence. Rollback steps are executed only when the step they're defined against fails. If any of the rollback steps fails, all of the remaining steps, both regular and rollback, are skipped.

Rollback verification steps are executed only after all the rollback steps are successfully completed.

Verification steps are executed only when a regular step succeeds.

The execution module interface is being defined at #19.
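A sketch, in Go, of the transaction semantics described above, reporting the job exit statuses from #17. The Step shape is hypothetical, and treating a failed verification step as a plain failure (status 1) is an assumption.

package main

import "fmt"

// Step is one transaction step with its optional rollback and
// verification steps.
type Step struct {
    Run            func() error
    Verify         func() error   // executed only if Run succeeds
    Rollback       []func() error // executed serially, only if Run fails
    VerifyRollback func() error   // executed only after all rollback steps succeed
}

// runJob executes steps in a serialised sequence, applying the rollback
// and verification rules described above.
func runJob(steps []Step) int {
    for _, s := range steps {
        if err := s.Run(); err == nil {
            if s.Verify != nil && s.Verify() != nil {
                return 1 // assumption: failed verification means a failed job
            }
            continue
        }
        if len(s.Rollback) == 0 {
            return 1 // job failed, no rollback possible
        }
        for _, rb := range s.Rollback {
            if rb() != nil {
                return 3 // rollback failed; remaining steps are skipped
            }
        }
        if s.VerifyRollback != nil && s.VerifyRollback() != nil {
            return 3
        }
        return 2 // rollback executed and succeeded
    }
    return 0 // job completed successfully
}

func main() {
    ok := func() error { return nil }
    fail := func() error { return fmt.Errorf("boom") }
    fmt.Println(runJob([]Step{
        {Run: ok, Verify: ok},
        {Run: fail, Rollback: []func() error{ok}},
    })) // prints 2
}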

Runtime directory structure

The runtime directory must be supplied to gadmin via the $GADMIN_HOME environment variable. Check #5 and #6 for more details.

Upon startup, gadmin would check if the directory exists and is writable. If not, an error message would be displayed asking for the directory to be created with permissions 0755, owned by the user gadmin is being run as, before exiting with a failure exit status.

The runtime directory would be of the following structure:

GADMIN_HOME/
├── ansible
├── input
└── jobs
  • ansible directory contains all the ansible inventory (#8).
  • input directory would contain the templates for the command inputs to be filled in by the user. These are generated as required.
  • jobs directory would contain the directory structure for all the jobs executed. (A sketch of setting up this structure follows.)
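A minimal sketch, in Go, that sets up this structure; it assumes gadmin itself creates the subdirectories once $GADMIN_HOME exists and is writable.

package main

import (
    "fmt"
    "os"
    "path/filepath"
)

// ensureRuntimeDirs creates the runtime directory structure described
// above under $GADMIN_HOME, with 0755 permissions.
func ensureRuntimeDirs() error {
    home := os.Getenv("GADMIN_HOME")
    if home == "" {
        return fmt.Errorf("$GADMIN_HOME is not set")
    }
    for _, d := range []string{"ansible", "input", "jobs"} {
        if err := os.MkdirAll(filepath.Join(home, d), 0755); err != nil {
            return err
        }
    }
    return nil
}

func main() {
    if err := ensureRuntimeDirs(); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Println("runtime directories ready")
}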

Cluster select

Allow the user to select a cluster from the ones defined in the inventory. Verify (#10) the cluster before confirming the selection.

  • Once selected, set a session variable pointing to the cluster's name.
  • All the commands in the session that apply to a cluster would be carried out against this cluster by passing the variable as an argument to each command.
  • When no cluster is selected, cluster scoped command executions are disabled.
  • Inventory management commands such as cluster define can be executed even when no cluster is defined.
  • Upon selection, the prompt should change to indicate the name of the selected cluster.

Per-job directory structure

For every background job, a separate directory is created under $GADMIN_HOME/jobs/. The name of the directory would be a job_id. The job_id would be generated by joining the following with underscores:

  • ISO timestamp
  • A random string
  • Command name

The string representation of each command would be a part of the command's definition code.
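A sketch, in Go, of the job_id generation described above. Using dashes instead of colons in the timestamp, to keep the directory name filesystem-friendly, is an assumption.

package main

import (
    "crypto/rand"
    "encoding/hex"
    "fmt"
    "time"
)

// newJobID joins an ISO-style timestamp, a random string and the
// command's name with underscores.
func newJobID(command string) string {
    ts := time.Now().UTC().Format("2006-01-02T15-04-05")
    buf := make([]byte, 4)
    rand.Read(buf) // 8 random hex characters
    return fmt.Sprintf("%s_%s_%s", ts, hex.EncodeToString(buf), command)
}

func main() {
    // e.g. 2024-01-01T12-00-00_a1b2c3d4_cluster-expand
    fmt.Println(newJobID("cluster-expand"))
}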

Under the job directory, the following files could exist:

  • input.yaml: all the arguments to the command, including the name of the cluster and any specific group from that cluster the job is executed against.
  • output.yaml: output from the job process to list the steps being executed, the status of each of the steps as they're executed and the cluster state (if applicable) at the end of the execution of each of the steps.
  • <step_number>_<step_description>: output from individual steps. The format of the output would be determined by the execution module. The step number is picked up from the definition of the transaction.
  • <step_number>_rollback_<rollback_step_number>_<step_description>: output from individual rollback steps associated with a specific step.

Per-cluster inventory verification and loading

Inventory structure has been defined in #8.

At the time of cluster selection, an initial verification of the cluster's definition needs to be carried out. The hosts.yaml files for respective clusters would be read and parsed when a cluster is selected. The parsing is only to determine the defined roles and the number of hosts in the cluster. If there are no hosts defined, the cluster cannot be selected and hence no operations can be carried out against the cluster. The user would be instructed to define the hosts for this cluster.
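A minimal sketch, in Go, of this verification, assuming the third-party gopkg.in/yaml.v3 package and the inventory layout defined in #8.

package main

import (
    "fmt"
    "os"

    "gopkg.in/yaml.v3"
)

// verifyCluster parses a cluster's hosts.yaml just enough to count the
// hosts and the defined role groups.
func verifyCluster(path, cluster string) error {
    data, err := os.ReadFile(path)
    if err != nil {
        return err
    }
    // Mirrors the layout from #8: all -> children -> <cluster_name>.
    var inv struct {
        All struct {
            Children map[string]struct {
                Hosts    map[string]any `yaml:"hosts"`
                Children map[string]struct {
                    Hosts map[string]any `yaml:"hosts"`
                } `yaml:"children"`
            } `yaml:"children"`
        } `yaml:"all"`
    }
    if err := yaml.Unmarshal(data, &inv); err != nil {
        return err
    }
    c, ok := inv.All.Children[cluster]
    if !ok || len(c.Hosts) == 0 {
        return fmt.Errorf("no hosts defined for cluster %q; define hosts before selecting it", cluster)
    }
    fmt.Printf("cluster %s: %d hosts, %d role groups\n", cluster, len(c.Hosts), len(c.Children))
    return nil
}

func main() {
    if err := verifyCluster("hosts.yaml", "cluster0"); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
}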

Cluster inventory

gadmin executes ansible roles. These need to be executed against an inventory of hosts. gadmin itself needs to know the cluster hosts, their roles, the endpoints to contact, etc. The logical choice is for gadmin to maintain this data in the form of an ansible inventory.

gadmin supports multiple clusters. Instead of writing all the information about all the clusters to a single inventory file, it would be beneficial to use a directory based layout with the inventory files themselves being YAML. This makes it easier for the administrator to maintain individual clusters' inventory separately, if manually modified, and also for gadmin to update these files without touching the other clusters.

Here's the directory layout under $GADMIN_HOME:

ansible/
└── inventory/
    └── <cluster_name>/
        ├── group_vars/
        ├── hosts.yaml
        └── host_vars/

<cluster_name> would be the name supplied by the user when defining a cluster.

In the inventory itself, for every cluster a top level group of <cluster_name> would be defined. Here's an example YAML inventory file:

all:
  children:
    cluster0:
      hosts:
        192.168.100.71:
        192.168.100.72:
        192.168.100.73:
      children:
        monitoring:
          hosts:
            192.168.100.71:
        smb:
          hosts:
            192.168.100.72:

The group names monitoring and smb are illustrative. The actual group names would be standardised as features are implemented in gadmin for different roles.

gadmin needs to be able to read and write the inventory file based on this directory structure.

Proof of Concept implementation

Implement a proof of concept for gadmin with the following context:

  • A virsh-like shell experience.
  • A script mode to invoke gadmin one command at a time for use in shell scripts.
  • A stateless, query-response implementation where gadmin always queries gd2 to gather state information about the Gluster deployment.
  • No exclusive access or lock on the cluster.
  • Cluster related information to be presented from a consumer perspective: eg, for a volume present the health, capacity, workload type and gateway information.
  • A wrapper over gluster-ansible to accommodate automated system administration tasks that can't be implemented in any of the actual storage components such as gd2, nfs-ganesha etc. The fact that ansible is being used should be invisible to the administrator using gadmin.

I'll update this issue to list the specific scenarios that would be part of the PoC to demonstrate the above.
