iocmanager's Issues

MNT: upgrade to python3 and apply our cookiecutter

Naively, I propose upgrading this project to python3 and applying our standard python cookiecutter. This would bring with it the framework for running tests in GHA and for installation via PyPI/conda-forge.

Is there a specific reason this repo hasn't been upgraded? (some py2 specific feature? Secrets?)

Rough To-Do List:

  • run pyupgrade
  • apply cookiecutter
    • set up cli entrypoint
    • list dependencies
  • include in pcds-conda
  • set up pypi project and conda-forge feedstocks

Automatic submission of configuration files

Move the commit process from the application to an automated process to be run regularly (every 24h?) and remove related code from GUI.

This will make sure that configuration can be recovered even when people did not feel they were ready to commit it. It also allows the "commit host" (the host that holds the file lock) to be a machine that is more accessible, e.g. to beamline staff.

Add overview documentation

Need to reformat information from the presentation:

IocManager: History and Internals
December 15, 2020


Overview
The purpose of this talk is to describe the design and implementation of the IocManager.
How were IOCs started before IocManager?
What were the problems with this approach?
How does IocManager start IOCs, and how does this address the problems we had originally?



In the Beginning, There Was “init”...
The first user task started in Linux is called “init”.

In RHEL5 systems, this runs a set of startup scripts that live in the /etc/init.d directory.

In RHEL7 systems, this is a symbolic link to systemd, which starts a set of system services.

In either case, our “ioc” script/service does nothing more than run (as root):
       /reg/d/iocCommon/hosts/$HOSTNAME/startup.cmd


The Host startup.cmd, v1.0
Originally, the startup.cmd did two things:
Prepare the system for running IOCs by loading any necessary drivers, adjusting kernel parameters, etc.
Run /reg/d/iocCommon/sioc/$IOCNAME/startup.cmd for each IOC that should be started on this host.



The IOC startup.cmd
#!/bin/bash

export IOC="ioc-xrt-xcsimb1"
source /reg/d/iocCommon/All/fee_env.sh

$RUNUSER "mkdir -p $IOC_DATA/$IOC/autosave"
$RUNUSER "mkdir -p $IOC_DATA/$IOC/archive"
$RUNUSER "mkdir -p $IOC_DATA/$IOC/iocInfo"
$RUNUSER "chmod ug+w -R $IOC_DATA/$IOC"

cd /reg/g/pcds/package/epics/3.14/ioc/xrt/ipimb/R4.0.5/build/iocBoot/$IOC

$RUNUSER "cp -f -p ../../archive/$IOC.archive $IOC_DATA/$IOC/archive"

$RUNUSER "$PROCSERV --logfile $IOC_DATA/$IOC/iocInfo/ioc.log \
                    --name $IOC 30002 ./st.cmd"


What Problems Does This Have?
This is a lot of boilerplate script that needs to be written for each IOC.
The port number is hardcoded in the sioc file, so from the host startup.cmd, it is not immediately clear which ports are in use and which are free to use in a new IOC.
The software version of the IOC is hardcoded in the sioc file, which is run as root at startup.  So changing an IOC version requires not only editing the file, but manually killing the old IOC and using sudo to start the new one.

Goals for the IocManager
Minimize the amount of boilerplate scripting necessary for a new IOC.
Keep information about IOCs in a hutch in one place.
Simplify the process of updating an IOC, so it is a less privileged operation.

iocmanager.cfg
The configuration file for each hutch.
Contains:
A few global settings (hosts, etc.).
A list of python dictionaries, one for each IOC, giving all of the information about what should be run and where it should be run.

The Host startup.cmd, v2.0
Prepare the system for running IOCs by loading any special drivers, etc.
Run /reg/g/pcds/pyps/apps/ioc/latest/initIOC to load common drivers, etc. and start all of the IOCs that belong on that host.

initIOC
Determine which hutch we are running in.
Consult hosts.special, a map of hosts to hutches.
Otherwise assume we have hostnames of the form xxx-yyy-zzzzz, where yyy is the hutch.
Set up a basic python environment.
Run the hutch-specific version of the initialization in /reg/g/pcds/pyps/config/$HUTCH/iocmanager/initIOC.hutch.

(This allows different hutches to use different versions of the IocManager initialization script.  initIOC is (mostly) unchanging, but changes can be put into initIOC.hutch.)
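
The hutch lookup described above can be sketched in Python. This is only an illustration, not the actual initIOC code: `guess_hutch` is a hypothetical name, and the `special_hosts` dict is an assumed stand-in for whatever hosts.special really contains.

```python
def guess_hutch(hostname, special_hosts):
    """Return the hutch for a host, or None if it cannot be determined.

    special_hosts is a hypothetical dict standing in for hosts.special
    (host name -> hutch name); the real file format is not shown here.
    """
    short = hostname.split(".")[0].lower()
    # 1. Hosts listed explicitly take priority.
    if short in special_hosts:
        return special_hosts[short]
    # 2. Otherwise expect a name of the form xxx-yyy-zzzzz, where yyy is the hutch.
    parts = short.split("-")
    if len(parts) >= 3:
        return parts[1]
    return None
```

Returning None for a host that matches neither rule leaves it to the caller to decide how to report the problem.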

initIOC.hutch
Source the hutch-specific environment from /reg/d/iocCommon/All/$(HUTCH)_env.sh.
Start up procmgrd.
Conceptually, this can be thought of as a daemon which accepts remote requests to run procmgr with a given script and port as the appropriate ioc user.
In reality, it’s procServ running /bin/sh as the ioc user.
Install EDT framegrabber and EVR drivers.
Start the caRepeater processes.
Run the IocManager “startAll” script to start all of the IOCs on this host.

startAll
Read the iocmanager.cfg file.
Loop over every entry:
If the hostname is the current host, send a request to procmgrd to run the IocManager “startProc” script in a procServ, passing this script the IOC name.
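
As a rough sketch of that loop, the selection step might look like the following. The entry keys `id` and `host` are assumptions about the per-IOC dictionaries, and `iocs_to_start` is a hypothetical helper; the request to procmgrd itself is left out.

```python
def iocs_to_start(entries, hostname):
    """Return the names of the IOC entries configured to run on hostname.

    entries is a list of per-IOC dictionaries as read from iocmanager.cfg;
    the "id" and "host" keys are assumed here for illustration.
    """
    return [entry["id"] for entry in entries if entry["host"] == hostname]
```

Each returned name would then be handed to startProc through procmgrd.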

startProc
Set up the standard hutch environment and create the /reg/d/iocData directory structure.
Signal the procServ process to start a new log.
Run the IocManager “getDirectory” script to read iocmanager.cfg and find the directory entry for the current IOC.

(Reading the directory from the configuration file at this point makes upgrades simple: change the configuration file, then restart the IOC.)


Entering the IOC Directory in startProc
cd /reg/g/pcds/epics
if test -d $dir; then cd $dir; fi
if test -f env.sh; then source ./env.sh; fi
if test -d children/build/iocBoot/$ioc; then
      cd children/build/iocBoot/$ioc;
fi
if test -d build/iocBoot/$ioc; then cd build/iocBoot/$ioc; fi
if test -d iocBoot/$ioc; then cd iocBoot/$ioc; fi
if test -f env.sh; then source ./env.sh; fi
IocManager tries to be smart about the directory.  It might be a relative or absolute path.  It might be a templated parent with children, a templated child, or completely untemplated.

env.sh can be in the top-level or IOC directory to further customize the environment.
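
For readers more comfortable with Python than shell, the same directory search can be sketched as below. `resolve_ioc_dir` is a hypothetical function that mirrors the sequence of `cd` commands above, not code from IocManager, and it leaves out the sourcing of env.sh.

```python
import os


def resolve_ioc_dir(base, directory, ioc):
    """Follow the same steps as the shell snippet and return the final directory.

    base stands in for /reg/g/pcds/epics. Each test is made relative to
    wherever the previous successful "cd" left us, as in the shell version.
    """
    cwd = base
    # os.path.join discards base when directory is absolute, just as cd would.
    candidate = os.path.join(cwd, directory)
    if os.path.isdir(candidate):
        cwd = candidate
    for parts in (
        ("children", "build", "iocBoot", ioc),  # templated parent with children
        ("build", "iocBoot", ioc),              # templated child
        ("iocBoot", ioc),                       # untemplated IOC
    ):
        candidate = os.path.join(cwd, *parts)
        if os.path.isdir(candidate):
            cwd = candidate
    return cwd
```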



startProc, Continued
Create a small status file with hostname and port info at /reg/g/pcds/pyps/config/.status/$HUTCH/$IOCNAME.

(This is used to detect IOCs that are still running on a different port than the one they are currently configured to run on.)
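
A minimal sketch of that comparison, assuming the status files and the configuration have each already been parsed into a dict of IOC name to (host, port). The function name and data shapes are hypothetical, since the status file format is not described here.

```python
def find_misplaced(status, configured):
    """List IOCs whose recorded (host, port) differs from the configured one.

    Both arguments map IOC name -> (host, port). status comes from the
    status files, configured from iocmanager.cfg; both shapes are assumed.
    """
    misplaced = []
    for name, running in sorted(status.items()):
        wanted = configured.get(name)
        if wanted is not None and wanted != running:
            misplaced.append((name, running, wanted))
    return misplaced
```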

Run “st.cmd” to start the IOC!


The Thorn on the Rose
How do we control access to the iocmanager.cfg file to prevent corruption from simultaneous writes?

NFS does have a file locking mechanism, but it has a generally bad reputation.

Therefore, IocManager uses local file locks, so all write access to the configuration must be from a single host!



Authentication and the COMMITHOST
Each configuration file may define a COMMITHOST, which defaults to “psbuild-rhel7-01”.

On startup, IocManager does an ssh to the COMMITHOST.  This doubles as an authentication mechanism.

This is probably the #1 cause of IocManager silently failing to start: it cannot access the COMMITHOST from where it is being run, and it is not prompting the user for a password or passphrase.

Summary of the Benefits of IocManager
All information about IOCs is in one file.
Easy to make sure ports are unique on each host.
Easy to update IOC software versions.
Easy to move IOCs from one host to another (especially since code was added to adjust for RHEL5/7!).
Easy to detect IOCs running on the wrong port.


In general, life is easier!

Usage of standard output for error messages interferes with script usage

Note the redirection of standard error:

[klauer@psbuild-rhel7-01  IocManager]$ ./getDirectory.py ioc-lm1k4-inj-ek9000-01 las 2>/dev/null
Error while trying to read HIOC startup file for ioc-las-und-phaseCavity!
/cds/home/t/tjohnson/trunk/workarea/ek9000

Due to:

print "Error while trying to read HIOC startup file for %s!" % host

These error printouts to standard output are everywhere in the codebase, but this one in particular appears to be causing some IOC startup issues.
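
One possible shape for a fix, sketched in python3: send diagnostics to standard error so that standard output carries only the result a calling script expects. `warn` and `report_directory` are hypothetical names, not functions from the codebase.

```python
import sys


def warn(*args):
    """Write a diagnostic to standard error, leaving standard output clean."""
    print(*args, file=sys.stderr)


def report_directory(directory, problems):
    """Print any problems to stderr, then the directory alone on stdout."""
    for problem in problems:
        warn(problem)
    print(directory)
```

With this split, a caller that redirects standard error with `2>/dev/null` would capture only the directory line.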

IOCs failing to boot

Symptom

@@@ @@@ @@@ @@@ @@@
@@@ Received a sigChild for process 15300. The process was killed by signal 9
@@@ Current time: Wed Jun  9 16:15:00 2021
@@@ Child process is shutting down, a new one will be restarted shortly
@@@ ^R or ^X restarts the child, ^Q quits the server
@@@ Restarting child "ioc-mfx-tfs-lens"
@@@    (as /reg/g/pcds/pyps/config/mfx/iocmanager/startProc)
@@@ The PID of new child "ioc-mfx-tfs-lens" is: 15734
@@@ @@@ @@@ @@@ @@@

and nothing happens, even waiting a long time. No output except for what procServ dumps out.

I've seen this happen multiple times, and things usually recover at some point (30 mins or more?).

Guess

I think some of the config-related tools can create problems. My guess is that the lock acquisition and our poor, finicky network filesystems create hostile environments for these tools:

$ python /reg/g/pcds/pyps/config/mfx/iocmanager/getDirectory.py ioc-mfx-tfs-lens mfx

The above locks the terminal, blocking Ctrl-C and even Ctrl-Z for the longest time. 3 minutes later, and still no luck getting the IOC directory.

strace

Let's see what strace shows...
Oh, now it's back to working, so strace output is useless.

Suspicious code?

I'd guess it hangs here:

fcntl.lockf(f, fcntl.LOCK_SH) # Wait for the lock!!!!

Any thoughts on the above or where else it might be, @mcb64?
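
If the blocking lock call does turn out to be the culprit, one option to consider is polling with a non-blocking lock and giving up after a timeout. This is a sketch under that assumption only; `lock_shared` is a hypothetical helper, and it would not help if the hang is inside the network filesystem rather than in waiting for the lock.

```python
import errno
import fcntl
import time


def lock_shared(f, timeout=10.0, poll=0.5):
    """Try to take a shared lock on f, giving up after timeout seconds.

    Returns True once the lock is held, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # LOCK_NB makes lockf raise instead of waiting for the lock.
            fcntl.lockf(f, fcntl.LOCK_SH | fcntl.LOCK_NB)
            return True
        except OSError as exc:
            # EACCES/EAGAIN mean "held by someone else"; anything else is a real error.
            if exc.errno not in (errno.EACCES, errno.EAGAIN):
                raise
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A caller could then print a clear message and exit instead of leaving procServ with a silent child.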

imgr: consider adding --json for script interoperability [LCLSPC-182]

Entirely a "would-be-nice-if", but consider if imgr list --json dumped out detailed IOC information that could be used and interpreted by a separate process (namely jq if you're working in a shell/pipe-based workflow).

This would, of course, require information beyond "IOC name" - adding in IOC host and all the other bits.
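
A small sketch of what the output side might look like. `render_iocs` and the entry keys are hypothetical, and how it would hook into imgr's actual argument handling is left open.

```python
import json


def render_iocs(entries, as_json=False):
    """Render IOC entries either as plain names or as a JSON document.

    entries is a list of per-IOC dictionaries; the "id" key is assumed.
    """
    if as_json:
        # sort_keys keeps the output stable, which helps when diffing or piping.
        return json.dumps(entries, indent=2, sort_keys=True)
    return "\n".join(entry["id"] for entry in entries)
```

The JSON form keeps host, port, and any other fields together, so a tool such as jq can filter on them.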

imgr: display help when hutch not determinable from host [LCLSPC-182]

I noticed that when not on a machine the tool recognizes, --hutch is required. This makes sense, of course, but because it throws an exception that is not user-friendly, the only path to figuring out what you did wrong is guessing/assuming that --help exists.

Throws an IndexError: list index out of range (see LCLSPC-182 for further context)
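
One way this could be handled, sketched with argparse (whether imgr actually uses argparse is an assumption, as are the function names): check the hostname shape before indexing into it and call `parser.error`, which prints the usage line plus a message and exits with status 2.

```python
import argparse


def build_parser():
    """Hypothetical stand-in for imgr's parser, with only the --hutch option."""
    parser = argparse.ArgumentParser(prog="imgr")
    parser.add_argument("--hutch", help="hutch to operate on (guessed from the host name if omitted)")
    return parser


def resolve_hutch(parser, args, hostname):
    """Return the hutch, or exit with a usage message if it cannot be determined."""
    if args.hutch:
        return args.hutch
    parts = hostname.split("-")
    if len(parts) < 3:
        # Replaces the bare IndexError with usage text and a hint.
        parser.error("could not determine the hutch from host %r; please pass --hutch" % hostname)
    return parts[1]
```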

imgr: --help text is lacking [LCLSPC-182]

Beyond the commands themselves, there's not much information in --help.

A simple task to get familiar with imgr's inner-workings for a newcomer could be to add in some help text.
Maybe even link back to confluence for further details?

Port ranges and port selection - let IocManager do the work for us

We have a certain set of TCP ports that are special:

Port(s)        Status    Subnet access   Purpose
30000          RESERVED  Closed          For caRepeater process only
30001 - 38999  Open      Closed          For Soft IOCs which belong to a single hutch
39000 - 39099  RESERVED  Across Subnets  For Controls Group use only - Used for process manager daemons, etc
39100 - 39199  Open      Across Subnets  For Soft IOCs which are common to many hutches (ie: XRT shared soft IOCs)
39200 - 39999  RESERVED  Across Subnets  For future allocation

I think the GUI should aid you in picking an unused valid port above, from 39100-39199.
If really necessary, an "advanced" option could be configuring the port number manually.
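
The selection itself could be as simple as the sketch below. `pick_port` is a hypothetical helper, `used` is assumed to be the set of ports already configured on the target host, and the default range is just the one named above.

```python
def pick_port(used, low=39100, high=39199):
    """Return the lowest port in [low, high] that is not in used, or None if full."""
    for port in range(low, high + 1):
        if port not in used:
            return port
    return None
```

For a single-hutch soft IOC, the same function could be called with `low=30001, high=38999`.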
