pcdshub / iocmanager

pyqt5 + pyca-based EPICS IOC Manager

Home Page: https://confluence.slac.stanford.edu/display/PCDS/IOC+Manager+for+Users
Naively, I propose upgrading this project to Python 3 and applying our standard Python cookiecutter. This would bring with it the framework for running tests in GHA and installation via PyPI/conda-forge.
Is there a specific reason this repo hasn't been upgraded? (Some py2-specific feature? Secrets?)
Rough To-Do List:
Move the commit process from the application to an automated process run regularly (every 24h?) and remove the related code from the GUI.
This will make sure that the configuration can be recovered even when people did not feel it was ready to commit. It also allows the "commit host" (i.e., the host that holds the file lock) to be a machine more accessible to, e.g., beamline staff.
Need to reformat information from the presentation:
IocManager: History and Internals
December 15, 2020
Overview
The purpose of this talk is to describe the design and implementation of the IocManager.
How were IOCs started before IocManager?
What were the problems with this approach?
How does IocManager start IOCs, and how does this address the problems we had originally?
In the Beginning, There Was “init”...
The first user task started in Linux is called “init”.
In RHEL5 systems, this runs a set of startup scripts that live in the /etc/init.d directory.
In RHEL7 systems, this is a symbolic link to systemd, which starts a set of system services.
In either case, our “ioc” script/service does nothing more than run (as root):
/reg/d/iocCommon/hosts/$HOSTNAME/startup.cmd
The Host startup.cmd, v1.0
Originally, the startup.cmd did two things:
Prepare the system for running IOCs by loading any necessary drivers, adjusting kernel parameters, etc.
Run /reg/d/iocCommon/sioc/$IOCNAME/startup.cmd for each IOC that should be started on this host.
The IOC startup.cmd
#!/bin/bash
export IOC="ioc-xrt-xcsimb1"
source /reg/d/iocCommon/All/fee_env.sh
$RUNUSER "mkdir -p $IOC_DATA/$IOC/autosave"
$RUNUSER "mkdir -p $IOC_DATA/$IOC/archive"
$RUNUSER "mkdir -p $IOC_DATA/$IOC/iocInfo"
$RUNUSER "chmod ug+w -R $IOC_DATA/$IOC"
cd /reg/g/pcds/package/epics/3.14/ioc/xrt/ipimb/R4.0.5/build/iocBoot/$IOC
$RUNUSER "cp -f -p ../../archive/$IOC.archive $IOC_DATA/$IOC/archive"
$RUNUSER "$PROCSERV --logfile $IOC_DATA/$IOC/iocInfo/ioc.log \
--name $IOC 30002 ./st.cmd"
What Problems Does This Have?
This is a lot of boilerplate script that needs to be written for each IOC.
The port number is hardcoded in the sioc file, so from the host startup.cmd, it is not immediately clear which ports are in use and which are free to use in a new IOC.
The software version of the IOC is hardcoded in the sioc file, which is run as root at startup. So changing an IOC version requires not only editing the file, but manually killing the old IOC and using sudo to start the new one.
Goals for the IocManager
Minimize the amount of boilerplate scripting necessary for a new IOC.
Keep information about IOCs in a hutch in one place.
Simplify the process of updating an IOC, so it is a less privileged operation.
iocmanager.cfg
The configuration file for each hutch.
Contains:
A few global settings (hosts, etc.).
A list of python dictionaries, one for each IOC, giving all of the information about what should be run and where it should be run.
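As a rough illustration of the structure described above, such a file might look like the sketch below. The field names and values here are guesses for illustration, not the real schema:

```python
# Hypothetical iocmanager.cfg sketch -- the real schema may differ.
# A few global settings...
hosts = ["ioc-mfx-srv01", "ioc-mfx-srv02"]
COMMITHOST = "psbuild-rhel7-01"

# ...and one dictionary per IOC, giving what should run and where.
procmgr_config = [
    {"id": "ioc-mfx-tfs-lens",       # IOC name
     "host": "ioc-mfx-srv01",        # which host should start it
     "port": 30002,                  # procServ telnet port
     "dir": "ioc/mfx/lens/R1.0.0"},  # software version to run
]
```

Keeping this per-hutch file as plain Python means the startup scripts and the GUI can both read it with no extra parsing machinery.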
The Host startup.cmd, v2.0
Prepare the system for running IOCs by loading any special drivers, etc.
Run /reg/g/pcds/pyps/apps/ioc/latest/initIOC to load common drivers, etc., and start all of the IOCs that belong on that host.
initIOC
Determine which hutch we are running in.
Consult hosts.special, a map of hosts to hutches.
Otherwise assume we have hostnames of the form xxx-yyy-zzzzz, where yyy is the hutch.
Set up a basic python environment.
Run the hutch-specific version of the initialization in /reg/g/pcds/pyps/config/$HUTCH/iocmanager/initIOC.hutch.
(This allows different hutches to use different versions of the IocManager initialization script. initIOC is (mostly) unchanging, but changes can be put into initIOC.hutch.)
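The hutch-detection step above can be sketched roughly as follows; the function name and the handling of hosts.special are illustrative assumptions, not the real script's code:

```python
# Sketch of initIOC's hutch detection, inferred from the slides.
def determine_hutch(hostname, special_map=None):
    """Return the hutch for a host: consult the hosts.special map first,
    otherwise assume names of the form xxx-yyy-zzzzz where yyy is the hutch."""
    special_map = special_map or {}
    if hostname in special_map:
        return special_map[hostname]
    parts = hostname.split("-")
    if len(parts) >= 3:
        return parts[1]
    raise ValueError(f"Cannot determine hutch for {hostname}")

print(determine_hutch("ioc-mfx-srv01"))  # mfx
print(determine_hutch("psbuild-rhel7-01", {"psbuild-rhel7-01": "xrt"}))  # xrt
```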
initIOC.hutch
Source the hutch-specific environment from /reg/d/iocCommon/All/${HUTCH}_env.sh.
Start procmgrd.
Conceptually, this can be thought of as a daemon which accepts remote requests to run procmgr with a given script and port as the appropriate ioc user.
In reality, it’s procServ running /bin/sh as the ioc user.
Install EDT framegrabber and EVR drivers.
Start the caRepeater processes.
Run the IocManager “startAll” script to start all of the IOCs on this host.
startAll
Read the iocmanager.cfg file.
Loop over every entry:
If the hostname is the current host, send a request to procmgrd to run the IocManager “startProc” script in a procServ, passing this script the IOC name.
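The loop above can be sketched like this; the function and parameter names are illustrative, not the real script's interfaces:

```python
# Sketch of the startAll logic: for every IOC assigned to this host,
# ask procmgrd (via request_start) to run startProc in a procServ.
def start_all(config_entries, hostname, request_start):
    for entry in config_entries:
        if entry["host"] == hostname:
            request_start(entry["id"], entry["port"])

# Example: only the IOC configured for this host is started.
config = [
    {"id": "ioc-mfx-tfs-lens", "host": "ioc-mfx-srv01", "port": 30002},
    {"id": "ioc-xrt-xcsimb1", "host": "ioc-xrt-srv01", "port": 30010},
]
started = []
start_all(config, "ioc-mfx-srv01", lambda ioc, port: started.append((ioc, port)))
print(started)  # [('ioc-mfx-tfs-lens', 30002)]
```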
startProc
Set up the standard hutch environment and create the /reg/d/iocData directory structure.
Signal the procServ process to start a new log.
Run the IocManager “getDirectory” script to read iocmanager.cfg and find the directory entry for the current IOC.
(Reading the directory from the configuration file at this point allows upgrading simply by restarting the IOC after changing the configuration file.)
Entering the IOC Directory in startProc
cd /reg/g/pcds/epics
if test -d $dir; then cd $dir; fi
if test -f env.sh; then source ./env.sh; fi
if test -d children/build/iocBoot/$ioc; then
    cd children/build/iocBoot/$ioc
fi
if test -d build/iocBoot/$ioc; then cd build/iocBoot/$ioc; fi
if test -d iocBoot/$ioc; then cd iocBoot/$ioc; fi
if test -f env.sh; then source ./env.sh; fi
IocManager tries to be smart about the directory. It might be a relative or absolute path. It might be a templated parent with children, a templated child, or completely untemplated.
env.sh can be in the top-level or IOC directory to further customize the environment.
startProc, Continued
Create a small status file with hostname and port info in /reg/g/pcds/pyps/config/.status/$HUTCH/$IOCNAME.
(This is used to detect IOCs that are still running on a different port than the one they are currently configured to run on.)
Run “st.cmd” to start the IOC!
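The status-file mechanism might be sketched like this, assuming a simple "hostname port" file format (the slides only say the file records both; the real layout may differ):

```python
import os
import tempfile

# Sketch of the status-file check described above (format assumed).
def write_status(path, hostname, port):
    with open(path, "w") as f:
        f.write(f"{hostname} {port}\n")

def running_on_wrong_port(path, configured_port):
    """True if the port recorded at IOC startup no longer matches the
    currently configured port, i.e., a stale IOC may still be running."""
    with open(path) as f:
        _, port = f.read().split()
    return int(port) != configured_port

status = os.path.join(tempfile.mkdtemp(), "ioc-mfx-tfs-lens")
write_status(status, "ioc-mfx-srv01", 30002)
print(running_on_wrong_port(status, 30002))  # False
print(running_on_wrong_port(status, 30003))  # True
```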
The Thorn on the Rose
How do we control access to the iocmanager.cfg file to prevent corruption from simultaneous writes?
NFS does have a file locking mechanism, but it has a generally bad reputation.
Therefore, IocManager uses local file locks, so all write access to the configuration must be from a single host!
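A minimal sketch of such a local advisory lock, using fcntl; this is illustrative, not IocManager's actual locking code:

```python
import fcntl
import os
import tempfile

def with_config_lock(lock_path, action):
    """Run action() while holding an exclusive local file lock, so only
    one writer on this host can touch the configuration at a time."""
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until acquired
        try:
            return action()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)

lock = os.path.join(tempfile.mkdtemp(), "iocmanager.lock")
print(with_config_lock(lock, lambda: "config written"))  # config written
```

Because the lock is only meaningful to processes on the same machine, every writer must run on one designated host, which is exactly what the COMMITHOST mechanism below enforces.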
Authentication and the COMMITHOST
Each configuration file may define a COMMITHOST, which defaults to “psbuild-rhel7-01”.
On startup, IocManager does an ssh to the COMMITHOST. This doubles as an authentication mechanism.
This is probably the #1 cause of IocManager silently failing to start: it cannot ssh to the COMMITHOST from where it is being run, and it does not prompt the user for a password or passphrase.
Summary of the Benefits of IocManager
All information about IOCs is in one file.
Easy to make sure ports are unique on each host.
Easy to update IOC software versions.
Easy to move IOCs from one host to another (especially since code was added to adjust for RHEL5/7!).
Easy to detect IOCs running on the wrong port.
In general, life is easier!
Note the redirection of standard error below: the error message still appears, because it is printed to standard output:
[klauer@psbuild-rhel7-01 IocManager]$ ./getDirectory.py ioc-lm1k4-inj-ek9000-01 las 2>/dev/null
Error while trying to read HIOC startup file for ioc-las-und-phaseCavity!
/cds/home/t/tjohnson/trunk/workarea/ek9000
Due to:
Line 955 in 71e05c9
These error printouts to standard output are everywhere in the codebase, but this one in particular appears to be causing some IOC startup issues.
@@@ @@@ @@@ @@@ @@@
@@@ Received a sigChild for process 15300. The process was killed by signal 9
@@@ Current time: Wed Jun 9 16:15:00 2021
@@@ Child process is shutting down, a new one will be restarted shortly
@@@ ^R or ^X restarts the child, ^Q quits the server
@@@ Restarting child "ioc-mfx-tfs-lens"
@@@ (as /reg/g/pcds/pyps/config/mfx/iocmanager/startProc)
@@@ The PID of new child "ioc-mfx-tfs-lens" is: 15734
@@@ @@@ @@@ @@@ @@@
and then nothing happens, even after waiting a long time. No output except for what procServ dumps out.
I've seen this happen multiple times, and things usually recover at some point (30 mins or more?).
I think some of the config-related tools can create problems. My guess is that the lock acquisition and our poor, finicky network filesystems create hostile environments for these tools:
$ python /reg/g/pcds/pyps/config/mfx/iocmanager/getDirectory.py ioc-mfx-tfs-lens mfx
The above locks the terminal, blocking Ctrl-C and even Ctrl-Z for the longest time. 3 minutes later, and still no luck getting the IOC directory.
Let's see what `strace` shows...
Oh, now it's back to working, so the `strace` output is useless.
I'd guess it hangs here:
Line 490 in 39a284d
Any thoughts on the above or where else it might be, @mcb64?
Entirely a "would-be-nice-if", but consider if `imgr list --json` dumped out detailed IOC information that could be used and interpreted by a separate process (namely `jq` if you're working in a shell/pipe-based workflow).
This would, of course, require information beyond "IOC name" - adding in IOC host and all the other bits.
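A sketch of what the proposed JSON output and a downstream consumer might look like; the `--json` flag does not exist yet, and the schema and field names are entirely hypothetical:

```python
import json

# Hypothetical schema for "imgr list --json" output (field names assumed).
listing = json.loads("""
[
  {"name": "ioc-mfx-tfs-lens", "host": "ioc-mfx-srv01", "port": 30002,
   "dir": "ioc/mfx/lens/R1.0.0", "disabled": false}
]
""")

# A shell user could do the same filtering with jq, e.g.:
#   imgr list --json | jq -r '.[] | select(.host=="ioc-mfx-srv01") | .name'
names = [e["name"] for e in listing if e["host"] == "ioc-mfx-srv01"]
print(names)  # ['ioc-mfx-tfs-lens']
```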
I noticed that when not on a machine the tool recognizes, `--hutch` is required. This makes sense, of course, but since the tool throws a not-user-friendly exception (`IndexError: list index out of range`; see LCLSPC-182 for further context), the only path to figuring out what you did wrong is guessing/assuming that `--help` exists.
Beyond the commands themselves, there's not much information in `--help`.
A simple task for a newcomer to get familiar with `imgr`'s inner workings could be to add some help text.
Maybe even link back to confluence for further details?
We have a certain set of TCP ports that are special:
| Port range | Status | Subnet access | Purpose |
|---|---|---|---|
| 30000 | RESERVED | Closed | For caRepeater process only |
| 30001 - 38999 | Open | Closed | For soft IOCs which belong to a single hutch |
| 39000 - 39099 | RESERVED | Across subnets | For Controls Group use only - used for process manager daemons, etc. |
| 39100 - 39199 | Open | Across subnets | For soft IOCs which are common to many hutches (i.e., XRT shared soft IOCs) |
| 39200 - 39999 | RESERVED | Across subnets | For future allocation |
I think the GUI should aid you in picking an unused valid port above, from 39100-39199.
If really necessary, an "advanced" option could be configuring the port number manually.
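The port-suggestion logic the GUI would need is simple; here is a sketch, assuming the set of used ports is gathered from the iocmanager.cfg entries:

```python
# Sketch of the suggested GUI aid: pick the first unused port in the
# shared-IOC range 39100-39199, given ports already in the config.
def pick_free_port(used_ports, low=39100, high=39199):
    for port in range(low, high + 1):
        if port not in used_ports:
            return port
    raise RuntimeError(f"No free ports in {low}-{high}")

print(pick_free_port({39100, 39101}))  # 39102
```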