
bluesky-queueserver's Introduction


Bluesky — An Experiment Specification & Orchestration Engine

Source https://github.com/bluesky/bluesky
PyPI pip install bluesky
Documentation https://bluesky.github.io/bluesky
Releases https://github.com/bluesky/bluesky/releases

Bluesky is a library for experiment control and collection of scientific data and metadata. It emphasizes the following virtues:

  • Live, Streaming Data: Available for inline visualization and processing.
  • Rich Metadata: Captured and organized to facilitate reproducibility and searchability.
  • Experiment Generality: Seamlessly reuse a procedure on completely different hardware.
  • Interruption Recovery: Experiments are "rewindable," recovering cleanly from interruptions.
  • Automated Suspend/Resume: Experiments can be run unattended, automatically suspending and resuming if needed.
  • Pluggable I/O: Export data (live) into any desired format or database.
  • Customizability: Integrate custom experimental procedures and commands, and get the I/O and interruption features for free.
  • Integration with Scientific Python: Interface naturally with numpy and Python scientific stack.

Bluesky Documentation.

The Bluesky Project enables experimental science at the lab-bench or facility scale. It is a collection of Python libraries that are co-developed but independently useful and may be adopted a la carte.

Bluesky Project Documentation.

See https://bluesky.github.io/bluesky for more detailed documentation.

bluesky-queueserver's People

Contributors

cryos, danielballan, developingalex, dmgav, gwbischof, ksunden, mrakitin, tacaswell, untzag

bluesky-queueserver's Issues

Inconsistencies in ZMQ API

Systematic review of the available ZMQ API showed some minor inconsistencies that should be fixed. The proposed changes are based on the following guidelines for consistency:

  • Each request (with the exception of the status request) is expected to return the parameters success and msg. success is a boolean (True/False) that indicates whether the request was processed correctly (accepted) or failed (rejected). The parameter msg contains an error message if the request fails and an empty string otherwise (see the sketch after this list).

  • Plan queue items should be referred to as items, not as plans.
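
A minimal sketch of the expected response envelope (the key names success and msg come from the guidelines above; any method-specific parameters would be returned alongside them):

response_accepted = {"success": True, "msg": ""}                         # request processed correctly
response_rejected = {"success": False, "msg": "Item validation failed"}  # request rejected, msg explains why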

queue_get

  • Add return parameters success and msg (for consistency with the rest of the API).
  • Rename running_plan to running_item (missed when plan queue items were renamed from 'plans' to 'items').

history_get

  • Add return parameters success and msg (for consistency with the rest of the API).

history_clear

  • The return parameter msg should be "" if the request is processed successfully.

queue_item_get, queue_item_remove, queue_item_move

  • The return parameter plan should be renamed to item.

queue_clear

  • The return parameter msg should be "" if the request is processed successfully.

move server to FastAPI

The current API server is implemented via aiohttp, but we are using FastAPI for several other projects and there is no reason to maintain a different tech-stack here.

Accept other ways of populating the worker's namespace

We currently use IPython profile startup scripts for most beamline configurations. IPython startup scripts are executed in a single namespace, sorted alphanumerically. To make that sorting explicit, we have adopted the convention of prefixing them with a number, like 00-startup.py.

This system is mostly a result of inertia, borne of copy/pasting the first working bluesky configuration (CSX) around the ring during NSLS-II's rapid scale-up. There are many downsides to our system. Two important ones are:

  • Python scripts that begin with a digit are not importable as modules. This makes it difficult to build tests and documentation around them.
  • Executing multiple scripts in one namespace may be convenient at small scales, but it confuses linters and modern editors (as they cannot see where variables are defined). This has led to serious bugs in practice. In one memorable incident at SRX, the critical databroker.Broker instance that was saving user data, db, was shadowed in a subsequent script and was therefore garbage collected and unsubscribed from the RunEngine.

The MVP of the worker was tuned to the reality that we use IPython profiles mostly named profile_collection and they tend to include scripts named like [0-9][0-9]*.py. (The worker skips any scripts that do not adhere to this convention even though IPython does not.)

import glob
import os

# 'path' is the IPython profile startup directory
file_pattern = os.path.join(path, "[0-9][0-9]*.py")
file_list = glob.glob(file_pattern)
file_list.sort()  # Sort the scripts in alphabetical order before executing them

This was a reasonable MVP. But I think we should pivot to work with other approaches and generally try to steer away from IPython profiles in the future, given the downsides outlined above.

I suggest changing start_re_worker in the following ways:

  • Provide a mutually exclusive group of parameters (a sketch follows this list) including:
    • a mode that accepts an importable module as in -m my_package.my_module
    • a mode that accepts a script as in --script path/to/script.py
    • a mode that accepts an IPython profile as in --profile PROFILE_NAME
  • The name profile_collection is just a convention (one with at least one exception within NSLS-II and possibly more elsewhere). I suggest changing --profile-collection to just --profile PROFILE_NAME, taking a profile name and respecting IPYTHON_DIR to find the directory on the filesystem.
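
A minimal sketch of the proposed mutually exclusive startup options, assuming an argparse-based CLI (option names and messages are illustrative, not the actual start_re_worker implementation):

import argparse

parser = argparse.ArgumentParser(description="Start the RE worker")
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument("-m", "--module", help="importable module, e.g. my_package.my_module")
group.add_argument("--script", help="path to a startup script, e.g. path/to/script.py")
group.add_argument("--profile", help="IPython profile name, located via the IPython directory")

args = parser.parse_args()
if args.module:
    print(f"Populating the worker namespace from module {args.module}")
elif args.script:
    print(f"Populating the worker namespace from script {args.script}")
else:
    print(f"Populating the worker namespace from IPython profile {args.profile}")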

Docs should be pushed to bluesky/bluesky.github.io repo

There are two ways to publish documentation to GH Pages: by pushing to a special branch of each repo (gh-pages by convention, but configurable) or by pushing to a central repo with a special name, ORG_NAME.github.io. For the bluesky org it's https://github.com/bluesky/bluesky.github.io.

For example, in the bluesky/bluesky CI, we configure doctr to build the bluesky docs and push them to the master branch of bluesky/bluesky.github.io under the subdirectory bluesky.

      doctr deploy --deploy-repo bluesky/bluesky.github.io --deploy-branch-name master bluesky;

Source: https://github.com/bluesky/bluesky/blob/888716e078dcc0f0479c0583e7a1a940af3f2524/.travis.yml#L110

The advantage of this approach is that all the static assets are in one place and not polluting the respective repos; they can stay lean. GitHub allows you to mix approaches if you want, with some repos having a gh-pages branch and some repos not, which is why blueskyproject.io/bluesky routes to the static assets in bluesky.github.io/bluesky and blueskyproject.io/bluesky-queueserver routes to the static assets in the gh-pages branch of bluesky-queueserver.

For the sake of keeping things consistent across the project, I think we should update the Action to push to bluesky.github.io.

Default (built-in) implementation for the function 'spreadsheet_to_plan_list' (HTTP Server)

PR #137 implements a REST API that accepts an Excel spreadsheet, converts the spreadsheet to a list of plans and adds the plans to the queue. The existing implementation of the API uses the function spreadsheet_to_plan_list provided by an external beamline-specific module. The code also contains a placeholder for a built-in function that is supposed to be used as the default implementation in cases when the external module is not loaded or the spreadsheet is rejected by the beamline-specific function (the external function returns None).

Implement the default processing function spreadsheet_to_plan_list that accepts Excel/Open Office (.xlsx, .xls, .ods, etc.) single-sheet workbooks or .csv files. Spreadsheet format: the top row contains parameter names, the remaining rows contain plan parameters. Columns: the first column holds the plan name, the second column holds args (comma separated, as in a list), and the remaining columns hold kwargs values (the kwargs keys are the column names in the top row).
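
A rough sketch of such a default converter, assuming pandas is available for reading the workbook (the signature and type handling are illustrative and would need to match the interface expected by the HTTP Server):

import pandas as pd

def spreadsheet_to_plan_list(*, spreadsheet_file, file_name, **kwargs):
    # The top row holds the column (parameter) names.
    if file_name.lower().endswith(".csv"):
        df = pd.read_csv(spreadsheet_file)
    else:
        df = pd.read_excel(spreadsheet_file)

    plans = []
    for _, row in df.iterrows():
        name = row.iloc[0]       # first column: plan name
        args_cell = row.iloc[1]  # second column: comma-separated args
        # A real implementation would also coerce argument types (numbers vs. strings).
        args = [] if pd.isna(args_cell) else [a.strip() for a in str(args_cell).split(",")]
        # Remaining columns: kwargs, keyed by the column names from the top row.
        kwargs_ = {k: v for k, v in row.iloc[2:].items() if not pd.isna(v)}
        plans.append({"name": name, "args": args, "kwargs": kwargs_})
    return plans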

Externalize the queue

Currently the queue is a deque in memory; we should externalize it to Redis (?) so that it persists across server restarts.
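
A minimal sketch of what persisting the queue in Redis could look like, assuming the redis-py client (the key name and helper functions are hypothetical):

import json
import redis

r = redis.Redis(host="localhost", port=6379)
QUEUE_KEY = "re_manager:plan_queue"

def push_item(item):
    # Append a queue item (JSON-serialized dict) to the back of the persistent queue.
    r.rpush(QUEUE_KEY, json.dumps(item))

def pop_next_item():
    # Take the next item from the front of the queue (FIFO); returns None if the queue is empty.
    raw = r.lpop(QUEUE_KEY)
    return json.loads(raw) if raw else None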

Add support for cancellation / interruption

We need to add a way for the API server to stop / abort / halt the currently running run.

This could be done several ways:

  • suspender-like (aka killers) + a dedicated PV
  • actually sending SIGINT(s) to the worker process
  • some other push mechanism on the worker-server communication channel

Provide a parameterized interface to add plan to queue

Currently the examples of using the API server issue PUT requests to it from the command line; we should also present the user with an HTML (Qt?) form (built via #12) that the user can use to set the plan parameters, plus a button to add the plan to the queue.

per-user persistent meta-data

We should be able to capture metadata from the user (not already in the sample or admin databases) and persist that metadata across sessions.

Publish the parameters of a plan in a machine readable way

To be able to (semi-)automatically generate UIs we need to be able to describe the parameters in a machine-readable way (names, types, optional or not, etc.). This has two parts (a sketch follows the list):

  • pick a markup to send between the client and the server
  • start to annotate plans
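
One possible starting point is to derive a machine-readable description from the plan signature itself; this is only a sketch, not a proposed markup:

import inspect

def describe_plan(plan):
    # Extract parameter names, kinds, annotations and defaults from the plan signature.
    descr = []
    for name, p in inspect.signature(plan).parameters.items():
        descr.append({
            "name": name,
            "kind": p.kind.name,
            "annotation": None if p.annotation is p.empty else str(p.annotation),
            "default": None if p.default is p.empty else p.default,
            "optional": p.default is not p.empty,
        })
    return descr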

ZMQ API: rename API method '' (empty string) to 'ping'.

A systematic review of the existing ZMQ API showed that in order to ping RE Manager an application needs to call the method named '' (empty string). This is unconventional, so the method should be renamed to 'ping'. The HTTP server will still support /. The change is not expected to break any known 3rd-party code.

RE-mimic CLI to talk to queue server

Provide a CLI that mimics the current feel of the RE, but instead of having the plans / devices in the user's process, it has proxies so that RE(plan(...)) does the correct PUT to the queue server and waits for that plan to return.

We may want to build on this and have a non-blocking version of this as well for scripting purposes / use in autonomous agents (but how closely we want to mimic the current feel in that case is debatable).

Develop the vocabulary between the server and the worker

Currently the only communication between the worker and the server is "next plan please". We will presumably need a richer language to support:

  • the next plan to run
  • interruptions / cancellations
  • heart beat
  • currently running a plan / idle
  • what plans / devices are or should be available
  • ??

This could be a single channel with many message types or this could be many dedicated channels?

qserver_list_of_plans_and_devices points to a malformed entrypoint

$ qserver_list_of_plans_and_devices -h
Traceback (most recent call last):
  File "/home/dallan/miniconda3/envs/py38/bin/qserver_list_of_plans_and_devices", line 33, in <module>
    sys.exit(load_entry_point('bluesky-queueserver', 'console_scripts', 'qserver_list_of_plans_and_devices')())
  File "/home/dallan/miniconda3/envs/py38/bin/qserver_list_of_plans_and_devices", line 25, in importlib_load_entry_point
    return next(matches).load()
StopIteration

0MQ API: submit a batch (list) of plans to RE Manager

PR #137 offers an implementation of the REST API for uploading an Excel workbook to the HTTP Server. The workbook is then converted to a batch of plans and uploaded to the queue. The algorithm for converting spreadsheet rows to plans may be complicated, resulting in multiple plans generated per spreadsheet row. The additional plans may be needed to implement steps such as changing samples, calibration, changing parameters, etc. Therefore the generated sequence should be executed as a whole, without skipping any plan.

In the current implementation the batch of plans is uploaded to RE Manager via 0MQ by calling the queue_item_add API for each plan. If some plans are rejected by RE Manager in the process of uploading, the queue may contain a sequence of plans that doesn't make sense and may be harmful to the sample or equipment if executed. So while the existing implementation may be suitable for demonstrating the feature, it may not work reliably in practice.

A 0MQ API that accepts a batch of items should be implemented. The API will receive the list of items, validate each item (success guarantees that the item could be added to the queue) and, if validation is successful for all items, add the batch to the queue. Otherwise the batch is rejected with an error message. Update the existing implementation of the /queue/upload/spreadsheet API to use the new 0MQ API instead of the queue_item_add API.
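
A sketch of the all-or-nothing batch logic described above (validate_item and add_item are hypothetical helpers, not existing RE Manager functions):

def add_batch_to_queue(items, validate_item, add_item):
    # Validate every item first; each validation returns (success, msg).
    results = [validate_item(item) for item in items]
    if all(ok for ok, _ in results):
        for item in items:
            add_item(item)  # the batch is added only if every item passed validation
        return {"success": True, "msg": "", "results": results}
    return {"success": False, "msg": "Validation failed for one or more items", "results": results}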

Implement encryption for 0MQ communication channel

Implement security features that would protect RE Manager controlling real hardware from requests from unauthorized clients. It is assumed that both ZMQ Server (RE Manager) and the client (HTTP Server or GUI client) are located on a secure network within the lab (where it is relatively easy to get physical access to the computers running the server and the client). The intention is to protect RE Manager from accidental interference due to errors in configuration (network may contain multiple server/client pairs, including systems used for development and testing and may be running without encryption), not from malicious hacking attacks.

The proposed security feature may be fully implemented by activating and configuring built-in ZMQ Curve-based encryption and providing convenient configuration options for each component of the system. The encryption scheme uses two public/private key pairs, one for the server and one for the client. The key pairs are generated using the zmq.curve_keypair() function (there is also a zmq.curve_public() function that generates a public key from a private key). In the first version, the clients will be assigned a permanent key pair (it can be changed to a configurable or randomly generated key pair later if needed). The server and the client should be configured with the private and public keys from the server key pair before the client can communicate with the server. If the keys don't match or the client has encryption disabled, the ZMQ requests will time out.

The proposed approach was tested in the existing Queue Server code by hard-coding the key pairs and running the server in multiple modes. In the RE Manager code the ZMQ socket was set up as a Curve server and the private key (long-term secret key) was set:

        logger.info("Starting ZeroMQ server ...")
        self._zmq_socket = self._ctx.socket(zmq.REP)
        self._zmq_socket.set(zmq.CURVE_SERVER, 1)
        self._zmq_socket.set(zmq.CURVE_SECRETKEY, ">YXLq7tT:)VGXS>&2f0r*x[S24fFjl*V6b(lISyI".encode("utf-8"))
        self._zmq_socket.bind(self._ip_zmq_server)
        logger.info("ZeroMQ server is waiting on %s", str(self._ip_zmq_server))

The client ZMQ socket is operating in the default Curve client mode. The SERVERKEY is the server public key, which will be made configurable. The PUBLICKEY/SECRETKEY pair is the key pair of the client, which is used for encrypting messages sent from the server to the client. This key pair may remain hard-coded for now, since it does not play an essential role in the security scheme.

        self._zmq_socket = self._ctx.socket(zmq.REQ)
        # Set server public key
        self._zmq_socket.set(zmq.CURVE_SERVERKEY, "AmNRencT%-oprGXs?BLp!Q2*xxWQ{sHRShO.JU#/".encode("utf-8"))
        # Set public and private keys for the client
        self._zmq_socket.set(zmq.CURVE_PUBLICKEY, "wt8[6a8eoXFRVL<l2JBbOzs(hcI%kRBIr0Do/eLC".encode("utf-8"))
        self._zmq_socket.set(zmq.CURVE_SECRETKEY, "=@e7WwVuz{*eGcnv{AL@x2hmX!z^)wP3vKsQ{S7s".encode("utf-8"))

Encryption will be disabled by default, since it is not needed for most demo/development work. Options to configure RE Manager (as a server), the HTTP Server (as a client) and the qserver CLI tool (as a client) will be implemented. RE Manager will accept the private (secret) key as the value of a CLI parameter or an environment variable. The HTTP Server will accept the public key as the value of an environment variable. The qserver CLI tool will accept the public key as the value of a CLI parameter or an environment variable.
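
For reference, generating the key pairs mentioned above uses the pyzmq utility functions (a minimal sketch; key distribution via CLI options / environment variables is described in the paragraph above):

import zmq

public_key, secret_key = zmq.curve_keypair()   # 40-character Z85-encoded byte strings
print("public:", public_key.decode())
print("secret:", secret_key.decode())

# The public key can also be re-derived from the secret key:
assert zmq.curve_public(secret_key) == public_key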

priority vs FIFO queue

Currently the queue is implemented as a FIFO queue (as a deque in the memory of the server). We should consider moving to a priority queue instead. This could allow better collaboration between multiple humans and autonomous agents who are independently adding things to the queue.

happi, but for plans

Similar to #11 we need a clean way to specify what plans are available. This may be as simple as

import importlib

# plan_spec is a (module, name) pair, e.g. ("bluesky.plans", "count")
module, name = plan_spec
plan = getattr(importlib.import_module(module), name)

but we may want to have a happi like database that would also store:

  • module / name
  • signature
  • comments for humans?
  • level of permission / danger of plan

Write documentation for ZMQ API functions

Write detailed documentation for the existing API for controlling RE Manager via ZMQ, including the description of supported methods, outgoing and returned parameters. The documentation will need to be maintained and extended as the set of API is modified and extended.

integrate with admin / user database

The server should ensure that every run has the correct user / group / SAF / ... attached to it.

This should not be user-editable (but should be BL staff editable).

API that exposes the list of runs generated by the currently executed plan

Implement an RE Manager API that exposes the list of runs generated by the currently executed plan. The run information should include the Run UID, the current status of the run (whether it is open or already closed) and the exit status for a closed run. The data should be updated in real time. Provide the client with the means to detect when the run list is updated and to minimize the number of list downloads.

publish logs

Based on how we pass configuration in via #3, we should publish out (verbose) logs from the worker. As we are running the plans in a worker behind a server, we are losing much of our visibility into the process, which needs to be replaced by logging.

We may want additional logging above what is currently in the RE.

Launch worker from server

The API server should be able to launch / exit / restart the worker process. The worker process should not be externally configured, but should build its state from the information pushed in by the API server. Currently you have to start the plan "manually".

The information pushed in should include (but is not limited to):

  • what plans should be available in the namespace
  • what devices should be available in the namespace
  • where to publish the documents to
  • where to publish logs to

Maybe should include:

  • where to pull the "next run id" from

Should not include:

  • user / sample metadata (this belongs to the server and should be pushed in with every plan)

We also should use this to set up a two-way communication between the worker and the server.

Candidates for how to manage the worker process:

  • systemd/supervisord tasks (not quite sure how to do the information injection but I assume it is possible)
  • Subprocess
  • multiprocess.Process

The worker should remain its own process (not a thread) so we can restart it from the API server, and we may eventually want to run the worker on a different machine than the API server.
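
A minimal sketch of the multiprocessing.Process candidate listed above (the worker body and configuration keys are purely illustrative):

import multiprocessing
import time

def run_worker(config):
    # A real worker would build its namespace (plans, devices, subscriptions)
    # from the configuration pushed in by the API server, then serve requests.
    print("worker started with config:", config)
    time.sleep(1)

if __name__ == "__main__":
    config = {"plans": ["count", "scan"], "devices": ["det1", "motor"]}
    worker = multiprocessing.Process(target=run_worker, args=(config,))
    worker.start()
    worker.join()
    # Because the worker is a separate process, the API server can terminate it
    # and launch a new one to implement restart.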

Authentication / authorization

The server will need to only respond to the authenticated and currently "active" user.

This may need to be several users from the same group.

This should be over-ridable by BL staff.

Failing test on CI 'test_fixture_db_catalog'

The test passes locally, but fails on GitHub Actions CI with the error KeyError: 'qserver_tests'. It would be useful to have an explanation of why both attempts (to instantiate the databroker and to access the catalog) fail, while the test test_fixture_re_manager_cmd_2, in which the Data Broker is instantiated in a different process, reliably succeeds:

    # Try to access the catalog in 'standard' way
    from databroker import catalog

    assert list(catalog[db_catalog["catalog_name"]]) == list(db_catalog["catalog"])

    # Try to instantiate the Data Broker
    from databroker import Broker

    Broker.named(db_catalog["catalog_name"])

happi integration

As part of #3 we want to be able to inject from the server to the worker the devices and plans that are allowed. To make sure that we never try to ship a "live" ophyd object between the processes, we should use happi as the device database.

Harden the plan request -> plan + objects code

Currently we bind a dictionary of plans and a dictionary of devices and (rather naively) use that to convert a JSON document -> runnable plans. If things go wrong (i.e. we get a plan or object we were not expecting) we explode. We need to (a sketch of the lookup step follows the list):

  • not exit the worker if there is bad input
  • provide some feedback to the API server (xref #7 ) if there is bad input
  • validate the input before we go looking for python objects
  • do our utmost to never exec or eval
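
A rough sketch of the lookup step, assuming plans and devices are plain dictionaries of allowed objects (the helper name and error handling are illustrative):

def resolve_item(item, plans, devices):
    # Map a validated queue item onto a callable plan and real device objects
    # without using eval/exec; return (None, msg) instead of raising on bad input.
    try:
        plan_func = plans[item["name"]]
    except KeyError:
        return None, f"Unknown plan '{item.get('name')}'"
    args = [devices.get(a, a) if isinstance(a, str) else a for a in item.get("args", [])]
    kwargs = {k: (devices.get(v, v) if isinstance(v, str) else v)
              for k, v in item.get("kwargs", {}).items()}
    return (plan_func, args, kwargs), ""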

More friendly interface for `qserver` CLI tool.

The interface of the qserver CLI tool was designed exclusively for testing the RE Manager API and is therefore not very intuitive. Also, it was discovered that an option for submitting instructions to the plan queue (#93) could not easily be added without modifying the existing set of options. The idea of making the CLI more friendly was discussed before, and now may be a good time to work on it.

Below are examples of the updated CLI commands. The proposed set of options still covers the full ZMQ API of RE Manager.

qserver -h
qserver monitor

qserver ping
qserver status

qserver environment open
qserver environment close
qserver environment destroy

qserver allowed plans
qserver allowed devices

qserver queue add plan '<plan-params>'
qserver queue add instruction '<instruction-params>'
qserver queue add plan front '<plan-params>'
qserver queue add plan back '<plan-params>'
qserver queue add plan 2 '<plan-params>'
qserver queue add plan -1 '<plan-params>'
qserver queue add plan before '<uid>' '<plan-params>' 
qserver queue add plan after '<uid>' '<plan-params>' 
(same options are supported for instructions)

qserver queue get
qserver queue clear

qserver queue item get
qserver queue item get back
qserver queue item get front
qserver queue item get 2
qserver queue item get '<uid>'

qserver queue item remove
qserver queue item remove back
qserver queue item remove front
qserver queue item remove 2
qserver queue item remove '<uid>'

qserver queue item move 2 5
qserver queue item move back front
qserver queue item move front -2
qserver queue item move '<uid-src>' 5
qserver queue item move 2 before '<uid-dest>'
qserver queue item move 2 after '<uid-dest>'
qserver queue item move '<uid-src>' before '<uid-dest>'

qserver queue start
qserver queue stop
qserver queue stop cancel

qserver re pause
qserver re pause deferred
qserver re pause immediate
qserver re resume
qserver re stop
qserver re abort
qserver re halt

qserver history get
qserver history clear

qserver manager stop
qserver manager stop safe on
qserver manager stop safe off

qserver manager kill test

ability to inject plans / objects into the worker

If we trust the users, they should be able to inject into the server new plans (and objects?) similar to how they do now when they are at the beamline. This needs to be done carefully, as it lets the users send us arbitrary Python code. We should consider:

  • details of the mechanism (eval/exec, write to disk and import)
  • how to persist user added plans + devices
  • should we bother trying to validate / sanitize / sandbox (S. Dower has publicly said this is a lost cause)
  • if there needs to be any review steps by BL staff before the code is accepted to the system

More detailed log messages for RE Manager running in normal mode

The changes in PR #135 included options to change logging verbosity for RE Manager. In normal mode (started without explicitly specifying a verbosity level), RE Manager displays messages with level INFO and above. Some of the messages are not sufficiently informative to monitor operation of RE Manager. For example, when a new plan is started, the log should contain some information about the plan (plan name, parameters, user name, etc.) without overloading the logging output with debug-level information. Logging messages should be reviewed and information should be added wherever necessary.

Option to submit metadata with a plan

Extend the queue_item_add API to include an optional meta key used to pass a dictionary or a list of dictionaries with plan metadata to the Run Engine. If the meta key contains a list of dictionaries, the list is merged into a single dictionary so that the contents of the dictionary with the smallest index have higher priority (they overwrite key values from dictionaries with a higher index). The option to keep metadata obtained from different sources in separate dictionaries may be convenient for some workflows. For example, in the HTTP server we may want to separate metadata into editable (entered by the user) and non-editable (autogenerated) parts and keep the ability to edit the editable part and regenerate the autogenerated part, which is easily achieved if they are kept in separate dictionaries. A sketch of the merge order follows.
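
A sketch of the proposed merge order (dictionaries later in the list are applied first, so earlier dictionaries overwrite them):

def merge_meta(meta):
    # 'meta' may be a single dict or a list of dicts; index 0 has the highest priority.
    if isinstance(meta, dict):
        return dict(meta)
    merged = {}
    for md in reversed(meta):
        merged.update(md)
    return merged

merge_meta([{"sample": "A"}, {"sample": "B", "operator": "X"}])  # -> {"sample": "A", "operator": "X"}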

Integrate with sample database

The server needs to be able to pull a list of samples (and their information) from the database and present the user with a way to attach the information to the run.

Improve consistency of 0MQ API

This proposal contains a breaking change of the 0MQ API.

The current 0MQ API of RE Manager contains a number of inconsistencies related to the representation of queue items. The differences in representation of queue items accumulated over time as the functionality of the Queue Server was extended from 'plans' to generalized 'items', and have no justification. The inconsistencies don't limit the functionality of the Queue Server, but can make application development more complicated. The proposed changes will require very limited changes in the RE Manager code, but significant changes in the unit tests. The existing applications communicating with RE Manager will also need to be changed, but those are going to be superficial changes (mostly renaming of parameters).

Following is an example of the currently accepted representations of a plan and an instruction as queue items (the format of the items as they are added to the queue and returned by the 0MQ API functions queue_item_get, queue_item_remove and queue_item_move):

# Current representation of a plan as a queue item
{
  "name": "count",  # Required
  "args": [["det1", "det1"]],  # Optional arguments (list)
  "kwargs": {"num": 5, "delay": 1},  # Optional kwargs (dict)
  "meta": {},  # Optional dict or list(dict)
  "item_type": "plan",  # Currently set by the server when the item is added to the queue
  "item_uid": "...",  # Set by the server
  "user": "User Name",  # Name of the user who added the plan, set by the server
  "user_group: "Group",   # User group to which the user belongs
}
# Current representation of an instruction as a queue item
{
  "action": "queue_stop",  # Required
  "args": [],  # Optional arguments (list)
  "kwargs": {},  # Optional kwargs (dict)
  "meta": {},  # Optional dict or list(dict) - not used for instructions
  "item_type": "instruction",  # Currently set by the server when the item is added to the queue
  "item_uid": "...",  # Set by the server
  "user": "User Name",  # Name of the user who added the plan, set by the server
  "user_group: "Group",   # User group to which the user belongs
}

The representation of plans and instructions is almost identical except that instruction 'name' is represented as 'action'. Change 1: replace key 'action' with 'name' (an instruction can be distinguished from a plan by looking at 'item_type' key). Using common schema for representation of all existing items will simplify processing of lists of items on all stages of development and simplify extension of API to different types of items if needed.

# Item representation using common schema. Optional arguments may be omitted
#   when item is submitted in the request (`queue_item_add` or `queue_item_update`).
#   If a parameter is set by the server, it will be overwritten by the server even if it is
#   set in the request.
{
  "item_type": "plan",  # Required (should be 'plan' or 'instruction')
  "name": "count",  # Required
  "args": [["det1", "det1"]],  # Optional arguments (list)
  "kwargs": {"num": 5, "delay": 1},  # Optional kwargs (dict)
  "meta": {},  # Optional dict or list(dict)
  "item_uid": "...",  # Optional, set by the server
  "user": "User Name",  # Optional, set by the server
  "user_group: "Group",   # Optional, set by the server
}

In the current implementation, the input and output parameters of the API functions queue_item_add, queue_item_update and queue_item_add_batch use a different way of representing items (purely for historical reasons):

# Current implementation: example plan representation for `queue_item_add` function
plan_payload = {
    "plan": { 
        "name": "<plan-name>",
        "args": [...],  # Optional args
        "kwargs": {...},  # Optional kwargs
        "meta": {},  # Optional dict or list(dict)
    },
    "user": <user_name>,
    "user_group": <user_group>,
}
# Current implementation: example instruction representation for `queue_item_add` function
instruction_payload = {
    "instruction": { 
        "action": "<instruction-name>";
        "args": [...];  # Optional args
        "kwargs": {...};  # Optional kwargs
        "meta": {},  # Optional dict or list(dict)
    },
    "user": <user_name>,
    "user_group": <user_group>,
}

Change 2: functions queue_item_add, queue_item_update and queue_item_add_batch should accept and return items represented using standard schema (set 'item_type' in request, replace 'action' with 'name' for instructions):

# Proposed implementation: example plan representation for `queue_item_add` function
plan_payload = {
    "item": { 
        "item_type": "plan",  # Now it is required (currently it is set by the server)
        "name": "<plan-name>",  # Required
        "args": [...],  # Optional args
        "kwargs": {...},  # Optional kwargs
        "meta": {},  # Optional dict or list(dict)
        # Optionally 'item_uid', 'user', 'user_group' may be submitted, but the values 
        #    will be replaced by the server.
    },
    "user": <user_name>,
    "user_group": <user_group>,
}
# Proposed implementation: example instruction representation for `queue_item_add` function
instruction_payload = {
    "item": { 
        "item_type": "instruction",  # Now it is required (currently it is set by the server)
        "name": "<instruction-name>";  # Required
        "args": [...];  # Optional args
        "kwargs": {...};  # Optional kwargs
        "meta": {},  # Optional dict or list(dict)
        # Optionally 'item_uid', 'user', 'user_group' may be submitted, but the values 
        #    will be replaced by the server.
    },
    "user": <user_name>,
    "user_group": <user_group>,
}

Change 3: rename the return parameters in queue_item_add and queue_item_update. In the current implementation, 'plan' is used as the return parameter name if a plan is added and 'instruction' is used if an instruction is added. This is consistent with the current implementation of the input parameters. It is proposed to change the parameter name from 'plan'/'instruction' to 'item' for all types of items.

Change 4: input/output parameters for queue_item_add_batch. Change the representation of the list of items from

[ { "plan": { ... plan1 ... },
  { "plan": { ... plan 2 ... }, 
  { "instruction": { ... instruction 1 ... } ]

to

[ { ... plan1 ... },
  { ... plan2 ... },
  { ... instruction1 ... } ]

The change is possible, since in the new standard each item is expected to contain the item_type parameter, which tells whether the item is a plan or an instruction. The structure of the output parameters should also be changed:

# Current structure of output parameters of 'queue_item_add_batch'
{ "success": True,  # Success of batch processing (tells if the batch was accepted)
  "msg": "",   # Error message for the whole batch
  "item_list": [ 
       { "plan": { ... plan1 ...},
         "success": True,  # Tells if the plan was validated successfully
         "msg": "",  # Validation error message for the plan
       },
       { "instruction": { ... instruction1 ...},
         "success": True,  # Tells if the instruction was validated successfully
         "msg": "",  # Validation error message for the instruction
       },
   ]}

# Proposed structure of output parameters of 'queue_item_add_batch'
{ "success": True,  # Success of batch processing (tells if the batch was accepted)
  "msg": "",   # Error message for the whole batch
  "items": [   # Regular list of items that could be handled as any other list of items
       { ... plan1 ... },
       { ... instruction1 ... },
  ],
  "results": [  # Item validation results (list of the same size as "items")
       {"success": True, "msg": ""},
       {"success": True, "msg": ""},
  ],
}

In the new set of returned parameters, the item parameters are put in a separate list from the validation results. The item list can now be treated as any other item list, and an item that failed validation can easily be found using its index. The only downside of this approach is that the results could be difficult to interpret visually for long lists if the returned results are printed during debugging, but this is not a major concern, since logging output may always be generated in a convenient form.

Change 5: rename return parameter of queue_get request from queue to items for consistency with other API functions.

Change 6: rename return parameter of history_get request from history to items for consistency with other API functions.

select communication channel between the worker and the API server

Currently the worker polls the API server for the next plan (and sleeps for some period of time if there is no work), and there is no feedback from the worker to the API server that anything is happening.

Some of this will be handled by #5 or by the API server subscribing to the document firehose, but we probably want an additional bi-directional and lower volume communication channel between the worker and the API server to carry information such as:

  • the next plan to run
  • interruptions / cancellations
  • heart beat
  • currently running a plan / idle
  • what plans / devices are or should be available
  • last finished plan (?)

This is not a priority for MVP1.

queue_clear

The docs say:
qserver -c queue-clear

and also:
qserver -c history_clear

should queue-clear be queue_clear?

Remove vestigial Travis-CI artifacts.

This project has already been migrated to GH Actions, but there is some cleanup left to do:

  • Delete .travis.yml
  • Update badges in README.

There might be more; those are just the ones I notice right away.
