
ias's Introduction

ias

The production version of the core of Integrated Alarm System.

Website: Integrated Alarm System

Browse the wiki pages for help.

Use gradlew build install to build all the modules and install them in $IAS_ROOT

The core is composed of the following modules (in order of compilation):

  • Tools: generation of API documentation, external libraries, configuration files, IAS python scripts, support for scala logging and ISO 8601 timestamps
  • Cdb: RDB and JSON CDB
  • BasicTypes: IAS supported value types, operational mode, validity
  • KafkaUtils: producers, consumers and utilities for Kafka
  • Heartbeat: heartbeat generation
  • plugin: java plugin library
  • PythonPluginFeeder: python 2.x and 3.x plugin libraries
  • Converter: the converter that converts IASIOs produced by plugins into valid IAS types
  • CompElement: the ASCE computing element that runs the transfer function
  • DistributedUnit: the DASU that produces the output, delegating to ASCEs
  • Supervisor: the supervisor to deploy more DASUs in the same JVM
  • WebServerSender: the web server sender that forwards IASIOs published in the BSDB to the web server via websockets
  • TransferFunctions: a collection of transfer functions
  • SinkClient: clients that get IASIOs out of the BSDB and process them, like the email sender
  • Tests/SimpleLoopTest: a module with a test
  • Extras: supporting tools like the cdbChecker

ias's People

Contributors

acaproni, cristobal-vildosola, fafilippo, nicosingh, sfehlandt


ias's Issues

Add support for JAVA_OPTS environment variable to pass java options

The JAVA_OPTS environment variable allows the user to set java options to be passed to scala/java executables.
scala natively supports JAVA_OPTS, being a script that ultimately invokes java.

iasRun.py should, for:

  • java: check whether the JAVA_OPTS environment variable is defined and pass the options it contains to java
  • scala: do nothing

Note that enabling assertions (the -a switch of iasRun) is done by setting the -ea java option. This should instead be done by setting the value in JAVA_OPTS, and the -a switch should be removed from iasRun.
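
A minimal sketch of the intended java branch (written in scala for uniformity with the other snippets here, although iasRun.py itself is python; mainClass is a hypothetical placeholder):

// Read JAVA_OPTS, if defined, and prepend its options to the java command line
val javaOpts: Seq[String] =
  sys.env.get("JAVA_OPTS").map(_.trim.split("\\s+").toSeq).getOrElse(Seq.empty)
val mainClass = "org.eso.ias.SomeMain" // hypothetical
val command: Seq[String] = Seq("java") ++ javaOpts ++ Seq(mainClass)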

Allow RDB CDB reader and writer to close and release the allocated resources

The current implementation of CdbReader and CdbWriter delegates RDB interactions to RdbUtils, which offers a close() method to release the resources.

However, it is not possible to invoke close() from the reader and writer, which therefore cannot release the resources. Adding this feature is particularly important as several IAS applications read the configuration from the CDB only once at startup and could release the resources immediately after initialization.
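
A minimal sketch of the proposed change, assuming close() is simply added to the interface and delegated to RdbUtils (the interface shown here is reduced to the new method):

// The reader interface exposes close() so callers can release resources after initialization
trait CdbReader {
  def close(): Unit
}

// The RDB implementation delegates to the RdbUtils close() mentioned above
class RdbReader(rdbUtils: RdbUtils) extends CdbReader {
  override def close(): Unit = rdbUtils.close()
}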

Implement KafkaIasiosProducer in KafkaUtils

Kafka utils ships KafkaIasiosConsumer to consume, and be notified of, IASValues published in the Kafka BSDB.
For symmetry it would be useful to have a KafkaIasiosProducer that allows writing an IASValue in the BSDB. Such a class would be helpful for DASUs, Supervisors and tests.
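
A minimal sketch of such a producer (the serializer's method name and the IASValue fields used here are assumptions):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

class KafkaIasiosProducer(brokers: String, topic: String, serializer: IASValueJsonSerializer) {
  private val props = new Properties()
  props.put("bootstrap.servers", brokers)
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  private val producer = new KafkaProducer[String, String](props)

  // Serialize the IASValue to a JSON string and publish it, keyed by its id
  def push(value: IASValue[_]): Unit =
    producer.send(new ProducerRecord(topic, value.id, serializer.iasValueToString(value)))

  def close(): Unit = producer.close()
}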

Implement the throttling for a monitor point

The plugin should inhibit the resending of the value of a monitor point if it changes too many times per second, as the user could not follow it on the panel.
On one side the plugin must immediately notify the user of a change of the value, but on the other side too many changes per second are impossible to follow.
Proper filtering could mitigate this behaviour, but the code of the plugin must be robust enough to prevent such a situation.
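
A minimal throttling sketch (the names and the idea of a minimum interval between sends are assumptions):

// Allows a send only when at least minSendIntervalMillis elapsed since the last one
class Throttler(minSendIntervalMillis: Long) {
  private var lastSentAt: Long = 0L

  def canSend(now: Long = System.currentTimeMillis()): Boolean = synchronized {
    if (now - lastSentAt >= minSendIntervalMillis) { lastSentAt = now; true }
    else false
  }
}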

Simplify CDB configuration

The rate at which the DASU publishes its unchanged output is taken from the refresh rate of the IASIO that the ASCE produces. The refresh rates of the IASIOs produced by all the other ASCEs running in the DASU are ignored, but must still be written in the CDB.
To simplify the editing of the CDB we could make the refresh rate of the IASIOs optional and assign a refresh rate to the DASU.

A very similar problem exists in the configuration of the plugin (now only possible in JSON): each monitor point value and alarm produced by the monitored system and sent by the plugin to the core has an assigned refresh rate. It would probably be enough to assign a desired refresh rate to the plugin and, if needed, allow a different refresh rate for selected monitor point values and alarms.
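
A sketch of what such a plugin configuration could look like (all field names here are assumptions, modeled only on the style of the existing JSON CDB files); values without an explicit refreshRate would inherit the plugin default:

{
	"id" : "WeatherStationPlugin",
	"defaultRefreshRate" : 5000,
	"values" : [
		{ "id" : "Temperature", "refreshRate" : 1000 },
		{ "id" : "WindSpeed" }
	]
}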

The ASCE must be able to catch the case of some of its input having a null value

When the computing element is built, it has an initial set of inputs whose values are null, waiting to be updated by the values coming from the BSDB.

When inputs arrive at the DASU, it forwards them to the ASCEs, which run the TF and produce the output. In general we cannot expect the inputs to all arrive at the same moment: in that case the TF of one ASCE can run with one or more inputs having a null (not-yet-initialized) value.

The ASCE should not run the TF if one or more of its inputs has a null value, with the advantage of avoiding NPEs in user-defined implementations of the TF.
In that case the ASCE will not produce any output either, as we do not want to propagate null values inside the DASU or, even worse, into the BSDB.
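
A minimal sketch of the proposed guard (method and field names are illustrative; runTransferFunction stands for the user-provided TF):

// Run the TF only when every input has been initialized with a non-null value
def produceOutput(inputs: Iterable[InOut[_]]): Option[InOut[_]] =
  if (inputs.exists(_.value == null)) None // at least one input not yet initialized
  else Some(runTransferFunction(inputs))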

The plugin should allow dynamically changing the sampling rate of a monitor point

The current implementation of a monitored value in the plugin does not allow changing the refresh rate. However, I can imagine scenarios where we need to increase or decrease the sampling of a monitor point depending on its behaviour, or even in relation to the value of another monitor point.

The plugin should be able to autonomously change the refresh rate of the sampling of a monitor point.

Test the JSON format of the messages sent by the WebServerSender

Currently, the WebServerSender sends the serialization of the IASValue produced by the IASValueJsonSerializer, and hence the WebServer will expect JSONs in that format (with those fields).

In order to ensure that further changes to the serializer do not break this contract between the IAS Core and the IAS WebServer, we should implement a test in the WebServerSender module that asserts the correct format.

Review Supervisor, DASU and ASCE constructors

DASU and Supervisor constructors ignore the DAO and read it again with the passed CDB reader.

We should review and improve the DASU and Supervisor constructors, and make the ASCE constructor uniform with them.

The ASCE should inhibit running the TF if its execution time is too slow for a given amount of time

The ASCE monitors the execution time of the user-provided transfer function, changing its state from Healthy to TFSlow if it is too slow.

If the TF is too slow for a defined time interval, as defined in the TransferFunctionSetting scala object, the ASCE should change its state to TFBroken and avoid running the TF again.
This prevents a misbehaving TF from affecting the entire system.
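
A minimal sketch of the transition (the state names come from this issue; the timing fields are assumptions):

sealed trait AsceState
case object Healthy extends AsceState
case object TFSlow extends AsceState
case object TFBroken extends AsceState

var state: AsceState = Healthy
var slowSince: Long = 0L // when the state last moved to TFSlow

// Once the TF has been slow for longer than the configured interval,
// the ASCE becomes TFBroken and never runs the TF again
def checkTfState(now: Long, maxSlownessMillis: Long): Unit = state match {
  case TFSlow if now - slowSince > maxSlownessMillis => state = TFBroken
  case _ => ()
}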

The InOut and IASValue shall not be updated with null/empty values

The ASCE never receives values (either from plugins or the BSDB) having a null or empty value.
The InOut and IASValue data types should ensure that the value is not null and reject updating the value they contain with a null/empty new value.

After initialization the ASCE has a set of inputs and the output with an empty value, because the value has not yet been sent by plugins or the BSDB. After a while we expect that all the inputs have a non-empty value and the ASCE runs the TF and produces the output. If the values from plugins do not arrive anymore, the output produced by the TF of the ASCE is still based on the last received values: what changes is the validity, to signal that some or all of the inputs are too old.
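
A minimal sketch of the rejection (assuming, for illustration only, that InOut is a case class holding an optional value):

case class InOut[T](id: String, value: Option[T]) {
  // Reject null updates so a stale but valid value is never overwritten
  def updateValue(newValue: T): InOut[T] = {
    require(Option(newValue).isDefined, "Cannot update with a null value")
    copy(value = Some(newValue))
  }
}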

Plugin should flag the data produced by a remote system when their values are invalid

Raw data produced by a monitored software system should be filtered out if their values are not consistent.
For example, if a temperature has a value of NaN, or one too high to be valid.

Open question: currently the validity of a monitor point or alarm produced by a remote system depends only on the time it has been produced. If it is too old then something must not be working as expected and we flag it as invalid. We should probably define another type of validity for when the monitored system returns values that have no meaning.
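
A sketch of such a value-consistency check (the bounds here are illustrative assumptions):

// Flag a temperature as inconsistent when it is NaN or outside a plausible range
def isConsistent(temperature: Double): Boolean =
  !temperature.isNaN && temperature >= -90.0 && temperature <= 60.0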

Implement the Supervisor

The supervisor is the container of the DASUs and allows running more DASUs in the same JVM. It is a fundamental tool: otherwise we would need to run a JVM for each alarm produced by the IAS, which would require too many resources.

The Identifier must support run-time relocation

The Identifier is composed of a unique ID (a String), a type and a parent Identifier.
The Identifier is a scala immutable class where the parent identifier is an Option, to allow cases where an item has no parent identifier, like the Supervisor or the monitored software system.

The parent of an IASIO can be either an ASCE, or composed of several Identifiers if it is produced by a plugin:

  • IASIO@ASCE@DASU@SUPERVISOR
  • IASIO@CONVERTER@PLUGIN@MONITORED_SOFTWARE_SYSTEM

In the current implementation the parent identifier is fixed, but this is not always the case. For example, in the former case the ASCE could be relocated to another DASU; in the latter case the IASIO can be produced by another PLUGIN or another CONVERTER if there is a pool of converters.

For the sake of the ASCE (and clients like the GUIs), what makes the difference is the unique ID of the IASIO (i.e. the String stored in the CDB) while the parent ID can (and possibly will) change at run-time.

The Identifier must be changed to support relocation, i.e. to allow a change of the parent Identifier.
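
A minimal sketch of a relocatable Identifier (field names are illustrative; the type is kept as a plain String here):

case class Identifier(id: String, idType: String, parent: Option[Identifier]) {
  // Relocation: same unique ID and type, different parent
  // (e.g. an ASCE moved to another DASU)
  def relocate(newParent: Option[Identifier]): Identifier = copy(parent = newParent)

  // The full running ID, e.g. IASIO@ASCE@DASU@SUPERVISOR
  def fullRunningID: String =
    (id +: parent.map(_.fullRunningID).toList).mkString("@")
}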

Change CDB configuration to support multiple scenarios

The ID of the IAS is an integer automatically generated by the DB. Other DB entities do not refer to the IAS (i.e. no foreign key) because there is only one instance of the IAS running at a given time.

By changing the ID of the IAS to a user-provided string and adding foreign keys to the other tables, it would be possible to have more than one configured IAS in the database. At run time it would be possible to select which IAS configuration to run by providing the IAS identifier as a java property or an environment variable. The simplest use case is to have one IAS for testing and one for production, but this feature could also be useful in operation, for example to have different deployments.

Add a WebServerSender

We need a module that reads alarms from the Kafka queue and sends them through websockets to the IAS Webserver.
It should also filter out non-alarms (at least in the meantime).

Extend the plugin to support complex data types

The plugin sends data to the core of the IAS as a JSON string. The value to send is converted to a string by calling its toString() method.
While this works smoothly for basic types, it does not cope well with user-defined objects or complex data types like, for example, arrays of integers.

DASU processing input time bug

While running a DASU I got the following exception:
KafkaSubscriber - Subscriber of DASU [DasuTemp2] got an error processing event [IASValueBase: ...]
java.lang.IllegalArgumentException: requirement failed: Invalid execution time
at scala.Predef$.require(Predef.scala:277)
at org.eso.ias.dasu.TimeScheduler.getNextRefreshTime(TimeScheduler.scala:135)

Reviewing the code, I found that the line throwing this exception is
def getNextRefreshTime(lastExecTime: Long): Int = {
require(Option(lastExecTime).isDefined && lastExecTime>0,"Invalid execution time")
...

where lastExecTime is a time difference calculated here
val before = System.currentTimeMillis()
val newOutput = propagateIasios(notYetProcessedInputs.values.toSet)
val after = System.currentTimeMillis()
...
timeScheduler.getNextRefreshTime(after-before)

Assuming the variable is defined (System.currentTimeMillis() is pretty standard), it must be less than or equal to 0 to raise the exception. The error could then be that the processing takes less than 1 millisecond; this is possible, as my DASU is running only 1 ASCE with a simple threshold function, where only a couple of instructions are executed.

Two possible fixes I can think of are:

  1. changing the condition to >= 0.
  2. changing the measure to microseconds.

but I'm not sure if that's the answer; maybe my analysis is incorrect.
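
A sketch of the second fix (an assumption, not an agreed change): measure the elapsed time with System.nanoTime, which is monotonic and fine-grained, so sub-millisecond executions no longer collapse to 0.

val before = System.nanoTime()
val newOutput = propagateIasios(notYetProcessedInputs.values.toSet)
val elapsedMicros = (System.nanoTime() - before) / 1000
// getNextRefreshTime would then take microseconds; never pass 0
timeScheduler.getNextRefreshTime(math.max(1L, elapsedMicros))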

Rationalize package names

We should make the package names uniform.
java and scala packages of sources written by ESO should all be like org.eso.ias...; a similar pattern must be followed by the packages of sources produced by Inria and CTA.
Package names of sources imported from the prototype should not contain the word prototype anymore.

Operational mode not provided by the monitored control system

A monitor point value or alarm produced by a monitored system normally has an operational mode. At present, the IAS foresees the following modes:

  • STARTUP
  • SHUTDOWN
  • MAINTENANCE
  • OPERATIONAL
  • UNKNOWN

The latter, UNKNOWN, is associated with the creation of the internal IAS data structures and with the case of control software that does not provide an operational mode.

To my knowledge, the ALMA Common Software (ACS) is one of the control systems that does not associate an operational mode to a monitor point. In that case, the operational mode of a monitored value or alarm could be provided by an external entity. In the case of ACS, the maintenance of an antenna or device is reported in the dashboard and a plugin can feed the IAS with that information.

How should the operational mode of a monitor point value be set when it is not supported by the monitored control system?

IASIO type is defined twice

The type of an IASIO is defined twice: in the Cdb and in the BasicTypes module.
One type is the carbon copy of the other, but for different purposes: the type defined in Cdb, IasTypeDao, is used to store and get the type from the database; the type defined in the BasicTypes module, IASTypes, is used at run-time to dynamically recognize the type of a monitor point value or alarm.

I am currently writing helper methods in the IASTypes enumerated to convert one type into the other (see the sketch at the end of this issue).

To avoid converting from one type to another at run-time, it would be a good idea to use only the type defined in the Cdb (possibly with a better name).
Using only one data type also ensures consistency when changing or extending the supported data types.

I also suggest reducing the number of supported data types by either:

  • replacing all floating point values with DOUBLE
  • replacing all integer flavors (SHORT, BYTE, etc.) with LONG
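
A sketch of the kind of conversion helper mentioned above (the enum member names are assumptions):

// Maps the run-time type onto the database type, one case per supported type
def toIasTypeDao(t: IASTypes): IasTypeDao = t match {
  case IASTypes.LONG    => IasTypeDao.LONG
  case IASTypes.DOUBLE  => IasTypeDao.DOUBLE
  case IASTypes.BOOLEAN => IasTypeDao.BOOLEAN
  case IASTypes.ALARM   => IasTypeDao.ALARM
  // ...and so on for the remaining types
}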

The kafka consumer of the DASU must collect and send more IASValues in a single message to the listener

The Kafka consumer of the DASU delegates to the SimpleStringConsumer, which sends to the listener one received string (in this case an IASValue) at a time.
For this reason the consumer sends one IASValue at a time to the DASU for processing, while the DASU could be much more performant by processing more events at a time.

The SimpleStringConsumer must be modified to send to the listener all the records it received in a single kafka poll.
As a possible implementation, we suggest keeping the SimpleStringConsumer behaviour as it is, because it can be useful for testing or in other contexts: SimpleStringConsumer can be split into a base class that sends all the records to a listener, and SimpleStringConsumer itself, which gets the strings and sends them one by one to the listener.
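
A minimal sketch of the proposed split (class and method names are assumptions):

trait StringsListener { def stringsReceived(strings: List[String]): Unit }
trait StringListener { def stringReceived(string: String): Unit }

// Base class: forwards the entire batch returned by one kafka poll
class BatchStringConsumer(listener: StringsListener) {
  protected def recordsPolled(records: List[String]): Unit =
    listener.stringsReceived(records)
}

// SimpleStringConsumer behaviour on top: one string at a time
class SimpleStringConsumer(listener: StringListener)
  extends BatchStringConsumer(new StringsListener {
    def stringsReceived(strings: List[String]): Unit =
      strings.foreach(listener.stringReceived)
  })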

Define color coding for alarms

ALMA Memo 600 suggests a possible color coding for alarms in section 5.2.2:

  • Green: normal operational state
  • Yellow: validated warning state
  • Red: validated non-nominal state
  • Blue: unknown/invalid state
  • Plum: alarm raised, not validated
  • Grey: not configured or in maintenance

The color coding shall be part of the configuration database.

Update Licenses to LGPLv3

The content of the license files shows a version of the GPL and not the LGPLv3 that the file names indicate.

Add a type field to the Identifier

The Identifier (defined in the BasicTypes module) is composed of a unique ID plus the identifier of the parent.
As described in the architecture document, the identifier is recursive and allows reconstructing where something runs: for example, whether a monitor point has been produced by a plugin and by which plugin, or which DASU produced an IASIO.

Apart from the name and the unique ID, each identifier in the recursive data structure shall have a defined type like PLUGIN, MONITORED_SYSTEM, DASU, ASCE, CONVERTER and so on.

The complete identifier contains deployment information and cannot be used as a key, for example by the GUIs and web applications. GUIs display only 2 types of information:

  • IASIOs produced by plugins (i.e. converted monitor point values read from a monitored system)
  • IASIOs produced by DASUs

Therefore the unique identifier to get IASIOs can be composed either of PLUGIN-ID+MonitorPointValue-ID or of DASU-ID+IASIO-ID. In both cases the type allows getting this information quickly.

Add the Validity to the IASValue

The current implementation of the IASValue, whose JSON representation is sent from the plugins to the core and between ASCEs and DASUs, provides an operational mode but no validity.

The validity must be overridden by the ASCEs and propagated to the other elements of the core. At the same time, a plugin could mark as invalid a value produced by the monitored system under certain conditions. We need then to be able to propagate the validity along the IAS components, and it should be added to the IASIO represented by an IASValue.

Note that when an IASIO has been received, its validity must be recalculated taking the timing into account. The core taking too long to process, network problems between IAS components, or kafka servers too slow in distributing the values are examples of situations that must be handled on the receiver side only.
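
A sketch of the receiver-side recalculation (the names and the idea of a maximum allowed age are assumptions):

sealed trait Validity
case object Reliable extends Validity
case object Unreliable extends Validity

// Whatever validity was sent, a value older than the allowed age becomes unreliable
def recalcValidity(sentValidity: Validity, producedAtMillis: Long, maxAgeMillis: Long): Validity =
  if (System.currentTimeMillis() - producedAtMillis > maxAgeMillis) Unreliable
  else sentValidity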

Let the IAS components report metrics at runtime

Instead of implementing its own metrics system, the IAS should adopt an existing metrics reporting tool to provide statistics to monitor at least the most critical parts of the code.
There are several tools for that; one that seems very popular is provided by dropwizard.
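
A minimal sketch of what adopting Dropwizard Metrics could look like (the metric name and the measured call, runTransferFunction, are illustrative assumptions):

import com.codahale.metrics.{MetricRegistry, Timer}

val registry = new MetricRegistry()
val tfTimer: Timer = registry.timer("asce.tf.execution")

// Time one execution of the transfer function
val ctx = tfTimer.time()
try runTransferFunction()
finally ctx.stop()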

The kafka SimpleStringConsumer should allow programmatically getting data from the beginning or the end of the partition

By default a kafka consumer gets data starting from the position set by passing java properties.
It would be useful to offer the opportunity to start from the end or the beginning of the partition directly from the API, so that the user does not need any knowledge of kafka internals.

The seek must be done when the partition is assigned, as shown for example in this article.

When implemented, starting from the end should be used in the Converter TestKafkaStreaming test.
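
A minimal sketch of seeking on assignment (the consumer setup is omitted; seekToBeginning would be used symmetrically):

import java.util.{Collection => JCollection, Collections}
import org.apache.kafka.clients.consumer.{ConsumerRebalanceListener, KafkaConsumer}
import org.apache.kafka.common.TopicPartition

def subscribeFromEnd(consumer: KafkaConsumer[String, String], topic: String): Unit =
  consumer.subscribe(Collections.singletonList(topic), new ConsumerRebalanceListener {
    // Jump to the end of the assigned partitions before the first poll
    override def onPartitionsAssigned(partitions: JCollection[TopicPartition]): Unit =
      consumer.seekToEnd(partitions)
    override def onPartitionsRevoked(partitions: JCollection[TopicPartition]): Unit = ()
  })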

Add delete capability to the CDB

The current implementation of the CDB allows reading and writing/updating items but does not allow deleting them.
Deletion should be added to the interfaces and implemented in the RDB and file implementations.

Investigate if it is possible to get rid of InOut in favor of IASValue

ASCEs and DASUs continuously convert IASValue data types to/from InOut.

For example, the DASU receives IASValues from the BSDB and sends IASValues to the ASCEs, i.e. in the same format. The ASCE runs the TF and produces an InOut that will be converted back to an IASValue to be sent to the BSDB and to the other ASCEs running in the same DASU.

The IASValue is used by java TFs while scala TFs use InOut.

All these conversions can be time consuming and reduce performance. On the other hand, the InOut is an enriched version of the IASValue, which was intended mostly for sending values to the core components of the IAS rather than for performing calculations, so it may be more convenient to keep both data types while trying to minimize the conversions at run-time.

The operational mode set in the MonitoredValue is not sent to the IAS

The operational mode is a property of a MonitoredValue. When the filter is applied, it produces a new FilteredValue to be sent to the core of the IAS. Such a FilteredValue contains an operational mode, but the filter is unaware of it, so the default mode (UNKNOWN) is always used.

The filter is supposed to filter the samples of a monitor point value to avoid sending spurious values to the core of the IAS. As such, the filter is independent of the operational mode.

The FilteredValue, returned by applying the filter, shall not contain an operational mode. A new data type, extending the FilteredValue with the operational mode, is what the plugin must send to the core of the IAS.

The solution is then

  1. the implementation of the Filter returns a FilteredValue, as already implemented
  2. the FilteredValue is extended by a new data type providing the operational mode
  3. the MonitoredValue does not notify the listeners passing a FilteredValue, but an object of the type defined in the previous step, with the operational mode properly set

Apart from the operational mode, this solution allows adding more information before sending a value to the core of the IAS.
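
A minimal sketch of the proposed extension (field names are assumptions; the operational mode is kept as a plain String here):

class FilteredValue(val id: String, val value: AnyRef, val producedAtMillis: Long)

// What the plugin actually sends to the core: a FilteredValue plus the operational mode
class ValueToSend(id: String, value: AnyRef, producedAtMillis: Long, val operationalMode: String)
  extends FilteredValue(id, value, producedAtMillis)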

Let Kafka assign the partition setting the ID of the plugin as the key of the published records

The current implementation of the plugin reports an error if the number of the kafka partition is not set or is invalid (i.e. less than 0). The partition number is passed in the configuration or in a java property.

Considering that each plugin has its own unique ID and that kafka assigns records to partitions based on the key of the record, we can relax the constraint of having the partition explicitly passed in the configuration.
Kafka uses a hash function to assign records to partitions based on the key, so it reasonably ensures that all the records produced by the same plugin go to the same partition.

The preferred way to assign records to partitions should be the key (i.e. the ID of the plugin), leaving the chance to set the partition manually with a property.
Automatic partitioning considerably reduces the chance of a misconfiguration of the partitions.
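
A minimal sketch (topic and values are illustrative): with no explicit partition, kafka hashes the key to pick one, so all the records of a plugin land in the same partition.

import org.apache.kafka.clients.producer.ProducerRecord

val pluginId = "WeatherStationPlugin" // illustrative
val jsonPayload = "{...}"             // the serialized monitor point value
// Key the record with the plugin ID and let kafka choose the partition
val record = new ProducerRecord[String, String]("bsdb-topic", pluginId, jsonPayload)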

ant to install only the jars built in src

During installation, CommonAnt.xml in the Tools module copies all the jars in lib into $IAS_ROOT/lib.
As the jars built in the test folder are also placed into the lib folder, they too are installed in $IAS_ROOT and included in the class path of IAS executables.

Even if it is not a big deal, it shall be fixed: CommonAnt.xml has the name of the jar file to build in the value of jarName, so it can install that jar instead of doing a blind copy.
Even with this fix, running ant install from the test folder will trigger the copy of the test jar into $IAS_ROOT. We could fix this by making CommonAnt.xml aware of the folder where it runs, but I would not add such a feature at present.

Improve Alarm implementation

The current implementation of an alarm (taken from the prototype) is not fully compliant with the requirements (see ALMA Memo 600):

  • not all alarms can be shelved: key alarms require an authorization (so in principle they can be shelved as well, but not by everybody)
  • an alarm can have different priorities ranging from warning to top priority (the requirements do not specify how many priorities to implement, but the literature suggests 3, no more than 4)
  • an alarm can have up to 4 different associated sounds to let operators distinguish between new alarms
  • link to documentation

The DASU configuration should have the ID of the produced IASIO

The configuration of the DASU (DasuDao.java in the configuration database) must have the ID of the IASIO that the DASU produces.
Such an IASIO is produced by one of the ASCEs running in the DASU itself and is finally published in the IASIO queues for the other DASUs and for IAS clients like the GUIs.

plugin publishes too much data

While testing the plugin API with the CTA simulated environment, we had the impression that the plugin publishes too much data for one single monitor point value.
It reached 3.7k in about 10 seconds.

This needs to be investigated, as the plugin shall not send the monitor point if its value did not change, apart from the refresh rate, which in our experiment was set to 2 seconds.

Simplify the way the CDB over JSON files saves the TFs

At present the CDB over JSON files stores the transfer functions in the folder CDB/TF, with one file for each TF: the name of the file is the identifier, which is also replicated inside the JSON file.
For example there is CDB/TF/org.eso.ias.prototype.transfer.impls.MinMaxThresholdTF.json whose content is

{
	"className" : "org.eso.ias.prototype.transfer.impls.MinMaxThresholdTF",
	"implLang" : "SCALA"
}

This ticket is to simplify the CDB configuration by adopting a solution similar to what is done for the IASIOs:

  • all TFs go in a single file named CDB/TF/tfs.json
  • the content of the file is an array of id and language (where the ID is the class to run), as in the sketch below
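
A sketch of what tfs.json could then look like (the second entry is a hypothetical example):

[
	{ "id" : "org.eso.ias.prototype.transfer.impls.MinMaxThresholdTF", "implLang" : "SCALA" },
	{ "id" : "org.eso.ias.prototype.transfer.impls.MultiplicityTF", "implLang" : "JAVA" }
]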

The definition of the transfer function in the CDB is incomplete

The configuration of the ASCE accepts the transfer function as a String with the class to run.
This is not enough, because the IAS allows passing the TF as a scala or a java class and needs to distinguish the 2 cases to properly convert the parameters.

A possible solution is to extend the configuration of the transfer function by specifying the name of the class and the programming language. A question is whether this change would also help the case of other programming languages like python, if we want the IAS to support them.

Another solution is to pass the parameters to both languages in the same (java) format. This could be less flexible for other programming languages but must be investigated.

The former solution, setting the programming language in the CDB, could have an impact on editing the CDB if there are a lot of different customized TFs. The impact is smaller if the same TFs are reused in many places.
