
Comments (28)

thorst commented on June 11, 2024

@jonbartels It's funny, my first query did use LIMIT and ORDER BY, so it was more performant, but I looped over all the channels and made an individual query per channel, which is what made it so slow.

Thanks for all the help, and I'm hoping this will take some strain off the system. We were occasionally getting errors in Mirth around the every-15-minute cron job because of how intensive it was.


thorst commented on June 11, 2024

I've updated the title and request body to more clearly state I'm interested in the last sent datetime, and it needs to be at the connector level.

Over the past couple of days, I've tried several solutions. We can write a performant query at the channel level, but at the connector level, even with added indexes, we can't get it to be performant.

Throwing something in $g or $gc would work potentially, but requires code to be maintained on every connector, which is less than ideal.

For now, I will probably tweak my process to run continuously, with pauses between queries, instead of running it all through a custom API I wrote that does everything synchronously on demand. This is just too much work to do without caching. We get errors like
com.mirth.connect.donkey.util.DonkeyElement$DonkeyElementException: java.lang.NullPointerException. So I will query per channel, get the connector-level last sent time, throw it in a custom table, and then query off of that when a user calls my API for all the last sent times.


odoodo commented on June 11, 2024

Can you clarify your comment?

Sure, please find enclosed an implementation that records the time and type of the last event for each connector:

How does it work?

  • The trick is basically done via a code template that is applied to the global postprocessor.
  • After a message has been processed, the datetime and type (sent, transformed, filtered, etc.) of the event is recorded in a global map structure called CONNECTOR_MONITORING.
  • That way, the last event of every connector of every channel is recorded with zero configuration.
  • However, it is possible to exclude channels from monitoring by calling the function excludeFromMonitoring(). (A sketch of the recording idea follows this list.)
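For illustration, here is a minimal sketch of what such a recording code template might look like; the function name and map layout are assumptions based on the description above, not the actual contents of the zip archive:

// Hypothetical sketch; the real implementation ships in the zip archive below.
// Records the datetime and type of the last event for every connector.
function recordConnectorEvent() {
    // Lazily create the global map structure
    var monitoring = globalMap.get('CONNECTOR_MONITORING');
    if (monitoring == null) {
        monitoring = {};
        globalMap.put('CONNECTOR_MONITORING', monitoring);
    }

    var channelId = message.getChannelId();
    var iterator = message.getConnectorMessages().entrySet().iterator();
    while (iterator.hasNext()) {
        var entry = iterator.next();
        var connectorMessage = entry.getValue();
        // Record the time and type (SENT, TRANSFORMED, FILTERED, ...) of the event
        monitoring[channelId + '|' + entry.getKey()] = {
            connectorName: connectorMessage.getConnectorName(),
            status: String(connectorMessage.getStatus()),
            datetime: new Date().getTime()
        };
    }
}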

How to install

  1. Simply import the code templates contained in the zip archive (GitHub does not support XML attachments).
  2. Ensure that the global(!) postprocessor is called. This is the case if the global postprocessor contains any code.
    Thus, if it is empty, add something like var connectorMonitoring = true;

Download
RecordLastMessageEventOnConnectorLevel.zip

Additional Info
There are further code templates for activating and deactivating alarms, as well as a channel that demonstrates how to send an email when an alarm fires and again when the connector's processing is back to normal.

As I wrote above, I'll share the code if there is interest, but I first have to find time to review it to ensure everything is in a proper state and does not contain sensitive information.


jonbartels commented on June 11, 2024

This is a good feature idea.

While you wait on the feature to be implemented, I would like to suggest a workaround: you can use dynamic SQL to query all Mirth channels at once. You want to query d_channels to get the LOCAL channel ID for each channel, then query the latest row in each of the d_m${Local_channel_id} tables. You'll want to ORDER BY id so that you hit the PK index and make it fast enough.

The dynamic SQL lets you do this in one hit. You still have a load, but it's all in the DB.

An example of how to do this in Postgres is https://gist.github.com/jonbartels/b961574b2043b628f1b0fd96f440179b

A reference for the DB schema is at http://mirthschema.diridium.com/


odoodo commented on June 11, 2024

I have something like that:
The last activity of every connector is automatically traced. A monitoring channel allows you to define thresholds for specific connectors and message states. As soon as such a threshold is met, an alert mail is sent. Another mail is sent when the alert is disarmed (meaning the configured event occurred at the connector).

In case of interest, I can share the code. However, there is no fancy configuration frontend yet, and it will take some time: I created it quite a while ago and would have to find time to bring it into a shareable state.


thorst commented on June 11, 2024

@odoodo I would be interested. The system we are using was built about 20 years ago (we've upgraded parts over the years, but it's still very old). Add to that that the original developer, of course, is no longer here. We are going to rebuild it to be more modern, cohesive, and Mirth-specific once we are finished migrating from Cloverleaf to Mirth.

So we would very much be interested in seeing how someone else does it before we start our effort.


thorst commented on June 11, 2024

@jonbartels Very cool. I tried this in our test environment, and it was nice and quick. However, it ran for over a minute and a half in production, so I cancelled it.

Do I need to do something to the DB in prod to make it more performant? (This could be why we are having other issues in production.)


jonbartels commented on June 11, 2024

I run this in PROD routinely; it's fast enough and has no impact for my deployments.

A challenge with reporting on MC is that the tables are lightly indexed. This helps Mirth because it is INSERT/UPDATE heavy but hurts reporting.

It could also be slow because of general DB tuning, size, I/O etc. I can't really advise on that.

Do you understand the strategy the query uses, so that you can adapt it to get your "last activity time" report? It might be fast, since your use case has a lot less data to chew through.


thorst commented on June 11, 2024

I commented on your gist with my code. Both of our queries are slow enough that I cancelled them without seeing how long they would actually take, but it was over a minute and a half.


jonbartels commented on June 11, 2024

https://gist.github.com/jonbartels/b961574b2043b628f1b0fd96f440179b#gistcomment-4794607 ooh yea I see your update. Performance on that is going to be tough because PG will have to do a table scan.

Since the primary key is sequential and indexed, change your query to use ORDER BY id DESC LIMIT 1; the newest row will have the latest received time. There are details like threading and reprocessing where there could be "newer" activity, though.


thorst commented on June 11, 2024

I think it's OK to be close enough; most of our channels are pretty active. I will look to tweak that on Monday, but I have a feeling it'll still be slow, given that yours never completed either.


thorst commented on June 11, 2024

How many channels do you have, and what is your message volume? Is it a small box?


jonbartels commented on June 11, 2024

I have a feeling it'll still be slow, given that yours never completed either.

And I took that personally :P :P. I like tuning DB queries, and I took this as a fun challenge on a Friday at 3:30 PM :D

I ran your query and just got the contents of big_query:

WITH CHANNEL_QUERY AS (
    SELECT DC.LOCAL_CHANNEL_ID,
           C.NAME,
           FORMAT(
                   'SELECT
                       %L::TEXT as channel_id,
                       %L::TEXT as channel_name,
                       connector_name,
                        max(RECEIVED_DATE)::TIMESTAMP AS max_received_date,
                        max(RESPONSE_DATE)::TIMESTAMP AS max_response_date
                    FROM D_MM%s GROUP BY connector_name',
                   DC.channel_id,
                   C.NAME,
                   DC.LOCAL_CHANNEL_ID
               ) AS QUERY
    FROM D_CHANNELS DC
             INNER JOIN CHANNEL C ON C.ID = DC.CHANNEL_ID
),
     BIG_QUERY AS (
         SELECT ARRAY_TO_STRING(ARRAY_AGG(QUERY), E'\n UNION ALL \n ') || ' ORDER BY 1 ASC, 2 DESC ' AS BIGGIE
         FROM CHANNEL_QUERY
     )
SELECT
    --query_to_xml(biggie, true, true, 'jonb-ns')
*
FROM big_query;

Then, for one of my largest tables on this system (approx. 100k messages), I extracted the single query for that table and got the EXPLAIN plan:

SELECT
    '695c0176-fc79-4dcb-a92a-1240ceaca007'::TEXT as channel_id,
    'xxx'::TEXT as channel_name,
    connector_name,
    max(RECEIVED_DATE)::TIMESTAMP AS max_received_date,
    max(RESPONSE_DATE)::TIMESTAMP AS max_response_date
FROM D_MM12 GROUP BY connector_name;

On this box it yielded a cost of roughly 6000.

Then I tried this simplified version, which orders by message_id and leverages an index scan:

SELECT
    '695c0176-fc79-4dcb-a92a-1240ceaca007'::TEXT as channel_id,
    'xxx'::TEXT as channel_name,
    connector_name, id, message_id,
    RECEIVED_DATE max_received_date,
    RESPONSE_DATE max_response_date
FROM D_MM12 ORDER BY message_id DESC LIMIT 1

It has a cost of 0.46. This is a LOT more efficient because it can leverage an index scan. The row with the highest message_id is, by definition, the most recently received message on the channel.

Now the caveat: UNION ALL doesn't play nicely with ORDER BY unless you add some parentheses:

WITH CHANNEL_QUERY AS (
    SELECT DC.LOCAL_CHANNEL_ID,
           C.NAME,
           FORMAT(
                   '(SELECT
                       %L::TEXT as channel_id,
                       %L::TEXT as channel_name,
                       connector_name, id,
                       RECEIVED_DATE AS max_received_date,
                       RESPONSE_DATE AS max_response_date
                    FROM D_MM%s ORDER BY message_id DESC LIMIT 1)',
                   DC.channel_id,
                   C.NAME,
                   DC.LOCAL_CHANNEL_ID
               ) AS QUERY
    FROM D_CHANNELS DC
             INNER JOIN CHANNEL C ON C.ID = DC.CHANNEL_ID
),
     BIG_QUERY AS (
         SELECT ARRAY_TO_STRING(ARRAY_AGG(QUERY), E'\n UNION ALL \n ') || ' ORDER BY 1 ASC, 2 DESC ' AS BIGGIE
         FROM CHANNEL_QUERY
     )
SELECT
    query_to_xml(biggie, true, true, 'jonb-ns')
FROM big_query;

This ran in 30ms on my test DB. Yours took 90ms. I think this will be VERY efficient even on larger PROD DBs with lots of messages or suboptimal tuning, since it leverages the index on message_id.


thorst commented on June 11, 2024

The modified query runs faster in my prod environment, as you predicted; now I just need to re-adapt it to my needs.


pacmano1 commented on June 11, 2024

I would do all this by calling the API, for transportability between backend DBs... but anyway, see #6033; let's get generic about the ask on columns.


thorst commented on June 11, 2024

I would do all this by calling the API, for transportability between backend DBs... but anyway, see #6033; let's get generic about the ask on columns.

I assume you're talking about a future state? At the moment you cannot get the last activity quickly/easily for all channels, or am I missing it? The query we were working on is a stopgap to improve what I currently have, which is way too slow and intensive.

Yes, the ideal in my mind is for it to come back with the dashboard stats, with the last activity being written when the counts are incremented.

With your ticket #6033, you would need to update the last activity in the channel's code, right? Or is that already a $gc var?


pacmano1 commented on June 11, 2024

You can get the last message ID per channel, then go get that message's information by the explicit message ID. It's not as fast as calling the DB directly, of course. In our case we already cache a list of channels and their stats in $g for other alerting reasons, so iterating over that object and calling the REST API is a simple add (a sketch of this follows).
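A rough sketch of one such REST call from a Mirth JavaScript context is below. The maxMessageId and message-by-ID endpoints are part of the Mirth Connect REST API, but the server URL, credentials, and bare-number response format here are assumptions for illustration (certificate trust setup is also omitted):

// Hedged sketch: fetch the newest message ID for one channel via the REST API.
function getMaxMessageId(channelId) {
    var url = new java.net.URL('https://localhost:8443/api/channels/' + channelId + '/messages/maxMessageId');
    var conn = url.openConnection();
    var creds = java.util.Base64.getEncoder().encodeToString(new java.lang.String('admin:admin').getBytes('UTF-8'));
    conn.setRequestProperty('Authorization', 'Basic ' + creds);
    conn.setRequestProperty('X-Requested-With', 'lastActivitySketch'); // CSRF header newer Mirth versions require
    conn.setRequestProperty('Accept', 'application/json');
    var body = String(org.apache.commons.io.IOUtils.toString(conn.getInputStream(), 'UTF-8'));
    return java.lang.Long.parseLong(body.trim()); // assumes the JSON body is a bare number
}

// The message itself (with connector-level received/send dates) can then be
// fetched from GET /api/channels/{channelId}/messages/{messageId}.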

There are other ways, of course; e.g., in a per-channel postprocessor, populate a $g or $gc var with the last activity, then use those objects as the last-activity timestamp (see the sketch below). It is not the greatest approach if you undeploy and deploy frequently, but a distinct advantage is that this is an in-memory construct and very fast to parse.
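A minimal sketch of that second approach, assuming one line added to each channel's postprocessor (the key name is arbitrary):

// Channel postprocessor: record this channel's last activity in memory.
// Caveat from above: the value does not survive an undeploy/redeploy.
globalChannelMap.put('lastActivity', java.lang.System.currentTimeMillis());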

The feature request I linked to is an ask to make the ability to add dashboard columns generic, so it can handle whatever use case you need.


thorst commented on June 11, 2024

Yeah, I like the ability to add any gc/g value. The biggest annoyance is remembering to do it and ensuring all devs follow the standard; then again, if it's on the dashboard it'll jump out at you quicker.


mhokansonReliance commented on June 11, 2024

I don't think gc/g values are even available in the API, are they? My understanding was that the client uses the APIs internally to surface values, so before gc/g values could be displayed they'd need to be made available via the API.


mhokansonReliance commented on June 11, 2024

I guess that last comment was covered in part by #6033.


pacmano1 commented on June 11, 2024

The $gc and $g vars are available because they already show up at the bottom of the screen. And even the context is available, since selecting a channel shows its $gc vars. So the ask is just to bubble those up on demand as dashboard columns and let us all have at it for whatever we would like to put on a dashboard.


mhokansonReliance commented on June 11, 2024

Interesting. I hadn't seen $g and $gc available in any of the APIs. I'll have to do some more digging to see which endpoints are responsible for surfacing that data at the bottom of MCA.


odoodo commented on June 11, 2024

I've updated the title and request body to more clearly state I'm interested in the last sent datetime, and it needs to be at the connector level.
Throwing something in $g or $gc would work potentially, but requires code to be maintained on every connector, which is less than ideal.

From my point of view, this is the way to go. No, it is not necessary to maintain connector-specific code.


thorst commented on June 11, 2024

Can you clarify your comment? I'm not sure what you're saying.

What is the way to go? $gc? And if using that, how would it not be code on every connector?


thorst commented on June 11, 2024

@odoodo Dumb question, but can you post the lines where you call these methods? I've never gotten the processing state, etc.

Update: Never mind, I see it in the Channel Hook code.


odoodo commented on June 11, 2024

Yeah, no need to call anything specific. Just ensure that there is something in the global postprocessor. This automatically executes the channel hook template after a message has been processed by any channel.


thorst commented on June 11, 2024

Here is our code template; it takes 0.04837 milliseconds to run, so it's very performant:

/**
	cacheLastSent

	This method gets called on the global postprocessor. Since it gets called
	after every message on every channel, we need to be very fast and error-proof.
	It will save the last activity information if it's a sent, transformed, or queued state.
	There are two other pieces of the puzzle:
	1. A cache channel that runs at a scheduled interval. It writes data out to
	a file, and cleans up the in-memory object from deleted channels/connectors.
	2. A web API that returns the in-memory object.

	No data is returned.
*/
function cacheLastSent() {
    try {
		
        // Blow away global
        // globalMap.remove('connector_lastactivity');
        // return;

        // Get the channel ID
        var channelId = message.getChannelId();

        // Iterator for looping over all connector messages of this message
        var iterator = message.getConnectorMessages().entrySet().iterator();

        while (iterator.hasNext()) {

            // Get record, and the state of the message
            var record = iterator.next();
            var connector = record.getValue();
            var processingState = connector.getStatus() + ''; // coerce to string

            // Source will always be transformed, and destination should be sent
            // No need to move further if it's not one of these states
            if (processingState == "TRANSFORMED" || processingState == "SENT" || processingState == "QUEUED") {

                // Get the global object; if the server just restarted it'll be null
                var monitoring = globalMap.get('connector_lastactivity');

                // Check if the value is null, if so, grab it from the file
                if (monitoring === null) {
                    // Bad way to build the cache file location
                    var cacheDir = '/cl/mirthconnect/_cache';
                    var globalKey = 'connector_lastactivity';
                    var cacheFile = cacheDir + '/' + globalKey + ".txt";

                    // If file exists, read it, otherwise default to a new object
                    var file = new java.io.File(cacheFile);
                    var content = {};
                    if (file.exists()) {
                        content = JSON.parse(String(FileUtil.read(cacheFile)));
                    }

                    // Put that contents into global, and update the monitoring var
                    globalMap.put('connector_lastactivity', content);
                    monitoring = globalMap.get('connector_lastactivity');
                }

                // Check if this channel is already in object
                if (!monitoring.hasOwnProperty(channelId)) {

                    // It's not, so initialize it and retrieve the name
                    monitoring[channelId] = {
                        name: ChannelUtil.getDeployedChannelName(channelId),
                        connectors: {}
                    };
                }

                // Get needed data
                var connectorId = record.getKey();
                var connectorName = connector.getConnectorName();
                var receivedDate = connector.getReceivedDate().getTimeInMillis();
                var sendDate = connector.getSendDate() ? connector.getSendDate().getTimeInMillis() : receivedDate;

                // Save the connector stat
                monitoring[channelId]["connectors"][connectorId] = {
                    name: connectorName,
                    sendDate: sendDate,
                    receivedDate: receivedDate
                };
            }
        }

    } catch (err) {
        logger.error('There is an error with the global postprocessor.');
        logger.error(err);
    }
}

And our global postprocessor:

// This script executes once after a message has been processed
// This script applies across all channels
// Responses returned from here will be stored as "Postprocessor" in the response map
// You have access to "response", if returned from the channel postprocessor
cacheLastSent();

return;

Then I have a cache/pruner channel that I'm still working on, which just seeks to clean up the globalMap variable when channels or connectors are deleted, and to write the JSON to disk so that when we restart Mirth it doesn't need to accumulate all the times again; it starts from the cache. This runs every 5 minutes and takes less than a second to run. (A sketch of the pruning logic follows.)
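For reference, here is a minimal sketch of what that pruning step might look like. Treating a null result from ChannelUtil.getDeployedChannelName() as "channel gone" is an assumption, and the file path simply mirrors the code template above:

// Hypothetical pruner, run from a polling channel on a 5-minute interval.
function pruneAndPersist() {
    var monitoring = globalMap.get('connector_lastactivity');
    if (monitoring == null) {
        return;
    }

    // Drop entries for channels that are no longer deployed
    // (assumes getDeployedChannelName() returns null for removed channels)
    for (var channelId in monitoring) {
        if (ChannelUtil.getDeployedChannelName(channelId) == null) {
            delete monitoring[channelId];
        }
    }

    // Persist the cache so a Mirth restart can start from it rather than
    // re-accumulating times. Like the recorder, this touches a shared
    // object; see the concurrency caveat below.
    FileUtil.write('/cl/mirthconnect/_cache/connector_lastactivity.txt', false, JSON.stringify(monitoring));
}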

Next, I have a custom API that will delete the cache and the in-memory object.

And lastly, I have a custom API that returns the globalMap as JSON. It does transform the JSON into a nicer format for consumption.

We are currently getting a concurrency issue writing to the globalMap, which I have a ticket in about.

ERROR 2023-12-26 19:19:56.048 [Postprocessor JavaScript Task on _FROM JUMP ADT C CR (1a6fd63d-7569-41ac-8eaa-234eb84d387e) < pool-1-thread-10800] com.mirth.connect.server.util.javascript.JavaScriptUtil: Error executing Postprocessor script from channel: 1a6fd63d-7569-41ac-8eaa-234eb84d387e
java.util.ConcurrentModificationException: null
	at java.util.HashMap$HashIterator.nextNode(HashMap.java:1605) ~[?:?]
	at java.util.HashMap$EntryIterator.next(HashMap.java:1638) ~[?:?]
	at java.util.HashMap$EntryIterator.next(HashMap.java:1636) ~[?:?]
	at java.util.HashMap.putMapEntries(HashMap.java:519) ~[?:?]
	at java.util.HashMap.putAll(HashMap.java:791) ~[?:?]
	at com.mirth.connect.donkey.model.message.Message.getMergedConnectorMessage(Message.java:155) ~[donkey-model.jar:?]
	at com.mirth.connect.server.util.javascript.JavaScriptScopeUtil.getPostprocessorScope(JavaScriptScopeUtil.java:282) ~[mirth-server.jar:?]
	at com.mirth.connect.server.util.javascript.JavaScriptUtil.executePostprocessorScripts(JavaScriptUtil.java:298) ~[mirth-server.jar:?]
	at com.mirth.connect.server.transformers.JavaScriptPostprocessor$JavaScriptPostProcessorTask.doCall(JavaScriptPostprocessor.java:162) ~[mirth-server.jar:?]
	at com.mirth.connect.server.util.javascript.JavaScriptTask.call(JavaScriptTask.java:114) ~[mirth-server.jar:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.lang.Thread.run(Thread.java:1623) ~[?:?]

So if anyone has ideas there, that would be awesome.

Note:
The above is all a workaround for this feature request. I would still like the last sent/received times to be part of the dashboard stats.


thorst commented on June 11, 2024

One caveat with using the global postprocessor is that it will not populate any sent dates when the destination is set to always queue. Unfortunately, this is typically what we do. I'm assuming that's because once the message is added to the queue, a different thread handles the sending of the transaction, and therefore it's asynchronous. Is there any postprocessor script that runs after a destination sends a message, or would we need to call a code template on every destination that is set this way, in order to log the last sent date?

