Comments (8)

latompkins commented on August 19, 2024

Hi Einar,

LK tipped me off to this discussion. IMO it's useful to have the performance information stored in some persistified format (either a performance ntuple which gets written out at the end of a job and can be analyzed, or another collection in the event record). However, I also think it's extremely useful to have both a summary and the option for detailed output in the job log file. This way someone debugging or running test jobs has easy access to the information. My main experience with this comes as a user of some ATLAS tools; here is some relatively old documentation about them: slides, proceedings. In trying to find that reference, I found this article, which has a lot of references (although it has a slightly different focus). Anyway, I hope this is helpful input!

tomeichlersmith commented on August 19, 2024

Here's an idea:

We do include the performance information within the event file if enabled, but it is kept in a separate TTree. This allows us to avoid copying performance information around when re-processing files, and it gives the performance information in the output file a simple interpretation: it reflects the config that was used to generate that specific file. It also means we can define a new "meta-schema" for this performance TTree that is not restricted by the "meta-schema" already defined for the Events TTree. (This has the added benefit of de-coupling the schema evolution of the performance data from the schema evolution of the event data.)

Performance TTree Meta-Schema

I'm imagining a pretty simple Meta-Schema where each processor in the sequence has a branch named after it where we can store the performance information. Maybe we define a new ROOT-serializable class or struct to store the information we find interesting (like time stamps, run time, and, somehow, memory usage), and then each branch holds one of those objects per processor.

We could then include other branches for data that is event-by-event but not processor-specific (like the total event processing time across all processors).

I'm unsure whether making this TTree one-entry-per-event is too restrictive. I know that there is some performance data that is not event-by-event (e.g. total run time including init and de-init), but I think we could probably just have another object (perhaps a TTree) for storing that information. Then we would probably want to put this performance data into a subdirectory of the ROOT file to distinguish it from the event and run trees holding the actual data.
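For concreteness, here is a minimal sketch of that layout in plain ROOT. PerformanceMeasurement and setup_performance_tree are hypothetical names, and a leaf-list branch is used so the struct does not need a ROOT dictionary:

#include <cstddef>
#include <string>
#include <vector>

#include "RtypesCore.h"
#include "TTree.h"

// plain-old-data so it can be described by a leaf list
struct PerformanceMeasurement {
  Long64_t start_time;  // e.g. ns since epoch at processor start
  Long64_t end_time;    // ns since epoch at processor end
  Double_t run_time;    // wall-clock seconds spent in the processor
  Long64_t memory_kb;   // resident memory after the processor, if available
};

// one branch per processor in the sequence, plus event-level branches;
// `buffer` must outlive the tree and must not be resized after this call
void setup_performance_tree(TTree* tree,
                            const std::vector<std::string>& processor_names,
                            std::vector<PerformanceMeasurement>& buffer,
                            Double_t& total_event_time) {
  buffer.resize(processor_names.size());
  for (std::size_t i = 0; i < processor_names.size(); ++i) {
    tree->Branch(processor_names[i].c_str(), &buffer[i],
                 "start_time/L:end_time/L:run_time/D:memory_kb/L");
  }
  // event-by-event but not processor-specific
  tree->Branch("total_event_time", &total_event_time, "total_event_time/D");
}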

EinarElen commented on August 19, 2024

A possible option would be to just run all the measurements in the process, make them accessible from a processor, and then make a dedicated producer that handles writing the corresponding collection to the event if it is used. Otherwise, the measurements would just be discarded. That could potentially also let you do more exotic things if you wanted to.
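For illustration, a hedged sketch of that dedicated producer. It assumes the framework's usual Producer base class with a produce(Event&) callback and an event.add() interface; PerformanceProducer and getMeasurements() are hypothetical names:

#include "Framework/EventProcessor.h"

class PerformanceProducer : public framework::Producer {
 public:
  using framework::Producer::Producer;
  void produce(framework::Event& event) override {
    // the process takes the measurements either way; they are only
    // persisted when this producer is scheduled in the sequence
    auto measurements = getMeasurements();  // hypothetical accessor
    event.add("PerformanceMeasurements", measurements);
  }
};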

EinarElen commented on August 19, 2024

This sounds really good. The one thing I would want is to make sure that it is possible for a processor to register additional measurements to make (thinking of the simulator here, but probably useful elsewhere too). Would it still be possible to make analyzers that would read the second tree?

I think just raw runtime is a good place to start; it's (relatively) straightforward to do and lets us try out some basic things.
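Reading the second tree should not require anything beyond plain ROOT. A hedged sketch, assuming the layout discussed above (a "performance" directory holding an event-level tree named "by_event"; both names hypothetical):

#include "TFile.h"
#include "TTree.h"

void inspect_performance(const char* file_name) {
  TFile* f = TFile::Open(file_name);
  if (!f || f->IsZombie()) return;
  // Get<T> follows subdirectory paths like "performance/by_event"
  auto* tree = f->Get<TTree>("performance/by_event");
  if (tree) {
    // e.g. dump one (hypothetical) processor's per-event run time
    tree->Scan("mySimulator.run_time");
  }
  f->Close();
}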

tomeichlersmith commented on August 19, 2024

We could add another processor callback (e.g. logPerformance) that is only called when performance is requested. This could have an event-bus-like interface to the performance tree.
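A minimal sketch of what that could look like on the processor base class; logPerformance and PerformanceEvent are hypothetical names, not existing framework API:

class PerformanceEvent;  // event-bus-like handle to the performance tree

class EventProcessor {
 public:
  // ... existing callbacks (produce/analyze, onProcessStart, etc.) ...

  // only invoked by Process when performance tracking is enabled;
  // the default implementation does nothing so processors can opt in
  virtual void logPerformance(PerformanceEvent& perf) {}
};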

tomeichlersmith commented on August 19, 2024

A slight modification to my idea as well: the main location for instrumenting the performance is within the Process::run function. This does not have easy handles to the output event file, especially since it accommodates the possibility of there being multiple output event files. For this reason, I think the performance data should be written to a specific directory in the histogram file p.histogramFile, which is always a single file for any single run of fire and has direct handles within Process.

I also think this is somewhat more natural since the histogram file has always been "extra" information that is derived from the event data. Performance data is in some sense "extra" as well.

With this in mind, I think a good idea is to have a specific class that isolates the performance tracking logic so that Process::run doesn't get more cluttered (since it already is pretty cluttered). Then Process would simply create a PerformanceTracker if configured to do so, which then has callbacks for specific points in the Process::run logic. I outline the PerformanceTracker API below since I don't want to take the time to make a compiling/running solution right now.

#include <string>

#include "TDirectory.h"
#include "TTree.h"

class PerformanceTracker {
  // has some handle to the destination for the data
  TDirectory *storage_directory_;
  // has a TTree for event-by-event perf info
  TTree *event_data_;
  // some mechanism for buffering timestamps and other "in-process"
  // measurements goes here, along with some ROOT-serializable object
  // for run-level info (SomeObject is a placeholder)
  SomeObject run_data_;
 public:
  // create it with the destination
  // e.g. with Process::makeHistoDirectory("performance")
  PerformanceTracker(TDirectory *storage_directory);
  // destructor needs to make sure that the trees/objects are written
  // so that Process can just delete it when closing
  ~PerformanceTracker();
  /* begin list of callbacks for various points in Process::run */
  void absolute_start();  // literally first line of Process::run
  void absolute_end();    // literally last line of Process::run
                          // (only called when run completes without errors)
  void begin_onProcessStart();  // before onProcessStart section
  void end_onProcessStart();    // after onProcessStart section
  // before/after a specific processor's onProcessStart
  void begin_onProcessStart(const std::string& processor);
  void end_onProcessStart(const std::string& processor);
  // similar callbacks for the different EventProcessor callbacks
};

This is a really messy solution, but I can't think of another way to make it clear what is happening. We could do some preprocessor-macro and/or lambda-function nonsense to reduce the amount of code in PerformanceTracker, but I fear that would simply make Process::run harder to understand, which I want to avoid.
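For concreteness, a hypothetical sketch of how Process::run could drive these callbacks. The surrounding Process class, event loop, and error handling are elided, and names (performance_, sequence_, getName) are illustrative:

void Process::run() {
  // performance_ is assumed to be a pointer that is only
  // non-null when performance tracking is configured
  if (performance_) performance_->absolute_start();

  if (performance_) performance_->begin_onProcessStart();
  for (auto& proc : sequence_) {
    if (performance_) performance_->begin_onProcessStart(proc->getName());
    proc->onProcessStart();
    if (performance_) performance_->end_onProcessStart(proc->getName());
  }
  if (performance_) performance_->end_onProcessStart();

  // ... event loop with the analogous begin/end calls around each
  // processor's produce/analyze ...

  if (performance_) performance_->absolute_end();
}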

EinarElen commented on August 19, 2024

I've seen messier things to deal with, so I'm not sure the preprocessor/lambda stuff is needed. The Process::run function is (at least according to me) long, yes, but it is relatively straightforward to read, so I don't think your proposal here would make things much worse. If we are worried about the length of Process::run, I think factoring out some distinct functions from it would probably deal with most of it.

tomeichlersmith commented on August 19, 2024

https://stackoverflow.com/a/64166

Looks like we can use some C boilerplate to get memory, CPU usage, and time at any given point. I will want to test how long it takes to actually do these measurements and see if the order matters at all before committing to all of them.
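For example, on POSIX systems getrusage returns CPU time and peak memory in a single call; a minimal sketch of one such approach (an assumption on my part; it may differ from the exact boilerplate in the linked answer):

#include <sys/resource.h>
#include <sys/time.h>

#include <cstdio>

int main() {
  rusage usage{};
  if (getrusage(RUSAGE_SELF, &usage) != 0) return 1;
  // CPU time spent in user space and in the kernel so far
  double user_s = usage.ru_utime.tv_sec + usage.ru_utime.tv_usec / 1e6;
  double sys_s = usage.ru_stime.tv_sec + usage.ru_stime.tv_usec / 1e6;
  // peak resident set size: kilobytes on Linux, bytes on macOS
  long max_rss = usage.ru_maxrss;
  std::printf("user %.3fs sys %.3fs max rss %ld\n", user_s, sys_s, max_rss);
  return 0;
}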
