ldmx-software / framework Goto Github PK

Event-by-event processing framework using CERN's ROOT and C++17

CMake 1.50% C++ 94.94% Python 3.47% C 0.10%

framework's Introduction

⚠️ Archived ⚠️

This repository is now archived. v1.4.2 was merged into LDMX-Software/ldmx-sw as a subdirectory where development continues.

Framework

A event-by-event processing Framework using CERN's ROOT and C++17.

The core idea of this Framework is the assumption that our data (simulated or real) can be grouped into "events" that we can assume are independent (for the software, not necessarily true in real life). Each event is given to a sequence of "processors" that can look at the current data in the event and potentially produce more data to put into the event. The object that carries the event data between processors in the sequence is known as an "event bus". The objects on the "event bus" are "passengers" that each carry something for the processors to look at.

The behavior of this Framework is dynamically configured at run-time by running an input python script and then translating python objects into their C++ counter-parts. This configuration style is extermely flexible and allows both C++ and Python to do what they do best.

Besides this core functionality of processors looking at data event-by-event, there are additional helpers that allow processors to log through boost logging and store data in a less-hierarchical format (Ntuples).

framework's People

Contributors

Stargazers

Watchers

Forkers

sophiemiddleton

framework's Issues

Crash when processing a small number of events when passing a large list of inputFiles

I've noticed that when running ldmx-sw in run mode, if one passes a small number of events but a large list of input files, this crash will be seen:
#17 0x00007fd879ab58d2 in framework::Bus::everybodyOff (this=0x7ffc6e3eda58) at /Users/pbutti/sw/ldmx-sw/Framework/include/Framework/Bus.h:139 #18 0x00007fd879ab52e5 in framework::Event::onEndOfFile (this=0x7ffc6e3ed960) at /Users/pbutti/sw/ldmx-sw/Framework/src/Framework/Event.cxx:145 #19 0x00007fd879ae8938 in framework::Process::run (this=0x55714b991150) at /Users/pbutti/sw/ldmx-sw/Framework/src/Framework/Process.cxx:318 #20 0x000055714981d95f in main (argc=2, argv=0x7ffc6e3edde8) at /Users/pbutti/sw/ldmx-sw/Framework/src/Framework/fire.cxx:113

The crash is reproducible for example by requiring 1000 events and passing a list of 10 files each of 1000 events. ldmx-sw will keep opening-closing all the other files in the list and Bus will call clear on the passengers, which will eventually crash. I suppose the desired behaviour is to stop processing after the limit on the number of events is reached and avoid opening/closing the rest of the files in the input list.

Make RunHeader not found exception special

Right now, during an analysis run, the whole process of retrieving a new run header and giving it to the processors for beforeNewRun and onNewRun is done within a single try...catch branch. This means processors (or conditions providers) can't throw exceptions within those methods.

Small issue with a relatively simple fix. I doubt anyone else has run into this issue before.

Can't analyze 2 or more different pass names in one run

I discovered this when attempting to analyze several different signal style generators at once.

Hypothesis

This is being caused by the branch names of the input-file branches being left defined with the previous input-file's pass name (no proof).

Solution

I still haven't solved this, but I also wanted to bring this up as something that we may not want to solve.... Are we okay with analyzing more than one different passes in a single run?

If not, I can implement an exception so that the user knows why they are seeing this error.
If yes, then I can figure out how to fix this up.

Not Urgent: very niche use case of the Framework.

Additional python debug output on missing dict

When working on LDMX-sw, it isn't uncommon to break your python configuration files. One of the most annoying to debug is the missing dict one. I think it would be good to print a repr of the PyObject* before throwing the exception (or include the repr in the exception message).

Taking inspiration from the existing getPyString, something equivalent to

static void pretty_print(PyObject* object) {
  PyObject* repr = PyObject_Repr(object);  // Get string representation
  if (repr != nullptr) {
    const char* str = PyUnicode_AsUTF8(repr);  
    if (str != nullptr) {
      std::cout << "Python object: " << str
                << std::endl;
    }
    Py_XDECREF(repr);
  }
}

should probably do the trick

Add Total Reset for NtupleManager

This will only be helpful for writing tests which may or may not use the NtupleManager in multiple tests within the same library.

Error observed in LDMX-Software/Hcal#5

Just need a NtupleManager::reset function which is called when Process::run starts.

Dummy Process for Testing

When writing a test for a event processor, it is really annoying to have to define a process and all that and pass parameters and all that nonsense in order to get to the construction and running of a processor without using the process.

We could avoid this nonsense by having a static public member of Process that returns a dummy process that has no helpful configuration and is just the correct type for passing to the event processor's constructor.

Some Exceptions cause seg-faults

Throwing exceptions from certain functions in a processor either are ignored (like in onNewRun or beforeNewRun becuase of the try-catch tree around them) or cause seg-faults (like in configure or onProcessStart). I'm not really sure why these exceptions cause seg faults, I have a suspicion that the seg-faults are being caused by the generation of the stack trace; however, I don't have time to investigate further right now.

Include some helper classes for writing tests

I have written a majority of the tests in ldmx-sw and there are a few common objects that I think we can make available for everyone. I will make these additions header-only so they don't affect the BUILD_TESTS setting.

Class to run a config and make sure it doesn't error out
Class that "provides" certain event objects for downstream processors to use
Class that "checks" certain event objects created by upstream producers

~~I would also like to run any python scripts in the test directory of a module within Catch. This will unify our testing approach into one command, making it easier to maintain.~~
Add running of python scripts in the test directory to ctest is a cmake issue.

Update skim rules to enable use of an OR for triggering

It would be very nice to be able to have several trigger processors be able to chime in on whether to keep an event.

My immediate use case is for skims of overlay samples, but this will likely come in handy when we start to do trigger emulation for real.

The overall plan for the overlay samples is to produce them, store the full deal in Lund, and then run tskims to be stored at (this is the only way to fit a full 1e14 EOT sample at SLAC). Here's what I would like to do for trigger skimming: I want to run a trigger on the no-pileup collections, and one on the with-pileup collections, and keep the event if either of them fire. This is useful for trigger studies (especially, seeing what events we miss out on in a pileup environment) without having to keep all events. (With this scheme one could also consider a random/min bias trigger to keep a few extra events, but not all, for trigger studies. But that's simple enough to code up later.)

Proposed solution
Currently each processor can set a storage hint to keep or drop an event, and in the end, a majority vote is used to decide. But, there is some granularity to the storage hints: hint_shouldKeep or hint_mustKeep, which are currently not treated differently, as far as I can see: see implementation in StorageControl.cxx here. I propose taking the hint_mustKeep literally.

Isolate Framework into a Dependency

Edit: I'm updating the description since the target of this issue has evolved. I've left the original issue description below.

Now that #73 alleviates the dictionary generation annoyances, the main motivation for moving Framework out of ldmx-sw is to make it easier to do two things:

Strictly test its various behaviors
Use it in ldmx-sw-like repos (for example ldmx-analysis)

The https://github.com/LDMX-Software/fire repo achieves these goals while also doing a pretty major refactor of various components of Framework that help clean up the code and make various points of usage less error prone. I think it would be natural to adopt a lot of these refactors while avoiding the HDF5 transition since many of these refactors are helpful on their own. Below, I've listed the more major refactors I'm thinking of along with a description and motivation.

Remove separation between Producers and Analyzers: we already have the input data very well protected since fire will never edit a file in place. This makes Analyzers irrelevant for their const reference to the event bus and so they are only helpful in name. Removing the separation would help make the code easier to maintain and easier to explain.
Remove the configure function in favor of passing parameters to the constructor: Currently, we call configure immediately after construction. We can swap configure for an attach(Process&) method that cannot be overloaded (via final). This removes user access from the Process handle and emphasizes that configuration can (and should) happen before any processing occurs.
Add in templated factory: This would be used for the processors and conditions themselves but then also would be available for downstream users like SimCore.
Generally more use of libraries and namespaces: Keeps the code organized and helps improve the organization of the documentation. config for python configuration, exception for exceptions, logging for logging, factory for the factory, io for reading and writing data, version for holding the version information, etc...
Modernize python module: test it alone and make sure it is helpfully documented and organized. This is also related to #44
Generate own documentation: using doxygen as well as other tools
Refactor python imbedding: no real reason to have an object, run the script and produce the parameters tree within a function and return the parameters tree to be passed to the constructor of Process. This is also related to #44 since I could forsee a solution that binds python in such a way that we don't need to imbed it anymore. We simply run python config.py and config.py constructors the Process and processors as well as calls p.run() at some point.

Currently, the Framework module requires that it is linked to the event object dictionary libraries at compile time. This behavior allows for Framework to use ROOT I/O on the objects in the dictionary libraries.

Why bad?

This is annoying for several reasons.

Any changes to event bus objects trigger a re-build of Framework because Framework is a dependency.
Re-generation of the ROOT dictionaries during the cmake step trigger a re-build of Framework
This limits the applicability of Framework to other experiments

Solution

We "just" need to load ROOT dictionaries before attempting to use ROOT I/O. We already have a library loading mechanism that occurs during the configure step (before any I/O), so that is the first place I will look. After that, I can start de-coupling Framework compilation from ldmx-sw. In the long-ish term, this would allow us to compile Framework into the development container as "just" another dependency. This would slightly speed up the compiling of ldmx-sw and open the door to using Framework more freely in other experiment softwares.

log different event counts in event header

For EoT estimations, the total number of electrons "thrown" (i.e. events started) is important and currently, there is no storage of that number. Currently, we are basically limited to avoiding use of the maxTriesPerEvent parameter when we wish to estimate the EoT of a generated sample.

We can avoid this limitation by storing some extra integers in the run and event headers. I'm thinking of storing the total counts of number of events thrown and number events processed in the run header and keeping event indices (thrown and accepted) in the event header.

Conditions::getConditionPtr assigns a nullptr to a reference (UB)

Describe the bug

When working on LDMX-Software/ldmx-sw#1043, I ran LDMX-sw with UB-sanitizers enabled which found a case of us assigning a nullptr to a reference. This happens if you call Conditions::getConditionsPtr before the process object has set the EventHeader member (i.e. before the loop over the number of events in Process::run()).

To reproduce

Compile LDMX-sw with -fsanitize=undefined and run a configuration that uses the simulator producer.

UBSan report

Framework/src/Framework/Conditions.cxx:56:63: runtime error: reference binding to null pointer of type 'const struct EventHeader'

Details of the issue

What I found (see additional context below for how) was, in this case, this happens when the simulator tries to get the RandomNumberSeedService condition during Process::NewRun. It happens in the following way

void Simulator::onNewRun(const ldmx::RunHeader&) {
  const framework::RandomNumberSeedService& rseed =
      getCondition<framework::RandomNumberSeedService>(
          framework::RandomNumberSeedService::CONDITIONS_OBJECT_NAME);

after a couple of layers of indirection, this will translate into a call to

const ConditionsObject* Conditions::getConditionPtr(const std::string& condition_name);

which is where we get our illegal assignment

const ldmx::EventHeader& context = *(process_.getEventHeader());

The possible paths where this variable could be used are

If the condition is not cached, we will call

std::pair<const ConditionsObject*, ConditionsIOV> cond =
    copptr->second->getCondition(context);

If the conditions is cached but isn’t valid any more

// now ask for a new one
std::pair<const ConditionsObject*, ConditionsIOV> cond =
    cacheptr->second.provider->getCondition(context);

So, for the RandomNumberSeedService, this will be triggered on the first invocation and then never again since it doesn’t get invalidated.

Desired behaviour

Looking briefly at the corresponding getCondition function for RandomNumberSeedService

std::pair<const ConditionsObject*, ConditionsIOV>
RandomNumberSeedService::getCondition(const ldmx::EventHeader& context) {
  if (!initialized_) {
    if (seedMode_ == SEED_RUN) {
      masterSeed_ = context.getRun();
    }
    initialized_ = true;
  }
  return std::pair<const ConditionsObject*, ConditionsIOV>(
      this, ConditionsIOV(true, true));
}

we can see that as long as initialized_ = true=, we will never touch the invalid variable. However, this is the kind of thing that could change during development and cause some really nasty issues to deal with later. The core issue is that we are assigning the result of getEventHeader() to a reference without checking that the resulting pointer isn’t null, which won’t make sense if eventHeader_ ==nullptr. I’m not familiar enough with the details to be able to tell what you would want to do in such a situation though.

Configuration file

from LDMX.Framework import ldmxcfg
from LDMX.SimCore import generators
from LDMX.SimCore import simulator

process = ldmxcfg.Process("process")
process.maxEvents = 1
simulator = simulator.simulator("simulator")
simulator.setDetector('ldmx-det-v13')

gun = generators.gun('particle_gun')
gun.particle  = 'e-'
gun.energy    = 0.5
gun.position  = [0., 0., 0.]
gun.direction = [0., 0., 1.]

simulator.generators=[gun]

import LDMX.Hcal.HcalGeometry
import LDMX.Ecal.EcalGeometry

process.sequence = [simulator]
process.outputFiles = ['output.root']

Environment

The container used is a local container that is built from LDMX-Software/docker@13eaf9a with two additional packages added: gdb and libasan4-dbg. Not entirely sure why the ldmx config output seems to choke though

LDMX base directory: /home/einarelen/.local/ldmx/src
uname: Linux friede 5.15.10-200.fc35.x86_64 LDMX-Software/ldmx-sw#1 SMP Fri Dec 17 14:46:39 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
OSTYPE: linux-gnu
Bash version: 5.1.8(1)-release
Display Port:
Container Mounts: /home/einarelen/.local/ldmx/src
Docker Version: Docker version 20.10.12, build e91ed57
Docker Tag: ldmx/local:gdb
Template parsing error: template: :1:2: executing "" at <index .RepoDigests 0>: error calling index: reflect: slice index out of range
    Digest:

Additional context: GDB Session

This is the GDB session and corresponding output I ran to find out what was happening.

ldmx gdb fire
break fire.cxx:111
run test.py # some test configuration that uses a simulator
# In main.cxx
# Get the address
set $address = &(p->eventHeader_)
# Set up a read/write watchpoint
# This will break whenever the eventHeader_ variable is read or written to
awatch *(const ldmx::EventHeader**) $address
continue

First break

# Hardware access (read/write) watchpoint 2: *(const ldmx::EventHeader**) $address
# Value = (const ldmx::EventHeader *) 0x0
# 0x00007fd99545629e in framework::Process::getRunNumber (this=0x61600006c380) at Framework/src/Framework/Process.cxx:346
# 346	  return (eventHeader_) ? (eventHeader_->getRun()) : (runForGeneration_);

OK, this checks if eventHeader_ is set. So no issue here.

continue
# Hardware access (read/write) watchpoint 2: *(const ldmx::EventHeader**) $address

# Value = (const ldmx::EventHeader *) 0x0
# 0x00007fd995342489 in framework::Process::getEventHeader (this=0x61600006c380) at Framework/include/Framework/Process.h:67
# 67	  const ldmx::EventHeader *getEventHeader() const { return eventHeader_; }

This is the read of the nullptr that causes the issue later, which we can see with a backtrace

backtrace
#0  0x00007fd995342489 in framework::Process::getEventHeader (this=0x61600006c380) at Framework/include/Framework/Process.h:67
LDMX-Software/ldmx-sw#1  0x00007fd995332988 in framework::Conditions::getConditionPtr (this=0x61600006c458, condition_name="RandomNumberSeedService")
    at Framework/src/Framework/Conditions.cxx:56
LDMX-Software/ldmx-sw#2  0x00007fd9883ed383 in framework::Conditions::getCondition<framework::RandomNumberSeedService> (this=0x61600006c458, condition_name="RandomNumberSeedService")
    at Framework/include/Framework/Conditions.h:84
LDMX-Software/ldmx-sw#3  0x00007fd9883eadc8 in framework::EventProcessor::getCondition<framework::RandomNumberSeedService> (this=0x6120000955c0, condition_name="RandomNumberSeedService")
    at Framework/include/Framework/EventProcessor.h:142
LDMX-Software/ldmx-sw#4  0x00007fd9883e01fd in simcore::Simulator::onNewRun (this=0x6120000955c0) at SimCore/src/SimCore/Simulator.cxx:276
LDMX-Software/ldmx-sw#5  0x00007fd995457b08 in framework::Process::newRun (this=0x61600006c380, header=...) at Framework/src/Framework/Process.cxx:386
LDMX-Software/ldmx-sw#6  0x00007fd995450376 in framework::Process::run (this=0x61600006c380) at Framework/src/Framework/Process.cxx:165
LDMX-Software/ldmx-sw#7  0x000055cb1f0607a4 in main (argc=2, argv=0x7fff10766068) at Framework/src/Framework/fire.cxx:113

We continue

continue
# Framework/src/Framework/Conditions.cxx:56:63: runtime error: reference binding to null pointer of type 'const struct EventHeader'

# Hardware access (read/write) watchpoint 2: *(const ldmx::EventHeader**) $address

# Old value = (const ldmx::EventHeader *) 0x0
# New value = (const ldmx::EventHeader *) 0x7fff10765870
# framework::Process::run (this=0x61600006c380) at Framework/src/Framework/Process.cxx:177
# 177	      numTries++;

Ok, the pointer will be set from here on out so any future reads should be good (assuming that nobody decides to set it to nullptr later).

Clean Up Event class

The Event bus is taped together rather weakly and we should clean it up.

The following line is the main offender, in order for the passengers map to figure out the type of the object, we have to copy the object fully from the TTree to the map.

passengers_[branchName] = *((T *)(branch->GetObject())); //this will fail if wrong type is passed

We don't know how to fix up this copy right now, but we can start by hiding it within a Handle templated class that holds the event bus passenger in it. This handle could be a replacement for std::variant that would hold the class as well as interface with the TTree in a more helpful way.

Enable the use of a dd4hep detector within a processor

Collection existence check fails if one collection name is a substring in another

Sometimes when I develop things I name collections with different suffixes to separate them. I have found that if I name them in the style myCollection and myCollectionTwo, then only the second collection is found when I use event.exists(inputCol_, inputPassName_).

I think this is because 1. the exists() method returns true only if there is exactly one collection found in by searchProducts() and 2. searchProducts() seems to be content if it finds a match, and doesn't care if it's a whole string match or substring.

I suppose this boils down to the behaviour of regcomp. Not sure how we want to solve this. Full word match would be my preference. The logic of 1. is probably sound but it could be helpful to spit out a message if there is more than one match. I know this already happens if there are several passes and no pass name was specified.

To reproduce, just run two instances of a producer of your choice with the above naming scheme, followed by two producers/analyzers using the produced collections. My example is

check out the TrigScint/lk-testbeamDev branch
run ldmx fire ${LDMX_BASE}/ldmx-sw/TrigScint/exampleConfigs/runTSsimAndReco.py
and then ldmx fire ${LDMX_BASE}/ldmx-sw/TrigScint/exampleConfigs/runClusterAna.py [output file] "sim" after enabling both the 2-hit and 3-hit cluster analysis.

Parameter data dump doesn't work anymore because of new python classes

I have a fix for this that I will implement and test shortly.

Rewrite CMake file to make use of modern CMake.

Basically, get rid of the use module.

fire does not tolerate dots in config file name

In order to check for the config script, fire looks at its first argument and checks if it has a .py extension. The method for checking this extension is eager in the sense that it considers everything after the first . character the extension. This is a lame restriction and can be fixed with some simple tweaking of how the extension is found. The two places referencing the extension are below - I think the second one is the culprit.

Framework/src/Framework/fire.cxx

Lines 61 to 73 in 5ddc20e

 int ptrpy = 1; 

 for (ptrpy = 1; ptrpy < argc; ptrpy++) { 

 if (strstr(argv[ptrpy], ".py")) break; 

 } 

 if (ptrpy == argc) { 

 printUsage(); 

 std::cout << " ** No python configuration script provided (must end in " 

 "'.py'). ** " 

 << std::endl; 

 return 1; 

 }

Framework/src/Framework/ConfigurePython.cxx

Lines 222 to 226 in 5ddc20e

 std::string cmd = pythonScript; 

 if (pythonScript.rfind("/") != std::string::npos) { 

 cmd = pythonScript.substr(pythonScript.rfind("/") + 1); 

 } 

 cmd = cmd.substr(0, cmd.find(".py"));

format the code

Just putting this here so I don't forget to do it before making the release.

Overlay event number randomization doesn't work

Turns out, even if the event start number is randomized, and a random number is reported, the actual ientry_ isn't shifted.

This, together with the bug in beam electron position randomization, makes all overlay beam electron hits end up on top of each other, spatially (especially far upstream where the properly randomized physics processes haven't started making a difference).

I have figured out where this happens and have a fix.

Remove try-catch inside of Process::run

I want to do this so fire will return a non-zero exit status when there is an error. This is desirable for our automatic testing which currently does not register as a failure when there is an ldmx-internal error thrown (and caught).

This will also generally clean up the Process::run method.

LDMX Exceptions could have better stacktrace printouts

I would like to add better stacktrace printouts for LDMX Exceptions.

Re-enable LCIO support

To do this, the following changes will be made:

EventFile will become an interface
All existing code in EventFile will move to RootEvenFile
An LcioEventFile will be added to handle LCIO I/O
An EventFile factory will be used to create the required EventFile class at run time.

This should have no impact on how the code is run.

@bryngemark FYI since you are modifying EventFile for the OverlayProducer.

Allow Python upgrade

This is optional or at least a far-off goal.

While investigating why the tests were failing during the cleanup PR LDMX-Software/docker#35 in the docker repo, I discovered that upgrading from Python 3.6 to 3.8 (indirectly by moving from Ubuntu 18.04 to 20.04) meant that our tests shared a single Python kernel. This meant that there was more than one Process object being created and therefore we would throw the "non-unique Process" exception.

Potential Solutions

(in order of complexity)

Remove non-unique Process exception and just always use the latest process defined.
Update kernel-closing procedure to align with Python 3.8 and actually close the kernel during tests
Figure out how to get the tests to behave like a series of independent programs, might involve moving away from Catch2

Auto Python Bindings

@omar-moreno and I have chatted about this idea on-and-off and I am really excited about it. I've done some surface-level research and just wanted to open an issue to keep track of my notes.

Goal

The long term goal would be to get rid of ConfigurePython and fire altogether. We would instead call Process::run directly from a script run inside python after doing all the necessary configuration. i.e. instead of having a "configuration script", we would have a running script that is really similar.

# code snippet

from LDMX.Framework import Process
p = Process('run')

from LDMX.SimCore import simulator
sim = simulator.simulator('test')
sim.setDetector('ldmx-dev-v12')
# other configurations and calls to Cpp functions

# attach processors same as before? (hopefully)
p.sequence = [ sim ]

# pause to make sure config is correct
p.pause()

# actually run the processing pass
p.run()

# could do some post-processing python nonsense

Another goal that would be awesome if we can get it to work is to have a Python parent class for both the Cpp pythonizations and potentially new Python processors. i.e. Something like the following

from LDMX.Framework import Process
p = Process('run')

from LDMX.Framework import Analyzer

class MyAnalyzer(Analyzer) :
    def analyze(self, event) : 
        # event is our event bus
        # do normal python analysis nonsense

# attach python analyzer alongside Cpp processors
p.sequence = [ sim, MyAnalyzer() ]
p.run()

Both of these would be run through python instead of fire:

ldmx python3 run.py

Tools

Omar has metioned Boost.Python
I have looked into cppyy which is used by ROOT.

Both have their pros and cons from my research.

Note	Boost.Python	cppyy
Changes to C++	required	optional
CMake Interface	available	available
Inheritance Handling	available	automatic
Pre-Compiled Python Module	default	unavailable?

Long story short. It seems like Boost.Python would be the way to go if we were rebuilding from the ground-up. That way, the python module would be pre-compiled (i.e. faster) and we would have more control over its behavior. However, I am not interested in rebuilding from the ground up and therefore I am interested in using cppyy to "attach" our C++ objects to pythonic ones similar to how ROOT does it (versions > 6.18ish).

Plan

This is a drastic change to the Framework code-base, so I think this would necessarily be a long way off. Since this is also a big change in terms of user-interaction, we would need to be patient with merging anything like this in and potentially make a release separating our current method from this more pythonic one. For now, I am just collecting notes and links and maybe dipping my toe into the coding pool.

Set Timestamp and Run Number from Producer

For reading in raw data, we will need to be able to set various event and run parameters while decoding the data from raw files.

Remove dependency on Event

Currently, Framework makes use of a couple of classes defined in the ldmx-sw/Event module. These include

Event/EventDef.h
Event/RunHeader.h
Event/EventHeader.h

Some of these can be made generic enough that they can be moved to the Framework module. However, removing the dependency on the EventBusPassenger is trickier. Whatever the case, for this to compile on it's own, the dependency on these classes will need to be removed.

Repr nullptr-check can never trigger

This minor bug was caught by clang's warnings.

Framework/src/Framework/ConfigurePython.cxx

Lines 49 to 55 in ca2ee0d

 std::string repr(PyObject* obj) { 

 PyObject* py_repr = PyObject_Repr(obj); 

 if (repr == nullptr) return ""; 

 std::string str = getPyString(py_repr); 

 Py_XDECREF(py_repr); 

 return str; 

 }

We accidentally try to compare the repr function rather than the py_repr pointer, which ends up comparing with the address of the function which can never be null...

more thorough Framework testing

This is linked to #48 but it is not required. I could imagine a testing solution which clones something link framework-testbench into another directory and includes the current Framework as a symlink or copy.

I think more thorough testing would be helpful to make sure that any future features introduced do not also introduce a regression or bug in another area of Framework (most recently seen with #93 which was unintentionally introduced by #78 ). This testing of Framework would also aid in providing some examples of how to test downstream processors.

Tests

This is just off-the-top-of-my-head. Many of these tests should also be combined with other tests in the same running mode to make sure (for example) performance logging and writing a histogram file works as expected during the same run. I wrote many similar tests for the HDF5-based framework https://github.com/LDMX-Software/fire/tree/trunk/test which might be a helpful reference. Separating the C++-only tests from running a separate full module is helpful for granularity.

Unit Tests
- conditions system
- storage control
- drop keep rules
- event add/get
- config (able to run python and get parameters)
Production Mode
- a histogram file
- performance logging
- reco processor depending on earlier products
Recon Mode
- a histogram file
- performance logging
- processor that depends on products created during that run
- processor requiring products within input file
- one input file to one output file
- multiple input files to one output file
- multiple input files to multiple output files (does anyone use this?)
Exceptions
- Configuration
  - config DNE
  - python syntax or running error
  - type deduction issues
- Running
  - cleanly handle, print, and close exceptions from all callbacks
  - missing event product

enableLogging macro not fully specifying namespace

When using the enableLogging macro outside of the framework namespace, one can see

/Users/einarelen/ldmx/ldmx-sw/Framework/include/Framework/Logger.h:137:11: error: 'logging' does not name a type; did you mean 'long'?
  137 |   mutable logging::logger theLog_{logging::makeLogger(name)};
      |           ^~~~~~~
/Users/einarelen/ldmx/ldmx-sw/Framework/include/Framework/Logger.h:137:11: note: in definition of macro 'enableLogging'
  137 |   mutable logging::logger theLog_{logging::makeLogger(name)};
      |           ^~~~~~~

This can be resolved simply by updating the macro to use a fully-specified namespace.

Rudimentary performance monitoring

It would be useful for us to have some basic way of carrying out some basic performance monitoring of different processors. Ideally, you would want to see a distribution of the elapsed time for each processor for each event. The only real place to do something like this would be in the process directly. Start a clock before each processor runs and stop it once it is done.

You have to be careful when trying to interpret data like this, see e.g. Emery Berger's CPPCon plenary, especially if you try to AB compare on a busy system, but it would still be a useful tool to have.

These measurements would have to be recorded somewhere, after discussion in the software developer's meeting on May 31st we probably will go with including it as part of the event file if enabled. We talked about having this as a part of the event header. The easiest way to add that would be yet another std::map but I'm a bit hesitant about adding a linked datastructure at the core of a performance measurement tool. If you only ever worried about processing without input files, you could just use an array with elements ordered by processor order but that wouldn't work in an obvious manner when reading events. Although since this is a performance measurement and not really a part of the actual data, it might be ok to overwrite any existing measurements in the header from the input file?

Alternatively, you could make a collection for holding these measurements. I'm not entirely sure about how that would work, but it might be worth looking into.

Remove dependency on EventConstant

The only place that EventConstant is used is to set the tree name and run header collection name. These are parameters that should be set via the configuration instead of being hard coded in a class.

Post error message after stacktrace

When an exception is thrown with the exception macro and caught by the framework, we first output the error message and then a stacktrace. Most of the time, the user is more interested in the error message than the stacktrace so ideally it should be the first thing they see. Currently, it is often hidden behind a stack trace which at least has 6 or 7 frames in it. This is especially important for beginners.

Since there have been issues with the stacktrace code crashing, it will still be useful to have the error message printed before the stack trace as well. I think we can afford that overhead.

Stop supporting python2

Time to move on ...

Move DummyAnalyzer to ldmx-analysis.

Building DummyAnalyzer requires CalorimeterHit which lives in the Event module. Building the Framework shouldn't depend on simulation classes so I'm going to move this class to the ldmx-analysis repo.

	int ptrpy = 1;
	for (ptrpy = 1; ptrpy < argc; ptrpy++) {
	if (strstr(argv[ptrpy], ".py")) break;
	}

	if (ptrpy == argc) {
	printUsage();
	std::cout << " ** No python configuration script provided (must end in "
	"'.py'). ** "
	<< std::endl;
	return 1;
	}

	std::string cmd = pythonScript;
	if (pythonScript.rfind("/") != std::string::npos) {
	cmd = pythonScript.substr(pythonScript.rfind("/") + 1);
	}
	cmd = cmd.substr(0, cmd.find(".py"));

	std::string repr(PyObject* obj) {
	PyObject* py_repr = PyObject_Repr(obj);
	if (repr == nullptr) return "";
	std::string str = getPyString(py_repr);
	Py_XDECREF(py_repr);
	return str;
	}

ldmx-software / framework Goto Github PK

framework's Introduction

⚠️ Archived ⚠️

Framework

framework's People

Contributors

Stargazers

Watchers

Forkers

framework's Issues

Hypothesis

Solution

Why bad?

Solution

Describe the bug

To reproduce

UBSan report

Details of the issue

Desired behaviour

Configuration file

Environment

Additional context: GDB Session

Potential Solutions

Goal

Tools

Plan

Tests

Recommend Projects

Recommend Topics

Recommend Org