Describe the bug
When working on LDMX-Software/ldmx-sw#1043, I ran LDMX-sw with UB-sanitizers enabled which found a case of us assigning a nullptr
to a reference. This happens if you call Conditions::getConditionsPtr
before the process object has set the EventHeader member (i.e. before the loop over the number of events in Process::run()
).
To reproduce
Compile LDMX-sw with -fsanitize=undefined
and run a configuration that uses the simulator producer.
UBSan report
Framework/src/Framework/Conditions.cxx:56:63: runtime error: reference binding to null pointer of type 'const struct EventHeader'
Details of the issue
What I found (see additional context below for how) was, in this case, this happens when the simulator tries to get the RandomNumberSeedService
condition during Process::NewRun
. It happens in the following way
void Simulator::onNewRun(const ldmx::RunHeader&) {
const framework::RandomNumberSeedService& rseed =
getCondition<framework::RandomNumberSeedService>(
framework::RandomNumberSeedService::CONDITIONS_OBJECT_NAME);
after a couple of layers of indirection, this will translate into a call to
const ConditionsObject* Conditions::getConditionPtr(const std::string& condition_name);
which is where we get our illegal assignment
const ldmx::EventHeader& context = *(process_.getEventHeader());
The possible paths where this variable could be used are
So, for the RandomNumberSeedService
, this will be triggered on the first invocation and then never again since it doesn’t get invalidated.
Desired behaviour
Looking briefly at the corresponding getCondition
function for RandomNumberSeedService
std::pair<const ConditionsObject*, ConditionsIOV>
RandomNumberSeedService::getCondition(const ldmx::EventHeader& context) {
if (!initialized_) {
if (seedMode_ == SEED_RUN) {
masterSeed_ = context.getRun();
}
initialized_ = true;
}
return std::pair<const ConditionsObject*, ConditionsIOV>(
this, ConditionsIOV(true, true));
}
we can see that as long as initialized_ =
true=, we will never touch the invalid variable. However, this is the kind of thing that could change during development and cause some really nasty issues to deal with later. The core issue is that we are assigning the result of getEventHeader()
to a reference without checking that the resulting pointer isn’t null, which won’t make sense if eventHeader_ ==nullptr
. I’m not familiar enough with the details to be able to tell what you would want to do in such a situation though.
Configuration file
from LDMX.Framework import ldmxcfg
from LDMX.SimCore import generators
from LDMX.SimCore import simulator
process = ldmxcfg.Process("process")
process.maxEvents = 1
simulator = simulator.simulator("simulator")
simulator.setDetector('ldmx-det-v13')
gun = generators.gun('particle_gun')
gun.particle = 'e-'
gun.energy = 0.5
gun.position = [0., 0., 0.]
gun.direction = [0., 0., 1.]
simulator.generators=[gun]
import LDMX.Hcal.HcalGeometry
import LDMX.Ecal.EcalGeometry
process.sequence = [simulator]
process.outputFiles = ['output.root']
Environment
The container used is a local container that is built from LDMX-Software/docker@13eaf9a with two additional packages added: gdb
and libasan4-dbg
. Not entirely sure why the ldmx config
output seems to choke though
LDMX base directory: /home/einarelen/.local/ldmx/src
uname: Linux friede 5.15.10-200.fc35.x86_64 LDMX-Software/ldmx-sw#1 SMP Fri Dec 17 14:46:39 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
OSTYPE: linux-gnu
Bash version: 5.1.8(1)-release
Display Port:
Container Mounts: /home/einarelen/.local/ldmx/src
Docker Version: Docker version 20.10.12, build e91ed57
Docker Tag: ldmx/local:gdb
Template parsing error: template: :1:2: executing "" at <index .RepoDigests 0>: error calling index: reflect: slice index out of range
Digest:
Additional context: GDB Session
This is the GDB session and corresponding output I ran to find out what was happening.
ldmx gdb fire
break fire.cxx:111
run test.py # some test configuration that uses a simulator
# In main.cxx
# Get the address
set $address = &(p->eventHeader_)
# Set up a read/write watchpoint
# This will break whenever the eventHeader_ variable is read or written to
awatch *(const ldmx::EventHeader**) $address
continue
First break
# Hardware access (read/write) watchpoint 2: *(const ldmx::EventHeader**) $address
# Value = (const ldmx::EventHeader *) 0x0
# 0x00007fd99545629e in framework::Process::getRunNumber (this=0x61600006c380) at Framework/src/Framework/Process.cxx:346
# 346 return (eventHeader_) ? (eventHeader_->getRun()) : (runForGeneration_);
OK, this checks if eventHeader_ is set. So no issue here.
continue
# Hardware access (read/write) watchpoint 2: *(const ldmx::EventHeader**) $address
# Value = (const ldmx::EventHeader *) 0x0
# 0x00007fd995342489 in framework::Process::getEventHeader (this=0x61600006c380) at Framework/include/Framework/Process.h:67
# 67 const ldmx::EventHeader *getEventHeader() const { return eventHeader_; }
This is the read of the nullptr that causes the issue later, which we can see with a backtrace
backtrace
#0 0x00007fd995342489 in framework::Process::getEventHeader (this=0x61600006c380) at Framework/include/Framework/Process.h:67
LDMX-Software/ldmx-sw#1 0x00007fd995332988 in framework::Conditions::getConditionPtr (this=0x61600006c458, condition_name="RandomNumberSeedService")
at Framework/src/Framework/Conditions.cxx:56
LDMX-Software/ldmx-sw#2 0x00007fd9883ed383 in framework::Conditions::getCondition<framework::RandomNumberSeedService> (this=0x61600006c458, condition_name="RandomNumberSeedService")
at Framework/include/Framework/Conditions.h:84
LDMX-Software/ldmx-sw#3 0x00007fd9883eadc8 in framework::EventProcessor::getCondition<framework::RandomNumberSeedService> (this=0x6120000955c0, condition_name="RandomNumberSeedService")
at Framework/include/Framework/EventProcessor.h:142
LDMX-Software/ldmx-sw#4 0x00007fd9883e01fd in simcore::Simulator::onNewRun (this=0x6120000955c0) at SimCore/src/SimCore/Simulator.cxx:276
LDMX-Software/ldmx-sw#5 0x00007fd995457b08 in framework::Process::newRun (this=0x61600006c380, header=...) at Framework/src/Framework/Process.cxx:386
LDMX-Software/ldmx-sw#6 0x00007fd995450376 in framework::Process::run (this=0x61600006c380) at Framework/src/Framework/Process.cxx:165
LDMX-Software/ldmx-sw#7 0x000055cb1f0607a4 in main (argc=2, argv=0x7fff10766068) at Framework/src/Framework/fire.cxx:113
We continue
continue
# Framework/src/Framework/Conditions.cxx:56:63: runtime error: reference binding to null pointer of type 'const struct EventHeader'
# Hardware access (read/write) watchpoint 2: *(const ldmx::EventHeader**) $address
# Old value = (const ldmx::EventHeader *) 0x0
# New value = (const ldmx::EventHeader *) 0x7fff10765870
# framework::Process::run (this=0x61600006c380) at Framework/src/Framework/Process.cxx:177
# 177 numTries++;
Ok, the pointer will be set from here on out so any future reads should be good (assuming that nobody decides to set it to nullptr
later).