slaclab / rogue
SLAC Python Based Hardware Abstraction & Data Acquisition System
Home Page: https://slaclab.github.io/rogue/
License: Other
I am looking through failure logs for Simons Observatory's SMuRF software, and am finding that it would be very helpful to have timestamps on more log messages. For instance, here is a sample of the logs from an old run:
ERROR:pyrogue.Variable.RemoteVariable.AMCc.FpgaTopLevel.AppTop.AppCore.SysgenCryo.Base[7].CryoChannels.etaI[28]:int too big to convert
Traceback (most recent call last):
File "/usr/local/src/rogue/python/pyrogue/_Variable.py", line 324, in set
self._block.set(self, value)
File "/usr/local/src/rogue/python/pyrogue/_Block.py", line 362, in set
ba = var._base.toBytes(value)
File "/usr/local/src/rogue/python/pyrogue/_Model.py", line 128, in toBytes
ba = value.to_bytes(byteCount(self.bitSize), self.endianness, signed=True)
OverflowError: int too big to convert
ERROR:pyrogue.Variable.RemoteVariable.AMCc.FpgaTopLevel.AppTop.AppCore.SysgenCryo.Base[7].CryoChannels.etaI[28]:Error setting value '-50008' to variable 'AMCc.FpgaTopLevel.AppTop.AppCore.SysgenCryo. Base[7].CryoChannels.etaI[28]' with type Int16. Exception=int too big to convert
1718215269.675174:pyrogue.epicsV3.Value: Error setting value from epics: smurf_server_s7:AMCc:FpgaTopLevel:AppTop:AppCore:SysgenCryo:Base[7]:CryoChannels:etaMagArray
/tmp/fw/rogue_MicrowaveMuxBpEthGen2_v1.1.0.zip/python/CryoDet/DspCoreLib/CryoDetCmbHcd/_SerialEtaScan.py:88: RuntimeWarning: divide by zero encountered in scalar divide
ERROR:pyrogue.PollQueue:'Memory Error for AMCc.FpgaTopLevel.AppTop.DaqMuxV2[1].DbgInputValid at address 0xb000002c Timeout waiting for register transaction message response'
ERROR:pyrogue.PollQueue:'Memory Error for AMCc.FpgaTopLevel.AmcCarrierCore.SwRssiServer[1].RetransmitCnt at address 0xa01104c Timeout waiting for register transaction message response'
ERROR:pyrogue.PollQueue:'Memory Error for AMCc.FpgaTopLevel.AppTop.DaqMuxV2[1].TriggerSwStatus at address 0xb0000004 Timeout waiting for register transaction message response'
ERROR:pyrogue.PollQueue:'Memory Error for AMCc.FpgaTopLevel.AmcCarrierCore.EM22xx.READ_TEMPERATURE_1 at address 0xd000234 Timeout waiting for register transaction message response'
ERROR:pyrogue.PollQueue:'Memory Error for AMCc.FpgaTopLevel.AmcCarrierCore.AxiVersion.UpTimeCnt at address 0x000008 Timeout waiting for register transaction message response'
ERROR:pyrogue.PollQueue:'Memory Error for AMCc.FpgaTopLevel.AppTop.AppTopJesd[0].JesdRx.RawData[5] at address 0xc0000154 Timeout waiting for register transaction message response'
There are some timestamps here, but the "Timeout waiting for register transaction message response" errors have none. Can we add timestamps to all rogue log messages?
The constructors of rogue::protocols::udp::Client and rogue::protocols::udp::Server spawn a runThread. Inside this thread, shared_from_this() is called, which returns a shared pointer to the constructed object.
If the object is already fully constructed and owned by a shared pointer, this works fine. However, if the spawned runThread tries to obtain a shared pointer before that has happened, shared_from_this() throws std::bad_weak_ptr.
Currently, the runThread of both Client and Server calls usleep(1000); to work around this problem. However, this is not always sufficient. The runThread should wait until the object has been created, i.e. until a shared pointer has been created from it.
For example, using a mutex:
//! Class creation
rpu::ServerPtr rpu::Server::create(uint16_t port, bool jumbo)
{
    rpu::ServerPtr r = std::make_shared<rpu::Server>(port, jumbo);
    r->constructed();
    return (r);
}

//! Set object is constructed
void rpu::Server::constructed()
{
    mutex_.unlock();
}

//! Run thread
void rpu::Server::runThread()
{
    ris::BufferPtr buff;
    ris::FramePtr frame;
    fd_set fds;
    int32_t res;
    struct timeval tout;
    struct sockaddr_in tmpAddr;
    uint32_t tmpLen;
    uint32_t avail;

    udpLog_->logThreadId();

    std::lock_guard<std::recursive_mutex> lk(mutex_); // wait for construction to end

    // Preallocate frame
    frame = ris::Pool::acceptReq(maxPayload(), false);

    while (threadEn_)
    {
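An alternative to gating the thread with a mutex is to not start it in the constructor at all, and instead start it from create() once std::make_shared has returned. The following is only a minimal sketch of that idea: Worker, start(), sawSelf() and demoDeferredStart() are hypothetical names for illustration and are not part of rogue.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical stand-in for rpu::Server / rpu::Client; not rogue code.
class Worker : public std::enable_shared_from_this<Worker> {
  public:
    // Factory: construct first, start the thread only after a shared_ptr owns the object.
    static std::shared_ptr<Worker> create() {
        auto w = std::make_shared<Worker>();
        w->start();
        return w;
    }

    Worker() = default;  // note: no thread is spawned here

    // Assumed contract: the owner calls stop() before releasing its last reference.
    ~Worker() { stop(); }

    void stop() {
        if (thread_.joinable()) thread_.join();
    }

    bool sawSelf() const { return sawSelf_.load(); }

  private:
    void start() { thread_ = std::thread(&Worker::runThread, this); }

    void runThread() {
        // create() called start() after make_shared returned, so the object
        // is already owned and shared_from_this() does not throw here.
        auto self = shared_from_this();
        sawSelf_.store(self.get() == this);
    }

    std::thread thread_;
    std::atomic<bool> sawSelf_{false};
};

// Drives one create/stop cycle and reports whether the thread obtained a valid pointer.
bool demoDeferredStart() {
    auto w = Worker::create();
    w->stop();
    return w->sawSelf();
}
```

With this layout no sleep or lock is needed, at the cost of requiring that objects are always built through create().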
Inside DataMap.cpp
#include <rogue/hardware/data/DataMap.h>
File DataMap.h is missing from the repo.
shared_from_this() may be called before the object and its shared_ptr are fully initialized. This throws a std::bad_weak_ptr exception and crashes the application.
Trace Server:
rpu::Server::runThread()
-- ris::Pool::acceptReq(maxPayload(),false)
---- allocBuffer(size,&frSize)
------ shared_from_this()
Trace Client:
rpu::Client::runThread()
-- ris::Pool::acceptReq(maxPayload(),false)
---- allocBuffer(size,&frSize)
------ shared_from_this()
Proposed fix:
auto weakPtr = weak_from_this();
auto sharedPtr = weakPtr.lock();
while (!sharedPtr) {
    std::this_thread::yield();
    sharedPtr = weakPtr.lock();
}
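To see why polling weak_from_this() is safe where shared_from_this() is not, the sketch below probes both calls during and after construction. It assumes C++17 (where weak_from_this() exists and shared_from_this() is specified to throw on an unowned object); Probe and demoWeakFromThis() are hypothetical names, not rogue code.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class used only to show when the internal weak pointer becomes valid.
struct Probe : std::enable_shared_from_this<Probe> {
    bool ownedInCtor = false;   // did weak_from_this().lock() succeed in the constructor?
    bool threwInCtor = false;   // did shared_from_this() throw in the constructor?

    Probe() {
        // No shared_ptr owns *this yet, so lock() yields an empty pointer.
        ownedInCtor = static_cast<bool>(weak_from_this().lock());
        try {
            auto self = shared_from_this();
        } catch (const std::bad_weak_ptr&) {
            threwInCtor = true;
        }
    }

    bool ownedNow() { return static_cast<bool>(weak_from_this().lock()); }
};

// Returns true when the object is unowned during construction and owned afterwards.
bool demoWeakFromThis() {
    auto p = std::make_shared<Probe>();
    return !p->ownedInCtor && p->threwInCtor && p->ownedNow();
}
```

The spin loop in the proposed fix relies on exactly this transition: lock() returns an empty pointer until make_shared completes, and a valid one afterwards.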
Buffers that are stored on the buffer queue (dataQ_) are never freed:
https://github.com/slaclab/rogue/blob/main/src/rogue/interfaces/stream/Pool.cpp#L87
Proposed fix:
ris::Pool::~Pool()
{
    while (!dataQ_.empty())
    {
        free(dataQ_.front());
        dataQ_.pop();
    }
}
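The draining destructor can be exercised in isolation with a small stand-in class. This is a sketch under assumptions: MiniPool, stash() and the freed counter are hypothetical and exist only to make the cleanup observable, and the queue is assumed to hold raw pointers obtained from malloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <queue>

// Hypothetical miniature of a buffer pool; not rogue's ris::Pool.
class MiniPool {
  public:
    explicit MiniPool(std::size_t* freedCount) : freedCount_(freedCount) {}

    // Drain the queue so pooled buffers are released with the pool.
    ~MiniPool() {
        while (!dataQ_.empty()) {
            std::free(dataQ_.front());
            dataQ_.pop();
            ++(*freedCount_);
        }
    }

    // Copying would lead to double frees of the raw pointers.
    MiniPool(const MiniPool&) = delete;
    MiniPool& operator=(const MiniPool&) = delete;

    // Allocate a buffer and park it on the queue, as a returned buffer would be.
    void stash(std::size_t size) {
        auto* data = static_cast<std::uint8_t*>(std::malloc(size));
        if (data != nullptr) dataQ_.push(data);
    }

  private:
    std::queue<std::uint8_t*> dataQ_;
    std::size_t* freedCount_;
};

// Parks three buffers, destroys the pool, and returns how many were freed.
std::size_t demoPoolCleanup() {
    std::size_t freed = 0;
    {
        MiniPool pool(&freed);
        pool.stash(64);
        pool.stash(128);
        pool.stash(256);
    }
    return freed;
}
```

Storing owning smart pointers in the queue instead of raw pointers would remove the need for a hand-written destructor, though that would be a larger change than the one proposed.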
In rpr::Controller::start a thread is created using new, but never freed:
https://github.com/slaclab/rogue/blob/main/src/rogue/protocols/rssi/Controller.cpp#L146
Proposed fix:
Add delete thread_ to rpr::Controller::stop, or better, avoid creating a thread using new.
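The second option, avoiding new altogether, could look roughly like the sketch below, which holds the std::thread by value so there is nothing to delete. MiniController, running() and demoStartStop() are hypothetical names for illustration; this is not the rpr::Controller implementation.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical controller that owns its worker thread by value.
class MiniController {
  public:
    ~MiniController() { stop(); }

    void start() {
        if (thread_.joinable()) return;  // already running
        threadEn_ = true;
        thread_ = std::thread(&MiniController::runThread, this);
    }

    void stop() {
        threadEn_ = false;
        if (thread_.joinable()) thread_.join();  // no delete needed
    }

    bool running() const { return thread_.joinable(); }

  private:
    void runThread() {
        while (threadEn_) std::this_thread::yield();
    }

    std::atomic<bool> threadEn_{false};
    std::thread thread_;
};

// Starts and stops the controller once; true if it ran and then shut down cleanly.
bool demoStartStop() {
    MiniController c;
    c.start();
    bool wasRunning = c.running();
    c.stop();
    return wasRunning && !c.running();
}
```

A std::unique_ptr<std::thread> member would be a smaller change if the pointer-based interface has to stay, since it also frees the thread object automatically.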