spinnakermanchester / spinnaker_tools



SpiNNaker API, sark, sc&mp, bmp firmware and build tools

License: Apache License 2.0

Languages: Makefile 1.94%, C 79.55%, Assembly 1.85%, Shell 0.44%, C++ 0.08%, Perl 16.14%

Topics: c, spinnaker, build-tools, sark

spinnaker_tools's Introduction

SpiNNaker Low-Level Software Tools


Installation and Setup

Edit the setup file so that it points to your installations of ARM and/or GNU software development tools. You only need one of these two tools installed to build standard applications.

Search for the string "EDIT HERE" to find the two edit points.

The version of the GNU tools that was used to test this release is
```
gcc 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599]
```
This (and more recent versions) can be downloaded from this web site.

The version of the ARM tools that was used to test this release is
```
ARM C/C++ Compiler, 4.1 [Build 894]
```
Source the setup file from the directory in which it lives:
```
source ./setup
```
Build SARK and spin1_api with either the ARM or the GNU tools (you will probably only want to do one of these):
```
make GNU=0	        # If you are using ARM tools
make        		# If you are using GNU tools
```

Use of the Tools

There is a generic makefile, make/app.make, that works for both C and C++ and is used to build simple applications. The preferred way to use it is to create a Makefile that looks something like this:
```
# A single-file C app called 'example.c' or
# a single-file C++ app called 'example.cpp'
APP := example
include $(SPINN_DIRS)/make/app.make
```
The following variables may be set at the top of your Makefile, in environment variables or as arguments to 'make' (default values are shown in parentheses):

APP - to select the default source file to build (required)

GNU - to choose ARM or GNU tools (GNU=1)

THUMB - to choose to generate ARM or Thumb code (THUMB=0)

DEBUG - to include debugging info in the ELF files (DEBUG=0)

API - whether to link with "spin1_api" (API=1)

GP - the prefix of your GNU binaries, if they are not named like "arm-none-eabi-gcc" (GP=arm-none-eabi)

Go into the "apps/hello" directory to build your first SpiNNaker app:
```
cd apps/hello
make             # GNU tools
make GNU=0       # ARM tools
```
This should make hello.aplx, an executable file that you can load into SpiNNaker.

Now start tubotron on your workstation to get output from the program as it runs:
```
tubotron &
```
Note that if you see this error message when you start tubotron
```
failed to create socket: Address already in use
```
it is usually because there is already a tubotron (or tubogrid) running and it needs exclusive use of a UDP port.
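Tubotron itself is a Perl-Tk program, so the following is not its code. It is a minimal C sketch of the socket behaviour behind that error message: once one process has bound a UDP port (17892 for Tubotron), a second bind to the same port fails with EADDRINUSE unless special socket options are set. The helper name open_udp_listener is hypothetical.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a UDP socket to `port` on all local interfaces, as a visualiser
 * listening on a fixed port (e.g. 17892) would.  Returns the socket
 * descriptor, or -errno on failure.  With port 0 the kernel chooses a
 * free port, which is reported through *bound_port if it is not NULL. */
int open_udp_listener(uint16_t port, uint16_t *bound_port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        return -errno;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    /* No SO_REUSEADDR/SO_REUSEPORT is set, so binding a port that is
     * already in use fails with EADDRINUSE. */
    if (bind(fd, (struct sockaddr *) &addr, sizeof addr) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }

    if (bound_port != NULL) {
        socklen_t len = sizeof addr;
        if (getsockname(fd, (struct sockaddr *) &addr, &len) == 0) {
            *bound_port = ntohs(addr.sin_port);
        }
    }
    return fd;
}
```

Calling open_udp_listener(17892, NULL) twice from two processes would give a valid descriptor to the first and -EADDRINUSE to the second, which is the situation the "Address already in use" message reports.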
Now start ybug to bootstrap your SpiNNaker system and load and run the hello application. You'll need to know the IP address of (one of) your boards to do this (here we'll use 192.168.240.253 as an example). If you aren't sure what state your board is in you should reset it by pressing the reset button.
```
ybug 192.168.240.253
```
You should see a start-up message and then a prompt. At the prompt, type the following:
```
boot
app_load hello.aplx . 1 16
ps
```
This loads the code into core 1 on a single SpiNNaker chip and assigns it an application ID of 16. You should see a "Hello world" message in the Tubotron window. The ps command displays the status of every core on the SpiNNaker chip.

The hello application also sends output to an internal I/O buffer on the chip. You can see this by using the iobuf command and telling it which core's buffer to display:
```
iobuf 1
```
Finally, to clear the I/O buffer and stop the application, use the "app_stop" command, which removes all applications with a given ID and frees any resources they have used. Exit "ybug" with "quit":
```
app_stop 16
ps
quit
```
Note: If you have a SpiNN-1, SpiNN-2 or SpiNN-3 board and want all the LEDs on those boards to function (not just the green LED0), then one of the following alternative boot commands may be used:
```
boot scamp.boot spin1.conf
boot scamp.boot spin2.conf
boot scamp.boot spin3.conf
```
Now build a more complex application (called "simple"!) which runs on 4 chips
```
cd ../simple
make
```
Start ybug again to load and run the code. There should be no need to run the boot command again. The app_load command can also load a program onto multiple chips: it has to be told which chips to use ("all" for every chip), which core (just core 1 again) and an application ID (17 this time).

Because this application runs on multiple cores, the startup of the cores needs to be synchronised. When a core reaches its synchronisation point (or barrier) it enters a wait state known as SYNC0. To allow the cores to proceed beyond the barrier a "signal" has to be sent to all cores to cause them to proceed. The app_sig command is used to do this, sending the signal sync0. In this example, the cores reach their barrier very quickly and so it is OK to send the signal immediately after the app_load. In other cases, a delay may be necessary.
```
ybug 192.168.240.253
app_load simple.aplx all 1 17
ps
app_sig all 17 sync0
ps
```
This program runs for around 10 seconds, flashing red LEDs as it goes (your board may not have red LEDs). Output from this example goes to the internal I/O buffer, and you can view it with the iobuf command. You can clean up with app_stop again and quit ybug:
```
iobuf 1
app_stop 17
ps
quit
```

Additional Resources

There are some more examples in the apps directory. Each of these has a xxx.ybug file that contains the ybug commands to run the application. It also contains, as comments, the commands you need to build the application and to start any visualisation programs that the application needs. These examples include:

hello - Hello World (as above)

simple - a contrived demo of the SpiNNaker API

data_abort - causes a data abort to demonstrate debugging

interrupt - demonstrates a bare-metal interrupt handler

ring - shows how to set up routing tables for core-to-core comms

random - random number generation and simple graphics

pt_demo - a path tracing program - quite complex - see its own README

heat_demo - a heat diffusion example (precompiled Linux visualiser)

gdb_test - shows how to use the GDB debugger with SpiNNaker. See the gdb-spin document in the docs directory for instructions.

There is documentation for SARK, ybug, gdb-spin and the Spin1 API in the docs directory.

SpiNNaker systems mostly communicate using UDP/IP with port numbers in the range 17890-17899. Specifically, 17892 is used for Tubotron and Tubogrid, 17893 is used by SpiNNaker (e.g., by ybug) and 17894 is often used by visualisers. If you have a firewall blocking any of these ports, you may encounter problems and it's worth checking the firewall before blaming anything else!

Similarly, only one visualiser (such as Tubotron) can use each port at any one time, so you may see a message saying that the port is in use if you start a visualiser while another one is already running on the same port.

Some visualisers (tubotron/tubogrid) use the Perl-Tk library, which may not be installed by default on your machine. It can be installed as follows:
```
sudo apt-get install perl-tk		# Ubuntu, etc
sudo yum     install perl-Tk		# Fedora, etc
```


spinnaker_tools's Issues

allow getting of router table entries as a block that can be dumped into sdram for future reading back

title says it all. But to add more detail:

Talked to Luis about this in the context of the data-in speed-up work. This requires reading the entire router table and storing it in SDRAM without changing anything. At a later point, the table needs to be reloaded into the router.

As there's no processing of the data from the speed-up point of view, it would be better not to have to read and store each entry manually. That said, fragmentation makes this a harder problem to solve, but I'm leaving this here as a note for the future.
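One way to cope with fragmentation is to store each in-use entry together with its original slot index, so that a restore reproduces the exact layout and therefore the entries' priority order. The minimal C sketch below shows that idea on a plain array. The entry layout mc_entry_t, the "route == 0 means unused" convention and the mc_table_save/mc_table_restore names are hypothetical and are not SARK's router API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical multicast router entry; route == 0 marks an unused slot. */
typedef struct {
    uint32_t key;
    uint32_t mask;
    uint32_t route;
} mc_entry_t;

/* One saved entry together with the table slot it came from. */
typedef struct {
    uint16_t index;
    mc_entry_t entry;
} saved_entry_t;

/* Pack the in-use entries of `table` (n slots) into `buf`, e.g. a block
 * of SDRAM.  Keeping each entry's slot index preserves the original,
 * possibly fragmented, layout.  Returns the number of entries saved. */
size_t mc_table_save(const mc_entry_t *table, size_t n, saved_entry_t *buf)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (table[i].route != 0) {
            buf[count].index = (uint16_t) i;
            buf[count].entry = table[i];
            count++;
        }
    }
    return count;
}

/* Clear all n slots, then put each saved entry back in its old slot. */
void mc_table_restore(mc_entry_t *table, size_t n,
                      const saved_entry_t *buf, size_t count)
{
    for (size_t i = 0; i < n; i++) {
        table[i] = (mc_entry_t) {0, 0, 0};
    }
    for (size_t i = 0; i < count; i++) {
        if (buf[i].index < n) {
            table[buf[i].index] = buf[i].entry;
        }
    }
}
```

Because only in-use slots are copied, the saved block is as small as the number of live entries, at the cost of two extra bytes (plus padding) per entry for the index.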

Add router diagnostic for "lost" local-sourced packets

One of the things that doesn't currently get counted in the router is packets that are sent by a local core but without routing entries. As these are not counted, it is possible for packets to simply "go missing"; default routed external packets will at least show up in the external multicast packets as I understand it. It could be useful to add to the default counters set up by SC&MP.

I believe that the settings of the counter should be "Local packet", "default routed" and "Multicast" (this seems to work when I tested it) - the hex value is 0x1ff75f1.

If it is considered an application issue, we can add it to our application instead, but it seemed like a general application diagnostic issue.

SCAMP: Don't throw away packets on send

When a core sends an SDP packet, it copies it to the SDP system RAM buffer and then notifies SCAMP. When SCAMP processes the notification, it first tries to allocate a local SDP buffer; if this fails, it throws away the SDP packet, but it still tells the core that it has processed it. It would be better to leave the message box busy; the core can then time out waiting for the message to be sent and choose for itself whether to throw it away or retry.

Turn on warnings and fix

We noticed that warnings are not enabled on the C code, and that enabling them results in some interesting things that might need to be fixed.

SCAMP: SDP messages are dropped silently if no resources available

When an SDP message arrives, scamp needs an 'sdp_msg_t' mailbox to hold it and an 'event_t' event holder to schedule the message for processing. There is a limited number of mailboxes and event holders, therefore a new SDP message may not find one or the other available.

Currently, scamp simply drops the SDP message silently:

unavailable mailboxes are dealt with in 'scamp-3.c' lines [322-328].
unavailable event holders are dealt with in 'scamp-3.c' lines [236-239].

It would be better to return an error code, such as 'RC_BUF'. The problem, obviously, is that there is no mailbox or event holder to use for the return message.

Remove code that can not be released under Apache 2.0

We plan to do the next release (soon we hope) under the Apache 2.0 license
See https://www.apache.org/licenses/LICENSE-2.0

The Apache 2.0 license is more permissive than the current GPL version 3 license.

Please remove any of your contributions you do not agree can be released under Apache 2.0

Or at least add a comment on this issue.

Otherwise we will assume agreement of all contributors.

SCAMP: neighbouring chip link checks need a better approach

To probe the state of its chip-to-chip links, scamp reads system controller register SC_CHIP_ID of every neighbouring chip [function init_link_en(), scamp-3.c]. This allows it to check if the neighbour is alive and if the link is behaving correctly. The returned value is ignored, only the correct completion of the read operation is checked.

if the neighbour is alive and well then the read succeeds and the link is enabled.
if the neighbour is truly dead then the read fails and the link is disabled.
if the neighbour is alive but blacklisted, then the read may succeed, leaving the link enabled.

The last case can be avoided by blacklisting both ends of the link. This is possible only if the two chips are on the same board. If the two chips are on different boards then an inconsistent link state can only be avoided with a better link check.

This issue complements issues:

#143
SpiNNakerManchester/SpiNNaker_hardware_tests#2

BMP: FPGA occasionally configured with wrong bit file

Due to a problem in the FPGA bit-file loading mechanism, an FPGA is occasionally configured with the wrong bit file. The bmp should check that bit-file loading has completed correctly, or retry. This is easily done with the spiNNlink bit files because the bmp can read an FPGA register that identifies the FPGA ID. A more general mechanism is required if SpiNNaker users create their own FPGA bit files.
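A minimal C sketch of the check-and-retry loop described above, assuming the BMP can read back an ID register after each load. The load and read-ID operations are passed in as hypothetical function pointers, and the fake FPGA is included only to exercise the loop. None of these names come from the bmp code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hooks: load the bit file into FPGA `fpga`, and read back
 * the register that identifies which bit file is running. */
typedef void (*fpga_load_fn)(int fpga, void *ctx);
typedef uint32_t (*fpga_read_id_fn)(int fpga, void *ctx);

/* Load, verify the ID register, and retry up to max_attempts times. */
bool fpga_load_verified(int fpga, uint32_t expected_id, int max_attempts,
                        fpga_load_fn load, fpga_read_id_fn read_id,
                        void *ctx)
{
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        load(fpga, ctx);
        if (read_id(fpga, ctx) == expected_id) {
            return true;
        }
    }
    return false;   /* still wrong: report it rather than carry on */
}

/* A fake FPGA for exercising the loop: it reports a wrong ID for the
 * first `bad_loads` loads and the expected one afterwards. */
typedef struct {
    int bad_loads;
    int loads;
    uint32_t good_id;
} fake_fpga_t;

static void fake_load(int fpga, void *ctx)
{
    (void) fpga;
    ((fake_fpga_t *) ctx)->loads++;
}

static uint32_t fake_read_id(int fpga, void *ctx)
{
    const fake_fpga_t *f = ctx;
    (void) fpga;
    return (f->loads > f->bad_loads) ? f->good_id : 0xFFFFFFFFu;
}
```

For user-supplied bit files without a known ID register, the same loop would need some other readback to compare against, which is the "more general mechanism" the issue asks for.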

reserved multicast packet key space

Although multicast packets are an application resource, the system also uses multicast packets for certain tasks, such as barrier synchronisation. The system reserves a segment of the multicast packet key space for itself, and this reserved segment should be documented. Application programmers should be aware of this reserved space when doing key allocation.
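Once the reserved segment is documented as a key/mask pair, an application could check its own allocations against it with the standard key/mask overlap test: two entries can match a common key unless their keys differ on a bit that both masks test. The reserved values below are placeholders, not the real system range, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder only: the real system-reserved key range is not given in
 * the issue, so these values are purely illustrative. */
#define EXAMPLE_RESERVED_KEY  0xFFFF0000u
#define EXAMPLE_RESERVED_MASK 0xFFFF0000u

/* True if at least one 32-bit key matches both (key1, mask1) and
 * (key2, mask2).  Keys are assumed normalised: (key & ~mask) == 0. */
bool key_mask_overlap(uint32_t key1, uint32_t mask1,
                      uint32_t key2, uint32_t mask2)
{
    /* Only bits tested by both masks can rule out a common key. */
    return ((key1 ^ key2) & mask1 & mask2) == 0;
}

/* True if an application key/mask pair avoids the reserved segment. */
bool app_key_is_safe(uint32_t key, uint32_t mask)
{
    return !key_mask_overlap(key, mask,
                             EXAMPLE_RESERVED_KEY, EXAMPLE_RESERVED_MASK);
}
```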

BMP: Serial flash reads from BMP occasionally fail

During tests, the BMP reads the blacklist from the serial flash. Occasionally, the list is read incorrectly, i.e., the serial flash returns 0xffffffff for all entries.

An immediate re-read returns the correct content.
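Since an immediate re-read returns the correct content, a simple mitigation is to treat an all-0xFFFFFFFF result as suspect and read again a bounded number of times. A minimal C sketch under that assumption follows. The read function is a hypothetical hook, and the fake flash exists only to exercise the loop. Note that a genuinely erased area also reads as all ones, so a false result is ambiguous.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*flash_read_fn)(uint32_t addr, uint32_t *buf, size_t words,
                              void *ctx);

/* True if every word looks like erased flash. */
static bool all_erased(const uint32_t *buf, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        if (buf[i] != 0xFFFFFFFFu) {
            return false;
        }
    }
    return true;
}

/* Read `words` words at `addr`; if they all read as 0xFFFFFFFF, re-read
 * up to `retries` more times.  Returns false if every attempt looked
 * erased, which may also mean the area really is empty. */
bool flash_read_checked(uint32_t addr, uint32_t *buf, size_t words,
                        int retries, flash_read_fn read_fn, void *ctx)
{
    for (int attempt = 0; attempt <= retries; attempt++) {
        read_fn(addr, buf, words, ctx);
        if (!all_erased(buf, words)) {
            return true;
        }
    }
    return false;
}

/* Fake flash for testing: the first `glitches` reads return all ones. */
typedef struct {
    int glitches;
    int reads;
    uint32_t value;
} fake_flash_t;

static void fake_read(uint32_t addr, uint32_t *buf, size_t words, void *ctx)
{
    fake_flash_t *f = ctx;
    (void) addr;
    f->reads++;
    for (size_t i = 0; i < words; i++) {
        buf[i] = (f->reads <= f->glitches) ? 0xFFFFFFFFu : f->value;
    }
}
```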

This may be related to the serial flash being shared between the BMP and SpiNNaker.

Split bmp version number from the rest of spinnaker_tools

The bmp code is not strongly tied to the rest of spinnaker_tools and it would be too much work to re-flash all SpiNN-5 boards with a new version of the bmp code that is just the result of changes to other parts of the tools.

This is even more complicated for boards that are not in Manchester.

SPIN1_API: spin1_flush_dma_queue function needed

Sort of self-evident what is wanted here: a function to dump any pending DMA transfers.

The PRAERIE I/O interface uses DMA to buffer incoming messages, but there is also a PRAERIE command: RESET, for the interface. If RESET is applied while there are pending DMAs this can create unexpected conditions, and existing TIDs become invalid. RESET should first flush the DMA queue then reinitialise the interface.

Similar functionality could be needed in any application where guaranteed receipt of every DMA transfer (triggered by input events) is less critical than ensuring the simulation doesn't grind to a halt.
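The requested function does not exist yet. A minimal C sketch of what flushing could mean for a ring-buffer queue of pending transfers follows: pending requests are discarded by moving the start index to the end index. The queue layout, its size and the dma_queue_* names are hypothetical and may not match spin1_api's internal DMA queue. On a real core this would also need to run with interrupts disabled, and it would not cancel a transfer that the DMA controller has already started.

```c
#include <stdbool.h>
#include <stdint.h>

#define DMA_QUEUE_SIZE 16u   /* hypothetical; holds up to 15 requests */

typedef struct {
    uint32_t tid;        /* transfer ID handed back to the caller */
    uint32_t sys_addr;   /* SDRAM address */
    uint32_t tcm_addr;   /* local (DTCM) address */
    uint32_t length;     /* bytes */
} dma_request_t;

/* Ring buffer of transfers not yet handed to the DMA controller. */
typedef struct {
    dma_request_t slot[DMA_QUEUE_SIZE];
    uint32_t start;      /* oldest pending request */
    uint32_t end;        /* next free slot */
} dma_queue_t;

uint32_t dma_queue_count(const dma_queue_t *q)
{
    return (q->end + DMA_QUEUE_SIZE - q->start) % DMA_QUEUE_SIZE;
}

bool dma_queue_push(dma_queue_t *q, dma_request_t r)
{
    uint32_t next = (q->end + 1u) % DMA_QUEUE_SIZE;
    if (next == q->start) {
        return false;    /* full: one slot is always kept empty */
    }
    q->slot[q->end] = r;
    q->end = next;
    return true;
}

/* Discard every queued request and return how many were dropped.
 * Their transfer IDs become invalid: no completion will ever arrive. */
uint32_t dma_queue_flush(dma_queue_t *q)
{
    uint32_t dropped = dma_queue_count(q);
    q->start = q->end;
    return dropped;
}
```

In the PRAERIE case described above, RESET would call the flush first and then reinitialise the interface, so no stale transfer ID can be completed afterwards.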

Gcc disable interrupt movement

Apparently, this bug (below) has been fixed. That should let us make the disable/re-enable interrupt code sequences be inline, which will help a lot in the sPyNNaker code.

https://bugs.launchpad.net/gcc-arm-embedded/+bug/1722849

SCAMP: Handling of data aborts

Currently, SCAMP shuts down on a data abort. Given that the most-likely cause of this exception is a user command referencing an invalid memory address, shutting down is not very helpful.

In this scenario, recording DABT-related data, abandoning the failing command and returning to the SCAMP dispatcher seems possible and relatively low cost. Any communication protocol issues with the command issuer must be addressed.

packet re-injection as a system task

Currently, application software takes care of dumped packet re-injection.

SCAMP: inter-board synchronisation needed

Clocks on different boards drift apart quickly. A mechanism is required to maintain "alignment".

BMP: blacklisting mechanism in FPGAs

Chip-to-chip links can be blacklisted for two reasons:

the link itself is misbehaving,
the chip in one end of the link is dead (or blacklisted).

Links on blacklists are always associated with a chip, which means that the two ends of a link are independent and can be treated differently. To avoid an inconsistent state, both ends of a link should be blacklisted. This is possible only if the two connected chips are on the same board, given that blacklists are local to a board and do not cross board boundaries. The broken link should be blacklisted in the board where the broken end lives but not in the other board, given that boards can move and the other board could end up connected to a board with a working link.

One way to address the issue of the two connected chips being on neighbouring boards is to disable the link in the FPGA on the board where the broken end of the blacklisted link lives, e.g., where the dead chip lives. This does not blacklist the other end but makes sure that the link is disabled by scamp in the chip on the other end during link probing. This results in a consistent disabled link state on both ends.

The mechanism to disable links is already implemented in the FPGAs (see command xreg). Support for blacklist management needs to be added to the BMP API.

This issue complements issues:

#142
SpiNNakerManchester/SpiNNaker_hardware_tests#2

bmpc command fstat should show new FPGA registers

bmpc command 'fstat' must be updated to show the new FPGA registers RXEQ, TXDS and TXPE.

| Name | Number | Offset | Access | Size (bits) | Description |
|------|--------|--------|--------|-------------|-------------|
| RXEQ | 7 | 0x1C | RW | 8 | rx equalization (default: 0x0A) |
| TXDS | 8 | 0x20 | RW | 16 | tx driver swing (default: 0x0066) |
| TXPE | 9 | 0x24 | RW | 12 | tx pre-emphasis (default: 0x012) |
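The offsets in the table are the register numbers multiplied by 4, i.e. one 32-bit word per register, and each field is narrower than a word. A small C sketch of a descriptor table that fstat-style code could iterate over follows. The struct and function names are hypothetical; the numbers, widths and defaults are taken from the table.

```c
#include <stdint.h>

/* The new FPGA registers.  Each register occupies one 32-bit word, so
 * its byte offset is the register number times 4. */
typedef struct {
    const char *name;
    unsigned number;
    unsigned bits;       /* width of the meaningful field */
    uint32_t reset;      /* documented default value */
} fpga_reg_info_t;

static const fpga_reg_info_t new_fpga_regs[] = {
    { "RXEQ", 7,  8, 0x0A   },
    { "TXDS", 8, 16, 0x0066 },
    { "TXPE", 9, 12, 0x012  },
};

static inline uint32_t fpga_reg_offset(unsigned number)
{
    return number * 4u;
}

/* Keep only the low `bits` bits of a raw 32-bit register read. */
static inline uint32_t fpga_reg_field(uint32_t raw, unsigned bits)
{
    return (bits >= 32u) ? raw : (raw & ((1u << bits) - 1u));
}
```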

SCAMP: Split SDP heaps into incoming and outgoing

In SCAMP, the SDP packet heap is used when receiving or sending packets, and the packets are put in queues. This means that heavy incoming or outgoing traffic could stop traffic in the opposite direction. It might therefore help to have separate queues for incoming and outgoing traffic.

spinnaker_tools/setup can assign wrong SPINN_DIRS value

The following line of code:
https://github.com/SpiNNakerManchester/spinnaker_tools/blob/master/setup#L7
assigns the current directory to SPINN_DIRS. That works well when compiling from within spinnaker_tools, but...

The current directory might not be the spinnaker_tools directory (as was the case for me) when compiling code from multiple repositories with a single script. SPINN_DIRS is then set to the wrong location, causing:
```
/home/eric/Code/git/sPyNNaker/neural_modelling/make_lib/Makefile.SpiNNFrontEndCommon: No such file or directory
```

SOLUTION:
I got this solution from:
http://stackoverflow.com/questions/18136918/how-to-get-current-relative-directory-of-your-makefile
I tested it and it works perfectly.

Replace the line:
https://github.com/SpiNNakerManchester/spinnaker_tools/blob/master/setup#L7
with:
```
mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST)))
SPINN_DIRS := $(notdir $(patsubst %/,%,$(dir $(mkfile_path))))
```

The solution will work regardless of the compile location.

-Eric.

Flood fill could use NN link read

Reminder for when I get back: Flood Fill could use NN link read which would potentially increase reliability through having fewer packets flying around.

BMP: bmpc commands reset and xreset do not trigger xreg

xreg is an FPGA configuration mechanism that is triggered when the FPGAs are powered-up. It should also be triggered when the FPGAs are reset.
It may be argued that xreg should not be triggered by xreset as the user is resetting the FPGAs. It should certainly be triggered by reset.

Undocumented entities

There remain quite a few things in the tools that are not documented, even after #116. There are also a lot of warnings about the return type of a member not being documented, but that's a Doxygen bug (in the handling of static void functions) and can mostly be ignored, as it will go away once we manage to switch builds to the next version of Doxygen (which missed the Ubuntu 2020 LTS release by a few weeks); I've excluded them from the list below (check the Travis build logs for the full list).

```
$ make doxygen
doxygen
/home/travis/build/SpiNNakerManchester/spinnaker_tools/include/sark.h:69: warning: Member DRIFT_INT_MASK (macro definition) of file sark.h is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/include/sark.h:70: warning: Member DRIFT_FRAC_MASK (macro definition) of file sark.h is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/include/sark.h:71: warning: Member DRIFT_ONE (macro definition) of file sark.h is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/include/sark.h:72: warning: Member TIME_BETWEEN_SYNC_US (macro definition) of file sark.h is not documented.
[...]
/home/travis/build/SpiNNakerManchester/spinnaker_tools/spin1_api/spin1_isr.c:31: warning: Member user_pending (variable) of file spin1_isr.c is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/spin1_api/spin1_isr.c:32: warning: Member user_arg0 (variable) of file spin1_isr.c is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/spin1_api/spin1_isr.c:33: warning: Member user_arg1 (variable) of file spin1_isr.c is not documented.
(cd bmp; exec doxygen)
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_can.c:88: warning: Compound rx_desc is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp.h:764: warning: Member bmp_power_status_commands (enumeration) of file bmp.h is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp.h:792: warning: Member up_time (variable) of file bmp.h is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_net.c:53: warning: explicit link request to 'sdp_msg_t' could not be resolved
[...]
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_clock.c:35: warning: Member XTAL_CLK (macro definition) of file bmp_clock.c is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_clock.c:54: warning: Member bmp_clock_0_select (enumeration) of file bmp_clock.c is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_clock.c:71: warning: Member bmp_clock_1_select (enumeration) of file bmp_clock.c is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_clock.c:87: warning: Member bmp_clock_power_control (enumeration) of file bmp_clock.c is not documented.
[...]
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_hw.c:454: warning: Member ADC_PDN (macro definition) of file bmp_hw.c is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_hw.c:114: warning: Member P3_EN (macro definition) of file bmp_hw.c is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_hw.c:115: warning: Member P3_INIT (macro definition) of file bmp_hw.c is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_hw.c:122: warning: Member P4_EN (macro definition) of file bmp_hw.c is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_hw.c:123: warning: Member P4_INIT (macro definition) of file bmp_hw.c is not documented.
[...]
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_hw.c:51: warning: Member P0_EN (macro definition) of file bmp_hw.c is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_hw.c:53: warning: Member P0_INIT (macro definition) of file bmp_hw.c is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_hw.c:42: warning: Member gpio_bits_port_0 (enumeration) of file bmp_hw.c is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_hw.c:74: warning: Member P1_EN (macro definition) of file bmp_hw.c is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_hw.c:77: warning: Member P1_INIT (macro definition) of file bmp_hw.c is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_hw.c:60: warning: Member gpio_bits_port_1 (enumeration) of file bmp_hw.c is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_hw.c:102: warning: Member XINIT_B (macro definition) of file bmp_hw.c is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_hw.c:104: warning: Member P2_EN (macro definition) of file bmp_hw.c is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_hw.c:84: warning: Member gpio_bits_port_2 (enumeration) of file bmp_hw.c is not documented.
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_hw.c:42: warning: parameters of member gpio_bits_port_0 are not (all) documented
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_hw.c:60: warning: parameters of member gpio_bits_port_1 are not (all) documented
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_hw.c:84: warning: parameters of member gpio_bits_port_2 are not (all) documented
[...]
/home/travis/build/SpiNNakerManchester/spinnaker_tools/bmp/bmp_net.c:53: warning: explicit link request to 'sdp_msg_t' could not be resolved
[...]
```

monitor delegation is needed if monitor is blacklisted

Currently, the whole chip is blacklisted. It should be possible to make the monitor delegate its function to a non-blacklisted core but there is no support for this.

SCAMP: multicast packet time phase implementation needed

This is a mechanism to catch errant packets. There is support in hardware and low-level software but the monitor cores need to use it. Low priority because, apparently, errant packets are not a big problem.

MAKE: optimisation flags not added automatically for apps

Found that between spinnaker_tools v2.1.0 and v3.0.0, user application behaviour changes because optimisation flags were removed from the app.make Makefile.

In v2.1.0, make/app.make uses the -Os flag for every compilation, including user apps.

The latest code only adds this flag explicitly for sark and spin1_api (see the latest sark/Makefile and spin1_api/Makefile).

My question is whether -Os should be part of the common Makefile, so that it is always included for apps, or whether it is up to users to add it to their app's Makefile, e.g.

```
CFLAGS += -Os
```

Found little documentation about this and yet it caused a ~2x speed improvement compared to without the flag!

BMP: the serial flash is re-initialised occasionally

During operation we have seen the flash re-initialise the content of the following areas:

xboot: empty
xreg: empty
blacklist: list with length = 0

This prevents spalloc_server from allocating the board because it checks that the FPGAs have been configured with appropriate bit files.

bmpc 'help reset' command does not indicate that FPGAs are also reset

It only indicates that SpiNNaker chips are reset.

BMP: concurrent accesses to a bmp can cause crash

Two concurrent accesses to the same bmp can cause it to crash and watchdog.
Sometimes the crash is so severe that it doesn't recover.
Cause has not been identified.

SCAMP: Review check for PHY ready status in Ethernet-connected chips

There is evidence that the time required by the Ethernet PHY to become ready depends on the local network setup and hardware equipment used.

We may want to review how scamp deals with PHY readiness. In particular, look at a potentially 'infinite' loop in 'scamp-3.c'.

ybug command app_sig manages regions incorrectly

For type-1 signals 'count', 'and' and 'or', the command does not transmit the (x, y) coordinates of the 'origin' of the region. This results in the origin being treated as (0, 0). Only level 0 regions are treated correctly, given that they are transmitted as part of the mask.
Type-0 signals, i.e., those transported using multicast packets, are sent to every chip in the machine and ignore the region information.

SPIN1_API: dispatcher runs with interrupts disabled too aggressively

The only critical section is the access to priority queue head/tail registers, which are also accessed by the scheduler. The rest of the code can run outside the critical section.
The critical section only needs to disable 'irq' interrupts, which can start the scheduler. 'fiq' interrupts do not start the scheduler and, therefore, do not affect the priority queue registers.

SCAMP: boot image size is hard-coded in delegation-related DMA transfers

This size is hard-coded in several places (references are for branch 'master'):

function 'chk_bl_del()' in file scamp-3.c [line 1625].
function 'img_cp_exe()' in file scamp-del.c [lines 44 and 48].
function 'nn_cmd_biff()' in file scamp-nn.c [line 1124].

The current value is 0x7100, which corresponds to 28,928 bytes (just over 28 KiB).

This needs to be consistent with the size that can be obtained from the definitions in file scamp-boot.c [lines 42-44].

The suggestion is to move all constants related to boot image size to file scamp.h to guarantee consistency across all files.
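A minimal C sketch of the suggestion: one shared definition plus compile-time checks, so that a change in one place cannot silently disagree with the others. The macro names are hypothetical. The 32 KiB bound is an assumption based on the boot image being assembled in the top half of the 64 KiB DTCM, and the whole-word check assumes the DMA copies move complete words.

```c
/* Hypothetical shared definitions (e.g. in scamp.h), used instead of
 * repeating 0x7100 in chk_bl_del(), img_cp_exe() and nn_cmd_biff(). */
#define BOOT_IMAGE_SIZE   0x7100u                 /* 28,928 bytes */
#define BOOT_IMAGE_WORDS  (BOOT_IMAGE_SIZE / 4u)  /* 7,232 words */

/* Compile-time sanity checks (C11 _Static_assert). */
_Static_assert(BOOT_IMAGE_SIZE % 4u == 0u,
               "boot image size assumed to be a whole number of words");
_Static_assert(BOOT_IMAGE_SIZE <= 32u * 1024u,
               "boot image must fit in the top half of the 64 KiB DTCM");
```

scamp-boot.c could then derive its own limits from BOOT_IMAGE_SIZE (or check them against it with another _Static_assert), so a mismatch becomes a build error rather than a runtime corruption.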

make does _not_ produce disassembly files (both arm & gnu)

make fails to produce disassembly files for scamp, sark and spin1_api.

The problem seems to be related to the following line in make/Makefile.common:
```
$(OD) $(BUILD_DIR)$*.txt $@
```

Moving $(BUILD_DIR)$*.txt up to the definitions of OD for arm and gnu seems to solve the problem, but this could be the wrong solution.

BMP: non-slot 0 BMPs occasionally crash when communicating with BMP in slot 0

A BMP can occasionally crash when the BMP in slot 0 interrogates it through the CAN bus.

No proper diagnostics yet. The issue may not be related to the communication, this may be only the symptom.

The BMP watchdogs and recovers most of the time.

Documentation links broken

The documentation linked to in the README (following #2) has now been moved within the wiki, so the links are now broken.

Is there some way to generate permanent links to the documentation which will survive future reshuffles (especially considering wiki page names now contain the software version number and hand-written section numbering, changing either of which can break links)?

SCAMP: boot_aplx.s stack may overwrite boot image

boot_aplx.s places its stack at DTCM_TOP.
The boot image downloaded from the host is assembled in the TOP half of DTCM.
If the boot_image is close to 32 KiB in length then the boot_aplx.s stack may overwrite part of it.

The boot_aplx.s stack should be moved to the TOP of the bottom half of DTCM, as was done with the scamp stacks. The safety of this move needs verification because boot_aplx.s is also used to scatter load applications.
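The hazard is a simple range-intersection question. The C sketch below checks it under stated assumptions: DTCM is 64 KiB at 0x00400000, the boot image is assembled upwards from the start of the top half, and the stack grows downwards. The function name and the stack depths used in the usage note are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed DTCM layout: 64 KiB starting at 0x00400000. */
#define DTCM_BASE 0x00400000u
#define DTCM_SIZE 0x00010000u
#define DTCM_TOP  (DTCM_BASE + DTCM_SIZE)
#define DTCM_MID  (DTCM_BASE + DTCM_SIZE / 2u)   /* start of top half */

/* Assumes the boot image is assembled upwards from DTCM_MID and that a
 * stack of `stack_bytes` grows downwards from `stack_top`.  Returns
 * true if the two address ranges intersect. */
bool boot_stack_overlaps(uint32_t image_bytes, uint32_t stack_top,
                         uint32_t stack_bytes)
{
    uint32_t image_lo = DTCM_MID;
    uint32_t image_hi = DTCM_MID + image_bytes;   /* exclusive */
    uint32_t stack_lo = stack_top - stack_bytes;
    uint32_t stack_hi = stack_top;                /* exclusive */
    return image_lo < stack_hi && stack_lo < image_hi;
}
```

With a 1 KiB stack at DTCM_TOP, a 28,928-byte image is safe, but a 32,000-byte image overlaps the stack. Moving the stack top to DTCM_MID removes the overlap for any image that fits in the top half, which is the proposed fix.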

SPIN1_API: software transmit packet queue hazard

Function spin1_send_packet [spin1_api.c] is used to enqueue packets in tx_packet_queue. The function disables irqs, but not fiqs, while manipulating the start and end queue pointers.

There is a potential hazard in accessing these pointers if a fiq-level (-1) priority callback sends a packet. The hazard manifests when the callback interrupts spin1_send_packet and calls the same function to send its packet.

Although recommended practice is to reserve fiq-level priority for PACKET_RECEIVED events and restrict the callback to queueing the received packet, this is not enforced.

The obvious solution is to also disable fiqs in spin1_send_packet, potentially delaying PACKET_RECEIVED event processing. Needs evaluation.

BMP: 'proc_queue' gets full during power-on commands

Events scheduled on 'proc_queue' are dispatched from a while loop in main. They can be interrupted by messages arriving and commands being executed.

A 'power-on' command takes a long time to execute. During this time, many events can be scheduled. In particular, SysTick schedules 'proc_100hz' every 10 ms. In this situation, 'proc_queue' gets full and no further events can be scheduled. I've tested with a proc_queue length of 256 events and it gets full.

Delaying or missing 'proc_100hz' is usually not a big problem. 'uptime' will drift and A/D, temperature and fan measurements will be delayed, usually without consequences. Delaying or missing other events may have consequences more difficult to deal with.

This is a difficult problem to solve.
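One possible mitigation, which is not what the BMP currently does, is to coalesce periodic events. While a 'proc_100hz' event is still waiting in the queue, later SysTick ticks are only counted instead of being queued. The handler then learns how many 10 ms periods elapsed, so 'uptime' need not drift. A minimal C sketch with hypothetical names follows. On real hardware, take_ticks() would have to run with interrupts masked.

```c
#include <stdbool.h>
#include <stdint.h>

#define PROC_QUEUE_LEN 256   /* length tried in the issue */

typedef void (*proc_fn)(uint32_t arg);

typedef struct {
    proc_fn fn;
    uint32_t arg;
} proc_event_t;

/* Hypothetical event queue with tick coalescing. */
typedef struct {
    proc_event_t ev[PROC_QUEUE_LEN];
    uint32_t head;           /* index of the oldest event */
    uint32_t count;          /* number of queued events */
    bool tick_pending;       /* a tick event is already queued */
    uint32_t ticks_folded;   /* extra ticks merged into that event */
} proc_queue_t;

bool proc_queue_push(proc_queue_t *q, proc_fn fn, uint32_t arg)
{
    if (q->count == PROC_QUEUE_LEN) {
        return false;
    }
    q->ev[(q->head + q->count) % PROC_QUEUE_LEN] = (proc_event_t) {fn, arg};
    q->count++;
    return true;
}

bool proc_queue_pop(proc_queue_t *q, proc_event_t *out)
{
    if (q->count == 0) {
        return false;
    }
    *out = q->ev[q->head];
    q->head = (q->head + 1) % PROC_QUEUE_LEN;
    q->count--;
    return true;
}

/* Called every 10 ms instead of pushing a new tick event each time:
 * while a tick is still queued, later ticks are only counted. */
void schedule_tick(proc_queue_t *q, proc_fn tick_fn)
{
    if (q->tick_pending) {
        q->ticks_folded++;
    } else if (proc_queue_push(q, tick_fn, 0)) {
        q->tick_pending = true;
    }
}

/* Called by the tick handler when it runs: returns how many 10 ms
 * periods have elapsed and allows the next tick to be queued. */
uint32_t take_ticks(proc_queue_t *q)
{
    uint32_t n = q->ticks_folded + 1;
    q->ticks_folded = 0;
    q->tick_pending = false;
    return n;
}
```

This bounds the queue space used by 'proc_100hz' to one slot. It does not help with other event types, which still need their own treatment.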

Allow multi-packet reads

An enhancement for SCAMP and SARK would be to allow multi-packet reads, which could speed up the extraction of data. This would work as follows:

The host sends a read request where the requested data is more than 256 bytes.
Instead of failing, the response is sent as a sequence of packets, each with a different sequential sequence number.
If any packets are lost, the host can simply request the data again, using the sequence numbers to work out where the holes in the data are.

This allows the whole protocol to be host-controlled and should be backwards-compatible; if a host requests 256 bytes, it will arrive as a single packet. The host can use a sliding window protocol by simply keeping the last sequential sequence number received and then re-requesting any packets after an appropriate time-out.
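A minimal host-side C sketch of the hole tracking described above follows. It assumes 256 bytes per response, 16-bit sequence numbers that increase by one per packet starting from a known first value, and a hypothetical cap of 64 packets per request. All names are illustrative. The 16-bit subtraction lets the tracker cope with the sequence counter wrapping around mid-transfer.

```c
#include <stdbool.h>
#include <stdint.h>

#define READ_CHUNK  256u   /* bytes carried by one read response */
#define MAX_PACKETS 64u    /* hypothetical cap: 16 KiB per request */

/* Host-side record of which responses of one multi-packet read have
 * arrived.  Sequence numbers are assumed to be 16-bit and to increase
 * by one per packet, starting at first_seq. */
typedef struct {
    uint32_t total;              /* bytes requested */
    uint32_t n_packets;          /* responses expected */
    uint16_t first_seq;          /* sequence number of packet 0 */
    bool received[MAX_PACKETS];
} read_tracker_t;

bool tracker_init(read_tracker_t *t, uint32_t total, uint16_t first_seq)
{
    uint32_t n = (total + READ_CHUNK - 1u) / READ_CHUNK;
    if (n == 0 || n > MAX_PACKETS) {
        return false;
    }
    t->total = total;
    t->n_packets = n;
    t->first_seq = first_seq;
    for (uint32_t i = 0; i < MAX_PACKETS; i++) {
        t->received[i] = false;
    }
    return true;
}

/* Record an arriving packet; 16-bit subtraction copes with wrap-around. */
void tracker_mark(read_tracker_t *t, uint16_t seq)
{
    uint16_t idx = (uint16_t) (seq - t->first_seq);
    if (idx < t->n_packets) {
        t->received[idx] = true;
    }
}

/* Find the first missing packet at or after index `from` and report the
 * byte range to re-request.  Returns false once nothing is missing. */
bool tracker_next_hole(const read_tracker_t *t, uint32_t from,
                       uint32_t *offset, uint32_t *len)
{
    for (uint32_t i = from; i < t->n_packets; i++) {
        if (!t->received[i]) {
            uint32_t end = (i + 1u) * READ_CHUNK;
            *offset = i * READ_CHUNK;
            *len = (end > t->total ? t->total : end) - *offset;
            return true;
        }
    }
    return false;
}
```

A 1000-byte read, for example, needs four responses of 256, 256, 256 and 232 bytes; after a time-out the host walks tracker_next_hole() and re-requests only the missing ranges.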

broken spin1_api.h

Commit 49e1eaa broke the build, as it introduced changes to spin1_api.h that the Perl preprocessor can't convert into assembler constants.

Weird buffer corruption issue: possible cause in shm_send_msg()?

Background: I'm hunting a problem that only happens rarely under intense load (during fast data loading) where buffers end up getting corrupted. The symptom is that the buffer ends up with what appears to be contents that are partially formatted as for (a part of) the inbound data stream and partially formatted as for (a part of) the outbound data stream. Another symptom is that communications subsequent to this are intensely flaky.

Possible Cause: In shm_send_msg() (in scamp/scamp-3.c), we see this:

```
uint shm_send_msg(uint dest, sdp_msg_t *msg) // Send msg AP
{
    vcpu_t *vcpu = sv_vcpu + dest;

    sdp_msg_t *shm_msg = sark_shmsg_get();
    if (shm_msg == NULL) {
        return RC_BUF;
    }

    sark_msg_cpy(shm_msg, msg);

    vcpu->mbox_ap_msg = shm_msg;
    vcpu->mbox_ap_cmd = SHM_MSG;

    sc[SC_SET_IRQ] = SC_CODE + (1 << v2p_map[dest]);

    volatile uchar flag = 0;
    event_t *e = event_new(proc_byte_set, (uint) &flag, 2);
    if (e == NULL) {
        sw_error(SW_OPT);
        sark_shmsg_free(shm_msg);
        return RC_BUF;          // !! not the right RC
    }

    uint id = e->ID;
    timer_schedule(e, 1000);    // !! const??

    while (vcpu->mbox_ap_cmd != SHM_IDLE && flag == 0) {
        continue;
    }

    if (flag != 0) {
        sark_shmsg_free(shm_msg);
        return RC_TIMEOUT;
    }

    timer_cancel(e, id);
    return RC_OK;
}
```

However, if a shared message is allocated but a timer event is not (and I'm not sure that is impossible when SCAMP is heavily loaded), then the code path containing the second "return RC_BUF;" is taken. We're not running in the mode where sw_error() will RTE, so the function frees the shared message buffer and leaves it up to the caller to respond; since this is a one-way SDP message (that's how the fast data-in protocol works), the return value is simply dropped.

The problem? We've now freed the shared message for the next operation (and messages get reused rapidly because they're managed as a stack) despite the application core being signalled to handle it; it will continue to work with it and then will also free it when it is done. This corrupts the shared message stack (which will severely degrade the overall system reliability) and potentially generates shared buffers with two cores writing into them at once.

If I'm right (please tell me if I'm wrong!) then the fix for the corruption might be to move the lines:

```
    vcpu->mbox_ap_msg = shm_msg;
    vcpu->mbox_ap_cmd = SHM_MSG;

    sc[SC_SET_IRQ] = SC_CODE + (1 << v2p_map[dest]);
```

to below the point where we know that we have successfully allocated all resources to perform the message sending operation (i.e., to just above the timer_schedule() call).

Am I crazy in thinking that this might be the problem? Review of my reasoning is extremely welcome!
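The proposed reordering amounts to "acquire every resource first, publish last". The C sketch below is a standalone simulation of that ordering, not SCAMP code. The pools, mailbox fields and SEND_* return codes are hypothetical stand-ins. Giving the missing-event case its own code also addresses the "not the right RC" comment.

```c
/* Hypothetical stand-ins: a pool is just a count of free items, and the
 * mailbox records what the application core has been told. */
typedef struct { int available; } pool_t;
typedef struct { int cmd; int irq_raised; } mailbox_t;

enum { MB_IDLE = 0, MB_MSG = 1 };
enum { SEND_OK = 0, SEND_NO_BUF = 1, SEND_NO_EVENT = 2 };

static int pool_take(pool_t *p)
{
    if (p->available == 0) {
        return 0;
    }
    p->available--;
    return 1;
}

static void pool_give(pool_t *p)
{
    p->available++;
}

/* Acquire every resource first and publish to the core only when nothing
 * further can fail, so an early return never frees a buffer that the
 * core has already been told to use. */
int send_to_core(mailbox_t *mb, pool_t *msgs, pool_t *events)
{
    if (!pool_take(msgs)) {
        return SEND_NO_BUF;
    }
    /* ... copy the outgoing message into the shared buffer ... */

    if (!pool_take(events)) {
        pool_give(msgs);        /* safe: the core never saw this buffer */
        return SEND_NO_EVENT;
    }

    /* Equivalent of setting mbox_ap_msg/mbox_ap_cmd and raising the IRQ. */
    mb->cmd = MB_MSG;
    mb->irq_raised = 1;

    /* ... schedule the timeout and wait for the core, as before ... */
    return SEND_OK;
}
```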

functions inside sark.h /.c return uints which are not named constants

Functions such as rtr_mc_set, rtr_mc_load and rtr_mc_get return a uint where 0 means failure and 1 means success.

Could it be better to have these as named constants in sark so that application code can check these returns without using magic numbers?
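A minimal C sketch of what such constants might look like follows. The names are hypothetical and not part of SARK. The values keep the existing 0/1 convention, so existing code that tests the raw uint continues to work, while new code can compare against RTR_MC_OK or call the helper instead of using a literal 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical named results for the rtr_mc_* family, keeping the
 * existing convention: 0 = failure, 1 = success. */
enum {
    RTR_MC_FAIL = 0,
    RTR_MC_OK   = 1
};

/* Readable check for a raw return value from an rtr_mc_* call. */
static inline bool rtr_mc_succeeded(uint32_t rc)
{
    return rc == RTR_MC_OK;
}
```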