Zig Version 0.12.0-dev.2833+8802ec583 Steps to

Thankyou for your answer. I understand that e.g. <code class="notran

Thank you <a class="user-mention notranslate" data-hovercard-type="user" data-hovercar

Missing atomic builtins for RP2040 (cortex_m0plus) about zig HOT 5 OPEN

captain-error commented on May 30, 2024

Missing atomic builtins for RP2040 (cortex_m0plus)

from zig.

Comments (5)

drew-gpf commented on May 30, 2024

This isn't a Zig problem, since it happens in equivalent C code (i.e. using stdatomic.h), at least on my end:

#include <pico/sync.h>
#include <stdatomic.h>

static _Atomic(int) whatever = 0;

int change(void) {
    return atomic_exchange(&whatever, 1);
}

Will produce the same error if change is called. Zig can't provide support for this because it does not recognize the rp2040's synchronization primitives, which you should regardless be using through pico_sync or pico_multicore. Even if Zig could do that, it would have to keep track of changes to the Pico SDK, since it would need to allocate a hardware spinlock without failing, for example: the only atomic read/write operations that exist are for peripheral pages.

from zig.

captain-error commented on May 30, 2024

Thankyou for your answer.

I understand that e.g. atomic_exchange has this problem because cortex M0 has no special instructions for atomics.
However, I think @atomicLoad and @atomicStore do not share this problem.
As far as I understand, the normal Store and Load assembly instructions for cortex m0 are all atomic in the sense that the thread on CPU 1 will either see the old value or the new value when a Store is executed by CPU 0. There is also no memory cache which could hide some of the stores but not others.

So my naive assumption was that @atomicStore(u32, &variable, 42, .Monotonic) compiles down to a single STR instruction.

This would mean that the difference between variable = 42; and @atomicStore(u32, &variable, 42, .Monotonic); is that the latter guarantees that the Store instruction is not optimized away (e.g. because the variable ist held in a register).

(Maybe this board is not the right place for the following question. If so please let me know.)
My question is now, how do I work around the missing atomic builtins.

One of my use cases is a shared ring buffer. CPU 0 writes into it. The writing function must guarantee that the Store of a new value happens before the Store of the advanced write pointer. The reader on CPU 1 can then poll by executing a Load on the write pointer and compare it to its stored copy. I they differ, the buffer contains new values, which can then by read by Loads.

I have 2 solution ideas:

make all variables (all being 4 bytes or smaller) which are shared between both cores volatile, i.e. in my example the write pointer and the array for the values. I know that this is a hack because volatile is meant for memory mapped IO only.
I think this should work and be equivalent to accessing the variables using @atomicStore(T, &variable, value, .SeqCst) and @atomicLoad(T, &variable, .SeqCst).
Implement the Store and Load in assembly (one instruction each). Make the optimizer not optimize them away (I don't know how to do that right now.)

from zig.

drew-gpf commented on May 30, 2024

I understand that e.g. atomic_exchange has this problem because cortex M0 has no special instructions for atomics.
However, I think @atomicLoad and @AtomicStore do not share this problem.

Actually, I tried to use atomic_load and it linked and produced "reasonable" assembly:

#include <pico/sync.h>
#include <stdatomic.h>

static _Atomic(int) whatever = 0;

int change(void) {
    return atomic_load(&whatever);
}

(Note this is an object file, so the compiler isn't even telling the linker that it's using an atomic builtin)

However, it would only compile if I used the GCC arm-none-eabi toolchain. If I used Clang with the same sysroot, it produced the same undefined call to libatomic. I also tried and got mostly the same result by manually using each possible macro expansion in stdatomic. GCC is probably substituting the builtin at the callsite, while LLVM wants to use libatomic. I assume the (GCC) toolchain isn't expecting this, so they just aren't provided.

The equivalent Zig code would be something like: (function is noinline to make it easier to spot)

var whatever: usize = undefined;

noinline fn change() usize {
    @fence(.SeqCst); // fence before
    const result = @as(*volatile usize, @ptrCast(&whatever)).*; // grab result
    @fence(.SeqCst); // fence after

    return result;
}

Which becomes

For some reason the option to DMB is different. Reading the v6-m ARM, I think the GCC example would actually throw an illegal instruction, since opt has to be SY, but is ISH (1111 vs 1011). This makes me think that none of this is intended to be supported and the GCC output is a miscompilation, but the documentation may be incomplete. Even stranger is that some atomic builtins would use fences while others would not.

You probably only want a compiler fence, anyways, so you could get away with something like this:

var whatever: usize = undefined;

noinline fn change() usize {
    asm volatile ("" ::: "memory"); // fence before
    const result = @as(*volatile usize, @ptrCast(&whatever)).*; // grab result
    asm volatile ("" ::: "memory"); // fence after

    return result;
}

The pointer is marked as volatile in this case to prevent LLVM optimizing it out entirely since it's only ever undefined.. it might be better practice to just define a function called compilerFence (which the std lib removed, for some reason) and stick that where you don't want anything to be reordered.

from zig.

captain-error commented on May 30, 2024

Thank you @drew-gpf !
You are very helpful.

Combining what you have written, here is my suggestion for the implementation of @atomicLoad and @atomicStore.

const AtomicOrder = @import("std").builtin.AtomicOrder;

pub fn atomicLoad(comptime T: type, ptr: *const T, comptime order: AtomicOrder) T {
    if (@sizeOf(T) > 4) @compileError("Only supported for types with 4 or less bytes.");
    if (order == .SeqCst) @compileError("AtomicOrder.SeqCst not supported.");

    if (order == .Release or order == .AcqRel) asm volatile ("" ::: "memory"); // compiler fence
    const result = @as(*const volatile T, @ptrCast(ptr)).*;
    if (order == .Acquire or order == .AcqRel) asm volatile ("" ::: "memory"); // compiler fence

    return result;
}

pub fn atomicStore(comptime T: type, ptr: *T, value: T, comptime order: AtomicOrder) void {
    if (@sizeOf(T) > 4) @compileError("Only supported for types with 4 or less bytes.");
    if (order == .SeqCst) @compileError("AtomicOrder.SeqCst not supported.");

    if (order == .Release or order == .AcqRel) asm volatile ("" ::: "memory"); // compiler fence
    @as(*volatile T, @ptrCast(ptr)).* = value;
    if (order == .Acquire or order == .AcqRel) asm volatile ("" ::: "memory"); // compiler fence
}

Does that make sense?

Maybe AtomicOrder.SeqCst could be implemented using the ISB, DMB, and DSB instructions?
I don't feel comfortable implementing it. These concurrency problems are usually more tricky than they appear at first.

from zig.

drew-gpf commented on May 30, 2024

What I'm moreso getting at is that it's unclear if any compiler should be expected to support atomic rmw for arbitrary addresses if Clang generates the same linker error and GCC generates a potentially bad fence instruction. That is, if you want atomic stores and loads, you need to keep in mind the specific SOC you're using and especially the address you're writing to and optimize from there--that's why I suggested to use compiler fences for all memory ordering types, since I'm pretty sure a DMB is a 3-cycle no-op in this case. You're better off asking on the raspberry pi forums, though. I suspect they will tell you to just use a critical section.

from zig.

Missing atomic builtins for RP2040 (cortex_m0plus) about zig HOT 5 OPEN

Comments (5)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent