All modules need some scratch memory, global (shared) memory. What is the "correct" wa

How do we properly use the NODE_MEMREQ interface? (audiomark issue, closed)

petertorelli commented on August 19, 2024

How do we properly use the NODE_MEMREQ interface?

Comments (9)

llefaucheur commented on August 19, 2024

Ideally, the software component provider asks for memory to be allocated by the framework, tagged with different requirements: static/working, prog/data, speed, alignment.
Ideally, the framework allocates different banks for this and "negotiates" which component has access to the fast ones. This can be complex:
either you have a tool that lets the system integrator manually give specific components priority access to critical memory,
or you let the framework decide from the information it receives from the components (a declaration of the estimated complexity).
The framework manages the optional clearing of working memory banks when switching to a new component.
Other notes:
- some architectures require the DMA memory to be in a specific internal SRAM.
- we should let the DMA buffer be allocated either by the device driver or by the application.

The code we talked about before:
enum mem_mapping_type
{
    mem_type_static = 0,          /* (LSB) memory content is preserved (default) */
    mem_type_working = 1,         /* scratch memory content is not preserved between two calls */
    mem_type_pseudo_working = 2,  /* static only during the uncompleted execution state of the SWC, see "NODE_RUN" */
    mem_type_periodic_backup = 3, /* parameters to reload for warm boot after a crash, holding for example
                                     long-term estimators. This memory area is cleared at cold NODE_RESET and
                                     refreshed for warm NODE_RESET. The SWC should not reset it (there is
                                     no "warm-boot reset" entry point) */
};

enum mem_speed_type /* memory requirements associated to enum memory_banks */
{
    mem_speed_req_any = 0,           /* best effort */
    mem_speed_req_normal = 1,        /* can be external memory */
    mem_speed_req_fast = 2,          /* will be internal SRAM when possible */
    mem_speed_req_critical_fast = 3, /* will be TCM when possible */
};
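
As a rough illustration of the tagging idea, a component could fill a small table of requests that the framework then reads. This is only a sketch: the `mem_request_t` descriptor, the `demo_swc_memreq` function, and the sizes are hypothetical and are not part of audiomark or of the code quoted above; the two enums are repeated so that the block compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Enums repeated from the comment above so the sketch is self-contained. */
enum mem_mapping_type {
    mem_type_static = 0,
    mem_type_working = 1,
    mem_type_pseudo_working = 2,
    mem_type_periodic_backup = 3
};

enum mem_speed_type {
    mem_speed_req_any = 0,
    mem_speed_req_normal = 1,
    mem_speed_req_fast = 2,
    mem_speed_req_critical_fast = 3
};

/* Hypothetical descriptor: one entry per memory bank the component asks for. */
typedef struct {
    size_t                size;       /* bytes requested */
    uint8_t               log2_align; /* alignment as a power of two, e.g. 3 means 8 bytes */
    enum mem_mapping_type mapping;    /* static, working, ... */
    enum mem_speed_type   speed;      /* a hint that the framework may ignore */
} mem_request_t;

/* Hypothetical component entry point: fills the table and returns the number
 * of entries written, or 0 when the table is too small. The sizes are made up. */
size_t demo_swc_memreq(mem_request_t *req, size_t max_req)
{
    assert(req != NULL);
    if (max_req < 2) {
        return 0;
    }
    req[0] = (mem_request_t){ .size = 1024, .log2_align = 2,
                              .mapping = mem_type_static,
                              .speed = mem_speed_req_fast };
    req[1] = (mem_request_t){ .size = 4096, .log2_align = 3,
                              .mapping = mem_type_working,
                              .speed = mem_speed_req_any };
    return 2;
}
```

A framework with a single flat memory bank can ignore the `speed` and `mapping` fields and only honor `size` and `log2_align`.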

petertorelli commented on August 19, 2024

I don't understand why the memory manager needs to know all of this, because it has no control over it. Unless the developer provides the memory manager component? Wouldn't the XDAIS structure simply provide enumerated pointers and leave it up to the component to remember what each pointer means?

Let us imagine an SoC that implements this benchmark with a DSP accelerator peripheral.

The MCU must run the benchmark and provide the L/R/Downlink buffers to the pipeline.

The ABF and AEC run on the DSP, the ANR and KWS run on the MCU.

A shared buffer is needed to get the L/R data to the ABF, and the Downlink to the AEC. These 3 live in MCU SRAM.

The ABF initializes a buffer locally as scratch memory, and a buffer to pass to the AEC, locally. Not within SRAM, but within its own peripheral space.

The scratch buffer is never needed by anything else. It does not need an XDAIS pointer, it can be global. Is this a violation?

The output from the AEC needs to be MCU SRAM space, so the AEC component needs to allocate that, and it has meaning to the MCU because that buffer is now read by the ANR.

The ABF and AEC can allocate their own scratch memory pointers, because no other components will ever see them. So it doesn't make sense to use XDAIS. This means we need th_abf_init and th_aec_init functions where this can be allocated. These pointers are passed with XDAIS structures by the MCU because there is no "extern" linkage between the ABF and AEC files, because they are of type "ee_*".

The input buffers to the ABF for L/R and AEC for downlink are allocated by the ee_audiomark.c application MCU process and passed as XDAIS structures.

But I don't understand WHO owns the allocation of the shared buffer between the AEC and the ANR. (In fact, today the ANR is coded as using in-place buffer, which is a problem).

It is not clear to me who writes the memory management framework.

I think each component should have its own allocator as a th_ function so that the developer can pick and choose which XDAIS buffers are needed, and we would NOT need any of the enum values you listed because we do not provide an allocator that can do anything meaningful with these parameters.

llefaucheur commented on August 19, 2024

"edit"
My business motivation is to prepare the arrival of a software component (SWC) "store".
Each SWC is developed by a separate company and delivered in binary format, signed against a specific key (from contracts, architectures..). A SWC cannot anticipate whether it will run on a multiprocessor with external/internal RAM. It can only tell the system integrator that one amount of memory should be reserved in fast RAM (if possible) and a larger one without speed constraint, for example.
I remember that, by May 2022, members of the committee were standing on the brakes against introducing complexity in the framework. At that time my focus was to get acceptance of the subroutine single-entry format with "XDAIS" access to buffers. I am now working hard to have the full set of features released in "CMSIS-LINK" early next year. And yes, this introduces a CPU load overhead, out of the initial scope of audioMark.
"end edit"
ABF receives a pointer to the base address of the L/R samples, in and out

The framework allocates the buffers in between software components
ABF does not define where those I/O buffers are, or whether a buffer is static or working

ABF should tell if it will modify the input buffer (today we assume it does)
ABF (and other components) should tell if the processing is "in-place", and if memory instances are relocatable between two calls (this option exists but was never implemented in TI's XDAIS).

ABF declares static/working RAM only for its own purpose, for its instance, and (later) will add those speed constraints to the framework, and the framework can ignore those details and implement a single flat memory bank for all.

petertorelli commented on August 19, 2024

Here's a proposal that I could implement:

A function called ee_status_t th_memreq(uint nbytes, enum mem_type, enum speed_type, void **ptr). The developer implements this function. It returns pass/fail, and if pass, fills in ptr with the address of the memory location. We leave it up to the developer to implement this function. EEMBC will tell them what sizes to expect, since this is known ahead of time. There are numerous ways to implement this: pre-allocate from the heap like we do now, but with different types of memory that are placed by the linker file; use raw addresses and keep a counter and use the linker map to set those addresses; I don't know the best way for each platform, it is up to the developer. On a positive note: we'll know the memory request sizes in advance, and there is no need for a free() function.
Every component needs to be scrutinized and must have ZERO bytes of global memory. Stack only, with memreq allocations. Since every component has dozens of globals, how do we manage this? Only arrays need to be memreq'd? I can't imagine having to memreq every single pointer or counter that is a global variable, especially with libspeex. I assume the memreq is only for buffers and arrays, and not structs or counters?
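
One of the options listed above, "keep a counter" over pre-placed memory, could look roughly like the bump allocator below. This is a sketch under several assumptions: the `ee_status_t` values, the two-entry speed enum, the pool sizes, and the use of `size_t` instead of `uint` are all made up for illustration, and the pools are ordinary static arrays rather than regions placed by a linker file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed status values and enums; the real audiomark definitions may differ. */
typedef enum { EE_STATUS_OK = 0, EE_STATUS_ERROR = 1 } ee_status_t;
typedef enum { ee_mem_type_static, ee_mem_type_working } ee_mem_type_t;
typedef enum { ee_mem_speed_any, ee_mem_speed_fast, EE_MEM_SPEED_COUNT } ee_mem_speed_t;

#define SLOW_POOL_BYTES 16384u /* hypothetical sizes */
#define FAST_POOL_BYTES 4096u

/* On a real target each pool would be pinned to a memory region by the
 * linker file; here they are plain static arrays. */
static _Alignas(8) uint8_t slow_pool[SLOW_POOL_BYTES];
static _Alignas(8) uint8_t fast_pool[FAST_POOL_BYTES];

typedef struct {
    uint8_t *base; /* start of the pool */
    size_t   size; /* total bytes in the pool */
    size_t   used; /* bytes handed out so far (the "counter") */
} pool_t;

static pool_t pools[EE_MEM_SPEED_COUNT] = {
    [ee_mem_speed_any]  = { slow_pool, SLOW_POOL_BYTES, 0 },
    [ee_mem_speed_fast] = { fast_pool, FAST_POOL_BYTES, 0 },
};

/* Bump allocation: round the size up to 8 bytes, hand out the next free
 * address, never free. The static/working type is ignored in this sketch. */
ee_status_t th_memreq(size_t nbytes, ee_mem_type_t type, ee_mem_speed_t speed, void **ptr)
{
    (void)type;
    if (ptr == NULL || nbytes == 0 || (unsigned)speed >= EE_MEM_SPEED_COUNT) {
        return EE_STATUS_ERROR;
    }
    pool_t *p = &pools[speed];
    size_t need = (nbytes + 7u) & ~(size_t)7u;
    if (need < nbytes || need > p->size - p->used) {
        return EE_STATUS_ERROR; /* size overflow or pool exhausted */
    }
    *ptr = p->base + p->used;
    p->used += need;
    return EE_STATUS_OK;
}
```

Because the request sizes are known in advance, the pool sizes can be fixed at build time, and an exhausted pool shows up as an immediate failure at the first NODE_MEMREQ.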

llefaucheur commented on August 19, 2024

From experience, you need a minimum of 3 memory banks: fast+static, fast+working, slow+static.
You are proposing more combinations like slow+working, which is fine, and is the system integrator's responsibility.
"ZERO bytes of global memory" is a must if you want to create several instances of the same component.

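To illustrate why zero global memory is what makes multiple instances possible, here is a toy component whose entire state lives in a caller-provided block. The `demo_*` names and the gain example are hypothetical and unrelated to the real ABF/AEC/ANR code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component state: everything that would otherwise be a global
 * variable is a field of the instance. */
typedef struct {
    float    gain;
    unsigned frames_seen;
} demo_instance_t;

/* The component reports how much memory one instance needs. */
size_t demo_instance_size(void)
{
    return sizeof(demo_instance_t);
}

/* The framework passes in the block it allocated for this instance. */
void demo_init(void *mem, float gain)
{
    demo_instance_t *inst = mem;
    assert(inst != NULL);
    inst->gain = gain;
    inst->frames_seen = 0;
}

/* Processing touches only the instance and the I/O buffer, never a global. */
void demo_run(void *mem, float *samples, size_t n)
{
    demo_instance_t *inst = mem;
    for (size_t i = 0; i < n; ++i) {
        samples[i] *= inst->gain;
    }
    inst->frames_seen++;
}
```

Two blocks of `demo_instance_size()` bytes give two independent instances; with a global `gain` or frame counter, the second `demo_init` call would overwrite the first instance's state.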
petertorelli commented on August 19, 2024

I just spent an hour trying to prototype what this might look like, and it is nontrivial.

First problem: the structs required by BF static or working areas are structs of array pointers. I tried the naive approach and instantly recognized the problem:

int
th_memreq(void **pp_dst, unsigned long n, ee_mem_type_t type, ee_mem_speed_t speed)
{
    *pp_dst = malloc(n);
    if (!*pp_dst)
    {
        return 1; 
    }
    return 0;
}

:
:
:

    switch (command)
    {
        case NODE_MEMREQ: {
            /* Evaluate x only once, so a failing th_memreq() is not called a second time. */
            #define CHECK(x) do { int rc_ = (x); if (rc_) { return rc_; } } while (0)
            PTR_INT *p_memreqs = (PTR_INT *)data;
            CHECK(th_memreq((void **)&(p_memreqs[0]), sizeof(ee_abf_f32_params_t), ee_mem_type_static, ee_mem_speed_critical));
            CHECK(th_memreq((void **)&(p_memreqs[1]), sizeof(ee_abf_f32_mem_t), ee_mem_type_static, ee_mem_speed_critical));
            CHECK(th_memreq((void **)&(p_memreqs[2]), sizeof(ee_abf_f32_static_t), ee_mem_type_static, ee_mem_speed_critical));
            CHECK(th_memreq((void **)&(p_memreqs[3]), sizeof(ee_abf_f32_working_t), ee_mem_type_working, ee_mem_speed_critical));
            CHECK(th_memreq((void **)&(p_memreqs[4]), HANNING_SIZE * sizeof(ee_f32_t), ee_mem_type_static, ee_mem_speed_critical));
            CHECK(th_memreq((void **)&(p_memreqs[5]), ROTATION_SIZE * sizeof(ee_f32_t), ee_mem_type_working, ee_mem_speed_critical));

            break;
        }
    }

If we try to allocate a block, we still don't have control over where the 2D arrays are placed because the structures need to be allocated iteratively:

typedef struct
{
    ee_f32_t states_BM_ADF[NFFT / 2 + 1][LEN_BM_ADF * 2];
    ee_f32_t coefs_BM_ADF[NFFT / 2 + 1][LEN_BM_ADF * 2];
    ee_f32_t Norm_out_BM[NFFT / 2 + 1];
    ee_f32_t lookBF_out[NFFT / 2 + 1];
    ee_f32_t GSC_det_avg;
    uint8_t  adptBF_coefs_update_enable;
} ee_abf_f32_mem_t;

We cannot simply allocate a block here because the first two 2D arrays are not guaranteed to be contiguous. C 2D arrays are not memory friendly, so to accomplish this we would need to allocate a 1D array and re-write the code to compute the linear index of this new array. :)

Second problem, we would need to create a th_abf_memreq_init() function for every component so that the developer can pick and choose which structures to place in various memory types. This is identical to playing with the linker map, except dynamically.

Third problem, this still doesn't account for the XDAIS buffers we create in between the components. There may be benefit by moving around the XDAIS structs in different memories as well.

I was hoping to give it a try, implementing the correct way to manage allocations, but programmatically it seems prohibitively complex based on how we built the components. I'm willing to accept that I may have made a gross error in my architecture assumptions somewhere. :)

petertorelli commented on August 19, 2024

Looking at your code, I completely forgot how 2D arrays are stored in memory. I forgot it was a contiguous block of memory; for some reason I thought it was an array of pointers to arrays. Whoops.
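
The layout point can be checked with a small sketch. The struct below is a hypothetical stand-in for a structure like ee_abf_f32_mem_t, with made-up dimensions: a C 2D array is one contiguous row-major block with no hidden pointers, so a single allocation of `sizeof` the struct covers every array inside it.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 3 /* hypothetical dimensions, not the real NFFT or LEN_BM_ADF values */
#define COLS 4

/* Stand-in for a structure that contains 2D arrays. */
typedef struct {
    float a[ROWS][COLS];
    float b[ROWS][COLS];
    float tail[ROWS];
} demo_mem_t;

/* Compile-time check: the 2D array occupies exactly ROWS * COLS floats. */
static_assert(sizeof(((demo_mem_t *)0)->a) == ROWS * COLS * sizeof(float),
              "2D array is a single contiguous block");

/* Returns 1 when element [r][c] sits at byte offset (r * COLS + c) * sizeof(float)
 * from the start of the array, i.e. when the layout is row-major and contiguous. */
int demo_is_row_major(const demo_mem_t *m, size_t r, size_t c)
{
    const char *start = (const char *)m->a;
    const char *elem  = (const char *)&m->a[r][c];
    return (size_t)(elem - start) == (r * COLS + c) * sizeof(float);
}
```

For any in-range `r` and `c` the function returns 1, which is why one request of `sizeof(demo_mem_t)` bytes is enough and no rewrite to manual linear indexing is needed.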

I also see you allocated everything as "fastest" memory type, rather than applying any other constraints. The structs are named static and working, but no distinction is made in the current version, so I will assume everything is just fastest mem type and ignore static/working. That makes everything a LOT easier.

The MEMREQ still comes from the ~28K of heap memory compiled in audiomark.c. This will still need a th_memreq function, but first I will fix KWS memory and worry about the allocator AFTER the PR is merged.

Thanks!!!

petertorelli commented on August 19, 2024

Ok, merged from PR into main. Since you already computed the AEC and ANR requests, I will use those and bypass the speex_alloc() functions.

petertorelli commented on August 19, 2024

Fixed in PR #26.
