Ideally, the software component provider asks the framework to allocate memory, tagging each request with its requirements: static/working, program/data, speed, and alignment.
Ideally, the framework allocates separate banks for these and "negotiates" which components get access to the fast ones. This can be complex:
either you provide a tool that lets the system integrator manually give specific components priority access to critical memory,
or you let the framework decide from the information it receives from the components (a declaration of their estimated complexity).
The framework will manage the optional clearing of working memory banks when switching to a new component.
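A minimal sketch of the bank "negotiation" described above, assuming just two banks and a simple policy: a request for fast memory falls back to the slower bank when the fast one is full. The bank names, sizes, and `bank_alloc()` are hypothetical and not part of AudioMark; on a real part the two arrays would be placed in internal SRAM and external RAM by the linker script.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bank identifiers, ordered from fastest to slowest. */
typedef enum { BANK_FAST = 0, BANK_NORMAL = 1, BANK_COUNT = 2 } bank_id_t;

typedef struct
{
    uint8_t *base; /* start of the bank */
    size_t   size; /* total bytes in the bank */
    size_t   used; /* bytes already handed out (bump pointer) */
} bank_t;

/* Stand-ins for internal SRAM and external RAM; sizes are arbitrary. */
static uint8_t fast_ram[256];
static uint8_t normal_ram[1024];

static bank_t banks[BANK_COUNT] = {
    { fast_ram, sizeof(fast_ram), 0 },
    { normal_ram, sizeof(normal_ram), 0 },
};

/* Allocate n bytes aligned to `align` (a power of two). The preferred bank
   is tried first, then each slower bank in turn. Returns NULL if no bank
   can satisfy the request. Nothing is ever freed. */
static void *bank_alloc(size_t n, bank_id_t preferred, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    for (int b = (int)preferred; b < BANK_COUNT; b++)
    {
        uintptr_t start   = (uintptr_t)banks[b].base;
        uintptr_t next    = start + banks[b].used;
        uintptr_t aligned = (next + (align - 1)) & ~(uintptr_t)(align - 1);
        size_t    offset  = (size_t)(aligned - start);
        if (offset <= banks[b].size && n <= banks[b].size - offset)
        {
            banks[b].used = offset + n;
            return banks[b].base + offset;
        }
    }
    return NULL;
}
```

With these sizes, a first 200-byte fast request lands in `fast_ram`, and a second 100-byte fast request no longer fits there and silently falls back to `normal_ram`. A real integrator tool would likely want to report such fallbacks rather than hide them.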
Other notes
- Some architectures require DMA memory to be in a specific internal SRAM.
- We should let the DMA buffer be allocated either by the device driver or by the application.
The code we talked about before:
enum mem_mapping_type
{
    mem_type_static = 0,          /* (LSB) memory content is preserved (default) */
    mem_type_working = 1,         /* scratch memory: content is not preserved between two calls */
    mem_type_pseudo_working = 2,  /* static only during the uncompleted execution state of the SWC, see "NODE_RUN" */
    mem_type_periodic_backup = 3, /* parameters to reload for warm boot after a crash, holding for example
                                     long-term estimators. This memory area is cleared at cold NODE_RESET and
                                     refreshed for warm NODE_RESET. The SWC should not reset it (there is
                                     no "warm-boot reset" entry point) */
};
enum mem_speed_type /* memory requirements associated to enum memory_banks */
{
    mem_speed_req_any = 0,           /* best effort */
    mem_speed_req_normal = 1,        /* can be external memory */
    mem_speed_req_fast = 2,          /* will be internal SRAM when possible */
    mem_speed_req_critical_fast = 3, /* will be TCM when possible */
};
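To make the tagging concrete, here is a sketch of how a component might publish its requests as a table built from these enums. An integrator (or a tool) can then total how much memory each kind of bank needs. The `mem_request_t` struct, the table contents, and `total_bytes()` are hypothetical; only the enum values are copied from the code above.

```c
#include <stddef.h>

/* Copied from the enums above. */
enum mem_mapping_type { mem_type_static = 0, mem_type_working = 1,
                        mem_type_pseudo_working = 2, mem_type_periodic_backup = 3 };
enum mem_speed_type   { mem_speed_req_any = 0, mem_speed_req_normal = 1,
                        mem_speed_req_fast = 2, mem_speed_req_critical_fast = 3 };

/* Hypothetical request descriptor a component could publish. */
typedef struct
{
    size_t                nbytes;
    size_t                alignment;
    enum mem_mapping_type type;
    enum mem_speed_type   speed;
} mem_request_t;

/* Example table for an imaginary component; the sizes are made up. */
static const mem_request_t demo_requests[] = {
    { 1024, 8, mem_type_static,  mem_speed_req_critical_fast },
    { 4096, 4, mem_type_working, mem_speed_req_fast },
    { 8192, 4, mem_type_static,  mem_speed_req_normal },
    { 512,  4, mem_type_working, mem_speed_req_fast },
};

/* Sum the bytes requested for one (type, speed) pair, ignoring alignment padding. */
static size_t total_bytes(const mem_request_t *req, size_t count,
                          enum mem_mapping_type type, enum mem_speed_type speed)
{
    size_t sum = 0;
    for (size_t i = 0; i < count; i++)
    {
        if (req[i].type == type && req[i].speed == speed)
        {
            sum += req[i].nbytes;
        }
    }
    return sum;
}
```

For this table, the fast working bank must hold 4096 + 512 = 4608 bytes.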
from audiomark.
I don't understand why the memory manager needs to know all of this, because it has no control over it, unless the developer provides the memory manager component. Wouldn't the XDAIS structure simply provide enumerated pointers and leave it up to the component to remember what each pointer means?
Let us imagine an SoC that implements this benchmark with a DSP accelerator peripheral.
The MCU must run the benchmark and provide the L/R/Downlink buffers to the pipeline.
The ABF and AEC run on the DSP, the ANR and KWS run on the MCU.
A shared buffer is needed to get the L/R data to the ABF, and the Downlink to the AEC. These 3 buffers live in MCU SRAM.
The ABF allocates, locally, one buffer as scratch memory and another buffer to pass to the AEC. These are not in MCU SRAM, but in the DSP's own peripheral space.
The scratch buffer is never needed by anything else. It does not need an XDAIS pointer; it can be global. Is this a violation?
The output from the AEC needs to be in MCU SRAM, so the AEC component needs to allocate that buffer, and it has meaning to the MCU because that buffer is then read by the ANR.
The ABF and AEC can allocate their own scratch memory pointers, because no other component will ever see them, so it doesn't make sense to use XDAIS for them. This means we need th_abf_init and th_aec_init functions where this memory can be allocated. These pointers are passed by the MCU in XDAIS structures because there is no "extern" linkage between the ABF and AEC files, since they are of type "ee_*".
The input buffers to the ABF for L/R and AEC for downlink are allocated by the ee_audiomark.c application MCU process and passed as XDAIS structures.
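A sketch of the "enumerated pointers" idea for this scenario. It assumes the application (for example ee_audiomark.c) owns the L/R input buffers and the output buffer, and the component only receives them through a descriptor. The slot names, `abf_io_t`, and `abf_process()` are hypothetical, and the averaging is a placeholder, not the AudioMark beamformer.

```c
#include <stddef.h>

/* Hypothetical slot indices: the component only remembers what each index means. */
enum abf_slot { ABF_SLOT_LEFT = 0, ABF_SLOT_RIGHT = 1, ABF_SLOT_OUT = 2, ABF_SLOT_COUNT = 3 };

/* Hypothetical XDAIS-style descriptor filled in by the application. */
typedef struct
{
    float *buf[ABF_SLOT_COUNT]; /* application-owned buffers */
    size_t nsamples;            /* samples per buffer */
} abf_io_t;

/* Placeholder processing: averages L and R. It only shows that the component
   reads and writes through the pointers it was given and never allocates them. */
static void abf_process(const abf_io_t *io)
{
    for (size_t i = 0; i < io->nsamples; i++)
    {
        io->buf[ABF_SLOT_OUT][i] =
            0.5f * (io->buf[ABF_SLOT_LEFT][i] + io->buf[ABF_SLOT_RIGHT][i]);
    }
}
```

Because the component never sees where the buffers come from, the application is free to place them in MCU SRAM, in a shared region, or anywhere else.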
But I don't understand WHO owns the allocation of the shared buffer between the AEC and the ANR. (In fact, the ANR is currently coded to use an in-place buffer, which is a problem.)
It is not clear to me who writes the memory management framework.
I think each component should have its own allocator as a th_ function, so that the developer can pick and choose which XDAIS buffers are needed, and we would NOT need any of the enum values you listed, because we do not provide an allocator that can do anything meaningful with these parameters.
from audiomark.
"edit"
My business motivation is to prepare the arrival of a software component (SWC) "store".
Each SWC is developed by a separate company and delivered in binary format, signed with a specific key (depending on contracts, architectures, ...). An SWC cannot anticipate whether it will run on a multiprocessor system with external/internal RAM. It can only tell the system integrator, for example, that a certain amount of memory should be reserved in fast RAM (if possible) and a larger amount without any speed constraint.
I remember that, in May 2022, committee members were pushing back against introducing complexity into the framework. At that time my focus was to get acceptance of the single-entry subroutine format with "XDAIS" access to buffers. I am now working hard to have the full set of features released in "CMSIS-LINK" early next year. And yes, this introduces CPU load overhead, which is outside the initial scope of audioMark.
"end edit"
The ABF receives a pointer to the base address of the L/R samples, in and out.
The framework allocates the buffers between software components.
The ABF does not define where those I/O buffers are, or whether a buffer is static or working.
The ABF should declare whether it modifies the input buffer (today we assume it does).
The ABF (and other components) should declare whether the processing is "in-place", and whether memory instances are relocatable between two calls (this option exists but was never implemented in TI's XDAIS).
The ABF declares static/working RAM only for its own instance; later it will add the speed constraints, and the framework can ignore those details and implement a single flat memory bank for everything.
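The declarations listed above could be carried by a small descriptor. This sketch (hypothetical `swc_caps_t` and `io_buffers_needed()`) shows one way a framework might use the in-place and modifies-input flags to decide how many frame buffers to allocate around one component. The counting rule is an assumption for illustration, not an AudioMark or XDAIS rule.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical properties a component could declare to the framework. */
typedef struct
{
    bool   in_place;       /* output overwrites the input buffer */
    bool   modifies_input; /* input buffer may be used as scratch */
    bool   relocatable;    /* instance memory may move between two calls */
    size_t static_bytes;   /* instance memory that must be preserved */
    size_t working_bytes;  /* scratch memory, not preserved between calls */
} swc_caps_t;

/* Number of frame-sized buffers the framework must provide for one stage.
   input_shared is true when another consumer still needs the original input. */
static int io_buffers_needed(const swc_caps_t *caps, bool input_shared)
{
    int n = caps->in_place ? 1 : 2; /* input, plus a separate output if not in-place */
    if (input_shared && (caps->in_place || caps->modifies_input))
    {
        n += 1; /* private copy, so the other reader still sees the original input */
    }
    return n;
}
```

For example, an in-place component whose input is also read by another stage needs two buffers: the original input and a private copy that it overwrites.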
from audiomark.
Here's a proposal that I could implement:
- A function called ee_status_t th_memreq(uint nbytes, enum mem_type, enum speed_type, void **ptr). The developer implements this function. It returns pass/fail and, on pass, fills in ptr with the address of the memory location. We leave it up to the developer to implement this function. EEMBC will tell them what sizes to expect, since these are known ahead of time. There are numerous ways to implement it: pre-allocate from the heap like we do now, but with different types of memory placed by the linker file; or use raw addresses, keep a counter, and use the linker map to set those addresses. I don't know the best way for each platform; it is up to the developer. On a positive note: we'll know the memory request sizes in advance, and there is no need for a free() function.
- Every component needs to be scrutinized and must have ZERO bytes of global memory: stack only, with memreq allocations. Since every component has dozens of globals, how do we manage this? Do only arrays need to be memreq'd? I can't imagine having to memreq every single pointer or counter that is a global variable, especially with libspeex. I assume the memreq is only for buffers and arrays, and not for structs or counters?
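One of the implementation options listed above (pre-allocated pools placed by the linker file, plus a counter) could look like the sketch below. The `ee_status_t`, `ee_mem_type_t`, and `ee_mem_speed_t` definitions, the pool sizes, and the fixed 8-byte alignment are assumptions; only the name and parameter order of `th_memreq` come from the proposal, with `uint` written as `unsigned int`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types standing in for the ones named in the proposal. */
typedef enum { EE_STATUS_OK = 0, EE_STATUS_ERROR = 1 } ee_status_t;
typedef enum { ee_mem_type_static = 0, ee_mem_type_working = 1 } ee_mem_type_t;
typedef enum { ee_mem_speed_normal = 0, ee_mem_speed_fast = 1, ee_mem_speed_count = 2 } ee_mem_speed_t;

#define TH_MEMREQ_ALIGN 8u /* assumed alignment for every request */

/* One pre-allocated pool per speed class. A port could move each array into
   a different memory region with its linker file (sizes are made up here). */
static _Alignas(8) uint8_t pool_normal[16384];
static _Alignas(8) uint8_t pool_fast[4096];

static uint8_t *const pool_base[ee_mem_speed_count] = { pool_normal, pool_fast };
static const size_t   pool_size[ee_mem_speed_count] = { sizeof(pool_normal), sizeof(pool_fast) };
static size_t         pool_used[ee_mem_speed_count];

/* Bump allocator: requests are never freed, so a counter per pool is enough.
   mem_type is accepted but ignored in this sketch. */
ee_status_t th_memreq(unsigned int nbytes, ee_mem_type_t mem_type, ee_mem_speed_t speed, void **ptr)
{
    (void)mem_type;
    if (ptr == NULL || (unsigned)speed >= (unsigned)ee_mem_speed_count)
    {
        return EE_STATUS_ERROR;
    }
    size_t offset = (pool_used[speed] + (TH_MEMREQ_ALIGN - 1)) & ~(size_t)(TH_MEMREQ_ALIGN - 1);
    if (offset > pool_size[speed] || nbytes > pool_size[speed] - offset)
    {
        return EE_STATUS_ERROR;
    }
    *ptr = pool_base[speed] + offset;
    pool_used[speed] = offset + nbytes;
    return EE_STATUS_OK;
}
```

Because the request sizes are known in advance, a port can size each pool exactly and treat any `EE_STATUS_ERROR` as a configuration bug rather than a runtime condition.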
from audiomark.
From experience, you need at least 3 memory banks: fast+static, fast+working, slow+static.
You are proposing more combinations, like slow+working, which is fine, and that is the system integrator's responsibility.
"ZERO bytes of global memory" is a must if you want to create several instances of the same component.
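The reason is easy to see in a small sketch. If every piece of state lives in an instance structure that the caller provides (for example, memory obtained through a memreq call), nothing is shared between instances. The `smoother_t` component below is a hypothetical stand-in, not an AudioMark component.

```c
#include <stddef.h>

/* All state lives in the instance struct; the component has no globals,
   so two instances can run side by side without interfering. */
typedef struct
{
    float alpha; /* smoothing factor in (0, 1] */
    float state; /* last output */
} smoother_t;

static void smoother_init(smoother_t *inst, float alpha)
{
    inst->alpha = alpha;
    inst->state = 0.0f;
}

/* One-pole smoother: y[n] = y[n-1] + alpha * (x[n] - y[n-1]) */
static float smoother_step(smoother_t *inst, float x)
{
    inst->state += inst->alpha * (x - inst->state);
    return inst->state;
}
```

One possible answer to the earlier question about counters: small scalars can sit inside the same instance struct as the arrays, so each instance needs only a single request instead of one per variable.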
from audiomark.
I just spent an hour trying to prototype what this might look like, and it is nontrivial.
First problem: the structs required by BF static or working areas are structs of array pointers. I tried the naive approach and instantly recognized the problem:
#include <stdlib.h> /* malloc */

/* Naive first attempt: every request comes from the C heap, so type and
   speed are accepted but ignored. Returns 0 on success, 1 on failure. */
int
th_memreq(void **pp_dst, unsigned long n, ee_mem_type_t type, ee_mem_speed_t speed)
{
    (void)type;
    (void)speed;
    *pp_dst = malloc(n);
    if (!*pp_dst)
    {
        return 1;
    }
    return 0;
}

/* ... */

switch (command)
{
    case NODE_MEMREQ: {
/* Evaluate x exactly once and propagate any non-zero error code. */
#define CHECK(x) do { int rc_ = (x); if (rc_) { return rc_; } } while (0)
        PTR_INT *p_memreqs = (PTR_INT *)data;
        CHECK(th_memreq((void **)&(p_memreqs[0]), sizeof(ee_abf_f32_params_t), ee_mem_type_static, ee_mem_speed_critical));
        CHECK(th_memreq((void **)&(p_memreqs[1]), sizeof(ee_abf_f32_mem_t), ee_mem_type_static, ee_mem_speed_critical));
        CHECK(th_memreq((void **)&(p_memreqs[2]), sizeof(ee_abf_f32_static_t), ee_mem_type_static, ee_mem_speed_critical));
        CHECK(th_memreq((void **)&(p_memreqs[3]), sizeof(ee_abf_f32_working_t), ee_mem_type_working, ee_mem_speed_critical));
        CHECK(th_memreq((void **)&(p_memreqs[4]), HANNING_SIZE * sizeof(ee_f32_t), ee_mem_type_static, ee_mem_speed_critical));
        CHECK(th_memreq((void **)&(p_memreqs[5]), ROTATION_SIZE * sizeof(ee_f32_t), ee_mem_type_working, ee_mem_speed_critical));
        break;
    }
    /* ... */
}
If we try to allocate a block, we still don't have control over where the 2D arrays are placed because the structures need to be allocated iteratively:
typedef struct
{
ee_f32_t states_BM_ADF[NFFT / 2 + 1][LEN_BM_ADF * 2];
ee_f32_t coefs_BM_ADF[NFFT / 2 + 1][LEN_BM_ADF * 2];
ee_f32_t Norm_out_BM[NFFT / 2 + 1];
ee_f32_t lookBF_out[NFFT / 2 + 1];
ee_f32_t GSC_det_avg;
uint8_t adptBF_coefs_update_enable;
} ee_abf_f32_mem_t;
We cannot simply allocate a block here because the first two 2D arrays are not guaranteed to be contiguous. C 2D arrays are not memory friendly, so to accomplish this we would need to allocate a 1D array and re-write the code to compute the linear index of this new array. :)
Second problem: we would need to create a th_abf_memreq_init() function for every component so that the developer can pick and choose which structures to place in which memory types. This is identical to playing with the linker map, except dynamically.
Third problem: this still doesn't account for the XDAIS buffers we create between the components. There may be a benefit to moving the XDAIS structs into different memories as well.
I was hoping to give it a try, implementing the correct way to manage allocations, but programmatically it seems prohibitively complex based on how we built the components. I'm willing to accept that I may have made a gross error in my architecture assumptions somewhere. :)
from audiomark.
Looking at your code, I realize I completely forgot how 2D arrays are stored in memory: they are one contiguous block, but for some reason I thought they were arrays of pointers to arrays. Whoops.
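A quick way to confirm this: a C 2D array such as `states_BM_ADF[NFFT / 2 + 1][LEN_BM_ADF * 2]` is stored row-major in one contiguous block, so element [r][c] sits at linear index r * COLS + c, and the whole struct can be requested with a single allocation of its `sizeof`. The dimensions and `demo_abf_mem_t` below are hypothetical stand-ins for `ee_abf_f32_mem_t`.

```c
#include <stddef.h>

/* Hypothetical small dimensions; the real ones come from NFFT and LEN_BM_ADF. */
#define ROWS 5
#define COLS 4

typedef struct
{
    float states[ROWS][COLS];
    float coefs[ROWS][COLS];
    float norm_out[ROWS];
} demo_abf_mem_t;

/* Byte offset of states[r][c] from the start of the 2D array. */
static size_t elem_offset(const demo_abf_mem_t *m, size_t r, size_t c)
{
    return (size_t)((const char *)&m->states[r][c] - (const char *)&m->states[0][0]);
}
```

Arrays carry no internal padding, so `sizeof(m.states)` is exactly ROWS * COLS * sizeof(float), and the hand-computed linear index gives the same offset the compiler uses.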
I also see you allocated everything as the "fastest" memory type, rather than applying any other constraints. The structs are named static and working, but the current version makes no distinction, so I will assume everything is the fastest memory type and ignore static/working. That makes everything a LOT easier.
The MEMREQ still comes from the ~28K of heap memory compiled into audiomark.c. This will still need a th_memreq function, but first I will fix the KWS memory and worry about the allocator AFTER the PR is merged.
Thanks!!!
from audiomark.
OK, merged the PR into main. Since you already computed the AEC and ANR requests, I will use those and bypass the speex_alloc() functions.
from audiomark.
Fixed in PR #26.
from audiomark.