How to move the FreeRTOS heap onto STM32's CCM RAM

Scott

Many STM32 microcontrollers have a bank of core-coupled memory (CCM) with a dedicated connection to the core. The core can access data in the CCM with zero wait states. On parts like the STM32F4, the CCM is not attached to the main bus matrix at all, so DMA controllers and other bus masters never compete with the CPU for it. This offers potential performance improvements when other peripherals are contending for the main memory bus (e.g. during DMA transfers to and from the regular SRAM).

STM32 memory bus matrix diagram from STM32F429 datasheet

In situations where the memory is 'lightly loaded', such as running some simple tasks alongside I2C/UART buffer handling, I've struggled to reliably measure a performance difference.

However, in situations where memory access patterns are more intricate and the DMA peripherals are vital, e.g. handling high-speed ADC streams or moving I2S audio around, I've seen meaningful reductions in task timing jitter and in the execution time of specific hot loops in background tasks.

So the obvious choice is to use the CCM wherever possible, ideally with hot data that's heavily accessed for the largest potential improvement. As part of this search for performance improvements, I wanted to put my FreeRTOS heap in CCM.

Unfortunately, most Google/Stack Overflow results for moving the FreeRTOS heap are out of date or incorrect, so hopefully this post saves someone some time!

The FreeRTOS docs now have a small note covering custom heap locations in the customisation section.

Requirements

Before configuring FreeRTOS, I'll quickly skim over setting up a CCM RAM section in the linker, how to manually position variable allocations, and how to validate the change.

Linker Region Setup

For the STM32 parts I'm using, the 64 KB CCM region starts at 0x10000000.

For GCC, we configure the CCMRAM region in the linker file:

/* Specify the memory areas */
MEMORY
{
  FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 512K
  RAM (xrw)   : ORIGIN = 0x20000000, LENGTH = 192K
  CCMRAM (rw) : ORIGIN = 0x10000000, LENGTH = 64K
}
SECTIONS
{
  /* ... */
  _siccmram = LOADADDR(.ccmram);
  /* CCM-RAM section
   *
   * IMPORTANT NOTE!
   * This section is NOLOAD: its contents are not stored in flash and are
   * not initialized by the startup code. Only place uninitialized data
   * (such as the FreeRTOS heap) here. Initialized variables would need a
   * loadable section plus startup code that copies the init-values.
   */
  .ccmram (NOLOAD) :
  {
    . = ALIGN(4);
    _sccmram = .;       /* create a global symbol at ccmram start */
    *(.ccmram)
    *(.ccmram*)
    . = ALIGN(4);
    _eccmram = .;       /* create a global symbol at ccmram end */
  } >CCMRAM AT> FLASH
  /* ... */
}

For IAR, the linker file needs to define the section and location in the .icf:

/* Define the 64 KB CCM RAM memory region. Adjust these values to match your specific microcontroller. */
define region CCMRAM_region = mem:[from 0x10000000 to 0x1000FFFF];
/* Place the section in the memory region */
place in CCMRAM_region { section .ccmram };

For Keil MDK, you need a scatter file, and can add the extra execution region inside the existing load region like this:

IRAM2 0x10000000 0x00010000 {
  .ANY (.ccmram)
}

Manually annotating variables

You need to instruct the compiler and linker to put statically allocated variables in the right memory section. The syntax differs subtly by toolchain, but the effect is the same.

Compiler    Magic incantation
GCC         __attribute__((section(".ccmram")))
IAR         #pragma location = ".ccmram"    (on the line before the declaration)
Keil MDK    __attribute__((section(".ccmram")))
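To keep call sites identical across toolchains, the table can be folded into a single macro. This is a sketch under assumptions: CCMRAM is a hypothetical macro name, it relies on IAR accepting the standard C99 _Pragma form of #pragma location, and on __ICCARM__ being IAR's predefined macro for Arm targets.

```c
#include <stdint.h>

/* Hypothetical helper macro: place the following object in the .ccmram section.
 * IAR (__ICCARM__) uses the _Pragma form of "#pragma location"; GCC and Keil MDK
 * accept the GNU attribute at the start of the declaration. */
#if defined(__ICCARM__)
#define CCMRAM _Pragma("location=\".ccmram\"")
#else
#define CCMRAM __attribute__((section(".ccmram")))
#endif

/* Usage: the macro goes in front of the declaration. No initializer here,
 * because a NOLOAD section never receives initial values at startup. */
CCMRAM uint32_t sample_buffer[10];
```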

As a usage example for GCC and MDK,

uint32_t sample_buffer[10] __attribute__((section(".ccmram"))) = { ... };

Be aware that initialized variables need extra care when positioned this way. With a NOLOAD section like the GCC one above, their initial values are never written to CCM at all; with a loadable section, the startup code still has to be extended to copy the values from flash into CCM.
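If you do want initialized variables in CCM, the usual approach mirrors the .data copy loop in the startup code. A minimal sketch, assuming the section is made loadable (drop (NOLOAD), keep AT> FLASH) so that _siccmram, _sccmram and _eccmram from the GCC linker script mark the flash image and the CCM destination; ccm_copy_init is a hypothetical helper name. Note that a loadable section also stores any uninitialized data in it (such as a heap) in flash, which is why NOLOAD suits the heap use-case.

```c
#include <stdint.h>

/* Copy the initial values of a loadable .ccmram section from its load address
 * in flash (src) to its run address in CCM [dst, dst_end), one 32-bit word at
 * a time. Assumes all three addresses are word aligned. */
void ccm_copy_init(uint32_t *dst, const uint32_t *dst_end, const uint32_t *src)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* Hypothetical call site, early in the startup sequence (for example next to
 * the existing .data copy), before any CCM variable is read:
 *
 *   extern uint32_t _siccmram, _sccmram, _eccmram;
 *   ccm_copy_init(&_sccmram, &_eccmram, &_siccmram);
 */
```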

For the use-case of providing a pool of memory for the heap allocator, no data needs to be copied into the CCM section, so a NOLOAD section is all we need.

Compile-time memory statistics

This is optional, but a good general flag to use during development and helpful for checking that the changes are doing what we expect. This section only covers GCC.

Configure the linker to enable --print-memory-usage.

For my CMake project, it looks like this:

target_link_options(${PROJ_NAME} PRIVATE -Wl,--print-memory-usage)

Once configured, the build output should show a more detailed flash and memory usage table:

Memory region         Used Size  Region Size  %age Used
           FLASH:      122260 B       512 KB     23.32%
             RAM:       69928 B       192 KB     35.57%
          CCMRAM:           0 B        64 KB      0.00%
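The %age Used column is simply Used Size divided by Region Size, where KB means 1024 bytes. A small sketch that reproduces the figures (usage_hundredths is a hypothetical helper; it rounds to the nearest hundredth of a percent, which matches the values printed here):

```c
#include <stdint.h>

/* Percentage of a memory region in use, in hundredths of a percent and
 * rounded to nearest (3557 means 35.57%). region_kb is in units of
 * 1024 bytes, matching the "Region Size" column. */
uint32_t usage_hundredths(uint32_t used_bytes, uint32_t region_kb)
{
    uint64_t region_bytes = (uint64_t)region_kb * 1024u;
    return (uint32_t)(((uint64_t)used_bytes * 10000u + region_bytes / 2u) / region_bytes);
}
```

For example, usage_hundredths(69928, 192) gives 3557, the 35.57% shown for RAM.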

FreeRTOS Configuration

I'm using FreeRTOS V10.4.1 with heap_4. We can enable the custom heap in FreeRTOSConfig.h by adding:

#define configAPPLICATION_ALLOCATED_HEAP 1
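For context, a sketch of the two relevant FreeRTOSConfig.h lines. configTOTAL_HEAP_SIZE is the standard FreeRTOS option that still sets the size of the heap array; the 48 KB value here is an assumption inferred from the 0xc000-byte ucHeap entry in the map file excerpt later in this post, so size it to your own budget.

```c
#include <stddef.h> /* for size_t; FreeRTOS.h normally provides this */

/* FreeRTOSConfig.h (excerpt) */

/* The application, not heap_4.c, defines the ucHeap array. */
#define configAPPLICATION_ALLOCATED_HEAP 1

/* Size of ucHeap in bytes. Assumed 48 KB; it must fit inside the 64 KB
 * CCMRAM region together with anything else placed in .ccmram. */
#define configTOTAL_HEAP_SIZE ((size_t)(48 * 1024))
```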

Then, somewhere sensible in the codebase, define the heap array itself. heap_4.c expects it to be called ucHeap and sized by configTOTAL_HEAP_SIZE, and we place it in our CCMRAM section.

I decided that putting this in my main.c near the various startup calls made the most sense, and wrapping it with the same flag makes it easier to toggle as well.

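#include "FreeRTOS.h"
#include "task.h"
// app_startup_init() and app_startup_tasks() come from the project's own startup module.

// Provide the heap storage manually to control placement location.
// Enabled via configAPPLICATION_ALLOCATED_HEAP in FreeRTOSConfig.h. heap_4.c uses the
// same "== 1" test, so setting the flag to 0 cleanly falls back to its own ucHeap.
#if ( configAPPLICATION_ALLOCATED_HEAP == 1 )
uint8_t ucHeap[ configTOTAL_HEAP_SIZE ] __attribute__((section(".ccmram")));
#endif
/* -------------------------------------------------------------------------- */
int main(void)
{
    app_startup_init();
    app_startup_tasks();
    vTaskStartScheduler();
    // Only reached if the scheduler failed to start (e.g. not enough heap for the idle task)
    for (;;);
    return 0;
}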
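The linker will already fail with a region-overflow error if .ccmram grows past 64 KB, but a compile-time check next to the heap definition gives a clearer message. A sketch, assuming CCMRAM_SIZE_BYTES is a hypothetical constant kept in sync with the CCMRAM LENGTH in the linker script; the fallback configTOTAL_HEAP_SIZE only exists so the snippet compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in so this sketch compiles alone; normally from FreeRTOSConfig.h. */
#ifndef configTOTAL_HEAP_SIZE
#define configTOTAL_HEAP_SIZE ((size_t)(48 * 1024))
#endif

/* Hypothetical constant mirroring "CCMRAM ... LENGTH = 64K" in the linker script. */
#define CCMRAM_SIZE_BYTES ((size_t)(64 * 1024))

/* Fail the build with a readable message if the heap alone cannot fit in CCM.
 * Other .ccmram objects still count against the region, so the linker check
 * remains the final word. */
_Static_assert(configTOTAL_HEAP_SIZE <= CCMRAM_SIZE_BYTES,
               "configTOTAL_HEAP_SIZE does not fit in the 64 KB CCM RAM region");
```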

Validation

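As expected, the heap has moved to a different memory section, which frees up a larger chunk of 'normal' SRAM for other uses. One consequence to keep in mind: everything allocated with pvPortMalloc(), including dynamically created task stacks and queue storage, now lives in CCM, so on STM32F4 parts none of it can be used as a DMA buffer.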

Memory region         Used Size  Region Size  %age Used
           FLASH:      122260 B       512 KB     23.32%
             RAM:       20776 B       192 KB     10.57%
          CCMRAM:       49480 B        64 KB     75.50%

We can validate this by looking at the .map file for allocations with addresses in the 0x10000000 area.

If GCC isn't generating map files already, pass -Map=filename.map to the linker (-Wl,-Map=filename.map when linking through the gcc driver). With CMake, this looks like:

target_link_options(${PROJ_NAME} PUBLIC LINKER:-Map=${PROJ_NAME}.map)

By looking in the map file for 0x10000000, I was able to find the ucHeap entry and also my pubsub singleton object that's declared in my startup code.

.ccmram 0x10000000 0xc000 CMakeFiles/delta-control.dir/src/main.c.obj
0x10000000 ucHeap
.ccmram 0x1000c000 0x148 src/application/app_startup/libapp_startup_lib.a(app_startup.c.obj)
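The map excerpt and the two usage tables agree with each other. As a worked check, written as compile-time asserts (the constant names are illustrative; the values are copied from the output above): the two .ccmram entries add up to exactly the 49480 B reported for CCMRAM, and the drop in RAM usage equals the 0xc000-byte (48 KB) ucHeap.

```c
#include <stdint.h>

/* Figures copied from the map excerpt and the --print-memory-usage tables. */
enum {
    MAP_UCHEAP_BYTES      = 0xc000, /* ucHeap, from main.c.obj */
    MAP_APP_STARTUP_BYTES = 0x148,  /* .ccmram contribution of app_startup.c.obj */
    CCM_USED_AFTER        = 49480,
    RAM_USED_BEFORE       = 69928,
    RAM_USED_AFTER        = 20776,
};

/* Every byte reported in CCMRAM is accounted for by the two map entries. */
_Static_assert(MAP_UCHEAP_BYTES + MAP_APP_STARTUP_BYTES == CCM_USED_AFTER,
               "CCM usage should equal the sum of the .ccmram map entries");

/* RAM usage dropped by exactly the size of the heap array. */
_Static_assert(RAM_USED_BEFORE - RAM_USED_AFTER == MAP_UCHEAP_BYTES,
               "RAM saving should equal the heap size");
```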

When I inspect the firmware with a debugger, I get debug information that shows the heap allocations are in the right region:

CLion debugger view of FreeRTOS's heap allocations
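Besides the debugger, a cheap runtime check is to compare the address returned by pvPortMalloc() against the CCM range. A sketch: addr_in_ccm and the CCM_* macros are hypothetical names using the 0x10000000 / 64 KB layout from this post, while pvPortMalloc(), vPortFree() and configASSERT() are the standard FreeRTOS calls.

```c
#include <stdbool.h>
#include <stdint.h>

/* CCM RAM bounds for the parts in this post: 64 KB starting at 0x10000000.
 * Illustrative names; adjust for your device. */
#define CCM_BASE_ADDR  0x10000000u
#define CCM_SIZE_BYTES 0x00010000u

/* True if the address lies inside the CCM block [base, base + size). */
bool addr_in_ccm(uintptr_t addr)
{
    return addr >= (uintptr_t)CCM_BASE_ADDR &&
           addr < (uintptr_t)CCM_BASE_ADDR + (uintptr_t)CCM_SIZE_BYTES;
}

/* Hypothetical usage on target, e.g. in a task after the scheduler starts:
 *
 *   void *p = pvPortMalloc(32);
 *   configASSERT(p != NULL && addr_in_ccm((uintptr_t)p));
 *   vPortFree(p);
 */
```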

Job done!

As mentioned in the opening background info, measuring the performance impact of these kinds of changes can be far from trivial unless you have the right kind of workload: frequent, long-running memory access patterns alongside background DMA transfers should benefit the most from the reduced bus contention.