Kinetis Microcontroller SRAM Region Hard Faults

I am working on a project that uses a K10DX128 microcontroller from Freescale, which is advertised as having 128 KB of flash memory and 16 KB of SRAM. It’s similar to the microcontroller used by the Teensy 3.0 platform. The project uses a lot of dynamically allocated memory because it handles many files inside a file system.

I ran into one of those “sometimes it happens, sometimes it doesn’t happen” bugs that causes a hard fault. Tracing the source of the hard fault led to a few ordinary SRAM store instructions, and it apparently happened halfway through processing the list of files. This made me suspect that the memory was allocated incorrectly, so I checked all the things I should check (the address of the allocation, how much memory I should have, the status of my stack, the linker script, etc.).

I added some verbose debug output of the failure, which looked like

file 1960 memcpy 0 addr 0x1FFFFCC0...done
file 1970 memcpy 1 addr 0x1FFFFCE6...done
file 1980 memcpy 2 addr 0x1FFFFD0C...done
file misc2 memcpy 3 addr 0x1FFFFD32...done
file 1990 memcpy 4 addr 0x1FFFFD58...done
file ABC_YA~1 memcpy 5 addr 0x1FFFFD7E...done
file AMBSTR~1 memcpy 6 addr 0x1FFFFDA4...done
file amoeba memcpy 7 addr 0x1FFFFDCA...done
file commod64 memcpy 8 addr 0x1FFFFDF0...done
file CUBES_~1 memcpy 9 addr 0x1FFFFE16...done
file EXPRIBB memcpy 10 addr 0x1FFFFE3C...done
file finlshot memcpy 11 addr 0x1FFFFE62...done
file GRAF_4KS memcpy 12 addr 0x1FFFFE88...done
file haduken memcpy 13 addr 0x1FFFFEAE...done
file intro memcpy 14 addr 0x1FFFFED4...done
file kickstar memcpy 15 addr 0x1FFFFEFA...done
file KS_HAL~1 memcpy 16 addr 0x1FFFFF20...done
file KS_TOGET memcpy 17 addr 0x1FFFFF46...done
file KSFINA~1 memcpy 18 addr 0x1FFFFF6C...done
file KSHELLOS memcpy 19 addr 0x1FFFFF92...done
file KSNINT memcpy 20 addr 0x1FFFFFB8...done
file lavapixe memcpy 21 addr 0x1FFFFFDE...

 Exception Handler, source: 1
r0: 0x00000000, r1: 0x1FFFFA80, r2: 0x20000002, r3: 0x00000000, r12: 0x20001D8F
LR: 0x0000482F, PC: 0x0000484C, PSR: 0x61000000,

Notice the address 0x1FFFFFDE, which is pretty close to 0x20000000. I double-checked the memory map in the Kinetis reference manual to make sure that 0x20000000 exists. It turns out that the 16 KB of SRAM is divided into two regions: SRAM_L is 8 KB occupying 0x1FFFFFFF and lower, and SRAM_U is 8 KB occupying 0x20000000 and higher.
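As a rough sketch of that memory map, the two ranges can be computed from the total SRAM size, assuming the RAM is split evenly around 0x20000000 as described above. The names `sram_range_t`, `sram_l_range` and `sram_u_range` are made up for illustration and do not come from any Freescale header.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_SPLIT_ADDR 0x20000000u /* first address of SRAM_U */

/* Hypothetical helper type: inclusive address range of one SRAM block. */
typedef struct {
    uint32_t first;
    uint32_t last;
} sram_range_t;

/* SRAM_L is the half of the RAM that ends just below the split address. */
sram_range_t sram_l_range(uint32_t sram_size)
{
    assert(sram_size > 0u && sram_size % 2u == 0u);
    sram_range_t r = { SRAM_SPLIT_ADDR - sram_size / 2u, SRAM_SPLIT_ADDR - 1u };
    return r;
}

/* SRAM_U is the half of the RAM that starts at the split address. */
sram_range_t sram_u_range(uint32_t sram_size)
{
    assert(sram_size > 0u && sram_size % 2u == 0u);
    sram_range_t r = { SRAM_SPLIT_ADDR, SRAM_SPLIT_ADDR + sram_size / 2u - 1u };
    return r;
}
```

For a 16 KB part, `sram_l_range(16u * 1024u)` covers 0x1FFFE000 to 0x1FFFFFFF and `sram_u_range(16u * 1024u)` covers 0x20000000 to 0x20001FFF.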

My linker script declares RAM as a single 16 KB region starting at 0x1FFFE000. It is not aware of this split.

I suspected that the default stdlib memcpy uses 32 bit write operations so that it runs as fast as possible, and that one of those writes crosses the boundary between SRAM_L and SRAM_U. I confirmed this suspicion by running two pieces of code

volatile uint32_t* foo;
foo = (volatile uint32_t*)0x1FFFFFFF; // right at the boundary between SRAM_L and SRAM_U
*foo = 0x12345678; // perform a 32 bit write
// this piece of code causes a hard fault

////////////////////////////////////////////////////////////

volatile uint32_t* foo;
foo = (volatile uint32_t*)(0x1FFFFFFF - 4);
*foo = 0x12345678; // perform a 32 bit write
// this piece of code will run fine

I have narrowed down the problem and I am now almost certain that it is caused by the write to multiple bytes across the boundary between SRAM_L and SRAM_U.
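To make the condition concrete, here is a minimal sketch of a check for whether a byte range straddles the split. The helper name `crosses_sram_split` is hypothetical. Applying it to the debug log assumes each copy is as long as the 38-byte (0x26) stride between the logged addresses, which the log suggests but does not prove.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SRAM_SPLIT_ADDR 0x20000000u /* first address of SRAM_U */

/* Hypothetical helper: true if the byte range [addr, addr + len)
 * has at least one byte in SRAM_L and at least one byte in SRAM_U. */
bool crosses_sram_split(uint32_t addr, uint32_t len)
{
    assert(len <= UINT32_MAX - addr); /* the range must not wrap around */
    return addr < SRAM_SPLIT_ADDR && addr + len > SRAM_SPLIT_ADDR;
}
```

Under that assumption, the copy to 0x1FFFFFDE ends at 0x20000004 and straddles the split, while the previous copy to 0x1FFFFFB8 ends exactly at 0x1FFFFFDE and does not. The same check separates the two test snippets: a 4-byte write at 0x1FFFFFFF crosses, a 4-byte write at 0x1FFFFFFB does not.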

To prove this even further, I replaced the default stdlib memcpy function with my own, which does 8 bit writes instead of 32 bit writes whenever the destination touches the boundary:

#include <stdint.h>
#include <string.h>

void* memcpy_safe(void* dest, const void* src, int cnt)
{
  uintptr_t start = (uintptr_t)dest;
  uintptr_t end = start + (uintptr_t)cnt;
  if (start <= 0x20000000 && end >= 0x20000000) // if memory lies over SRAM_L and SRAM_U boundary
  {
    // volatile pointers force an 8 bit copy instead of an optimized 32 bit copy
    volatile const uint8_t* src_p = (volatile const uint8_t*)src;
    volatile uint8_t* dest_p = (volatile uint8_t*)dest;
    for (int i = 0; i < cnt; i++) {
      dest_p[i] = src_p[i];
    }
    return dest; // same return value as memcpy
  }
  else
  {
    return memcpy(dest, src, cnt);
  }
}

After that, my application stopped crashing. The debug output now looked like

file KSFINA~1 memcpy 18 addr 0x1FFFFF6C...done
file KSHELLOS memcpy 19 addr 0x1FFFFF92...done
file KSNINT memcpy 20 addr 0x1FFFFFB8...done
file lavapixe memcpy 21 addr 0x1FFFFFDE...done
file leon memcpy 22 addr 0x20000004...done
file lilypad memcpy 23 addr 0x2000002A...done
file LOGOAN01 memcpy 24 addr 0x20000050...done
file LOGOAN02 memcpy 25 addr 0x20000076...done

However, this only fixes memcpy. Other operations, such as adding to a struct member, will still cause a hard fault if the struct member sits across both SRAM_L and SRAM_U.

The proper fix would be to ensure that memory is allocated in such a way that no variable sits across both SRAM_L and SRAM_U, by taking the size of the list elements into account during allocation. The size of the elements must also be a multiple of 32 bits so that the default stdlib utilities (such as memcpy, memset, etc.) are safe to use. This wastes some memory because you need to pad your data structures and also move your pointer a bit after allocating it.
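A rough sketch of the arithmetic behind that fix follows. Both helper names (`pad_to_word`, `skip_for_split`) are hypothetical, and it assumes the array base returned by the allocator is already 4-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_SPLIT_ADDR 0x20000000u /* first address of SRAM_U */

/* Hypothetical helper: round an element size up to a multiple of 4 bytes. */
uint32_t pad_to_word(uint32_t size)
{
    assert(size <= UINT32_MAX - 3u);
    return (size + 3u) & ~3u;
}

/* Hypothetical helper: number of bytes to skip at the start of an array
 * at `base` so that one element boundary lands exactly on the split,
 * which means no element has bytes on both sides of it. */
uint32_t skip_for_split(uint32_t base, uint32_t elem_size)
{
    assert(elem_size > 0u);
    if (base >= SRAM_SPLIT_ADDR) {
        return 0u; /* the whole array already lives in SRAM_U */
    }
    return (SRAM_SPLIT_ADDR - base) % elem_size;
}
```

Using the first logged address as a stand-in for the array base: a 38-byte element pads to 40 bytes, and `skip_for_split(0x1FFFFCC0u, 40u)` gives 32. Starting the array at 0x1FFFFCE0 then places exactly 20 elements below 0x20000000. The cost is 2 bytes of padding per element plus the 32 skipped bytes.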

Another fix is to detect when the next allocation would cross the boundary and, if so, waste the memory from the top of the heap to the end of SRAM_L. The next allocation then starts at the start of SRAM_U, making it impossible for any useful data to sit across the boundary between SRAM_L and SRAM_U.
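This second approach can be sketched with a simple bump allocator that works on plain addresses. This is not the allocator my project uses. The type `bump_heap_t` and the function `bump_alloc` are invented for illustration, and the sketch assumes each list element is allocated separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SRAM_SPLIT_ADDR 0x20000000u /* first address of SRAM_U */

/* Hypothetical bump heap: `next` is the next free address,
 * `end` is one past the last usable address. */
typedef struct {
    uint32_t next;
    uint32_t end;
} bump_heap_t;

/* Returns the address of the new block, or 0 if the heap is exhausted. */
uint32_t bump_alloc(bump_heap_t *heap, uint32_t size)
{
    assert(heap != NULL && size > 0u);
    uint32_t start = heap->next;

    /* If the block would straddle the split, give up the tail of SRAM_L
     * and start the block at the beginning of SRAM_U instead. */
    if (start < SRAM_SPLIT_ADDR && size > SRAM_SPLIT_ADDR - start) {
        start = SRAM_SPLIT_ADDR;
    }
    if (start > heap->end || size > heap->end - start) {
        return 0u; /* not enough room left */
    }
    heap->next = start + size;
    return start;
}
```

With `next` at 0x1FFFFFDE, a 38-byte request skips the last 34 bytes of SRAM_L and returns 0x20000000. With `next` at 0x1FFFFFB8, the same request fits below the split and is returned unchanged.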

You need to pick and choose between these two solutions based on your situation. Sometimes adjusting the size of the data structures with padding will waste more memory than simply wasting a bit of SRAM_L first and keeping your structures small.

A third solution might be to switch to fixed memory allocation or to modify the linker script, but that might lead to more waste and more code complexity.

This is a really elusive bug. It is simply bad luck that the memory for my file list happened to be allocated this way. If you run into a hard fault and the usual causes and solutions don’t seem to apply, then maybe you are experiencing the problem I am describing here. I hope this article is helpful and that the words I used make it easy to find via a search engine.

It feels almost as if this is a flaw of the Kinetis silicon design but I want to give Freescale the benefit of the doubt. The PIC microcontroller uses multiple “memory banks” and you don’t really see GCC compilers being used for PIC because of that (the official PIC compilers are optimized to deal with memory banks). AVR microcontrollers are designed with a continuous block of SRAM with three integer registers dedicated as pointers, making it extremely friendly for C compilers.

I cannot recommend Kinetis microcontrollers to anybody. There are already many annoyances regarding Kinetis microcontrollers and this issue just adds to that list.

Update: it has been quite a few days since I’ve posted to the Freescale support forum regarding this issue, and still no response.

5 thoughts on “Kinetis Microcontroller SRAM Region Hard Faults”

  1. Rusty

    The same problem exists on the larger parts. It is rather annoying. Somewhere the doc does suggest not to cross banks, but I can’t remember where.
    Overall, the doc really doesn’t say too much about how some of the peripherals work which ends up creating more questions than it answers – like how do you get the i2c peripheral out of an unwanted state? Reset it of course!

    Parts like the LPC1768 have two ram banks, but they are not contiguous, thus avoiding the problem.

    Whilst we’re having a bitch about things, the IAR ARM tools have some major defects in the debugger. Only late last year did the release fix some of the bigger problems. Now, it crashes less, but it is not uncommon when doing some heavy debugging that it will crash a few times a day and just when you’re getting close to nailing the bug. The F word gets used often.

  2. Trevor Woerner

    From the “ARM Cortex-M4 Processor, r0p1, Technical Reference Manual”, section 3.4.2 “Unaligned accesses that cross regions”, page 3-16:

    “Unaligned support is only available for load/store singles (LDR, LDRH, STR, STRH). Load/store
    double already supports word aligned accesses, but does not permit other unaligned accesses,
    and generates a fault if this is attempted.”

    Crossing from addresses just below 0x2000 0000 to addresses just above is considered to be “crossing regions” because addresses just below are in the “code” region and addresses just above are in the SRAM region. If nothing else, addresses in the code region are using the DCode bus and addresses in the SRAM region are using the system bus. This is defined at the architecture level therefore all ARMv6-M and ARMv7-M processors use this same layout; in other words all Cortex-M chips (this is one of the reasons why ARM claims Cortex-M MCUs are easy to program, because they all share the same memory map layout).

    I’m curious to know from where you obtained your linker script (aka scatter file). My belief is the linker script is wrong, and that’s where the bug lies. Your linker script should be declaring SRAM to start at 0x2000 0000, I believe.

    1. Admin Post author

      Thank you very much for pointing that out.

      I obtained my linker script from Teensy 3.0’s distribution. The linker script declares the start of RAM to be at 0x1FFFE000.

      If I declare the start of RAM to be at 0x20000000, then wouldn’t I lose half of the RAM available to use? The reference manual clearly states “the on-chip RAM is split evenly among SRAM_L and SRAM_U”, and “accesses to the SRAM_L and SRAM_U memory ranges outside the amount of RAM on the device causes the bus cycle to be terminated with an error followed by the appropriate response in the requesting bus master.” Section 3.5.3.2 also clearly states “SRAM_U = 0x2000_0000 to [0x2000_0000+(SRAM_size/2)-1]”

      I believe the linker script is designed to maximize RAM use, and I really REALLY need this RAM because I need to be able to sort as many files as possible.

      I’m not blaming ARM for this, I’m blaming Freescale.

      1. Trevor Woerner

        Wow, I can hardly believe my eyes! I never looked closely at the Kinetis (I have been using the STM32 and Tiva products) but I see exactly what you mean. The reference manual clearly states that SRAM starts below 0x2000.0000 and continues above, which (to me) would seem to contradict ARM’s architecture reference manual (the other products don’t do this).

        In that case, the linker script you’re using is correct, and your code will require the work-arounds you describe. I downloaded some sample code from Freescale’s website and, looking (quickly) through the linker scripts and the code associated with malloc(), it seems as though they define really small heaps (0x4000 in size) which are always contained entirely either below or above 0x2000.0000.

  3. Angel Genchev

    Did you read this – it’s from AN4745.pdf:
    All Kinetis K-series devices include two blocks of on-chip SRAM. The first block (SRAM_L) is mapped to the CODE bus, and the second block (SRAM_U) is mapped to the system bus. The memory itself can be accessed in a single cycle, but because instruction accesses to the system bus incurs a one clock delay at the core, SRAM_U instruction accesses take at least two clocks.
    SRAM_L is the only memory where code or data can be stored and the core is almost always guaranteed a single cycle access. For this reason, it makes sense to use the SRAM_L block as much as possible. This is a good area for storing critical code…..
