In my last post I took a quick look at the ARM calling convention on the Raspberry Pi running Raspbian with the GNU toolchain. In this post, let us look at how the same code comes out on an LPCXpresso built around the Cortex-M3 based LPC1769FBD100 from NXP Semiconductors. LPCXpresso comes with a GNU toolchain, and arm-none-eabi-gcc reports version 4.6.2 20121016.

The call to the simple square function looks like the following:

0x000001ce: 002af04f: mov.w r0, #42           // number_to_square
0x000001d2: ffebf7ff: bl    0x1ac <square>    // address of square function
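
As a sanity check on the bl encoding, here is a minimal sketch in C that recomputes the branch target from the two halfwords. The helper name thumb2_bl_target is my own, and the field layout is the Thumb-2 BL (T1) encoding as I understand it. Note that the debugger prints the 32-bit instruction as one little-endian word, so ffebf7ff means the halfword f7ff comes first in memory, followed by ffeb.

```c
#include <stdint.h>

/* Hypothetical helper: compute the target of a Thumb-2 BL (encoding T1).
 * hw1 is the halfword at the lower address, hw2 the one that follows it. */
uint32_t thumb2_bl_target(uint32_t insn_addr, uint16_t hw1, uint16_t hw2)
{
    uint32_t s     = (hw1 >> 10) & 1u;   /* sign bit */
    uint32_t imm10 = hw1 & 0x3FFu;
    uint32_t j1    = (hw2 >> 13) & 1u;
    uint32_t j2    = (hw2 >> 11) & 1u;
    uint32_t imm11 = hw2 & 0x7FFu;

    /* I1 = NOT(J1 XOR S), I2 = NOT(J2 XOR S) */
    uint32_t i1 = (~(j1 ^ s)) & 1u;
    uint32_t i2 = (~(j2 ^ s)) & 1u;

    /* S:I1:I2:imm10:imm11:'0' forms a 25-bit signed byte offset. */
    uint32_t imm = (s << 24) | (i1 << 23) | (i2 << 22)
                 | (imm10 << 12) | (imm11 << 1);
    if (s) {
        imm |= 0xFE000000u;              /* sign-extend from bit 24 */
    }

    /* In Thumb state the PC reads as the instruction address plus 4. */
    return insn_addr + 4u + imm;
}
```

Calling thumb2_bl_target(0x1d2, 0xf7ff, 0xffeb) gives 0x1ac, which matches the address the debugger shows for square.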

The square function itself looks like the following (in a debug build):

0x000001ac:     b480: push  {r7}              // save previous r7
0x000001ae:     b083: sub   sp, #12           // claim stack space for 12 bytes (locals)
0x000001b0:     af00: add   r7, sp, #0        // save the current sp in r7
0x000001b2:     6078: str   r0, [r7, #4]      // store first argument (r0) in local at r7+4
0x000001b4:     687b: ldr   r3, [r7, #4]      // r3 = [r7+4] = number_to_square
0x000001b6:     687a: ldr   r2, [r7, #4]      // r2 = [r7+4] = number_to_square
0x000001b8: f303fb02: mul.w r3, r2, r3        // multiply r2 and r3, store in r3
0x000001bc:     4618: mov   r0, r3            // r0 = function return = number_to_square**2
0x000001be: 070cf107: add.w r7, r7, #12       // point r7 to location above function locals
0x000001c2:     46bd: mov   sp, r7            // restore sp (clean up locals)
0x000001c4:     bc80: pop   {r7}              // restore r7 to original
0x000001c6:     4770: bx    lr                // jump back to caller
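
For reference, C source along the following lines would produce code of this shape. The names square and number_to_square come from the listing comments; the int types and the call_square wrapper are my assumptions, not the exact source that was compiled.

```c
/* Assumed source: names are taken from the disassembly, types are a guess. */
int square(int number_to_square)
{
    /* The argument arrives in r0 and the result is returned in r0. */
    return number_to_square * number_to_square;
}

/* Hypothetical caller, standing in for the code around the bl instruction. */
int call_square(void)
{
    return square(42);   /* mov.w r0, #42 ; bl square */
}
```

In an optimized build the compiler would likely skip the stack frame entirely, which is why the debug build is the more instructive one to read.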

Interestingly, register r7 is used like a frame pointer, except that the locals are accessed at positive offsets from r7. This differs from how the x86 frame pointer is typically set up, and from the Raspberry Pi code we saw earlier, which used r11 as the frame pointer. Instead of saving sp before claiming space for the function's locals, r7 holds the top of the stack after the space for locals has been claimed. Then in the epilogue, instead of an add instruction on sp to clean up the locals, we see an add on r7 followed by a mov to restore sp.
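
To convince myself that the prologue and epilogue balance out, here is a toy model in C that replays the instructions of square on a made-up register file and a 16-word stack. The ToyCpu type, the helpers, and the starting values are all hypothetical; only the sequence of operations is taken from the listing.

```c
#include <stdint.h>

#define STACK_WORDS 16u

/* Hypothetical register file plus a tiny stack, indexed by address / 4. */
typedef struct {
    uint32_t r0, r7, sp;
    uint32_t mem[STACK_WORDS];
} ToyCpu;

static void store(ToyCpu *c, uint32_t addr, uint32_t v) { c->mem[addr / 4u] = v; }
static uint32_t load(const ToyCpu *c, uint32_t addr)    { return c->mem[addr / 4u]; }

/* Replays square() step by step; returns the value r7 held in the body. */
uint32_t toy_square(ToyCpu *c)
{
    c->sp -= 4u; store(c, c->sp, c->r7);   /* push  {r7}         */
    c->sp -= 12u;                          /* sub   sp, #12      */
    c->r7 = c->sp;                         /* add   r7, sp, #0   */
    uint32_t frame = c->r7;                /* remember for the caller of this model */
    store(c, c->r7 + 4u, c->r0);           /* str   r0, [r7, #4] */
    uint32_t r3 = load(c, c->r7 + 4u);     /* ldr   r3, [r7, #4] */
    uint32_t r2 = load(c, c->r7 + 4u);     /* ldr   r2, [r7, #4] */
    r3 = r2 * r3;                          /* mul.w r3, r2, r3   */
    c->r0 = r3;                            /* mov   r0, r3       */
    c->r7 += 12u;                          /* add.w r7, r7, #12  */
    c->sp = c->r7;                         /* mov   sp, r7       */
    c->r7 = load(c, c->sp); c->sp += 4u;   /* pop   {r7}         */
    return frame;
}
```

Starting with sp = 64, r7 = 0x1234 and r0 = 42, the model finishes with r0 = 1764 and with sp and r7 back at their starting values. Inside the body r7 sits at 48, that is 16 bytes below the caller's sp, with the local at r7+4.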

Additionally, within square the mul.w and add.w instructions are 32-bit and all other instructions are 16-bit. The instruction set must be Thumb-2, since that is the only ARM instruction set that allows 16-bit and 32-bit instructions to be intermixed. However, the iPhone strategy we formulated earlier, looking at the CPSR to verify that the CPU is indeed in Thumb mode, would not work here: the Cortex-M3 has no CPSR.
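
The 16-bit versus 32-bit split can be read straight off the encodings. The sketch below, in C, classifies an instruction by its first halfword: as far as I know, a Thumb-2 instruction is 32 bits wide exactly when bits [15:11] of that halfword are 0b11101, 0b11110 or 0b11111. The function name thumb_insn_size is my own. Remember that the listing prints a 32-bit instruction as a little-endian word, so the first halfword of f303fb02 is fb02.

```c
#include <stdint.h>

/* Hypothetical helper: returns 4 if hw is the first halfword of a 32-bit
 * Thumb-2 instruction, otherwise 2. */
unsigned thumb_insn_size(uint16_t hw)
{
    unsigned top5 = (unsigned)hw >> 11;   /* bits [15:11] */

    /* 0x1D, 0x1E and 0x1F are the three prefixes reserved for 32-bit forms. */
    return (top5 >= 0x1Du) ? 4u : 2u;
}
```

Feeding it fb02 (mul.w) or f107 (add.w) returns 4, while b480 (push) and 4770 (bx lr) return 2, which agrees with the listing.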
