
So far we have been looking at debug-mode assembly on iOS and the Raspberry Pi. If we compile with -O2 on the Pi (and switch to a Release build on the iPhone), the optimized square function reduces to just the following two ARM instructions:
mul r0, r0, r0 //r0 = r0*r0
bx lr //branch back to caller
1. This is sad because, for an ARM learning platform, nothing could be more important than a toolchain that supports Thumb.
2. If we force Xcode to produce ARM instructions instead of Thumb by specifying -mno-thumb, we see the following code for the square function:
0x101af4: sub sp, sp, #4
0x101af8: str r0, [sp]
0x101afc: ldr r0, [sp]
0x101b00: ldr r1, [sp]
0x101b04: mul r0, r0, r1
0x101b08: add sp, sp, #4
0x101b0c: bx lr
3. The other interesting thing to note here is the 12-byte stack reservation when only 4 bytes of locals are needed. It is not apparent to me why that is (if any of you know more about this, please leave a comment below). The ARM AAPCS/EABI requires at most 8-byte stack alignment, and reserving just 4 bytes would still have preserved it: push {r11} would already have decremented our (presumably 8-byte-aligned) sp by 4, so 4 more bytes would bring the total to 8.
4. Apparently Windows also uses r11 as the frame pointer when emitting ARM instructions. In Thumb mode, Windows uses r7 as the frame pointer.