The difference in register usage is interesting given the effort to have only one calling convention on x64 (in contrast to 32-bit). 32-bit calling conventions such as fastcall[6] were not standardized per se, yet whether you wrote __fastcall on Windows or __attribute__((fastcall)) on Linux, ECX and EDX were used to pass the first two arguments. C developers writing cross-platform x64 code should be aware of both register conventions. On the positive side, the difference makes it somewhat harder to pull off the same buffer overflow attack on both 64-bit Windows and 64-bit Linux.
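As a rough illustration of the point above, here is a minimal sketch of how a cross-platform source file might hide the fastcall spelling behind a macro. The names FASTCALL and add4 are made up for this example and do not come from any library. The register assignments in the comments are the documented ones for each platform, but the compiler remains free to inline the call entirely.

```c
/* FASTCALL is a hypothetical macro name chosen for this sketch. */
#if defined(_MSC_VER)
#  define FASTCALL __fastcall                 /* accepted, but ignored on x64 */
#elif defined(__GNUC__) && defined(__i386__)
#  define FASTCALL __attribute__((fastcall))  /* 32-bit gcc: ECX, EDX */
#else
#  define FASTCALL                            /* x64 gcc: nothing to mark */
#endif

/*
 * Where the four integer arguments travel on x64:
 *   Windows:            a = RCX, b = RDX, c = R8,  d = R9
 *   Linux (System V):   a = RDI, b = RSI, c = RDX, d = RCX
 * The result comes back in RAX on both.
 */
long long FASTCALL add4(long long a, long long b, long long c, long long d)
{
    return a + b + c + d;
}
```

Calling add4(1, 2, 3, 4) returns 10 on either platform; only the registers carrying the arguments differ, which is why hand-written assembly or exploit code written against one convention does not carry over to the other.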

1 Debugging tools for Windows – x64 Architecture (MSDN)

2 x64 Primer – Everything You Need To Know To Start Programming 64-Bit Windows Systems by Matt Pietrek

3 The parameters have to be integers or references (pointers) that fit in 8 bytes. Floating point parameters are passed through the XMM registers instead (XMM0-XMM3 on Windows, XMM0-XMM7 on Linux).

4 Such as when a structure is returned by value. I could not find a 128-bit data type I could use for this. Note that a structure returned by reference, which is how structures are most often passed around, does not pose a problem since the pointer fits in RAX.

5 BSD and MacOS follow the same standard as well. The choice of registers and the order of their usage is rather strange. It seems weird to have RDI hold the first parameter, since that register, along with RSI, is generally involved in string copies. And wouldn’t you think of using RCX before RDX? If you do use RDX before RCX, would you use R8 before R9 or the other way around?

6 fastcall markings are, of course, ignored when compiling for 64-bit with gcc or the Microsoft compiler.


2 Responses to x64 calling convention

  1. Eldad says:

    I am trying to use CL 16.0 for x64 (VS 2010) to produce some readable ASM code for an example, and CL insists on preallocating a ton of stack space (28h bytes), with the following line:

    sub rsp, 40 ; 00000028H

    Question is, how can I disable this behavior? It is difficult to explain to the class and I like to show them clean, explicable code…

    Surely it doesn’t need that space. On x86, this seems to be controlled by the edit-and-continue switches (/Zi vs. /ZI), but these don’t have any effect in the x64 case. Any idea how to make CL only allocate as much stack as it actually needs?

  2. Satya Das says:

    If you are making any direct or indirect call to another function, that is normal. If you write a leaf function you may have better luck in minimizing stack requirements.
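To make the reply above concrete, here is a small sketch contrasting a leaf function with a non-leaf one. The function names are invented for illustration. The explanation in the comments rests on the Windows x64 convention, under which a caller reserves 32 bytes of shadow space for its callee and keeps RSP 16-byte aligned at the call; 32 bytes plus 8 bytes of padding gives the 40 (28h) seen in the question. Actual output depends on the compiler version and optimization flags, so treat this as something to try rather than a guaranteed listing.

```c
#include <stdio.h>

/* Leaf function: it calls nothing, so the compiler has no callee to
   reserve shadow space for and may not touch RSP at all. */
int leaf_add(int a, int b)
{
    return a + b;
}

/* Non-leaf function: it calls printf, so on Windows x64 the compiler
   must reserve 32 bytes of shadow space for the callee, plus padding
   to keep RSP 16-byte aligned (typically "sub rsp, 40"). */
int nonleaf_add(int a, int b)
{
    int sum = a + b;
    printf("%d\n", sum);
    return sum;
}
```

Compiling this with cl /c /FA and comparing the two listings is one way to show a class where the stack adjustment comes from.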
