C compiler – assembly code generation
An interesting question raised by the previous section is how the compiler makes sense of that C code. The compiler's first task is to convert the C code into assembly code that follows the C/C++ calling convention, as shown in Figure 1.1:
Figure 1.1 – x86 calling convention
Important note
For convenience and practicality, the following examples are presented with x86 instructions. However, the methods and principles described in this book are common to all Windows systems, and the compiler examples are based on GNU Compiler Collection (GCC) for Windows (MinGW).
Because different system functions (and even third-party modules) expect to find their parameters at specific memory or register locations at the assembly level, several mainstream application binary interface (ABI) calling conventions exist to keep this manageable. Interested readers can refer to Argument Passing and Naming Conventions by Microsoft (https://docs.microsoft.com/en-us/cpp/cpp/argument-passing-and-naming-conventions).
These calling conventions mainly deal with several issues:
- Where the parameters are placed and in what order (e.g., on the stack, in a register such as ECX, or a mix of both to improve performance)
- The memory space occupied by the parameters if they need to be stored
- Whether the occupied memory is released by the caller or the callee
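As a minimal sketch of how these choices appear in source code, the following C fragment declares the same function under three conventions by using GCC's 32-bit x86 function attributes (cdecl, stdcall, and fastcall). The CC_* macro names and the add_* functions are hypothetical and exist only for illustration. The macros expand to nothing on other targets, where these attributes do not apply, which is an assumption made here so that the fragment compiles everywhere.

```c
/* Hypothetical macros: they map to GCC's 32-bit x86 attributes and
 * expand to nothing on other targets (portability assumption). */
#if defined(__GNUC__) && defined(__i386__)
#define CC_CDECL    __attribute__((cdecl))
#define CC_STDCALL  __attribute__((stdcall))
#define CC_FASTCALL __attribute__((fastcall))
#else
#define CC_CDECL
#define CC_STDCALL
#define CC_FASTCALL
#endif

/* cdecl: parameters on the stack, pushed right to left;
 * the caller releases the stack space after the call. */
int CC_CDECL add_cdecl(int a, int b) { return a + b; }

/* stdcall: parameters on the stack, pushed right to left;
 * the callee releases the stack space when it returns. */
int CC_STDCALL add_stdcall(int a, int b) { return a + b; }

/* fastcall: the first two integer parameters travel in ECX and EDX,
 * any remaining ones go on the stack; the callee releases them. */
int CC_FASTCALL add_fastcall(int a, int b) { return a + b; }
```

All three functions return the same value for the same inputs. Only the generated assembly around each call differs, which can be inspected by compiling for 32-bit x86 with the -S option.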
When the compiler generates the assembly code, it identifies the calling convention that the target function expects, arranges the parameters in memory accordingly, and then calls the memory address of the function with the call instruction. Therefore, when the thread jumps into the system function, it can correctly obtain each parameter at its expected memory address.
Take Figure 1.1 as an example: we know that the USER32!MessageBoxA function uses the WINAPI calling convention. In this calling convention, the parameters are pushed onto the stack from right to left, and the callee is responsible for releasing that stack memory. So after the 4 parameters are pushed onto the stack, occupying 16 bytes (sizeof(uint32_t) x 4), execution enters USER32!MessageBoxA. When the function finishes, it executes ret 0x10, which returns to the instruction following call MessageBoxA (i.e., xor eax, eax) and releases the 16 bytes of stack space.
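The sequence described above can be modeled in plain C. The following is only a toy simulation under stated assumptions, not what the compiler emits: ToyStack and the toy_* functions are hypothetical names, the return address is a placeholder value, and the callee simply combines its four parameters into one number so that their order is visible.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 16

/* Toy model of a 32-bit x86 stack: it grows downward, 4 bytes per slot. */
typedef struct {
    uint32_t slots[STACK_SLOTS];
    size_t sp; /* index of the top slot; STACK_SLOTS means empty */
} ToyStack;

void toy_init(ToyStack *s) { s->sp = STACK_SLOTS; }

/* push: move the stack pointer down one slot and store the value. */
void toy_push(ToyStack *s, uint32_t value) { s->slots[--s->sp] = value; }

size_t toy_bytes_used(const ToyStack *s) {
    return (STACK_SLOTS - s->sp) * sizeof(uint32_t);
}

/* Callee side: the return address sits on top of the stack,
 * with the parameters directly above it in declaration order. */
uint32_t toy_stdcall_callee(ToyStack *s) {
    uint32_t a = s->slots[s->sp + 1];
    uint32_t b = s->slots[s->sp + 2];
    uint32_t c = s->slots[s->sp + 3];
    uint32_t d = s->slots[s->sp + 4];
    uint32_t result = a * 1000u + b * 100u + c * 10u + d;

    s->sp += 1; /* ret: pop the return address */
    s->sp += 4; /* 0x10: release 4 x 4 = 16 bytes of parameters */
    return result;
}

/* Caller side: push the parameters right to left, then "call". */
uint32_t toy_call_stdcall4(ToyStack *s, uint32_t a, uint32_t b,
                           uint32_t c, uint32_t d) {
    toy_push(s, d);
    toy_push(s, c);
    toy_push(s, b);
    toy_push(s, a);
    toy_push(s, 0xDEADBEEFu); /* placeholder return address */
    return toy_stdcall_callee(s);
}
```

After toy_init, calling toy_call_stdcall4 with the values 1, 2, 3, and 4 leaves the stack empty again, because the callee releases both the return address and the 16 bytes of parameters. The caller therefore needs no cleanup instruction after the call.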
Important note
This book focuses only on how the compiler generates machine instructions and packages the program into an executable file. It does not cover important parts of advanced compiler theory, such as semantic tree generation and compiler optimization. These topics are left for readers to explore in further study.
In this section, we learned about the C/C++ calling convention, how parameters are placed in memory in order, and how that memory space is released when the called function returns.