LLVM in action
In this section, let's use LLVM's Clang compiler to compile C source code into LLVM IR. This will give a better idea of how LLVM works and will be useful for understanding how compilers use LLVM in later chapters.
We first create a C file called sum.c and enter the following contents:
$ touch sum.c
// sum.c
unsigned sum(unsigned a, unsigned b) {
    return a + b;
}
The sum.c file contains a simple sum function that takes in two unsigned integers and returns their sum. LLVM provides the Clang compiler to compile C source code. To generate the LLVM IR, run the following command:
$ clang -S -O3 -emit-llvm sum.c
We provided the Clang compiler with the -S, -O3, and -emit-llvm options:
- The -S option tells the compiler to run only the preprocessing and compilation steps, stopping before assembly and linking.
- The -O3 option tells the compiler to apply aggressive optimizations to the generated code.
- The -emit-llvm option tells the compiler to emit LLVM IR instead of native assembly; combined with -S, the output is the textual form of the IR.
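The preceding command writes the LLVM IR to a file named sum.ll. Among other things, that file contains the following function definition (the exact output varies with the Clang version):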
define i32 @sum(i32, i32) local_unnamed_addr #0 {
  %3 = add i32 %1, %0
  ret i32 %3
}
The syntax of LLVM IR is structurally similar to C. The define keyword marks the beginning of a function definition. Next to it is the return type of the function, i32. Next, we have the name of the function, @sum.
Important Note
Note the @ symbol there? LLVM uses @ to identify global values, such as global variables and functions. It uses % to identify local values, such as function arguments and instruction results.
After the function name, we state the types of the input arguments (i32 for both, in this case). The local_unnamed_addr attribute indicates that the address of the function is known not to be significant within the module. Values in LLVM IR are immutable: the IR is in static single assignment (SSA) form, so once you define a value, you cannot reassign it. So, inside the function body, we create a new local value, %3, and assign it the result of the add instruction. add is an opcode that takes the `type` of its operands followed by the two operands, %1 and %0. %0 and %1 denote the first and second arguments of the function. Unnamed values are numbered sequentially, and %2 is implicitly taken by the label of the entry block, which is why the first instruction result is %3. Finally, we return %3 with the ret instruction, which is followed by the `type` and then the value to return.
This IR is transformable; that is, the IR can be converted from the textual representation into an in-memory representation and then into bitcode, a compact binary encoding of the same IR. Bitcode can also be converted back into the textual representation. From any of these forms, an LLVM backend can generate the machine code that runs on the bare metal.
Imagine that you are writing a new language. The success of the language depends on how well it performs on various architectures. Generating optimized machine code for various architectures (such as x86, ARM, and others) takes a long time and is not easy. LLVM provides an easier way to achieve this. Instead of targeting each architecture separately, create a compiler frontend that converts the source code into LLVM IR. Then, LLVM will convert the IR into efficient, optimized machine code for any architecture it supports.
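To make the frontend idea more concrete, here is a deliberately tiny sketch in C of the very last step of a frontend: producing textual LLVM IR for a two-argument i32 function. The helper name emit_binary_fn and the value names %a, %b, and %result are hypothetical, chosen for illustration. A real frontend would normally parse source code and build the IR through LLVM's own APIs rather than formatting strings, so treat this only as an illustration of the division of labor.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes the textual IR of a function
 *   i32 name(i32 a, i32 b) { return a <opcode> b; }
 * into buf. Returns the number of characters snprintf would write,
 * so a result >= cap means the output was truncated. */
int emit_binary_fn(char *buf, size_t cap,
                   const char *name, const char *opcode) {
    return snprintf(buf, cap,
                    "define i32 @%s(i32 %%a, i32 %%b) {\n"
                    "entry:\n"
                    "  %%result = %s i32 %%a, %%b\n"
                    "  ret i32 %%result\n"
                    "}\n",
                    name, opcode);
}
```

Calling emit_binary_fn(buf, sizeof buf, "sum", "add") fills buf with a function equivalent to the sum definition shown earlier, using named values instead of numbered ones. Saved to a .ll file, text of this shape is the kind of input that LLVM can then optimize and lower for each target.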
Note
LLVM is an umbrella project. It has so many components that you could write a set of books on them. Covering the whole of LLVM, including how to install and run its components, is beyond the scope of this book. If you are interested in learning more about the various components of LLVM, how they work, and how to use them, check out the website: https://llvm.org.