Understanding modular design
LLVM is designed as a set of libraries, unlike monolithic compilers such as the GNU Compiler Collection (GCC). In this recipe, we use the LLVM optimizer to explore this design. Because the optimizer is library-based, it lets you choose which optimization passes to run and in what order: some optimizations may not be useful for the kind of system you are building, while others will be specific to it. Traditional compiler optimizers are built as a tightly interconnected mass of code that is difficult to break into small parts you can understand and reuse. With LLVM, you need not understand how the whole system works in order to use one specific optimizer: you can pick a single pass and run it without worrying about the other components attached to it.
Before we dive into this recipe, we should also know a little about the LLVM assembly language. LLVM code is represented in three forms: an in-memory compiler Intermediate Representation (IR), an on-disk bitcode representation, and a human-readable assembly form. LLVM IR is a Static Single Assignment (SSA)-based representation that provides type safety, low-level operations, flexibility, and the capability to represent all high-level languages cleanly. This representation is used throughout all phases of the LLVM compilation strategy. The LLVM representation aims to be a universal IR: it sits at a low enough level that high-level ideas can be mapped to it cleanly. If you have any doubts about the LLVM assembly shown in this recipe, refer to the link provided in the See also section at the end of this recipe.
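To give a feel for the human-readable assembly form, here is a small, hypothetical function (not taken from this recipe's test file) illustrating the SSA and typing properties just described:

```llvm
; Each %name is assigned exactly once (SSA form),
; and every value carries an explicit type.
define i32 @sum_of_squares(i32 %a, i32 %b) {
entry:
  %a2 = mul i32 %a, %a      ; %a squared
  %b2 = mul i32 %b, %b      ; %b squared
  %r  = add i32 %a2, %b2    ; typed, low-level operations
  ret i32 %r
}
```

Note how there is no reassignment of any `%name`; new values always get new names, which is what makes SSA-based analyses straightforward.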
Getting ready
We must have the LLVM toolchain installed on our host machine. Specifically, we need the opt tool.
How to do it...
We will run two different optimizations on the same code, one-by-one, and see how it modifies the code according to the optimization we choose.
- First of all, let us write the code we will use as input for these optimizations. Here, we write it into a file named testfile.ll:

  ```
  $ cat testfile.ll
  define i32 @test1(i32 %A) {
    %B = add i32 %A, 0
    ret i32 %B
  }

  define internal i32 @test(i32 %X, i32 %dead) {
    ret i32 %X
  }

  define i32 @caller() {
    %A = call i32 @test(i32 123, i32 456)
    ret i32 %A
  }
  ```
- Now, run the opt tool with one of the optimizations, namely instruction combining:

  ```
  $ opt -S -instcombine testfile.ll -o output1.ll
  ```
- View the output to see how instcombine has worked:

  ```
  $ cat output1.ll
  ; ModuleID = 'testfile.ll'
  define i32 @test1(i32 %A) {
    ret i32 %A
  }

  define internal i32 @test(i32 %X, i32 %dead) {
    ret i32 %X
  }

  define i32 @caller() {
    %A = call i32 @test(i32 123, i32 456)
    ret i32 %A
  }
  ```
- Run the opt command with the dead argument elimination optimization:

  ```
  $ opt -S -deadargelim testfile.ll -o output2.ll
  ```
- View the output to see how deadargelim has worked:

  ```
  $ cat output2.ll
  ; ModuleID = 'testfile.ll'
  define i32 @test1(i32 %A) {
    %B = add i32 %A, 0
    ret i32 %B
  }

  define internal i32 @test(i32 %X) {
    ret i32 %X
  }

  define i32 @caller() {
    %A = call i32 @test(i32 123)
    ret i32 %A
  }
  ```
How it works...
In the preceding example, we can see that the first command runs the instcombine pass, which combines instructions and thereby simplifies %B = add i32 %A, 0; ret i32 %B to ret i32 %A without changing the program's behavior: adding zero is an identity operation, so %B is just %A.
In the second case, when the deadargelim pass is run, the first function is left unmodified, but the part of the code that was untouched last time now gets transformed: the unused %dead argument is eliminated from @test, and the call site in @caller is updated to match.
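To make the two rewrites concrete, here is a deliberately simplified Python sketch (an illustration of the ideas only, not LLVM's actual implementation, which operates on SSA-form IR):

```python
def instcombine(expr):
    # ('add', x, 0) -> x : adding zero is an identity,
    # mimicking the %B = add i32 %A, 0 simplification.
    if isinstance(expr, tuple) and expr[0] == "add" and expr[2] == 0:
        return expr[1]
    return expr

def deadargelim(params, used, call_args):
    # Keep only the parameters the function body actually uses,
    # and drop the matching arguments at the call site.
    kept = [(p, a) for p, a in zip(params, call_args) if p in used]
    return [p for p, _ in kept], [a for _, a in kept]

# %B = add i32 %A, 0 ; ret i32 %B  ==>  ret i32 %A
print(instcombine(("add", "%A", 0)))                     # %A

# @test(%X, %dead) called as @test(123, 456)  ==>  @test(%X) called as @test(123)
print(deadargelim(["%X", "%dead"], {"%X"}, [123, 456]))  # (['%X'], [123])
```

The key point is that the two rewrites are independent of each other, which is exactly why LLVM can package them as separately selectable passes.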
The LLVM optimizer is the tool that provides the user with all the different passes in LLVM. These passes are all written in a similar style, and each pass is compiled into an object file. Object files of different passes are archived into a library. The passes within a library are not strongly connected; it is the LLVM PassManager that holds the dependency information among passes, and it resolves those dependencies when a pass is executed. The following figure shows how each pass can be linked to a specific object file within a specific library: PassA references LLVMPasses.a for PassA.o, whereas the custom pass refers to a different library, MyPasses.a, for the MyPass.o object file.
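The dependency-resolution role of the PassManager can be sketched with a small, hypothetical scheduler. The real LLVM PassManager is far more sophisticated, but the idea is the same: before a requested pass runs, the passes it declares as prerequisites run first. The pass names and the dependency table below are illustrative (LICM really does need dominator-tree information, but the table format is invented for this sketch):

```python
# Hypothetical sketch of PassManager-style dependency resolution.
# Each pass names the passes it requires; scheduling a pass first
# schedules its (transitive) prerequisites, each exactly once.

REQUIRES = {
    "instcombine": [],
    "domtree": [],
    "licm": ["domtree"],   # loop passes need dominator info
}

def schedule(requested):
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in REQUIRES[name]:
            visit(dep)
        order.append(name)

    for name in requested:
        visit(name)
    return order

print(schedule(["licm", "instcombine"]))  # ['domtree', 'licm', 'instcombine']
```

The user only asked for licm and instcombine; domtree is pulled in automatically, which is exactly the kind of bookkeeping the PassManager hides from you.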
Tip
Downloading the example code
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
There's more...
Similar to the optimizer, the LLVM code generator also makes use of this modular design, splitting the code generation problem into individual passes: instruction selection, register allocation, scheduling, code layout optimization, and assembly emission. In addition, many built-in passes are run by default; it is up to the user to choose which further passes to run.
See also
- In the upcoming chapters, we will see how to write our own custom pass, where we can choose which of the optimization passes we want to run and in which order. Also, for a more detailed understanding, refer to http://www.aosabook.org/en/llvm.html.
- To understand more about LLVM assembly language, refer to http://llvm.org/docs/LangRef.html.