Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
LLVM Essentials

You're reading from   LLVM Essentials Become familiar with the LLVM infrastructure and start using LLVM libraries to design a compiler

Arrow left icon
Product type Paperback
Published in Dec 2015
Publisher
ISBN-13 9781785280801
Length 166 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Authors (4):
Arrow left icon
Suyog Sarda Suyog Sarda
Author Profile Icon Suyog Sarda
Suyog Sarda
Mayur Pandey Mayur Pandey
Author Profile Icon Mayur Pandey
Mayur Pandey
David Farago David Farago
Author Profile Icon David Farago
David Farago
John Criswell John Criswell
Author Profile Icon John Criswell
John Criswell
Arrow right icon
View More author details
Toc

Getting familiar with LLVM IR

LLVM Intermediate Representation (IR) is the heart of the LLVM project. In general every compiler produces an intermediate representation on which it runs most of its optimizations. For a compiler targeting multiple-source languages and different architectures the important decision while selecting an IR is that it should neither be of very high-level, as in very closely attached to the source language, nor it should be very low-level, as in close to the target machine instructions. LLVM IR aims to be a universal IR of a kind, by being at a low enough level that high-level ideas may be cleanly mapped to it. Ideally the LLVM IR should have been target-independent, but it is not so because of the inherent target dependence in some of the programming languages itself. For example, when using standard C headers in a Linux system, the header files itself are target dependent, which may specify a particular type to an entity so that it matches the system calls of the particular target architecture.

Most of the LLVM tools revolve around this Intermediate Representation. The frontends of different languages generate this IR from the high-level source language. The optimizer tool of LLVM runs on this generated IR to optimize the code for better performance and the code generator makes use of this IR for target specific code generation. This IR has three equivalent forms:

  • An in-memory compiler IR
  • An on-disk bitcode representation
  • A Human readable form (LLVM Assembly)

Now let's take an example to see how this LLVM IR looks like. We will take a small C code and convert it into LLVM IR using clang and try to understand the details of LLVM IR by mapping it back to the source language.

$ cat add.c
int globvar = 12;

int add(int a) {
return globvar + a;
}

Use the clang frontend with the following options to convert it to LLVM IR:

$ clang -emit-llvm -c -S add.c
$ cat add.ll
; ModuleID = 'add.c'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@globvar = global i32 12, align 4

; Function Attrs: nounwind uwtable
define i32 @add(i32 %a) #0 {
  %1 = alloca i32, align 4
  store i32 %a, i32* %1, align 4
  %2 = load i32, i32* @globvar, align 4
  %3 = load i32, i32* %1, align 4
  %4 = add nsw i32 %2, %3
  ret i32 %4
}

attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.ident = !{!0}

Now let's look at the IR generated and see what it is all about. You can see the very first line giving the ModuleID, that it defines the LLVM module for add.c file. An LLVM module is a top–level data structure that has the entire contents of the input LLVM file. It consists of functions, global variables, external function prototypes, and symbol table entries.

The following lines show the target data layout and target triple from which we can know that the target is x86_64 processor with Linux running on it. The datalayout string tells us what is the endianess of machine ('e' meaning little endian), and the name mangling (m : e denotes elf type). Each specification is separated by ''and each following spec gives information about the type and size of that type. For example, i64:64 says 64 bit integer is of 64 bits.

Then we have a global variable globvar. In LLVM IR all globals start with '@' and all local variables start with '%'. There are two main reasons why the variables are prefixed with these symbols. The first one being, the compiler won't have to bother about a name clash with reserved words, the other being that the compiler can come up quickly with a temporary name without having to worry about a conflict with symbol table conflicts. This second property is useful for representing the IR in static single assignment (SSA) from where each variable is assigned only a single time and every use of a variable is preceded by its definition. So, while converting a normal program to SSA form, we create a new temporary name for every redefinition of a variable and limit the range of earlier definition till this redefinition.

LLVM views global variables as pointers, so an explicit dereference of the global variable using load instruction is required. Similarly, to store a value, an explicit store instruction is required.

Local variables have two categories:

  • Register allocated local variables: These are the temporaries and allocated virtual registers. The virtual registers are allocated physical registers during the code generation phase which we will see in a later chapter of the book. They are created by using a new symbol for the variable like:
    %1 = some value
    
  • Stack allocated local variables: These are created by allocating variables on the stack frame of a currently executing function, using the alloca instruction. The alloca instruction gives a pointer to the allocated type and explicit use of load and store instructions is required to access and store the value.
    %2 = alloca i32
    

Now let's see how the add function is represented in LLVM IR. define i32 @add(i32 %a) is very similar to how functions are declared in C. It specifies the function returns integer type i32 and takes an integer argument. Also, the function name is preceded by '@', meaning it has global visibility.

Within the function is actual processing for functionality. Some important things to note here are that LLVM uses a three-address instruction, that is a data processing instruction, which has two source operands and places the result in a separate destination operand (%4 = add i32 %2, %3). Also the code is in SSA form, that is each value in the IR has a single assignment which defines the value. This is useful for a number of optimizations.

The attributes string that follows in the generated IR specifies the function attributes which are very similar to C++ attributes. These attributes are for the function that has been defined. For each function defined there is a set of attributes defined in the LLVM IR.

The code that follows the attributes is for the ident directive that identifies the module and compiler version.

You have been reading a chapter from
LLVM Essentials
Published in: Dec 2015
Publisher:
ISBN-13: 9781785280801
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image