Converting C source code to LLVM assembly
Here we will convert C code to LLVM intermediate representation (IR) using Clang, the C frontend.
Getting ready
Clang must be installed and available on the PATH.
How to do it...
- Let's create a C file, multiply.c, which will look something like the following:
$ cat multiply.c
int mult() {
  int a = 5;
  int b = 3;
  int c = a * b;
  return c;
}
- Use the following command to generate LLVM IR from the C code:
$ clang -emit-llvm -S multiply.c -o multiply.ll
- Have a look at the generated IR:
$ cat multiply.ll
; ModuleID = 'multiply.c'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: nounwind uwtable
define i32 @mult() #0 {
  %a = alloca i32, align 4
  %b = alloca i32, align 4
  %c = alloca i32, align 4
  store i32 5, i32* %a, align 4
  store i32 3, i32* %b, align 4
  %1 = load i32* %a, align 4
  %2 = load i32* %b, align 4
  %3 = mul nsw i32 %1, %2
  store i32 %3, i32* %c, align 4
  %4 = load i32* %c, align 4
  ret i32 %4
}
We can also invoke the Clang frontend directly with the -cc1 option to generate IR:
$ clang -cc1 -emit-llvm testfile.c -o testfile.ll
How it works...
The conversion of C code to IR starts with lexing, in which the C code is broken into a stream of tokens, each representing an identifier, literal, operator, and so on. This token stream is fed to the parser, which builds an abstract syntax tree (AST) according to the context-free grammar (CFG) of the language. Semantic analysis then checks whether the code is semantically correct, and finally the IR is generated from the AST.
Here we use the Clang frontend to generate the IR file from C code.
See also
- In the next chapter, we will see how the lexer and parser work and how code generation is done. To understand the basics of LLVM IR, you can refer to http://llvm.org/docs/LangRef.html.