LLVM Techniques, Tips, and Best Practices Clang and Middle-End Libraries

Chapter 1: Saving Resources When Building LLVM

LLVM is the state-of-the-art compiler optimization and code generation framework adopted by many amazing industrial and academic projects, such as the Just-In-Time (JIT) compiler in JavaScript engines and machine learning (ML) frameworks. It is a useful toolbox for building programming languages and binary file tools. However, despite the project's robustness, its learning resources are scattered, and it doesn't have the best documentation either. Due to this, it has a pretty steep learning curve, even for developers with some LLVM experience. This book aims to tackle these issues by providing you with knowledge of common and important domains in LLVM in a pragmatic fashion – showing you some useful engineering tips, pointing out lesser-known but handy features, and illustrating useful examples.

As an LLVM developer, building LLVM from source has always been the first thing you should do. Given the scale of LLVM nowadays, this task can take hours to finish. Even worse, rebuilding the project to reflect changes might also take a long time and hinder your productivity. Therefore, it's crucial to know how to use the right tools and how to find the best build configurations for your project for the sake of saving various resources, especially your precious time.

In this chapter, we are going to cover the following topics:

Cutting down building resources with better tooling
Saving building resources by tweaking CMake arguments
Learning how to use GN, an alternative LLVM build system, and its pros and cons

Cutting down building resources with better tooling

As we mentioned at the beginning of this chapter, if you build LLVM with the default (CMake) configurations, by invoking CMake and building the project in the following way, there is a high chance that the whole process will take hours to finish:

$ cmake ../llvm
$ make all

This can be avoided simply by using better tools and changing a few settings. In this section, we will cover some guidelines to help you choose the right tools and configurations that can both shorten your build time and reduce memory usage.

Replacing GNU Make with Ninja

The first improvement we can make is to use the Ninja build tool (https://ninja-build.org) rather than GNU Make, which is the default build system generated by CMake on major Linux/Unix platforms.

Here are the steps you can use to set up Ninja on your system:

On Ubuntu, for example, you can install Ninja by using this command:
```
$ sudo apt install ninja-build
```
Ninja is also available in most Linux distributions.
Then, when you're invoking CMake for your LLVM build, add an extra argument:
```
$ cmake -G "Ninja" ../llvm
```
Finally, use the following build command instead:
```
$ ninja all
```

Ninja runs significantly faster than GNU Make on large code bases such as LLVM. One of the secrets behind Ninja's speed is that, while most build scripts such as Makefiles are designed to be written by hand, the syntax of Ninja's build script, build.ninja, is closer to assembly code: it is not meant to be edited by developers but generated by higher-level build systems such as CMake. Because the build script is this low-level, Ninja can do many optimizations under the hood and avoid overheads, such as slow parsing, when invoking the build. Ninja also has a good reputation for tracking dependencies among build targets accurately.

Ninja makes clever decisions about its degree of parallelization, that is, how many jobs to execute in parallel. So, usually, you don't need to worry about this. If you want to set the number of parallel jobs explicitly, the same command-line option used by GNU Make still works here:

$ ninja -j8 all

Let's now see how you can avoid using the BFD linker.

Avoiding the use of the BFD linker

The second improvement we can make is to use a linker other than the BFD linker, which is the default linker on most Linux systems. The BFD linker, despite being the most mature linker on Unix/Linux systems, is not optimized for speed or memory consumption. This creates a performance bottleneck, especially for large projects such as LLVM, because, unlike the compiling phase, the linking phase is hard to parallelize at the file level. On top of that, the BFD linker's peak memory consumption when building LLVM is usually about 20 GB, which is a burden on computers with small amounts of memory. Fortunately, there are at least two linkers in the wild that provide both good single-thread performance and low memory consumption: the GNU gold linker and LLVM's own linker, LLD.

The gold linker was originally developed by Google and donated to GNU's binutils. You should have it sitting in the binutils package by default in modern Linux distributions. LLD is one of LLVM's subprojects with even faster linking speed and an experimental parallel linking technique. Some of the Linux distributions (newer Ubuntu versions, for example) already have LLD in their package repository. You can also download the prebuilt version from LLVM's official website.

To use the gold linker or LLD to build your LLVM source tree, add an extra CMake argument with the name of the linker you want to use.

For the gold linker, use the following command:

$ cmake -G "Ninja" -DLLVM_USE_LINKER=gold ../llvm

Similarly, for LLD, use the following command:

$ cmake -G "Ninja" -DLLVM_USE_LINKER=lld ../llvm
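If you are not sure which of these linkers is installed, a small script can check before you configure. The following is a minimal sketch, assuming the linkers are installed under their usual executable names (ld.lld and ld.gold); pick_linker is a hypothetical helper name, and the script only prints the CMake command rather than running it:

```shell
#!/bin/sh
# Pick a value for LLVM_USE_LINKER based on which linkers are on the PATH.
# Prefers LLD, then gold, and prints nothing if neither is found, in which
# case the system default linker is left in place.
pick_linker() {
  if command -v ld.lld >/dev/null 2>&1; then
    echo lld
  elif command -v ld.gold >/dev/null 2>&1; then
    echo gold
  fi
}

linker=$(pick_linker)
if [ -n "$linker" ]; then
  echo "cmake -G Ninja -DLLVM_USE_LINKER=$linker ../llvm"
else
  echo "cmake -G Ninja ../llvm"
fi
```

The preference order (LLD first) simply follows the text above, which describes LLD as the faster of the two.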

Limiting the number of parallel threads for linking

Limiting the number of parallel threads for linking is another way to reduce (peak) memory consumption. You can achieve this by assigning the LLVM_PARALLEL_LINK_JOBS=<N> CMake variable, where N is the desired number of working threads.
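One way to choose N is to derive it from the amount of memory on your machine. The sketch below assumes a budget of 8 GB per link job, which is an illustrative figure and not an LLVM recommendation; link_jobs_for_mem_gb is a hypothetical helper, and the script prints the resulting CMake command instead of running it:

```shell
#!/bin/sh
# Estimate a link-job limit from the amount of RAM (in GB).
# ASSUMPTION: each link job is budgeted 8 GB by default (second argument
# overrides this). The result is never lower than 1.
link_jobs_for_mem_gb() {
  mem_gb=$1
  per_job_gb=${2:-8}
  jobs=$((mem_gb / per_job_gb))
  if [ "$jobs" -lt 1 ]; then
    jobs=1
  fi
  echo "$jobs"
}

# Example: a machine with 32 GB of RAM.
jobs=$(link_jobs_for_mem_gb 32)
echo "cmake -G Ninja -DLLVM_PARALLEL_LINK_JOBS=$jobs ../llvm"
```

With 32 GB and the assumed 8 GB budget, this prints a command that limits linking to 4 jobs. Adjust the budget to match the build type and linker you use.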

With that, we've learned that by simply using different tools, the building time could be reduced significantly. In the next section, we're going to improve this building speed by tweaking LLVM's CMake arguments.

Tweaking CMake arguments

This section will show you some of the most common CMake arguments in LLVM's build system that can help you customize your build and achieve maximum efficiency.

Before we start, you should have a build folder that has been CMake-configured. Most of the following subsections will modify a file in the build folder; that is, CMakeCache.txt.
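Before changing anything, it can help to check what a build folder is currently configured with. Entries in CMakeCache.txt are stored as NAME:TYPE=VALUE lines, so a short sed expression is enough to read one. This is a minimal sketch; cache_value is a hypothetical helper, and the sample file only stands in for a real cache:

```shell
#!/bin/sh
# Print the value of one variable stored in a CMakeCache.txt file.
# Cache entries have the form NAME:TYPE=VALUE.
cache_value() {
  # $1 = path to CMakeCache.txt, $2 = variable name
  sed -n "s/^$2:[A-Z]*=//p" "$1"
}

# Demo on a small sample file standing in for a real cache.
sample=$(mktemp)
printf '%s\n' \
  '// Comment lines start with two slashes' \
  'CMAKE_BUILD_TYPE:STRING=Debug' \
  'LLVM_TARGETS_TO_BUILD:STRING=X86;AArch64' > "$sample"

cache_value "$sample" CMAKE_BUILD_TYPE        # prints Debug
cache_value "$sample" LLVM_TARGETS_TO_BUILD   # prints X86;AArch64
rm -f "$sample"
```

In a real build folder, you would call it as cache_value CMakeCache.txt CMAKE_BUILD_TYPE.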

Choosing the right build type

LLVM uses several predefined build types provided by CMake. The most common types among them are as follows:

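Release: This build type adopts the highest optimization level (usually -O3) and leaves out most of the debug information. Because of the extra optimization work, it usually makes the build slightly slower. Do not rely on it being the default: LLVM's CMake scripts have historically fallen back to Debug when no build type was given, and newer releases refuse to configure until CMAKE_BUILD_TYPE is set, so always pass it explicitly.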
Debug: This build type will compile without any optimization applied (that is, -O0). It preserves all the debug information. Note that this will generate a huge number of artifacts and usually take up ~20 GB of space, so please be sure you have enough storage space when using this build type. This will usually make the building speed slightly faster since no optimization is being performed.
RelWithDebInfo: This build type applies as much compiler optimization as possible (usually -O2) and preserves all the debug information. This is an option balanced between space consumption, runtime speed, and debuggability.

You can choose one of them using the CMAKE_BUILD_TYPE CMake variable. For example, to use the RelWithDebInfo type, you can use the following command:

$ cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo …

It is recommended to use RelWithDebInfo first (if you're going to debug LLVM later). Modern compilers have come a long way in improving the quality of debug information in optimized binaries. So, always give it a try first to avoid wasting storage; you can always go back to the Debug type if things don't work out.

In addition to configuring build types, LLVM_ENABLE_ASSERTIONS is another (Boolean) CMake argument. It controls whether assertions are enabled, that is, uses of the assert macro, which terminates the program if its condition is false. By default, this flag is only on if the build type is Debug, but you can always turn it on manually to enforce stricter checks, even in other build types.

Avoiding building all targets

The number of LLVM's supported targets (hardware) has grown rapidly in the past few years. At the time of writing this book, there are nearly 20 officially supported targets. Each of them deals with non-trivial tasks such as native code generation, so it takes a significant amount of time to build. However, the chances that you're going to be working on all of these targets at the same time are low. Thus, you can select a subset of targets to build using the LLVM_TARGETS_TO_BUILD CMake argument. For example, to build the X86 target only, we can use the following command:

$ cmake -DLLVM_TARGETS_TO_BUILD="X86" …

You can also specify multiple targets using a semicolon-separated list, as follows:

$ cmake -DLLVM_TARGETS_TO_BUILD="X86;AArch64;AMDGPU" …

Surround the list of targets with double quotes!

In some shells, such as Bash, a semicolon terminates a command. So, the rest of the CMake command will be cut off if you don't surround the list of targets with double quotes.
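The effect of the quotes can be checked without running CMake at all. The following sketch uses a hypothetical helper, count_args, that reports how many arguments the shell actually passes to a command:

```shell
#!/bin/sh
# Count how many arguments the shell passes to a command.
count_args() { echo "$#"; }

# Quoted: the whole semicolon-separated list stays inside one argument.
count_args -DLLVM_TARGETS_TO_BUILD="X86;AArch64;AMDGPU"   # prints 1

# Unquoted, the shell would end the command at the first ';' and then try
# to run AArch64 and AMDGPU as separate commands, so that form is not
# executed here.
```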

Next, let's see how building LLVM's components as shared libraries can save resources.

Building as shared libraries

One of the most iconic features of LLVM is its modular design. Each component (optimization algorithms, code generation, and utility libraries, to name a few) is put into its own library, and developers can link against individual ones depending on their needs. By default, each component is built as a static library (*.a on Unix/Linux and *.lib on Windows). However, in this case, static libraries have the following drawbacks:

Linking against static libraries usually takes more time than linking against dynamic libraries (*.so in Unix/Linux and *.dll in Windows).
If multiple executables link against the same set of libraries, like many of the LLVM tools do, the total size of these executables will be significantly larger when you adopt the static library approach compared to its dynamic library counterpart. This is because each of the executables has a copy of those libraries.
When you're debugging LLVM programs with debuggers (GDB, for example), they usually spend quite some time loading the statically linked executables at the very beginning, hindering the debugging experience.

Thus, it's recommended to build every LLVM component as a dynamic library during the development phase by using the BUILD_SHARED_LIBS CMake argument:

$ cmake -DBUILD_SHARED_LIBS=ON …

This will save you a significant amount of storage space and speed up the building process.

Splitting the debug info

When you're building a program in debug mode (adding the -g flag when you're using GCC or Clang, for example), by default, the generated binary contains a section that stores debug information. This information is essential for using a debugger (for example, GDB) to debug that program. LLVM is a large and complex project, so when you're building it in debug mode, using the CMAKE_BUILD_TYPE=Debug variable, the compiled libraries and executables come with a huge amount of debug information that takes up a lot of disk space. This causes the following problems:

Due to the design of C/C++, several duplicates of the same debug information might be embedded in different object files (for example, the debug information for a header file might be embedded in every library that includes it), which wastes lots of disk space.
The linker needs to load object files and their associated debug information into memory during the linking stage, meaning that memory pressure will increase if the object files contain a non-trivial amount of debug information.

To solve these problems, LLVM's build system allows us to split debug information into files separate from the original object files. By detaching debug information from object files, the debug info of the same source file is condensed into one place, thus avoiding unnecessary duplicates being created and saving lots of disk space. In addition, since debug info is not part of the object files anymore, the linker no longer needs to load it into memory and thus saves lots of memory resources. Last but not least, this feature can also improve our incremental building speed (that is, rebuilding the project after a small code change), since we only need to update the modified debug information in a single place.

To use this feature, please use the LLVM_USE_SPLIT_DWARF CMake variable:

$ cmake -DCMAKE_BUILD_TYPE=Debug -DLLVM_USE_SPLIT_DWARF=ON …

Note that this CMake variable only works for compilers that use the DWARF debug format, including GCC and Clang.
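A quick way to confirm that the split took effect is to look for the separate debug files in the build folder. With split DWARF, GCC and Clang write them with a .dwo extension. The following is a minimal sketch; count_dwo is a hypothetical helper, and the build folder defaults to the current directory:

```shell
#!/bin/sh
# Count split-DWARF (.dwo) files under a build directory.
count_dwo() {
  # tr strips the padding that some wc implementations add.
  find "$1" -type f -name '*.dwo' | wc -l | tr -d ' '
}

build_dir=${1:-.}
echo "$(count_dwo "$build_dir") .dwo files under $build_dir"
```

A count of zero after a debug build suggests that the option was not picked up, for example because the compiler does not emit DWARF.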

Building an optimized version of llvm-tblgen

TableGen is a Domain-Specific Language (DSL) for describing structural data that will be converted into the corresponding C/C++ code as part of LLVM's building process (we will learn more about this in the chapters to come). The conversion tool is called llvm-tblgen. In other words, the running time of llvm-tblgen will affect the building time of LLVM itself. Therefore, if you're not developing the TableGen part, it's always a good idea to build an optimized version of llvm-tblgen, regardless of the global build type (that is, CMAKE_BUILD_TYPE), making llvm-tblgen run faster and shortening the overall building time.

The following CMake command, for example, will create build configurations that build a debug version of everything except the llvm-tblgen executable, which will be built as an optimized version:

$ cmake -DLLVM_OPTIMIZED_TABLEGEN=ON -DCMAKE_BUILD_TYPE=Debug …

Lastly, you'll see how you can use Clang and the new PassManager.

Using the new PassManager and Clang

Clang is LLVM's official C-family frontend (including C, C++, and Objective-C). It uses LLVM's libraries to generate machine code, which is organized by one of the most important subsystems in LLVM – PassManager. PassManager puts together all the tasks (that is, the Passes) required for optimization and code generation.

Chapter 9, Working with PassManager and AnalysisManager, will introduce LLVM's new PassManager, which was built from the ground up to replace the existing one at some point in the future. The new PassManager runs faster than the legacy PassManager, and this advantage indirectly gives Clang better runtime performance. Therefore, the idea here is pretty simple: if we build LLVM's source tree using Clang with the new PassManager enabled, compilation will be faster. Most mainstream Linux distribution package repositories already contain Clang. It's recommended to use Clang 6.0 or later if you want a more stable PassManager implementation. Use the LLVM_USE_NEWPM CMake variable to build LLVM with the new PassManager, as follows:

$ env CC=`which clang` CXX=`which clang++` \
  cmake -DLLVM_USE_NEWPM=ON …

LLVM is a huge project that takes a lot of time to build. The previous two sections introduced some useful tricks and tips for improving its building speed. In the next section, we're going to introduce an alternative build system to build LLVM. It has some advantages over the default CMake build system, which means it will be more suitable in some scenarios.
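The options from the previous two sections can be combined into a single configure step. The sketch below wraps them in a hypothetical shell function, configure_llvm. The specific values (lld, two link jobs, the X86 and AArch64 targets) are only examples, and setting CMAKE=echo turns the call into a dry run that prints the arguments instead of configuring:

```shell
#!/bin/sh
# Run CMake with the resource-saving options discussed in this chapter.
# Set CMAKE=echo to print the arguments instead of configuring (dry run).
# ASSUMPTION: the values chosen below are examples, not recommendations.
configure_llvm() {
  "${CMAKE:-cmake}" -G Ninja \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_USE_LINKER=lld \
    -DLLVM_PARALLEL_LINK_JOBS=2 \
    "-DLLVM_TARGETS_TO_BUILD=X86;AArch64" \
    -DBUILD_SHARED_LIBS=ON \
    -DLLVM_USE_SPLIT_DWARF=ON \
    -DLLVM_OPTIMIZED_TABLEGEN=ON \
    "$@"
}

# Dry run: print the arguments that would be passed to cmake.
CMAKE=echo
configure_llvm ../llvm
```

Note that the dry-run output drops the quotes around the target list, so it is meant for inspection rather than for copying back into a shell. Unset CMAKE (or set it to cmake) to perform the real configuration.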

Using GN for a faster turnaround time

CMake is portable and flexible, and it has been battle-tested by many industrial projects. However, it has some serious issues when it comes to reconfigurations. As we saw in the previous sections, you can modify some of the CMake arguments once build files have been generated by editing the CMakeCache.txt file in the build folder. When you invoke the build command again, CMake will reconfigure the build files. If you edit the CMakeLists.txt files in your source folders, the same reconfiguration will also kick in. There are primarily two drawbacks of CMake's reconfiguration process:

In some systems, the CMake configuration process is pretty slow. Even for reconfiguration, which theoretically only runs part of the process, it still takes a long time sometimes.
Sometimes, CMake will fail to resolve the dependencies among different variables and build targets, so your changes will not take effect. In the worst case, it will just fail silently, and it can take you a long time to track down the problem.

Generate Ninja, better known as GN, is a build file generator used by many of Google's projects, such as Chromium. GN generates Ninja files from its own description language. It has a good reputation for having a fast configuration time and reliable argument management. LLVM has offered GN support as an alternative (and experimental) building method since late 2018 (around version 8.0.0). GN is especially useful if your development work changes build files, or if you want to try out different building options in a short period.

Perform the following steps to use GN to build LLVM:

LLVM's GN support is sitting in the llvm/utils/gn folder. After switching to that folder, run the following get.py script to download GN's executable locally:
```
$ cd llvm/utils/gn
$ ./get.py
```
Using a specific version of GN
If you want to use a custom GN executable instead of the one fetched by get.py, simply put your version into the system's PATH. If you are wondering what other GN versions are available, you might want to check out the instructions for installing depot_tools at https://dev.chromium.org/developers/how-tos/install-depot-tools.
Use gn.py in the same folder to generate build files (the local version of gn.py is just a wrapper around the real gn, to set up the essential environment):
```
$ ./gn.py gen out/x64.release
```
out/x64.release is the name of the build folder. Usually, GN users will name the build folder in <architecture>.<build type>.<other features> format.
Finally, you can switch into the build folder and launch Ninja:
```
$ cd out/x64.release
$ ninja <build target>
```
Alternatively, you can use the -C Ninja option:
```
$ ninja -C out/x64.release <build target>
```

By now, you have probably noticed that the initial build file generation process is super fast. Now, if you want to change some of the build arguments, edit the args.gn file under the build folder (out/x64.release/args.gn, in this case). For example, suppose you want to change the build type to debug and change the targets to build (that is, the counterpart of the LLVM_TARGETS_TO_BUILD CMake argument) to X86 and AArch64. It is recommended to use the following command to launch an editor to edit args.gn:

$ ./gn.py args out/x64.release

In the editor of args.gn, input the following contents:

# Inside args.gn
is_debug = true
llvm_targets_to_build = ["X86", "AArch64"]

Once you've saved and exited the editor, GN will do some syntax checking and regenerate the build files (of course, you can also edit args.gn without using the gn command, in which case the build files won't be regenerated until you invoke the ninja command). This regeneration/reconfiguration will also be fast. Most importantly, there won't be any inconsistent behavior. Thanks to GN's language design, relationships between different build arguments can be easily analyzed with little ambiguity.
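For scripted setups, args.gn can also be written without opening an editor. The following is a minimal sketch, assuming a hypothetical build folder named out/x64.debug (override it with the BUILD_DIR environment variable); write_args_gn is a hypothetical helper that writes the same two arguments shown above:

```shell
#!/bin/sh
# Write args.gn directly instead of going through the interactive editor.
# ASSUMPTION: out/x64.debug is a hypothetical build folder name.
write_args_gn() {
  mkdir -p "$1"
  cat > "$1/args.gn" <<'EOF'
is_debug = true
llvm_targets_to_build = ["X86", "AArch64"]
EOF
}

build_dir=${BUILD_DIR:-out/x64.debug}
write_args_gn "$build_dir"
cat "$build_dir/args.gn"
```

After writing the file, generating the build files for that folder with the gen subcommand shown earlier should pick up the new arguments.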

The list of GN's build arguments can be found by running this command:

$ ./gn.py args --list out/x64.release

Unfortunately, at the time of writing this book, there are still plenty of CMake arguments that haven't been ported to GN. GN is not a replacement for LLVM's existing CMake build system, but it is an alternative. Nevertheless, GN is still a decent building method if you want a fast turnaround time in your developments that involve many build configuration changes.

Key benefits

Explore Clang, LLVM’s middle-end and backend, in a pragmatic way

Develop your LLVM skillset and get to grips with a variety of common use cases

Engage with real-world LLVM development through various coding examples

Description

Every programmer or engineer, at some point in their career, works with compilers to optimize their applications. Compilers convert a high-level programming language into low-level machine-executable code. LLVM provides the infrastructure, reusable libraries, and tools needed for developers to build their own compilers. With LLVM’s extensive set of tooling, you can effectively generate code for different backends as well as optimize them. In this book, you’ll explore the LLVM compiler infrastructure and understand how to use it to solve different problems. You’ll start by looking at the structure and design philosophy of important components of LLVM and gradually move on to using Clang libraries to build tools that help you analyze high-level source code. As you advance, the book will show you how to process LLVM IR – a powerful way to transform and optimize the source program for various purposes. Equipped with this knowledge, you’ll be able to leverage LLVM and Clang to create a wide range of useful programming language tools, including compilers, interpreters, IDEs, and source code analyzers. By the end of this LLVM book, you’ll have developed the skills to create powerful tools using the LLVM framework to overcome different real-world challenges.

Who is this book for?

This book is for software engineers of all experience levels who work with LLVM. If you are an academic researcher, this book will help you learn useful LLVM skills in a short time and enable you to build your prototypes and projects quickly. Programming language enthusiasts will also find this book useful for building a new programming language with the help of LLVM.

What you will learn

Find out how LLVM's build system works and how to reduce the resources needed to build it

Get to grips with running custom testing with LLVM's LIT framework

Build different types of plugins and extensions for Clang

Customize Clang's toolchain and compiler flags

Write LLVM passes for the new PassManager

Discover how to inspect and modify LLVM IR

Understand how to use LLVM's profile-guided optimizations (PGO) framework

Create custom compiler sanitizers

Mitchel Dickerson May 25, 2021

I have been working hacking on LLVM full time for my research for just under two years at this point, and I can honestly say that I wish I had such good introduction material when I first started. The most common approach for learning LLVM previously has been to read the documentation and then learn from code already in LLVM that performs tasks similar to the ones you hope to achieve. This book provides what I feel is a much better approach to starting on what can be a very daunting code base. The sections to the book are well thought out and give a good overview of where to approach the compiler depending on what particular problem you are trying to tackle. If you have a project that requires work in C++ front end, the Clang sections are an excellent starting point for getting access to the tools you need to start tackling the problems you actually want to solve. If your project involves some other language that uses LLVM, then the "Middle-end" Development will get you all the tools you need to modify the IR passed on to LLVM (as a note to people starting with languages other than C or C++ that use LLVM, you will likely have to refer to the language's compiler if you want to delve in to "front end" work for that language, Rust being a notable example). Fairly quickly within the section the author presented modules and functionality that helped me to better implement my own work in LLVM. Ultimately, if you are looking to get started in LLVM and need a place to start, this is the best starting point there is. The documentation, source code, and community are important, but this book will get you started quickly and effectively.

Amazon Customer

The book is very easy to follow and provides an introduction to many useful LLVM abstractions. It is very well written, you can see a lot of effort was put into it, i.e. well edited. Definitely one of the best of the books I've read about compilers.

Paul Kirth May 28, 2021

This book is a very nice introduction to development with LLVM. Many of the subtle points raised in this book can be difficult to find without extensive knowledge about the codebase. The author provides practical guidance for setting up a variety of projects within the compiler infrastructure and explains in detail the normal way compiler developers organize and modify their projects to achieve a variety of results. In particular he makes some of the more challenging aspects of starting new projects simple and straight forward. In particular, I found his description of the runtime components to be straight forward, when most explanations I've seen within the documentation and across various blog posts to be overly complex. LLVM is a big project, and unless you've worked across many of its various components, i.e., frontend development, analysis, instrumentation, code generation, and the runtime components getting started in a new area can be a challenge, even if you've been working on the compiler for a long time. Even though I've worked on LLVM for a long time, I'll probably reach for this as a reference for how to do a few things. While I would have liked more advanced topics to be explored a bit more deeply, I look forward to the author's inevitable follow up, which will no doubt be focused on more complex compiler problems and advanced use of LLVM. Overall a must have for new LLVM developers, students, and researchers.

Amazon Customer Jun 24, 2023

This book is a phenomenal resource for novice programmers aiming to deepen their understanding of compiler design and optimization using the LLVM infrastructure. The content is comprehensive and addresses every aspect of LLVM in a practical, relatable manner. Right from the first chapter, which efficiently explains how to reduce resource consumption when building LLVM from source, to the sections on handling AST and customizing Clang's toolchain, this book presents complex concepts in an easy-to-grasp format. Each chapter is peppered with real-world examples, code snippets, and practical use cases, enabling the reader to instantly apply the acquired knowledge. One of the book's key strengths is the way it demystifies LLVM IR (Intermediate Representation), which is a critical aspect of compiler design. The chapter on "Processing IR in a proper way" delivers insights into how to inspect and modify LLVM IR, providing a valuable foundation for developing tools for transforming and optimizing source programs. Another highlight is the exploration of LLVM's build system and the LLVM LIT framework. The book provides step-by-step guidance on building plugins, extensions, and custom compiler sanitizers. Even advanced topics like profile-guided optimization (PGO) are discussed with clarity and precision. In terms of accessibility, this book does an excellent job of catering to a wide audience. Whether you are a seasoned software engineer, an academic researcher, or a programming language enthusiast, there's something valuable to extract from this book. Furthermore, the information is presented in a logical, sequential manner, building on previous topics and facilitating a smooth learning curve.

Jay May 21, 2021

If you’re looking to extend Clang, this is a wonderful resource.
