Preface
LLVM is an inspiring software project that started with the passion for compilers of a single person, Chris Lattner. The events that followed the first versions of LLVM and how it became widely adopted later reveal a pattern that may be observed across the history of other successful open source projects: they did not start within a company, but instead they are the product of simple human curiosity with respect to a given subject. For example, the first Linux kernel was the result of a Finnish student being intrigued by the area of operating systems and being motivated to understand and see in practice how a real operating system should work.
For Linux or LLVM, the contribution of many other programmers matured and leveraged the project to a first-class software that rivals, in quality, any other established competitor. It is unfair, therefore, to attribute the success of any big project to a single person. However, in the open source community, the leap from a student's project to an incredibly complex yet robust software depends on a key factor: attracting contributors and programmers who enjoy spending their time on the project.
Schools create a fascinating atmosphere because education involves the art of teaching people how things work. For these people, the feeling of unraveling how intricate mechanisms work and surpassing the state of being puzzled to finally mastering them is full of victory and overcoming. In this environment, at the University of Illinois at Urbana-Champaign (UIUC), the LLVM project grew by being used both as a research prototype and as a teaching framework for compiler classes lectured by Vikram Adve, Lattner's Master's advisor. Students contributed to the first bug reports, setting in motion the LLVM trajectory as a well-designed and easy-to-study piece of software.
The blatant disparity between software theory and practice befuddles many Computer Science students. A clean and simple concept in computing theory may involve so many levels of implementation details such that they disguise real-life software projects to become simply too complex for the human mind to grasp, especially all of its nuances. A clever design with powerful abstractions is the key to aid the human brain to navigate all the levels of a project: from the high-level view, which implements how the program works in a broader sense, to the lowest level of detail.
This is particularly true for compilers. Students who have a great passion to learn how compilers work often face a tough challenge when it comes to understanding the factual compiler implementation. Before LLVM, GCC was one of the few open source options for hackers and curious students to learn how a real compiler is implemented, despite the theory taught in schools.
However, a software project reflects, in its purest sense, the view of the programmers who created it. This happens through the abstractions employed to distinguish modules and data representation across several components. Programmers may have different views about the same topic. In this way, old and large software bases such as GCC, which is almost 30 years old, frequently embody a collection of different views of different generation of programmers, which makes the software increasingly difficult for newer programmers and curious observers to understand.
The LLVM project not only attracted experienced compiler programmers, but also a lot of young and curious minds that saw in it a much cleaner and simpler hackable piece of software, which represented a compiler with a lot of potential. This was clearly observed by the incredible number of scientific papers that chose LLVM as a prototype to do research. The reason is simple; in academia, students are frequently in charge of the practical aspects of the implementation, and thus, it is of paramount importance for research projects that the student be able to master its experimental framework code base. Seduced by its newer design using the C++ language (instead of C used in GCC), modularity (instead of the monolithic structure of GCC), and concepts that map more easily to the theory being taught in modern compiler courses, many researchers found it easy to hack LLVM in order to implement their ideas, and they were successful. The success of LLVM in academia, therefore, was a consequence of this reduced gap between theory and practice.
Beyond an experimental framework for scientific research, the LLVM project also attracted industry interest due to its considerably more liberal license in comparison with the GPL license of GCC. As a project that grew in academia, a big frustration for researchers who write code is the fear that it will only be used for a single experiment and be immediately discarded afterwards. To fight this fate, Chris Lattner, in his Master's project at UIUC that gave birth to LLVM, decided to license the project under the University of Illinois/NCSA Open Source License, allowing its use, commercial or not, as long as the copyright notice is maintained. The goal was to maximize LLVM adoption, and this goal was fulfilled with honor. In 2012, LLVM was awarded the ACM Software System Award, a highly distinguished recognition of notable software that contributed to science.
Many companies embraced the LLVM project with different necessities and performed different contributions, widening the range of languages that an LLVM-based compiler can operate with as well as the range of machines for which the compiler is able to generate code. This new phase of the project provided an unprecedented level of maturity to the library and tools, allowing it to permanently leave the state of experimental academia software to enter the status of a robust framework used in commercial products. With this, the name of the project also changed from Low Level Virtual Machine to the acronym LLVM.
The decision to retire the name Low Level Virtual Machine in favor of just LLVM reflects the change of goals of the project across its history. As a Master's project, LLVM was created as a framework to study lifelong program optimizations. These ideas were initially published in a 2003 MICRO (International Symposium on Microarchitecture) paper entitled LLVA: A Low-level Virtual Instruction Set Architecture, describing its instruction set, and in a 2004 CGO (International Symposium on Code Generation and Optimization) paper entitled LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation.
Outside of an academic context, LLVM became a well-designed compiler with the interesting property of writing its intermediate representation to disk. In commercial systems, it was never truly used as a virtual machine such as the Java Virtual Machine (JVM), and thus, it made little sense to continue with the Low Level Virtual Machine name. On the other hand, some other curious names remained as a legacy. The file on the disk that stores a program in the LLVM intermediate representation is referred to as the LLVM bitcode, a parody of the Java bytecode, as a reference to the amount of space necessary to represent programs in the LLVM intermediate representation versus the Java one.
Our goal in writing this book is twofold. First, since the LLVM project grew a lot, we want to present it to you in small pieces, a component at a time, making it as simple as possible to understand while providing you with the joy of working with a powerful compiler library. Second, we want to evoke the spirit of an open source hacker, inspiring you to go far beyond the concepts presented here and never stop expanding your knowledge.
Happy hacking!