Extreme C

You're reading from Extreme C: Taking you to the limit in Concurrency, OOP, and the most advanced capabilities of C

Product type Paperback
Published in Oct 2019
Publisher Packt
ISBN-13 9781789343625
Length 822 pages
Edition 1st Edition
Author: Kamran Amini

Compiler

As we discussed in the previous sections, the compiler accepts the translation unit prepared by the preprocessor and generates the corresponding assembly instructions. When multiple C sources are compiled into their equivalent assembly code, the existing tools in the platform, such as the assembler and the linker, manage the rest by making relocatable object files out of the generated assembly code and finally linking them together (and possibly with other object files) to form a library or an executable file.

Earlier, we mentioned as and ld as two of the many tools available in Unix for C development. They are mainly used to create platform-compatible object files, and they necessarily exist outside of gcc or any other compiler. By this we mean that they are not developed as part of gcc (which we have chosen as our example), and they should be available on a platform even when gcc is not installed. gcc only invokes them in its compilation pipeline; they are not embedded into gcc.

That is because the platform itself knows best which instruction set its processor accepts and which formats and restrictions its operating system imposes. The compiler is not usually aware of these constraints unless it wants to do some optimization on the translation unit. Therefore, we can conclude that the most important task gcc performs is translating the translation unit into assembly instructions. This is what we actually call compilation.

One of the challenges in C compilation is generating correct assembly instructions that the target architecture accepts. gcc can compile the same C code for various architectures such as ARM, Intel x86, AMD64, and many more. As we discussed before, each architecture has an instruction set that is accepted by its processor, and gcc (or any other C compiler) is solely responsible for generating correct assembly code for a specific architecture.

gcc (or any other C compiler) overcomes this difficulty by splitting the job into two steps: first, parsing the translation unit into an intermediate, C-independent data structure called an Abstract Syntax Tree (AST), and then using that AST to generate the equivalent assembly instructions for the target architecture. The first step is architecture-independent and can be done regardless of the target instruction set. The second step is architecture-dependent, so the compiler must be aware of the target instruction set. The subcomponent that performs the first step is called a compiler frontend, and the subcomponent that performs the latter step is called a compiler backend.
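The frontend/backend split can be sketched in a few lines of C. This is a minimal illustration, not how gcc or clang are actually organized: the node_t type, the gen_stack and gen_reg functions, and both "instruction sets" (a push/add stack machine and a mov/add register machine) are hypothetical names invented for this example. The point is only that one architecture-independent tree can feed two different architecture-dependent generators.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A tiny, hypothetical AST: integer literals and additions only. */
typedef enum { NODE_INT, NODE_ADD } node_kind_t;

typedef struct node {
  node_kind_t kind;
  int value;                 /* used when kind == NODE_INT */
  const struct node *left;   /* used when kind == NODE_ADD */
  const struct node *right;  /* used when kind == NODE_ADD */
} node_t;

/* Appends one line to a caller-supplied, NUL-terminated buffer. */
static void emit(char *out, size_t cap, const char *line) {
  size_t used = strlen(out);
  if (used < cap) snprintf(out + used, cap - used, "%s\n", line);
}

/* Backend A: a made-up stack machine (push/add). */
void gen_stack(const node_t *n, char *out, size_t cap) {
  assert(n != NULL);
  char line[32];
  if (n->kind == NODE_INT) {
    snprintf(line, sizeof line, "push %d", n->value);
    emit(out, cap, line);
  } else {
    gen_stack(n->left, out, cap);
    gen_stack(n->right, out, cap);
    emit(out, cap, "add");
  }
}

/* Backend B: a made-up register machine.
 * Returns the number of the register that holds the result. */
int gen_reg(const node_t *n, char *out, size_t cap, int *next_reg) {
  assert(n != NULL);
  char line[48];
  if (n->kind == NODE_INT) {
    int r = (*next_reg)++;
    snprintf(line, sizeof line, "mov r%d, %d", r, n->value);
    emit(out, cap, line);
    return r;
  }
  int l = gen_reg(n->left, out, cap, next_reg);
  int r = gen_reg(n->right, out, cap, next_reg);
  snprintf(line, sizeof line, "add r%d, r%d", l, r);
  emit(out, cap, line);
  return l;
}
```

Given a tree for 1 + 2 and two empty buffers, gen_stack writes a push/push/add sequence while gen_reg writes two mov lines followed by an add on registers. The tree itself never changes; only the generator does.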

In the following sections, we are going to discuss these steps in more depth. First, let's talk about the AST.

Abstract syntax tree

As we have explained in the previous section, a C compiler frontend should parse the translation unit and create an intermediate data structure. The compiler creates this intermediate data structure by parsing the C source code according to the C grammar and saving the result in a tree-like data structure that is not architecture-dependent. The final data structure is commonly referred to as an AST.
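To make "parsing according to a grammar and saving the result in a tree" concrete, here is a minimal sketch of a recursive-descent parser. It assumes a toy grammar (numbers joined by +), which is far simpler than the C grammar, and every name in it (ast_t, parser_t, parse_expr, ast_dump) is hypothetical. Real compiler frontends are considerably more involved.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy grammar (not the C grammar):
 *   expr   := number ('+' number)*
 *   number := digit+
 */
typedef enum { AST_NUM, AST_ADD } ast_kind_t;

typedef struct ast {
  ast_kind_t kind;
  int value;          /* AST_NUM only */
  struct ast *lhs;    /* AST_ADD only */
  struct ast *rhs;    /* AST_ADD only */
} ast_t;

/* A fixed-size pool keeps the sketch free of malloc/free bookkeeping. */
typedef struct {
  ast_t nodes[64];
  int used;
  const char *cursor; /* next unread character of the input */
} parser_t;

static ast_t *new_node(parser_t *p, ast_kind_t kind) {
  assert(p->used < 64);
  ast_t *n = &p->nodes[p->used++];
  n->kind = kind;
  n->value = 0;
  n->lhs = n->rhs = NULL;
  return n;
}

static void skip_spaces(parser_t *p) {
  while (isspace((unsigned char)*p->cursor)) p->cursor++;
}

/* number := digit+ ; returns NULL if no digit is found. */
static ast_t *parse_number(parser_t *p) {
  skip_spaces(p);
  if (!isdigit((unsigned char)*p->cursor)) return NULL;
  ast_t *n = new_node(p, AST_NUM);
  while (isdigit((unsigned char)*p->cursor))
    n->value = n->value * 10 + (*p->cursor++ - '0');
  return n;
}

/* expr := number ('+' number)* ; builds a left-leaning tree.
 * Returns NULL when the input does not match the grammar. */
ast_t *parse_expr(parser_t *p, const char *src) {
  p->used = 0;
  p->cursor = src;
  ast_t *left = parse_number(p);
  if (left == NULL) return NULL;
  for (;;) {
    skip_spaces(p);
    if (*p->cursor != '+') break;
    p->cursor++;
    ast_t *right = parse_number(p);
    if (right == NULL) return NULL;
    ast_t *add = new_node(p, AST_ADD);
    add->lhs = left;
    add->rhs = right;
    left = add;
  }
  return *p->cursor == '\0' ? left : NULL;
}

/* Appends a parenthesized rendering of the tree to out, e.g. "(+ 1 2)". */
void ast_dump(const ast_t *n, char *out, size_t cap) {
  size_t used = strlen(out);
  if (used >= cap) return;
  if (n->kind == AST_NUM) {
    snprintf(out + used, cap - used, "%d", n->value);
    return;
  }
  snprintf(out + used, cap - used, "(+ ");
  ast_dump(n->lhs, out, cap);
  used = strlen(out);
  if (used < cap) snprintf(out + used, cap - used, " ");
  ast_dump(n->rhs, out, cap);
  used = strlen(out);
  if (used < cap) snprintf(out + used, cap - used, ")");
}
```

Parsing the text 1 + 2 + 30 and dumping the result into an empty buffer yields a nested form in which the first addition is the left child of the second. Nothing in the tree refers to a processor or an instruction set, which is what makes it architecture-independent.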

ASTs can be generated for any programming language, not only C, so the AST structure must be abstract enough to be independent of C syntax.

As a result, supporting another language only requires changing the compiler frontend. This is exactly why GNU Compiler Collection (GCC), of which gcc is the C compiler, and Low-Level Virtual Machine (LLVM), of which clang is the C compiler, are collections of compilers for many languages beyond just C and C++, such as Java, Fortran, and so on.

Once the AST is produced, the compiler backend can start to optimize the AST and generate assembly code based on the optimized AST for a target architecture. To get a better understanding of ASTs, we are going to take a look at a real AST. In this example, we have the following C source code:

int main() {
  int var1 = 1;
  double var2 = 2.5;
  int var3 = var1 + var2;
  return 0;
}

Code Box 2-7 [ExtremeC_examples_chapter2_2.c]: Simple C code whose AST is going to be generated

The next step is to use clang to dump the AST of the preceding code, for example by running clang -Xclang -ast-dump -fsyntax-only on the source file. In the following figure, Figure 2-1, you can see the AST:

Figure 2-1: The AST generated and dumped for example 2.2

So far, we have used clang in various places as a C compiler, but let's introduce it properly. clang is a C compiler frontend developed by the LLVM Developer Group for the LLVM compiler backend. The LLVM Compiler Infrastructure Project uses an intermediate representation, or LLVM IR, as the abstract data structure passed between its frontend and its backend. LLVM is famous for its ability to dump its intermediate data structures for research purposes. The preceding tree-like output is the AST that clang builds from the source code of example 2.2, before that AST is lowered to LLVM IR.

What we have done here is introduce you to the basics of ASTs. We are not going through the details of the preceding AST output because each compiler has its own AST implementation. We would require several chapters to cover all of the details, and that is beyond the scope of this book.

However, if you pay attention to the above figure, you can find a line that starts with -FunctionDecl. This represents the main function. Before that, you can find meta information regarding the translation unit passed to the compiler.

If you continue after FunctionDecl, you will find tree entries – or nodes – for declaration statements, binary operator expressions, the return statement, and even implicit cast expressions. There are lots of interesting things residing in an AST, with countless things to learn!
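The implicit cast nodes come from the line int var3 = var1 + var2; in the example. The following sketch spells out the conversions that the C language requires there as explicit casts. The function names are hypothetical and exist only for comparison; the exact node names a compiler prints for these conversions vary between compilers.

```c
#include <assert.h>

/* Same computation as the example, relying on implicit conversions. */
int sum_implicit(int var1, double var2) {
  int var3 = var1 + var2;  /* var1 is converted to double, the sum back to int */
  return var3;
}

/* The same steps with each conversion written as an explicit cast. */
int sum_explicit(int var1, double var2) {
  double widened = (double)var1;  /* int -> double before the addition */
  double total = widened + var2;  /* the addition is performed in double */
  int var3 = (int)total;          /* double -> int, fraction discarded */
  return var3;
}

/* Self-check with the values used in the example (1 and 2.5). */
void check_example(void) {
  assert(sum_implicit(1, 2.5) == 3);
  assert(sum_explicit(1, 2.5) == 3);
}
```

Both functions compute 3 for the inputs 1 and 2.5, because 3.5 loses its fractional part when it is converted to int. In the AST, each of these conversions shows up as its own node even though the source code never mentions a cast.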

Another benefit of having an AST for source code is that the compiler can rearrange the order of instructions, prune unused branches, and replace branches with cheaper equivalents, so that the program performs better while its behavior is preserved. As we pointed out before, this is called optimization, and any C compiler usually performs it to a configurable extent.
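Two of the simplest tree rewrites of this kind are constant folding (replacing an addition of two constants with their sum) and pruning a branch whose condition is already known. The sketch below assumes a hypothetical expression tree (ex_t) and a hypothetical optimize function; it illustrates the idea only and does not reflect how gcc or LLVM implement their optimization passes.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EX_CONST, EX_VAR, EX_ADD, EX_IF } ex_kind_t;

typedef struct ex {
  ex_kind_t kind;
  int value;     /* EX_CONST: the constant */
  struct ex *a;  /* EX_ADD: left operand;  EX_IF: condition */
  struct ex *b;  /* EX_ADD: right operand; EX_IF: result when condition != 0 */
  struct ex *c;  /* EX_IF: result when condition == 0 */
} ex_t;

/* Rewrites the tree bottom-up and returns the (possibly different) root. */
ex_t *optimize(ex_t *n) {
  assert(n != NULL);
  switch (n->kind) {
  case EX_ADD:
    n->a = optimize(n->a);
    n->b = optimize(n->b);
    /* Constant folding: const + const becomes a single constant node. */
    if (n->a->kind == EX_CONST && n->b->kind == EX_CONST) {
      n->value = n->a->value + n->b->value;
      n->kind = EX_CONST;
      n->a = n->b = NULL;
    }
    return n;
  case EX_IF:
    n->a = optimize(n->a);
    /* Pruning: with a constant condition, only one branch can be taken. */
    if (n->a->kind == EX_CONST)
      return optimize(n->a->value != 0 ? n->b : n->c);
    n->b = optimize(n->b);
    n->c = optimize(n->c);
    return n;
  default:
    return n;  /* constants and variables are already as simple as possible */
  }
}
```

For a tree that means "if 0 then x else 1 + 2", optimize discards the condition and the x branch and returns a single constant node holding 3. A tree such as x + 1 is returned unchanged, since the value of x is not known at compile time.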

The next component that we are going to discuss in more detail is the assembler.
