Extreme C

You're reading from Extreme C: Taking you to the limit in Concurrency, OOP, and the most advanced capabilities of C

Product type: Paperback
Published in: Oct 2019
Publisher: Packt
ISBN-13: 9781789343625
Length: 822 pages
Edition: 1st Edition
Author: Kamran Amini

Linker

The first big step in building a C project is compiling all the source files to their corresponding relocatable object files. This step is necessary for preparing the final products, but on its own it is not enough; one more step is still needed. Before going through the details of that step, we need to have a quick look at the possible products (sometimes referred to as artifacts) of a C project.

A C/C++ project can lead to the following products:

  • A number of executable files. In Unix-like operating systems, executables often have no extension at all, although the .out extension is also common (gcc names its output a.out by default). These files usually have the .exe extension in Microsoft Windows.
  • A number of static libraries that usually have the .a extension in most Unix-like operating systems. These files have the .lib extension in Microsoft Windows.
  • A number of dynamic libraries or shared object files that usually have the .so extension in most Unix-like operating systems. These files have the .dylib extension in macOS, and .dll in Microsoft Windows.

Relocatable object files are not considered one of these products; hence, you cannot find them in the preceding list. They are temporary products because they only take part in the linking step that produces the preceding products, and after that, we don't need them anymore. The linker is responsible for producing executables and shared object files from the given relocatable object files, while static libraries, as explained shortly, are produced by an archiver.

One final and important note about terminology: all three of these products are called object files. Therefore, it is best to use the term relocatable before the term object file when referring to an object file produced by the assembler as an intermediate product.

We'll now briefly describe each of the final products. The upcoming chapter is entirely dedicated to object files, and it discusses these final products in greater detail.

An executable object file can be run as a process. This file usually contains a substantial portion of the features provided by a project. It must have an entry point, the location where execution of its machine-level instructions begins. While the main function is the entry point of a C program, the entry point of an executable object file is platform-dependent, and it is not the main function. The main function is eventually called after some preparations made by a group of platform-specific instructions, which the linker adds as part of the linking step.
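Code that runs before main can be observed indirectly. The following sketch relies on the constructor attribute, a GCC and Clang extension (not standard C): the runtime's startup machinery calls functions marked this way before it calls main. The names startup_hook and startup_hook_has_run are hypothetical, chosen only for illustration.

```c
/* Becomes 1 once the startup hook has run. */
static int startup_hook_ran = 0;

/* GCC/Clang extension: the runtime's startup machinery calls constructor
   functions before it calls main(). */
__attribute__((constructor))
static void startup_hook(void) {
    startup_hook_ran = 1;
}

/* Reports whether the hook has already run; called from main(), it returns 1. */
int startup_hook_has_run(void) {
    return startup_hook_ran;
}
```

Calling startup_hook_has_run() as the first statement of main returns 1, even though main never calls startup_hook itself.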

A static library is nothing more than an archive file that contains several relocatable object files. Therefore, a static library file is not produced by the linker directly. Instead, it is produced by the default archive program of the system, which on a Unix-like system is the ar program.

Static libraries are usually linked into executable files, and they then become part of those executables. They are the simplest and easiest way to encapsulate a piece of logic so that you can use it at a later point. There is an enormous number of static libraries within an operating system, each containing a specific piece of logic that can be used to access a certain functionality within that operating system.

Shared object files, which have a more complicated structure than a simple archive, are created directly by the linker. They are also used differently: before they can be used, they need to be loaded into a running process at runtime.

This is in contrast to static libraries, which are used at link time and become part of the final executable file. In addition, a single shared object file can be loaded and used by multiple processes at the same time. As part of the next chapter, we demonstrate how shared object files can be loaded and used by a C program at runtime.

In the upcoming section, we explain what happens in the linking step and what elements are involved and used by the linker to produce the final products, especially executable files.

How does the linker work?

In this section, we are going to explain how the linker component works and what exactly we mean by linking. Suppose that you are building a C project that contains five source files, with the final product being an executable. As part of the build process, you have compiled all the source files, and now you have five relocatable object files. What you now need is a linker to complete the last step and produce the final executable file.

Based on what we have said so far, to put it simply, a linker combines all of the relocatable object files, in addition to specified static libraries, in order to create the final executable object file. However, you would be wrong if you thought that this step was straightforward.

Combining object files into a working executable object file raises a few concerns that stem from the contents of the object files. To see how the linker works, we need to know how it uses the relocatable object files, and for this purpose, we need to find out what is inside an object file.

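The simple answer is that an object file contains the equivalent machine-level instructions for a translation unit. However, these instructions are not put into the file in random order. Instead, they are stored in sections, such as the .text section, and grouped under names called symbols; each function symbol, for instance, marks the instructions that belong to one function.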

In fact, there are many things in an object file, but symbols are one component that explains how the linker works and how it ties some object files together to produce a larger one. In order to explain symbols, let's talk about them in the context of an example: example 2.3. Using this example, we want to demonstrate how some functions are compiled and placed in the corresponding relocatable object file. Take a look at the following code, which contains two functions:

int average(int a, int b) {
  return (a + b) / 2;
}
int sum(int* numbers, int count) {
  int sum = 0;
  for (int i = 0; i < count; i++) {
    sum += numbers[i];
  }
  return sum;
}

Code Box 2-8 [ExtremeC_examples_chapter2_3.c]: Code with two function definitions

Firstly, we need to compile the preceding code in order to produce the corresponding object file. The following command produces the object file, target.o. We are compiling the code on our default platform:

$ gcc -c ExtremeC_examples_chapter2_3.c -o target.o
$

Shell Box 2-12: Compiling the source file in example 2.3

Next, we use the nm utility to look into the target.o object file. The nm utility allows us to see the symbols that can be found inside an object file:

$ nm target.o
0000000000000000 T average
000000000000001d T sum
$

Shell Box 2-13: Using the nm utility to see the defined symbols in a relocatable object file

The preceding shell box shows the symbols defined in the object file. As you can see, their names are exactly the same as the names of the functions defined in Code Box 2-8.

If you use the readelf utility, as we have done in the following shell box, you can see the symbol table of the object file. A symbol table lists the symbols of an object file, and it gives you more information about each of them:

$ readelf -s target.o
Symbol table '.symtab' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS ExtremeC_examples_chapter
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     8: 0000000000000000    29 FUNC    GLOBAL DEFAULT    1 average
     9: 000000000000001d    69 FUNC    GLOBAL DEFAULT    1 sum
$

Shell Box 2-14: Using the readelf utility to see the symbol table of a relocatable object file

As you can see in the output of readelf, there are two function symbols in the symbol table. There are also other symbols in the table that refer to different sections within the object file. We will discuss some of these symbols in this chapter and the next chapter.
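The Bind column deserves a closer look. A function with external linkage, like average and sum, gets GLOBAL binding and is visible to the linker. A function declared static gets LOCAL binding and cannot be used to resolve references from other object files. The following sketch, with hypothetical function names, shows both cases. When it is compiled without optimization, nm should list helper_double with a lowercase t and public_quadruple with an uppercase T.

```c
/* Internal linkage: LOCAL binding in the symbol table (lowercase "t" in nm),
   so no other object file can link against this function. */
static int helper_double(int x) {
    return 2 * x;
}

/* External linkage: GLOBAL binding (uppercase "T" in nm), so the linker can
   use this definition to resolve references from other object files. */
int public_quadruple(int x) {
    return helper_double(helper_double(x));
}
```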

To see the disassembly of the machine-level instructions under each function symbol, you can use the objdump tool:

$ objdump -d target.o
target.o:     file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <average>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   89 7d fc                mov    %edi,-0x4(%rbp)
   7:   89 75 f8                mov    %esi,-0x8(%rbp)
   a:   8b 55 fc                mov    -0x4(%rbp),%edx
   d:   8b 45 f8                mov    -0x8(%rbp),%eax
  10:   01 d0                   add    %edx,%eax
  12:   89 c2                   mov    %eax,%edx
  14:   c1 ea 1f                shr    $0x1f,%edx
  17:   01 d0                   add    %edx,%eax
  19:   d1 f8                   sar    %eax
  1b:   5d                      pop    %rbp
  1c:   c3                      retq
000000000000001d <sum>:
  1d:   55                      push   %rbp
  1e:   48 89 e5                mov    %rsp,%rbp
  21:   48 89 7d e8             mov    %rdi,-0x18(%rbp)
  25:   89 75 e4                mov    %esi,-0x1c(%rbp)
  28:   c7 45 f8 00 00 00 00    movl   $0x0,-0x8(%rbp)
  2f:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
  36:   eb 1d                   jmp    55 <sum+0x38>
  38:   8b 45 fc                mov    -0x4(%rbp),%eax
  3b:   48 98                   cltq
  3d:   48 8d 14 85 00 00 00    lea    0x0(,%rax,4),%rdx
  44:   00
  45:   48 8b 45 e8             mov    -0x18(%rbp),%rax
  49:   48 01 d0                add    %rdx,%rax
  4c:   8b 00                   mov    (%rax),%eax
  4e:   01 45 f8                add    %eax,-0x8(%rbp)
  51:   83 45 fc 01             addl   $0x1,-0x4(%rbp)
  55:   8b 45 fc                mov    -0x4(%rbp),%eax
  58:   3b 45 e4                cmp    -0x1c(%rbp),%eax
  5b:   7c db                   jl     38 <sum+0x1b>
  5d:   8b 45 f8                mov    -0x8(%rbp),%eax
  60:   5d                      pop    %rbp
  61:   c3                      retq
$

Shell Box 2-15: Using the objdump utility to see the instructions of the symbols defined in a relocatable object file

Based on what we see, each function symbol corresponds to a function that has been defined in the source code. This also shows that when several relocatable object files are linked to produce an executable object file, each of them contains only a portion of the function symbols needed to build the complete executable program.

Now, going back to the topic of this section, the linker gathers all the symbols from the various relocatable object files before putting them together in a bigger object file to form a complete executable binary. In order to demonstrate this in a real scenario, we need a different example that has some functions distributed in a number of source files. This way, we can show how the linker looks up the symbols in the given relocatable object files, in order to produce an executable file.

Example 2.4 consists of four C files – three source files and one header file. In the header file, we have declared two functions, with each one defined in its own source file. The third source file contains the main function.

The functions in example 2.4 are amazingly simple, and after compilation, each function will consist of only a few machine-level instructions in its corresponding object file. In addition, example 2.4 does not include any of the standard C header files. We have chosen this in order to have a small translation unit for each source file.

The following code box shows the header file:

#ifndef EXTREMEC_EXAMPLES_CHAPTER_2_4_DECLS_H
#define EXTREMEC_EXAMPLES_CHAPTER_2_4_DECLS_H
int add(int, int);
int multiply(int, int);
#endif

Code Box 2-9 [ExtremeC_examples_chapter2_4_decls.h]: The declaration of the functions in example 2.4

Looking at that code, you can see that we used header guards to prevent double inclusion. In addition, two functions with similar signatures are declared. Each of them receives two integers as input and returns another integer as a result.

As we said before, each of these functions is implemented in a separate source file. The first source file looks as follows:

int add(int a, int b) {
  return a + b;
}

Code Box 2-10 [ExtremeC_examples_chapter2_4_add.c]: The definition of the add function

We can clearly see that the source file does not include any header files, not even the one declaring add. However, it defines a function with exactly the same signature as the one declared in the header file.

As we can see next, the second source file is similar to the first one. This one contains the definition of the multiply function:

int multiply(int a, int b) {
  return a * b;
}

Code Box 2-11 [ExtremeC_examples_chapter2_4_multiply.c]: The definition of the multiply function

We can now move on to the third source file, which contains the main function:

#include "ExtremeC_examples_chapter2_4_decls.h"
int main(int argc, char** argv) {
  int x = add(4, 5);
  int y = multiply(9, x);
  return 0;
}

Code Box 2-12 [ExtremeC_examples_chapter2_4_main.c]: The main function of example 2.4

The third source file has to include the header file in order to obtain the declarations of both functions. Otherwise, the source file will not be able to use the add and multiply functions, simply because they are not declared, and this may result in a compilation failure.

In addition, the main function does not know anything about the definitions of either add or multiply. Therefore, we need to ask an important question: how does the main function find these definitions when it does not even know about the other source files? Note that the file shown in Code Box 2-12 has only included one header file, and therefore it has no relationship with the other two source files.

The answer to this question lies in the linker. The linker gathers the required definitions from various object files and puts them together, and this way, the code written in the main function can finally use the code written in other source files.

Note:

To compile a source file that uses a function, the declaration is enough. However, to actually run your program, the definition should be provided to the linker in order to be put into the final executable file.
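The following sketch illustrates this note within a single file. The names multiply and square_plus_one are illustrative. The prototype alone is enough for the compiler to check and emit the call, and the definition at the end is there only so that the sketch also links on its own.

```c
/* Declaration only: enough for the compiler to type-check the call below. */
int multiply(int a, int b);

/* Compiles using just the declaration. If the definition below were moved
   to another source file, this file would still compile, and nm would list
   multiply as U (undefined) in its object file. */
int square_plus_one(int x) {
    return multiply(x, x) + 1;
}

/* The definition that the linker needs in order to build the final executable. */
int multiply(int a, int b) {
    return a * b;
}
```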

Now, it's time to compile example 2.4 and demonstrate what we've said so far. Using the following commands, we create the corresponding relocatable object files. Remember that we only compile the source files, not the header file:

$ gcc -c ExtremeC_examples_chapter2_4_add.c -o add.o
$ gcc -c ExtremeC_examples_chapter2_4_multiply.c -o multiply.o
$ gcc -c ExtremeC_examples_chapter2_4_main.c -o main.o
$

Shell Box 2-16: Compiling all sources in example 2.4 to their corresponding relocatable object files

For the next step, we are going to look at the symbol table contained in each relocatable object file:

$ nm add.o
0000000000000000 T add
$

Shell Box 2-17: Listing the symbols defined in add.o

As you see, the add symbol has been defined. The next object file:

$ nm multiply.o
0000000000000000 T multiply
$

Shell Box 2-18: Listing the symbols defined in multiply.o

The same happens to the multiply symbol within multiply.o. And the final object file:

$ nm main.o
                 U add
                 U _GLOBAL_OFFSET_TABLE_
0000000000000000 T main
                 U multiply
$

Shell Box 2-19: Listing the symbols defined in main.o

Despite the fact that the third source file, Code Box 2-12, contains only the main function, we see two symbols for add and multiply in its corresponding object file. However, unlike the main symbol, which has an address inside the object file, they are marked as U, meaning undefined (unresolved). While the compiler has seen these symbols in the translation unit, it has not been able to find their actual definitions, and this is exactly what we expected and explained before. The _GLOBAL_OFFSET_TABLE_ symbol is also marked U, but it refers to a table that the linker itself creates, so we do not need to provide it.

The source file containing the main function, Code Box 2-12, does not need to know anything about the definitions of other functions that are not defined in the same translation unit. However, the fact that main depends on add and multiply must still be recorded in the corresponding relocatable object file, and these undefined symbols are exactly that record.

To summarize where we are now, we have three intermediate object files, and one of them has two unresolved function symbols. This makes the job of the linker clear: it must find the definitions of these symbols in the other object files that we give it. After finding all of the required symbols, the linker can combine them to create a final executable binary that works.

If the linker is not able to find the definition of an unresolved symbol, it will fail, and inform us by printing a linkage error.
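The resolution check described above can be modeled in a few lines. The sketch below is a deliberately simplified model, not how ld is actually implemented. Each hypothetical toy_object lists the symbols it defines and the ones it leaves undefined, and toy_first_unresolved reports the first reference that no object in the list can satisfy.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a relocatable object file: the symbols it
   defines (nm "T") and the ones it references without defining (nm "U").
   Both lists are NULL-terminated, so each holds at most three names. */
struct toy_object {
    const char* name;
    const char* defined[4];
    const char* undefined[4];
};

/* Returns 1 if any of the n objects defines sym, 0 otherwise. */
static int toy_is_defined(const struct toy_object* objs, size_t n,
                          const char* sym) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; objs[i].defined[j] != NULL; j++) {
            if (strcmp(objs[i].defined[j], sym) == 0) return 1;
        }
    }
    return 0;
}

/* Returns the first undefined reference that no object can satisfy, which
   is where a real linker would stop with an "undefined reference" error,
   or NULL when every reference resolves. */
const char* toy_first_unresolved(const struct toy_object* objs, size_t n) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; objs[i].undefined[j] != NULL; j++) {
            if (!toy_is_defined(objs, n, objs[i].undefined[j])) {
                return objs[i].undefined[j];
            }
        }
    }
    return NULL;
}
```

The objects can be modeled on Shell Boxes 2-17 to 2-19, plus a stand-in for the startup object that defines _start and references main. The real startup object also references C library symbols, which are omitted here. With those objects, a list of the startup stand-in, add.o, and main.o reports multiply as unresolved, while the startup stand-in with add.o and multiply.o leaves main unresolved.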

For the next step, we want to link the preceding object files together. The following command will do that:

$ gcc add.o multiply.o main.o
$

Shell Box 2-20: Linking all object files together

We should note here that running gcc with a list of object files and no other options makes it perform only the linking step, creating an executable object file out of the input object files. Since we did not use the -o option, the output is named a.out. Behind the scenes, gcc invokes the linker with the given object files, together with some other static libraries and object files that are required on the platform.

To examine what happens if the linker fails to find proper definitions, we are going to provide the linker with only two intermediate object files, main.o and add.o:

$ gcc add.o main.o
main.o: In function 'main':
ExtremeC_examples_chapter2_4_main.c:(.text+0x2c): undefined reference to 'multiply'
collect2: error: ld returned 1 exit status
$

Shell Box 2-21: Linking only two of the object files: add.o and main.o

As you can see, the linker has failed because it could not find the multiply symbol in the provided object files.

Moving on, let's provide the other two object files, main.o and multiply.o:

$ gcc main.o multiply.o
main.o: In function 'main':
ExtremeC_examples_chapter2_4_main.c:(.text+0x1a): undefined reference to 'add'
collect2: error: ld returned 1 exit status
$

Shell Box 2-22: Linking only two of the object files, multiply.o and main.o

As expected, the same thing occurred, this time because the add symbol could not be found in the provided object files.

Finally, let's provide the only remaining combination of two object files, add.o and multiply.o. Before running the command, we might expect it to work, since neither object file has unresolved symbols in its symbol table. Let's see what happens:

$ gcc add.o multiply.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o: In function '_start':
(.text+0x20): undefined reference to 'main'
collect2: error: ld returned 1 exit status
$

Shell Box 2-23: Linking only two of the object files, add.o and multiply.o

As you see, the linker has failed again! Looking at the output, we can see the reason: none of the object files contains the main symbol, which the startup code refers to. Every C program must define a main function, because main is the entry point of a C program according to the C standard, even though, as explained earlier, it is not the entry point of the executable object file.

At this point, and I cannot emphasize this enough, pay attention to the place where the reference to the main symbol has been made. It has been made in the _start function in a file located at /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o.

The Scrt1.o file is a relocatable object file that we did not create. It belongs to a group of default C startup object files, which on Linux are typically provided by the C library (glibc) rather than by our project. gcc links these files into every program in order to make it runnable. The _start function in Scrt1.o is the real entry point of the executable, and it eventually calls main.

As you have just seen, a lot happens around your own source code during linking, and a missing symbol anywhere can make the linking step fail. Moreover, a number of other object files need to be linked to your program in order to make it executable.

Linker can be fooled!

To make our current discussion even more interesting, there are rare scenarios in which the linking step succeeds, but the final binary does not work as expected. In this section, we are going to look at an example of this.

In example 2.5, the linker picks up an incorrect definition and puts it into the final executable object file.

This example has two source files, one of which contains the definition of a function with the same name, but a different signature from the declaration used by the main function. The following code boxes are the contents of these two source files. Here's the first source file:

int add(int a, int b, int c, int d) {
  return a + b + c + d;
}

Code Box 2-13 [ExtremeC_examples_chapter2_5_add.c]: Definition of the add function in example 2.5

And, following is the second source file:

#include <stdio.h>
int add(int, int);
int main(int argc, char** argv) {
  int x = add(5, 6);
  printf("Result: %d\n", x);
  return 0;
}

Code Box 2-14 [ExtremeC_examples_chapter2_5_main.c]: The main function in example 2.5

As you can see, the main function is using another version of the add function with a different signature, accepting two integers, but the add function defined in the first source file, Code Box 2-13, is accepting four integers.

In C++, functions like these would be called overloads of each other, but C does not support function overloading. We should expect something to go wrong if we compile and link these source files, so it's interesting to see whether we can build the example successfully.

The next step is to compile and link the relocatable object files, which we can do by running the following code:

$ gcc -c ExtremeC_examples_chapter2_5_add.c -o add.o
$ gcc -c ExtremeC_examples_chapter2_5_main.c -o main.o
$ gcc add.o main.o -o ex2_5.out
$

Shell Box 2-24: Building example 2.5

As you can see in the shell output, the linking step went well, and the final executable has been produced! This clearly shows that the symbols can fool the linker. Now let's look at the output after running the executable:

$ ./ex2_5.out
Result: -1885535197
$ ./ex2_5.out
Result: 1679625283
$

Shell Box 2-25: Running example 2.5 twice and the strange results!

As you can see, the output is wrong; it even changes between runs! This example shows that bad things can happen when the linker picks up the wrong version of a symbol. Function symbols are just names, and they don't carry any information about the signature of the corresponding function. Function parameters are a C-level concept. At the level of assembly code and machine-level instructions, they are just values that the caller places in registers or on the stack according to a calling convention.
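Because the symbol does not carry the signature, C relies on the compiler rather than the linker to catch such mismatches, and only within a translation unit. A common safeguard, which example 2.5 does not use, is to put the prototype in a header that both the defining file and the calling files include. The sketch below inlines such a prototype to show the effect. sum_of_four is a hypothetical caller.

```c
/* In a real project, this prototype would live in a shared header that is
   included by BOTH the file defining add and every file calling it. */
int add(int a, int b, int c, int d);

/* With the prototype visible, the compiler checks this definition against it:
   defining add(int a, int b) here instead would be a compile-time error
   rather than a silently linked mismatch. */
int add(int a, int b, int c, int d) {
    return a + b + c + d;
}

/* A caller that sees the same prototype cannot pass only two arguments;
   a call such as add(5, 6) would be rejected by the compiler. */
int sum_of_four(int w, int x, int y, int z) {
    return add(w, x, y, z);
}
```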

To investigate further, we are going to look at the disassembly of the add functions in a different example. In example 2.6, we have two add functions with the same signatures as in example 2.5.

Example 2.6 consists of the following two source files:

int add(int a, int b, int c, int d) {
  return a + b + c + d;
}

Code Box 2-15 [ExtremeC_examples_chapter2_6_add_1.c]: The first definition of add in example 2.6

The following code is the other source file:

int add(int a, int b) {
  return a + b;
}

Code Box 2-16 [ExtremeC_examples_chapter2_6_add_2.c]: The second definition of add in example 2.6

The first step, just like before, is to compile both source files:

$ gcc -c ExtremeC_examples_chapter2_6_add_1.c -o add_1.o
$ gcc -c ExtremeC_examples_chapter2_6_add_2.c -o add_2.o
$

Shell Box 2-26: Compiling the source files in example 2.6 to their corresponding object files

We then need to have a look at the disassembly of the add symbol in different object files. Therefore, we start with the add_1.o object file:

$ objdump -d add_1.o
add_1.o:     file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <add>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   89 7d fc                mov    %edi,-0x4(%rbp)
   7:   89 75 f8                mov    %esi,-0x8(%rbp)
   a:   89 55 f4                mov    %edx,-0xc(%rbp)
   d:   89 4d f0                mov    %ecx,-0x10(%rbp)
  10:   8b 55 fc                mov    -0x4(%rbp),%edx
  13:   8b 45 f8                mov    -0x8(%rbp),%eax
  16:   01 c2                   add    %eax,%edx
  18:   8b 45 f4                mov    -0xc(%rbp),%eax
  1b:   01 c2                   add    %eax,%edx
  1d:   8b 45 f0                mov    -0x10(%rbp),%eax
  20:   01 d0                   add    %edx,%eax
  22:   5d                      pop    %rbp
  23:   c3                      retq
$

Shell Box 2-27: Using objdump to look at the disassembly of the add symbol in add_1.o

The following shell box shows us the disassembly of the add symbol found in the other object file, add_2.o:

$ objdump -d add_2.o
add_2.o:     file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <add>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   89 7d fc                mov    %edi,-0x4(%rbp)
   7:   89 75 f8                mov    %esi,-0x8(%rbp)
   a:   8b 55 fc                mov    -0x4(%rbp),%edx
   d:   8b 45 f8                mov    -0x8(%rbp),%eax
  10:   01 d0                   add    %edx,%eax
  12:   5d                      pop    %rbp
  13:   c3                      retq
$

Shell Box 2-28: Using objdump to look at the disassembly of the add symbol in add_2.o

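When a function call takes place, a new stack frame is created on top of the stack, and the return address is stored there. How the arguments are passed depends on the platform's calling convention. On 32-bit x86, they are typically pushed onto the stack. On x86-64 Linux, however, the System V AMD64 calling convention passes the first six integer arguments in registers (%rdi, %rsi, %rdx, %rcx, %r8, and %r9, or their 32-bit halves such as %edi and %esi for int values), and only the remaining ones on the stack. You will read more about the function call mechanism in Chapter 4, Process Memory Structure, and Chapter 5, Stack and Heap.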

In shell boxes 2-27 and 2-28, you can clearly see how each add function picks up its arguments from these registers. In the disassembly of add_1.o, Shell Box 2-27, you can see the following lines:

4:  89 7d fc                mov    %edi,-0x4(%rbp)
7:  89 75 f8                mov    %esi,-0x8(%rbp)
a:  89 55 f4                mov    %edx,-0xc(%rbp)
d:  89 4d f0                mov    %ecx,-0x10(%rbp)

Code Box 2-17: The assembly instructions that copy the arguments from the registers into the stack frame for the first add function

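These instructions copy four values from the argument registers %edi, %esi, %edx, and %ecx into the current stack frame, at negative offsets from the address held in the %rbp register. Note that objdump uses the AT&T syntax, in which the source operand comes first and the destination second.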

Note:

Registers are locations within a CPU that can be accessed quickly. Therefore, it is highly efficient for the CPU to bring values from main memory into its registers first, and then perform calculations on them. The register %rbp points to the base of the current stack frame. When compiled without optimization, these functions keep local copies of their arguments in that frame.

If you look at the disassembly of the second object file, it is very similar, but it contains only two of these copy operations:

4:  89 7d fc                mov    %edi,-0x4(%rbp)
7:  89 75 f8                mov    %esi,-0x8(%rbp)

Code Box 2-18: The assembly instructions that copy the arguments from the registers into the stack frame for the second add function

These instructions copy only two values because the function expects only two arguments. This explains the strange values in the output of example 2.5. The main function only sets two argument registers, %edi and %esi, before calling add, but the definition that was linked in expects four arguments, so it also reads %edx and %ecx. Those registers still hold whatever values earlier code left in them. That is why the sum is wrong and can change from one run to the next.
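The register-based convention has a limit. On x86-64 Linux (the System V AMD64 ABI), only the first six integer arguments travel in registers (%edi, %esi, %edx, %ecx, %r8d, and %r9d for 32-bit values); any further ones are passed on the stack. The sketch below uses hypothetical names. Compiling it with gcc -S -O0 lets you inspect the generated assembly and see this split in the caller.

```c
/* x86-64 System V: a..f are passed in registers, g and h on the stack. */
int sum8(int a, int b, int c, int d, int e, int f, int g, int h) {
    return a + b + c + d + e + f + g + h;
}

/* The caller loads six argument registers and places the last two values
   on the stack before the call instruction. */
int call_sum8(void) {
    return sum8(1, 2, 3, 4, 5, 6, 7, 8);
}
```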

We could prevent this by making the function symbol names depend on the parameter types. This is usually referred to as name mangling, and it is mostly used in C++ because of its function overloading feature. We discuss it briefly in the last section of the chapter.
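C itself has no name mangling, but the idea can be applied by hand. You give each variant a distinct name that encodes its parameter types, so each symbol name stands for exactly one parameter list. Asking for a variant that was never defined then fails at link time instead of silently binding to another definition. The names below are hypothetical. The add2 macro uses C11's _Generic to choose a variant from the type of its first argument, which offers a limited, compile-time form of overloading.

```c
/* Hand-encoded names: the suffix spells out the parameter types, loosely
   echoing how C++ compilers mangle names (one letter per parameter). */
int add_ii(int a, int b) { return a + b; }
int add_iiii(int a, int b, int c, int d) { return a + b + c + d; }
double add_dd(double a, double b) { return a + b; }

/* C11 _Generic selects a function based on the type of (a); every other
   type falls back to the int version. */
#define add2(a, b) _Generic((a), double: add_dd, default: add_ii)((a), (b))
```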

C++ name mangling

To highlight how name mangling works in C++, we are going to compile example 2.6 using a C++ compiler, namely the GNU C++ compiler, g++.

Once we have done that, we can use readelf to dump the symbol tables for each generated object file. By doing this, we can see how C++ has changed the name of the function symbols based on the types of input parameters.

As we have noted before, the compilation pipelines of C and C++ are very similar. Therefore, we can expect to have relocatable object files as a result of C++ compilation. Let's look at both of the object files produced as part of compiling example 2.6:

$ g++ -c ExtremeC_examples_chapter2_6_add_1.c
$ g++ -c ExtremeC_examples_chapter2_6_add_2.c
$ readelf -s ExtremeC_examples_chapter2_6_add_1.o
Symbol table '.symtab' contains 9 entries:
  Num:    Value          Size Type    Bind   Vis      Ndx Name
   0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
   1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS ExtremeC_examples_chapter
   2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
   3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
   4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
   5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
   6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
   7: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
   8: 0000000000000000    36 FUNC    GLOBAL DEFAULT    1 _Z3addiiii
$ readelf -s ExtremeC_examples_chapter2_6_add_2.o
Symbol table '.symtab' contains 9 entries:
  Num:    Value          Size Type    Bind   Vis      Ndx Name
   0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
   1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS ExtremeC_examples_chapter
   2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
   3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
   4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
   5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
   6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
   7: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
   8: 0000000000000000    20 FUNC    GLOBAL DEFAULT    1 _Z3addii
$

Shell Box 2-29: Using readelf to see the symbol tables of the object files produced by a C++ compiler

As you can see in the output, we have two different symbol names for different overloads of the add function. The overload that accepts four integers has the symbol name _Z3addiiii, and the other overload, which accepts two integers, has the symbol name _Z3addii.

The _Z prefix marks a mangled name, the number 3 is the length of the function name add, and every i that follows refers to one of the integer input parameters.

From that, you can see that the symbol names are different. If an object file referred to one of them while only the other was defined, the linker would not find a definition for the referenced symbol. It would report a linking error instead of silently linking the wrong function, as happened in example 2.5. Name mangling is the technique that enables C++ to support function overloading, and it helps to prevent the problems we encountered in the previous section.
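To make the encoding concrete, the following sketch decodes names in the tiny subset seen above: the _Z prefix, a length-prefixed function name, and single-letter codes for a few built-in parameter types. It is a toy with a hypothetical name, toy_demangle. Real demangling should be left to tools such as c++filt or the abi::__cxa_demangle function, which handle the full Itanium C++ ABI scheme that g++ uses.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Maps a one-letter built-in type code to its C spelling. Only a few of
   the codes from the Itanium C++ ABI are covered; others yield NULL. */
static const char* builtin_type_name(char code) {
    switch (code) {
        case 'c': return "char";
        case 'i': return "int";
        case 'j': return "unsigned int";
        case 'l': return "long";
        case 'm': return "unsigned long";
        case 'f': return "float";
        case 'd': return "double";
        default:  return NULL;
    }
}

/* Decodes names such as "_Z3addii" into "add(int, int)".
   Returns 0 on success, or -1 if the name is outside the supported subset
   or does not fit into out. */
int toy_demangle(const char* mangled, char* out, size_t out_size) {
    if (strncmp(mangled, "_Z", 2) != 0) return -1;
    const char* p = mangled + 2;

    /* The function name is prefixed with its length, e.g. "3add". */
    if (!isdigit((unsigned char)*p)) return -1;
    size_t len = 0;
    while (isdigit((unsigned char)*p)) {
        len = len * 10 + (size_t)(*p - '0');
        p++;
    }
    if (strlen(p) < len) return -1;

    int n = snprintf(out, out_size, "%.*s(", (int)len, p);
    if (n < 0 || (size_t)n >= out_size) return -1;
    size_t used = (size_t)n;
    p += len;

    /* A lone 'v' (void) encodes an empty parameter list. */
    if (strcmp(p, "v") == 0) p++;

    /* Each remaining letter is one parameter type. */
    for (int first = 1; *p != '\0'; p++, first = 0) {
        const char* type = builtin_type_name(*p);
        if (type == NULL) return -1;
        n = snprintf(out + used, out_size - used, "%s%s",
                     first ? "" : ", ", type);
        if (n < 0 || (size_t)n >= out_size - used) return -1;
        used += (size_t)n;
    }

    n = snprintf(out + used, out_size - used, ")");
    if (n < 0 || (size_t)n >= out_size - used) return -1;
    return 0;
}
```

For example, decoding _Z3addiiii with this sketch produces add(int, int, int, int), and _Z3addii produces add(int, int).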
