You're reading from C++ High Performance Master the art of optimizing the functioning of your C++ code

Product type Paperback

Published in Dec 2020

Publisher Packt

ISBN-13 9781839216541

Length 544 pages

Edition 2nd Edition

Languages

C++

Concepts

High Performance Programming

Authors (2):

Viktor Sehr

Björn Andrist

Objects in memory

All the objects we use in a C++ program reside in memory. Here, we will explore how objects are created and deleted from memory, and also describe how objects are laid out in memory.

Creating and deleting objects

In this section, we will dig into the details of using new and delete. Consider the following way of using new to create an object on the free store and then deleting it using delete:

auto* user = new User{"John"};  // allocate and construct 
user->print_name();             // use object 
delete user;                    // destruct and deallocate

I don't recommend that you call new and delete explicitly in this manner, but let's ignore that for now. Let's get to the point; as the comments suggest, new actually does two things, namely:

Allocates memory to hold a new object of the User type
Constructs a new User object in the allocated memory space by calling the constructor of the User class

The same goes for delete, which:

Destructs the User object by calling its destructor
Deallocates/frees the memory that the User object was placed in

It is actually possible to separate these two actions (memory allocation and object construction) in C++. This is rarely used but has some important and legitimate use cases when writing library components.

Placement new

C++ allows us to separate memory allocation from object construction. We could, for example, allocate a byte array with malloc() and construct a new User object in that region of memory. Have a look at the following code snippet:

auto* memory = std::malloc(sizeof(User));
auto* user = ::new (memory) User("john");

The perhaps unfamiliar ::new (memory) syntax is called placement new. It is a non-allocating form of new, which only constructs an object. The double colon (::) in front of new ensures that the lookup starts in the global namespace, which avoids picking up an overloaded version of operator new.

In the preceding example, placement new constructs the User object and places it at the specified memory location. Since we are allocating the memory with std::malloc() for a single object, it is guaranteed to be correctly aligned (unless the class User has been declared to be overaligned). Later on, we will explore cases where we have to take alignment into account when using placement new.

There is no placement delete expression, so in order to destruct the object and free the memory, we need to call the destructor explicitly and then free the memory:

user->~User();
std::free(memory);

This is the only time you should call a destructor explicitly. Never call a destructor like this unless you have created an object with placement new.
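To make the four separate steps concrete, here is a minimal sketch of the complete lifecycle. The User struct below is a hypothetical stand-in for the class used in the text, and the alive counter is an assumption added only to make construction and destruction observable:

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <string>
#include <utility>

// Hypothetical stand-in for the User class used in the text.
struct User {
  explicit User(std::string name) : name_{std::move(name)} { ++alive; }
  ~User() { --alive; }
  std::string name_;
  static inline int alive = 0;  // Number of currently constructed User objects
};

// Runs the four steps separately and returns the number of live objects
// observed right after construction (or -1 if the allocation failed).
int placement_new_demo() {
  void* memory = std::malloc(sizeof(User));  // 1. Allocate raw memory
  if (memory == nullptr) {
    return -1;
  }
  User* user = ::new (memory) User{"john"};  // 2. Construct in place
  int alive_after_construction = User::alive;
  user->~User();                             // 3. Destruct explicitly
  std::free(memory);                         // 4. Deallocate
  return alive_after_construction;
}
```

Calling placement_new_demo() should return 1, and User::alive should be back at 0 afterward, since the explicit destructor call has run before the memory is freed.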

C++17 extends <memory> with utility functions for constructing and destroying objects without allocating or deallocating memory. So, instead of calling placement new, it is possible to use the functions from <memory> whose names begin with std::uninitialized_ (some of which, such as std::uninitialized_fill_n(), predate C++17) for constructing, copying, and moving objects to an uninitialized memory area. And instead of calling the destructor explicitly, we can use std::destroy_at() to destruct an object at a specific memory address without deallocating the memory.

The previous example could be rewritten using these new functions. Here is how it would look:

auto* memory = std::malloc(sizeof(User));
auto* user_ptr = reinterpret_cast<User*>(memory);
std::uninitialized_fill_n(user_ptr, 1, User{"john"});
std::destroy_at(user_ptr);
std::free(memory);

C++20 also introduces std::construct_at(), which makes it possible to replace the std::uninitialized_fill_n() call with:

std::construct_at(user_ptr, User{"john"});        // C++20

Please keep in mind that we are showing these naked low-level memory facilities to get a better understanding of memory management in C++. Using reinterpret_cast and the memory utilities demonstrated here should be kept to an absolute minimum in a C++ code base.
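The same utilities also work for more than one object at a time. The following is a sketch that constructs several objects in one raw buffer and destroys them again with std::destroy_n(). The User struct and the construct_many() function are hypothetical names used for illustration only:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical stand-in for the User class used in the text.
struct User {
  std::string name_;
};

// Constructs `count` copies of a User in raw memory, reads them, and then
// destroys them. Returns the total length of all names, or 0 on failure.
std::size_t construct_many(std::size_t count) {
  void* memory = std::malloc(count * sizeof(User));
  if (memory == nullptr) {
    return 0;
  }
  auto* first = static_cast<User*>(memory);
  std::uninitialized_fill_n(first, count, User{"john"});  // Construct only
  std::size_t total = 0;
  for (std::size_t i = 0; i < count; ++i) {
    total += first[i].name_.size();
  }
  std::destroy_n(first, count);  // Destruct only
  std::free(memory);             // Deallocate
  return total;
}
```

Note that the buffer is released with std::free() only after every object in it has been destroyed. Swapping those two steps would skip the std::string destructors and leak their memory.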

Next, you will see what operators are called when we use the new and delete expressions.

The new and delete operators

The function operator new is responsible for allocating memory when a new expression is evaluated. operator new can be either a globally defined function or a static member function of a class. It is possible to overload the global operators new and delete. Later in this chapter, we will see that this can be useful when analyzing memory usage.

Here is how to do it:

auto operator new(size_t size) -> void* { 
  void* p = std::malloc(size); 
  if (p == nullptr) {
    throw std::bad_alloc{}; // operator new must not return nullptr
  }
  std::cout << "allocated " << size << " byte(s)\n"; 
  return p; 
} 
 
auto operator delete(void* p) noexcept -> void { 
  std::cout << "deleted memory\n"; 
  std::free(p); 
}

We can verify that our overloaded operators are actually being used when creating and deleting a char object:

auto* p = new char{'a'}; // Outputs "allocated 1 byte(s)"
delete p;                // Outputs "deleted memory"

When creating and deleting an array of objects using the new[] and delete[] expressions, there is another pair of operators that are being used, namely operator new[] and operator delete[]. We can overload these operators in the same way:

auto operator new[](size_t size) -> void* {
  void* p = std::malloc(size); 
  if (p == nullptr) {
    throw std::bad_alloc{}; // operator new[] must not return nullptr
  }
  std::cout << "allocated " << size << " byte(s) with new[]\n"; 
  return p; 
} 
 
auto operator delete[](void* p) noexcept -> void { 
  std::cout << "deleted memory with delete[]\n"; 
  std::free(p); 
}

Keep in mind that if you overload operator new, you should also overload operator delete. Functions for allocating and deallocating memory come in pairs. Memory should be deallocated by the allocator that the memory was allocated by. For example, memory allocated with std::malloc() should always be freed using std::free(), while memory allocated with operator new[] should be deallocated using operator delete[].

It is also possible to overload operator new and operator delete for a specific class. This is probably more useful than overloading the global operators, since it is more likely that we need a custom dynamic memory allocator for a specific class.

Here, we are overloading operator new and operator delete for the Document class:

class Document { 
// ...
public:  
  auto operator new(size_t size) -> void* {
    return ::operator new(size);
  } 
  auto operator delete(void* p) -> void {
    ::operator delete(p); 
  } 
};

The class-specific version of new will be used when we create new dynamically allocated Document objects:

auto* p = new Document{}; // Uses class-specific operator new
delete p;

If we instead want to use global new and delete, it is still possible by using the global scope (::):

auto* p = ::new Document{}; // Uses global operator new
::delete p;
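To observe which pair of operators is selected, we could let the class-specific versions count their calls. This is only a sketch: TrackedDocument and its two counters are hypothetical names, not part of the Document class shown earlier:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical class that counts how many times its own allocation
// functions are called.
class TrackedDocument {
public:
  static inline int allocations = 0;
  static inline int deallocations = 0;

  static auto operator new(std::size_t size) -> void* {
    ++allocations;
    return ::operator new(size);  // Delegate to the global operator new
  }
  static auto operator delete(void* p) noexcept -> void {
    ++deallocations;
    ::operator delete(p);  // Delegate to the global operator delete
  }
};

// Creates one object through the class-specific operators and one through
// the global ones; only the first pair is counted.
int count_class_specific_allocations() {
  auto* a = new TrackedDocument{};    // Class-specific operator new
  delete a;                           // Class-specific operator delete
  auto* b = ::new TrackedDocument{};  // Global operator new
  ::delete b;                         // Global operator delete
  return TrackedDocument::allocations;
}
```

After one call to count_class_specific_allocations(), both counters should be 1, because the ::new and ::delete expressions bypass the class-specific functions.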

We will discuss memory allocators later in this chapter and we will then see the overloaded new and delete operators in use.

To summarize what we have seen so far, a new expression involves two things: allocation and construction. operator new allocates memory and you can overload it globally or per class to customize dynamic memory management. Placement new can be used to construct an object in an already allocated memory area.

Another important, but rather low-level, topic that we need to understand in order to use memory efficiently is the alignment of memory.

Memory alignment

The CPU reads memory into its registers one word at a time. The word size is 64 bits on a 64-bit architecture, 32 bits on a 32-bit architecture, and so forth. For the CPU to work efficiently when working with different data types, it has restrictions on the addresses where objects of different types are located. Every type in C++ has an alignment requirement that defines the addresses at which an object of a certain type should be located in memory.

If the alignment of a type is 1, it means that the objects of that type can be located at any byte address. If the alignment of a type is 2, it means that the number of bytes between successive allowed addresses is 2. Or to quote the C++ standard:

"An alignment is an implementation-defined integer value representing the number of bytes between successive addresses at which a given object can be allocated."

We can use alignof to find out the alignment of a type:

// Possible output is 4  
std::cout << alignof(int) << '\n';

When I run this code, it outputs 4, which means that the alignment requirement of the type int is 4 bytes on my platform.

The following figure shows two examples of memory from a system with 64-bit words. The upper row contains three 4-byte integers, which are located at addresses that are 4-byte aligned. The CPU can load these integers into registers efficiently and never needs to read multiple words when accessing one of the int members. Compare this with the second row, which contains two int members located at unaligned addresses. The second int even spans a word boundary. In the best case, this is just inefficient, but on some platforms, the program will crash:

Figure 7.6: Two examples of memory that contain ints at aligned and unaligned memory addresses

Let's say that we have a type with an alignment requirement of 2. The C++ standard doesn't say whether the valid addresses are 1, 3, 5, 7... or 0, 2, 4, 6.... All platforms that we are aware of start counting addresses at 0, so, in practice we could check if an object is correctly aligned by checking if its address is a multiple of the alignment using the modulo operator (%).
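The modulo-based check could be sketched as follows. The function name is_aligned_modulo() and the demo_buffer array are hypothetical, and the code assumes that converting a pointer to std::uintptr_t yields its numeric address, which the C++ standard does not strictly promise:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Modulo-based alignment check. Assumes that converting a pointer to
// std::uintptr_t yields its numeric address, which holds on mainstream
// platforms but is not guaranteed by the C++ standard.
bool is_aligned_modulo(const void* ptr, std::size_t alignment) {
  assert(ptr != nullptr);
  assert(alignment != 0);
  auto address = reinterpret_cast<std::uintptr_t>(ptr);
  return address % alignment == 0;
}

// A buffer with a known alignment of 8 to try the function on.
alignas(8) char demo_buffer[16]{};
```

With this buffer, is_aligned_modulo(demo_buffer, 8) holds, while demo_buffer + 1 is an odd address and therefore fails the check for an alignment of 2.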

However, if we want to write fully portable C++ code, we need to use std::align() and not modulo to check the alignment of an object. std::align() is a function from <memory> that will adjust a pointer according to an alignment that we pass as an argument. If the memory address we pass to it is already aligned, the pointer will not be adjusted. Therefore, we can use std::align() to implement a small utility function called is_aligned(), as follows:

bool is_aligned(void* ptr, std::size_t alignment) {
  assert(ptr != nullptr);
  assert(std::has_single_bit(alignment)); // Power of 2
  auto s = std::numeric_limits<std::size_t>::max();
  auto aligned_ptr = ptr;
  std::align(alignment, 1, aligned_ptr, s);
  return ptr == aligned_ptr;
}

First, we make sure that the ptr argument isn't null and that alignment is a power of 2, which is stated as a requirement in the C++ standard. We are using C++20 std::has_single_bit() from the <bit> header to check this. Next, we call std::align(). The typical use case for std::align() is when we have a memory buffer of some size in which we want to store an object with some alignment requirement. In this case, we don't have a buffer, and we don't care about the size of the objects, so we say that the object is of size 1 and that the buffer size is the maximum value of a std::size_t. Then, we can compare the original ptr and the adjusted aligned_ptr to see if the original pointer was already aligned. We will have use for this utility in the examples to come.
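For comparison, the typical use case mentioned above, placing an object with an alignment requirement inside an existing buffer, might look like the following sketch. The names place_double(), demo_storage, and place_double_demo() are hypothetical and only serve the illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Constructs a double at the first suitably aligned address inside
// [buffer, buffer + size). Returns nullptr if the value does not fit.
double* place_double(void* buffer, std::size_t size, double value) {
  void* ptr = buffer;
  std::size_t space = size;
  // std::align() advances ptr to an aligned address and reduces space by
  // the number of skipped bytes; it returns nullptr if there is no room.
  if (std::align(alignof(double), sizeof(double), ptr, space) == nullptr) {
    return nullptr;
  }
  return ::new (ptr) double{value};
}

// Demo buffer; we start one byte past its aligned beginning so that
// std::align() has to adjust the pointer.
alignas(double) unsigned char demo_storage[32];

bool place_double_demo() {
  double* d = place_double(demo_storage + 1, sizeof(demo_storage) - 1, 3.5);
  return d != nullptr && *d == 3.5 &&
         reinterpret_cast<std::uintptr_t>(d) % alignof(double) == 0;
}
```

If the remaining space is too small once the pointer has been adjusted, for example when only 4 bytes are offered, std::align() returns nullptr and nothing is constructed.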

When allocating memory with new or std::malloc(), the memory we get back should be correctly aligned for the type we specify. The following code shows that the memory allocated for int is at least 4 bytes aligned on my platform:

auto* p = new int{};
assert(is_aligned(p, 4ul)); // True

In fact, new and malloc() are guaranteed to always return memory suitably aligned for any scalar type (if they manage to return memory at all). The <cstddef> header provides us with a type called std::max_align_t, whose alignment requirement is at least as strict as that of all the scalar types. Later on, we will see that this type is useful when writing custom memory allocators. So, even if we only request memory for char on the free store, it will be aligned suitably for std::max_align_t.

The following code shows that the memory returned from new is correctly aligned for std::max_align_t and also for any scalar type:

auto* p = new char{}; 
auto max_alignment = alignof(std::max_align_t);
assert(is_aligned(p, max_alignment)); // True

Let's allocate char two times in a row with new:

auto* p1 = new char{'a'};
auto* p2 = new char{'b'};

Then, the memory may look something like this:

Figure 7.7: Memory layout after two separate allocations of one char each

The space between p1 and p2 depends on the alignment requirements of std::max_align_t. On my system, it was 16 bytes and, therefore, there are 15 unused bytes between the two char instances, even though the alignment of a char is only 1.

It is possible to specify custom alignment requirements that are stricter than the default alignment when declaring a variable using the alignas specifier. Let's say we have a cache line size of 64 bytes and that we, for some reason, want to ensure that two variables are placed on separate cache lines. We could do the following:

alignas(64) int x{};
alignas(64) int y{};
// x and y will be placed on different cache lines

It's also possible to specify a custom alignment when defining a type. The following is a struct that will occupy exactly one cache line when being used:

struct alignas(64) CacheLine {
    std::byte data[64];
};

Now, if we were to create a stack variable of the type CacheLine, it would be aligned according to the custom alignment of 64 bytes:

int main() {
  auto x = CacheLine{};
  auto y = CacheLine{};
  assert(is_aligned(&x, 64));
  assert(is_aligned(&y, 64));
  // ...
}
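A consequence worth spelling out is that alignas on a type also affects its size, because sizeof is always a multiple of the alignment. The sketch below uses a hypothetical PaddedCounter type and assumes a 64-byte cache line, as in the examples above:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counter padded to a full (assumed) 64-byte cache line.
struct alignas(64) PaddedCounter {
  int value{};
};

static_assert(alignof(PaddedCounter) == 64);
static_assert(sizeof(PaddedCounter) == 64);  // Size is rounded up to alignment

// Adjacent array elements therefore start exactly 64 bytes apart.
std::ptrdiff_t element_distance() {
  static PaddedCounter counters[2];
  auto* first = reinterpret_cast<const char*>(&counters[0]);
  auto* second = reinterpret_cast<const char*>(&counters[1]);
  return second - first;
}
```

Even though PaddedCounter only holds a single int, each element of an array of PaddedCounter occupies its own 64-byte slot.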

The stricter alignment requirements are also satisfied when allocating objects on the heap. In order to support dynamic allocation of types with non-default alignment requirements, C++17 introduced new overloads of operator new() and operator delete() which accept an alignment argument of type std::align_val_t. There is also an aligned_alloc() function defined in <cstdlib> which can be used to manually allocate aligned heap memory.

The following is an example in which we allocate a block of heap memory that should occupy exactly one memory page. In this case, the alignment-aware versions of operator new() and operator delete() will be invoked when using new and delete:

constexpr auto ps = std::size_t{4096};      // Page size
struct alignas(ps) Page {
    std::byte data_[ps];
};
auto* page = new Page{};                    // Memory page
assert(is_aligned(page, ps));               // True
// Use page ...
delete page;
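The alignment-aware allocation functions can also be called directly, without going through a new expression. The following is a sketch under the assumption that the requested alignment (128 bytes here) is supported by the implementation; allocate_aligned(), deallocate_aligned(), and aligned_new_demo() are hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Allocates `size` bytes aligned to `alignment` (a power of 2) by calling
// the C++17 alignment-aware operator new directly.
void* allocate_aligned(std::size_t size, std::size_t alignment) {
  return ::operator new(size, std::align_val_t{alignment});
}

// Memory from the aligned overload must go back through the matching
// aligned operator delete.
void deallocate_aligned(void* p, std::size_t alignment) noexcept {
  ::operator delete(p, std::align_val_t{alignment});
}

// Returns true if a 256-byte block requested with 128-byte alignment
// really starts at a multiple of 128.
bool aligned_new_demo() {
  constexpr auto alignment = std::size_t{128};
  void* p = allocate_aligned(256, alignment);
  bool ok = reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
  deallocate_aligned(p, alignment);
  return ok;
}
```

As with the other allocation functions, the pairing matters: memory obtained with an std::align_val_t argument should be released with the overload of operator delete that takes the same alignment.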

Memory pages are not part of the C++ abstract machine, so there is no portable way to programmatically get hold of the page size of the currently running system. However, you could use boost::interprocess::mapped_region::get_page_size() or a platform-specific system call, such as getpagesize() on Unix systems.

A final caveat to be aware of is that the supported set of alignments is defined by the implementation of the standard library you are using, and not by the C++ standard.

Padding

The compiler sometimes needs to add extra bytes, padding, to our user-defined types. When we define data members in a class or struct, the compiler is forced to place the members in the same order as we define them.

However, the compiler also has to ensure that the data members inside the class have the correct alignment; hence, it needs to add padding between data members if necessary. For example, let's assume we have a class defined as follows:

class Document { 
  bool is_cached_{}; 
  double rank_{}; 
  int id_{}; 
};
std::cout << sizeof(Document) << '\n'; // Possible output is 24

The reason for the possible output being 24 is that the compiler inserts padding after bool and int, to fulfill the alignment requirements of the individual data members and the entire class. The compiler converts the Document class into something like this:

class Document {
  bool is_cached_{};
  std::byte padding1[7]; // Invisible padding inserted by compiler
  double rank_{};
  int id_{};
  std::byte padding2[4]; // Invisible padding inserted by compiler
};

The first padding between bool and double is 7 bytes, since the rank_ data member of the double type has an alignment of 8 bytes. The second padding that is added after int is 4 bytes. This is needed in order to fulfill the alignment requirements of the Document class itself. The member with the largest alignment requirement also determines the alignment requirement for the entire data structure. In our example, this means that the total size of the Document class must be a multiple of 8, since it contains a double value that is 8-byte aligned.
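We can make the invisible padding visible with offsetof. The sketch below uses a hypothetical DocumentLayout struct with the same members as Document, declared with public members so that offsetof can be applied. The exact numbers are platform dependent, so the static_asserts only state the rules that the layout has to follow:

```cpp
#include <cassert>
#include <cstddef>

// Same members as the Document class in the text, but declared as a struct
// with public members so that offsetof can be applied.
struct DocumentLayout {
  bool is_cached_{};
  double rank_{};
  int id_{};
};

// Bytes not occupied by any member, that is, padding.
constexpr std::size_t padding_bytes =
    sizeof(DocumentLayout) - (sizeof(bool) + sizeof(double) + sizeof(int));

// Every member starts at a multiple of its own alignment...
static_assert(offsetof(DocumentLayout, rank_) % alignof(double) == 0);
static_assert(offsetof(DocumentLayout, id_) % alignof(int) == 0);
// ...and the total size is a multiple of the alignment of the whole type.
static_assert(sizeof(DocumentLayout) % alignof(DocumentLayout) == 0);
```

On a platform where sizeof(DocumentLayout) is 24, we would expect rank_ at offset 8, id_ at offset 16, and padding_bytes to be 11, which matches the 7 + 4 bytes of padding described above.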

We now realize that we can rearrange the order of the data members in the Document class in a way that minimizes the padding inserted by the compiler, by starting with types with the biggest alignment requirements. Let's create a new version of the Document class:

// Version 2 of Document class
class Document {
  double rank_{}; // Rearranged data members
  int id_{};
  bool is_cached_{};
};

With the rearrangement of the members, the compiler now only needs to pad after the is_cached_ data member to adjust for the alignment of Document. This is how the class will look after padding:

// Version 2 of Document class after padding
class Document { 
  double rank_{}; 
  int id_{}; 
  bool is_cached_{}; 
  std::byte padding[3]; // Invisible padding inserted by compiler 
};

The size of the new Document class is now only 16 bytes, compared to the first version, which was 24 bytes. The insight here should be that the size of an object can change just by changing the order in which its members are declared. We can also verify this by using the sizeof operator again on our updated version of Document:

std::cout << sizeof(Document) << '\n'; // Possible output is 16

The following image shows the memory layout of version 1 and version 2 of the Document class:

Figure 7.8: Memory layouts of the two versions of the Document class. The size of an object can change just by changing the order in which its members are declared.

As a general rule, you can place the biggest data members at the beginning and the smallest members at the end. In this way, you can minimize the memory overhead caused by padding. Later on, we will see that we need to think about alignment when placing objects in memory regions that we have allocated, before we know the alignment of the objects that we are creating.
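One way to keep such a layout from silently regressing is to state the expectation in a static_assert. The following sketch places the two member orders side by side under the hypothetical names DocumentV1 and DocumentV2; padding_of() is an illustrative helper that only works for types with exactly these three members:

```cpp
#include <cassert>
#include <cstddef>

// The two member orders from the text, renamed so both can coexist.
struct DocumentV1 {
  bool is_cached_{};
  double rank_{};
  int id_{};
};

struct DocumentV2 {
  double rank_{};
  int id_{};
  bool is_cached_{};
};

// Padding is whatever remains of sizeof after subtracting the members.
template <typename T>
constexpr std::size_t padding_of() {
  return sizeof(T) - (sizeof(bool) + sizeof(double) + sizeof(int));
}

// Guard against someone reordering the members and growing the type again.
static_assert(sizeof(DocumentV2) <= sizeof(DocumentV1));
```

If a later change to DocumentV2 introduces more padding than the original order had, the static_assert turns that into a compile-time error instead of a silent increase in memory use.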

From a performance perspective, there can also be cases where you want to align objects to cache lines to minimize the number of cache lines an object spans. While we are on the subject of cache friendliness, it should also be mentioned that it can be beneficial to place multiple data members that are frequently used together next to each other.

Keeping your data structures compact is important for performance. Many applications are bound by memory access time. Another important aspect of memory management is to never leak or waste memory for objects that are no longer needed. We can effectively avoid all sorts of resource leaks by being clear and explicit about the ownership of resources. This is the topic of the following section.