Understanding the cost of memory synchronization
The last section was all about running multiple threads on the same machine without any interaction between the threads. If you can split the work your program does between threads in a way that makes such an implementation possible, by all means, do it. You cannot beat the performance of such an embarrassingly parallel program.
More often than not, threads must interact because they contribute work to a common result. These interactions take place through the one resource the threads share: the memory. We must now understand the performance implications of this communication.
Let us start with a trivial example. Say we want to compute the sum of many values. There are so many numbers to add that we want to split the work between several threads, but, in the end, there is only one result. Because there is only one result value, the threads have...
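To make the setup concrete, here is a minimal sketch of such a shared-result sum. It assumes, as one possible approach (not necessarily the one developed below), that every thread adds its values directly into a single std::atomic accumulator; the function name parallel_sum and its parameters are illustrative only.

#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Sketch: each thread sums a disjoint chunk of the input, but all threads
// accumulate into the same shared variable (assumes num_threads >= 1).
unsigned long parallel_sum(const std::vector<unsigned long>& data,
                           std::size_t num_threads) {
    std::atomic<unsigned long> result{0};    // the single shared result
    std::vector<std::thread> threads;
    const std::size_t chunk = data.size() / num_threads;
    for (std::size_t t = 0; t != num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end =
            (t + 1 == num_threads) ? data.size() : begin + chunk;
        threads.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i != end; ++i)
                result += data[i];           // every thread writes to the same location
        });
    }
    for (std::thread& th : threads) th.join();
    return result.load();
}

The point of the sketch is the last line of the loop body: every addition from every thread lands in the same memory location, and that shared access is exactly where the cost of synchronization comes from.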