Probing performance with micro-benchmarks
The outcome of the previous section may leave you somewhat daunted: the processor is very complex and, apparently, needs a lot of hand-holding on the part of the programmer to operate at peak efficiency. Let us start small and see how fast a processor can do some basic operations. To that end, we will use the same Google Benchmark tool we used in the previous chapter. Here is a benchmark for the simple addition of two arrays:
01_superscalar.C
#include "benchmark/benchmark.h"

void BM_add(benchmark::State& state) {
    srand(1);
    const unsigned int N = state.range(0);
    std::vector<unsigned long> v1(N), v2(N);
    for (size_t i = 0; i < N; ++i) {
        v1[i] = rand();
        v2[i] = rand(...
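To make clearer what a micro-benchmark of this kind measures, here is a minimal sketch that times the addition of two arrays using only the standard library and std::chrono. It is not the Google Benchmark listing above and does not reproduce its elided part. The helper names make_random_vector, add_arrays, and time_add_ns_per_element are hypothetical, and accumulating the element-wise sums into a single value is an assumption made so that the result can be kept alive.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: fill a vector of n elements with values from rand(),
// in the same spirit as the initialization loop in the listing.
std::vector<unsigned long> make_random_vector(std::size_t n) {
    std::vector<unsigned long> v(n);
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = static_cast<unsigned long>(std::rand());
    }
    return v;
}

// The operation under test: add the two arrays element by element and
// accumulate the sums (assumption: the accumulation is only there to
// produce a value the caller can consume).
unsigned long add_arrays(const std::vector<unsigned long>& v1,
                         const std::vector<unsigned long>& v2) {
    assert(v1.size() == v2.size());
    unsigned long sum = 0;
    for (std::size_t i = 0; i < v1.size(); ++i) {
        sum += v1[i] + v2[i];
    }
    return sum;
}

// Hypothetical timing harness: run add_arrays `repetitions` times and
// return the average wall-clock time per element, in nanoseconds.
double time_add_ns_per_element(const std::vector<unsigned long>& v1,
                               const std::vector<unsigned long>& v2,
                               int repetitions) {
    assert(!v1.empty() && repetitions > 0);
    // Writing to a volatile keeps the result from being discarded outright.
    // It does not guarantee that the compiler recomputes the sum on every
    // repetition, so treat the numbers as a rough indication only.
    volatile unsigned long sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repetitions; ++r) {
        sink = add_arrays(v1, v2);
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() /
           (static_cast<double>(repetitions) * static_cast<double>(v1.size()));
}
```

A caller would seed the generator with std::srand(1), build two vectors with make_random_vector(N), and call time_add_ns_per_element(v1, v2, repetitions). A hand-rolled timer like this is easy to get wrong: the optimizer may remove or hoist the measured work, and a single run is noisy. Google Benchmark addresses this with facilities such as benchmark::DoNotOptimize and benchmark::ClobberMemory and by choosing the number of iterations itself, which is one reason to prefer it over this sketch.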