111. Benchmarking the Vector API
Benchmarking the Vector API can be accomplished via JMH. Let’s consider three Java arrays (x, y, z), each of 50,000,000 integers, and the following computation:
z[i] = x[i] + y[i];
w[i] = x[i] * z[i] * y[i];
k[i] = z[i] + w[i] * y[i];
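As a point of reference, the three statements above can be sketched as a plain scalar loop. This is only a minimal illustration: the class name ScalarComputeSketch and the method compute() are hypothetical, and the tiny input arrays stand in for the 50,000,000-element arrays used in the benchmark.

```java
import java.util.Arrays;

public class ScalarComputeSketch {

    // Hypothetical helper: applies the three statements element by element
    // and returns the final array k.
    static int[] compute(int[] x, int[] y) {
        int[] z = new int[x.length];
        int[] w = new int[x.length];
        int[] k = new int[x.length];
        for (int i = 0; i < x.length; i++) {
            z[i] = x[i] + y[i];
            w[i] = x[i] * z[i] * y[i];
            k[i] = z[i] + w[i] * y[i];
        }
        return k;
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3};
        int[] y = {4, 5, 6};
        // prints [85, 357, 981]
        System.out.println(Arrays.toString(compute(x, y)));
    }
}
```

Each iteration depends only on the elements at index i, which is what makes this computation a good candidate for data-parallel execution.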
So, the final result is stored in a Java array named k. Next, let’s consider the following benchmark containing four different implementations of this computation (using a mask, no mask, unrolled, and plain scalar Java with arrays):
import java.util.concurrent.TimeUnit;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorSpecies;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode({Mode.AverageTime, Mode.Throughput})
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 5, time = 1)
@State(Scope.Benchmark)
@Fork(value = 1, warmups = 0,
    jvmArgsPrepend = {"--add-modules=jdk.incubator.vector"})
public class Main {

  private static final VectorSpecies<Integer> VS
    = IntVector.SPECIES_PREFERRED;
  ...

  @Benchmark
  public void computeWithMask(Blackhole blackhole) {…}

  @Benchmark
  public void computeNoMask(Blackhole blackhole) {…}

  @Benchmark
  public void computeUnrolled(Blackhole blackhole) {…}

  @Benchmark
  public void computeArrays(Blackhole blackhole) {…}
}
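The method bodies are elided in the listing above. To give an idea of what the masked and unmasked variants might look like, here is a minimal sketch. It is not the benchmark's actual code: the class name VectorComputeSketch and the method signatures (arrays passed as parameters instead of benchmark state and a Blackhole) are assumptions made for illustration. It must be compiled and run with --add-modules=jdk.incubator.vector.

```java
import java.util.Arrays;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

public class VectorComputeSketch {

    private static final VectorSpecies<Integer> VS
        = IntVector.SPECIES_PREFERRED;

    // Hypothetical unmasked variant: full vectors up to loopBound(),
    // then a scalar loop for the remaining tail elements.
    static void computeNoMask(int[] x, int[] y, int[] k) {
        int upperBound = VS.loopBound(x.length);
        int i = 0;
        for (; i < upperBound; i += VS.length()) {
            IntVector xVector = IntVector.fromArray(VS, x, i);
            IntVector yVector = IntVector.fromArray(VS, y, i);
            IntVector zVector = xVector.add(yVector);
            IntVector wVector = xVector.mul(zVector).mul(yVector);
            IntVector kVector = zVector.add(wVector.mul(yVector));
            kVector.intoArray(k, i);
        }
        for (; i < x.length; i++) {
            int z = x[i] + y[i];
            int w = x[i] * z * y[i];
            k[i] = z + w * y[i];
        }
    }

    // Hypothetical masked variant: every iteration uses a mask, so the
    // last (partial) vector only touches lanes inside the array bounds.
    static void computeWithMask(int[] x, int[] y, int[] k) {
        for (int i = 0; i < x.length; i += VS.length()) {
            VectorMask<Integer> mask = VS.indexInRange(i, x.length);
            IntVector xVector = IntVector.fromArray(VS, x, i, mask);
            IntVector yVector = IntVector.fromArray(VS, y, i, mask);
            IntVector zVector = xVector.add(yVector);
            IntVector wVector = xVector.mul(zVector).mul(yVector);
            IntVector kVector = zVector.add(wVector.mul(yVector));
            kVector.intoArray(k, i, mask);
        }
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3};
        int[] y = {4, 5, 6};
        int[] kNoMask = new int[x.length];
        int[] kMask = new int[x.length];
        computeNoMask(x, y, kNoMask);
        computeWithMask(x, y, kMask);
        System.out.println(Arrays.toString(kNoMask));
        System.out.println(Arrays.toString(kMask));
    }
}
```

The unmasked variant pays for a separate scalar tail loop, while the masked variant handles the tail in the vector loop at the cost of building a mask on every iteration. Which one is faster depends on the hardware, which is one reason for benchmarking both.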
Running this benchmark on an Intel(R) Core(TM) i7-3612QM CPU @ 2.10GHz machine running Windows 10 produced the following results:
Figure 5.9: Benchmark results
Overall, on this machine, executing the computation using the Vector API’s data-parallel capabilities gives the best performance: the highest throughput and the lowest average time.