The latency numbers that every programmer should know
Hardware and software have progressed over the years, and the latencies of various operations help put that progress in perspective. The latency numbers for the year 2015 are reproduced here with the permission of Aurojit Panda and Colin Scott of the University of California, Berkeley (http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html). The latency numbers that every programmer should know are shown in the following table:
| Operation | Time taken as of 2015 |
| --- | --- |
| L1 cache reference | 1 ns (nanosecond) |
| Branch mispredict | 3 ns |
| L2 cache reference | 4 ns |
| Mutex lock/unlock | 17 ns |
| Send 2,000 bytes over a commodity network | 200 ns (0.2 μs) |
| Compress 1 KB with Zippy (Zippy/Snappy: http://code.google.com/p/snappy/) | 2 μs (1,000 ns = 1 μs: microsecond) |
| SSD random read | 16 μs |
| Read 1,000,000 bytes sequentially from SSD | 200 μs |
| Round trip within the same datacenter | 500 μs |
| Read 1,000,000 bytes sequentially from disk | 2 ms (1,000 μs = 1 ms: millisecond) |
| Disk seek | 4 ms |
| Packet round trip from California to the Netherlands | 150 ms |
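If you want to sanity-check one of these figures on your own hardware, the mutex entry is easy to reproduce with Go's standard `testing` package. The following is a minimal benchmark sketch (the file and package names are illustrative); it measures an uncontended lock/unlock pair, which is roughly what the 17 ns entry corresponds to:

```go
package latency

import (
	"sync"
	"testing"
)

// BenchmarkMutexLockUnlock times one uncontended Lock/Unlock pair
// per iteration; compare the reported ns/op with the table's ~17 ns.
func BenchmarkMutexLockUnlock(b *testing.B) {
	var mu sync.Mutex
	for i := 0; i < b.N; i++ {
		mu.Lock()
		mu.Unlock()
	}
}
```

Saved as, say, `latency_test.go`, it runs with `go test -bench=.`; on modern hardware, the reported ns/op should land in the same order of magnitude as the table's figure.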
The preceding table shows common operations in a computer and the latency each one incurs. When a CPU core processes data already held in a CPU register, the operation takes only a cycle or so (for reference, a 3 GHz CPU completes 3 cycles per nanosecond), but the moment it has to fall back on the L1 or L2 cache, the operation becomes several times to an order of magnitude slower. The preceding table does not show main memory access latency, which is roughly 100 ns (it varies, based on the access pattern), about 25 times slower than an L2 cache reference.
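To make the cycle arithmetic concrete, the short Go program below (an illustrative sketch, not part of the original source) converts a few of the table's latencies, plus the ~100 ns main memory figure mentioned above, into approximate cycle counts at 3 GHz:

```go
package main

import "fmt"

func main() {
	// Assumes the 3 GHz CPU from the text: 3x10^9 cycles/s = 3 cycles/ns.
	const cyclesPerNs = 3.0

	latencies := []struct {
		op string
		ns float64
	}{
		{"L1 cache reference", 1},
		{"L2 cache reference", 4},
		{"Main memory reference", 100}, // from the prose above, not the table
		{"SSD random read", 16_000},
		{"Disk seek", 4_000_000},
	}

	for _, l := range latencies {
		fmt.Printf("%-22s %10.0f ns ~ %10.0f cycles\n", l.op, l.ns, l.ns*cyclesPerNs)
	}
}
```

Seeing the disk seek come out at around twelve million cycles makes it obvious why a core should never sit idle waiting on a disk.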