Processor cache fundamentals
Most processors used in modern SoCs include a Level 1 Instruction Cache (L1I$) and a Level 1 Data Cache (L1D$). Some processors also include an L2 Shared Cache (L2S$), which holds both instructions and data. The L2 cache may be private to each processor core or, as in the ARM Cortex-A9 cluster, shared between all the cores. Some modern multi-core processor clusters also have an L3 Common Cache (L3C$) shared between all the processors in the cluster. The caches shorten the latency of the processor’s access to instructions and data while it executes software because they are implemented using SRAMs and run at the processor clock frequency, which is typically higher than that of the logic surrounding the processor in the SoC. Also, external memory access latency is usually far higher, often by one to two orders of magnitude, than the access time to the internal SRAM implementing the cache within the processor...
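The latency benefit described above can be illustrated with the common average-memory-access-time estimate, in which each cache level adds its hit latency plus its miss rate multiplied by the cost of going one level further out. The following is a minimal sketch only: the function name, the cycle counts, and the miss rates are illustrative assumptions and do not describe any particular processor.

```python
def average_access_cycles(levels, memory_latency):
    """Estimate the average access latency, in processor cycles.

    levels: list of (hit_latency_cycles, miss_rate) tuples ordered from
            the L1 cache outward; each miss_rate is the fraction of the
            accesses reaching that level that miss in it (0.0 to 1.0).
    memory_latency: cycles needed to reach external memory.
    """
    # Start from external memory and fold each cache level in, working
    # inward: cost = hit latency + miss rate * cost of the next level out.
    cost = memory_latency
    for hit_latency, miss_rate in reversed(levels):
        cost = hit_latency + miss_rate * cost
    return cost


if __name__ == "__main__":
    # Hypothetical figures chosen only for illustration.
    l1 = (2, 0.05)       # 2-cycle hit, 5% of accesses miss
    l2 = (12, 0.20)      # 12-cycle hit, 20% of L1 misses also miss in L2
    external_memory = 200

    print("no cache:", average_access_cycles([], external_memory))
    print("L1 only: ", average_access_cycles([l1], external_memory))
    print("L1 + L2: ", average_access_cycles([l1, l2], external_memory))
```

With these assumed figures, the estimate falls from 200 cycles with no cache to about 12 cycles with an L1 cache alone, and to about 4.6 cycles once an L2 cache is added, which is the effect the cache hierarchy is meant to achieve.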