CPU Cache Hierarchy
Because the bandwidth of main memory (typically to GB/s) is often an order of magnitude lower than a modern processor's data consumption rate—for example, a GHz quad-core CPU executing -bit AVX operations across its cores can consume bytes ( bytes) per clock cycle, requiring a transfer rate of bytes per second—CPUs utilize a local cache hierarchy to prevent execution starvation. The hierarchy begins with extremely fast but tiny L1 caches, which are typically split into separate data and instruction caches. If data is not found in L1, the search progresses downward to larger but slightly slower L2 caches (often exclusive per-core), and finally to substantial L3 caches that are shared across multiple cores. This tiered structure minimizes access latency by keeping frequently used data close to the execution units.
0
1
Tags
D2L
Dive into Deep Learning @ D2L