Concept

CPU Cache Hierarchy

Because the bandwidth of main memory (typically 2020 to 4040 GB/s) is often an order of magnitude lower than a modern processor's data consumption rate—for example, a 22 GHz quad-core CPU executing 256256-bit AVX operations across its 44 cores can consume 128128 bytes (4×324 \times 32 bytes) per clock cycle, requiring a transfer rate of 256×109256 \times 10^9 bytes per second—CPUs utilize a local cache hierarchy to prevent execution starvation. The hierarchy begins with extremely fast but tiny L1 caches, which are typically split into separate data and instruction caches. If data is not found in L1, the search progresses downward to larger but slightly slower L2 caches (often exclusive per-core), and finally to substantial L3 caches that are shared across multiple cores. This tiered structure minimizes access latency by keeping frequently used data close to the execution units.

0

1

Updated 2026-05-18

Contributors are:

Who are from:

Tags

D2L

Dive into Deep Learning @ D2L