1Cademy - CPU Cache Hierarchy

Learn Before

Concept

CPU Cache Hierarchy

Because the bandwidth of main memory (typically $20$ to $40$ GB/s) is often an order of magnitude lower than a modern processor's data consumption rate—for example, a $2$ GHz quad-core CPU executing $256$ -bit AVX operations across its $4$ cores can consume $128$ bytes ( $4 \times 32$ bytes) per clock cycle, requiring a transfer rate of $256 \times 10^9$ bytes per second—CPUs utilize a local cache hierarchy to prevent execution starvation. The hierarchy begins with extremely fast but tiny L1 caches, which are typically split into separate data and instruction caches. If data is not found in L1, the search progresses downward to larger but slightly slower L2 caches (often exclusive per-core), and finally to substantial L3 caches that are shared across multiple cores. This tiered structure minimizes access latency by keeping frequently used data close to the execution units.

0

1

Updated 2026-06-19

Contributors are:

Who are from:

References

Dive into Deep Learning

Learn Before

Related

Learn After