CPU Execution and Operation Latencies
Beyond memory access, CPU execution steps and software operations have their own characteristic latencies. Basic arithmetic is extremely fast: a floating-point addition, multiplication, or fused multiply-add (FMA) takes approximately 1.5 ns (roughly 4 cycles). A control-flow hazard such as a branch misprediction forces a pipeline flush that costs about 6 ns (15 to 20 cycles). Thread synchronization, such as locking or unlocking a mutex, takes roughly 25 ns. Higher-level software operations are orders of magnitude slower: compressing 1 KB of data with a fast algorithm like Google Snappy takes approximately 3 μs.
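To relate nanoseconds to clock cycles and to see how far apart these operations sit, a small conversion sketch may help. The clock rate (`CLOCK_GHZ`), the helper names, and the latency values in `latencies_ns` are illustrative assumptions taken as inputs, not measurements from any particular machine.

```python
def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to clock cycles.

    A 1 GHz clock completes one cycle per nanosecond, so cycles = ns * GHz.
    """
    return latency_ns * clock_ghz


def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a cycle count back to nanoseconds at the given clock rate."""
    return cycles / clock_ghz


CLOCK_GHZ = 2.7  # assumed clock rate, chosen only for illustration

# Assumed, order-of-magnitude latencies in nanoseconds (hypothetical inputs).
latencies_ns = {
    "floating-point add/mult/FMA": 1.5,
    "branch misprediction": 6.0,
    "mutex lock/unlock": 25.0,
    "Snappy compress 1 KB": 3000.0,  # 3 microseconds
}

# Express every latency in cycles and as a multiple of one FMA.
baseline_ns = latencies_ns["floating-point add/mult/FMA"]
for name, ns in latencies_ns.items():
    cycles = ns_to_cycles(ns, CLOCK_GHZ)
    ratio = ns / baseline_ns
    print(f"{name:30s} {ns:8.1f} ns  ~{cycles:8.1f} cycles  {ratio:7.1f}x FMA")
```

Under these assumptions a single 1 KB compression costs about as much as a couple of thousand floating-point operations, which is why such calls are usually kept out of inner loops.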
Updated 2026-05-18
Tags
D2L
Dive into Deep Learning @ D2L