CPU Execution and Operation Latencies

Beyond memory access, CPU execution steps and software operations each have their own characteristic latency. Basic arithmetic is extremely fast: a floating-point addition, multiplication, or fused multiply-add (FMA) takes approximately 1.5 ns (roughly 4 cycles). Control-flow hazards are more expensive: a branch misprediction forces a pipeline flush that costs about 6 ns (15 to 20 cycles). Thread synchronization, such as locking or unlocking a mutex, takes roughly 25 ns. Higher-level software operations take far longer; for instance, compressing 1 KB of data with a fast algorithm such as Google Snappy takes approximately 3 μs.
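To put these figures on a common scale, the sketch below expresses each latency as a multiple of one floating-point operation and derives the clock frequency implied by a cycles/nanoseconds pair. It is only illustrative arithmetic on the approximate numbers quoted above; the dictionary keys and the helper names `relative_cost` and `implied_clock_ghz` are made up for this example and do not come from any library.

```python
# Approximate latencies quoted in the text, in nanoseconds.
# Keys are hypothetical labels chosen for this sketch.
LATENCY_NS = {
    "fp_add_mul_fma": 1.5,
    "branch_mispredict": 6.0,
    "mutex_lock_unlock": 25.0,
    "snappy_compress_1kb": 3000.0,  # 3 microseconds
}


def relative_cost(op, baseline="fp_add_mul_fma"):
    """Return how many baseline latencies fit into the latency of `op`."""
    return LATENCY_NS[op] / LATENCY_NS[baseline]


def implied_clock_ghz(cycles, latency_ns):
    """Return the clock rate implied by a (cycles, ns) pair; cycles per ns equals GHz."""
    return cycles / latency_ns


if __name__ == "__main__":
    for op, ns in LATENCY_NS.items():
        print(f"{op:22s} {ns:8.1f} ns  ~{relative_cost(op):7.1f}x one FP op")
    print(f"implied clock from 4 cycles in 1.5 ns: {implied_clock_ghz(4, 1.5):.2f} GHz")
```

On these numbers, a branch misprediction costs about as much as 4 floating-point operations, a mutex operation about 17, and compressing 1 KB with Snappy about 2000. The cycle counts and nanosecond figures are mutually consistent with a clock of roughly 2.5 to 3.3 GHz.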

Updated 2026-05-18

Tags

D2L

Dive into Deep Learning @ D2L