1Cademy - Hardware Accelerators for Training

Learn Before

Graphics Processing Unit (GPU) in Deep Learning

Concept

Hardware Accelerators for Training

Hardware accelerators optimized for deep learning training must handle significantly higher computational and memory demands than inference devices. During training, the intermediate activations from the forward pass must be kept in memory until backpropagation uses them to compute gradients. Additionally, accumulating gradients requires higher numerical precision, with FP16 (or mixed precision with FP32) as the minimum, to avoid numerical underflow or overflow. Consequently, training accelerators (such as NVIDIA V100 GPUs) need faster and larger memory (e.g., HBM2 rather than GDDR6) and more processing power.
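The precision point can be illustrated with a small sketch. It is not taken from the reference. It emulates FP16 and FP32 rounding in plain Python using the standard `struct` module, and the helper names (`to_fp16`, `to_fp32`, `accumulate`) and the gradient value of 1e-4 are assumptions chosen only for illustration.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # The value is beyond the FP16 range (largest finite value is 65504).
        return math.copysign(math.inf, x)


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(grads, round_fn):
    """Sum grads, rounding the running total after every addition."""
    total = 0.0
    for g in grads:
        total = round_fn(total + g)
    return total


# Underflow: a tiny gradient becomes exactly zero in FP16.
print("1e-8 in FP16: ", to_fp16(1e-8))
# Overflow: a large value leaves the FP16 range.
print("70000 in FP16:", to_fp16(70000.0))

# Hypothetical workload: 10,000 small gradients of about 1e-4 each,
# stored in FP16. The exact sum is close to 1.0.
grads = [to_fp16(1e-4)] * 10_000
fp16_sum = accumulate(grads, to_fp16)  # FP16 accumulator
fp32_sum = accumulate(grads, to_fp32)  # FP32 accumulator (mixed precision)
print("FP16 accumulator:", fp16_sum)
print("FP32 accumulator:", fp32_sum)
```

In this sketch the FP16 accumulator stops growing once the running total is large enough that each new gradient is smaller than half the spacing between adjacent FP16 values, so the result falls well short of 1.0. The FP32 accumulator stays close to the true sum, which is the motivation for keeping accumulators in FP32 under mixed precision.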

Updated 2026-05-18

References

Dive into Deep Learning

Learn After

Numerical Overflow from Small Data Types
