Hardware Accelerators for Training
Hardware accelerators optimized for deep learning training must handle significantly higher computational and memory demands than inference devices. During training, the intermediate activations from the forward pass must be kept in memory so that gradients can be computed during backpropagation. In addition, accumulating gradients requires higher numerical precision than inference, at minimum FP16 or mixed precision with FP32, to avoid numerical underflow or overflow. Consequently, training accelerators (such as NVIDIA V100 GPUs) require faster and larger memory (e.g., HBM2 rather than GDDR6) and greater overall processing power.
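The precision point can be made concrete with a small sketch. The Python below is an illustration only, not taken from D2L: it uses the standard library's `struct` half-precision format to emulate FP16 rounding, and a plain Python float (FP64) stands in for the wider FP32 accumulator used in mixed precision. The helper names `to_fp16` and `accumulate`, the gradient value, and the step count are all hypothetical choices.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest FP16 value and convert it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def accumulate(grads, fp16_accumulator: bool) -> float:
    """Sum FP16 gradients, keeping the running total in FP16 or in a wider float."""
    total = 0.0
    for g in grads:
        total += to_fp16(g)  # every individual gradient is stored in FP16
        if fp16_accumulator:
            total = to_fp16(total)  # round the running sum back to FP16 each step
    return total


# Underflow: 1e-8 is below the smallest positive FP16 value, so it rounds to zero.
tiny = to_fp16(1e-8)

# Hypothetical workload: 10,000 small gradient contributions of 1e-4 each.
grads = [1e-4] * 10_000
fp16_total = accumulate(grads, fp16_accumulator=True)
wide_total = accumulate(grads, fp16_accumulator=False)

print("1e-8 stored in FP16:  ", tiny)
print("sum, FP16 accumulator:", fp16_total)
print("sum, wide accumulator:", wide_total)
```

Under these assumptions the FP16 running sum stops growing once each new contribution is smaller than half the spacing between adjacent FP16 values near the total, so it ends up far below the true sum of about 1.0, while the wider accumulator stays close to it. This is one reason mixed-precision schemes typically keep an FP32 copy for accumulation.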
Tags
D2L
Dive into Deep Learning @ D2L
Related
CPU vs. GPU Architecture in Deep Learning
AlexNet Convolutional Neural Network
cuda-convnet
General-Purpose GPUs (GPGPUs)
GPU Hardware Configurations in Deep Learning
High-Bandwidth GPU Memory Technologies
Hardware Accelerators for Inference
Hardware Accelerators for Training
NVIDIA Collective Communications Library (NCCL)
Parallelization on Multiple GPUs