Hardware Accelerators for Inference
Hardware accelerators optimized for deep learning inference only need to compute the forward propagation of a neural network. Because no intermediate data has to be stored for backpropagation, these devices need significantly less memory capacity than training requires. Inference can also typically tolerate lower numerical precision with little effect on predictions, so these accelerators are built to run reduced-precision formats such as FP16 or INT8 efficiently. NVIDIA's T4 GPU, based on the Turing architecture, is one example of a device tailored to these streamlined inference workloads.
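To make the reduced-precision point concrete, the following is a minimal sketch of symmetric INT8 quantization applied to a single dot product, written in plain Python. It is an illustration only, not how any particular accelerator or library implements quantization: the helper `quantize_int8`, the per-tensor scaling scheme, and the sample `weights` and `inputs` values are all assumptions chosen for the example.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization (an assumed, simplified scheme).

    Maps each float to an integer in [-127, 127] and returns the integers
    together with the scale needed to map them back to floats.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


# Hypothetical weights and activations of one neuron.
weights = [0.42, -1.30, 0.07, 0.95]
inputs = [1.00, 0.50, -2.00, 0.25]

q_w, scale_w = quantize_int8(weights)
q_x, scale_x = quantize_int8(inputs)

# Accumulate in integer arithmetic, then rescale once at the end.
int_acc = sum(w * x for w, x in zip(q_w, q_x))
approx = int_acc * scale_w * scale_x

# Full-precision reference result for comparison.
exact = sum(w * x for w, x in zip(weights, inputs))

print("exact:", exact)
print("int8 approximation:", approx)
print("absolute error:", abs(approx - exact))
```

The multiply-accumulate work happens on small integers, and only one floating-point rescale is needed per output. The quantized result differs slightly from the full-precision one, which is the kind of small error that inference can usually absorb.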
Tags
D2L
Dive into Deep Learning @ D2L
Related
CPU vs. GPU Architecture in Deep Learning
AlexNet Convolutional Neural Network
cuda-convnet
General-Purpose GPUs (GPGPUs)
GPU Hardware Configurations in Deep Learning
High-Bandwidth GPU Memory Technologies
Hardware Accelerators for Inference
Hardware Accelerators for Training
NVIDIA Collective Communications Library (NCCL)
Parallelization on Multiple GPUs