Concept

Mixed Precision Training

To mitigate the high computational cost of training Large Language Models, even on distributed systems, mixed precision training is a common efficiency technique. The method runs most computation, such as the forward and backward passes that produce gradients, in a lower-precision format like FP16 or FP8, which reduces memory use and increases throughput, while reserving a higher-precision format, typically FP32, for numerically sensitive operations such as updating the master copy of the model's parameters. Because small low-precision gradients can underflow to zero, implementations usually also apply loss scaling to preserve numerical stability.
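
As a concrete illustration, here is a minimal sketch of this pattern using PyTorch's automatic mixed precision API (torch.cuda.amp); the toy model, data, and hyperparameters are placeholder assumptions, not part of the course material.

```python
import torch
from torch import nn

# Minimal sketch of mixed precision training with PyTorch AMP.
# The linear model and random data are hypothetical stand-ins.
model = nn.Linear(1024, 1024).cuda()          # parameters stay in FP32 ("master" weights)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()          # loss scaling guards against FP16 gradient underflow

for step in range(100):
    x = torch.randn(32, 1024, device="cuda")
    target = torch.randn(32, 1024, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():           # forward pass runs mostly in FP16
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()             # backward pass on the scaled loss
    scaler.step(optimizer)                    # unscales gradients, updates FP32 weights
    scaler.update()                           # adjusts the scale factor for the next step
```

Note that the parameters themselves remain FP32 throughout; autocast only lowers the precision of the compute inside the context, which is exactly the split between cheap bulk computation and precise parameter updates described above.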


Updated 2026-04-21

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences