Example of Scaling BERT Parameters to 3.9 Billion

The Megatron work by Shoeybi et al. (2019) is a notable example of scaling a BERT-style model to 3.9 billion parameters. A model of this size exceeds the memory and compute capacity of any single device, so its weights and computation had to be partitioned across hundreds of GPUs whose work was kept synchronized throughout training.
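The parallel scheme Megatron introduced is intra-layer (tensor) model parallelism: individual weight matrices are split across GPUs, each GPU computes a partial result, and collective communication recombines the pieces. The sketch below illustrates the idea for a Transformer MLP block in PyTorch; the class names, sizes, and structure are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.distributed as dist


class ColumnParallelLinear(nn.Module):
    """First linear of the MLP: each rank holds a slice of the output features."""

    def __init__(self, hidden_size: int, ffn_size: int, world_size: int):
        super().__init__()
        assert ffn_size % world_size == 0
        self.linear = nn.Linear(hidden_size, ffn_size // world_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each GPU computes its own slice of the activations; no communication yet.
        return torch.nn.functional.gelu(self.linear(x))


class RowParallelLinear(nn.Module):
    """Second linear of the MLP: each rank holds a slice of the input features."""

    def __init__(self, ffn_size: int, hidden_size: int, world_size: int):
        super().__init__()
        assert ffn_size % world_size == 0
        # Bias is kept out of the sharded matmul so it is not summed world_size times.
        self.linear = nn.Linear(ffn_size // world_size, hidden_size, bias=False)
        self.bias = nn.Parameter(torch.zeros(hidden_size))

    def forward(self, x_shard: torch.Tensor) -> torch.Tensor:
        # Each GPU produces a partial result from its own weight shard ...
        partial = self.linear(x_shard)
        # ... and a synchronized all-reduce sums the partial results so that
        # every GPU ends up with the same full output.
        # (Assumes torch.distributed.init_process_group has already been called.)
        dist.all_reduce(partial, op=dist.ReduceOp.SUM)
        return partial + self.bias
```

In the actual Megatron setup this intra-layer splitting was combined with ordinary data parallelism, which adds a further round of synchronized all-reduces over gradients; the sketch above only shows the forward-pass communication within one MLP block.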
