Example of Scaling BERT Parameters to 1.5 Billion
The development of a 1.5 billion-parameter DeBERTa model by He et al. (2021) serves as a prime example of scaling up BERT-like architectures. This scale was reached by substantially increasing both the hidden size and the depth (number of Transformer layers) of the model.
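To see how hidden size and depth drive the parameter count, the short Python sketch below gives a back-of-envelope estimate for a BERT-style encoder. The specific figures used (hidden size 1536, 48 layers, 128K-token vocabulary) follow the configuration reported for the 1.5B DeBERTa model; the formula is a simplification that ignores biases, layer norms, and DeBERTa's extra disentangled-attention projections.

```python
# Back-of-envelope parameter count for a BERT-style Transformer encoder.
# This is a rough sketch, not an exact accounting of DeBERTa's architecture.

def encoder_param_estimate(hidden_size: int, num_layers: int, vocab_size: int,
                           ffn_multiplier: int = 4) -> int:
    """Approximate parameter count of a Transformer encoder."""
    attention = 4 * hidden_size ** 2              # Q, K, V, and output projections
    ffn = 2 * ffn_multiplier * hidden_size ** 2   # FFN up- and down-projections
    per_layer = attention + ffn                   # ~12 * hidden_size^2 with a 4x FFN
    embeddings = vocab_size * hidden_size         # token embedding table
    return num_layers * per_layer + embeddings

if __name__ == "__main__":
    # Assumed configuration of the 1.5B DeBERTa model: 48 layers, hidden 1536, 128K vocab.
    total = encoder_param_estimate(hidden_size=1536, num_layers=48, vocab_size=128_000)
    print(f"~{total / 1e9:.2f}B parameters")      # roughly 1.5B
```

The estimate lands near 1.5B parameters, illustrating that per-layer cost grows quadratically with hidden size and linearly with depth, which is why widening and deepening the model together pushes BERT-scale encoders into the billion-parameter range.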