
Example of Scaling BERT Parameters to 1.5 Billion

The 1.5-billion-parameter DeBERTa model of He et al. (2021) is a prime example of scaling up a BERT-like architecture. Whereas BERT-large has roughly 340 million parameters spread over 24 Transformer layers with a hidden size of 1,024, the scaled-up model reaches 1.5 billion parameters primarily by increasing both the hidden size and the number of layers.
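To make the effect of these two knobs concrete, the sketch below estimates encoder parameter counts from layer count, hidden size, and vocabulary size. The BERT-large figures (24 layers, hidden size 1,024, ~30K WordPiece vocabulary) are standard; the 48-layer, 1,536-hidden, ~128K-vocabulary configuration is an assumption based on the reported DeBERTa 1.5B setup, and the formula deliberately ignores biases, layer norms, and DeBERTa's disentangled-attention extras, so the numbers are approximate.

# Rough back-of-the-envelope estimate of BERT-style encoder parameter counts.
# Assumed configurations: BERT-large (24 layers, d=1024, ~30K vocab) and a
# hypothetical scaled model (48 layers, d=1536, ~128K vocab) approximating
# the DeBERTa 1.5B setup. Biases, layer norms, and DeBERTa-specific
# disentangled-attention projections are ignored.

def transformer_encoder_params(num_layers: int, hidden: int, vocab: int) -> int:
    """Approximate parameter count of a BERT-style Transformer encoder."""
    attention = 4 * hidden * hidden      # Q, K, V, and output projections
    ffn = 2 * hidden * (4 * hidden)      # two linear maps with a 4x inner size
    per_layer = attention + ffn          # roughly 12 * hidden^2 per layer
    embeddings = vocab * hidden          # token embedding table
    return num_layers * per_layer + embeddings

bert_large = transformer_encoder_params(num_layers=24, hidden=1024, vocab=30_522)
scaled_model = transformer_encoder_params(num_layers=48, hidden=1536, vocab=128_000)

print(f"BERT-large estimate:   {bert_large / 1e6:.0f}M parameters")   # ~333M
print(f"Scaled-model estimate: {scaled_model / 1e9:.2f}B parameters") # ~1.56B

Doubling the depth and increasing the hidden size by half grows the per-layer cost quadratically in the hidden size (~12 * hidden^2), which is why these two changes alone take the estimate from a few hundred million parameters into the 1.5-billion range.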
