Definition

Model Depth in Transformers

The expressive power of a Transformer can be increased by making the model deeper, that is, by increasing the number of stacked layers, denoted L. In standard BERT architectures, L is typically set to 12 or 24, but employing even deeper networks is a viable strategy for achieving further performance gains.
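The role of the depth L can be sketched as follows: a model of depth L simply applies L stacked layers in sequence. This is a minimal toy illustration only; the real Transformer layer (self-attention plus a feed-forward sub-layer) is replaced here by a trivial residual-style function, and `toy_layer` / `toy_model` are hypothetical names, not part of any library.

```python
# Toy illustration of model depth L: the "model" is L stacked layers,
# each applying the same transformation to the hidden state.

def toy_layer(x):
    # Stand-in for one Transformer layer: a residual-style update x + f(x).
    # A real layer would apply self-attention and a feed-forward network.
    return [v + 0.1 * v for v in x]

def toy_model(x, L):
    # Depth L = number of stacked layers, e.g. L = 12 or 24 in BERT.
    for _ in range(L):
        x = toy_layer(x)
    return x

hidden = [1.0, 2.0, 3.0]      # toy hidden state
out_12 = toy_model(hidden, L=12)   # BERT-base-style depth
out_24 = toy_model(hidden, L=24)   # BERT-large-style depth
```

Increasing L adds more layer applications (and hence more parameters and computation in a real model), which is what gives deeper networks their additional expressive power.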

Updated 2026-04-17

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
