
BERT-base Hyperparameters

The BERT-base model is configured with a specific set of hyperparameters that determine its overall size and architectural capacity. These key settings include a hidden size of $d = 768$, a depth of $L = 12$ Transformer layers, and $n_{\mathrm{head}} = 12$ attention heads. Together, this configuration produces a model containing a total of about 110 million parameters.
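
To see where the 110 million figure comes from, here is a minimal sketch that tallies BERT-base's parameters from these hyperparameters. It assumes the standard BERT-base values not listed above: a WordPiece vocabulary of 30,522 tokens, a maximum sequence length of 512, two segment types, and a feed-forward inner size of $4d = 3072$.

```python
# Rough parameter count for BERT-base, derived from its hyperparameters.
# Assumed values beyond the card: vocab size 30,522, max positions 512,
# 2 segment types, and feed-forward size 4*d (standard for BERT-base).

d = 768          # hidden size
L = 12           # number of Transformer layers
n_head = 12      # attention heads (sets head size d // n_head, not the count)
vocab = 30_522   # WordPiece vocabulary size
max_pos = 512    # maximum sequence length
d_ff = 4 * d     # feed-forward inner size (3072)

# Embeddings: token + position + segment tables, plus one LayerNorm (gain + bias).
embeddings = (vocab + max_pos + 2) * d + 2 * d

# One Transformer layer:
#   - attention: Q, K, V, and output projections, each d x d with bias
#   - feed-forward: d -> d_ff -> d, with biases
#   - two LayerNorms (gain + bias each)
attention = 4 * (d * d + d)
feed_forward = (d * d_ff + d_ff) + (d_ff * d + d)
layer_norms = 2 * 2 * d
per_layer = attention + feed_forward + layer_norms

# Pooler: one d x d projection with bias over the [CLS] hidden state.
pooler = d * d + d

total = embeddings + L * per_layer + pooler
print(f"per layer: {per_layer:,}")   # 7,087,872
print(f"total:     {total:,}")       # 109,482,240 (~110M)
```

The tally lands at 109,482,240 parameters, which rounds to the 110 million quoted for BERT-base. Note that $n_{\mathrm{head}}$ controls how the $d$-dimensional hidden state is split across heads, not the parameter count itself.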
