Learn Before
  • BERT Model Sizes and Hyperparameters

BERT-base Hyperparameters

The BERT-base model is configured with specific hyperparameters that determine its size and architecture: a hidden size of d = 768, L = 12 Transformer layers, and n_head = 12 attention heads. This configuration results in a model with a total of roughly 110 million parameters.
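The 110M figure can be roughly reproduced from the hyperparameters above. The sketch below assumes the standard BERT-base values not stated in this section: a WordPiece vocabulary of 30,522 tokens, 512 position embeddings, 2 segment types, and a feed-forward inner size of 4d = 3072.

```python
# Approximate parameter count for BERT-base from its hyperparameters.
# Assumed (standard BERT, not stated above): vocab 30,522; 512 positions;
# 2 segment types; feed-forward size 4*d = 3072.
d = 768          # hidden size
layers = 12      # Transformer layers
ffn = 4 * d      # feed-forward inner size (3072)
vocab, max_pos, segments = 30522, 512, 2

# Embedding tables plus the embedding LayerNorm (weight + bias).
embeddings = (vocab + max_pos + segments) * d + 2 * d

# Per layer: Q, K, V, output projections (weights + biases),
# two feed-forward matrices, and two LayerNorms.
attention = 4 * (d * d + d)
feed_forward = (d * ffn + ffn) + (ffn * d + d)
layer_norms = 2 * 2 * d
per_layer = attention + feed_forward + layer_norms

# Pooling head over the [CLS] token.
pooler = d * d + d

total = embeddings + layers * per_layer + pooler
print(f"{total:,}")  # 109,482,240 under these assumptions, rounded to 110M
```

The count is dominated by the 12 Transformer layers (~85M) and the token embedding table (~23M), which is why doubling L and d in larger variants inflates the total so quickly.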

Tags
  • Ch.1 Pre-training - Foundations of Large Language Models
  • Foundations of Large Language Models Course
  • Computing Sciences
Related
  • BERT-base Hyperparameters

  • BERT-large Hyperparameters

  • Challenges of Large-Scale BERT Models