Learn Before
BERT Model Sizes and Hyperparameters
BERT-base Hyperparameters
The BERT-base model is configured with specific hyperparameters that determine its size and architecture: a hidden size (H) of 768, 12 Transformer layers (L), and 12 attention heads (A). This configuration results in a model with a total of roughly 110 million parameters.
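
To see how these hyperparameters add up to about 110 million parameters, the sketch below computes an approximate count. Only H = 768, L = 12, and A = 12 come from the card above; the vocabulary size (30,522), maximum sequence length (512), segment count (2), feed-forward size (4H), and pooler layer are standard BERT defaults assumed here for illustration.

```python
# Approximate parameter count for BERT-base from its hyperparameters.
# H, L come from the card; the remaining values are assumed BERT defaults.

H = 768          # hidden size
L = 12           # number of Transformer layers
FFN = 4 * H      # feed-forward (intermediate) size, 3072 (assumed default)
VOCAB = 30522    # WordPiece vocabulary size (assumed default)
MAX_POS = 512    # maximum sequence length (assumed default)
SEGMENTS = 2     # segment (token-type) embeddings (assumed default)

# Embedding block: token + position + segment embeddings, plus one LayerNorm.
embeddings = (VOCAB + MAX_POS + SEGMENTS) * H + 2 * H

# One Transformer layer:
# self-attention: Q, K, V, and output projections (weights + biases)
attention = 4 * (H * H + H)
# feed-forward network: two linear layers (weights + biases)
ffn = (H * FFN + FFN) + (FFN * H + H)
# two LayerNorms (scale + bias each)
layer_norms = 2 * 2 * H
per_layer = attention + ffn + layer_norms

# Pooler: one dense layer applied to the [CLS] token.
pooler = H * H + H

total = embeddings + L * per_layer + pooler
print(f"Approximate BERT-base parameter count: {total / 1e6:.1f}M")
# -> about 109.5M, commonly rounded to 110M
```

Note that the number of attention heads (A = 12) does not change the parameter count: the hidden size is simply split across heads (768 / 12 = 64 dimensions per head), so the same projection matrices are used regardless of A.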
