1Cademy - You are presented with two common configurations for a bidirectional, transformer-based language model. Match each model scale to its corresponding set of architectural hyperparameters.

Learn Before

BERT Model Sizes and Hyperparameters

Matching

You are presented with two common configurations for a bidirectional, transformer-based language model. Match each model scale to its corresponding set of architectural hyperparameters.
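To make the link between architectural hyperparameters and model scale concrete, the sketch below estimates the parameter count of a BERT-style encoder from its layer count and hidden size. It is a rough illustration, not a reference implementation: the function name and the default values (a 30,522-token vocabulary, 512 positions, 2 segment types, a feed-forward width of 4x the hidden size, and a pooler layer) are assumptions that this page does not state. The example call uses the 12-layer, 768-dimensional, 12-head design mentioned among the related questions.

```python
def estimate_encoder_params(num_layers, hidden_size, num_heads,
                            vocab_size=30522, max_positions=512,
                            type_vocab_size=2, ffn_multiplier=4):
    """Rough parameter count for a BERT-style encoder (hypothetical helper).

    All keyword defaults are assumptions, not values taken from this page.
    """
    # Heads split the hidden dimension; they do not add parameters.
    if hidden_size % num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads")

    ffn_size = ffn_multiplier * hidden_size
    layer_norm = 2 * hidden_size  # scale and bias vectors

    # Token, position, and segment embeddings, plus one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab_size) * hidden_size
    embeddings += layer_norm

    # Query, key, value, and output projections (weights and biases).
    attention = 4 * (hidden_size * hidden_size + hidden_size)

    # Two dense layers: hidden -> ffn and ffn -> hidden.
    feed_forward = (hidden_size * ffn_size + ffn_size) \
        + (ffn_size * hidden_size + hidden_size)

    # Each layer has one LayerNorm after attention and one after the FFN.
    per_layer = attention + feed_forward + 2 * layer_norm

    # Dense pooler applied to the first token's representation.
    pooler = hidden_size * hidden_size + hidden_size

    return embeddings + num_layers * per_layer + pooler


if __name__ == "__main__":
    total = estimate_encoder_params(num_layers=12, hidden_size=768, num_heads=12)
    print(f"{total:,} parameters")  # on the order of 110 million
```

Calling the function with a different layer count or hidden size shows how each hyperparameter moves the total. Under these assumptions the layer count scales the count linearly, the hidden size scales the per-layer terms roughly quadratically, and the number of heads has no effect.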

Updated 2025-10-06

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science

BERT-base Hyperparameters
BERT-large Hyperparameters
Challenges of Large-Scale BERT Models
A team is developing a large, bidirectional, transformer-based language model. Their initial design has 12 processing layers, a hidden state dimension of 768, and 12 attention heads. To significantly increase the model's capacity, they are considering two potential modifications. Which single change would result in a greater increase in the model's total number of parameters?
Model Selection for a Resource-Constrained Application
