Learn Before

BERT-large Hyperparameters

Multiple Choice

A research team is deciding between two pre-trained language models for a complex text classification task. Model A has 12 transformer layers, a hidden size of 768, and 12 attention heads. Model B has 24 transformer layers, a hidden size of 1,024, and 16 attention heads. What is the most critical trade-off the team must evaluate when considering Model B over Model A?
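To make the size difference concrete, here is a rough sketch that estimates the encoder parameter count of each configuration. It assumes a standard transformer encoder layer with a feed-forward width of 4 × hidden size, which the question does not state. It ignores embeddings and any classification head. The function names are illustrative only. The number of attention heads does not appear in the calculation because the heads split the hidden dimension rather than add to it.

```python
def encoder_layer_params(hidden: int, ffn_mult: int = 4) -> int:
    """Approximate parameter count of one transformer encoder layer.

    Assumption (not given in the question): feed-forward width = ffn_mult * hidden.
    """
    # Self-attention: Q, K, V and output projections, each hidden x hidden plus bias.
    attention = 4 * (hidden * hidden + hidden)
    # Feed-forward: hidden -> ffn -> hidden, with a bias on each projection.
    ffn = ffn_mult * hidden
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    # Two layer norms, each with a scale and a shift vector.
    layer_norms = 2 * 2 * hidden
    return attention + feed_forward + layer_norms


def encoder_params(layers: int, hidden: int) -> int:
    """Approximate parameter count of the full encoder stack (no embeddings)."""
    return layers * encoder_layer_params(hidden)


model_a = encoder_params(layers=12, hidden=768)
model_b = encoder_params(layers=24, hidden=1024)

print(f"Model A encoder parameters: {model_a:,}")
print(f"Model B encoder parameters: {model_b:,}")
print(f"Ratio B / A: {model_b / model_a:.2f}")
```

Under these assumptions, Model A's encoder comes to about 85 million parameters and Model B's to about 302 million, roughly 3.5 times as many. That difference translates into more memory, longer training and inference time, and a greater need for data to avoid overfitting, which the team has to weigh against Model B's potentially higher accuracy.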

Updated 2025-09-28
