Learn Before
Match each hyperparameter of the BERT-large model to its correct value.
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Recall in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A research team is deciding between two pre-trained language models for a complex text classification task. Model A has 12 transformer layers, a hidden size of 768, and 12 attention heads. Model B has 24 transformer layers, a hidden size of 1,024, and 16 attention heads. What is the most critical trade-off the team must evaluate when considering Model B over Model A?
Match each hyperparameter of the BERT-large model to its correct value.
The BERT-large model, which has a total of 340 million parameters, is built using 24 Transformer layers and a hidden size of 1,024. This architecture utilizes ____ attention heads in each layer.
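The hyperparameters above roughly determine the parameter count. As a minimal sketch (my own back-of-the-envelope estimate, not an official formula), each Transformer layer contributes about 12·H² weights and the embedding tables add the rest, which lands near the reported 340 million:

```python
# Rough parameter-count estimate for a BERT-style encoder (a sketch;
# it ignores biases, LayerNorm weights, and the pooler, so it
# slightly undercounts the official 340M figure for BERT-large).
def estimate_bert_params(layers, hidden, vocab=30522, max_pos=512, type_vocab=2):
    # Per layer: Q, K, V, and attention-output projections (4 * H^2)
    # plus a feed-forward block with intermediate size 4H
    # (H * 4H + 4H * H = 8 * H^2), i.e. ~12 * H^2 weights per layer.
    per_layer = 12 * hidden * hidden
    # Embedding tables: token, position, and segment (token-type).
    embeddings = (vocab + max_pos + type_vocab) * hidden
    return layers * per_layer + embeddings

# BERT-large: 24 layers, hidden size 1,024
total = estimate_bert_params(layers=24, hidden=1024)
print(f"~{total / 1e6:.0f}M parameters")  # close to the reported 340M
```

The same function applied to the base configuration (12 layers, hidden size 768) gives roughly 110M parameters, which illustrates the scale gap behind the Model A vs. Model B trade-off in the related question.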