Choosing the Number of Attention Heads for a Specific Task
Based on the scenario below, which configuration is more suitable for the described task? Justify your answer by explaining how the number of attention heads relates to the model's ability to process the input.
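The relationship the question asks about can be made concrete with a small sketch. It assumes standard scaled dot-product multi-head attention, where each of the h heads gets its own query, key, and value projections of width d_k = d_model / h. The function names, matrix sizes, and random weights are hypothetical and only illustrate the mechanism; this is not a reference implementation. Each head produces its own attention map over the same tokens, so one head can only express one weighting pattern per query, while h heads can express h different patterns (for example, one tracking subject-verb links and another tracking modifier-noun links). The cost is that each head works in a narrower d_k-dimensional subspace. The total projection parameter count stays at 4 * d_model^2 regardless of h.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over one row of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m), using plain Python lists.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def rand_matrix(rows, cols, rng):
    # Hypothetical random initialisation, used only for illustration.
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]


def multi_head_attention(x, num_heads, rng):
    """Return (output, per-head attention weights) for token vectors x (n x d_model)."""
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_k = d_model // num_heads  # per-head dimension shrinks as heads are added
    head_outputs, head_weights = [], []
    for _ in range(num_heads):
        # Each head has its own projections, so it can learn its own attention pattern.
        w_q, w_k, w_v = (rand_matrix(d_model, d_k, rng) for _ in range(3))
        q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
        # Scaled dot-product scores: one row of weights per query token.
        scores = [[sum(qi[t] * kj[t] for t in range(d_k)) / math.sqrt(d_k) for kj in k]
                  for qi in q]
        weights = [softmax(row) for row in scores]
        head_weights.append(weights)
        head_outputs.append(matmul(weights, v))
    # Concatenate the heads along the feature axis, then apply the output projection.
    concat = [sum((h[i] for h in head_outputs), []) for i in range(len(x))]
    w_o = rand_matrix(d_model, d_model, rng)
    return matmul(concat, w_o), head_weights


def projection_param_count(d_model, num_heads):
    # Q, K, V projections across all heads plus the output projection (biases ignored).
    d_k = d_model // num_heads
    return num_heads * 3 * d_model * d_k + d_model * d_model


rng = random.Random(0)
n_tokens, d_model = 5, 8
x = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(n_tokens)]
for h in (1, 2, 4):
    out, weights = multi_head_attention(x, h, rng)
    print(f"heads={h}: d_k={d_model // h}, attention maps={len(weights)}, "
          f"output=({len(out)}, {len(out[0])}), params={projection_param_count(d_model, h)}")
```

With a single head the model has exactly one attention map per token. With four heads it has four maps over the same input, but each head's queries and keys live in only 2 dimensions instead of 8. This is the trade-off behind choosing the number of heads for a task.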
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A machine learning engineer observes that their language model struggles to understand sentences with multiple, distinct syntactic relationships (e.g., identifying both the subject-verb and modifier-noun relationships in 'The quick brown fox, which was very agile, jumps over the lazy dog.'). The model's self-attention mechanism is currently configured with a single attention head. Which of the following changes is most likely to directly address this specific problem, and why?
Evaluating the Trade-offs of the Number of Attention Heads