Learn Before
Self-Attention layer understanding - Step 4 - Multi Headed Attention
Number of Attention Heads (nhead)
In the multi-head self-attention mechanism, the number of heads, denoted as nhead, is a key hyperparameter that must be specified. This value determines the number of distinct subspaces in which the attention mechanism operates, allowing the model to attend to different aspects of the input simultaneously. A larger nhead value corresponds to a greater number of attention subspaces. In practice, it is common to set the number of heads to four or more.
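As a concrete illustration, here is a minimal sketch of how nhead is specified in practice, using PyTorch's nn.MultiheadAttention. The dimensions (embed_dim=512, nhead=8) and tensor shapes are illustrative assumptions, not values from the text above.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions): a model width of 512 split across
# nhead = 8 heads, so each head attends in a 512 // 8 = 64-dimensional subspace.
embed_dim, nhead = 512, 8
attn = nn.MultiheadAttention(embed_dim=embed_dim, num_heads=nhead, batch_first=True)

# Self-attention: the same sequence serves as query, key, and value.
x = torch.randn(2, 10, embed_dim)  # (batch, sequence length, embedding dim)
out, weights = attn(x, x, x)

print(out.shape)      # torch.Size([2, 10, 512])
print(weights.shape)  # torch.Size([2, 10, 10]), averaged over the 8 heads
```

Note that in this implementation embed_dim must be divisible by nhead, since each of the nhead heads operates on a subspace of size embed_dim / nhead.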
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Self-Attention layer understanding - Step 5 - Adding the time
Query, Key, and Value Projections in Multi-Head Attention
Scalar per Head in Multi-Head Attention