Learn Before
  • Self-Attention layer understanding - Step 4 - Multi Headed Attention

Number of Attention Heads (nhead)

In the multi-head self-attention mechanism, the number of heads, denoted as nhead ($n_{head}$), is a key hyperparameter that must be specified. This value determines the number of different subspaces in which the attention mechanism operates, allowing the model to focus on different aspects of the input simultaneously. A larger $n_{head}$ value corresponds to a greater number of attention subspaces. In practice, it is common to set the number of heads to four or more ($n_{head} \geq 4$).
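As a concrete illustration, PyTorch's `nn.TransformerEncoderLayer` exposes this hyperparameter directly as `nhead`. The sketch below uses illustrative values (d_model = 512, nhead = 8); note that PyTorch requires the embedding size to be divisible by the number of heads, since each head operates on a subspace of size d_model / nhead.

```python
import torch
import torch.nn as nn

# Illustrative values: d_model must be divisible by nhead, because each
# head attends within a subspace of size d_model // nhead (512 // 8 = 64).
d_model, nhead = 512, 8

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)

x = torch.randn(2, 10, d_model)  # (batch, sequence length, embedding dim)
out = layer(x)                   # self-attention runs in 8 parallel subspaces
print(out.shape)                 # torch.Size([2, 10, 512])
```

Increasing `nhead` (e.g., from 8 to 16) does not change the layer's output shape; it partitions the same d_model dimensions into more, smaller attention subspaces.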

Tags
  • Ch.1 Pre-training - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Foundations of Large Language Models Course
  • Computing Sciences
Related
  • Self-Attention layer understanding - Step 5 - Adding the time
  • Query, Key, and Value Projections in Multi-Head Attention
  • Scalar per Head in Multi-Head Attention