Concept

Scalar per Head in Multi-Head Attention

In multi-head attention, each attention head can be assigned its own scalar value, letting the model apply a different behavior or bias per head. ALiBi (Attention with Linear Biases) is a well-known example: each head h receives a fixed slope m_h, and the attention score between a query at position i and a key at position j is penalized by m_h times the distance between them, so heads with larger slopes focus more locally while heads with smaller slopes attend more broadly.
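The per-head scalar idea can be sketched with ALiBi's slope schedule. The geometric sequence below matches the scheme described in the ALiBi paper for power-of-two head counts; the helper names (`alibi_slopes`, `alibi_bias`) are illustrative, not from any particular library:

```python
import math

def alibi_slopes(n_heads):
    # ALiBi assigns head h (0-indexed) the slope 2^(-8*(h+1)/n_heads),
    # a geometric sequence; e.g. for 8 heads: 1/2, 1/4, ..., 1/256.
    return [2 ** (-8 * (h + 1) / n_heads) for h in range(n_heads)]

def alibi_bias(slope, seq_len):
    # Per-head additive bias on attention scores: -slope * (i - j)
    # for key position j <= query position i (causal attention),
    # so more distant keys receive a larger penalty.
    return [[-slope * max(0, i - j) for j in range(seq_len)]
            for i in range(seq_len)]

slopes = alibi_slopes(8)          # one scalar per head
bias = alibi_bias(slopes[0], 4)   # 4x4 bias matrix for the first head
```

Each head's bias matrix is simply added to its query-key score matrix before the softmax, which is how a single scalar per head yields distinct attention behaviors.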


Updated 2025-10-06

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
