
Masks for Self-attention

In self-attention mechanisms, masks dictate which tokens within a sequence are allowed to interact with one another. This can be conceptualized by distinguishing between valid attention, where information is permitted to flow between tokens, and blocked attention, where the interaction is explicitly suppressed. For example, when processing a sequence of tokens from x_0 to x_4, a specific mask might allow a token like x_1 to receive valid attention from x_0, x_2, and x_4, while assigning blocked attention to x_3. By using these masks, models can selectively control the contextual information available to each token.
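The sketch below illustrates one common way to realize this idea: a boolean mask is applied to the raw attention scores, setting blocked positions to negative infinity so that they receive zero weight after the softmax. The function name masked_attention, the toy embeddings, and the specific mask pattern (reproducing the x_1 vs. x_3 example above, with self-attention on the diagonal assumed to remain valid) are illustrative assumptions, not a prescribed implementation.

```python
import numpy as np

def masked_attention(Q, K, V, mask):
    """Scaled dot-product attention with a boolean mask.

    mask[i, j] == True  -> valid attention (token i may attend to token j)
    mask[i, j] == False -> blocked attention (score set to -inf before softmax,
                           so its weight becomes exactly zero)
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # raw attention scores
    scores = np.where(mask, scores, -np.inf)   # suppress blocked positions
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Five tokens x_0 ... x_4 with toy 4-dimensional embeddings (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))

# Mask matching the example in the text: x_1 may interact with x_0, x_2, and x_4,
# while its interaction with x_3 is blocked; all other pairs stay valid.
mask = np.ones((5, 5), dtype=bool)
mask[1, 3] = False

out = masked_attention(X, X, X, mask)
print(out.shape)  # (5, 4): one contextualized vector per token
```

Using an additive mask of this kind keeps the mechanism differentiable and lets the same attention code serve different masking schemes (for example, padding masks or causal masks) simply by changing the boolean pattern.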
