Masks for Self-attention
In self-attention mechanisms, masks dictate which tokens within a sequence are allowed to interact with one another. This can be conceptualized by distinguishing between valid attention, where information is permitted to flow between tokens, and blocked attention, where the interaction is explicitly suppressed. For example, when processing a sequence of tokens from x_1 to x_5, a specific mask might allow a token like x_4 to receive valid attention from x_1, x_2, and x_3, while assigning blocked attention to x_5. By using these masks, models can selectively control the contextual information available to each token.
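As a concrete sketch (not taken from the course material), the snippet below builds a causal-style mask in NumPy and applies it to attention scores before the softmax, so that blocked positions receive (effectively) zero attention weight. The sequence length, dimension, and the -1e9 fill value are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the chosen axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Toy example: 5 tokens, hypothetical query/key dimension of 4.
rng = np.random.default_rng(0)
seq_len, d_k = 5, 4
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))

# Raw attention scores between every pair of tokens.
scores = Q @ K.T / np.sqrt(d_k)

# Causal mask: position i keeps valid attention to positions <= i;
# positions > i are blocked by filling a large negative value, so their
# softmax weights become (effectively) zero.
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
masked_scores = np.where(mask, -1e9, scores)

weights = softmax(masked_scores, axis=-1)
print(np.round(weights, 3))  # Row i has zeros in every column j > i.
```

In this sketch, token x_4 (row index 3) ends up with nonzero weights only on x_1 through x_4, while the entry for x_5 is driven to zero, matching the valid/blocked distinction described above.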
Tags
Foundations of Large Language Models
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences