Comparison of Self-Attention Masking Results
A comparison of the self-attention masking results across causal language modeling, masked language modeling, and permuted language modeling can be visualized using matrices. In these representations, a blue cell at coordinates (i, j) signifies valid attention: the token at position i attends to the token at position j. Conversely, a gray cell at (i, j) denotes blocked attention: the token at position i does not attend to the token at position j. Additionally, e_i represents the embedding of the symbol x_i, which combines both the token embedding and the positional embedding.
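The masking patterns described above can be sketched as binary matrices, where a 1 corresponds to a blue (valid-attention) cell and a 0 to a gray (blocked) cell. The following is a minimal illustration, assuming 0-indexed positions and, for permuted language modeling, that each token may also attend to itself in the factorization order; the function names are hypothetical, not from the source.

```python
import numpy as np

def causal_mask(n):
    # Causal LM: token i attends only to tokens j <= i,
    # giving a lower-triangular matrix of 1s.
    return np.tril(np.ones((n, n), dtype=int))

def mlm_mask(n):
    # Masked LM: bidirectional attention, so every token
    # attends to every token (all cells are 1).
    return np.ones((n, n), dtype=int)

def plm_mask(order):
    # Permuted LM: token at position i attends to position j
    # iff j comes no later than i in the permutation `order`
    # (self-attention included, an assumption for illustration).
    n = len(order)
    rank = {pos: k for k, pos in enumerate(order)}
    mask = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if rank[j] <= rank[i]:
                mask[i, j] = 1
    return mask
```

For example, `causal_mask(3)` yields a lower-triangular matrix, while `plm_mask([2, 0, 1])` produces a pattern in which position 2 attends only to itself, since it comes first in the factorization order.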
Tags
Foundations of Large Language Models
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences