Learn Before
Consequences of Misconfigured Attention in Generative Models
A developer is building a language model designed to generate stories one word at a time. During setup, they mistakenly allow the attention mechanism at each position to access information from all other positions in the sequence, including future ones. Explain the fundamental problem with this approach for a text generation task and describe the most likely undesirable outcome for the model's output.
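The leak can be made concrete with a minimal NumPy sketch (all names and the example scores are hypothetical, not from the scenario): without a causal mask, the softmax assigns nonzero weight to future positions, so the token being predicted can influence its own prediction; with the mask, those weights collapse to zero.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax along the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical raw attention scores for the query at position 0
# against keys at positions 0-3.
scores = np.array([[0.8, 1.2, 0.5, 2.1]])

# Misconfigured: full attention lets position 0 draw weight
# from future positions 1-3.
full_weights = softmax(scores)

# Correct: causal masking sets future scores to -inf,
# so their softmax weights become exactly 0.
masked_scores = scores.copy()
masked_scores[0, 1:] = -np.inf
causal_weights = softmax(masked_scores)

print(full_weights)    # nonzero weight on future tokens
print(causal_weights)  # all weight on position 0
```

In the unmasked case, most of the attention weight here lands on positions the model should not yet be able to see, which is exactly why training with this setup degenerates into copying rather than prediction.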
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a generative language model, an attention mechanism processes a sequence of 4 tokens. To ensure that the prediction for each token depends only on the preceding tokens and itself, a mask is applied to the raw attention score matrix before the final weighting step. Given the initial score matrix below, where rows represent the 'query' token and columns represent the 'key' token, which of the following matrices correctly shows the result of applying this causal mask? (Note: '-inf' represents a very large negative number that drives the score's softmax weight to zero.)
Initial Matrix:
[[ 0.8, 1.2, 0.5, 2.1 ],
 [ 1.5, 0.6, 1.9, 0.3 ],
 [ 0.9, 2.2, 1.1, 0.7 ],
 [ 1.3, 0.4, 1.6, 0.2 ]]
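The masking step itself can be sketched in a few lines of NumPy (variable names are hypothetical): a causal mask replaces every score strictly above the main diagonal with -inf, so each query row i keeps only the scores for key positions j <= i.

```python
import numpy as np

# The initial score matrix from the question:
# rows are query positions, columns are key positions.
scores = np.array([[0.8, 1.2, 0.5, 2.1],
                   [1.5, 0.6, 1.9, 0.3],
                   [0.9, 2.2, 1.1, 0.7],
                   [1.3, 0.4, 1.6, 0.2]])

# Boolean mask: True strictly above the diagonal,
# i.e. wherever a query would attend to a future key.
future = np.triu(np.ones_like(scores, dtype=bool), k=1)

# Replace future scores with -inf; past and present scores pass through.
masked = np.where(future, -np.inf, scores)
print(masked)
```

The last row is left untouched because the final query position may legitimately attend to every key, while the first row keeps only its own score.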
Consequences of Misconfigured Attention in Generative Models
Appropriate Application of an Attention Mechanism