1Cademy

Learn Before

Causal Attention

Multiple Choice

In a generative language model, an attention mechanism processes a sequence of 4 tokens. To ensure that the prediction for each token depends only on that token and the tokens preceding it, a causal mask is applied to the raw attention score matrix before the softmax that produces the final attention weights. Given the initial score matrix below, where each row corresponds to a 'query' token and each column to a 'key' token, which of the following matrices correctly shows the result of applying this causal mask? (Note: '-inf' denotes negative infinity, in practice a very large negative number, which drives the corresponding attention weight to zero after softmax.)

Initial Matrix:
[[ 0.8, 1.2, 0.5, 2.1 ],
 [ 1.5, 0.6, 1.9, 0.3 ],
 [ 0.9, 2.2, 1.1, 0.7 ],
 [ 1.3, 0.4, 1.6, 0.2 ]]
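
The masking step described in the question can be sketched in NumPy. This is a minimal illustration rather than any particular model's implementation: the names `scores`, `future`, `masked`, and `softmax_rows` are chosen here for the example, and the softmax is included only to show why '-inf' entries end up with zero weight.

```python
import numpy as np

# Raw attention scores from the question: rows are queries, columns are keys.
scores = np.array([
    [0.8, 1.2, 0.5, 2.1],
    [1.5, 0.6, 1.9, 0.3],
    [0.9, 2.2, 1.1, 0.7],
    [1.3, 0.4, 1.6, 0.2],
])

n = scores.shape[0]

# True strictly above the diagonal, i.e. where the key position is later
# than the query position (a "future" token).
future = np.triu(np.ones((n, n), dtype=bool), k=1)

# Causal mask: future positions become -inf, everything else is unchanged.
masked = np.where(future, -np.inf, scores)


def softmax_rows(x):
    """Row-wise softmax; subtracting the row max keeps exp() stable."""
    shifted = x - x.max(axis=1, keepdims=True)
    exp = np.exp(shifted)  # exp(-inf) evaluates to 0.0
    return exp / exp.sum(axis=1, keepdims=True)


weights = softmax_rows(masked)

print(masked)
print(weights.round(3))
```

Under this sketch, every entry above the main diagonal is replaced by -inf while the diagonal and everything below it keep their original values, so the last row is unchanged and the first row retains only its first score. After the softmax, each row still sums to 1, with zero weight on every future token.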

Updated 2025-09-28
