Learn Before
An autoregressive model is generating a sequence of tokens one by one. It is currently calculating the attention output for the token at position 4 (i.e., the fifth token in the sequence). To ensure the model only uses information it has already seen, which set of key (K) and value (V) vectors must be used as input to the attention mechanism for the query vector at position 4 (q₄)?
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.5 Inference - Foundations of Large Language Models
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Computational Cost per Token in Causal Attention
Reusability of Key-Value Pairs in Autoregressive Inference
Example of Query-Key Interactions in Causal Attention
Diagnosing Information Leakage in an Autoregressive Model
When calculating the attention output for the token at position i in an autoregressive model, the mechanism uses the query vector from that same position (q_i), while the key and value matrices are built from the vectors at positions 0 through i — the current token and every token before it. For the query at position 4 (q₄), this means the keys k₀ through k₄ and the values v₀ through v₄; no key or value from a later position may be included, or future information would leak into the prediction.
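The restriction above can be sketched in a few lines of NumPy. This is a minimal illustration, not a real model: the sequence length, head dimension, and projection matrices below are arbitrary toy values chosen for the example. It computes the attention output for position i = 4 using only the keys and values from positions 0 through 4.

```python
import numpy as np

# Toy setup (illustrative sizes, random weights -- not from any real model).
rng = np.random.default_rng(0)
d = 8                              # head dimension
seq = rng.normal(size=(10, d))     # hidden states for a 10-token sequence

W_q = rng.normal(size=(d, d))      # toy query/key/value projections
W_k = rng.normal(size=(d, d))
W_v = rng.normal(size=(d, d))

i = 4                              # position whose output we compute
q_i = seq[i] @ W_q                 # query for position 4 only
K = seq[: i + 1] @ W_k             # keys k_0 .. k_4 (positions 0..i)
V = seq[: i + 1] @ W_v             # values v_0 .. v_4 (positions 0..i)

scores = q_i @ K.T / np.sqrt(d)    # scaled dot-product scores, shape (5,)
weights = np.exp(scores - scores.max())
weights /= weights.sum()           # softmax over the 5 visible positions
out = weights @ V                  # attention output for position 4, shape (d,)
```

Slicing `seq[: i + 1]` is what enforces causality here: position 4 never sees keys or values from positions 5 through 9, even though they exist in the buffer.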