1Cademy - An autoregressive language model is in the process of generating a sequence of tokens. When a single attention head calculates its output for the 4th token in the sequence, which set of key and value vectors does it use to ensure it only relies on previously generated information?

Learn Before

Causal Attention Output for a Single Head and Token

Multiple Choice

An autoregressive language model is in the process of generating a sequence of tokens. When a single attention head calculates its output for the 4th token in the sequence, which set of key and value vectors does it use to ensure it only relies on previously generated information?
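The computation this question refers to can be sketched as follows. This is a minimal illustration, not the course's reference implementation: it assumes standard scaled dot-product attention in which a token attends to itself and to every earlier token, so the head output for position 4 is built from the keys and values of positions 1 through 4 only. The function name `causal_head_output` and the toy vectors are hypothetical.

```python
import math

def causal_head_output(q_i, keys, values, i):
    """Output of one attention head for the token at 1-based position i.

    Assumption: standard causal scaled dot-product attention, where
    position i may use keys/values of positions 1..i and nothing later.
    """
    d_k = len(q_i)
    ks, vs = keys[:i], values[:i]  # discard every position after i
    # Scaled dot product between the query and each visible key.
    scores = [sum(a * b for a, b in zip(q_i, k)) / math.sqrt(d_k) for k in ks]
    # Numerically stable softmax over the visible positions only.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the visible value vectors.
    out = [sum(w * v[j] for w, v in zip(weights, vs)) for j in range(len(vs[0]))]
    return out, weights

# Toy data for a 6-token sequence (hypothetical numbers, d_k = d_v = 2).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [2.0, 2.0], [-1.0, 3.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
q_4 = [0.5, 1.0]  # query vector of the 4th token

out_4, weights_4 = causal_head_output(q_4, keys, values, 4)
print(len(weights_4))  # number of positions the 4th token attends to
```

Under these assumptions, changing the key or value vectors at positions 5 and 6 leaves `out_4` unchanged, which is the property the question is probing.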

Updated 2025-09-29

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences