An engineer is training an autoregressive language model designed to generate text one token at a time. Due to a configuration error, the attention mechanism can attend to every token in the input sequence, including those that appear later in the sequence, rather than only the preceding ones. The model trains to a very low loss on its training data. What is the most likely outcome when this trained model is later used to generate new text from a prompt?
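To make the misconfiguration concrete, here is a minimal NumPy sketch of single-head attention scores with and without the causal mask. The function name and shapes are illustrative, not from any particular framework; `causal=False` reproduces the error in the question, where every position can place weight on future tokens.

```python
import numpy as np

def attention_scores(q, k, causal=True):
    """Scaled dot-product attention weights for one head.

    q, k: arrays of shape (seq_len, d).
    causal=True : position i attends only to positions <= i
                  (standard autoregressive masking).
    causal=False: the misconfigured case -- every position
                  attends to the full sequence, future included.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    if causal:
        seq_len = scores.shape[0]
        # Mask strictly-upper-triangular entries (future positions).
        future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Row-wise softmax over key positions.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)
```

With `causal=True`, the weight a position assigns to any later position is exactly zero, so training matches the generation setting. With `causal=False`, the model can minimize loss by reading the answer token directly from the "future" columns; at generation time those columns do not exist, so the shortcut fails.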
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Debugging an Autoregressive Model's Attention
Enforcing Autoregressive Behavior