An engineer is training an autoregressive language model designed to generate text one token at a time. Due to a configuration error, the attention mechanism can attend to every token in the input sequence, including those that appear later in the sequence, rather than only the preceding ones. The model trains to a very low loss on its training data. What is the most likely outcome when this trained model is later used to generate new text from a prompt?
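To make the misconfiguration concrete, here is a minimal NumPy sketch of single-head attention scores with and without the causal mask. The function name and shapes are illustrative, not from any particular framework; `causal=False` reproduces the error in the question, where every position can place weight on future tokens.

```python
import numpy as np

def attention_scores(q, k, causal=True):
    """Scaled dot-product attention weights for one head.

    q, k: arrays of shape (seq_len, d).
    causal=True : position i attends only to positions <= i
                  (standard autoregressive masking).
    causal=False: the misconfigured case -- every position
                  attends to the full sequence, future included.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    if causal:
        seq_len = scores.shape[0]
        # Mask strictly-upper-triangular entries (future positions).
        future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Row-wise softmax over key positions.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)
```

With `causal=True`, the weight a position assigns to any later position is exactly zero, so training matches the generation setting. With `causal=False`, the model can minimize loss by reading the answer token directly from the "future" columns; at generation time those columns do not exist, so the shortcut fails.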
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Debugging an Autoregressive Model's Attention
Enforcing Autoregressive Behavior