Multiple Choice

An engineer is training an autoregressive language model designed to generate text one word at a time. Due to a configuration error, the attention mechanism is allowed to attend to all tokens in the input sequence, including those that appear later in the sequence, rather than only the preceding ones. The model trains successfully to a very low loss on its training data. What is the most likely outcome when this trained model is later used to generate new text, starting from a prompt?
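The configuration error described above is the omission of the causal (look-ahead) mask in self-attention. A minimal numpy sketch, assuming standard scaled dot-product attention (the function name and toy dimensions are illustrative), shows the difference between the correct masked setup and the misconfigured one:

```python
import numpy as np

def attention_weights(q, k, causal=True):
    """Scaled dot-product attention weights for a length-T sequence."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)  # (T, T): row i attends to column j
    if causal:
        # Correct autoregressive setup: position i may only attend
        # to positions j <= i, so future tokens are masked out.
        future = np.triu(np.ones((T, T), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # The misconfigured model skips the mask, so every position
    # also attends to tokens that come after it.
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(4, 8))

w_causal = attention_weights(q, k, causal=True)
w_leaky = attention_weights(q, k, causal=False)

# With the mask, position 0 attends only to itself; without it,
# position 0 puts weight on "future" positions 1..3 -- tokens that
# do not exist yet when the model generates text one word at a time.
print(np.allclose(w_causal[0, 1:], 0.0))  # True
print(w_leaky[0, 1:].sum() > 0)           # True
```

During training, the unmasked model can minimize loss by simply copying each "next word" from the visible future context; at generation time that future context is absent, which is the situation the question asks you to analyze.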

Updated 2025-09-26

Tags: Ch.2 Generative Models - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science