Multiple Choice

An autoregressive model is generating a sequence of text token by token. When it comes time to predict the token at position t, the model's attention mechanism computes relevance scores between the query at position t and the keys at every position in the sequence. However, a crucial modification is applied that prevents the query at t from incorporating information from any keys at positions greater than t (i.e., positions t+1, t+2, and so on). Which statement best analyzes the fundamental reason for this specific modification?
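For readers who want to see the mechanism described above concretely, here is a minimal sketch of causal (masked) self-attention in NumPy. The function name causal_attention, the single-head unbatched shapes, and the variable names are illustrative assumptions, not something specified in the question.

import numpy as np

def causal_attention(Q, K, V):
    # Q, K, V: arrays of shape (seq_len, d_k). Scores are first computed
    # against every position, then positions greater than t are masked
    # out before the softmax so row t cannot attend to the future.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # (seq_len, seq_len) relevance scores
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)     # -inf becomes weight 0 after softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                             # row t mixes only values at positions <= t

# Illustrative usage (random inputs):
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))
out = causal_attention(Q, K, V)

With the mask in place, changing the inputs at positions greater than t leaves the output at position t unchanged, which is the behavior the question asks you to explain.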



Tags: Data Science, Computing Sciences, Foundations of Large Language Models, Foundations of Large Language Models Course, Ch.5 Inference - Foundations of Large Language Models, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science