Multiple Choice

When designing an autoregressive language model, a key decision is how to model the conditional probability of the next token given the context, Pr(y_i | x, y_{<i}). Consider two approaches:

  • Approach 1: Uses a fixed-size window, considering only the k most recent previous tokens (y_{i-k}, ..., y_{i-1}) to predict the next token y_i.
  • Approach 2: Processes the entire preceding sequence (y_{<i}) to predict the next token y_i.

Which statement best analyzes the fundamental trade-off between these two approaches regarding the modeling and efficient computation of this probability?
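To make the contrast concrete, here is a minimal Python sketch of which tokens each approach conditions on when predicting the next token. The function names and the toy prefix are hypothetical illustrations, not part of any real model:

```python
# Hypothetical sketch contrasting the two conditioning strategies.
# Neither function is a model; each only shows which tokens the
# next-token distribution Pr(y_i | x, y_{<i}) may condition on.

def window_context(prefix, k):
    # Approach 1: fixed-size window. Only the k most recent tokens
    # are kept, so per-step cost is bounded, but earlier tokens
    # (and any long-range dependencies they carry) are discarded.
    return prefix[-k:]

def full_context(prefix):
    # Approach 2: the entire preceding sequence y_{<i} is used.
    # No information is lost, but the cost of processing the
    # context grows with the sequence length.
    return prefix

prefix = ["the", "cat", "sat", "on", "the"]
print(window_context(prefix, 3))  # ['sat', 'on', 'the']
print(full_context(prefix))       # all five tokens
```

The trade-off in the question falls out directly: Approach 1 buys bounded computation by truncating the context, while Approach 2 preserves the full history at a cost that grows with i.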

Updated 2025-10-05

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science