When designing an autoregressive language model, a key decision is how to model the conditional probability of the next token given the context, Pr(y_i | x, y_{<i}). Consider two approaches:

- Approach 1: Uses a fixed-size window, considering only the k most recent previous tokens (y_{i-k}, ..., y_{i-1}) to predict the next token y_i.
- Approach 2: Processes the entire preceding sequence (y_{<i}) to predict the next token y_i.

Which statement best analyzes the fundamental trade-off between these two approaches regarding the modeling and efficient computation of this probability?
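To make the contrast concrete, here is a minimal sketch (not part of the original question; the function names and example tokens are illustrative) showing the context each approach conditions on when predicting y_i. Approach 1 looks at a bounded slice of size at most k, so per-step cost stays constant; Approach 2's context grows with i, so per-step cost grows with sequence length.

```python
def window_context(tokens, i, k):
    """Approach 1: fixed-size window -- only the k most recent tokens y_{i-k}..y_{i-1}."""
    return tokens[max(0, i - k):i]

def full_context(tokens, i):
    """Approach 2: the entire preceding sequence y_{<i}."""
    return tokens[:i]

tokens = ["the", "cat", "sat", "on", "the", "mat"]
i = 5  # predicting tokens[5] == "mat"

print(window_context(tokens, i, k=2))  # ['on', 'the'] -- context size bounded by k
print(full_context(tokens, i))         # ['the', 'cat', 'sat', 'on', 'the'] -- grows with i
```

The windowed context is cheap and fixed-cost but discards long-range dependencies (here it can no longer see the earlier "the cat sat"), while the full context preserves them at a computational cost that increases with position.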
Tags
Ch.5 Inference - Foundations of Large Language Models
Related
Language Model Design Trade-offs
Computational Scaling in Autoregressive Models