When designing an autoregressive language model, a key decision is how to model the conditional probability of the next token given the context, Pr(y_i | x, y_{<i}). Consider two approaches:

- Approach 1: Uses a fixed-size window, considering only the k most recent previous tokens (y_{i-k}, ..., y_{i-1}) to predict the next token y_i.
- Approach 2: Processes the entire preceding sequence (y_{<i}) to predict the next token y_i.

Which statement best analyzes the fundamental trade-off between these two approaches regarding the modeling and efficient computation of this probability?
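To make the contrast concrete, here is a minimal sketch (not part of the original question; the function names and example tokens are illustrative) showing the context each approach conditions on when predicting y_i. Approach 1 looks at a bounded slice of size at most k, so per-step cost stays constant; Approach 2's context grows with i, so per-step cost grows with sequence length.

```python
def window_context(tokens, i, k):
    """Approach 1: fixed-size window -- only the k most recent tokens y_{i-k}..y_{i-1}."""
    return tokens[max(0, i - k):i]

def full_context(tokens, i):
    """Approach 2: the entire preceding sequence y_{<i}."""
    return tokens[:i]

tokens = ["the", "cat", "sat", "on", "the", "mat"]
i = 5  # predicting tokens[5] == "mat"

print(window_context(tokens, i, k=2))  # ['on', 'the'] -- context size bounded by k
print(full_context(tokens, i))         # ['the', 'cat', 'sat', 'on', 'the'] -- grows with i
```

The windowed context is cheap and fixed-cost but discards long-range dependencies (here it can no longer see the earlier "the cat sat"), while the full context preserves them at a computational cost that increases with position.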
Tags
Ch.5 Inference - Foundations of Large Language Models
Related
Language Model Design Trade-offs
Computational Scaling in Autoregressive Models