Debugging a Language Model's Probability Calculation
Based on the principles of sequential probability calculation in an auto-regressive model, what is the fundamental error in the model's observed behavior described in the case study below? Explain why this approach violates the core assumption of this type of model.
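As background for the question, here is a minimal sketch of sequential probability calculation in an auto-regressive model: each token's probability is conditioned on all tokens that precede it, and the sequence probability is the product of those conditionals. The names TOY_MODEL and sequence_log_prob and every number in the table are hypothetical, invented purely for illustration. They are not taken from the case study.

```python
import math

# Hypothetical toy "model": maps a context (tuple of all preceding tokens)
# to a probability distribution over the next token. Values are invented.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"a": 0.1, "b": 0.9},
    ("a", "b"): {"a": 0.7, "b": 0.3},
}


def sequence_log_prob(tokens, model):
    """Sum log P(token_i | token_0 .. token_{i-1}) over the sequence."""
    total = 0.0
    for i, token in enumerate(tokens):
        context = tuple(tokens[:i])  # only tokens strictly before position i
        total += math.log(model[context][token])
    return total


if __name__ == "__main__":
    log_p = sequence_log_prob(["a", "b", "a"], TOY_MODEL)
    # P(a) * P(b | a) * P(a | a, b) = 0.6 * 0.9 * 0.7
    print(round(math.exp(log_p), 3))
```

The context for position i is built only from tokens[:i], so no token at or after position i can influence its own probability. Working in log space is a common design choice to avoid underflow on long sequences.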
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An auto-regressive model processes a sequence of four tokens: token_0, token_1, token_2, token_3. The model calculates the probability of each token based on the numerical representations (embeddings) of all preceding tokens. Which of the following expressions correctly represents how the model would calculate the probability of token_2?
An auto-regressive language model is calculating the probability of the three-token sequence x_0, x_1, x_2. Arrange the following probability calculations in the order they would be performed by the model.
Debugging a Language Model's Probability Calculation