Interpreting Autoregressive Model Inputs
Analyze the following scenario and identify the fundamental flaw in the engineer's reasoning. Based on your analysis, describe the correct way to structure the model's input.
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Probability Normalization over a Candidate Set
An autoregressive model is given an input prompt, x, which is the sequence 'The best movie I ever saw was'. The model has already generated the partial output sequence, y_{<i}, which is 'about a'. The model's next task is to predict the probability of the next token, y_i, using the standard conditional probability notation Pr(y_i | x, y_{<i}). What is the actual, full sequence of tokens the model uses as its context to make this prediction?

In the context of autoregressive sequence generation, the notation Pr(y_i | x, y_{<i}) implies that the model treats the input x and the previously generated tokens y_{<i} as two separate, distinct sources of information for predicting the next token y_i.
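How the context for Pr(y_i | x, y_{<i}) is actually assembled can be illustrated with a minimal Python sketch. Note that `next_token_distribution` is a hypothetical stand-in for a real model's forward pass, and the token lists are simplified word-level tokens, not a real tokenizer's output:

```python
def next_token_distribution(context_tokens):
    # Hypothetical stand-in for a real model's forward pass.
    # Returns a dummy probability distribution so the sketch runs.
    return {"boy": 0.4, "dog": 0.35, "spaceship": 0.25}

x = ["The", "best", "movie", "I", "ever", "saw", "was"]  # input prompt x
y_prev = ["about", "a"]                                  # generated so far, y_{<i}

# The model conditions on the single concatenated sequence x + y_{<i},
# not on two separate inputs.
context = x + y_prev
probs = next_token_distribution(context)

print(" ".join(context))        # the full context the model actually sees
print(max(probs, key=probs.get))  # the most likely next token y_i
```

The key point the sketch makes concrete: at each step, the prompt and all previously generated tokens form one flat sequence, and that single sequence is the model's entire conditioning context.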