In an autoregressive language model, the final decoder layer produces one hidden state vector per input token. What is the correct procedure for predicting the probability distribution over the single next token, and why?
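The procedure the question points at can be sketched as: select the hidden state at the *last* position (under causal attention it is the only one conditioned on the entire prefix), project it through the LM head (unembedding matrix) to get one logit per vocabulary item, and apply softmax to turn the logits into a probability distribution. A minimal NumPy sketch, with all dimensions and matrices invented for illustration:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: subtract the max before exponentiating.
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

# Hypothetical sizes, chosen only for this sketch.
d_model, vocab_size, seq_len = 8, 50, 5
rng = np.random.default_rng(0)

# Stand-in for the final decoder layer's output: one vector per position.
hidden_states = rng.standard_normal((seq_len, d_model))
# Stand-in for the LM head (unembedding) projection: d_model -> vocab_size.
W_unembed = rng.standard_normal((d_model, vocab_size))

# Only the LAST position's hidden state is used: causal masking means it is
# the only state that attends to every token in the prefix.
last_hidden = hidden_states[-1]        # shape (d_model,)
logits = last_hidden @ W_unembed       # shape (vocab_size,)
probs = softmax(logits)                # sums to 1 over the vocabulary

assert probs.shape == (vocab_size,)
assert np.isclose(probs.sum(), 1.0)
```

Using any other position's hidden state would give the distribution for a token *inside* the prefix rather than the next one; the softmax step is what converts unbounded logits into a valid distribution.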
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
The Search Problem in LLM Inference
Next-Token Probability Calculation in a Transformer Decoder
An autoregressive model generates text one token at a time. Arrange the following computational steps in the correct order to calculate the probability distribution for the very next token, given the current sequence of tokens.
Debugging a Language Model's Output Distribution