In an autoregressive language model, the final decoder layer produces one hidden state vector per input token. What is the correct procedure for predicting the probability distribution over the single next token, and why?
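The procedure the question points at can be sketched as: select the hidden state at the *last* position (under causal attention it is the only one conditioned on the entire prefix), project it through the LM head (unembedding matrix) to get one logit per vocabulary item, and apply softmax to turn the logits into a probability distribution. A minimal NumPy sketch, with all dimensions and matrices invented for illustration:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: subtract the max before exponentiating.
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

# Hypothetical sizes, chosen only for this sketch.
d_model, vocab_size, seq_len = 8, 50, 5
rng = np.random.default_rng(0)

# Stand-in for the final decoder layer's output: one vector per position.
hidden_states = rng.standard_normal((seq_len, d_model))
# Stand-in for the LM head (unembedding) projection: d_model -> vocab_size.
W_unembed = rng.standard_normal((d_model, vocab_size))

# Only the LAST position's hidden state is used: causal masking means it is
# the only state that attends to every token in the prefix.
last_hidden = hidden_states[-1]        # shape (d_model,)
logits = last_hidden @ W_unembed       # shape (vocab_size,)
probs = softmax(logits)                # sums to 1 over the vocabulary

assert probs.shape == (vocab_size,)
assert np.isclose(probs.sum(), 1.0)
```

Using any other position's hidden state would give the distribution for a token *inside* the prefix rather than the next one; the softmax step is what converts unbounded logits into a valid distribution.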
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
The Search Problem in LLM Inference
Next-Token Probability Calculation in a Transformer Decoder
An autoregressive model generates text one token at a time. Arrange the following computational steps in the correct order to calculate the probability distribution for the very next token, given the current sequence of tokens.
Debugging a Language Model's Output Distribution