
Step-by-Step Sequence Log-Probability Computation

Computing the log-probability of an output sequence given an input, $\log \Pr(\mathbf{y}|\mathbf{x})$, in a Transformer language model involves several sequential operations. First, the input $\mathbf{x}$ and output $\mathbf{y}$ are concatenated. For each position $i$, the corresponding token embedding is processed through a stack of Transformer layers, where self-attention networks update the KV cache and compute attention outputs. If the position corresponds to a generated token, the model applies a Softmax layer to obtain that token's prediction probability. Finally, the total log-probability is computed by summing these individual token log-probabilities over the entire generated sequence: $\log \Pr(\mathbf{y}|\mathbf{x}) = \sum_{i} \log \Pr(y_i \mid \mathbf{x}, y_{<i})$.
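The steps above can be sketched in a minimal way. The example below assumes a hypothetical `next_token_logits` callable that stands in for a full Transformer forward pass (including KV-cache updates); it is not part of the original text. The sketch shows the core loop: score each generated token under the model's Softmax distribution and sum the log-probabilities.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sequence_log_prob(x_tokens, y_tokens, next_token_logits):
    """Compute log Pr(y | x) = sum_i log Pr(y_i | x, y_<i).

    `next_token_logits(prefix)` is a hypothetical stand-in for the
    Transformer forward pass: given the token prefix so far, it returns
    logits over the vocabulary for the next position.
    """
    prefix = list(x_tokens)          # start from the concatenated input
    total = 0.0
    for tok in y_tokens:
        logits = next_token_logits(prefix)   # forward pass at this position
        probs = softmax(logits)              # Softmax over the vocabulary
        total += math.log(probs[tok])        # log-prob of the observed token
        prefix.append(tok)                   # extend the prefix (KV cache grows)
    return total
```

For a uniform toy model over a 4-token vocabulary (all logits equal), each generated token contributes $\log \tfrac{1}{4}$, so a two-token output scores $2 \log \tfrac{1}{4}$.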

Updated 2026-05-03

Tags

Foundations of Large Language Models

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences