
Step-by-Step Sequence Log-Probability Computation

Computing the log-probability of an output sequence given an input, $\log \Pr(\mathbf{y}|\mathbf{x})$, in a Transformer language model involves several sequential operations. First, the input $\mathbf{x}$ and output $\mathbf{y}$ are concatenated. For each position $i$, the corresponding token embedding is processed through a stack of Transformer layers, where self-attention networks update the KV cache and compute attention outputs. If the position corresponds to a generated token, the model applies a Softmax layer to obtain that token's prediction probability. Finally, the total log-probability is computed by summing these individual token log-probabilities over the entire generated sequence: $\log \Pr(\mathbf{y}|\mathbf{x}) = \sum_{i} \log \Pr(y_i \mid \mathbf{x}, y_{<i})$.
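The steps above can be sketched in a minimal way. The example below assumes a hypothetical `next_token_logits` callable that stands in for a full Transformer forward pass (including KV-cache updates); it is not part of the original text. The sketch shows the core loop: score each generated token under the model's Softmax distribution and sum the log-probabilities.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sequence_log_prob(x_tokens, y_tokens, next_token_logits):
    """Compute log Pr(y | x) = sum_i log Pr(y_i | x, y_<i).

    `next_token_logits(prefix)` is a hypothetical stand-in for the
    Transformer forward pass: given the token prefix so far, it returns
    logits over the vocabulary for the next position.
    """
    prefix = list(x_tokens)          # start from the concatenated input
    total = 0.0
    for tok in y_tokens:
        logits = next_token_logits(prefix)   # forward pass at this position
        probs = softmax(logits)              # Softmax over the vocabulary
        total += math.log(probs[tok])        # log-prob of the observed token
        prefix.append(tok)                   # extend the prefix (KV cache grows)
    return total
```

For a uniform toy model over a 4-token vocabulary (all logits equal), each generated token contributes $\log \tfrac{1}{4}$, so a two-token output scores $2 \log \tfrac{1}{4}$.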

Updated 2026-05-03

Tags

Foundations of Large Language Models

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences