Calculating Conditional Log-Probability Using an LLM
The conditional log-probability, log Pr(y|x), can be computed with a large language model as the difference between the joint and marginal log-probabilities. The joint log-probability, log Pr([x, y]), is obtained by first concatenating the input x and the output y into a single sequence. A forward pass is then performed over this sequence: for each token position, an embedding is computed and fed as the initial representation into the Transformer layers. In parallel, the marginal log-probability, log Pr(x), is calculated by running the model on the input sequence x alone. The conditional log-probability is then given by the formula:
log Pr(y|x) = log Pr([x, y]) - log Pr(x)
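The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's implementation: a small hand-written bigram table (NEXT_TOKEN_PROBS, hypothetical) stands in for the LLM, and the helper names sequence_log_prob and conditional_log_prob are invented for this example. With a real model, the per-token log-probabilities would come from a forward pass rather than a lookup table.

```python
import math

# Hypothetical toy "language model": a fixed bigram table standing in for an LLM.
# NEXT_TOKEN_PROBS[prev][tok] = Pr(tok | prev); "<s>" marks the start of a sequence.
NEXT_TOKEN_PROBS = {
    "<s>": {"a": 0.5, "b": 0.5},
    "a": {"a": 0.25, "b": 0.75},
    "b": {"a": 0.5, "b": 0.5},
}


def sequence_log_prob(tokens):
    """Return the log-probability of a token sequence under the toy model."""
    total = 0.0
    prev = "<s>"
    for tok in tokens:
        # Add log Pr(tok | prev), then advance the context by one token.
        total += math.log(NEXT_TOKEN_PROBS[prev][tok])
        prev = tok
    return total


def conditional_log_prob(x, y):
    """Return log Pr(y | x) as log Pr([x, y]) minus log Pr(x)."""
    joint = sequence_log_prob(x + y)   # pass over the concatenation [x, y]
    marginal = sequence_log_prob(x)    # pass over the input x alone
    return joint - marginal


x = ["a", "b"]
y = ["b", "a"]
result = conditional_log_prob(x, y)
print(f"log Pr(y|x) = {result:.4f}")
```

In this toy setting the joint probability is 0.5 × 0.75 × 0.5 × 0.5 = 0.09375 and the marginal is 0.5 × 0.75 = 0.375, so the conditional probability is 0.25 and the printed value is log 0.25, about -1.3863. Note that the joint log-probability is never greater than the marginal one, since the concatenated sequence only multiplies in further probabilities that are at most 1.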
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
SFT as Language Model Training on Concatenated Sequences
Calculating Conditional Log-Probability Using an LLM
Selective Loss Computation in Joint Probability Language Modeling
Calculating Conditional Log-Probability
An engineer is evaluating a language model and calculates the following log-probabilities for an input sequence x and an output sequence y: the joint log-probability log Pr([x, y]) and the marginal log-probability log Pr(x). They observe that the value of log Pr([x, y]) is significantly more negative than the value of log Pr(x). Based on the fundamental relationship between joint, conditional, and marginal probabilities, what is the most accurate conclusion?
A language model is being evaluated. For a given input sequence x and a potential output sequence y, the model calculates log Pr([x, y]) = -3.5 and log Pr(x) = -5.2. Based on these values, it is reasonable to conclude that the model's probability calculations are functioning correctly.
Learn After
Initial Representation for Concatenated [x, y] Sequences
A data scientist is using a large language model to determine the conditional log-probability of a specific completion y following a given prompt x. Their process involves concatenating the two sequences into [x, y] and then performing a single forward pass to compute the log-probability of this combined sequence, which they take as their final result. Which statement best analyzes the flaw in this methodology?
You are tasked with using a large language model to compute the conditional log-probability of an output sequence y given an input sequence x. Arrange the following computational steps into the correct chronological order.
Calculating Conditional Log-Probability from Model Outputs