Learn Before
Applying Log-Likelihood Calculation to a Training Dataset
The log-likelihood of a sequence is computed by aggregating the log-probabilities of each token conditioned on its preceding context. For a sequence $\mathbf{x} = x_0 x_1 \ldots x_m$, this sequence-level computation is formally expressed as $\mathcal{L}_\theta(\mathbf{x}) = \sum_{i=1}^{m} \log \Pr_\theta(x_i \mid x_0, \ldots, x_{i-1})$, where the subscript $\theta$ affixed to both $\mathcal{L}$ and $\Pr$ denotes the parameters of the language model. This metric provides a foundation for optimizing the model across a training dataset.
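The summation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `sequence_log_likelihood`, `cond_log_prob`, and `uniform_model` are hypothetical names, and the toy model simply assigns probability 1/4 to every token regardless of context.

```python
import math
from typing import Callable, Sequence


def sequence_log_likelihood(
    tokens: Sequence[str],
    cond_log_prob: Callable[[Sequence[str], str], float],
) -> float:
    """Sum log Pr(x_i | x_0, ..., x_{i-1}) for i = 1..m.

    tokens[0] is treated as the start symbol x_0, so it is only ever
    used as context and contributes no term of its own.
    """
    total = 0.0
    for i in range(1, len(tokens)):
        context = tokens[:i]  # x_0, ..., x_{i-1}
        total += cond_log_prob(context, tokens[i])
    return total


def uniform_model(context: Sequence[str], token: str) -> float:
    """Hypothetical stand-in model: uniform over a 4-token vocabulary."""
    return math.log(1.0 / 4)


# Three predicted tokens, each with log-probability log(1/4).
ll = sequence_log_likelihood(["<s>", "a", "b", "c"], uniform_model)
print(ll)
```

With a real language model, `cond_log_prob` would be replaced by a lookup into the model's output distribution at each position; the summation itself stays the same.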
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Log-Probability of a Ranked Sequence
Log-Likelihood Objective for Language Model Training
A language model is generating a sequence of tokens. It has computed the following conditional log-probabilities for a three-token sequence, where each token's probability is dependent on the ones that came before it:
- Log-probability of the first token: -1.8
- Log-probability of the second token, given the first: -2.5
- Log-probability of the third token, given the first two: -1.2
Based on these values, what is the total log-likelihood of this entire three-token sequence?
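As a worked illustration of this prompt, writing the three tokens as $x_1, x_2, x_3$ (labels introduced here for convenience), the sequence log-likelihood is the sum of the three conditional log-probabilities:

$$
\log \Pr(x_1, x_2, x_3) = \log \Pr(x_1) + \log \Pr(x_2 \mid x_1) + \log \Pr(x_3 \mid x_1, x_2) = (-1.8) + (-2.5) + (-1.2) = -5.5
$$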
Evaluating Sentence Plausibility
A language model has calculated the total log-likelihood for the sequence of tokens: ["The", "quick", "brown", "fox"]. The calculation involves summing the conditional log-probabilities of each token given the preceding ones. If the third token is changed from "brown" to "lazy", creating the new sequence ["The", "quick", "lazy", "fox"], which set of conditional log-probabilities must be re-calculated to find the new total log-likelihood?
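One way to reason about this prompt is sketched below. It assumes the model conditions each token on all preceding tokens, so a term changes whenever the token itself or anything in its context changes. The helper name `terms_to_recompute` is hypothetical, and positions are 0-based over the listed tokens.

```python
from typing import List, Sequence


def terms_to_recompute(old_tokens: Sequence[str], new_tokens: Sequence[str]) -> List[int]:
    """Return the 0-based positions whose term log Pr(x_i | x_<i) must be recomputed.

    Assumes a full-context autoregressive model and equal-length sequences.
    """
    if len(old_tokens) != len(new_tokens):
        raise ValueError("sequences must have the same length")
    # Find the first position where the two sequences differ.
    first_diff = next(
        (i for i, (a, b) in enumerate(zip(old_tokens, new_tokens)) if a != b),
        None,
    )
    if first_diff is None:
        return []  # identical sequences: nothing to recompute
    # The changed token's own term, plus every later term that has it in context.
    return list(range(first_diff, len(new_tokens)))


old = ["The", "quick", "brown", "fox"]
new = ["The", "quick", "lazy", "fox"]
changed = terms_to_recompute(old, new)
print(changed)  # [2, 3]: the third and fourth tokens
```

The terms for "The" and "quick" can be reused, because neither the tokens nor their contexts changed.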
Applying Log-Likelihood Calculation to a Training Dataset
Learn After
Maximum Likelihood Training Objective for a Dataset of Sequences
A language model is defined by the following table of conditional log-probabilities, where `<s>` is the start-of-sequence token and `<eos>` is the end-of-sequence token:

| Log-Probability | Value |
|---|---|
| `log Pr(A \| <s>)` | -0.5 |
| `log Pr(B \| <s>)` | -1.5 |
| `log Pr(B \| A)` | -0.2 |
| `log Pr(A \| B)` | -1.0 |
| `log Pr(<eos> \| A)` | -2.0 |
| `log Pr(<eos> \| B)` | -0.1 |

Given a training dataset `D` containing two sequences:
- Sequence 1: `(A, B, <eos>)`
- Sequence 2: `(B, A, <eos>)`
Calculate the log-likelihood for each individual sequence in the dataset. Which of the following options correctly lists the results?
- Sequence 1:
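The per-sequence calculation in this exercise can be sketched as a table lookup. The sketch assumes, as the table suggests, that each token depends only on the immediately preceding token and that every sequence implicitly begins with `<s>`. The names `LOG_PROB` and `bigram_log_likelihood` are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Conditional log-probabilities keyed by (previous token, current token),
# copied from the table in the exercise.
LOG_PROB: Dict[Tuple[str, str], float] = {
    ("<s>", "A"): -0.5,
    ("<s>", "B"): -1.5,
    ("A", "B"): -0.2,
    ("B", "A"): -1.0,
    ("A", "<eos>"): -2.0,
    ("B", "<eos>"): -0.1,
}


def bigram_log_likelihood(sequence: Sequence[str]) -> float:
    """Sum log Pr(token | previous token), starting from the <s> token."""
    prev = "<s>"
    total = 0.0
    for token in sequence:
        total += LOG_PROB[(prev, token)]
        prev = token
    return total


dataset = [("A", "B", "<eos>"), ("B", "A", "<eos>")]
per_sequence = [bigram_log_likelihood(seq) for seq in dataset]

for seq, ll in zip(dataset, per_sequence):
    print(seq, round(ll, 6))
```

Sequence 1 sums -0.5, -0.2, and -0.1 to give about -0.8; Sequence 2 sums -1.5, -1.0, and -2.0 to give -4.5.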
Verifying Language Model Performance on a Small Dataset
You are tasked with evaluating a language model's performance on a dataset composed of multiple text sequences. Arrange the following steps in the correct logical order to compute the log-likelihood for each individual sequence in the dataset.