Comparing Output Sequence Probabilities
A language model is tasked with completing the sentence 'The cat sat on the...'. It considers two possible two-word completions: 'warm mat' and 'soft rug'. The model has calculated the following conditional probabilities:
- Pr('warm' | 'The cat sat on the') = 0.5
- Pr('soft' | 'The cat sat on the') = 0.4
- Pr('mat' | 'The cat sat on the warm') = 0.8
- Pr('rug' | 'The cat sat on the soft') = 0.9
Analyze both sequences to determine which one the model would select if its objective is to find the output with the highest conditional probability. Show your calculations and explain your reasoning.
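A minimal Python sketch of the calculation, assuming the standard chain-rule factorization Pr(w1 w2 | x) = Pr(w1 | x) · Pr(w2 | x w1). The `PROBS` table and the `sequence_probability` helper are hypothetical names chosen for illustration. Each candidate is scored by multiplying its two per-word probabilities, and the highest-scoring sequence is selected.

```python
# Conditional probabilities given in the problem, keyed by (context, next_word).
PROBS = {
    ("The cat sat on the", "warm"): 0.5,
    ("The cat sat on the", "soft"): 0.4,
    ("The cat sat on the warm", "mat"): 0.8,
    ("The cat sat on the soft", "rug"): 0.9,
}

PROMPT = "The cat sat on the"


def sequence_probability(prompt: str, words: list[str]) -> float:
    """Multiply per-step conditional probabilities (chain rule)."""
    context = prompt
    prob = 1.0
    for word in words:
        prob *= PROBS[(context, word)]
        # The next word is conditioned on the prompt plus all words so far.
        context = f"{context} {word}"
    return prob


candidates = [["warm", "mat"], ["soft", "rug"]]
scores = {" ".join(c): sequence_probability(PROMPT, c) for c in candidates}
# Pick the sequence with the highest overall conditional probability.
best = max(scores, key=scores.get)

for seq, p in scores.items():
    print(f"{seq!r}: {p:.2f}")
print("selected:", best)
```

Here 'warm mat' scores 0.5 × 0.8 = 0.40 and 'soft rug' scores 0.4 × 0.9 = 0.36, so the model selects 'warm mat'. The higher second-word probability for 'rug' (0.9 vs. 0.8) does not make up for the lower first-word probability of 'soft' (0.4 vs. 0.5).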
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model's inference process aims to find an output sequence y that maximizes the conditional probability Pr(y | x) given an input x. Suppose the model has the input 'The sun is shining and the sky is' and calculates the probabilities for the next word as follows:
- Pr('blue' | 'The sun is shining and the sky is') = 0.65
- Pr('clear' | 'The sun is shining and the sky is') = 0.25
- Pr('vast' | 'The sun is shining and the sky is') = 0.09
- Pr('falling' | 'The sun is shining and the sky is') = 0.01
Based only on the objective of maximizing the conditional probability, which of the following statements correctly identifies the best next word and the reason for its selection?
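For a single next word, maximizing the conditional probability reduces to taking the argmax of the next-word distribution. A minimal Python sketch; the dictionary name `next_word_probs` is illustrative, not from the source.

```python
# Next-word distribution given in the problem for the context
# 'The sun is shining and the sky is'.
next_word_probs = {
    "blue": 0.65,
    "clear": 0.25,
    "vast": 0.09,
    "falling": 0.01,
}

# Maximizing Pr(y | x) over one next word is an argmax over the distribution.
best_word = max(next_word_probs, key=next_word_probs.get)
print(best_word, next_word_probs[best_word])
```

This selects 'blue', the word with the highest conditional probability (0.65).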
A language model's objective is to find the output sequence with the highest overall conditional probability. Given the input 'The weather is', the model needs to generate a two-word sequence. It has calculated the following probabilities:
Probabilities for the first word:
- Pr('nice' | 'The weather is') = 0.6
- Pr('cold' | 'The weather is') = 0.4
Probabilities for the second word, depending on the first:
- Pr('today' | 'The weather is nice') = 0.5
- Pr('and' | 'The weather is cold') = 0.9
Based on the objective of maximizing the total sequence probability, which of the following sequences is the optimal choice and why?
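A minimal Python sketch that scores each complete two-word sequence by multiplying its per-word probabilities (the chain rule). For contrast it also runs greedy decoding, a technique not named in the question, which picks the single most probable word at each step. All variable names are illustrative.

```python
# Probabilities given in the problem.
first_word = {"nice": 0.6, "cold": 0.4}
# Only one continuation is given for each first word.
second_word = {"nice": {"today": 0.5}, "cold": {"and": 0.9}}

# Exhaustive search: score every complete two-word sequence.
sequence_scores = {
    f"{w1} {w2}": p1 * p2
    for w1, p1 in first_word.items()
    for w2, p2 in second_word[w1].items()
}
best_sequence = max(sequence_scores, key=sequence_scores.get)

# Greedy decoding for comparison: commit to the best word at each step.
greedy_w1 = max(first_word, key=first_word.get)
greedy_w2 = max(second_word[greedy_w1], key=second_word[greedy_w1].get)
greedy_sequence = f"{greedy_w1} {greedy_w2}"

print(sequence_scores)
print("optimal:", best_sequence)
print("greedy:", greedy_sequence)
```

The exhaustive search returns 'cold and' (0.4 × 0.9 = 0.36), which beats 'nice today' (0.6 × 0.5 = 0.30). Greedy decoding commits to 'nice' at the first step and ends up with the lower-probability sequence.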
Formula for Optimal Output Sequence in LLMs