Learn Before
A language model's objective is to find the output sequence with the highest overall conditional probability. Given the input 'The weather is', the model needs to generate a two-word sequence. It has calculated the following probabilities:
Probabilities for the first word:
- Pr('nice' | 'The weather is') = 0.6
- Pr('cold' | 'The weather is') = 0.4
Probabilities for the second word, depending on the first:
- Pr('today' | 'The weather is nice') = 0.5
- Pr('and' | 'The weather is cold') = 0.9
Based on the objective of maximizing the total sequence probability, which of the following sequences is the optimal choice and why?
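The comparison the question asks for can be checked with the chain rule: the probability of a two-word sequence is the probability of the first word multiplied by the conditional probability of the second word. The following is a minimal sketch, assuming only the two continuations listed above are candidates; the variable and function names are illustrative and do not come from the course material.

```python
# Probabilities copied from the question; all names here are hypothetical.
first_word = {"nice": 0.6, "cold": 0.4}
second_word = {("nice", "today"): 0.5, ("cold", "and"): 0.9}


def sequence_probabilities(first, second):
    """Chain rule: Pr(w1, w2 | x) = Pr(w1 | x) * Pr(w2 | x, w1)."""
    return {(w1, w2): first[w1] * p2 for (w1, w2), p2 in second.items()}


seq_probs = sequence_probabilities(first_word, second_word)

# Pick the sequence with the highest total probability.
best = max(seq_probs, key=seq_probs.get)

for seq, p in seq_probs.items():
    print(" ".join(seq), round(p, 2))
print("best:", " ".join(best))
```

Under these assumptions, 'nice today' scores 0.6 × 0.5 = 0.30 and 'cold and' scores 0.4 × 0.9 = 0.36, so picking the most probable first word ('nice') does not lead to the most probable sequence.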
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model's inference process aims to find an output sequence y that maximizes the conditional probability Pr(y|x) given an input x. Suppose the model has the input 'The sun is shining and the sky is' and calculates the probabilities for the next word as follows:
- Pr('blue' | 'The sun is shining and the sky is') = 0.65
- Pr('clear' | 'The sun is shining and the sky is') = 0.25
- Pr('vast' | 'The sun is shining and the sky is') = 0.09
- Pr('falling' | 'The sun is shining and the sky is') = 0.01
Based only on the objective of maximizing the conditional probability, which of the following statements correctly identifies the best next word and the reason for its selection?
Comparing Output Sequence Probabilities
Formula for Optimal Output Sequence in LLMs
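For the related single-word question on the input 'The sun is shining and the sky is', maximizing the conditional probability reduces to an argmax over the candidate next words. The following is a minimal sketch, assuming the four listed words are the only candidates; the variable names are illustrative.

```python
# Next-word probabilities copied from the related question; names are hypothetical.
next_word_probs = {"blue": 0.65, "clear": 0.25, "vast": 0.09, "falling": 0.01}

# Argmax: the word whose conditional probability is largest.
best_word = max(next_word_probs, key=next_word_probs.get)

print(best_word, next_word_probs[best_word])
```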