Search for Optimal Output Sequence in LLMs
The search process in language model inference aims to identify an output sequence y that is optimal, or at least near-optimal, with respect to its conditional log-probability, log Pr(y|x), given an input x. The objective is to find the sequence that maximizes this quantity.
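Written out, the objective above can be sketched as follows. The notation is an assumption made for illustration and is not taken from this page: ŷ denotes the selected sequence, y_i the i-th output token, y_{<i} the tokens before it, and n the output length. The second equality is the standard chain-rule (autoregressive) factorization of a sequence probability.

```latex
\hat{y} = \arg\max_{y} \log \Pr(y \mid x),
\qquad
\log \Pr(y \mid x) = \sum_{i=1}^{n} \log \Pr(y_i \mid x, y_{<i})
```

Because the logarithm is monotonically increasing, maximizing log Pr(y|x) selects the same sequence as maximizing Pr(y|x) itself.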
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A language model is generating a response based on a user's input. For this input, the model can generate many different possible sequences of words. The model's core task is to select the single best sequence from all these possibilities. According to the mathematical objective that governs this selection, which principle should the model follow?
Autoregressive Decomposition of the LLM Inference Objective
Optimal Sequence Selection
Search for Optimal Output Sequence in LLMs
Interpreting the LLM Search Objective
Learn After
A language model's inference process aims to find an output sequence y that maximizes the conditional probability Pr(y|x) given an input x. Suppose the model has the input 'The sun is shining and the sky is' and calculates the probabilities for the next word as follows:
- Pr('blue' | 'The sun is shining and the sky is') = 0.65
- Pr('clear' | 'The sun is shining and the sky is') = 0.25
- Pr('vast' | 'The sun is shining and the sky is') = 0.09
- Pr('falling' | 'The sun is shining and the sky is') = 0.01
Based only on the objective of maximizing the conditional probability, which of the following statements correctly identifies the best next word and the reason for its selection?
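For a single next word, the selection rule in the example above reduces to an argmax over the listed probabilities. The following is a minimal sketch, assuming the four probabilities from the example; the variable names are hypothetical and chosen only for illustration.

```python
import math

# Next-word probabilities copied from the example above.
next_word_probs = {"blue": 0.65, "clear": 0.25, "vast": 0.09, "falling": 0.01}

# Pick the word with the highest conditional probability.
best_by_prob = max(next_word_probs, key=next_word_probs.get)

# Ranking by log-probability gives the same answer, because log is monotonic.
best_by_logprob = max(next_word_probs, key=lambda w: math.log(next_word_probs[w]))

print(best_by_prob, best_by_logprob)
```

Both rankings agree, which is why the objective can be stated in terms of either Pr(y|x) or log Pr(y|x).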
A language model's objective is to find the output sequence with the highest overall conditional probability. Given the input 'The weather is', the model needs to generate a two-word sequence. It has calculated the following probabilities:
Probabilities for the first word:
- Pr('nice' | 'The weather is') = 0.6
- Pr('cold' | 'The weather is') = 0.4
Probabilities for the second word, depending on the first:
- Pr('today' | 'The weather is nice') = 0.5
- Pr('and' | 'The weather is cold') = 0.9
Based on the objective of maximizing the total sequence probability, which of the following sequences is the optimal choice and why?
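The comparison in this example can be checked numerically. The sketch below assumes that only the continuations listed above are candidates (the remaining probability mass is not specified in the example), and it contrasts the full-sequence objective with a greedy, word-by-word choice. All names are hypothetical.

```python
import math

# Probabilities copied from the example above.
first_word = {"nice": 0.6, "cold": 0.4}
second_word = {"nice": {"today": 0.5}, "cold": {"and": 0.9}}


def sequence_log_prob(w1, w2):
    """Log-probability of a two-word sequence via the chain rule."""
    return math.log(first_word[w1]) + math.log(second_word[w1][w2])


# Score every listed two-word candidate and keep the best one.
candidates = [(w1, w2) for w1 in first_word for w2 in second_word[w1]]
scores = {seq: sequence_log_prob(*seq) for seq in candidates}
best = max(scores, key=scores.get)

# Greedy choice: take the most probable word at each step in turn.
greedy_first = max(first_word, key=first_word.get)
greedy_second = max(second_word[greedy_first], key=second_word[greedy_first].get)
greedy = (greedy_first, greedy_second)

for seq, log_p in scores.items():
    print(seq, f"{math.exp(log_p):.2f}")
print("best sequence:", best)
print("greedy sequence:", greedy)
```

With these numbers, 'nice today' has probability 0.6 × 0.5 = 0.30 and 'cold and' has 0.4 × 0.9 = 0.36, so the sequence-level objective favors 'cold and' even though 'nice' is the more probable first word. The greedy choice and the sequence-level optimum differ here, which illustrates why the objective is defined over whole sequences.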
Comparing Output Sequence Probabilities
Formula for Optimal Output Sequence in LLMs