An autoregressive language model is generating a sequence and must choose between two possible next phrases to complete the input 'The mountain path was...'. The model has calculated the conditional log-probabilities for the tokens in each potential phrase as follows:
Phrase A: 'steep and rocky'
log Pr('steep' | 'The mountain path was') = -0.9
log Pr('and' | 'The mountain path was steep') = -1.5
log Pr('rocky' | 'The mountain path was steep and') = -0.7
Phrase B: 'long but scenic'
log Pr('long' | 'The mountain path was') = -1.2
log Pr('but' | 'The mountain path was long') = -1.3
log Pr('scenic' | 'The mountain path was long but') = -0.4
Based on these values, which phrase is the model more likely to generate, and what is its total log-probability?
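One way to check an answer is to apply the chain rule in log space: the log-probability of a phrase is the sum of the conditional log-probabilities of its tokens. The sketch below is a minimal illustration that uses only the values given in the question. The names `phrase_log_probs` and `sequence_log_prob` are hypothetical helpers, and it assumes the model prefers the phrase with the higher total log-probability rather than sampling.

```python
import math

# Conditional log-probabilities copied from the question, in generation order.
phrase_log_probs = {
    "steep and rocky": [-0.9, -1.5, -0.7],
    "long but scenic": [-1.2, -1.3, -0.4],
}


def sequence_log_prob(token_log_probs):
    """Sum per-token conditional log-probabilities (chain rule in log space)."""
    return math.fsum(token_log_probs)


# Total log-probability of each candidate phrase given the shared prefix.
totals = {
    phrase: sequence_log_prob(log_probs)
    for phrase, log_probs in phrase_log_probs.items()
}

# Assumption: the higher (less negative) total marks the more likely phrase.
best_phrase = max(totals, key=totals.get)

for phrase, total in totals.items():
    print(f"{phrase!r}: total log-probability = {total:.1f}")
print(f"More likely phrase: {best_phrase!r}")
```

Because log-probabilities are negative, the total closer to zero corresponds to the more probable phrase. A low first-token score can still be outweighed by later tokens, so the comparison has to be made on the totals.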
Tags
Ch.5 Inference - Foundations of Large Language Models
Related
Analyzing Suboptimal Autoregressive Generation
An autoregressive language model must calculate the total log-probability for the generated sentence: "The sky is blue." The process involves summing the conditional log-probabilities of each token in sequence. Below are four potential breakdowns of this calculation. Which one correctly represents the sequence of calculations the model must perform?