In a text generation process that combines outputs from K different prompts, the next token ŷ_j is chosen according to the following decision rule:
ŷ_j = argmax_{y_j} Σ_{k=1}^{K} log Pr(y_j | x_k, ŷ_1, ..., ŷ_{j-1})
What is the primary analytical reason for summing the log-probabilities (log Pr) of a candidate token across all prompts, rather than multiplying the raw probabilities (Pr)?
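For intuition, here is a minimal Python sketch of this decision rule. The candidate tokens and their per-prompt probabilities are hypothetical placeholders, not values from an actual model. Summing log-probabilities selects the same token as multiplying the raw probabilities, since log is monotonic, but the sum stays numerically well-behaved where a product of many small probabilities would underflow toward zero.

import math

# Hypothetical per-prompt probabilities Pr(y_j | x_k, prefix) for two
# candidate next tokens under K = 3 prompts (illustrative values only).
candidate_probs = {
    "cat": [0.40, 0.35, 0.50],
    "dog": [0.45, 0.30, 0.20],
}

def ensemble_score(probs):
    # Sum of log-probabilities across the K prompts. The argmax is the
    # same as for the product of the raw probabilities (log is monotonic),
    # but the sum does not underflow as K grows.
    return sum(math.log(p) for p in probs)

# Decision rule: pick the candidate with the largest summed log-probability.
selected = max(candidate_probs, key=lambda t: ensemble_score(candidate_probs[t]))
print(selected)  # 'cat' (score ≈ -2.66) beats 'dog' (score ≈ -3.61)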
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Applying Token-Level Averaging for Text Generation
A language model uses an ensemble of 3 prompts (K = 3) to generate the next token in a sequence. The model selects the token ŷ_j that maximizes the sum of log-probabilities across all prompts, according to the formula ŷ_j = argmax_{y_j} Σ_{k=1}^{K} log Pr(y_j | ...). Given the following log-probabilities for two candidate tokens, 'cat' and 'dog', which token will the model select?
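For instance, with hypothetical log-probability values (the actual table is not shown in this preview): if 'cat' scores -0.9, -1.1, -0.7 across the three prompts (sum -2.7) and 'dog' scores -0.8, -1.2, -1.6 (sum -3.6), the model selects 'cat', since -2.7 > -3.6.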