Learn Before
Formula for Token-Level Model Averaging in Prompt Ensembling
In prompt ensembling, token-level model averaging determines the predicted token ŷ_j at the j-th step of the model combination. The token is selected by maximizing the sum of log-probabilities across all K prompts. This decision rule is expressed by the formula: ŷ_j = argmax(y_j) Σ(k=1 to K) log Pr(y_j | x_k, ŷ_1, ..., ŷ_{j-1}). Here, the probability of predicting the token y_j is conditioned on the k-th prompt's input x_k and all previously generated tokens ŷ_1 through ŷ_{j-1}.
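As a concrete illustration, here is a minimal Python sketch of this decision rule; the per-prompt distributions and token names are hypothetical stand-ins for actual model outputs Pr(y_j | x_k, ŷ_1, ..., ŷ_{j-1}):

```python
import math

# Hypothetical per-prompt next-token distributions for K = 3 prompts;
# each dict stands in for Pr(y_j | x_k, yhat_1, ..., yhat_{j-1}),
# restricted to two candidate tokens (remaining probability mass omitted).
per_prompt_probs = [
    {"river": 0.5, "stream": 0.4},  # prompt x_1
    {"river": 0.3, "stream": 0.6},  # prompt x_2
    {"river": 0.4, "stream": 0.5},  # prompt x_3
]

def select_next_token(distributions):
    """Return the token maximizing the summed log-probability across prompts."""
    candidates = distributions[0].keys()
    scores = {
        tok: sum(math.log(d[tok]) for d in distributions)
        for tok in candidates
    }
    return max(scores, key=scores.get)

# 'river':  log(0.5) + log(0.3) + log(0.4) ≈ -2.81
# 'stream': log(0.4) + log(0.6) + log(0.5) ≈ -2.12  (larger, so it wins)
print(select_next_token(per_prompt_probs))  # -> 'stream'
```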

Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Formula for Token-Level Model Averaging in Prompt Ensembling
Imagine two language models are tasked with completing the sentence: 'The weather today is exceptionally...'. At this specific step, they must choose the very next word. Their internal calculations produce the following probability scores for the top three candidate words:
- Model 1: warm (0.6), sunny (0.3), bright (0.1)
- Model 2: warm (0.2), sunny (0.7), bright (0.1)

If a system combines these models by averaging their token-level probability distributions to make a decision, which word will it select as the next word in the sequence, and why?
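A quick sketch of the arithmetic-averaging computation, using the probabilities given above, makes the answer concrete:

```python
# Average the two models' token-level distributions from the example above.
model_1 = {"warm": 0.6, "sunny": 0.3, "bright": 0.1}
model_2 = {"warm": 0.2, "sunny": 0.7, "bright": 0.1}

averaged = {word: (model_1[word] + model_2[word]) / 2 for word in model_1}
print(averaged)                          # {'warm': 0.4, 'sunny': 0.5, 'bright': 0.1}
print(max(averaged, key=averaged.get))   # 'sunny'
```

Averaging gives warm 0.4, sunny 0.5, bright 0.1, so the ensemble selects 'sunny': even though Model 1 preferred 'warm', the combined evidence across both models favors 'sunny'.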
Analysis of Text Generation Combination Methods
Choosing a Generation Combination Strategy
Learn After
Applying Token-Level Averaging for Text Generation
In a text generation process that combines outputs from K different prompts, the next token ŷ_j is chosen according to the following decision rule: ŷ_j = argmax(y_j) Σ(k=1 to K) log Pr(y_j | x_k, ŷ_1, ..., ŷ_{j-1}). What is the primary analytical reason for summing the log-probabilities (log Pr) of a candidate token across all prompts, rather than multiplying the raw probabilities (Pr)?

A language model uses an ensemble of 3 prompts (K=3) to generate the next token in a sequence. The model selects the token ŷ_j that maximizes the sum of log-probabilities across all prompts, according to the formula: ŷ_j = argmax(y_j) Σ(k=1 to K) log Pr(y_j | ...). Given the following log-probabilities for two candidate tokens, 'cat' and 'dog', which token will the model select?
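On the first question: because log is monotonically increasing, maximizing Σ log Pr selects exactly the same token as maximizing Π Pr, but the sum of logs is numerically stable where a product of many small probabilities underflows to zero. A small sketch with made-up probability values shows the effect:

```python
import math

# 500 hypothetical per-prompt probabilities, each small (made-up values).
probs = [1e-4] * 500

product = 1.0
for p in probs:
    product *= p            # underflows to 0.0 long before the loop ends

log_sum = sum(math.log(p) for p in probs)

print(product)   # 0.0       -- the raw product is numerically useless
print(log_sum)   # ≈ -4605.2 -- the log-sum stays finite and comparable
```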