In a text generation process that combines outputs from K different prompts, the next token ŷ_j is chosen according to the following decision rule:
ŷ_j = argmax_{y_j} Σ_{k=1}^{K} log Pr(y_j | x_k, ŷ_1, ..., ŷ_{j-1})
What is the primary analytical reason for summing the log-probabilities (log Pr) of a candidate token across all prompts, rather than multiplying the raw probabilities (Pr)?
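For intuition, here is a minimal Python sketch of this decision rule. The candidate tokens and their per-prompt probabilities are hypothetical placeholders, not values from an actual model. Summing log-probabilities selects the same token as multiplying the raw probabilities, since log is monotonic, but the sum stays numerically well-behaved where a product of many small probabilities would underflow toward zero.

import math

# Hypothetical per-prompt probabilities Pr(y_j | x_k, prefix) for two
# candidate next tokens under K = 3 prompts (illustrative values only).
candidate_probs = {
    "cat": [0.40, 0.35, 0.50],
    "dog": [0.45, 0.30, 0.20],
}

def ensemble_score(probs):
    # Sum of log-probabilities across the K prompts. The argmax is the
    # same as for the product of the raw probabilities (log is monotonic),
    # but the sum does not underflow as K grows.
    return sum(math.log(p) for p in probs)

# Decision rule: pick the candidate with the largest summed log-probability.
selected = max(candidate_probs, key=lambda t: ensemble_score(candidate_probs[t]))
print(selected)  # 'cat' (score ≈ -2.66) beats 'dog' (score ≈ -3.61)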
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Applying Token-Level Averaging for Text Generation
A language model uses an ensemble of 3 prompts (K = 3) to generate the next token in a sequence. The model selects the token ŷ_j that maximizes the sum of log-probabilities across all prompts, according to the formula ŷ_j = argmax_{y_j} Σ_{k=1}^{K} log Pr(y_j | ...). Given the following log-probabilities for two candidate tokens, 'cat' and 'dog', which token will the model select?
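For instance, with hypothetical log-probability values (the actual table is not shown in this preview): if 'cat' scores -0.9, -1.1, -0.7 across the three prompts (sum -2.7) and 'dog' scores -0.8, -1.2, -1.6 (sum -3.6), the model selects 'cat', since -2.7 > -3.6.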