Risk of an Output in Minimum Bayes Risk Decoding
In Minimum Bayes Risk (MBR) decoding, the risk of selecting an output y from a candidate set Ω is the expected cost of y measured against every possible reference output y_r. It is computed by treating each candidate y_r in Ω in turn as the reference, multiplying the risk function R(y, y_r) by the probability Pr(y_r|x) of that reference given the input x, and summing the products. The formula is:
$\text{Risk}(y) = \mathbb{E}_{y_r \sim \Pr(y_r|\mathbf{x})} R(y, y_r) = \sum_{y_r \in \Omega} R(y, y_r) \cdot \Pr(y_r|\mathbf{x})$
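The sum above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it reuses the five sampled answers 15, 14, 15, 15, 16 from the arithmetic question listed under Related, and it makes two assumptions that the formula itself does not fix. First, `zero_one_cost` is a hypothetical choice for R(y, y_r) (0 for identical outputs, 1 otherwise). Second, Pr(y_r|x) is estimated by how often each answer appears among the samples. The function and variable names are illustrative.

```python
from collections import Counter


def zero_one_cost(y, y_r):
    """Hypothetical pairwise cost R(y, y_r): 0 if the outputs match, 1 otherwise."""
    return 0.0 if y == y_r else 1.0


def risk(y, omega, prob, cost=zero_one_cost):
    """Risk(y) = sum over y_r in omega of cost(y, y_r) * prob[y_r]."""
    return sum(cost(y, y_r) * prob[y_r] for y_r in omega)


# Sampled outputs for the prompt "What is 7 + 8?".
samples = [15, 14, 15, 15, 16]

# Assumption: estimate Pr(y_r | x) by the relative frequency of each sample.
counts = Counter(samples)
omega = sorted(counts)  # distinct candidates: [14, 15, 16]
prob = {y: n / len(samples) for y, n in counts.items()}

# Compute the risk of every candidate and pick the one with the lowest risk.
risks = {y: risk(y, omega, prob) for y in omega}
best = min(risks, key=risks.get)

for y in omega:
    print(f"Risk({y}) = {risks[y]:.2f}")
print("Selected output:", best)
```

Under these assumptions the candidate 15 has the lowest risk, because it disagrees with the smallest share of the probability mass. A graded cost, such as one minus a similarity score, could be passed as `cost` instead of the 0/1 cost without changing the rest of the sketch.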

Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Related
Risk Function in Minimum Bayes Risk Decoding
Risk of an Output in Minimum Bayes Risk Decoding
A language model is prompted to solve the math problem 'What is 7 + 8?'. To improve reliability, the model generates five different outputs using a sampling strategy: [15, 14, 15, 15, 16]. A selection process is then used to choose the final answer by identifying the candidate that minimizes the expected disagreement with the other generated candidates. Which output will be selected?
Connecting Self-Consistency to a Formal Framework
Evaluating a Text Generation Strategy
A language model is generating a summary for a document. The ideal, human-written reference summary is y_r: "The study found a significant link between diet and health." The model considers two candidate summaries, y1: "The study found a significant link between diet and health." and y2: "Research showed a connection between food and wellness." Given the purpose of a risk function, R(candidate, reference), which of the following statements best describes how it would operate in this scenario?
Evaluating a Simple Risk Function
Evaluating a Risk Function for Legal Translation
Learn After
Calculating Expected Risk for a Candidate Output
In a system that selects an output by minimizing an expected cost, the cost for a candidate output y_A is calculated by summing the pairwise costs between y_A and every possible reference output y_r, with each cost weighted by the probability of that reference output. The pairwise cost function is designed to be high for dissimilar outputs and low for similar outputs. Suppose you are calculating the expected cost for candidate y_A. The probability of a very dissimilar candidate y_B increases, while the probability of a very similar candidate y_C decreases by an identical amount. All other probabilities and pairwise costs remain constant. What is the most likely effect on the expected cost of y_A?
When calculating the expected cost for a candidate output, a reference output that is very dissimilar to the candidate (resulting in a high pairwise cost) will always have a large impact on the final calculated expected cost, even if that reference output has a very low probability of being correct.