Multiple Choice

A language model is generating a completion for an input x. The model has a base probability distribution, π(y|x), for four potential completions (y). To steer the model's output, a reward function, r(x, y), is applied to create a new unnormalized score for each completion using the formula: Score(y) = π(y|x) * exp(r(x, y)). Given the values below, which completion will have the highest score?
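The scoring rule in the question can be sketched in a few lines of Python. This is only an illustration of the formula Score(y) = π(y|x) * exp(r(x, y)): the function names, the completion labels A to D, and all probability and reward values are hypothetical and are not the values the question refers to.

```python
import math


def unnormalized_scores(base_probs, rewards):
    """Return Score(y) = pi(y|x) * exp(r(x, y)) for every completion y."""
    return {y: base_probs[y] * math.exp(rewards[y]) for y in base_probs}


def best_completion(base_probs, rewards):
    """Return the completion label with the highest unnormalized score."""
    scores = unnormalized_scores(base_probs, rewards)
    return max(scores, key=scores.get)


# Hypothetical values, chosen only to illustrate the calculation.
example_probs = {"A": 0.50, "B": 0.30, "C": 0.15, "D": 0.05}
example_rewards = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0}

for label, score in unnormalized_scores(example_probs, example_rewards).items():
    print(f"{label}: {score:.4f}")
print("highest score:", best_completion(example_probs, example_rewards))
```

With these made-up numbers, the winner is neither the completion with the highest base probability (A) nor the one with the highest reward (D). The score depends on the product of the base probability and the exponentiated reward, so both factors have to be evaluated for each completion.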

Updated 2025-10-07

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science