Multiple Choice

Consider a scenario where, for a given input \(\mathbf{x}\), there are only two possible outputs, \(\mathbf{y}_1\) and \(\mathbf{y}_2\). A reference model assigns probabilities \(\pi_{\text{ref}}(\mathbf{y}_1|\mathbf{x}) = 0.6\) and \(\pi_{\text{ref}}(\mathbf{y}_2|\mathbf{x}) = 0.4\). A reward function gives scores \(r(\mathbf{x}, \mathbf{y}_1) = 2\) and \(r(\mathbf{x}, \mathbf{y}_2) = 1\). Assuming the scaling factor \(\beta\) is 1, what is the value of the normalization factor \(Z(\mathbf{x})\), which is calculated as \(Z(\mathbf{x}) = \sum_{\mathbf{y}} \pi_{\text{ref}}(\mathbf{y}|\mathbf{x}) \exp(r(\mathbf{x}, \mathbf{y}))\)?

0

1
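The sum in the question can be checked numerically. The sketch below is a minimal illustration, not part of the original exercise: the function name `normalization_factor` and the dictionary layout are hypothetical, and dividing the reward by `beta` is an assumption that reduces to the formula stated in the question when `beta` is 1.

```python
import math


def normalization_factor(ref_probs, rewards, beta=1.0):
    """Return Z(x) = sum over y of pi_ref(y|x) * exp(r(x, y) / beta).

    The division by beta is an assumption; with beta = 1 this is
    exactly the formula given in the question.
    """
    return sum(
        ref_probs[y] * math.exp(rewards[y] / beta)
        for y in ref_probs
    )


# Values taken directly from the question.
ref_probs = {"y1": 0.6, "y2": 0.4}
rewards = {"y1": 2.0, "y2": 1.0}

z = normalization_factor(ref_probs, rewards, beta=1.0)
print(f"Z(x) = {z:.4f}")  # prints "Z(x) = 5.5207"
```

Written out term by term, this is 0.6 · e² + 0.4 · e¹ ≈ 4.4334 + 1.0873 ≈ 5.5207.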

Updated 2025-09-29

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science